
OpenACC Highlights at GTC 2016

March 31, 2016 by Rob Farber

GTC 2016, the upcoming GPU Technology Conference being held in San Jose, CA, April 4-7, 2016, features several OpenACC talks of note. If you are attending, check out the following OpenACC-related activities. If not, NVIDIA will post the videos online shortly after the conference.

S6524 – Enabling the Electronic Structure Program Gaussian on GPGPUs Using OpenACC

Roberto Gomperts, Principal Engineer, NVIDIA

In 2011, Gaussian, Inc., PGI, and NVIDIA embarked on a long-term project to enable Gaussian on GPGPUs using a directives-based approach. OpenACC has emerged as the de facto standard to port complex programs to GPU accelerators. We’ll discuss how we attacked some of the challenges involved in working with a large-scale, feature-rich application like Gaussian. This includes a number of PGI extensions to the OpenACC 2.0 standard that we believe will have a positive impact on other programs. To conclude, we’ll present a sample of GPU-based performance improvements on a variety of theories and methods.

Here is a related set of slides from GTC 2014: http://on-demand.gputechconf.com/gtc/2014/presentations/S4613-enabling-gaussian-09-on-gpgpus.pdf.
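For readers unfamiliar with the directives-based approach the abstract describes, here is a minimal, hypothetical sketch (not code from Gaussian) of what an OpenACC port of a C loop looks like: a pragma marks the loop for offload and data clauses describe the arrays it touches, while the serial source stays intact.

#include <stdio.h>

#define N 1000000

/* Hypothetical kernel for illustration only -- not code from Gaussian. */
int main(void)
{
    static double x[N], y[N];

    for (int i = 0; i < N; i++) { x[i] = i; y[i] = 2.0 * i; }

    /* One directive offloads the loop; the serial code is otherwise unchanged. */
    #pragma acc parallel loop copyin(x[0:N]) copy(y[0:N])
    for (int i = 0; i < N; i++)
        y[i] += 3.0 * x[i];

    printf("y[42] = %f\n", y[42]);
    return 0;
}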

S6709 – Write Once, Parallel Everywhere: OpenACC for GPUs, x86, OpenPOWER, and Beyond

Michael Wolfe, Compiler Engineer, NVIDIA (Highly-Rated Speaker)

Performance portability means the ability to write a single program that runs with high performance across a wide range of target systems, including multicore systems, GPU-accelerated systems, and manycore systems, independent of the instruction set. It’s not a “myth” or a “dream,” as has been claimed recently. It should be demanded by developers and expected from any modern high level parallel programming language. OpenACC was designed five years ago with broad cross-platform performance portability in mind. The current PGI compiler suite delivers on this promise. Come hear about the current capabilities and performance of PGI OpenACC on GPUs, x86 and OpenPOWER, and learn about our plans for new features and even wider platform support.
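As a rough illustration of that claim (not material from the talk), the same OpenACC kernel below can be rebuilt for different targets by changing only the compiler's target flag; the PGI flag spellings in the comment are assumptions based on the 2016-era toolchain.

/* The same OpenACC source, rebuilt for different targets (flag spellings are
 * assumptions about the 2016-era PGI toolchain):
 *   pgcc -acc -ta=tesla     saxpy.c    NVIDIA GPU
 *   pgcc -acc -ta=multicore saxpy.c    multicore x86 or OpenPOWER host
 */
void saxpy(int n, float a, float *restrict y, const float *restrict x)
{
    /* The directive is identical regardless of the compilation target. */
    #pragma acc parallel loop
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}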


S6410 – Comparing OpenACC 2.5 and OpenMP 4.5

Jeff Larkin, DevTech Engineer, NVIDIA
James Beyer, Senior Runtime Engineer, NVIDIA

We’ll compare the current state of two competing accelerator directive sets: OpenACC 2.5 and OpenMP 4.5. As members of both the OpenACC technical committee and the OpenMP language committee, we’ll provide an inside take on the current state of the directives and insight into how to transition between the directive sets.
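To make the comparison concrete, here is a small, hypothetical side-by-side sketch (not taken from the session) of the same vector-add loop expressed first with OpenACC 2.x directives and then with OpenMP 4.x target-offload directives.

/* The same vector-add loop in both directive sets (illustrative only). */

/* OpenACC 2.x: one combined parallel-loop directive with data clauses. */
void vadd_acc(int n, const float *a, const float *b, float *c)
{
    #pragma acc parallel loop copyin(a[0:n], b[0:n]) copyout(c[0:n])
    for (int i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}

/* OpenMP 4.x: target offload plus explicit teams/thread distribution. */
void vadd_omp(int n, const float *a, const float *b, float *c)
{
    #pragma omp target teams distribute parallel for \
            map(to: a[0:n], b[0:n]) map(from: c[0:n])
    for (int i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}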


S6134 – High Performance and Productivity with Unified Memory and OpenACC: A LBM Case Study

Jiri Kraus, Compute DevTech Software Engineer, NVIDIA

Learn how to use unified memory to improve your productivity when accelerating applications with OpenACC. Using a Lattice Boltzmann CFD solver as an example, we’ll explain how a profile-driven approach allows one to incrementally accelerate an application with OpenACC and unified memory. Besides the productivity gain, a primary advantage of this approach is that it is also very accessible to developers who are new to a project and therefore not familiar with the whole code base.
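As a rough illustration of the idea (not the LBM solver from the talk), the sketch below accelerates a simple stencil loop with no explicit data clauses: with unified/managed memory the runtime migrates the data on demand, so only the compute directive is needed. The PGI managed-memory build flag in the comment is an assumption about the toolchain of that era.

#include <stdlib.h>

/* Illustrative stencil, not the LBM solver from the talk. No copy clauses:
 * with unified/managed memory the runtime migrates pages on demand, e.g.
 * when built with "pgcc -acc -ta=tesla:managed" (flag spelling is an
 * assumption about the 2016-era PGI toolchain). */
void relax(int n, double *restrict out, const double *restrict in)
{
    #pragma acc parallel loop
    for (int i = 1; i < n - 1; i++)
        out[i] = 0.5 * (in[i - 1] + in[i + 1]);
}

int main(void)
{
    int n = 1 << 20;
    double *a = malloc(n * sizeof *a);
    double *b = malloc(n * sizeof *b);
    for (int i = 0; i < n; i++) a[i] = i;
    relax(n, b, a);
    free(a);
    free(b);
    return 0;
}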

S6748 – Writing Performance Portable Code, and the Challenges for Upcoming Systems

Fernanda Foertter, HPC User Support Specialist, Oak Ridge National Laboratory

This session is about writing performance-portable code, with best practices and recommendations from Oak Ridge National Laboratory DOE staff. It will cover the CAAR program, what the labs are doing to help codes migrate to machines like the upcoming CORAL systems, and the advantages that modern GPU architectures bring in terms of code simplification. The resources available to domain scientists to ensure a smooth transition to this exciting architecture will be summarized, along with suggested follow-on activities. We are here to help!



Filed Under: Featured article, Featured news, News, openacc Tagged With: Intel Xeon Phi, openacc
