Excellent x86 Performance! - PGI Accelerator Compilers Add OpenACC Support for x86 Multicore CPUs

TechEnablement – OpenACC x86 multi-core performance matches that of the Intel compiler using OpenMP when using the latest 15.10 version of the Portland Group compiler.

Click for more information (Image courtesy PGI and NVIDIA)

Note: two MPI instances per GPU were required to fully utilize the NVIDIA K80 GPU as the problem sizes were too small to fully utilize the GPU with a single instance.

SANTA CLARA, Calif.—Oct. 29, 2015—NVIDIA today announced availability of version 15.10 of the PGI Accelerator™ Fortran, C and C++ compilers, adding support for the OpenACC® directives-based parallel programming standard on x86 architecture multicore microprocessors.

The new PGI compilers deliver performance portability, allowing OpenACC-enabled source code to be compiled for parallel execution on a multicore CPU or a GPU accelerator. This capability provides tremendous flexibility for programmers, enabling them to develop applications that can take advantage of multiple system architectures with a single version of their source code.

“Our goal is to enable HPC developers to easily port applications across all major CPU and accelerator platforms with uniformly high performance using a common source code base,” said Douglas Miles, director of PGI Compilers & Tools at NVIDIA. “This capability will be particularly important in the race towards exascale computing in which there will be a variety of system architectures requiring a more flexible application programming approach.”

This new PGI feature compiles OpenACC compute regions for parallel execution across all of the cores in an x86 processor or multi-socket server. The cores are treated in aggregate as a shared-memory accelerator, eliminating all data movement overhead in the resulting OpenACC programs. By default the compiler generates code that uses all the available cores in the system, and several methods exist for programmers to control and fine-tune this behavior.

“We were extremely impressed that we can run OpenACC on a CPU with no code change and get equivalent performance to our OpenMP/MPI implementation, and get 4x faster performance when running on a GPU,” said Wayne Gaudin of the U.K.’s Atomic Weapons Establishment. “From the perspective of performance portability and code future proofing, this is an excellent result.”

Key benefits of running OpenACC on multicore CPUs include:

Effective utilization of all cores of a multicore CPU or multi-socket server for parallel execution
Common programming model across CPUs and GPUs in Fortran, C and C++
·Rapid exploitation of existing multicore parallelism in a program using the KERNELS directive, which enables incremental optimization for parallel execution
Scalable performance across multicore CPUs and GPUs

“Porting HPC applications from one platform to another is one of the most significant costs in the adoption of breakthrough hardware technologies,” said Buddy Bland, project director at Oak Ridge National Laboratory. “OpenACC for multicore x86 CPUs provides continuity and code portability from existing CPU-only and GPU-enabled applications from machines like Titan to all of DOE’s upcoming major systems as well as portability among those systems.”

Growing Momentum for OpenACC

There are more than 10,000 developers using OpenACC today, and several recent developments underscore the continually growing adoption of OpenACC in high performance computing. At recent hackathons conducted worldwide, experts across a variety of scientific domains have been accelerating their scientific applications with accelerators and OpenACC. These include applications in such diverse fields as MRI image reconstruction (PowerGrid), computational fluid dynamics (INCOMP3D, HiPSTAR and Numeca), cosmology and astrophysics (RAMSES, CASTRO and MAESTRO), quantum chemistry (LSDALTON), computational physics (NekCEM) and more.

In addition, Gaussian, Inc. has announced that it is using OpenACC to port the GAUSSIAN computational chemistry application to accelerators. At the recent iCAS2 conference on climate and weather in Annecy, France, Meteosuisse, the Swiss Federal Office of Meteorology and Climatology, announced the deployment of a GPU-accelerated version of COSMO, the world’s first production weather forecasting application running on GPU accelerators.

In a recent poll of 150 OpenACC developers, 94 percent of the respondents reported getting a speedup when running on an accelerator, and over 90 percent of the users would recommend OpenACC.

More information on the PGI Accelerator compilers with OpenACC support is available at http://www.pgroup.com/accelerate. More information on the OpenACC API and standard is at www.openacc.org.

In addition to support for OpenACC on multicore CPUs, PGI version 15.10 includes a pre-production release of the PGI Fortran, C and C++ compilers for OpenPOWER CPUs with support for OpenACC on NVIDIA GPUs.

Availability and Free Trial

PGI 15.10 with support for OpenACC on multicore CPUs is expected to be available this month directly from PGI and authorized resellers. New users can register for a free 90-day trial as part of the NVIDIA OpenACC Toolkit. University students and faculty can apply for a free PGI license.

About PGI Software

An NVIDIA Corporation brand, PGI software includes high-performance parallel Fortran, C and C++ compilers and tools for workstations, servers and clusters based on x64 processors from Intel and AMD, and HPC accelerators from NVIDIA and AMD. More information is available at www.pgroup.com, sales@pgroup.com or by calling (503) 682-2806.

Share this:

Leave a Reply Cancel reply