PGI has released its OpenACC 2.0 roadmap for the 14.4 and upcoming 14.7 compiler releases. The 14.4 release is expected in early May and the 14.7 release in early July. Note: these are not official PGI dates.
Analysis:
- The 14.4 support for atomic operations will enable many low-wait algorithms, such as counters and massively parallel stacks.
- Improved reduction performance in 14.7 should speed up many scientific codes, including machine learning algorithms, and allow OpenACC to deliver performance similar to CUDA results such as 13 PF/s on the ORNL Titan supercomputer.
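To make the two points above concrete, here is a minimal sketch in C of an atomically updated set of counters and a sum reduction. The function names `histogram` and `sum_array` are hypothetical, chosen only for illustration, and since 14.4 lists only partial support for the atomic directives, whether this exact form is accepted by that release is an assumption.

```c
#include <assert.h>

/* Hypothetical example: count how often each value occurs in data[].
   Every value is assumed to lie in [0, nbins). The atomic update lets
   many parallel iterations increment the same counter safely. */
void histogram(const int *data, int n, int *bins, int nbins)
{
    assert(n >= 0 && nbins > 0);

    for (int b = 0; b < nbins; ++b)
        bins[b] = 0;

    #pragma acc parallel loop copyin(data[0:n]) copy(bins[0:nbins])
    for (int i = 0; i < n; ++i) {
        #pragma acc atomic update
        bins[data[i]] += 1;
    }
}

/* Hypothetical example: sum an array with an OpenACC reduction clause.
   Each parallel worker keeps a private partial sum that the runtime
   combines into `total` at the end of the loop. */
double sum_array(const double *x, int n)
{
    assert(n >= 0);

    double total = 0.0;
    #pragma acc parallel loop reduction(+:total) copyin(x[0:n])
    for (int i = 0; i < n; ++i)
        total += x[i];
    return total;
}
```

A compiler built without OpenACC support will typically ignore the pragmas and run both loops serially, which gives the same results and makes the sketch easy to check on a CPU.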
New in PGI 14.4
- PGI Accelerator Features and Enhancements
  - Expanded OpenACC C++ Support
    - C++ this pointer support
    - C++ member functions
    - C++ support for the Routine directive
    - C++ class member arrays in data clauses
  - Expanded OpenACC 2.0 Features
    - Loop directive collapse clause on deeply nested loops
    - Parallel directive firstprivate clause
    - C structs/Fortran derived type member arrays in data clauses
    - Partial support for Fortran and C/C++ atomic directives
    - Calling C/C++ CUDA-style atomics from OpenACC
    - Fortran common block names in OpenACC data clauses
  - GPU-side debugging in OpenACC with Allinea DDT
  - CUDA Fortran support for CUDA 5.5 batched cuBLAS routines
  - Integrated CUDA 6 Toolkit
- PGI Multi-core Features and Enhancements
  - 2% improvement in SPEC OMP 2012 performance compared to 14.1 on Intel Sandy Bridge processors
  - Support for new AVX2 instructions available on the latest Haswell CPUs from Intel; updated Windows assembler
  - New EDG C++ front-end with C++11 support
- Other Features and Enhancements
  - Comprehensive support for environment modules
  - New tutorials and expanded set of examples
  - Prebuilt versions of NetCDF and HDF5
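As a rough illustration of two of the OpenACC 2.0 items listed above, the following C sketch combines the Parallel directive's firstprivate clause with the Loop directive's collapse clause. The function name `scale_matrix` and the row-major layout are assumptions made for the example and do not come from PGI's materials.

```c
#include <assert.h>

/* Hypothetical example: multiply every element of a rows x cols
   row-major matrix by `factor`.
   - firstprivate(factor) gives each gang its own copy of the scalar,
     initialized from the host value.
   - collapse(2) merges the two nested loops into one iteration space
     so the compiler can parallelize across all rows * cols elements. */
void scale_matrix(float *a, int rows, int cols, float factor)
{
    assert(rows >= 0 && cols >= 0);

    #pragma acc parallel firstprivate(factor) copy(a[0:rows*cols])
    {
        #pragma acc loop collapse(2)
        for (int i = 0; i < rows; ++i)
            for (int j = 0; j < cols; ++j)
                a[i * cols + j] *= factor;
    }
}
```

Collapsing matters most when the outer loop alone has too few iterations to keep an accelerator busy; for a single deep loop nest the clause takes the number of loops to merge.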
Planned for 14.7
- OpenACC 2.0 Declare directive link clause support
- OpenACC 2.0 Loop directive tile and auto clause support
- Support for CUDA managed data in both OpenACC and CUDA Fortran
- Support for reductions inside Routines
- Support for reductions of COMPLEX data types
- Cache clause tuning
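For readers unfamiliar with the tile clause planned for 14.7, here is a minimal sketch in C of how it might be applied to a matrix transpose. The function name `transpose` and the 16 x 16 tile size are illustrative assumptions, not values recommended by PGI.

```c
#include <assert.h>

/* Hypothetical example: transpose an n x n row-major matrix.
   tile(16, 16) asks the compiler to break the two tightly nested loops
   into 16 x 16 blocks, which can improve locality for access patterns
   like this one, where reads and writes stride differently. */
void transpose(const float *in, float *out, int n)
{
    assert(n >= 0);

    #pragma acc parallel loop tile(16, 16) copyin(in[0:n*n]) copyout(out[0:n*n])
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            out[j * n + i] = in[i * n + j];
}
```

The best tile size depends on the hardware and the loop body, so the value here should be treated as a starting point for experimentation.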