IPMACC is a research-grade, open-source framework for translating OpenACC source code to CUDA or OpenCL. Binary executables can then be built with the OpenCL or CUDA compilers. The authors (Ahmad Lashgar, University of Victoria; Alireza Majidi, Texas A&M University; Amirali Baniasadi, University of Victoria) verified correctness and performance using benchmarks from the Rodinia Benchmark Suite and the CUDA SDK. IPMACC is of interest because of the recent demise of CAPS-enterprise, which provided a commercial OpenACC-to-OpenCL source translator. IPMACC can be found in its GitHub repository. Also note that GCC will start supporting OpenACC and OpenMP 4.0 pragmas in 2015.
- Currently, the parallel directive is not supported. With a little effort, however, the programmer can rewrite any parallel region construct as a kernels region.
- Only one-dimensional (1D) arrays can be transferred into and out of the region.
- User-defined data types are not supported in data copy clauses.
- The main function should be prototyped as a normal function with a return type, e.g. int main(). Avoid declaring main() with no return type.
- Clause support:
  - The seq clause on the top-level loop of a singly nested loop is not supported. This is an unusual case in which the region contains only one loop and that loop is marked for serial execution.
- In CUDA 4.0, NVCC has some issues with C's restrict keyword.
- If the compiler crashes with a pycparser.plyparser.ParseError exception, check the last line of the output for a meaningful error message.
- Limitations of the reduction/private clauses of loop:
  - IPMACC assumes the reduction/private variable is not declared inside the loop.
  - If a variable is listed in both private() and reduction(), IPMACC treats it as a reduction variable, which also covers private.
  - Reduction/private on arrays or subarrays is not supported.
- The default reduction type is a two-level tree reduction. Alternatively, for CUDA, an atomic reduction is implemented; it is supported only on more recent hardware (compute capability 1.3 or higher), and the proper flag must be passed to the underlying NVCC by adding the -arch=sm_13 compile flag.
- To guarantee safety, call acc_init() early in the code to avoid potential runtime errors. This is essential for OpenCL target devices.
- IPMACC can parallelize the iterations of loops that use the following increment operators: +, -, ++, --, *, /.
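Because the parallel directive is not supported, a loop written as a parallel region has to be expressed as a kernels region instead. The sketch below shows one way to do this for a simple SAXPY-style loop; the function name saxpy and its data clauses are illustrative assumptions, not taken from the IPMACC documentation. Without an OpenACC compiler the pragmas are ignored and the loop runs serially.

```c
/* Original form, which IPMACC does not accept:
 *
 *   #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
 *   for (int i = 0; i < n; i++) y[i] = a * x[i] + y[i];
 *
 * Rewritten form: the region is opened with "kernels" and the loop
 * inside it is marked with "loop". (Hypothetical example function.) */
void saxpy(int n, float a, float *x, float *y)
{
#pragma acc kernels copyin(x[0:n]) copy(y[0:n])
    {
#pragma acc loop
        for (int i = 0; i < n; i++) {
            y[i] = a * x[i] + y[i];
        }
    }
}
```

The rewrite keeps the same data clauses; only the construct that opens the region changes.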
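Since only 1D arrays can be moved into and out of a region, a two-dimensional matrix is typically stored as a flat 1D array and indexed as row * cols + col. A minimal sketch, assuming a row-major layout; the function name scale_matrix is hypothetical.

```c
/* Scale every element of a rows x cols matrix that is stored row-major
 * in a flat 1D array, so the copy clause only needs a single 1D range.
 * (Hypothetical example function.) */
void scale_matrix(int rows, int cols, float *m, float factor)
{
    int n = rows * cols;
#pragma acc kernels copy(m[0:n])
    {
#pragma acc loop
        for (int r = 0; r < rows; r++) {
#pragma acc loop
            for (int c = 0; c < cols; c++) {
                m[r * cols + c] *= factor;   /* element (r, c) */
            }
        }
    }
}
```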
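Because user-defined data types cannot appear in data copy clauses, one possible workaround is to keep each field in its own plain 1D array (a structure-of-arrays layout) and pass those arrays to the clauses. The particle fields and the function advance_particles below are hypothetical and only illustrate the idea.

```c
/* Instead of copying an array of
 *   struct particle { float x, v; };
 * which the copy clauses do not accept, each field is stored in its
 * own 1D float array. (Hypothetical example function.) */
void advance_particles(int n, float *x, float *v, float dt)
{
#pragma acc kernels copyin(v[0:n]) copy(x[0:n])
    {
#pragma acc loop
        for (int i = 0; i < n; i++) {
            x[i] += v[i] * dt;   /* position update from velocity */
        }
    }
}
```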
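A loop that stays within the reduction/private limitations declares its reduction and private variables before the loop and uses only scalars, never arrays. A minimal sketch; the function name sum_of_squares and the temporary t are hypothetical.

```c
/* Sum of squares using a scalar reduction. Both the reduction variable
 * (sum) and the private temporary (t) are declared outside the loop,
 * and neither is an array. (Hypothetical example function.) */
float sum_of_squares(int n, float *a)
{
    float sum = 0.0f;
    float t;
#pragma acc kernels copyin(a[0:n])
    {
#pragma acc loop reduction(+:sum) private(t)
        for (int i = 0; i < n; i++) {
            t = a[i] * a[i];
            sum += t;
        }
    }
    return sum;
}
```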
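The two-level tree reduction mentioned above can be pictured as follows: each block of elements is first reduced pairwise to one partial result, and the partial results are then reduced pairwise again. The serial C sketch below only illustrates that computation; it is not IPMACC's generated code, and the block size of 4 and the function names are assumptions.

```c
#include <stdlib.h>

#define BLOCK 4  /* illustrative block size (assumed, not IPMACC's value) */

/* Pairwise (tree) reduction of buf[0..len) in place; len must be >= 1. */
static float tree_reduce(float *buf, int len)
{
    for (int stride = 1; stride < len; stride *= 2) {
        for (int i = 0; i + stride < len; i += 2 * stride) {
            buf[i] += buf[i + stride];
        }
    }
    return buf[0];
}

/* Level 1: every chunk of BLOCK elements is tree-reduced to one partial
 * sum (on a GPU, roughly one thread block per chunk).
 * Level 2: the partial sums are tree-reduced into the final result.
 * Returns 0.0f for n <= 0 or if the scratch allocation fails. */
float two_level_sum(const float *a, int n)
{
    if (n <= 0) return 0.0f;
    int nblocks = (n + BLOCK - 1) / BLOCK;
    float *partial = malloc((size_t)nblocks * sizeof *partial);
    if (partial == NULL) return 0.0f;

    float chunk[BLOCK];
    for (int b = 0; b < nblocks; b++) {
        int len = 0;
        for (int i = b * BLOCK; i < n && i < (b + 1) * BLOCK; i++) {
            chunk[len++] = a[i];
        }
        partial[b] = tree_reduce(chunk, len);
    }

    float result = tree_reduce(partial, nblocks);
    free(partial);
    return result;
}
```

For ten inputs this produces three partial sums at the first level and combines them at the second.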
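The advice to call acc_init() early can be followed with a small helper invoked as the first statement of int main(). This sketch uses the standard OpenACC _OPENACC macro and acc_device_default value so that the file still builds as plain serial C; whether IPMACC defines _OPENACC and accepts acc_device_default is an assumption, so drop the guard or change the device type if it does not. The helper name init_accelerator is hypothetical.

```c
#ifdef _OPENACC
#include <openacc.h>
#endif

/* Set once the runtime has been initialized. */
static int accelerator_ready = 0;

/* Initialize the accelerator runtime once, before any data or compute
 * region runs. Repeated calls are no-ops. The _OPENACC guard (assumed
 * to be defined by the OpenACC toolchain) lets the file compile as
 * plain C. (Hypothetical helper.) */
void init_accelerator(void)
{
    if (accelerator_ready) return;
#ifdef _OPENACC
    acc_init(acc_device_default);
#endif
    accelerator_ready = 1;
}
```

Making the helper idempotent means it can safely be called both from main() and from library entry points.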
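As an illustration of a non-unit step, the loop below advances its index multiplicatively. This is an assumption-laden sketch: the text does not say whether IPMACC expects i *= 2 or i = i * 2, and other OpenACC compilers may reject such a loop. The function name mark_powers_of_two is hypothetical.

```c
/* Set flags[i] = 1 for i = 1, 2, 4, 8, ... below n. The loop index is
 * updated with a multiplicative step, one of the listed operators.
 * Each iteration writes a distinct element, so there is no race.
 * (Hypothetical example function.) */
void mark_powers_of_two(int n, int *flags)
{
#pragma acc kernels copy(flags[0:n])
    {
#pragma acc loop
        for (int i = 1; i < n; i *= 2) {
            flags[i] = 1;
        }
    }
}
```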