
IPMACC – An Open Source OpenACC to CUDA/OpenCL Translator

December 23, 2014 by Rob Farber

IPMACC is a research-grade open-source framework for translating OpenACC source code to CUDA or OpenCL. Binary executables can then be created with CUDA or OpenCL compilers. The authors (Ahmad Lashgar – University of Victoria, Alireza Majidi – Texas A&M University, Amirali Baniasadi – University of Victoria) verified correctness and performance using benchmarks from the Rodinia Benchmark Suite and the CUDA SDK. IPMACC is of interest due to the recent demise of CAPS Entreprise, which provided a commercial OpenACC-to-OpenCL source translator. IPMACC can be found in its GitHub repository. Also note that GCC will start supporting OpenACC and OpenMP 4.0 pragmas in 2015.

IPMACC translation pipeline and performance results (images courtesy arXiv.org)
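
For readers new to OpenACC, here is a minimal sketch of the kind of annotated C source such a translator consumes. The example below is illustrative only (it is not taken from the IPMACC distribution): the kernels region is what gets lowered to a CUDA or OpenCL kernel, and the copyin/copyout clauses become the host-to-device and device-to-host transfers.

    #include <stdio.h>

    #define N 1024

    /* Vector addition annotated with OpenACC.  A source-to-source translator
     * lowers the kernels region to a device kernel and turns the copyin/copyout
     * clauses into transfers of the 1D arrays between host and device. */
    void vecadd(const float *a, const float *b, float *c)
    {
        #pragma acc kernels copyin(a[0:N], b[0:N]) copyout(c[0:N])
        for (int i = 0; i < N; i++)
            c[i] = a[i] + b[i];
    }

    int main(void)  /* main declared with an explicit return type */
    {
        float a[N], b[N], c[N];
        for (int i = 0; i < N; i++) { a[i] = (float)i; b[i] = 2.0f * i; }
        vecadd(a, b, c);
        printf("c[10] = %f\n", c[10]);
        return 0;
    }

The same source compiles with any OpenACC-aware compiler (for example, gcc -fopenacc once the GCC support lands), which is what makes a source-to-source translator a drop-in step in front of nvcc or an OpenCL toolchain.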

Limitations

  • Currently, the parallel directive is not supported. Note that, with a little effort, the programmer can rewrite any parallel region construct as a kernels region (see the sketch following this list).
  • Only 1D arrays can be transferred in and out of a region.
  • User-defined data types are not supported in data copy clauses.
  • The C/C++ main function should be prototyped as a normal function with a return type, e.g. int main(). Avoid declaring main as main() with no return type.
  • Clause support: the seq clause is not supported on a top-level, non-nested loop, i.e. the unusual case where the region contains only one loop and that loop is targeted for serial execution.
  • There are known issues between NVCC and C's restrict keyword in CUDA 4.0.
  • If the compiler crashes with a pycparser.plyparser.ParseError, check the last line of the error output for a meaningful message.
  • Limitations on the reduction/private clauses of loop:
    • IPMACC assumes the reduction/private variable is not declared inside the loop.
    • If a variable appears in both private() and reduction(), IPMACC assumes reduction, which also covers private.
    • Reduction/private on an array or subarray is not supported.
  • The default reduction type is a two-level tree reduction [1]. Alternatively, for CUDA, an atomic reduction is implemented; it is supported only on recent hardware (compute capability >= 1.3) and requires passing the proper flag to the underlying NVCC (add the -arch=sm_13 compile flag).
  • To guarantee safety, call acc_init() early in the code to avoid potential runtime errors. This is essential for OpenCL target devices.
  • IPMACC can parallelize loops whose iterations use the following increment steps: +, -, ++, --, *, /
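
To make several of these limitations concrete, the following is a minimal sketch (the sum_squares function, array sizes, and device type are assumptions for illustration) that stays within them: a kernels region is used in place of the unsupported parallel directive, the reduction variable is declared outside the loop, and acc_init() is called early.

    #include <openacc.h>

    float sum_squares(const float *x, int n)
    {
        float sum = 0.0f;  /* reduction variable declared outside the loop */

        /* kernels region instead of the unsupported parallel directive */
        #pragma acc kernels copyin(x[0:n])
        #pragma acc loop reduction(+:sum)
        for (int i = 0; i < n; i++)
            sum += x[i] * x[i];

        return sum;
    }

    int main(void)
    {
        acc_init(acc_device_default);  /* initialize the runtime early; essential for OpenCL targets */

        float x[256];
        for (int i = 0; i < 256; i++)
            x[i] = 1.0f;

        return sum_squares(x, 256) == 256.0f ? 0 : 1;
    }

If the atomic CUDA reduction is chosen instead of the default tree reduction, remember to pass -arch=sm_13 (or newer) through to the underlying NVCC as noted above.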
