
TechEnablement

Education, Planning, Analysis, Code


Run CUDA without Recompilation on x86, AMD GPUs, and Intel Xeon Phi with gpuOcelot

April 28, 2014 by Rob Farber

Various pathways exist to run CUDA on a variety of architectures. The freely available gpuOcelot project is unique in that it currently allows CUDA binaries to run on NVIDIA GPUs, AMD GPUs, x86, and Intel Xeon Phi at full speed without recompilation. It works by dynamically analyzing and recompiling the PTX instructions of the CUDA kernels so they can run on the destination device. Sound too good to be true? Udacity has prepared a tutorial on running CUDA codes without a GPU under Linux (link). The tutorial also provides links for using gpuOcelot on Windows and Mac.
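As a rough sketch of what that workflow looks like in practice: the usual approach is to compile CUDA source normally and then link against Ocelot's CUDA runtime replacement rather than libcudart. The file name below is illustrative, and the exact library names, flags, and configuration mechanism vary by Ocelot version and install, so treat this as an assumption and consult the tutorial above for tested instructions.

```shell
# 1. Compile the CUDA source as usual; the device code is embedded as PTX.
nvcc -c saxpy.cu -o saxpy.o

# 2. Link against Ocelot's CUDA runtime replacement instead of libcudart.
g++ saxpy.o -o saxpy -locelot

# 3. At run time, Ocelot translates the embedded PTX for the selected
#    backend (e.g., the PTX emulator or the LLVM-based x86 path), chosen
#    via a configure.ocelot file in the working directory.
./saxpy
```

The key point is that the CUDA binary itself is unchanged; only the runtime library underneath it is swapped.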

gpuOcelot is one of several pathways for writing massively parallel code that can run on a variety of non-NVIDIA architectures.

In my classes, I teach students to “make their lives easier” and start with the highest-level API, delving into the lower-level APIs only when they need some capability, or better performance, than the higher-level API can provide.

The various APIs to run GPU code on different architectures – in ranked order from the highest level to the lowest – are:

  • OpenACC: Both PGI and CAPS enterprise have demonstrated the ability to recompile the same OpenACC source code to run on x86, ARM, AMD GPUs, Intel Xeon Phi, and NVIDIA GPUs.
  • The CUDA Thrust API: Adding a flag to the compilation line allows Thrust code to be built for GPUs, OpenMP, or Intel’s TBB (Threading Building Blocks). Experience has shown that the TBB performance can be surprisingly good.
  • CUDA-x86: PGI has the CUDA-x86 compiler that can be used instead of nvcc to compile CUDA source code for x86 processors. PGI notes that performance on at least one kernel is equivalent to an OpenMP program compiled with the Intel icc compiler.
  • The gpuOcelot project: The subject of this article.
  • LLVM: The NVIDIA nvcc compiler now uses the open-source LLVM compiler infrastructure, which opens the door to back ends that generate code for other architectures.
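The Thrust backend switch mentioned above can be sketched with a minimal program. The backend is selected at compile time via the THRUST_DEVICE_SYSTEM macro; the file name and command lines below are illustrative, and the sketch assumes the Thrust headers are installed (they ship with the CUDA toolkit).

```cpp
// Minimal Thrust program whose "device" backend is chosen at compile time:
//
//   nvcc sum.cu                                                    # CUDA
//   g++ -x c++ sum.cu -fopenmp \
//       -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_OMP            # OpenMP
//   g++ -x c++ sum.cu -ltbb \
//       -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_TBB            # TBB
#include <thrust/device_vector.h>
#include <thrust/reduce.h>
#include <cstdio>

int main() {
    thrust::device_vector<int> v(1000, 1);        // 1000 ones on the "device"
    int sum = thrust::reduce(v.begin(), v.end()); // parallel reduction
    std::printf("sum = %d\n", sum);               // same answer, any backend
    return 0;
}
```

The source is unchanged across backends; only the compilation line differs, which is what makes Thrust the least invasive of the portability pathways listed here.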


Filed Under: Analysis, CUDA, Featured tutorial, News Tagged With: ARM, CUDA, Intel Xeon Phi, Tegra, x86
