TechEnablement

Education, Planning, Analysis, Code

OpenCL Study Guide

Rob Farber

I currently have nine OpenCL tutorials on The Code Project. OpenCL is evolving quickly, and implementations of the new 1.2 specification now run on GPUs, multicore CPUs, and Intel Xeon Phi devices.

To bring more timely information to the online community, I have created study guides on the TechEnablement website. Check back often, as this guide will be updated.

Also note that TechEnablement tutorials will use newer web technology (see the Web Dev section), including color-coded source code, HTML5, and videos.

The following are my written OpenCL tutorials available online:

  • Part 9: OpenCL Extensions and Device Fission: Learn about OpenCL extensions that provide programmers with additional capabilities such as double-precision arithmetic and Device Fission. (Device Fission provides an interface to subdivide a single OpenCL device into multiple devices, each with its own asynchronous command queue.)
  • Part 8: Heterogeneous workflows using OpenCL: Incorporate OpenCL into heterogeneous workflows via a general-purpose “click together tools” framework that can stream arbitrary messages (vectors, arrays, and arbitrarily complex nested structures) within a single workstation, across a network of machines, or within a cloud computing framework. The ability to create scalable workflows is important because data handling and transformation can be as complex and time consuming as the computational problem used to generate a desired result.
  • Part 7: OpenCL plugins: Demonstrates how to create C/C++ plugins that can be dynamically loaded at runtime to add massively parallel OpenCL capabilities to an already running application.
  • Part 6: Primitive restart and OpenGL interoperability: OpenGL and OpenCL interoperability can greatly accelerate both data generation and data visualization. Basically, the OpenCL application maps the OpenGL buffers so they can be modified by massively parallel kernels running on the GPU. This keeps the data on the GPU and avoids costly PCIe bus transfers.
  • Part 5: OpenCL buffers and memory affinity: The example source code from Part 4 was adapted to queue a user-specified number of tasks split among multiple CPU and GPU command queues. The source code in this article continues to use a simple yet useful preprocessor capability to pass C++ template types to an OpenCL kernel.
  • Part 4: Coordinating Computations with OpenCL Queues: Discusses the OpenCL™ runtime and demonstrates how to perform concurrent computations among the work queues of heterogeneous devices.
  • Part 3: Work-Groups and Synchronization: Introduces the OpenCL™ execution model and discusses how to coordinate computations among the work items in a work group.
  • Part 2: OpenCL Memory Spaces: Implicit in the OpenCL memory model is the idea that the kernel resides in a separate memory space. Each work item can use private memory, local memory, constant memory, and global memory.
  • Part 1: OpenCL Portable Parallelism: The big idea behind OpenCL is a portable execution model that allows a kernel to execute at each point in a problem domain.

Here are two examples showing the performance difference between OpenCL rendering a surface with Primitive Restart on an AMD GPU and on the CPU. (You can play both simultaneously to really compare the speed difference.)

OpenCL rendering on a CPU using Primitive Restart. Note the 100% utilization of all six CPU cores.

OpenCL rendering on a GPU. Note the dramatic increase in speed because there is no PCIe bus limitation!

 

© 2023 · techenablement.com