TechEnablement

Education, Planning, Analysis, Code


Intel Posts OpenCL 2.0 QuickSort Tutorial (Compare to TE CUDA Version)

February 5, 2015 by Rob Farber

Intel engineer Robert Ioffe has posted an OpenCL QuickSort tutorial that uses nested parallelism and work-group scan functions. In particular, the tutorial shows how to use the OpenCL™ 2.0 enqueue_kernel functions, which queue kernels from the device without host intervention (much like CUDA dynamic parallelism), plus work_group_scan_exclusive_add and work_group_scan_inclusive_add, two of a new set of work-group functions added in OpenCL 2.0 to facilitate scan and reduce operations across the work-items of a work-group.
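To make the role of the two scan functions concrete, here is a minimal CPU-side sketch in Python of what an add-scan computes and how a QuickSort partition step can use it. This is not code from the Intel tutorial: the helper names `scan_inclusive_add`, `scan_exclusive_add`, and `scan_partition` are hypothetical, and each list index merely stands in for one work-item of a work-group.

```python
from itertools import accumulate


def scan_inclusive_add(values):
    """Emulated inclusive add-scan: item i receives values[0] + ... + values[i]."""
    return list(accumulate(values))


def scan_exclusive_add(values):
    """Emulated exclusive add-scan: item i receives values[0] + ... + values[i-1]."""
    inclusive = scan_inclusive_add(values)
    return [inc - v for inc, v in zip(inclusive, values)]


def scan_partition(data, pivot):
    """Partition data around pivot using scans to compute output slots.

    Each "work-item" i flags whether its element is below or above the
    pivot; the exclusive scan of the flags is that element's offset
    inside its output segment, so no two items write to the same slot.
    """
    lt_flags = [1 if x < pivot else 0 for x in data]
    gt_flags = [1 if x > pivot else 0 for x in data]
    lt_offsets = scan_exclusive_add(lt_flags)
    gt_offsets = scan_exclusive_add(gt_flags)

    n_lt = sum(lt_flags)
    n_gt = sum(gt_flags)
    n_eq = len(data) - n_lt - n_gt

    # Elements equal to the pivot fill the middle segment.
    out = [pivot] * len(data)
    for i, x in enumerate(data):
        if lt_flags[i]:
            out[lt_offsets[i]] = x
        elif gt_flags[i]:
            out[n_lt + n_eq + gt_offsets[i]] = x
    return out, n_lt, n_gt


if __name__ == "__main__":
    partitioned, n_lt, n_gt = scan_partition([5, 3, 8, 5, 1, 9, 2], pivot=5)
    print(partitioned, n_lt, n_gt)
```

In a device-side version, the two outer segments would then be sorted by further kernels enqueued from the device, which is where enqueue_kernel comes in; the sketch above covers only the partition step.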

Robert Ioffe (Courtesy Intel Corp)

Full source code and discussion can be found on The Code Project.

A CUDA version of bitonic sort that strong-scales across GPUs can be found in the TechEnablement article “Part 2: No Idle Time CUDA Task Parallelism Across Eight GPUs”.

Note that bitonic sort is faster on small problems (4.5 ms vs. 246.9 ms), a gain achieved by eliminating recursive calls.
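As an illustration of how a sort can avoid recursion altogether, the following is a minimal Python sketch of an iterative bitonic sort. It is not the TechEnablement CUDA code; the function name `bitonic_sort` is hypothetical, and the sketch assumes the input length is a power of two. Every compare-and-swap in the innermost loop is independent of the others, which is why each pass maps naturally onto one parallel step on a GPU.

```python
def bitonic_sort(values):
    """Return a sorted copy of values using an iterative bitonic network.

    Assumes len(values) is a power of two (a simplification for this sketch).
    """
    n = len(values)
    if n & (n - 1):
        raise ValueError("length must be a power of two")

    a = list(values)
    k = 2  # size of the bitonic sequences being merged
    while k <= n:
        j = k // 2  # distance between compared elements
        while j > 0:
            # On a GPU, each index i would be handled by one thread or work-item.
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a


if __name__ == "__main__":
    print(bitonic_sort([7, 3, 9, 1, 6, 2, 8, 5]))
```

The two nested `while` loops replace the recursive merge structure with a fixed schedule of passes, so no call stack or device-side kernel launch is needed.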

 

 

© 2023 · techenablement.com