TechEnablement

ORNL Introductory Tutorials On Concurrent Kernels

January 1, 2015 by Rob Farber

The Oak Ridge Leadership Computing Facility (OLCF) at Oak Ridge National Laboratory (ORNL) is working to educate users on how best to use its computing resources. As part of that effort, the OLCF has published two introductory tutorials on using concurrent kernels on its systems. Part 1 (concurrent kernels) and Part 2 (batched library calls) show how to launch concurrent kernels using CUDA and OpenACC with C and Fortran.
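As a rough illustration of the idea (not code taken from the OLCF tutorials), the sketch below uses OpenACC `async` queues in C to launch two independent kernels that the runtime is free to overlap. The function names `scale_async` and `scale_both` and the queue numbers 1 and 2 are assumptions made for this example. Whether the kernels actually run concurrently depends on the compiler, the runtime, and the available device resources. A compiler without OpenACC support ignores the pragmas and runs the loops serially.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: scale every element of x by a.
 * The kernel and its data transfers are enqueued on the given
 * OpenACC async queue, so the call may return before the work is done. */
static void scale_async(float *x, int n, float a, int queue)
{
    assert(x != NULL && n >= 0);
    #pragma acc parallel loop async(queue) copy(x[0:n])
    for (int i = 0; i < n; ++i)
        x[i] *= a;
}

/* Launch two independent kernels on separate queues, then wait for both.
 * Work on different queues has no ordering between it, which is what
 * allows (but does not guarantee) concurrent execution. */
void scale_both(float *a, float *b, int n)
{
    scale_async(a, n, 2.0f, 1);   /* queue 1 (arbitrary choice) */
    scale_async(b, n, 3.0f, 2);   /* queue 2 (arbitrary choice) */
    #pragma acc wait              /* block until all queues have drained */
}
```

Calling `scale_both(a, b, n)` doubles every element of `a` and triples every element of `b`. Placing both kernels on the same queue would instead force them to run one after the other.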


TechEnablement readers may also be interested in our own tutorials, which discuss key aspects of task-based parallelism, including:

  1. Why task parallelism can be more efficient than pure loop parallelism, for reasons that include lower memory consumption.
  2. How to load-balance parallel tasks across multiple devices with a single OpenMP loop that uses a dynamic schedule.
  3. Demonstrations that task parallelism can achieve strong scaling both within a GPU and across a number of devices (a 7.4x speedup was achieved in a single computational node containing eight GPUs).
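The load-balancing point in item 2 can be sketched in a few lines of C. This is a minimal illustration under stated assumptions, not the code from the articles: `run_task` is a hypothetical stand-in for real per-task work, and `run_all_tasks` assumes one OpenMP thread per device. In a real multi-GPU program each thread would typically bind to its own device first (for example with `cudaSetDevice`) and launch kernels inside the loop body.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the work of one task. The cost grows with
 * the task index, so the tasks are deliberately uneven. */
static double run_task(int task)
{
    double sum = 0.0;
    for (int i = 0; i <= task; ++i)
        sum += (double)i;
    return sum;
}

/* Run n_tasks tasks using one worker thread per device (an assumption
 * of this sketch). schedule(dynamic, 1) hands out one task at a time,
 * so a thread that finishes early takes the next unclaimed task rather
 * than idling while a slower thread works through a fixed share. */
void run_all_tasks(double *results, int n_tasks, int n_devices)
{
    assert(results != NULL && n_tasks >= 0 && n_devices > 0);
    #pragma omp parallel for schedule(dynamic, 1) num_threads(n_devices)
    for (int t = 0; t < n_tasks; ++t)
        results[t] = run_task(t);
}
```

With a static schedule each thread would receive a fixed block of tasks up front, so uneven task costs could leave some devices idle. A compiler built without OpenMP support ignores the pragma and runs the loop serially with the same results.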

Please see our articles:

  • Part 1: Load-Balanced, Strong-Scaling Task-Based Parallelism on GPUs
  • Part 2: No Idle Time CUDA Task Parallelism Across Eight GPUs

Oak Ridge has other introductory articles at https://www.olcf.ornl.gov/support/tutorials/.


