
CUDA Study Guide

Rob Farber

I currently have 28 CUDA tutorials on the Dr. Dobb’s website, spanning 7 years of CUDA development starting with CUDA 2.0. NVIDIA recently released CUDA 6.0. Obviously, some of the Dr. Dobb’s CUDA material is dated, and my tutorials had to introduce new (and needed) features as CUDA developed. Thus, following the progression of Part 1, Part 2, …, Part 28 is not necessarily the best tutorial sequence for a student.

My book, “CUDA Application Design and Development” (available in English and Chinese versions), provides a coherent study plan that takes the student from “Hello World” to exascale-capable deep learning and real-time video processing.

[Book covers: CUDA Application Design and Development, English and Chinese editions]

To bring more timely information to the online community, I have created study guides on the TechEnablement website. Check back often, as this guide will be updated.

Also, note that TechEnablement tutorials will utilize new web technology (see the Web Dev section) for nicely color-coded source code, HTML5, videos, and more.

For the moment, here is a list of the existing CUDA tutorials (a minimal first-kernel sketch follows the list):

  • CUDA, Supercomputing for the Masses: Part 28: A Massively Parallel Stack that includes bulk Data Allocation
  • CUDA, Supercomputing for the Masses: Part 27: A Robust Histogram for Massive Parallelism
  • CUDA, Supercomputing for the Masses: Part 26: CUDA: Unifying Host/Device Interactions with a Single C++ Macro
  • CUDA, Supercomputing for the Masses: Part 25: Atomic Operations and Low-Wait Algorithms in CUDA
  • CUDA, Supercomputing for the Masses: Part 24: Intel’s 50+ core MIC architecture: HPC on a Card or Massive Co-Processor?
  • CUDA, Supercomputing for the Masses: Part 23: Click-together tools that utilize CUDA/C/C++ as a scripting language!
  • CUDA, Supercomputing for the Masses: Part 22: Running CUDA Code Natively on x86 Processors
  • CUDA, Supercomputing for the Masses: Part 21: The Fermi architecture and CUDA
  • CUDA, Supercomputing for the Masses: Part 20: Parallel Nsight Part 2: Using the Parallel Nsight Analysis capabilities
  • CUDA, Supercomputing for the Masses: Part 19: Parallel Nsight Part 1: Configuring and Debugging Applications
  • CUDA, Supercomputing for the Masses: Part 18: Using Vertex Buffer Objects with CUDA and OpenGL
  • CUDA, Supercomputing for the Masses: Part 17: CUDA 3.0 provides expanded capabilities and makes development easier (2)
  • CUDA, Supercomputing for the Masses: Part 16: CUDA 3.0 provides expanded capabilities (1)
  • CUDA, Supercomputing for the Masses: Part 15: Using Pixel Buffer Objects with CUDA and OpenGL
  • CUDA, Supercomputing for the Masses: Part 14: Debugging CUDA and using CUDA-GDB
  • CUDA, Supercomputing for the Masses: Part 13: Using texture memory in CUDA
  • CUDA, Supercomputing for the Masses: Part 12: CUDA 2.2 Changes the Data Movement Paradigm
  • CUDA, Supercomputing for the Masses: Part 11: Revisiting CUDA memory spaces
  • CUDA, Supercomputing for the Masses: Part 10: CUDPP, a powerful data-parallel CUDA library  
  • CUDA, Supercomputing for the Masses: Part 9: Extending High-level Languages with CUDA
  • CUDA, Supercomputing for the Masses: Part 8: Using libraries with CUDA
  • CUDA, Supercomputing for the Masses: Part 7: Double the fun with next-generation CUDA hardware
  • CUDA, Supercomputing for the Masses: Part 6: Global memory and the CUDA profiler
  • CUDA, Supercomputing for the Masses: Part 5: Understanding and using shared memory (2)
  • CUDA, Supercomputing for the Masses: Part 4: Understanding and using shared memory (1)
  • CUDA, Supercomputing for the Masses: Part 3: Error handling and global memory performance limitations
  • CUDA, Supercomputing for the Masses: Part 2: A first kernel  
  • CUDA, Supercomputing for the Masses: Part 1: CUDA lets you work with familiar programming concepts while developing software that can run on a GPU 
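As a taste of where the sequence starts, here is a minimal first-kernel sketch in the spirit of Parts 1 and 2. This is a hypothetical example written for this study guide, not code taken from the tutorials: each thread increments one element of an array, illustrating the basic allocate / copy / launch / copy-back pattern.

// Hypothetical example: a minimal CUDA C++ "first kernel" that increments
// every element of an array on the GPU.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void incrementKernel(int *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n)                                      // guard against out-of-range threads
        data[i] += 1;
}

int main()
{
    const int n = 1024;
    int host[n];
    for (int i = 0; i < n; ++i) host[i] = i;

    int *dev = nullptr;
    cudaMalloc((void **)&dev, n * sizeof(int));                      // allocate device memory
    cudaMemcpy(dev, host, n * sizeof(int), cudaMemcpyHostToDevice);  // host -> device

    incrementKernel<<<(n + 255) / 256, 256>>>(dev, n);               // launch 256-thread blocks

    cudaMemcpy(host, dev, n * sizeof(int), cudaMemcpyDeviceToHost);  // device -> host
    printf("host[0] = %d, host[%d] = %d\n", host[0], n - 1, host[n - 1]);

    cudaFree(dev);
    return 0;
}

Compiled with nvcc (for example, nvcc first_kernel.cu -o first_kernel), this sketch covers the host/device memory model and kernel-launch syntax that the early tutorials develop in much more detail; error checking of the CUDA API calls, omitted here for brevity, is the subject of Part 3.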

