I currently have 28 CUDA tutorials on the Dr. Dobb’s website, spanning seven years of CUDA development starting with CUDA 2.0. NVIDIA recently released CUDA 6.0, so some of the Dr. Dobb’s material is now dated; in addition, the tutorials had to introduce new (and needed) features as CUDA evolved. As a result, reading Part 1, Part 2, …, Part 28 in order is not necessarily the best tutorial sequence for a student.
My book, “CUDA Application Design and Development” (available in English and Chinese), provides a coherent study plan that takes the student from “Hello World” to exascale-capable deep learning and real-time video processing.
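As a rough illustration of that “Hello World” starting point, a first CUDA program typically allocates device memory, launches a simple kernel, and copies the result back. The sketch below is generic and not taken from the book or the Dr. Dobb’s series; the kernel name `incrementKernel` is just an illustrative choice.

```cpp
#include <cstdio>
#include <cuda_runtime.h>

// Illustrative only: a generic "first kernel" that increments each array element.
__global__ void incrementKernel(int *data, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (idx < n)
        data[idx] += 1;
}

int main()
{
    const int n = 256;
    int host[n];
    for (int i = 0; i < n; ++i) host[i] = i;

    int *device = nullptr;
    cudaMalloc(&device, n * sizeof(int));                               // allocate GPU memory
    cudaMemcpy(device, host, n * sizeof(int), cudaMemcpyHostToDevice);  // copy input to the GPU

    incrementKernel<<<(n + 127) / 128, 128>>>(device, n);               // launch 2 blocks of 128 threads

    cudaMemcpy(host, device, n * sizeof(int), cudaMemcpyDeviceToHost);  // copy results back
    cudaFree(device);

    printf("host[0]=%d host[%d]=%d\n", host[0], n - 1, host[n - 1]);
    return 0;
}
```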
To bring more timely information to the online community, I have created study guides on the techEnablement website. Check back often, as this guide will be updated.
Also note that the techEnablement tutorials use newer web technology (see the Web Dev section) for nicely color-coded source code, HTML5, videos, and more.
For the moment, here is a list of the existing CUDA tutorials:
- CUDA, Supercomputing for the Masses: Part 28: A Massively Parallel Stack that includes bulk Data Allocation
- CUDA, Supercomputing for the Masses: Part 27: A Robust Histogram for Massive Parallelism
- CUDA, Supercomputing for the Masses: Part 26: CUDA: Unifying Host/Device Interactions with a Single C++ Macro
- CUDA, Supercomputing for the Masses: Part 25: Atomic Operations and Low-Wait Algorithms in CUDA
- CUDA, Supercomputing for the Masses: Part 24: Intel’s 50+ core MIC architecture: HPC on a Card or Massive Co-Processor?
- CUDA, Supercomputing for the Masses: Part 23: Click-together tools that utilize CUDA/C/C++ as a scripting language!
- CUDA, Supercomputing for the Masses: Part 22: Running CUDA Code Natively on x86 Processors
- CUDA, Supercomputing for the Masses: Part 21: The Fermi architecture and CUDA
- CUDA, Supercomputing for the Masses: Part 20: Parallel Nsight Part 2: Using the Parallel Nsight Analysis capabilities
- CUDA, Supercomputing for the Masses: Part 19: Parallel Nsight Part 1: Configuring and Debugging Applications
- CUDA, Supercomputing for the Masses: Part 18: Using Vertex Buffer Objects with CUDA and OpenGL
- CUDA, Supercomputing for the Masses: Part 17: CUDA 3.0 provides expanded capabilities and makes development easier (2)
- CUDA, Supercomputing for the Masses: Part 16: CUDA 3.0 provides expanded capabilities (1)
- CUDA, Supercomputing for the Masses: Part 15: Using Pixel Buffer Objects with CUDA and OpenGL
- CUDA, Supercomputing for the Masses: Part 14: Debugging CUDA and using CUDA-GDB
- CUDA, Supercomputing for the Masses: Part 13: Using texture memory in CUDA
- CUDA, Supercomputing for the Masses: Part 12: CUDA 2.2 Changes the Data Movement Paradigm
- CUDA, Supercomputing for the Masses: Part 11: Revisiting CUDA memory spaces
- CUDA, Supercomputing for the Masses: Part 10: CUDPP, a powerful data-parallel CUDA library
- CUDA, Supercomputing for the Masses: Part 9: Extending High-level Languages with CUDA
- CUDA, Supercomputing for the Masses: Part 8: Using libraries with CUDA
- CUDA, Supercomputing for the Masses: Part 7: Double the fun with next-generation CUDA hardware
- CUDA, Supercomputing for the Masses: Part 6: Global memory and the CUDA profiler
- CUDA, Supercomputing for the Masses: Part 5: Understanding and using shared memory (2)
- CUDA, Supercomputing for the Masses: Part 4: Understanding and using shared memory (1)
- CUDA, Supercomputing for the Masses: Part 3: Error handling and global memory performance limitations (see the error-checking sketch after this list)
- CUDA, Supercomputing for the Masses: Part 2: A first kernel
- CUDA, Supercomputing for the Masses: Part 1: CUDA lets you work with familiar programming concepts while developing software that can run on a GPU
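For readers jumping straight to Part 3, the error-handling topic largely comes down to checking every CUDA runtime call. The macro below shows one common, generic pattern; it is a sketch only, not code from the article, and the name `CUDA_CHECK` is my own.

```cpp
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// A common CUDA error-checking pattern: wrap each runtime API call and
// abort with file/line context if it fails.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err = (call);                                     \
        if (err != cudaSuccess) {                                     \
            fprintf(stderr, "CUDA error %s at %s:%d\n",               \
                    cudaGetErrorString(err), __FILE__, __LINE__);     \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

int main()
{
    int *d_ptr = nullptr;
    CUDA_CHECK(cudaMalloc(&d_ptr, 1024 * sizeof(int)));  // checked allocation
    CUDA_CHECK(cudaFree(d_ptr));                         // checked free
    return 0;
}
```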