Tutorials Archives

MultiOS Gaming CUDA & OpenCL Via a Virtual Machine

May 19, 2014 by Rob Farber

Update 12/1/14: Intel now offers full GPU virtualization for Intel 4th generation devices through the Xen project. Operating system virtualization is a convenient way to run multiple operating systems at the same time, on the same hardware, without rebooting. There are several technologies that allow sharing of the GPU by both the host (native) and guest …

Inside NVIDIA’s Unified Memory: Multi-GPU Limitations and the Need for a cudaMadvise API Call

April 21, 2014 by Rob Farber

CUDA 6.0 Unified Memory offers a “single-pointer-to-data” model that is similar to CUDA’s zero-copy mapped memory. Both make it trivially easy for the programmer to access memory on the CPU or GPU, but applications that use mapped memory incur a PCI bus transfer every time a memory access steps outside of a cache line, while a kernel running in a Unified …

Part 3 of CUDA Supercomputing for the Masses

April 14, 2014 by Rob Farber

Error handling and global memory performance limitations. This article is reprinted from Dr. Dobb's (http://www.ddj.com/hpc-high-performance-computing/207603131). It is still valid as an introductory article. Congratulations! Thanks to Part 1 and Part 2 of this series on CUDA (short for "Compute Unified Device Architecture"), you are now a CUDA-enabled programmer with the …

Part 2 of CUDA Supercomputing for the Masses

April 14, 2014 by Rob Farber

A first CUDA kernel. Reprinted from Dr. Dobb's, April 29, 2008. Comment: This article is still valid, as it shows how to write simple code that moves data to and from the GPU and operates on it with a CUDA kernel. In Part 1 of this article series, I presented a simple first CUDA (short for "Compute Unified Device Architecture") program called moveArrays.cu to familiarize …
