Pragma-based programming can be described as a "negotiation" with the compiler where the compiler has to assume corner-cases that are not apparent to the programmer. So why does the loop count in the OpenMP and OpenACC article, "A First Transparent OpenACC C++ Class" have to be assigned to a separate variable to generate a parallel … [Read more...]
A First Transparent OpenACC C++ Class
This article provides a simple yet complete working example that demonstrates how OpenACC 2.0 pragmas can be used in the constructor and destructor of a C++ class to allocate and free memory on both the host and device and to transparently move data in the C++ class to support C++ class methods that run on both the host and device. Key to the transparent use of C++ classes in … [Read more...]
PGI 14.4 is now released with lots of OpenACC C++ Goodness!
PGI 14.4 is now released with lots of OpenACC C++ goodness. Give it a try! Here is the link for or those with existing licenses. If need be, get a 15 day trial license and use some of my OpenACC tutorials. PGI Trial keys Trial license keys are used for evaluating PGI software. They are valid for fifteen days. If you haven't already done so, you … [Read more...]
PGI 14.4 Release Contains Much OpenACC C++ Goodness
PGI released their 14.4 and upcoming 14.7 OpenACC 2.0 roadmap. The expectation is that we will see the 14.4 release in early May and the 14.7 release in early July. Note: these are not official PGI dates. Analysis: The 14.4 support of atomic operations will enable many low-wait algorithms such as counters and massively parallel stacks. Improved reduction performance in … [Read more...]
TechEnablement Adds Study Guides for CUDA, OpenACC, OpenCL, and Intel Xeon Phi
Today techEnablement.com has provided study guides to help students "learn to change the world" with supercomputing for the masses . The study guides cover: CUDA OpenACC OpenCL Intel Xeon Phi … [Read more...]
Intel Xeon Phi for CUDA Programmers
Both GPU and Xeon Phi coprocessors provide high degrees of parallelism that can deliver excellent application performance. For the most part, CUDA programmers with existing application code have already written their software so it can run well on Phi coprocessors. The key to performance lies in understanding the differences between these two architectures. Author's note: To … [Read more...]
Pragmatic Parallelism Part 1: Introducing OpenACC 1.0
OpenACC lets you program in parallel C/C++ and Fortran in a manner that is concise and where the same source code can be recompiled to run on AMD GPUs, NVIDIA GPUs, Intel Xeon Phi, x86, and ARM. View at Dr. Dobbs (http://www.drdobbs.com/parallel/easy-gpu-parallelism-with-openacc/240001776) This is the first in a series of articles by Rob Farber on OpenACC directives, … [Read more...]
Farber teaches massively parallel computing to grade 6 – 12 students in Saudi Arabia
My book, “CUDA Application Design and Development” [English][Chinese] and Doctor Dobbs tutorials coupled with the rapid adoption of GPU computing have given me the opportunity to speak and teach around the world. This January, I had the pleasure of traveling to Jeddah, Saudi Arabia to speak and teach a short course on OpenACC and CUDA at KAUST (the King Abdullah University of … [Read more...]






