Reports are now coming in about performance boosts resulting from the CUDA 6.5 production release. The Blender project reports faster rendering times with CUDA-6.5. As the graphs below show for the farbopt deep-learning teaching code, CUDA-6.5 with the NVIDIA 340.29 driver has increased performance on linear problems (PCA analysis, from 680 GF/s to 991 GF/s) and even more so on non-linear problems (NLPCA analysis, from 682 GF/s to 1086 GF/s). That is a gain of hundreds of GF/s. Note that prior to CUDA-6.5 the K40c delivered approximately the same performance as the K20c.
TechEnablement has been working with NVIDIA on a compiler performance regression that has reduced the performance of our machine-learning code since the CUDA 5.0 release. CUDA 6.5 incorporates the fix for that regression. The extra performance that resulted from upgrading to the NVIDIA 340.29 driver was a surprise. Apparently this driver also helps the performance of reductions.
Please check out our GTC 2013 videos to understand why the Farber machine-learning mapping and the CUDA strong-scaling execution model deliver higher performance on the more complex non-linear problem:
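To put the quoted numbers in perspective, here is a minimal sketch that works out the absolute gain and the speedup factor for each benchmark. The figures are the ones reported above; the `summarize` helper and the `results` dictionary are just illustrative names, not part of farbopt.

```python
# Throughput figures (GF/s) quoted in this post: (before, after) the upgrade.
results = {
    "PCA": (680.0, 991.0),
    "NLPCA": (682.0, 1086.0),
}


def summarize(before, after):
    """Return (absolute gain in GF/s, speedup factor) for one benchmark."""
    return after - before, after / before


for name, (before, after) in results.items():
    gain, speedup = summarize(before, after)
    print(f"{name}: +{gain:.0f} GF/s, {speedup:.2f}x")
# prints:
# PCA: +311 GF/s, 1.46x
# NLPCA: +404 GF/s, 1.59x
```

So the linear PCA case gains roughly 46% and the non-linear NLPCA case roughly 59%.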
- GTC 2013 “Simplifying Portable Killer Apps with OpenACC and CUDA-5 Concisely and Efficiently” (video, pdf)
- GTC 2013 “Clicking GPUs into a Portable, Persistent and Scalable Massive Data Framework” (video, pdf)
- Deep-learning Teaching Code Achieves 13 PF/s on the ORNL Titan Supercomputer
Key points:
- The new 340.29 driver is a recommended upgrade!
- Tests show that the performance increase comes mainly from the 340.29 driver.
- The new driver is also expected to increase OpenCL performance.
- Don’t forget that NVIDIA now supports Ubuntu 14.04 LTS. Say goodbye to the crufty 12.04 LTS!
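If you are not sure which driver a machine is running, a short script can check it against the 340.29 release recommended above. This is only a sketch: it assumes `nvidia-smi` is on the PATH and supports the `--query-gpu=driver_version` query, and the helper names (`parse_version`, `installed_driver_version`, `is_recommended`) are made up for illustration.

```python
import subprocess

RECOMMENDED = (340, 29)  # driver version recommended in this post


def parse_version(text):
    """Turn a version string such as '340.29' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def installed_driver_version():
    """Ask nvidia-smi for the driver version (assumes nvidia-smi is installed)."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        universal_newlines=True,
    )
    # One line is printed per GPU; the driver version is the same for all.
    return parse_version(out.splitlines()[0])


def is_recommended(version, minimum=RECOMMENDED):
    """True if the given version tuple is at least the recommended one."""
    return version >= minimum


if __name__ == "__main__":
    try:
        version = installed_driver_version()
    except (OSError, subprocess.CalledProcessError):
        print("Could not query nvidia-smi; is the NVIDIA driver installed?")
    else:
        label = "OK" if is_recommended(version) else "older than 340.29"
        print("Driver", ".".join(map(str, version)), "is", label)
```

Comparing versions as integer tuples avoids the trap of treating them as decimals, where 340.102 would wrongly sort below 340.29.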
The following graphs clearly show the performance boost. (Note that prior to CUDA-6.5 the K40c delivered roughly the same performance as the K20c.)
NLPCA

CUDA-6.5 significantly boosts performance on non-linear NLPCA, machine-learning, and deep-learning codes
PCA
Try the code yourself at farbopt on github. (We are currently working on a performance issue with the OpenACC Kepler reduction. Once fixed, we expect OpenACC to deliver similar performance.)