CUDA 340.29 Driver Significantly Boosts GPU Performance (100s GF/s For Machine-Learning)

Reports are now coming in of performance gains from the CUDA 6.5 production release. The Blender project reports faster rendering times with CUDA-6.5. As the graphs below show for the farbopt deep-learning teaching code, CUDA-6.5 with the NVIDIA 340.29 driver has increased performance on linear problems (PCA, from 680 GF/s to 991 GF/s) and even more so on non-linear problems (NLPCA, from 682 GF/s to 1086 GF/s). This translates to 100s of GF/s in increased performance. Note that prior to CUDA-6.5 the K40c was delivering approximately the same performance as the K20c.
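To put those figures in perspective, here is a minimal sketch that turns the reported before/after throughput into an absolute gain and a speedup factor. The numbers are the ones quoted above; the `reported` table and the `gain` helper are illustrative names only and are not part of farbopt.

```python
# Throughput figures (GF/s) as quoted in the text: (before, after).
reported = {
    "PCA": (680.0, 991.0),
    "NLPCA": (682.0, 1086.0),
}


def gain(before, after):
    """Return (absolute gain in GF/s, speedup factor)."""
    return after - before, after / before


for name, (before, after) in reported.items():
    delta, speedup = gain(before, after)
    print(f"{name}: +{delta:.0f} GF/s ({speedup:.2f}x)")
```

Run as written, this prints a gain of 311 GF/s (1.46x) for PCA and 404 GF/s (1.59x) for NLPCA, which is where the "100s of GF/s" figure comes from.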

TechEnablement has been working with NVIDIA on a compiler performance regression that has reduced the performance of our machine-learning code since the CUDA 5.0 release. CUDA 6.5 incorporates that compiler fix. The extra performance that resulted from upgrading to the NVIDIA 340.29 driver was a surprise. Apparently this driver also helps the performance of reductions.

Please check out our GTC 2013 videos to understand why the Farber machine-learning mapping and the CUDA strong-scaling execution model deliver higher performance on the more complex non-linear problem:

GTC 2013 “Simplifying Portable Killer Apps with OpenACC and CUDA-5 Concisely and Efficiently” (video, pdf)
GTC 2013 “Clicking GPUs into a Portable, Persistent and Scalable Massive Data Framework” (video, pdf)
Deep-learning Teaching Code Achieves 13 PF/s on the ORNL Titan Supercomputer

Key points:

The new 340.29 driver is a recommended upgrade!
- Tests show that the performance increase is mainly from the 340.29 driver.
The new driver is also expected to increase OpenCL performance.
Don’t forget that NVIDIA now supports Ubuntu 14.04 LTS. Say goodbye to the crufty 12.04 LTS!

The following graphs clearly show the performance boost. (Note that prior to CUDA-6.5 the K40c delivered roughly the same performance as the K20c.)

NLPCA

CUDA-6.5 significantly boosts performance on non-linear NLPCA, machine-learning, and deep-learning codes

PCA

CUDA-6.5 significantly boosts performance on PCA, machine-learning, and deep-learning codes

Try the code yourself at farbopt on github. (We are currently working on a performance issue with the OpenACC Kepler reduction. Once fixed, we expect OpenACC to deliver similar performance.)
