
CUDA 340.29 Driver Significantly Boosts GPU Performance (100s GF/s For Machine-Learning)

August 24, 2014 by Rob Farber

Reports are now coming in about performance boosts resulting from the CUDA 6.5 production release. The Blender project reports faster rendering times with CUDA 6.5. As the graphs below show for the farbopt deep-learning teaching code, CUDA 6.5 with the NVIDIA 340.29 driver has increased performance on linear problems (PCA, from 680 GF/s to 991 GF/s) and even more so on non-linear problems (NLPCA, from 682 GF/s to 1086 GF/s). That is a gain of hundreds of GF/s. Note that prior to CUDA 6.5 the K40c was delivering approximately the same performance as the K20c.
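To put the quoted figures in perspective, here is a small sketch that turns the before and after GF/s numbers from this post into absolute and relative gains. The helper name `speedup` is ours, for illustration only; it is not part of farbopt.

```python
# Throughput figures quoted in this post, in GF/s: (before, after).
results = {
    "PCA": (680.0, 991.0),
    "NLPCA": (682.0, 1086.0),
}


def speedup(before, after):
    """Return (absolute gain in GF/s, ratio after/before, percent increase)."""
    gain = after - before
    ratio = after / before
    return gain, ratio, 100.0 * (ratio - 1.0)


for name, (before, after) in results.items():
    gain, ratio, pct = speedup(before, after)
    print(f"{name}: +{gain:.0f} GF/s, {ratio:.2f}x, {pct:.1f}% faster")
```

By this arithmetic the linear PCA case gains 311 GF/s (roughly 46%) and the non-linear NLPCA case gains 404 GF/s (roughly 59%).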

TechEnablement has been working with NVIDIA on a compiler performance regression that has reduced performance on our machine-learning code since the CUDA 5.0 release. CUDA 6.5 incorporates the compiler fix. The extra performance that resulted from upgrading to the NVIDIA 340.29 driver was a surprise. Apparently this driver also helps the performance of reductions.
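For readers unfamiliar with the term, a reduction combines many values into one, for example summing per-example squared errors into a single number. The sketch below shows the pairwise (tree) pattern that parallel reductions typically follow, written in plain Python. It is only an illustration of the idea under our own assumptions: the function name and sample data are hypothetical, and this is not how farbopt or the driver implements reductions.

```python
def tree_reduce_sum(values):
    """Sum values by repeatedly adding pairs, halving the list each pass."""
    data = list(values)
    if not data:
        return 0.0
    while len(data) > 1:
        if len(data) % 2:
            # Odd length: pad with the additive identity so pairs line up.
            data.append(0.0)
        half = len(data) // 2
        # On a GPU, each "thread" i would add element i + half into element i.
        data = [data[i] + data[i + half] for i in range(half)]
    return data[0]


# Hypothetical per-example errors, reduced to a sum of squares.
errors = [0.5, -1.0, 2.0, 0.25, -0.75]
sum_sq = tree_reduce_sum(e * e for e in errors)
print(sum_sq)
```

Each pass halves the number of partial sums, so n values need about log2(n) passes rather than n sequential additions, which is why this pattern maps well onto many parallel threads.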

Please check out our GTC 2013 videos to understand why the Farber machine-learning mapping and the CUDA strong-scaling execution model deliver higher performance on the more complex non-linear problem:

  • GTC 2013 “Simplifying Portable Killer Apps with OpenACC and CUDA-5 Concisely and Efficiently” (video, pdf)
  • GTC 2013 “Clicking GPUs into a Portable, Persistent and Scalable Massive Data Framework” (video, pdf)
  • Deep-learning Teaching Code Achieves 13 PF/s on the ORNL Titan Supercomputer

Key points:

  • The new 340.29 driver is a recommended upgrade!
    • Tests show that the performance increase comes mainly from the 340.29 driver.
  • The new driver is also expected to increase OpenCL performance.
  • Don’t forget that NVIDIA now supports Ubuntu 14.04 LTS. Say goodbye to the crufty 12.04 LTS!
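If you want to check whether a machine already meets the driver recommendation above, a minimal sketch along the following lines may help. It assumes `nvidia-smi` is on the PATH and supports the `--query-gpu=driver_version` query. The function names are ours, and the version comparison is a simple numeric comparison of the dotted components, which is an assumption about how driver versions are ordered.

```python
import subprocess

RECOMMENDED = "340.29"  # driver version named in this post


def parse_version(text):
    """Turn a dotted version string such as '340.29' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def meets_recommendation(installed, recommended=RECOMMENDED):
    """True if the installed driver is at least the recommended version."""
    return parse_version(installed) >= parse_version(recommended)


def installed_driver_version():
    """Ask nvidia-smi for the driver version of the first GPU it lists.

    Raises an exception if nvidia-smi is missing or exits with an error.
    """
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return out.splitlines()[0].strip()
```

On a machine with an NVIDIA driver installed, `meets_recommendation(installed_driver_version())` reports whether an upgrade is worth considering. The comparison helpers can also be used on their own with a version string you already know.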

The following graphs clearly show the performance boost. (Note that prior to CUDA 6.5 the K40c delivered roughly the same performance as the K20c.)

 

NLPCA

CUDA 6.5 significantly boosts performance on non-linear NLPCA, machine-learning, and deep-learning codes

PCA

CUDA 6.5 significantly boosts performance on PCA, machine-learning, and deep-learning codes

Try the code yourself at farbopt on GitHub. (We are currently working on a performance issue with the OpenACC Kepler reduction. Once it is fixed, we expect OpenACC to deliver similar performance.)
