
TechEnablement

Education, Planning, Analysis, Code


GPUs Power Over 90% of ImageNet Deep-Learning Visual Recognition Challenge Entries

September 7, 2014 by Rob Farber Leave a Comment

Over 90 percent of the participating teams and three of the four winners in the prestigious 2014 ImageNet Large Scale Visual Recognition Challenge used GPUs to enable their deep learning work. Deep learning is a fast-growing segment of machine learning that involves the creation of sophisticated, multi-level or “deep” neural networks. These networks enable powerful computer systems to learn to recognize patterns, objects and other items by analyzing massive amounts of training data.
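To make the idea of a "deep" network concrete, here is a minimal sketch (not from the article, and with illustrative names) of a multi-layer feed-forward network in NumPy. "Deep" simply means several stacked layers, each applying a linear map followed by a nonlinearity; training on massive data sets is what the GPUs accelerate.

```python
import numpy as np

def relu(x):
    """Elementwise rectified-linear activation."""
    return np.maximum(0.0, x)

def forward(x, weights, biases):
    """Propagate input x through a stack of fully connected layers.

    Each hidden layer computes relu(W @ a + b); the final layer is
    left linear. Stacking several such layers is what makes the
    network 'deep'.
    """
    a = x
    for W, b in zip(weights[:-1], biases[:-1]):
        a = relu(W @ a + b)
    return weights[-1] @ a + biases[-1]

# Illustrative three-layer network: 4 inputs -> 8 -> 8 -> 2 outputs.
rng = np.random.default_rng(0)
sizes = [4, 8, 8, 2]
weights = [rng.standard_normal((m, n)) * 0.1
           for n, m in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(m) for m in sizes[1:]]

y = forward(rng.standard_normal(4), weights, biases)
print(y.shape)  # (2,)
```

Real frameworks such as Caffe implement the same pattern with convolutional layers and run the linear algebra on the GPU, which is exactly the workload cuDNN accelerates.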

To accelerate research and development in deep learning, NVIDIA has released the cuDNN machine-learning library to CUDA registered developers. The library can be downloaded from the cuDNN website.

The cuDNN library is supported as part of UC Berkeley's Caffe, with integrations into other popular machine-learning frameworks on the way. Please see Caffe's documentation for instructions on enabling cuDNN with Berkeley's framework.
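As a rough sketch (consult Caffe's own documentation for the authoritative steps), enabling cuDNN in Caffe at the time of writing is a one-line change to Makefile.config followed by a rebuild:

```shell
# Sketch only; paths and flags per Caffe's documentation.
# In Makefile.config, uncomment the cuDNN switch:
#   USE_CUDNN := 1
sed -i 's/^# USE_CUDNN := 1/USE_CUDNN := 1/' Makefile.config
make clean && make all -j8
```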

The NVIDIA website provides the following plots, which show the speedup on a few example problems:

cuDNN performance against MKL

Other projects, such as RaPyDLI by Jack Dongarra (University of Tennessee) and Geoffrey Fox (Indiana University) along with Andrew Ng (Stanford, Baidu, and Coursera), also provide a convenient interface for deep learning. RaPyDLI provides a Python interface to run deep-learning problems on CPUs, GPUs, and Intel Xeon Phi.

The freely available farbopt deep-learning teaching code, which runs on CPUs, GPUs, and Intel Xeon Phi using CUDA, OpenACC, OpenMP, Intel native, and Intel Xeon Phi offload versions, provides comparative performance. The farbopt code also exhibits near-linear scaling to tens of thousands of devices (such as 16,384 GPUs on the ORNL Titan and 128k processors on two CM-200 Connection Machines).

While not an apples-to-apples comparison, the Farber teaching code does deliver over a TF/s per device on both linear and nonlinear deep-learning problems, as can be seen in the following per-device performance plots from the article "Deep-learning Teaching Code Achieves 13 PF/s on the ORNL Titan Supercomputer". (I developed this high-performance mapping in the early 1980s while in the Theoretical Division at Los Alamos National Laboratory and a member of the external faculty at the Santa Fe Institute. It was the first program I ran on NVIDIA GPUs and was the performance motivation for NVIDIA GPUs in my 2008 Dr. Dobb's article, "CUDA, Supercomputing for the Masses: Part 1".)
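The near-linear scaling comes from evaluating the training objective as a parallel reduction over the data set: each device sums the error over its own shard of the examples, and a single global reduction combines the partial sums. Here is a minimal single-node sketch of that idea (the model, names, and shard count are illustrative, not taken from the farbopt source):

```python
import numpy as np

def partial_error(w, X, Y):
    """Sum-of-squares error of a tiny linear model over one data shard."""
    pred = X @ w
    return float(np.sum((pred - Y) ** 2))

# Illustrative training set and parameter vector.
rng = np.random.default_rng(1)
X = rng.standard_normal((10_000, 8))
true_w = rng.standard_normal(8)
Y = X @ true_w
w = rng.standard_normal(8)

# Split the examples into shards, one per device; here 'shards' is a
# stand-in for the thousands of GPUs or nodes used at scale.
shards = 4
partials = [partial_error(w, Xs, Ys)
            for Xs, Ys in zip(np.array_split(X, shards),
                              np.array_split(Y, shards))]
total = sum(partials)  # at scale, this is an MPI_Allreduce

# The sharded sum matches the single-device evaluation.
assert np.isclose(total, partial_error(w, X, Y))
```

Because each shard's work is independent and the reduction cost grows only logarithmically with the device count, this mapping scales to very large machines.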

Farber high-performance and scalable deep-learning mapping

The renewed popularity of deep learning, coupled with modern TF/s devices, is a powerful combination. In addition, deep learning can be structured to learn individual tasks and potentially entire neural subsystems.

The comparison indicates that the cuDNN benchmarks are not showing the full performance of the K40 hardware for some reason.

CUDA 6.5 significantly boosts performance on nonlinear NLPCA machine-learning and deep-learning codes

CUDA 6.5 significantly boosts performance on PCA machine-learning and deep-learning codes


Filed Under: CUDA, Featured news, News Tagged With: CUDA, deep-learning, GPU, HPC, machine-learning
