
Baidu Small NVIDIA-Powered Cluster Achieves ‘Most Accurate’ Near-Human ImageNet Recognition Results

January 16, 2015 by Rob Farber

Baidu Research used a small 36-node NVIDIA-powered cluster to attain the best computer-vision ImageNet classification result to date: a 5.98% top-5 error rate versus GoogLeNet’s 6.66%. These results are very close to the estimated human error rate of 5.1%. Key to Baidu’s performance is their mix of model- and data-parallelism, their use of higher-resolution images (512×512 vs. 256×256), and their incorporation of additional synthetic data derived from the ImageNet images. The approach is described in the paper “Deep Image: Scaling up Image Recognition”, available on arxiv.org.
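The central scaling idea, combining data parallelism (splitting the minibatch across GPUs) with model parallelism (splitting a layer’s weights across GPUs), can be illustrated with a minimal NumPy sketch. This is not Baidu’s code; the “GPU shards” below are just array slices, but the arithmetic shows why both decompositions reproduce the single-device result for a fully connected layer:

```python
import numpy as np

# Minimal sketch (not Baidu's implementation): data vs. model parallelism
# for one fully connected layer. Each shard would live on its own GPU in
# practice; here the devices are simulated with plain array slices.

n_gpus = 4
batch, in_dim, out_dim = 64, 512, 1024
rng = np.random.default_rng(0)

x = rng.standard_normal((batch, in_dim)).astype(np.float32)
W = rng.standard_normal((in_dim, out_dim)).astype(np.float32)

# Data parallelism: split the minibatch across GPUs; every GPU holds a
# full copy of W and processes only its slice of the batch.
x_shards = np.array_split(x, n_gpus, axis=0)
y_data_parallel = np.concatenate([xs @ W for xs in x_shards], axis=0)

# Model parallelism: split W by output columns; every GPU holds a slice
# of the weights and computes part of every output activation.
W_shards = np.array_split(W, n_gpus, axis=1)
y_model_parallel = np.concatenate([x @ Ws for Ws in W_shards], axis=1)

assert np.allclose(y_data_parallel, x @ W, rtol=1e-4, atol=1e-5)
assert np.allclose(y_model_parallel, x @ W, rtol=1e-4, atol=1e-5)
```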

In particular, Baidu augmented the ImageNet images with effects such as color-casting, vignetting, and lens distortion. The goal was to let the system take in more features of smaller objects and to learn what objects look like without being thrown off by editing choices, lighting conditions, or other extraneous factors.

Image courtesy of Calisa Cole, Baidu
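A minimal sketch of two of these augmentations, color-casting and vignetting, is shown below in NumPy. The parameter ranges are illustrative assumptions, not the values used in the paper:

```python
import numpy as np

# Hedged sketch of two augmentations described in the Deep Image paper.
# Images are float32 arrays in [0, 1] with shape (H, W, 3).

rng = np.random.default_rng(42)

def color_cast(img, max_shift=0.1):
    """Add a random per-channel offset, simulating a color cast."""
    shift = rng.uniform(-max_shift, max_shift, size=(1, 1, 3))
    return np.clip(img + shift, 0.0, 1.0).astype(np.float32)

def vignette(img, strength=0.5):
    """Darken pixels toward the corners, simulating lens vignetting."""
    h, w = img.shape[:2]
    y, x = np.ogrid[:h, :w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    r = np.sqrt((y - cy) ** 2 + (x - cx) ** 2)
    mask = 1.0 - strength * (r / r.max()) ** 2  # 1 at center, darker at edges
    return (img * mask[..., None]).astype(np.float32)

img = rng.random((512, 512, 3), dtype=np.float32)  # stand-in for a 512x512 crop
augmented = vignette(color_cast(img))
print(augmented.shape, augmented.dtype)
```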

The small NVIDIA-powered cluster is well within the reach of most universities and small companies. It comprises 36 server nodes, each with two six-core Intel Xeon E5-2620 processors. Each server contains four NVIDIA Tesla K40m GPUs and one FDR InfiniBand (56 Gb/s) adapter, a high-performance, low-latency interconnect that supports RDMA. Each GPU delivers a peak single-precision floating-point performance of 4.29 TF/s and has 12 GB of memory.
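A quick back-of-the-envelope calculation from these published specifications gives the aggregate scale of the machine; the totals below are simple arithmetic, not figures from the paper:

```python
# Aggregate numbers for the cluster described above:
# 36 nodes x 4 Tesla K40m GPUs, 4.29 TF/s single precision and 12 GB each.

nodes, gpus_per_node = 36, 4
tflops_per_gpu, gb_per_gpu = 4.29, 12

total_gpus = nodes * gpus_per_node              # 144 GPUs
peak_tflops = total_gpus * tflops_per_gpu       # ~617.8 TF/s single precision
total_gpu_mem_gb = total_gpus * gb_per_gpu      # 1,728 GB of GPU memory

print(f"{total_gpus} GPUs, {peak_tflops:.1f} TF/s peak, {total_gpu_mem_gb} GB GPU memory")
```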

[Video: deep-learning course taught by Andrew Ng of Baidu in 2012]

TechEnablement also makes our exascale-capable deep-learning mapping available on GitHub. You can read more about our approach in the article “Deep-learning Teaching Code Achieves 13 PF/s on the ORNL Titan Supercomputer”.

Filed Under: CUDA, Featured article, Featured news, News. Tagged With: deep-learning, NVIDIA
