
TechEnablement

Education, Planning, Analysis, Code


NVIDIA – “[Intel] Should Get Their Facts Straight” on Machine Learning Benchmarks

August 16, 2016 by Rob Farber

NVIDIA responds to the machine learning benchmark results presented by Intel at ISC’16, “It’s great that Intel is now working on deep learning. This is the most important computing revolution with the era of AI upon us and deep learning is too big to ignore. But they should get their facts straight.” (Source: NVIDIA)

NVIDIA notes further that, “While we can correct each of their wrong claims, we think deep learning testing against old Kepler GPUs and outdated software versions are mistakes that are easily fixed in order to keep the industry up to date.” (Source: NVIDIA)

We expect to see many more benchmark comparisons now that NVIDIA Pascal GPUs and the newest Intel Xeon Phi processors are available to third parties. Sign up for access to the newest Intel Xeon Phi on the Intel Machine Learning Portal.

The Intel benchmarks were announced at ISC’16.

Intel Furthers Machine Learning Capabilities

NVIDIA Information

Fresh vs Stale Caffe

Intel used Caffe AlexNet data that is 18 months old, comparing a system with four Maxwell GPUs to four Xeon Phi servers. With the current, publicly available implementation of Caffe AlexNet, Intel would have found that the same system with four Maxwell GPUs delivers 30% faster training time than four Xeon Phi servers.

In fact, a system with four Pascal-based NVIDIA TITAN X GPUs trains 90% faster and a single NVIDIA DGX-1 is over 5x faster than four Xeon Phi servers.
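To make the quoted claims concrete, the sketch below converts each "X% faster" or "Nx faster" figure into a relative training time, taking the four-Xeon-Phi cluster as the baseline. The mapping of "30% faster" to a 1.3x throughput factor is our reading of the claim, not a number NVIDIA states directly.

```python
# Hypothetical illustration: convert the claimed speedups into relative
# training times, with four Xeon Phi servers as the 1.0 baseline.
# "30% faster" is interpreted here as 1.3x throughput, i.e. time / 1.3.
speedups = {
    "4x Xeon Phi servers (baseline)": 1.0,
    "4x Maxwell GPUs (current Caffe)": 1.3,  # "30% faster"
    "4x Pascal TITAN X GPUs": 1.9,           # "90% faster"
    "1x DGX-1": 5.0,                         # "over 5x faster"
}

for system, speedup in speedups.items():
    relative_time = 1.0 / speedup  # fraction of the baseline training time
    print(f"{system}: {relative_time:.2f}x the baseline training time")
```

Under this reading, the DGX-1 would finish the same training run in roughly a fifth of the baseline time.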

38% Better Scaling

Intel compared Caffe GoogleNet training performance on 32 Xeon Phi servers against 32 nodes of Oak Ridge National Laboratory's Titan supercomputer. Titan uses four-year-old GPUs (Tesla K20X) and an interconnect inherited from the prior Jaguar supercomputer, while the Xeon Phi results were based on recent interconnect technology.

Using more recent Maxwell GPUs and interconnect, Baidu has shown that their speech training workload scales almost linearly up to 128 GPUs.

[Figure: Baidu speech-training scaling on GPUs]

Source: Persistent RNNs: Stashing Recurrent Weights On-Chip, G. Diamos

Scalability relies on the interconnect and architectural optimizations in the code as much as the underlying processor. GPUs are delivering great scaling for customers like Baidu.

Strong-Scaling to 128 Nodes

Intel claims that 128 Xeon Phi servers deliver 50x the performance of a single Xeon Phi server, and that no comparable scaling data exists for GPUs. As noted above, Baidu has already published results showing near-linear scaling up to 128 GPUs.
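The two scaling claims can be put on a common footing via parallel efficiency (measured speedup divided by node count). The sketch below uses the figures quoted above; the 90% efficiency assigned to "almost linear" GPU scaling is our illustrative assumption, not a number from Baidu's paper.

```python
# Parallel efficiency = measured speedup / number of nodes.
def parallel_efficiency(speedup: float, nodes: int) -> float:
    return speedup / nodes

# Intel's claim: 128 Xeon Phi servers deliver 50x over one server.
intel_eff = parallel_efficiency(50, 128)
print(f"Intel, 128 Xeon Phi nodes: {intel_eff:.0%} efficiency")

# Baidu's GPU result is described as "almost linear" to 128 GPUs;
# an assumed 90% efficiency would correspond to ~115x speedup.
assumed_baidu_eff = 0.90
print(f"128 GPUs at {assumed_baidu_eff:.0%} efficiency: "
      f"{assumed_baidu_eff * 128:.0f}x speedup")
```

By this measure, 50x on 128 nodes is roughly 39% parallel efficiency, well short of near-linear.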

For strong scaling, we believe strong nodes beat weak nodes. A single server with several powerful GPUs delivers better performance than many weak nodes, each with one or two sockets of less-capable processors such as Xeon Phi. For example, a single DGX-1 system offers better strong-scaling performance than at least 21 Xeon Phi servers (a DGX-1 is 5.3x faster than four Xeon Phi servers).
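The "at least 21" figure follows from simple arithmetic on the quoted speedup; a quick sketch:

```python
# If one DGX-1 is 5.3x faster than a 4-node Xeon Phi cluster,
# it does the work of roughly 5.3 * 4 Xeon Phi servers.
dgx1_speedup_vs_4_phi = 5.3
phi_servers_in_baseline = 4

equivalent_phi_servers = dgx1_speedup_vs_4_phi * phi_servers_in_baseline
print(f"One DGX-1 ~ {equivalent_phi_servers:.0f} Xeon Phi servers")
```

That is, 5.3 x 4 = 21.2, which rounds down to the "at least 21" servers claimed.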

For more information, see the NVIDIA blog post: Correcting Some Mistakes

Intel Information

In comparison, TechEnablement has been running an Intel-sponsored series showing the benefits of the Intel Scalable System Framework (which includes Intel Xeon Phi). These articles are:

Faster Deep Learning with the Intel® Scalable System Framework: Next Generation Processors

How the Intel® OPA Fabric Facilitates Distributed Training

How Lustre and DAOS Enable Faster Deep Learning

How Intel® MPI Enables Scalable Distributed Machine Learning
