NVIDIA responds to the machine learning benchmark results presented by Intel at ISC’16, “It’s great that Intel is now working on deep learning. This is the most important computing revolution with the era of AI upon us and deep learning is too big to ignore. But they should get their facts straight.” (Source: NVIDIA)
NVIDIA notes further that, “While we can correct each of their wrong claims, we think deep learning testing against old Kepler GPUs and outdated software versions are mistakes that are easily fixed in order to keep the industry up to date.” (Source: NVIDIA)
We expect to see many more benchmark comparisons now that NVIDIA Pascal and the newest Intel Xeon Phi are available to third parties. Sign up for access to the newest Intel Xeon Phi on the Intel Machine Learning Portal; now that Pascal is in the wild, third-party NVIDIA Pascal benchmarks should soon follow as well.
The Intel benchmarks were announced at ISC’16.
Fresh vs Stale Caffe
Intel used Caffe AlexNet data that is 18 months old to compare a system with four Maxwell GPUs against four Xeon Phi servers. With the more recent implementation of Caffe AlexNet, publicly available here, Intel would have found that the same four-Maxwell system trains 30% faster than the four Xeon Phi servers.
38% Better Scaling
Intel compares Caffe GoogleNet training performance on 32 Xeon Phi servers against 32 servers from Oak Ridge National Laboratory's Titan supercomputer. Titan uses four-year-old GPUs (Tesla K20X) and an interconnect inherited from the earlier Jaguar supercomputer, while the Xeon Phi results were based on current interconnect technology.
Using more recent Maxwell GPUs and interconnect, Baidu has shown that their speech training workload scales almost linearly up to 128 GPUs.
Scalability relies on the interconnect and on architectural optimizations in the code as much as on the underlying processor. GPUs are delivering great scaling for customers like Baidu.
Strong-Scaling to 128 Nodes
Intel claims that 128 Xeon Phi servers deliver 50x faster performance than a single Xeon Phi server, and that no comparable scaling data exists for GPUs. As noted above, Baidu has already published results showing near-linear scaling up to 128 GPUs.
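Intel's own numbers imply well under half of ideal scaling. As a quick back-of-the-envelope check (a sketch, not from either vendor's methodology), strong-scaling efficiency is just the achieved speedup divided by the ideal linear speedup:

```python
def scaling_efficiency(speedup: float, nodes: int) -> float:
    """Achieved speedup as a fraction of ideal (linear) speedup."""
    return speedup / nodes

# Intel's claim: 50x speedup on 128 Xeon Phi servers.
print(f"{scaling_efficiency(50, 128):.0%}")  # 39%
```

By this measure, 50x on 128 nodes is roughly 39% parallel efficiency, whereas near-linear scaling (as Baidu reported for GPUs) would be close to 100%.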
For strong scaling, we believe strong nodes are better than weak nodes. A single strong server with several powerful GPUs delivers superior performance to many weak nodes, each with one or two sockets of less-capable processors such as Xeon Phi. For example, a single DGX-1 system offers better strong-scaling performance than at least 21 Xeon Phi servers (DGX-1 is 5.3x faster than 4 Xeon Phi servers).
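The "21 servers" figure follows directly from NVIDIA's stated 5.3x ratio. A minimal sketch of the arithmetic (assuming, generously for Xeon Phi, that its performance scales perfectly with server count):

```python
# NVIDIA's claim: one DGX-1 is 5.3x faster than a cluster of 4 Xeon Phi servers.
dgx1_speedup_vs_cluster = 5.3
phi_servers_in_cluster = 4

# Equivalent Xeon Phi server count, assuming perfect linear Phi scaling
# (a best case for Xeon Phi; real scaling would require even more servers).
equivalent_phi_servers = dgx1_speedup_vs_cluster * phi_servers_in_cluster
print(equivalent_phi_servers)  # 21.2
```

Since real-world scaling is sublinear, matching a DGX-1 would in practice take more than 21 Xeon Phi servers, which is why the article says "at least 21."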
For more information, see the NVIDIA blog:
In comparison, TechEnablement has been running a sponsored series by Intel showing the benefits of the Intel Scalable System Framework (that includes Intel Xeon Phi). These articles are: