Amid a stronger-than-expected quarter (NVDA revenue of $1.94B, including datacenter sales of $409M), Jensen Huang, NVIDIA CEO and cofounder, delivered a comprehensive set of announcements during his GTC’17 keynote to position NVIDIA for growth in the datacenter, cloud, embedded computing, and IoT with low-power domain-level hardware (not a processor). Pre-emptive comparative results against Intel Skylake were also announced, although Huang noted that these are paper results, as NVIDIA has not yet run on a Skylake processor. In a post-keynote conversation with TechEnablement, Huang said that the agility of NVIDIA was “due to the remarkable people who work there”. (View the Jensen keynote here.) In particular, read further to understand the emphasis on ‘specialized’.
Volta was the expected key announcement. Volta is the next-generation GPU (after Pascal) that will go into the CORAL collaboration supercomputers. NVIDIA and IBM are working in partnership to deliver this hardware.
Even with major deliveries to the US government pending:
- Customers can pre-order the Volta-series DGX-1 box now for $149,000. The Pascal-equipped version is available for $129,000. Jensen noted in his keynote that customers who order now will receive Pascal-equipped systems and a free upgrade to Volta when the GPUs become available. NVIDIA notes, “Customers can choose between getting DGX-1 with P100 now and receiving an upgrade to V100 as soon it’s available, or getting a DGX-1 with V100 when it starts shipping. DGX-1 service and support at additional cost.” *
- Customers can also order the DGX Station now, which is touted by NVIDIA to be a “Personal AI Supercomputer”. The NVIDIA website notes the DGX Station contains “next generation NVLink™ and new Tensor Core architecture.”*
The DGX product line shows NVIDIA expanding its efforts as a platform company with both workstation and server options.
The ‘soup-to-nuts’ solution
Briefly, NVIDIA showed their agility and aggressive moves to dominate what is being billed as the AI market, and yet far more. (Noted in the section below are a few of our highlighted growth and potential gap-shortfall signals. Contact us to arrange a comprehensive, in-depth analysis briefing.)
Jensen covered so much material during the keynote that we provide the key information in annotated outline form below.
- Project Holodeck:
- Many applications, including video communications and marketing (and possibly helping to train robots, as discussed later in this article).
- Photorealistic models and interactive physics
- Early access in September
- Specialized Deep-learning:
- Volta contains a tensor core to accelerate deep learning. Think of the tensor core on the Volta GPU as an MMX instruction set for a very specific (yet widely applicable) set of deep-learning applications.
- Very specifically (but not too technically), Ian Buck (VP, Accelerated Computing business unit, NVIDIA) clarified during our press briefing that NVIDIA is not accelerating machine learning in general with the tensor cores, but rather deep-learning applications. Deep learning is generally understood as training a neural network that has many “hidden” layers between the input and output neurons. Buck was clear that the tensor cores do not even accelerate the general class of deep-learning applications. Instead, the reduced-precision NVIDIA ‘tensor core’ hardware accelerates only very specific domain applications, much as MMX instructions accelerated multimedia applications in the Pentium processor line.
- Don’t be misled into thinking that NVIDIA is accelerating the general class of deep-learning applications with this hardware.
- Think instead of the tensor cores as accelerating a subset of a subset of machine-learning and deep-learning applications, namely those that can benefit from the reduced FP16 precision of the tensor core hardware. Thus, the tensor core will help some, but not most, customers and data scientists.
- For the general class of machine- and deep-learning applications, NVIDIA relies on the GPU acceleration of floating-point intensive applications. However, this is an area that is now receiving significant competition from vendors like Intel and AMD.
- For this reason, TechEnablement views Volta as a very exciting cinematic and future-gaming GPU wrapped in the guise of a deep-learning product. The autoencoder portion of Jensen’s keynote is an indicator of NVIDIA’s future activities to accelerate ray-tracing.
- Shown in the slide below is a noisy ray-traced image that was accelerated using this autoencoder approach and massive parallelism to more quickly render a ‘photo realistic’ image.
- In contrast, CPU-based rendering uses log()-runtime algorithms that also accelerate ray-tracing. The competition is whether big-memory log()-runtime algorithms or massive parallelism can deliver better ray-tracing performance.
- Reduced precision HPC?
- Researchers are investigating other ways to capitalize on the reduced-precision capabilities. Piotr Luszczek (Research Director, University of Tennessee) presented exciting results in S7676, “Half Precision Benchmarking for HPC”: FP16 reduced-precision solvers can run 2x faster than FP32 and 4x faster than FP64, but slow convergence can make FP16-based computations actually run slower overall (e.g. in time-to-solution), and they require well-conditioned matrices or the reduced-precision methods will not find a solution at all. Further research is in progress. Thus “not ready for prime time” is the current status of reduced precision for HPC.
- NVIDIA as a platform provider:
- NVIDIA is producing workstations and servers for NVIDIA accelerated deep-learning applications plus the HGX-1 for hyperscalers. Thus the keynote shows NVIDIA positioning itself more and more as a platform provider.
- Jensen noted relationships with all the major cloud providers. A cautionary note about a potential gap pullback: as discussed previously, don’t confuse machine learning and ‘deep-learning’ in general with NVIDIA-specific accelerated deep-learning using reduced precision.
- NVIDIA cloud:
- Bill Gates showed the world that if you control the interface, you control the market. NVIDIA cloud uses containers (specifically Docker) and optimizations throughout the stack to “own” the interface to AI in the cloud.
- Look also for the NVIDIA stable of pre-trained AI solutions as a clearing house for valuable AI models that can be fine-tuned with customer data.
- NVIDIA cloud offerings can run on AWS, Azure, and other cloud platforms, as highlighted by cameo appearances from Matt Wood (GM, Deep Learning and AI, AWS) and Jason Zander (CVP, Microsoft Azure, Microsoft).
- NVIDIA auto:
- NVIDIA IoT via the Xavier and Deep Learning Accelerator (DLA):
- DLAs are not processors but very specific application chips (think ASIC, though this may be inaccurate). Jensen likened them several times during the keynote to TPUs, in reference to Google’s TensorFlow ASIC. The Xavier domain-level development platform is open source, with early access in July and general release in September.
- No NVIDIA person could comment on the process for going from Xavier development code (e.g. functional blocks and wiring) to a deliverable that could be used in IoT devices. We suspect Jensen may have outpaced his technology and PR teams when he announced this.
- Specialized deep-learning works nicely for speech and vision, but it also works for robotics (as well as self-driving cars). To help build training sets for robotics, NVIDIA has created a physics-based robotic simulator called Isaac. We suspect there is cross-fertilization with the Holodeck project, as the core functionality appears to be similar. No NVIDIA employee was able to comment on availability of or access to Isaac.
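The tensor-core behavior described in the outline above can be sketched in a few lines. Below is a minimal NumPy emulation, assuming the publicly described Volta operation (FP16 matrix inputs with FP32 accumulation); the function name and shapes are ours for illustration and are not an NVIDIA API:

```python
import numpy as np

def tensor_core_mma(a, b, c):
    """Emulate one tensor-core-style step: D = A x B + C.

    Inputs A and B are rounded to FP16 (half precision); the
    multiply-accumulate is carried out in FP32, which is why
    training can tolerate the reduced input precision.
    """
    a16 = a.astype(np.float16)   # inputs stored at FP16 precision
    b16 = b.astype(np.float16)
    # promote to FP32 before the multiply-accumulate
    return a16.astype(np.float32) @ b16.astype(np.float32) + c.astype(np.float32)

rng = np.random.default_rng(0)
a = rng.standard_normal((4, 4))
b = rng.standard_normal((4, 4))
c = np.zeros((4, 4), dtype=np.float32)
d = tensor_core_mma(a, b, c)
full = a @ b  # FP64 reference result
print("max error vs FP64:", np.abs(d - full).max())
```

The point of the sketch is the precision split: the error against the FP64 reference comes almost entirely from rounding the inputs to FP16, not from the accumulation, which is kept at FP32.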
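The trade-off in the S7676 results discussed above is the classic mixed-precision iterative-refinement pattern: solve fast in low precision, then recover accuracy with high-precision residual corrections, at the cost of extra iterations. A minimal sketch (our illustration, not the benchmark code from the talk), using FP32 as the "low" precision since NumPy's solver does not run in FP16:

```python
import numpy as np

def mixed_precision_solve(a, b, tol=1e-12, max_iter=50):
    """Solve Ax = b by solving in low precision (FP32 here) and
    refining the answer with FP64 residual corrections.

    The low-precision solve is the fast part; each refinement step
    costs one FP64 residual plus one more low-precision solve.
    Ill-conditioned matrices can make the loop converge slowly or
    not at all, which is the pitfall noted in the S7676 results.
    """
    a32 = a.astype(np.float32)
    x = np.linalg.solve(a32, b.astype(np.float32)).astype(np.float64)
    for _ in range(max_iter):
        r = b - a @ x                       # residual computed in FP64
        if np.linalg.norm(r) < tol * np.linalg.norm(b):
            break
        # correction computed in low precision
        dx = np.linalg.solve(a32, r.astype(np.float32)).astype(np.float64)
        x += dx
    return x

rng = np.random.default_rng(1)
n = 200
a = rng.standard_normal((n, n)) + n * np.eye(n)   # deliberately well conditioned
b = rng.standard_normal(n)
x = mixed_precision_solve(a, b)
print("relative residual:", np.linalg.norm(b - a @ x) / np.linalg.norm(b))
```

With a well-conditioned matrix the loop reaches FP64-level accuracy in a couple of refinement steps; with a poorly conditioned one the iteration count grows or the loop fails to converge, which is why time-to-solution can end up worse than a plain FP64 solve.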
Considerations for an NVDA share pullback:
- NVIDIA engineers pointed out that the “25x faster than Skylake” claim is a theoretical peak number based on reduced precision (e.g. a marketing number). NVIDIA engineers indicated a more realistic factor of roughly 2.5x, but these numbers are unconfirmed and Intel has not had a chance to respond.
- Reduced precision and tensor core is very specialized. Think of it like MMX for popular AI applications (but not general AI applications).
- Big news is the Volta L1 cache. Instead of the 10x slowdown that occurs on Pascal and previous-generation GPUs when the number of model parameters exceeds the register file space (per TechEnablement benchmarks like farbOpt), the improved Volta L1 cache may limit the performance drop to a factor of 2x-3x. Still, comparative numbers are needed for Volta against CPUs and previous-generation offerings.
- It’s not clear that massive parallelism will beat log() runtime ray tracing algorithms.
- NVIDIA has some tough delivery requirements to meet with Volta. Production will affect the share price. Expect possible scheduling issues, as other big companies have hot products that may consume fab time to NVDA’s detriment.
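The log()-runtime point in the list above rests on a simple asymptotic argument: a hierarchical acceleration structure discards most candidates at each test, while brute force touches all of them. A toy 1-D sketch (nearest "object" to a query point, standing in for a BVH-accelerated ray query; entirely our illustration, not any vendor's code):

```python
def linear_nearest(xs, q):
    """Brute force: test every object, like naive per-triangle ray tests."""
    best, tests = xs[0], 0
    for x in xs:
        tests += 1
        if abs(x - q) < abs(best - q):
            best = x
    return best, tests

def hierarchical_nearest(xs, q):
    """Binary search over sorted objects: the sorted order plays the
    role of an acceleration structure, halving the candidates per test."""
    lo, hi, tests = 0, len(xs), 0
    while lo < hi:
        tests += 1
        mid = (lo + hi) // 2
        if xs[mid] < q:
            lo = mid + 1
        else:
            hi = mid
    candidates = xs[max(0, lo - 1):lo + 1]
    best = min(candidates, key=lambda x: abs(x - q))
    return best, tests

xs = list(range(0, 999_999, 3))   # 333,333 "objects" on a line
q = 500_000
b1, t1 = linear_nearest(xs, q)
b2, t2 = hierarchical_nearest(xs, q)
print(b1 == b2, t1, t2)   # same hit; ~333,333 tests vs ~19
```

A GPU attacks the left-hand cost with thousands of parallel tests per ray, while a big-memory CPU attacks it by doing only the right-hand logarithmic number of tests; which side wins in practice is exactly the open question raised above.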
Other big news mentioned in the keynote or at GTC:
- Arterys Receives FDA Clearance For The First Zero-Footprint Medical Imaging Analytics Cloud Software With Deep Learning For Cardiac MRI. FDA approval is a big step forward for machine-learning technology in the medical field.
- NVIDIA accelerating SAP AI for Business
- First commercial AI offerings from SAP
- Brand Impact, Service Ticketing, Invoice-to-Record applications
- Powered by NVIDIA GPUs on DGX-1 and AWS
- NVIDIA created NVIDIA Inception. This program can give the company first crack at the most promising AI startups.
This was not a comprehensive analysis or coverage of GTC’17 announced capabilities. NVIDIA is moving quickly and doing some wonderful work, but the marketing is also introducing confusion. Contact us to arrange a comprehensive, in-depth analysis briefing.
* Apologies to Tiffany Trader at HPCWire. An error on our part caused the original two bullets marked by this asterisk about the cost and availability of the DGX-1 and DGX Station to be included without attribution. Her bullets were used as placeholder comments that were not removed before the article went live. We replaced the duplicate bullets immediately after Tiffany brought this to our attention using the exact descriptions of the products and pricing as provided by NVIDIA, as was originally intended when the article was written. The corrected bullets also reflect the fact that NVIDIA is taking orders now and not next quarter as originally indicated in the Jensen keynote.