NVIDIA CEO Jensen Huang told the press at GTC 2018 that NVIDIA is now a platform company, not just a chip vendor. His two-hour keynote made the same point, covering topics that ranged from an NVLink hardware switch to the Xavier platform and self-driving cars.
In short, NVIDIA presented two key new hardware designs and some brilliant software investments, and it is shrewdly using both to position itself for massive growth in roughly two to three years. The timeline is approximate; it depends on how one interprets Jensen’s keynote comment about being in production for self-driving cars in “two to three years”.
The NVLink switch
The first new hardware design is the NVLink switch, a full crossbar that lets every connected NVLink device communicate with any other device on the switch at full bandwidth. As a result, all the interconnected GPUs can share each other’s memory, albeit at lower bandwidth than accessing their own on-board memory.
This switch is the basis of the DGX-2 deep-learning platform, which we view as the bridge product for this year.
Expect the DGX-2 to be a popular, fast-selling product: two petaflops of dedicated tensor-core training performance for $399k was previously unheard of. NVIDIA claims AlexNet can be trained in 18 minutes on a DGX-2. The original AlexNet training run took 15 days in 2012.
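As a quick sanity check on those figures, the short Python sketch below works out the implied speedup and the price per tensor petaflop. It uses only the numbers quoted above; the function and constant names are ours, and the comparison ignores differences in software and training recipe between 2012 and today, so treat it as back-of-the-envelope arithmetic rather than a benchmark.

```python
# Figures quoted in the article; nothing here is measured by us.
ALEXNET_2012_DAYS = 15        # original 2012 training run, as stated above
ALEXNET_DGX2_MINUTES = 18     # NVIDIA's claim for the DGX-2
DGX2_PRICE_USD = 399_000
DGX2_TENSOR_PETAFLOPS = 2


def training_speedup(old_days: float, new_minutes: float) -> float:
    """Return how many times faster the new run is than the old one."""
    old_minutes = old_days * 24 * 60
    return old_minutes / new_minutes


def price_per_petaflop(price_usd: float, petaflops: float) -> float:
    """Return the system price divided by its peak tensor petaflops."""
    return price_usd / petaflops


speedup = training_speedup(ALEXNET_2012_DAYS, ALEXNET_DGX2_MINUTES)
usd_per_pf = price_per_petaflop(DGX2_PRICE_USD, DGX2_TENSOR_PETAFLOPS)

print(f"Implied training speedup: {speedup:,.0f}x")
print(f"Price per tensor petaflop: ${usd_per_pf:,.0f}")
```

On the article's numbers this works out to a speedup of roughly three orders of magnitude at just under $200k per tensor petaflop.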
To get a sense of why the DGX-2 system will sell fast, consider that the life of a data scientist is mostly consumed with creating a clean training set and then “trying” a number of different artificial neural network (ANN) architectures and initial parameter settings in the hope of finding a “good enough” solution.
The DGX-2 represents a huge increase in deep-learning training performance (which relies heavily on the tensor cores), and it can therefore make those expensive data scientists far more effective by delivering trained ANNs much, much faster. Do the math for the cost of a data scientist and then consider whether the DGX-2 makes them 2x, 5x, 10x, or 20x more effective due to faster training performance.
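One way to do that math is sketched below in Python. The $250k fully loaded annual cost per data scientist and the team size of four are purely hypothetical placeholders, not figures from NVIDIA or the keynote, and the sketch makes the simplifying assumption that extra output is worth what the team already costs. Substitute your own numbers.

```python
DGX2_PRICE_USD = 399_000            # price quoted in the article
COST_PER_SCIENTIST_USD = 250_000    # ASSUMPTION: hypothetical fully loaded annual cost
TEAM_SIZE = 4                       # ASSUMPTION: hypothetical team sharing one DGX-2


def annual_value_gained(cost_per_scientist: float, team_size: int,
                        multiplier: float) -> float:
    """Value of the extra work delivered per year.

    A multiplier of 2 means the team delivers twice the work, i.e. one
    additional team's worth of output, valued at the team's own cost.
    """
    return cost_per_scientist * team_size * (multiplier - 1)


def payback_months(system_price: float, annual_gain: float) -> float:
    """Months until the extra output has paid for the system."""
    return 12 * system_price / annual_gain


# The effectiveness multipliers are the ones floated in the text.
for multiplier in (2, 5, 10, 20):
    gain = annual_value_gained(COST_PER_SCIENTIST_USD, TEAM_SIZE, multiplier)
    months = payback_months(DGX2_PRICE_USD, gain)
    print(f"{multiplier:>2}x more effective: ${gain:,.0f}/year extra output, "
          f"payback in {months:.1f} months")
```

Under these assumed inputs, even the most conservative 2x case pays for the system in well under a year, which is the intuition behind our expectation that the DGX-2 will sell fast.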
The NVIDIA DLA core: from self-driving cars to ARM
The NVIDIA DLA (Deep Learning Accelerator) core is basically a low-power ASIC with high tensor-core performance that greatly speeds up inference. The core is used in the Xavier self-driving car hardware, and NVIDIA has also contributed the core and its associated software stack to the open ARM library.
Deepu Talla (VP and General Manager for Tegra, NVIDIA) noted that the hardware design for the DLA was the easy part. The software was the challenge.
This highlights the repositioning of NVIDIA as a platform company because NVIDIA “gets” that software is also key to hardware success.
The many faces of DLA
From self-driving cars, to robotics, to ARM IoT devices and mobile phones, DLA is a brilliant play.
Easy-Peasy: TensorRT is now part of the TensorFlow default distribution
To “make it easy” and to give everyone a reason to use DLA as well as NVIDIA data center GPUs such as the P4, Jensen announced that TensorRT is now part of the TensorFlow distribution. (TensorRT optimizes trained ANNs for inferencing.) TensorFlow is now a dominant machine learning package, so this means seamless and easy inference on NVIDIA hardware for most data scientists in the world. It is a preemptive move and, in a word, “Brilliant”.
Jensen pointed out during his keynote that he received a lot of criticism when he showed off the NVIDIA self-driving car module at a previous GTC. “It’s so big!” was the comment. He then said that even that module did not provide enough inference performance, given all the sensors and the other work that needs to be performed. In short, the DLA core exists to provide sufficient inference performance in a small enough power envelope. The Xavier self-driving car platform is the next step, but Jensen noted that it will be integrated into a smaller package in the future.
The future of self-driving cars, and their eventual success, will hinge on regulation. NVIDIA is therefore doing the hard work of getting this complicated computer and its associated software stack ASIL-D certified. From Wikipedia, “ASIL D, an abbreviation of Automotive Safety Integrity Level D, refers to the highest classification of initial hazard (injury risk) defined within ISO 26262 and to that standard’s most stringent level of safety measures to apply for avoiding an unreasonable residual risk.”
Jensen also noted that, statistically, there are 770 accidents per billion miles driven. However, it is not practical to accumulate billions of miles of driving with physical self-driving cars. For this reason, NVIDIA created a simulator to help generate the statistics needed for regulatory approval and (while not mentioned by Jensen) to prove to the insurance industry that self-driving cars are as safe as or safer than human drivers. More on this, and on the associated benefits of the self-driving car effort, follows in the sections on other market verticals.
This positions NVIDIA uniquely as a self-driving car platform. Unless other vendors follow suit quickly, NVIDIA could corner what it says is a $10T automotive market.
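To see why physical road testing alone cannot produce those statistics, consider the rough Python sketch below. The accident rate is the figure Jensen quoted; the fleet size of 100 cars and the 500 miles per car per day are hypothetical values we chose for illustration, not NVIDIA figures.

```python
ACCIDENTS_PER_BILLION_MILES = 770   # figure quoted in the keynote
FLEET_SIZE = 100                    # ASSUMPTION: hypothetical test fleet
MILES_PER_CAR_PER_DAY = 500         # ASSUMPTION: hypothetical daily mileage per car


def miles_per_accident(accidents_per_billion_miles: float) -> float:
    """Average number of miles driven between accidents."""
    return 1e9 / accidents_per_billion_miles


def years_to_drive(target_miles: float, fleet_size: int,
                   miles_per_car_per_day: float) -> float:
    """Years a fleet needs to accumulate target_miles, driving every day."""
    days = target_miles / (fleet_size * miles_per_car_per_day)
    return days / 365


avg_gap = miles_per_accident(ACCIDENTS_PER_BILLION_MILES)
years = years_to_drive(1e9, FLEET_SIZE, MILES_PER_CAR_PER_DAY)

print(f"Average miles between accidents: {avg_gap:,.0f}")
print(f"Years for the fleet to drive one billion miles: {years:.1f}")
```

With these assumed inputs, a single billion miles takes the fleet more than half a century, which is why simulated miles are so attractive.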
Regarding the recent unfortunate fatality caused by an Uber self-driving car, Jensen noted that:
- NVIDIA hardware and software were not involved.
- Safety is paramount at NVIDIA.
- NVIDIA follows good engineering practices, so they suspended testing until the cause of the fatality was determined. In this way, NVIDIA can get a good data point about something they may need to consider in their platform.
Potentially hitting the need to retool in a multi-billion unit market
NVIDIA contributed the DLA core to the ARM distribution, which means any ARM designer can use a DL inferencing engine with full software support from NVIDIA. Think of the billions of mobile phones out there and the rapidly growing need for local inference on battery-constrained devices. High-end phone makers such as Apple are already adding inference hardware. ASIL-D certification for the tensor core, incorporation in the default TensorFlow distribution, and the focus of a billion-dollar company won’t hurt in getting into the mass market.
However, NIH (Not Invented Here) will likely be an issue. Further, competition over DL acceleration in the ARM space, particularly in ARM mobile phones, is going to be bloody. We will see how well this plays out, but it is a shrewd play.
Robotics and IoT
We love Deepu Talla’s statement that, “Self-driving cars are easy. All they have to do is go from point A to point B without touching anything. Robots are hard because they need to touch things”.
The DLA core is central to the NVIDIA robotics and IoT strategy, which will certainly play out over the coming months and the next two to three years. It does not hurt to be designing with ASIL-D certified hardware that can run on ARM, in the data center, and in self-driving cars, and that is compact and low-power yet high-performance.
As shown in last year’s keynote, Jensen is also using the NVIDIA simulator to help generate data for robotics researchers and commercial efforts. Thus roboticists can work in the simulator before committing to hardware. It just makes sense.
Not just a one-trick pony, the DLA core is also used to generate photorealistic images in real time. At last year’s GTC, Jensen showed an autoencoder neural network that took a crudely ray-traced image and generated a believable, fairly photorealistic image. That technology has been greatly improved. Specifically:
- Jensen showed photorealistic ray-traced videos generated at high resolution in real time. Jensen said this has been a goal at NVIDIA for 25 years. The initial targets are content creators in film and remote sales.
- We expect the price of real-time photorealistic rendering to drop precipitously, especially with better DLA integration. Integration in gaming-related standards like DirectX and Vulkan signals a potential move into real-time photorealistic gaming once design and production costs are recouped and prices can drop.
- Further, Jensen showed a flat-colored video of the view driving down a roadway that used only a few (say six) colors. This was then converted into a lovely photorealistic video in real time. Can we say wonderful photorealistic image encoding? Think how the MIDI standard allows music to be incorporated into games while consuming a small amount of space. It’s easy for musicians to generate content this way. Now think of Jensen’s example as a very highly compressed, easy-to-generate technology that game authors can use to create scenes that are then photorealistically rendered. For game authors, no more image pre-baking (sometimes called pre-rendering)!
Jensen showed the capability; now it’s a question of cost.
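To put a rough number on the "very highly compressed" point about the flat-colored driving video, the Python sketch below compares the raw size of a six-color label frame with a conventional 24-bit RGB frame. The 1920x1080 frame size and the fixed-width packing of labels are our assumptions for illustration; we do not know how NVIDIA actually encodes these frames, and real codecs would shrink both figures further.

```python
import math

NUM_COLORS = 6          # "a few (say six) colors", per the keynote demo
WIDTH, HEIGHT = 1920, 1080   # ASSUMPTION: hypothetical frame size
RGB_BITS_PER_PIXEL = 24      # conventional 8 bits per channel


def bits_per_pixel(num_labels: int) -> int:
    """Smallest fixed number of bits that can distinguish num_labels values."""
    return max(1, math.ceil(math.log2(num_labels)))


def frame_bytes(width: int, height: int, bits: int) -> int:
    """Uncompressed size of one frame, rounded up to whole bytes."""
    return math.ceil(width * height * bits / 8)


label_bits = bits_per_pixel(NUM_COLORS)
label_size = frame_bytes(WIDTH, HEIGHT, label_bits)
rgb_size = frame_bytes(WIDTH, HEIGHT, RGB_BITS_PER_PIXEL)

print(f"Label frame: {label_bits} bits/pixel, {label_size:,} bytes")
print(f"RGB frame:   {RGB_BITS_PER_PIXEL} bits/pixel, {rgb_size:,} bytes")
print(f"Raw size ratio: {rgb_size / label_size:.0f}x")
```

Large flat regions of a single label would also compress far better than natural imagery, so the practical gap is likely wider than this raw ratio suggests.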
There is much more behind the NVIDIA announcements, including an increase in the memory capacity of Volta GPUs to 32GB, among other work.
Contact TechEnablement for one-on-one analysis and education. Like NVIDIA, we make it easy: just email us.