
TechEnablement

Education, Planning, Analysis, Code


The Missing Link in NVlink, or “Hello Pascal” bye-bye PCI bus limitations!

May 13, 2014 by Rob Farber

Say hello to NVLink, a new NVIDIA interconnect technology that is not constrained by PCIe bandwidth and latency limitations, but you will have to wait for the Pascal generation of GPUs, due in 2016, to get it. NVLink is NVIDIA’s proprietary “DRAM speed and latency” class interface for CPU-to-GPU and GPU-to-GPU point-to-point communications. The basic building block of NVLink is a high-speed, 8-lane, differential, dual-simplex bidirectional link. Multiple links can be ganged together for higher bandwidth, or used individually to connect many GPUs in a single system. CPUs with the appropriate proprietary on-chip silicon interfaces will be able to communicate over NVLink and bypass the PCI bus entirely. For now, NVLink products are targeted at HPC and enterprise customers. NVIDIA CEO and co-founder Jen-Hsun Huang noted at GTC 2014 that ARM and IBM CPU interfaces will become available, while various non-technical issues need to be addressed before an x86 NVLink-capable CPU can be built.
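As a back-of-the-envelope sketch of what ganging links together buys you: the figures below are assumptions, not from this article (final Pascal numbers were not public at the time); published first-generation NVLink specs later put per-link bandwidth around 20 GB/s per direction, versus roughly 16 GB/s per direction for PCIe 3.0 x16.

```python
# Back-of-the-envelope comparison of PCIe vs. NVLink bandwidth.
# Assumed figures (not from the article): PCIe 3.0 x16 ~16 GB/s per
# direction; first-gen NVLink ~20 GB/s per direction per link.

PCIE3_X16_GBPS = 16.0          # GB/s, one direction
NVLINK_PER_LINK_GBPS = 20.0    # GB/s, one direction, per link (assumed)

def aggregate_nvlink(num_links: int) -> float:
    """Aggregate one-direction bandwidth when links are ganged together."""
    return num_links * NVLINK_PER_LINK_GBPS

for links in (1, 2, 4):
    bw = aggregate_nvlink(links)
    print(f"{links} link(s): {bw:.0f} GB/s "
          f"({bw / PCIE3_X16_GBPS:.1f}x PCIe 3.0 x16)")
```

With four ganged links this toy arithmetic lands at 80 GB/s each way, or 5x a PCIe 3.0 x16 slot, which is in the neighborhood of NVIDIA’s “5x to 12x PCIe” marketing claims.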

Pascal

When connected to a CPU that does not support NVLink, the interconnect can be devoted entirely to peer GPU-to-GPU connections (image linked from the NVIDIA dev blog). In this case the CPU communicates over the PCIe bus.

[Image: nvlink_quad — quad-GPU NVLink configuration]

With the appropriate silicon support, NVLink lets the CPU communicate with the GPUs in a manner similar to AMD’s HyperTransport or Intel’s QuickPath Interconnect (QPI). This includes the NUMA aspect that not every processor is necessarily connected to every other processor.
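The NUMA point can be illustrated with a toy topology model. The four-GPU ring below is hypothetical (a sketch, not a real NVLink topology); it just shows that traffic between GPUs without a direct link needs an extra hop through an intermediate GPU.

```python
from itertools import combinations

# Hypothetical 4-GPU NVLink ring (assumed topology, for illustration):
# GPUs 0-3 and 1-2 have no direct link, so their traffic is routed
# through a neighbor -- the NUMA effect described above.
links = {(0, 1), (0, 2), (1, 3), (2, 3)}

def hops(a: int, b: int) -> int:
    """Minimum number of links between GPUs a and b (breadth-first search)."""
    if a == b:
        return 0
    frontier, seen, dist = {a}, {a}, 0
    while frontier:
        dist += 1
        nxt = set()
        for u in frontier:
            for v in range(4):
                if (min(u, v), max(u, v)) in links and v not in seen:
                    nxt.add(v)
                    seen.add(v)
        if b in nxt:
            return dist
        frontier = nxt
    raise ValueError("unreachable GPU pair")

for a, b in combinations(range(4), 2):
    print(f"GPU{a} <-> GPU{b}: {hops(a, b)} hop(s)")
```

Directly linked pairs cost one hop; the two unlinked pairs (0–3 and 1–2) cost two, just as a QPI or HyperTransport system pays extra for remote-socket memory.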

[Image: nvlink_single_dual — single- and dual-GPU NVLink configurations]

The proprietary NVLink module (seen below; image linked from a techreport.com NVLink article) is quite small, about one third the size of the standard PCIe boards used for GPUs today. NVIDIA claims that NVLink represents a “fundamental breakthrough” in energy efficiency that differentiates it from PCIe. Connectors on the bottom of the NVLink module let it plug directly into the motherboard, improving system design and signal integrity. Techreport speculates that second-generation NVLink will be able to maintain cache coherency between multiple chips, much like Intel’s QPI.

As for the all-important NVLink connector, we have been told it looks very similar to a DRAM connector, but we will have to wait and see how production requirements affect the final design.

As I speculated in my insideHPC interview with Rich Brueckner, it is possible that some enterprising startup will design a DRAM module that provides the “missing link” from any CPU to NVLink. Perhaps my prediction that GPU systems will evolve into a direct instantiation of Amdahl’s Law will finally come true:

  • GPUs will become the system component that contains most, if not all, of the system’s memory.
  • CPUs will steal bandwidth from the GPUs via some form of dual-ported memory interface to run any sequential sections of code.
  • All parallel operations will occur on the GPUs – no data movement required.
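A minimal sketch of the Amdahl’s Law arithmetic behind this prediction (the parallel fractions and the 100x GPU speedup below are illustrative assumptions, not measurements): eliminating PCIe data transfers shrinks the serial fraction, which is where the real leverage is.

```python
# Amdahl's Law: with parallel fraction p accelerated by factor s,
# overall speedup = 1 / ((1 - p) + p / s).  The prediction above
# amounts to driving data-movement time in the serial (1 - p) term
# toward zero by keeping all data resident in GPU memory.

def amdahl(p: float, s: float) -> float:
    """Overall speedup for parallel fraction p accelerated by factor s."""
    return 1.0 / ((1.0 - p) + p / s)

# With an assumed 100x GPU speedup, shrinking the serial fraction
# from 10% to 1% (e.g., by removing PCIe transfers) dominates:
print(f"p=0.90: {amdahl(0.90, 100):.1f}x overall")
print(f"p=0.99: {amdahl(0.99, 100):.1f}x overall")
```

With the serial fraction at 10%, a 100x accelerator yields only about 9x overall; cutting it to 1% yields about 50x, which is why the bullets above focus on removing data movement rather than on raw GPU speed.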



