
TechEnablement

Education, Planning, Analysis, Code


The Missing Link in NVlink, or “Hello Pascal” bye-bye PCI bus limitations!

May 13, 2014 by Rob Farber

Say hello to NVLink, a new NVIDIA interconnect technology that is not constrained by PCIe bandwidth and latency limitations, but you will have to wait for the Pascal generation of GPUs, due in 2016, to get it. NVLink is NVIDIA’s proprietary “DRAM speed and latency” class interface for CPU-to-GPU and GPU-to-GPU point-to-point communications. The basic building block of NVLink is a high-speed, 8-lane, differential, dual-simplex bidirectional link. Multiple links can be ganged together for higher bandwidth, or used individually to connect many GPUs in a single system. CPUs with the appropriate proprietary on-chip silicon interfaces will be able to communicate over NVLink and bypass the PCI bus entirely. For now, NVLink products are targeted at HPC and enterprise customers. NVIDIA CEO and co-founder Jen-Hsun Huang noted at GTC 2014 that ARM and IBM CPU interfaces will become available, while various non-technical issues need to be addressed before an x86 NVLink-capable CPU can be built.
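As a back-of-the-envelope sketch of what ganging links together buys you: the figures below are assumptions, not from this article (final Pascal numbers were not public at the time); published first-generation NVLink specs later put per-link bandwidth around 20 GB/s per direction, versus roughly 16 GB/s per direction for PCIe 3.0 x16.

```python
# Back-of-the-envelope comparison of PCIe vs. NVLink bandwidth.
# Assumed figures (not from the article): PCIe 3.0 x16 ~16 GB/s per
# direction; first-gen NVLink ~20 GB/s per direction per link.

PCIE3_X16_GBPS = 16.0          # GB/s, one direction
NVLINK_PER_LINK_GBPS = 20.0    # GB/s, one direction, per link (assumed)

def aggregate_nvlink(num_links: int) -> float:
    """Aggregate one-direction bandwidth when links are ganged together."""
    return num_links * NVLINK_PER_LINK_GBPS

for links in (1, 2, 4):
    bw = aggregate_nvlink(links)
    print(f"{links} link(s): {bw:.0f} GB/s "
          f"({bw / PCIE3_X16_GBPS:.1f}x PCIe 3.0 x16)")
```

With four ganged links this toy arithmetic lands at 80 GB/s each way, or 5x a PCIe 3.0 x16 slot, which is in the neighborhood of NVIDIA’s “5x to 12x PCIe” marketing claims.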

Pascal

When connected to a CPU that does not support NVLink, the interconnect can be devoted entirely to peer GPU-to-GPU connections (image linked from the NVIDIA dev blog). In this case the CPU communicates over the PCIe bus.

[Image: nvlink_quad — quad-GPU NVLink configuration]

With the appropriate silicon support, NVLink lets the CPU communicate with the GPUs in a manner similar to AMD’s HyperTransport or Intel’s QuickPath Interconnect (QPI). This includes the NUMA aspect that not every processor is necessarily connected to every other processor.
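The NUMA point can be illustrated with a toy topology model. The four-GPU ring below is hypothetical (a sketch, not a real NVLink topology); it just shows that traffic between GPUs without a direct link needs an extra hop through an intermediate GPU.

```python
from itertools import combinations

# Hypothetical 4-GPU NVLink ring (assumed topology, for illustration):
# GPUs 0-3 and 1-2 have no direct link, so their traffic is routed
# through a neighbor -- the NUMA effect described above.
links = {(0, 1), (0, 2), (1, 3), (2, 3)}

def hops(a: int, b: int) -> int:
    """Minimum number of links between GPUs a and b (breadth-first search)."""
    if a == b:
        return 0
    frontier, seen, dist = {a}, {a}, 0
    while frontier:
        dist += 1
        nxt = set()
        for u in frontier:
            for v in range(4):
                if (min(u, v), max(u, v)) in links and v not in seen:
                    nxt.add(v)
                    seen.add(v)
        if b in nxt:
            return dist
        frontier = nxt
    raise ValueError("unreachable GPU pair")

for a, b in combinations(range(4), 2):
    print(f"GPU{a} <-> GPU{b}: {hops(a, b)} hop(s)")
```

Directly linked pairs cost one hop; the two unlinked pairs (0–3 and 1–2) cost two, just as a QPI or HyperTransport system pays extra for remote-socket memory.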

[Image: nvlink_single_dual — single- and dual-GPU NVLink configurations]

The proprietary NVLink module (seen below; image linked from a techreport.com NVLink article) is quite small, about one third the size of the standard PCIe boards used for GPUs today. NVIDIA claims that NVLink represents a “fundamental breakthrough” in energy efficiency that differentiates it from PCIe. Connectors on the bottom of the NVLink module let it plug directly into the motherboard, improving system design and signal integrity. Techreport speculates that second-generation NVLink will be able to maintain cache coherency between multiple chips, much like Intel’s QPI.

As for the all-important NVLink connector, we have been told it looks very similar to a DRAM connector, but we will have to wait and see how production requirements affect the final design.

As I speculated in my insideHPC interview with Rich Brueckner, it is possible that some enterprising startup will design a DRAM module that provides the “missing link” from any CPU to NVLink. Perhaps my prediction that GPU systems will evolve into a direct instantiation of Amdahl’s Law will finally come true:

  • GPUs will become the system component that contains most, if not all, of the system’s memory.
  • CPUs will steal bandwidth from the GPUs via some form of dual-ported memory interface to run any sequential sections of code.
  • All parallel operations will occur on the GPUs – no data movement required.
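A minimal sketch of the Amdahl’s Law arithmetic behind this prediction (the parallel fractions and the 100x GPU speedup below are illustrative assumptions, not measurements): eliminating PCIe data transfers shrinks the serial fraction, which is where the real leverage is.

```python
# Amdahl's Law: with parallel fraction p accelerated by factor s,
# overall speedup = 1 / ((1 - p) + p / s).  The prediction above
# amounts to driving data-movement time in the serial (1 - p) term
# toward zero by keeping all data resident in GPU memory.

def amdahl(p: float, s: float) -> float:
    """Overall speedup for parallel fraction p accelerated by factor s."""
    return 1.0 / ((1.0 - p) + p / s)

# With an assumed 100x GPU speedup, shrinking the serial fraction
# from 10% to 1% (e.g., by removing PCIe transfers) dominates:
print(f"p=0.90: {amdahl(0.90, 100):.1f}x overall")
print(f"p=0.99: {amdahl(0.99, 100):.1f}x overall")
```

With the serial fraction at 10%, a 100x accelerator yields only about 9x overall; cutting it to 1% yields about 50x, which is why the bullets above focus on removing data movement rather than on raw GPU speed.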



