
TechEnablement

Education, Planning, Analysis, Code


Nvidia Talks About ARM64 and 64-bit K1 SoC

August 13, 2014 by Rob Farber Leave a Comment

The Hot Chips 2014 conference conveyed some hot information this week about Nvidia's 64-bit Tegra K1, the first 64-bit ARM processor for Android devices, which pairs the dual-core "Project Denver" CPU with Nvidia's 192-core Kepler GPU (a ceepee geepee). The ARM-based Denver CPU was custom designed by Nvidia and is compatible with ARM's 64-bit ARMv8-A architecture. The chip is also pin-compatible with the 32-bit Tegra K1, enabling products to run either chip without any redesign. Think ARM64 Jetson boards, Shield Tablets, and Chromebooks. Nvidia claims exceptional performance and superior energy efficiency over other ARM-based mobile processors.

Nvidia launched Project Denver, originally intended for servers, roughly three to five years ago. The recent Cirrascale RM1905D server is one instantiation of that effort, combining ARM64 CPUs with GPUs as shown in the graphic below.

Cirrascale RM1905D Development Platform


In June the Wall Street Journal reported that Nvidia (along with Samsung) has decided to shy away from the server chip battle and is now focusing its latest 64-bit Tegra chips (Tegra K1) on smartphones, tablets, cars, and other embedded devices. According to seekingalpha.com, Nvidia believes the mobile K1 chip could make it into microservers, but has scrapped any near-term plan to develop a specialized server CPU. Nvidia derives over 10% of its valuation from the Tegra division, with Tegra processor revenue expected to reach $1.5 billion, driven by three key growth drivers: mobile devices, automotive electronics, and gaming systems. Rather than building its own server processors, Nvidia will supplement ARM processor-based servers with high-performance GPUs. The Cirrascale offering appears to confirm this strategy and is in line with the 2013 comment by Sumit Gupta, General Manager of the Tesla Accelerated Computing business unit at NVIDIA, that "We think GPU accelerators are going to be, in effect, the floating-point units for ARM processors."

Qualcomm has been working on server chips but has not announced any plans to introduce them. Other companies that have announced plans for ARM-based chips for servers include Applied Micro Circuits, AMD, Broadcom, Cavium, Texas Instruments and Marvell Technology Group.

Nvidia expects its partners to launch mobile devices based on its 64-bit Tegra K1 later this year. The company is currently bringing up the next version of Android, Android L, which caters directly to enterprise concerns, on the 64-bit Tegra K1. The combination of low power, high performance, and 64-bit capability means Nvidia will be able to further expand into a multitude of large markets where visual computing matters, such as auto navigation systems, TV set-top boxes, and new desktop form factors like all-in-ones, clip-ons, smart monitors, and others.

The Nvidia blog contains more details of the Nvidia ARM64 processor.

  • Each of the two Denver cores implements a 7-way superscalar microarchitecture (up to 7 concurrent micro-ops can be executed per clock) and includes a 128KB 4-way L1 instruction cache and a 64KB 4-way L1 data cache; a 2MB 16-way L2 cache services both cores.

  • Denver implements an innovative process called Dynamic Code Optimization, which optimizes frequently used software routines at runtime into dense, highly tuned microcode-equivalent routines. These optimized routines are stored in a dedicated 128MB main-memory-based optimization cache, from which they can be re-fetched and executed as long as they are needed and capacity allows. Nvidia claims, "Dynamic Code Optimization works with all standard ARM-based applications, requiring no customization from developers, and without added power consumption versus other ARM mobile processors."
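For concreteness, the cache hierarchy in the first bullet can be summarized in a short sketch. The sizes and associativities come from the bullet above; the field names and per-way arithmetic are our own illustration:

```python
# Cache parameters of the Denver CPU as reported by Nvidia.
# (Illustrative summary only; the field names are ours.)
caches = [
    {"name": "L1 instruction", "size_kb": 128,  "ways": 4,  "scope": "per core"},
    {"name": "L1 data",        "size_kb": 64,   "ways": 4,  "scope": "per core"},
    {"name": "L2 unified",     "size_kb": 2048, "ways": 16, "scope": "shared by both cores"},
]

# In an N-way set-associative cache, each way holds size/N of the capacity.
for c in caches:
    way_kb = c["size_kb"] / c["ways"]
    print(f'{c["name"]}: {c["size_kb"]} KB, {c["ways"]}-way '
          f'({way_kb:.0f} KB per way, {c["scope"]})')
```

Note that the 128KB instruction cache is unusually large for a mobile core, twice the size of the data cache, which makes sense for a design that feeds wide 7-way issue from translated routines.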

In our opinion, Dynamic Code Optimization sounds suspiciously like the generation of microcode "kernels" that could potentially be executed in parallel either on the ARM cores or, depending on the magic of LLVM, on the Kepler GPU. Time will tell whether that suspicion is borne out!
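To make the mechanism concrete, here is a minimal, purely hypothetical sketch of a runtime optimization cache in the spirit of Dynamic Code Optimization as described above. Denver does this in hardware and firmware on ARM instruction streams; the thresholds, names, and Python mechanics below are all invented for illustration:

```python
# Toy model of a hot-routine optimization cache: profile execution counts,
# and once a routine crosses a threshold, store a "tuned" version in a
# bounded cache and run that on subsequent calls. Purely illustrative.
HOT_THRESHOLD = 3      # hypothetical: optimize after this many executions
CACHE_CAPACITY = 2     # hypothetical: slots in the optimization cache

counts = {}            # routine name -> times executed
opt_cache = {}         # routine name -> optimized callable

def optimize(fn):
    # Stand-in for translating ARM code into a dense microcode routine.
    def tuned(*args):
        return fn(*args)
    return tuned

def call(name, fn, *args):
    counts[name] = counts.get(name, 0) + 1
    if name in opt_cache:
        return opt_cache[name](*args)      # fast path: cached tuned routine
    if counts[name] >= HOT_THRESHOLD and len(opt_cache) < CACHE_CAPACITY:
        opt_cache[name] = optimize(fn)     # routine is hot: cache tuned form
    return fn(*args)

for _ in range(5):
    call("add", lambda a, b: a + b, 2, 3)  # becomes "hot" on the third call
```

The design choice mirrors software JIT systems generally: pay the translation cost once for frequently executed code, then amortize it across repeated executions from the cache.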

Tegra K1 Denver


NVIDIA Tegra K1 64-bit Denver CPU



Filed Under: Analysis, CUDA, Featured article, Featured news, News Tagged With: ARM, CUDA, HPC, NVIDIA

