The good folks at Tom's Hardware are lending credibility to the Antutu benchmarks of a K1-powered NVIDIA Shield 2 (link). It is not surprising that the NVIDIA Shield would be one of the first platforms to contain the newest NVIDIA Tegra chip. The claimed specs for the Shield 2 appear reasonable: a screen resolution of 1440 x 810, 4 GB of RAM, 16 GB of internal … [Read more...]
PGI 14.4 Release Contains Much OpenACC C++ Goodness
PGI released its OpenACC 2.0 roadmap for the 14.4 and upcoming 14.7 releases. The expectation is that we will see the 14.4 release in early May and the 14.7 release in early July. Note: these are not official PGI dates. Analysis: The 14.4 support of atomic operations will enable many low-wait algorithms such as counters and massively parallel stacks. Improved reduction performance in … [Read more...]
Intel Mobile Reports $929M 1Q14 loss and $3.15B 2013 loss
Mobile is big money unless you are playing catch-up. Sean Hollister at The Verge relays a report that the Intel Mobile division lost $3.15 billion in 2013 and that losses in Q1 2014 are already $929 million. In my article, "Mobile Tech between a Rock and a Hard Place", I noted: Intel’s chief executive Brian Krzanich admitted at the firm’s November 2013 annual investor … [Read more...]
(4/24 update) Signals from Nvidia’s Sumit Gupta
Sumit Gupta is a busy man. Named by HPCwire as a 2013 "Person to Watch", Sumit does not idly take time to create a blog post unless it conveys a message about the NVIDIA Tesla development and marketing effort. His recent blog post, "Fostering an Explosion of Innovation in the Data Center", posted by Steve Hamm, recognizes how the data center is going to be supporting mobile … [Read more...]
Calling all Android Developers Interested in Using Microsoft Visual Studio on NVIDIA Tegra
The NVIDIA Developer Tools team is conducting a survey to assess interest in Microsoft Visual Studio as a development tool for Tegra Android devices. Not a registered developer? No problem! Click here. NVIDIA is working on making development on Tegra platforms the best possible environment for Android application development. We are gauging interest from our Tegra developer … [Read more...]
Proof-of-Concept WebCL Chrome Browser Available from AMD
AMD has been working on implementing WebCL inside a Chrome browser to give web programmers access to OpenCL acceleration plus WebCL and WebGL interoperability. (Firefox, Chrome and Safari all have some form of WebCL support.) The following video shows the potential: http://youtu.be/dGD9NpipcrE Hands-on experience can be found through the Chromium-WebCL GitHub project, … [Read more...]
Inside NVIDIA’s Unified Memory: Multi-GPU Limitations and the Need for a cudaMadvise API Call
CUDA 6.0 Unified Memory offers a “single-pointer-to-data” model that is similar to CUDA’s zero-copy mapped memory. Both make it trivially easy for the programmer to access memory on the CPU or GPU, but applications that use mapped memory must perform a PCI bus transfer every time a memory access steps outside of a cache line while a kernel running in a Unified … [Read more...]
Battery Powered Supercomputing for the Masses: First Impression of the NVIDIA Jetson TK1 board
GTC 2014 demonstrated that we have now entered the "Battery Powered Supercomputing for the Masses" era. I had the opportunity to experience a Jetson TK1 board running Ubuntu 13.04 at the hands-on lab. First impressions were very positive with a snappy response to the Ubuntu window system. The GTC hands-on labs are oriented toward techies and not the press. They provide a very … [Read more...]
Micron’s New Automata Processor
Adding computation to memory is a fantastic way to accelerate applications and real-time solutions. Content addressable memory (CAM) is a widespread and compelling example of how hardware can speed table lookups. (Most virtual memory computers utilize CAM to perform page lookups.) Micron recently announced the Automata Processor (AP) that implements an NFA (Non-deterministic … [Read more...]
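For readers who have not worked with NFAs, the sketch below shows in software what it means to run one: every state the automaton could currently be in is tracked at once, and each input symbol advances all of them together. It is only an illustration of the NFA concept and makes no claim about how the Automata Processor works internally. The function name `nfa_accepts` and the three-state example automaton are hypothetical, and epsilon transitions are left out to keep it short.

```python
def nfa_accepts(transitions, start, accepting, text):
    """Simulate an NFA by tracking the set of states it could be in."""
    active = {start}
    for symbol in text:
        # Follow every edge labelled `symbol` out of every active state.
        active = {
            nxt
            for state in active
            for nxt in transitions.get((state, symbol), ())
        }
        if not active:
            # No state survived this symbol, so the input is rejected.
            return False
    # Accept if at least one surviving state is an accepting state.
    return bool(active & accepting)


# Hypothetical example automaton: accepts binary strings whose
# second-to-last symbol is '1'.
TRANSITIONS = {
    ("q0", "0"): {"q0"},
    ("q0", "1"): {"q0", "q1"},  # non-deterministic choice: stay or move on
    ("q1", "0"): {"q2"},
    ("q1", "1"): {"q2"},
}
ACCEPTING = {"q2"}

if __name__ == "__main__":
    for candidate in ["10", "0110", "0101", "1"]:
        print(candidate, nfa_accepts(TRANSITIONS, "q0", ACCEPTING, candidate))
```

On a CPU the inner loop visits the active states one after another; hardware that evaluates all active states in parallel for each input symbol avoids that serial step.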
Deep-learning Teaching Code Achieves 13 PF/s on the ORNL Titan Supercomputer
The deep-learning teaching code described in my book, "CUDA Application Design and Development" [Chapters 2, 3, and 9], plus online tutorials, achieved 13 PF/s average sustained performance using 16,384 GPUs on the Oak Ridge Titan supercomputer. Full source code for my teaching code can be found on GitHub in the farbopt directory. Nicole Hemsoth at HPCwire noted these CUDA … [Read more...]
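As a rough sanity check on the headline figure, dividing the quoted 13 PF/s by the 16,384 GPUs gives the average sustained rate per GPU, which comes to a little under 800 GF/s. The snippet below is only that arithmetic; the names are illustrative and nothing in it comes from the farbopt code.

```python
# Figures quoted in the teaser above.
TOTAL_FLOPS = 13e15   # 13 PF/s average sustained performance
NUM_GPUS = 16_384     # GPUs used on Titan


def per_gpu_gflops(total_flops, num_gpus):
    """Average sustained GF/s per GPU, assuming an even split across GPUs."""
    return total_flops / num_gpus / 1e9


if __name__ == "__main__":
    print(f"{per_gpu_gflops(TOTAL_FLOPS, NUM_GPUS):.1f} GF/s per GPU")
```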









