Baidu Research used a small 36-node NVIDIA-powered cluster to attain the best ImageNet computer vision classification result to date, with a 5.98% error rate vs. GoogLeNet's 6.66%. These results are very close to the human error rate of 5.1%. Key to the Baidu performance is their mix of model- and data-parallelism as well as the use of higher-resolution images (512x512 vs … [Read more...]
CUDA 7 For Registered Developers – LAPACK Dense Solvers 3-6x faster than MKL
The CUDA Toolkit 7.0 Release Candidate (RC) is now available to members of NVIDIA’s free registered developer program. Especially interesting is the claim of 3-6x faster LAPACK dense solvers over MKL (the Intel Math Kernel Library). C++11 support makes it easier for C++ developers to accelerate their applications. Write less code with ‘auto’ and ‘lambda’, especially when … [Read more...]
ORNL Introductory Tutorials On Concurrent Kernels
The OLCF at Oak Ridge National Laboratory (ORNL) is working to educate users about how best to use their computing resources. As part of that process, the OLCF has published two introductory tutorials that teach how to use concurrent kernels on their systems. Part 1 (concurrent kernels) and Part 2 (batched library calls) teach how to launch concurrent kernels using CUDA … [Read more...]
IPMACC – An Open Source OpenACC to CUDA/OpenCL Translator
IPMACC is a research-grade open-source framework for translating OpenACC source code to CUDA or OpenCL. Binary executables can then be created with OpenCL or CUDA compilers. The authors (Ahmad Lashgar - University of Victoria, Alireza Majidi - Texas A&M University, Amirali Baniasadi - University of Victoria) verified correctness and performance using benchmarks from … [Read more...]
NVIDIA K80 1.8x Faster and “Highest Energy Efficiency to Date” for Financial Applications
STAC, the financial industry benchmarking organization, released performance testing results on the new NVIDIA Tesla K80 Dual-GPU Accelerator. In the STAC-A2 benchmark, which helps financial institutions and banks better manage risk, the NVIDIA Tesla K80 GPU set new performance records. The test code only used two threads on the host processor plus the K80 CUDA code was … [Read more...]
Students Convert Sign Language to Text with 250 Lines of Code on an NVIDIA K1
Three Princeton students (Ethan Gordon ’17, David Liu ’17 and Jeffrey Han ’17) used an NVIDIA Jetson plus OpenCV – an open-source real-time computer vision library – to build a system able to interpret sign language letters from a video feed in 250 lines of code. The system response time was reported to be "snappy" as the GPU-accelerated edge detection and least-squares … [Read more...]
Comparing Managed Memory Between GPUs and Intel Xeon Phi
Managed memory greatly simplifies programming GPUs and Intel Xeon Phi coprocessors (when used in offload mode) because data can be used on either the host or the device without having to perform explicit device transfers. Instead, the device(s) and host interact through the device driver to transparently migrate data as needed. As a result, application codes tend to be … [Read more...]
Late-Breaking NVIDIA Call For GTC 2015 Posters Opens
NVIDIA has opened a call for late-breaking posters for GTC 2015. This call is specifically designed to bring hot-off-the-press results to GTC. Submissions can be made here. The call closes Monday, January 12, 2015. Poster Criteria: The purpose of your poster is to convey to a wide audience a research project's significance to scholars in the field and its … [Read more...]
Inside The IBM NVIDIA Volta plus NVLink 2017 Delivery for $325M DOE Procurements
The U.S. Department of Energy unveiled plans to build two GPU-accelerated leadership-class supercomputers (Summit at ORNL and Sierra at LLNL) in a combined $325M USD procurement. The systems, to be installed in 2017, will be based on next-generation IBM POWER servers incorporating NVIDIA® Volta GPU accelerators plus NVLink™ high-speed GPU interconnect technology. The announcement by U.S. … [Read more...]
The MSI WS60 As A Mobile Workstation Teaching Tool
After nearly a month of using the MSI WS60 Mobile Workstation, I have to admit I am spoiled by the speed and balance of this system. The clear IPS display is a pleasure to use when mobile, and I love the color images on my Dell U3011 screen, courtesy of the NVIDIA K2100M GPU, while the Optimus technology preserves battery life. Speed: The following animated GIF (repeated 10 … [Read more...]









