The PGI OpenACC compiler outperformed an NVIDIA nvcc-compiled (CUDA 7.0), deep-learning-based PCA (Principal Components Analysis) example by 200 GF/s on a K40c, using an ILP (Instruction Level Parallelism) loop structure taught in the TechEnablement classes and the forthcoming Farber OpenACC book. PCA is an important data analysis tool used by data scientists. Sign up for … [Read more...]
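To make the term ILP concrete, here is a minimal sketch in plain C++ (not OpenACC, and not the loop structure from the classes or the book, which the excerpt does not show). The helper name `dot_ilp4` and the choice of four accumulators are assumptions made purely for illustration. The idea is that several independent partial sums give the processor multiply-adds it can overlap, rather than forcing each one to wait on a single running total.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: dot product written with four independent accumulators.
// Because s0..s3 do not depend on one another, the multiply-adds inside one
// loop iteration can be issued back to back instead of serializing on one sum.
double dot_ilp4(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    const std::size_t n = a.size();
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;

    std::size_t i = 0;
    // Main loop: four independent updates per iteration.
    for (; i + 4 <= n; i += 4) {
        s0 += a[i]     * b[i];
        s1 += a[i + 1] * b[i + 1];
        s2 += a[i + 2] * b[i + 2];
        s3 += a[i + 3] * b[i + 3];
    }
    // Remainder loop: handles the last n % 4 elements.
    for (; i < n; ++i) {
        s0 += a[i] * b[i];
    }
    // Combine the partial sums once, at the end.
    return (s0 + s1) + (s2 + s3);
}
```

Calling `dot_ilp4(a, b)` on two equal-length vectors returns the same mathematical result as a single-accumulator loop, although floating-point rounding can differ slightly because the additions are reordered. Whether this form is actually faster depends on the compiler and hardware.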
CUDA 7 Released
NVIDIA released CUDA 7 for all to use! Download here for Windows, Linux x86, Linux Power 8, and Mac OS X. Productivity and performance improvements: C++11 support makes it easier for C++ developers to accelerate their applications. Write less code with ‘auto’ and ‘lambda’, especially when using the Thrust template library. New cuSOLVER library of dense and sparse direct … [Read more...]
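As a rough illustration of how ‘auto’ and ‘lambda’ cut boilerplate, the sketch below writes SAXPY (a*x + y) in standard C++11. It uses `std::transform` as a host-only stand-in; `thrust::transform` takes the same kind of iterator-plus-operator arguments, but this example does not depend on Thrust or the CUDA toolkit. The function name `saxpy` is a hypothetical helper chosen for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: returns a*x + y element by element.
// Before C++11 the operator would need a separately declared functor struct;
// here a lambda captures `a` in place and `auto` names its unnamed type.
std::vector<float> saxpy(float a,
                         const std::vector<float>& x,
                         const std::vector<float>& y) {
    assert(x.size() == y.size());
    std::vector<float> out(x.size());

    // The lambda replaces a hand-written functor class.
    auto op = [a](float xi, float yi) { return a * xi + yi; };

    // Binary transform: applies op to each pair (x[i], y[i]).
    std::transform(x.begin(), x.end(), y.begin(), out.begin(), op);
    return out;
}
```

For example, `saxpy(2.0f, {1, 2, 3}, {10, 20, 30})` yields a vector holding 12, 24, and 36.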
NVIDIA Titan X Powers Games and Virtual Reality
NVIDIA CEO Jen-Hsun Huang announced NVIDIA's latest GPU, the Titan X, in a surprise appearance at the 2015 Game Developers Conference. Jen-Hsun claims it is the most powerful GPU on the planet. The announcement followed a presentation by Epic Games' co-founder Tim Sweeney about the convergence of photorealistic imagery, film, video games, architecture, industrial design, and … [Read more...]
Multiple OpenACC Hackathons Scheduled Around the World
Oak Ridge National Laboratory has announced three GPU Hackathons for 2015. The first will be hosted April 20-24 by the National Center for Supercomputing Applications on the UIUC Campus. The second will be hosted by the Swiss National Supercomputing Centre in Lugano, Switzerland from July 6-10. The final one will be hosted by the Oak Ridge Leadership Computing Facility on … [Read more...]
TACC Accepting Summer Internship Applications
TACC is now accepting applications for the 2015 Research Experience for Undergraduates (REU) program, which runs from June 1 to August 1, 2015. This summer, 10 undergraduate students from across the United States majoring in science and engineering will be immersed in training at UT Austin to become the next generation of ‘game changers.’ Participants will explore grand challenges including … [Read more...]
Facebook Open-Sources Torch for Deep-Learning Neural Networks
Facebook has made Torch, an open source development environment for numerics, machine learning, and computer vision with a particular emphasis on deep learning and convolutional nets, available to everyone. The latest release includes GPU-optimized modules for large convolutional nets (ConvNets), as well as networks with sparse activations that are commonly used in Natural … [Read more...]
Baidu Uses Small NVIDIA-Powered Cluster for ‘Most Accurate’ Near-Human ImageNet Recognition Results
Baidu Research used a small 36-node NVIDIA-powered cluster to attain the best computer vision ImageNet classification result to date, with a 5.98% error vs. GoogLeNet's 6.66%. These results are very close to the human error rate of 5.1%. Key to the Baidu performance is their mix of model- and data-parallelism as well as the use of higher-resolution images (512x512 vs … [Read more...]
CUDA 7 For Registered Developers – LAPACK Dense Solvers 3-6x faster than MKL
The CUDA Toolkit 7.0 Release Candidate (RC) is now available to members of NVIDIA’s free registered developer program. Especially interesting is the claim of 3-6x faster LAPACK dense solvers over MKL (the Intel Math Kernel Library). C++11 support makes it easier for C++ developers to accelerate their applications. Write less code with ‘auto’ and ‘lambda’, especially when … [Read more...]
IPMACC – An Open Source OpenACC to CUDA/OpenCL Translator
IPMACC is a research-grade open-source framework for translating OpenACC source code to CUDA or OpenCL. Binary executables can then be created with OpenCL or CUDA compilers. The authors (Ahmad Lashgar - University of Victoria, Alireza Majidi - Texas A&M University, Amirali Baniasadi - University of Victoria) verified correctness and performance using benchmarks from … [Read more...]
NVIDIA K80 1.8x Faster and “Highest Energy Efficiency to Date” for Financial Applications
STAC, the financial industry benchmarking organization, released performance testing results on the new NVIDIA Tesla K80 Dual-GPU Accelerator. In the STAC-A2 benchmark, which helps financial institutions and banks better manage risk, the NVIDIA Tesla K80 GPU set new performance records. The test code only used two threads on the host processor plus the K80 CUDA code was … [Read more...]