CUDA Archives - TechEnablement

Webinars Showing How to GPU Accelerate Python With Numba

November 24, 2015 by Rob Farber

Register to attend a webinar about accelerating Python programs on the integrated GPU of AMD Accelerated Processing Units (APUs) with Numba, an open-source just-in-time compiler that generates faster code, all in pure Python. This webinar will be presented by Stanley Seibert from Continuum Analytics, the creators of the Numba project. This webinar is tailored to an … [Read more...]

AMD to Support CUDA Compatibility in 2016

November 23, 2015 by Rob Farber

TechEnablement spoke with the AMD engineers at SC15 about the HIP (Heterogeneous-compute Interface for Portability) tool for porting CUDA-based applications to a common C++ programming model that can run on AMD FirePro™ graphics processing units (GPUs). AMD demonstrated the potential for HIP by running the CUDA-generated Rodinia benchmark suite on AMD GPUs. The tagline is "It's … [Read more...]

PyFR – Python/GPU Combustion Code Shortlisted for Several HPCWire Readers Choice Awards

September 26, 2015 by Rob Farber

PyFR, the Python-based GPU-accelerated CFD solver managed by TechEnablement contributor Peter Vincent, has been shortlisted for several HPCWire Readers Choice Awards this year:

12. Best HPC Software Product or Technology
18. Best HPC Collaboration Between Academia & Industry
20. Top 5 New Products or Technologies to Watch

If you would like to support them, … [Read more...]

Port Some CUDA Codes To Intel Xeon Phi Simply and Efficiently

May 15, 2015 by Rob Farber

This tutorial shows that it is relatively easy to port many CUDA C/C++ source codes to OpenMP. In the past, such efforts were not generally considered worthwhile because of the large performance difference between multicore processors (that use OpenMP) and GPUs. The introduction of teraflop/s Intel Xeon Phi coprocessors eliminated that performance difference, which makes it much … [Read more...]

PGI Compiled OpenACC ILP Loop Beats CUDA-7 by 200 GF/s on Deep-learning PCA Example

March 23, 2015 by Rob Farber

The PGI OpenACC compiler outperformed a deep-learning-based PCA (Principal Components Analysis) example compiled with NVIDIA's CUDA 7.0 nvcc by 200 GF/s on a K40c, using an ILP (Instruction Level Parallelism) loop structure taught in the TechEnablement classes and the forthcoming Farber OpenACC book. PCA is an important data analysis tool utilized by data scientists. Sign up for … [Read more...]

CUDA 7 Released

March 20, 2015 by Rob Farber

NVIDIA released CUDA 7 for all to use! Download here for Windows, Linux x86, Linux Power 8, and MacOSX:

Productivity and Performance Improvements

C++11 support makes it easier for C++ developers to accelerate their applications. Write less code with ‘auto’ and ‘lambda’, especially when using the Thrust template library.

New cuSOLVER library of dense and sparse direct … [Read more...]

CUDA 7 For Registered Developers – LAPACK Dense Solvers 3-6x faster than MKL

January 13, 2015 by Rob Farber

The CUDA Toolkit 7.0 Release Candidate (RC) is now available to members of NVIDIA’s free registered developer program. Especially interesting is the claim of 3-6x faster LAPACK dense solvers over MKL (the Intel Math Kernel Library).

C++11 support makes it easier for C++ developers to accelerate their applications. Write less code with ‘auto’ and ‘lambda’, especially when … [Read more...]

ORNL Introductory Tutorials On Concurrent Kernels

January 1, 2015 by Rob Farber

The OLCF at Oak Ridge National Laboratory (ORNL) is working to educate users about how to best use their computing resources. As part of that process, the OLCF has published two very introductory tutorials to teach how to utilize concurrent kernels on their systems. Part 1 (concurrent kernels) and Part 2 (batched library calls) teach how to launch concurrent kernels using CUDA … [Read more...]

IPMACC – An Open Source OpenACC to CUDA/OpenCL Translator

December 23, 2014 by Rob Farber

IPMACC is a research-grade open-source framework for translating OpenACC source code to CUDA or OpenCL. Binary executables can then be created with OpenCL or CUDA compilers. The authors (Ahmad Lashgar - University of Victoria, Alireza Majidi - Texas A&M University, Amirali Baniasadi - University of Victoria) verified correctness and performance using benchmarks from … [Read more...]

Inside The IBM NVIDIA Volta plus NVlink 2017 Delivery for $325M DOE Procurements

November 24, 2014 by Rob Farber

The U.S. Department of Energy unveiled plans to build two GPU-accelerated leadership class supercomputers (Summit at ORNL and Sierra at LLNL) in a combined $325M USD procurement to be installed in 2017 that will be based on next-generation IBM POWER servers incorporating NVIDIA® Volta GPU accelerators plus NVLink™ high-speed GPU interconnect technology. The announcement by U.S. … [Read more...]