Register to attend a webinar on accelerating Python programs on the integrated GPU of AMD Accelerated Processing Units (APUs) with Numba, an open source just-in-time compiler that generates faster code, all in pure Python. The webinar will be presented by Stanley Seibert of Continuum Analytics, the creators of the Numba project. This webinar is tailored to an … [Read more...]
AMD to Support CUDA Compatibility in 2016
TechEnablement spoke with the AMD engineers at SC15 about the HIP (Heterogeneous-compute Interface for Portability) tool for porting CUDA-based applications to a common C++ programming model that can run on AMD FirePro™ graphics processing units (GPUs). AMD demonstrated the potential for HIP by running the CUDA-generated Rodinia benchmark suite on AMD GPUs. The tagline is "It's … [Read more...]
PyFR – Python/GPU Combustion Code Shortlisted for Several HPCWire Readers Choice Awards
PyFR, the Python-based GPU-accelerated CFD solver managed by TechEnablement contributor Peter Vincent, has been shortlisted for several HPCWire Readers Choice Awards this year: 12. Best HPC Software Product or Technology; 18. Best HPC Collaboration Between Academia & Industry; 20. Top 5 New Products or Technologies to Watch. If you would like to support them, … [Read more...]
Port Some CUDA Codes To Intel Xeon Phi Simply and Efficiently
This tutorial shows that it is relatively easy to port many CUDA C/C++ source codes to OpenMP. In the past, such efforts were not generally considered worthwhile because of the large performance difference between multicore processors (that use OpenMP) and GPUs. The introduction of teraflop/s Intel Xeon Phi coprocessors eliminated that performance difference, which makes it much … [Read more...]
PGI Compiled OpenACC ILP Loop Beats CUDA-7 by 200 GF/s on Deep-learning PCA Example
The PGI OpenACC compiler beat the performance of a CUDA 7.0 NVIDIA nvcc compiled deep-learning based PCA (Principal Components Analysis) example by 200 GF/s on a K40c using an ILP (Instruction Level Parallelism) loop structure taught in the TechEnablement classes and forthcoming Farber OpenACC book. PCA is an important data analysis tool utilized by data scientists. Sign up for … [Read more...]
CUDA 7 Released
NVIDIA released CUDA 7 for all to use! Download here for Windows, Linux x86, Linux Power 8, and MacOSX. Productivity and performance improvements: C++11 support makes it easier for C++ developers to accelerate their applications. Write less code with ‘auto’ and ‘lambda’, especially when using the Thrust template library. New cuSOLVER library of dense and sparse direct … [Read more...]
CUDA 7 For Registered Developers – LAPACK Dense Solvers 3-6x faster than MKL
The CUDA Toolkit 7.0 Release Candidate (RC) is now available to members of NVIDIA’s free registered developer program. Especially interesting is the claim of 3-6x faster LAPACK dense solvers over MKL (the Intel Math Kernel Library). C++11 support makes it easier for C++ developers to accelerate their applications. Write less code with ‘auto’ and ‘lambda’, especially when … [Read more...]
ORNL Introductory Tutorials On Concurrent Kernels
The OLCF at Oak Ridge National Laboratory (ORNL) is working to educate users about how to best use their computing resources. As part of that process, the OLCF has published two very introductory tutorials to teach how to utilize concurrent kernels on their systems. Part 1 (concurrent kernels) and Part 2 (batched library calls) teach how to launch concurrent kernels using CUDA … [Read more...]
IPMACC – An Open Source OpenACC to CUDA/OpenCL Translator
IPMACC is a research-grade open-source framework for translating OpenACC source code to CUDA or OpenCL. Binary executables can then be created with OpenCL or CUDA compilers. The authors (Ahmad Lashgar - University of Victoria, Alireza Majidi - Texas A&M University, Amirali Baniasadi - University of Victoria) verified correctness and performance using benchmarks from … [Read more...]
Inside The IBM NVIDIA Volta plus NVlink 2017 Delivery for $325M DOE Procurements
The U.S. Department of Energy unveiled plans to build two GPU-accelerated leadership class supercomputers (Summit at ORNL and Sierra at LLNL) in a combined $325M USD procurement to be installed in 2017 that will be based on next-generation IBM POWER servers incorporating NVIDIA® Volta GPU accelerators plus NVLink™ high-speed GPU interconnect technology. The announcement by U.S. … [Read more...]