• Home
  • News
  • Tutorials
  • Analysis
  • About
  • Contact

TechEnablement

Education, Planning, Analysis, Code

  • CUDA
    • News
    • Tutorials
    • CUDA Study Guide
  • OpenACC
    • News
    • Tutorials
    • OpenACC Study Guide
  • Xeon Phi
    • News
    • Tutorials
    • Intel Xeon Phi Study Guide
  • OpenCL
    • News
    • Tutorials
    • OpenCL Study Guide
  • Web/Cloud
    • News
    • Tutorials
You are here: Home / Archives for CUDA / Tutorials

Learn to Make Windows 10 Apps with Free Microsoft Course Then Add GPU Acceleration!

October 1, 2015 by Rob Farber Leave a Comment

Free Windows courses by themselves are not newsworthy, but those who wish to create Windows 10 apps for the Windows Marketplace - AND exploit the power of CUDA and OpenCL computing via C# should find the Free Microsoft course in combination with the TechEnablement tutorial "Combine C-Sharp With CUDA and OpenCL On Linux, iOS, Android and Windows" an enabling pair of … [Read more...]

Port Some CUDA Codes To Intel Xeon Phi Simply and Efficiently

May 15, 2015 by Rob Farber Leave a Comment

This tutorial shows that it relatively easy to port many CUDA C/C++ source codes to OpenMP. In the past, such efforts were not generally considered worthwhile because of the large performance difference between multicore processors (that use OpenMP) and GPUs. The introduction of teraflop/s Intel Xeon Phi coprocessors eliminated that performance difference, which makes it much … [Read more...]

Kriging Interpolation Exhibits Strong Scaling Across GPUs

January 17, 2015 by Rob Farber Leave a Comment

Geostatistical interpolation (Kriging) can be useful in a great number of applications where high fidelity models are required for mapping spatial effects and making predictions based on observations. It is widely utilized in the domain of spatial analysis and computer experiments and heavily used by the US  Air Force and GIS services. The following images by Yang, et. al. … [Read more...]

ORNL Introductory Tutorials On Concurrent Kernels

January 1, 2015 by Rob Farber Leave a Comment

The OLCF at Oakridge National Laboratory (ORNL) is working to educate  users about how to best use their computing resources. As part of that process, the OLCF has published two very introductory tutorials to teach how to utilize concurrent kernels on their systems. Part 1 (concurrent kernels) and Part 2 (batched library calls) teach how to launch concurrent kernels using CUDA … [Read more...]

Comparing Managed Memory Between GPUs and Intel Xeon Phi

December 15, 2014 by Rob Farber Leave a Comment

Managed memory greatly simplifies programming GPUs and Intel Xeon Phi coprocessors (when used in offload mode) because data can be utilized on either the host or the device without having to perform explicit device transfers. Instead the device(s) and host interact through the device driver to transparently migrate data as needed. As a result, application codes tend to be … [Read more...]

SC14 Technical Program and Registration – XSEDE/TACC Resources for Farber Tutorial

July 28, 2014 by Rob Farber Leave a Comment

Register early for Supercomputing 2014 in New Orleans and save up to $275. View the Technical Program online (and register for our tutorial!) The Technical Program fee includes  admission to all conference sessions, exhibits, the Monday night Exhibits opening event, Thursday night event, and one copy of the SC14 proceedings. Click here to view the grid showing access to … [Read more...]

Part 2: No Idle Time CUDA Task Parallelism Across Eight GPUs

July 25, 2014 by Rob Farber Leave a Comment

Part 1 in this tutorial series showed that task-based parallelism using concurrent kernels can accelerate applications simply by plugging more GPUs into a system - just as the GPU strong scaling execution model can accelerate applications simply by installing a newer GPU containing more SMX (Streaming Multiprocessors). No recompilation required! NVIDIA nvvp timelines show very … [Read more...]

Part 1: Load-Balanced, Strong-Scaling Task-Based Parallelism on GPUs

July 9, 2014 by Rob Farber Leave a Comment

Achieve a 7.4x speedup with 8 GPUs over the performance of a single GPU through the use of task-based parallelism and concurrent kernels! Traditional GPU programming  typically views the GPU as a monolithic device that runs a single parallel kernel across the entire device. This approach is fantastic when one kernel can provide enough work to keep the GPU busy. The conundrum is … [Read more...]

Farber to Teach All-Day Tutorial At Supercomputing Nov 16 2014

June 25, 2014 by Rob Farber Leave a Comment

Supercomputing 2014 recently approved my proposal for an all-day class "From 'Hello World' to Exascale Using x86, GPUs and Intel Xeon Phi Coprocessors" (tut106s1), at The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC14). I hope to see you on Sunday November 16, 2014 in New Orleans,! Abstract Both GPUs and Intel Xeon Phi … [Read more...]

Combine C-Sharp With CUDA and OpenCL On Linux, iOS, Android and Windows

June 17, 2014 by Rob Farber Leave a Comment

Google Protobufs (via protobuf-net) in combination with the click-together framework taught in my CUDA and OpenCL tutorials allows C# and .NET programmers to include Linux and Windows GPU and Intel Xeon Phi codes in their workflows. Mono The freely available opensource mono-project creates C# executables that can run unchanged on both Linux and Windows - just copy the … [Read more...]

Next Page »

Tell us you were here

Recent Posts

Farewell to a Familiar HPC Friend

May 27, 2020 By Rob Farber Leave a Comment

TechEnablement Blog Sunset or Sunrise?

February 12, 2020 By admin Leave a Comment

The cornerstone is laid – NVIDIA acquires ARM

September 13, 2020 By Rob Farber Leave a Comment

Third-Party Use Cases Illustrate the Success of CPU-based Visualization

April 14, 2018 By admin Leave a Comment

More Tutorials

Learn how to program IBM’s ‘Deep-Learning’ SyNAPSE chip

February 5, 2016 By Rob Farber Leave a Comment

Free Intermediate-Level Deep-Learning Course by Google

January 27, 2016 By Rob Farber Leave a Comment

Intel tutorial shows how to view OpenCL assembly code

January 25, 2016 By Rob Farber Leave a Comment

More Posts from this Category

Top Posts & Pages

  • Recovering Speech from a Potato-chip Bag Viewed Through Soundproof Glass - Even With Commodity Cameras!
  • DARPA Goals, Requirements, and History of the SyNAPSE Project
  • Sparse matrix-vector multiplication: parallelization and vectorization
  • Gender Diversity Study "In Science, It Matters That Women Come Last"
  • Paper Compares AMD, NVIDIA, Intel Xeon Phi CFD Turbulent Flow Mesh Performance Using OpenMP and OpenCL

Archives

© 2025 · techenablement.com