TechEnablement

Education, Planning, Analysis, Code

Port Some CUDA Codes To Intel Xeon Phi Simply and Efficiently

May 15, 2015 by Rob Farber

This tutorial shows that it is relatively easy to port many CUDA C/C++ source codes to OpenMP. In the past, such efforts were not generally considered worthwhile because of the large performance difference between multicore processors (that use OpenMP) and GPUs. The introduction of teraflop/s Intel Xeon Phi coprocessors eliminated that performance difference, which makes it much …

PGI Compiled OpenACC ILP Loop Beats CUDA-7 by 200 GF/s on Deep-learning PCA Example

March 23, 2015 by Rob Farber

The PGI OpenACC compiler outperformed a deep-learning-based PCA (Principal Components Analysis) example compiled with NVIDIA's CUDA 7.0 nvcc by 200 GF/s on a K40c, using an ILP (Instruction Level Parallelism) loop structure taught in the TechEnablement classes and the forthcoming Farber OpenACC book. PCA is an important data analysis tool used by data scientists. Sign up for …

OpenCL SPIR Tutorial Teaches Portability Without Shipping Kernel Source

March 1, 2015 by Rob Farber

Intel has released an OpenCL tutorial showing how developers can use SPIR (Standard Portable Intermediate Representation) to preserve vendor and device portability without having to ship OpenCL kernel source code. For more information about how SPIR enables commercial OpenCL applications, see our article, "Commercial OpenCL! SPIR 2.0 Protects IP Yet Allows Powerful, Portable, …

Tutorial on the OpenCL 2.0 Generic Address Space

February 11, 2015 by Rob Farber

Adam Lake and Robert Ioffe posted a nice tutorial on the Intel website about the new OpenCL 2.0 generic address space. The generic address space makes writing OpenCL programs easier by removing the requirement to decorate every pointer with the address space it points to. Instead, OpenCL programmers just use pointers as they would in standard C. Utilizing this new …

Fine-Tuning Vectorization and Memory Traffic on Intel Xeon Phi Coprocessors

February 2, 2015 by Rob Farber

Andrey Vladimirov at Colfax International has posted source code and a paper, "Fine-Tuning Vectorization and Memory Traffic on Intel Xeon Phi Coprocessors: LU Decomposition of Small Matrices", on the Colfax site. Andrey notes, "Benchmarks show that the discussed optimizations improve the application performance on the coprocessor by a factor of 2.8 compared to the unoptimized …

Kriging Interpolation Exhibits Strong Scaling Across GPUs

January 17, 2015 by Rob Farber

Geostatistical interpolation (Kriging) is useful in many applications where high-fidelity models are required for mapping spatial effects and making predictions based on observations. It is widely used in spatial analysis and computer experiments, and heavily used by the US Air Force and GIS services. The following images by Yang et al. …

CreativeC GPU And Intel Xeon Phi Cluster For SC14 Class Runs Mobile In Van

November 14, 2014 by Rob Farber

Our all-day class at SC14 on Sunday, November 16, “From ‘Hello World’ to Exascale Using x86, GPUs and Intel Xeon Phi Coprocessors” (tut106s1), received more than double our expected enrollment! Students will be able to run on both Intel Xeon Phi and GPU supercomputers at TACC via an XSEDE allocation (thank you very much) and on a CreativeC supercomputer and visualization cluster …

Mix OpenACC and CUDA (including Thrust)

September 4, 2014 by Rob Farber

The NVIDIA Parallel ForAll blog shows how to mix OpenACC and CUDA (including Thrust) with the host_data construct, the deviceptr clause, and the acc_map_data() API function. …

Shared Memory is Simple on Intel Xeon Phi – supports STL!

September 2, 2014 by Rob Farber

Shared memory on Intel Xeon Phi, in OpenCL, and in CUDA (via managed memory) greatly simplifies programming by eliminating the need to explicitly define all data transfers between host and device memory. Once these implementations mature, they will likely become the standard API that programmers use to access data on both Intel Xeon Phi and GPUs. (They also naturally support …

SC14 Technical Program and Registration – XSEDE/TACC Resources for Farber Tutorial

July 28, 2014 by Rob Farber

Register early for Supercomputing 2014 in New Orleans and save up to $275. View the Technical Program online (and register for our tutorial!). The Technical Program fee includes admission to all conference sessions, exhibits, the Monday night Exhibits opening event, the Thursday night event, and one copy of the SC14 proceedings. Click here to view the grid showing access to …

© 2023 · techenablement.com