• Home
  • News
  • Tutorials
  • Analysis
  • About
  • Contact

TechEnablement

Education, Planning, Analysis, Code

  • CUDA
    • News
    • Tutorials
    • CUDA Study Guide
  • OpenACC
    • News
    • Tutorials
    • OpenACC Study Guide
  • Xeon Phi
    • News
    • Tutorials
    • Intel Xeon Phi Study Guide
  • OpenCL
    • News
    • Tutorials
    • OpenCL Study Guide
  • Web/Cloud
    • News
    • Tutorials
You are here: Home / Archives for Xeon Phi / Tutorials

Learn to Make Windows 10 Apps with Free Microsoft Course Then Add GPU Acceleration!

October 1, 2015 by Rob Farber Leave a Comment

Free Windows courses by themselves are not newsworthy, but those who wish to create Windows 10 apps for the Windows Marketplace - AND exploit the power of CUDA and OpenCL computing via C# should find the Free Microsoft course in combination with the TechEnablement tutorial "Combine C-Sharp With CUDA and OpenCL On Linux, iOS, Android and Windows" an enabling pair of … [Read more...]

Intel Xeon Phi Optimization Part 1 of 3: Multi-Threading and Parallel Reduction

June 2, 2015 by admin Leave a Comment

This tutorial begins a 3-part series of educational publications on performance optimization in applications for Intel Xeon Phi coprocessors. In this publication, Ryo Asai (a Researcher at Colfax International) and Andrey Vladimirov (Head of HPC Research at Colfax International) will focus on some aspects of thread parallelism implementation in the OpenMP … [Read more...]

Port Some CUDA Codes To Intel Xeon Phi Simply and Efficiently

May 15, 2015 by Rob Farber Leave a Comment

This tutorial shows that it relatively easy to port many CUDA C/C++ source codes to OpenMP. In the past, such efforts were not generally considered worthwhile because of the large performance difference between multicore processors (that use OpenMP) and GPUs. The introduction of teraflop/s Intel Xeon Phi coprocessors eliminated that performance difference, which makes it much … [Read more...]

Fine-Tuning Vectorization and Memory Traffic on Intel Xeon Phi Coprocessors

February 2, 2015 by Rob Farber Leave a Comment

Andrey Vladimirov at ColFax International has posted source code and a paper, "Fine-Tuning Vectorization and Memory Traffic on Intel Xeon Phi Coprocessors: LU Decomposition of Small Matrices" on the ColFax site. Andrey notes, "Benchmarks show that the discussed optimizations improve the application performance on the coprocessor by a factor of 2.8 compared to the unoptimized … [Read more...]

Comparing Managed Memory Between GPUs and Intel Xeon Phi

December 15, 2014 by Rob Farber Leave a Comment

Managed memory greatly simplifies programming GPUs and Intel Xeon Phi coprocessors (when used in offload mode) because data can be utilized on either the host or the device without having to perform explicit device transfers. Instead the device(s) and host interact through the device driver to transparently migrate data as needed. As a result, application codes tend to be … [Read more...]

The Unabridged Chapter 1 Introduction To High Performance Parallelism Pearls

October 3, 2014 by Rob Farber Leave a Comment

Following is the full, unabridged text of the chapter 1 introduction  (written by James Reinders) to High Performance Parallelism Pearls. Thanks to Morgan Kaufmann, James Reinders, and Jim Jeffers for giving permission so TechEnablment can make this available. After reading what James wrote, you will see that summarizing the introduction would simply have left out too much … [Read more...]

Shared Memory is Simple on Intel Xeon Phi – supports STL!

September 2, 2014 by Rob Farber Leave a Comment

Shared memory on Intel Xeon Phi, in OpenCL, and CUDA (via managed memory) greatly simplifies programming by eliminating the need to explicitly define all data transfers between host and device memory. Once these implementations mature, it is likely they will become the standard API that programmers use to access data on both Intel Xeon Phi and GPUs. (They also naturally support … [Read more...]

Farber to Teach All-Day Tutorial At Supercomputing Nov 16 2014

June 25, 2014 by Rob Farber Leave a Comment

Supercomputing 2014 recently approved my proposal for an all-day class "From 'Hello World' to Exascale Using x86, GPUs and Intel Xeon Phi Coprocessors" (tut106s1), at The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC14). I hope to see you on Sunday November 16, 2014 in New Orleans,! Abstract Both GPUs and Intel Xeon Phi … [Read more...]

Pragma Puzzler – Ambiguous Loop Trip Count in OpenMP and OpenACC

May 23, 2014 by Rob Farber Leave a Comment

Pragma-based programming can be described as a "negotiation" with the compiler where the compiler has to assume corner-cases that are not apparent to the programmer. So why does the loop count in the OpenMP and OpenACC article, "A First Transparent OpenACC C++ Class"  have to be assigned to a separate variable to generate a parallel … [Read more...]

Intel Xeon Phi for CUDA Programmers

April 16, 2014 by Rob Farber Leave a Comment

Both GPU and Xeon Phi coprocessors provide high degrees of parallelism that can deliver excellent application performance. For the most part, CUDA programmers with existing application code have already written their software so it can run well on Phi coprocessors. The key to performance lies in understanding the differences between these two architectures. Author's note: To … [Read more...]

Next Page »

Tell us you were here

Recent Posts

Farewell to a Familiar HPC Friend

May 27, 2020 By Rob Farber Leave a Comment

TechEnablement Blog Sunset or Sunrise?

February 12, 2020 By admin Leave a Comment

The cornerstone is laid – NVIDIA acquires ARM

September 13, 2020 By Rob Farber Leave a Comment

Third-Party Use Cases Illustrate the Success of CPU-based Visualization

April 14, 2018 By admin Leave a Comment

More Tutorials

Learn how to program IBM’s ‘Deep-Learning’ SyNAPSE chip

February 5, 2016 By Rob Farber Leave a Comment

Free Intermediate-Level Deep-Learning Course by Google

January 27, 2016 By Rob Farber Leave a Comment

Intel tutorial shows how to view OpenCL assembly code

January 25, 2016 By Rob Farber Leave a Comment

More Posts from this Category

Top Posts & Pages

  • Face It: AI Gets Personal to Make You Look Better!
  • CUDA Study Guide
  • Apache Spark Claims 10x to 100x Faster than Hadoop MapReduce
  • PyFR: A GPU-Accelerated Next-Generation Computational Fluid Dynamics Python Framework
  • Paper Compares AMD, NVIDIA, Intel Xeon Phi CFD Turbulent Flow Mesh Performance Using OpenMP and OpenCL

Archives

© 2023 · techenablement.com