Programming Intel’s Xeon Phi: A Jumpstart Introduction

April 15, 2014 by Rob Farber

Reaching one teraflop on Intel’s new 60-core coprocessor requires a little know-how

First published December 10, 2012 on Dr. Dobb's.

Developers can reach a teraflop/s of number-crunching power via one of several routes:

  • Using pragmas to augment existing codes so they offload work from the host processor to the Intel Xeon Phi coprocessor(s), as sketched in the code example after this list
  • Recompiling source code to run directly on the coprocessor as a separate many-core Linux SMP compute node
  • Accessing the coprocessor as an accelerator through optimized libraries such as the Intel MKL (Math Kernel Library)
  • Using each coprocessor as a node in an MPI cluster or, alternatively, as a device containing a cluster of MPI nodes.
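
As a first taste of the offload route, the sketch below assumes a naive triple-loop kernel, an illustrative matrix size, and the Intel compiler's offload support of that era; it is a minimal example, not a tuned implementation. A single #pragma offload placed in front of an ordinary OpenMP loop is enough to move the square-matrix multiply onto the first coprocessor:

/* Minimal sketch: offload an OpenMP square-matrix multiply to the
 * first Xeon Phi coprocessor via a single offload pragma.
 * Illustrative only; compile with something like
 *   icc -std=c99 -openmp matmul_offload.c -o matmul_offload
 */
#include <stdio.h>
#include <stdlib.h>

#define N 1024  /* illustrative matrix dimension */

int main(void)
{
    float *a = malloc(sizeof(float) * N * N);
    float *b = malloc(sizeof(float) * N * N);
    float *c = malloc(sizeof(float) * N * N);

    for (int i = 0; i < N * N; i++) {   /* simple, verifiable inputs */
        a[i] = 1.0f;
        b[i] = 2.0f;
        c[i] = 0.0f;
    }

    /* The single offload pragma: copy a and b to the coprocessor,
     * run the OpenMP region there, and copy c back to the host. */
    #pragma offload target(mic:0) in(a : length(N*N)) in(b : length(N*N)) out(c : length(N*N))
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        for (int k = 0; k < N; k++)
            for (int j = 0; j < N; j++)
                c[i * N + j] += a[i * N + k] * b[k * N + j];

    printf("c[0] = %f (expect %f)\n", c[0], 2.0f * N);

    free(a); free(b); free(c);
    return 0;
}

The same source, with the offload pragma removed, can also be built to run natively on the coprocessor as a many-core Linux SMP node by compiling with the Intel compiler's -mmic switch.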

From this list, experienced programmers will recognize that the Phi coprocessors support the full gamut of modern and legacy programming models. Most developers will quickly find that they can program the Phi in much the same manner as existing x86 systems. The challenge lies in expressing sufficient parallelism and vector capability to achieve high floating-point performance, as the Intel Xeon Phi coprocessors provide more than an order of magnitude more cores than current-generation quad-core processors. Massive vector parallelism is the path to realizing that performance.

The focus of this first article is to get up and running on Intel Xeon Phi as quickly as possible. Complete working examples will show that only a single offload pragma is required to adapt an OpenMP square-matrix multiplication example to run on a Phi coprocessor. Performance comparisons demonstrate that both the pragma-based offload model and using Intel Xeon Phi as an SMP processor compare favorably against the MKL library optimized for the host, and that the optimized Phi MKL library can easily deliver over a teraflop.
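
The library route mentioned above needs no source changes at all. The sketch below is a hypothetical host-side example (the matrix size and values are illustrative) that calls MKL's cblas_dgemm in the usual way; with MKL's automatic-offload mode enabled, historically via the environment variable MKL_MIC_ENABLE=1, the library can transparently shift large multiplies to the coprocessor:

/* Minimal sketch of the library route: an ordinary host-side DGEMM call.
 * Illustrative only; link against MKL, e.g.
 *   icc -std=c99 -mkl dgemm_mkl.c -o dgemm_mkl
 * and (historically) set MKL_MIC_ENABLE=1 to allow automatic offload.
 */
#include <stdio.h>
#include <mkl.h>

#define N 2048  /* illustrative matrix dimension */

int main(void)
{
    double *a = (double *) mkl_malloc(sizeof(double) * N * N, 64);
    double *b = (double *) mkl_malloc(sizeof(double) * N * N, 64);
    double *c = (double *) mkl_malloc(sizeof(double) * N * N, 64);

    for (long i = 0; i < (long) N * N; i++) {  /* simple, verifiable inputs */
        a[i] = 1.0;
        b[i] = 2.0;
        c[i] = 0.0;
    }

    /* C = 1.0 * A * B + 0.0 * C for row-major square matrices. */
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                N, N, N, 1.0, a, N, b, N, 0.0, c, N);

    printf("c[0] = %f (expect %f)\n", c[0], 2.0 * N);

    mkl_free(a); mkl_free(b); mkl_free(c);
    return 0;
}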
