
TechEnablement

Education, Planning, Analysis, Code


Teaching The World About Intel Xeon Phi

September 30, 2014 by Rob Farber

The newest book by James Reinders and Jim Jeffers, “High Performance Parallelism Pearls” distills the experience of sixty-nine HPC experts into twenty-eight chapters designed to teach the world about the performance capabilities of the massively-parallel Intel® Xeon Phi™ family of products. Source code for numerous working examples selected for their educational content, applicability and success – along with all figures – can be downloaded for self-study or use in the classroom. Published by Morgan Kaufmann, the Reinders/Jeffers book will be available by November 13, 2014 for purchase at Supercomputing 2014, online, and via retail outlets.


Learning from others is what “High Performance Parallelism Pearls” is all about. Not content with merely writing about their research, the chapter authors also provide working examples from their fields of research that readers can build and run on both Intel Xeon Phi coprocessors and Intel Xeon processors. Over the next few weeks TechEnablement will highlight individual chapters along with the reasoning why each chapter is considered a “pearl”. The book spans a wide range of applied application areas including finance, finite element analysis, computational fluid dynamics, computational chemistry, deep learning, and N-body simulation, as well as topics like power consumption, performance tuning, and code optimization. The idea is to teach as many people as possible about writing highly scalable programs that are useful for both multicore processors and many-core devices (Intel Xeon Phi coprocessors).

The penetration into the HPC arena by the Knights Corner-based Intel Xeon Phi coprocessors has been staggering since they were introduced in November 2012. As noted in the first chapter,

“By mid-2013, the cumulative number of FLOPs contributed by Intel Xeon Phi coprocessors in TOP 500 machines exceeded the combined FLOPs contributed by all the graphics processing units (GPUs) installed as floating-point accelerators in the TOP 500 list. In fact, the only device type contributing more FLOPs to TOP 500 supercomputers were Intel Xeon processors.”

Announced procurements such as the Cori supercomputer at NERSC and the 42 PF/s NNSA Trinity supercomputer indicate that the forthcoming Knights Landing-based Intel Xeon Phi family will likely be even more successful.

High Performance Parallelism Pearls, 1st Edition

  1. Introduction
  2. Towards an efficient Godunov’s scheme on Phi
  3. Better Concurrency and SIMD on HBM
  4. Case Study: Analyzing and Optimizing Concurrency
  5. Plesiochronous Phasing Barriers
  6. Parallel Evaluation of Fault Tree Expressions
  7. Deep-learning and Numerical Optimization
  8. Optimizing Gather/Scatter Patterns
  9. A Many-Core Implementation of the Direct N-body Problem
  10. N-body Methods on Intel® Xeon Phi™ Coprocessors
  11. Dynamic Load Balancing using OpenMP 4.0
  12. Concurrent Kernel Offloading
  13. Heterogeneous Computing with MPI
  14. Power Analysis on the Intel® Xeon Phi™ Coprocessor
  15. Integrating Intel Xeon Phis into a Cluster
  16. Native File Systems
  17. NWChem: Quantum Chemistry Simulations at Scale
  18. Efficient nested parallelism on large-scale systems
  19. Performance optimization of Black-Scholes pricing
  20. Host and Coprocessor Data Transfer through the COI
  21. High Performance Ray Tracing with Embree
  22. Portable Performance with OpenCL
  23. Characterization And Optimization Methodology Applied To Stencil Computations
  24. Profiling-guided optimization of cache performance
  25. Heterogeneous MPI optimization with ITAC
  26. Scalable Out-of-core Solvers on a Cluster
  27. Sparse matrix-vector multiplication: parallelization and vectorization
  28. Morton Order Improves Performance

CERN Honorary Staff Member and Industry Luminary Sverre Jarp wrote in the Foreword:

“You might ask: Why we [CERN] were so enthusiastic? What we saw in the design of the Intel Xeon Phi included an excellent vector instruction set architecture (ISA). By having vector mask registers separate from the vector data registers, the architecture was able to handle both data flow and control flow in programs in a much more optimal way. Intel has been pleased by strong community enthusiasm; their recent announcement of the AVX-512 instruction set will push this vector architecture into their multicore processors as well.”


Sverre’s words are prophetic: AVX-512 is moving into new Intel Xeon designs, and it is likely that plug-and-play Intel Xeon Phi devices will appear that require only a network (or InfiniBand) connection plus power to bring online, simultaneously, a standalone SMP system and an MPI compute node.

Bookmark this article and watch our Twitter and RSS feeds; the TechEnablement review of each chapter will be hyperlinked here after publication.

Chapter Authors

James Reinders

James Reinders, Parallel Programming Evangelist

James is involved in multiple engineering, research, and educational efforts to increase the use of parallel programming throughout the industry. He joined Intel Corporation in 1989 and has contributed to numerous projects, including the world’s first TeraFLOP/s supercomputer (ASCI Red) and the world’s first TeraFLOP/s microprocessor (Intel® Xeon Phi™ coprocessor). James has been an author on numerous technical books, including VTune™ Performance Analyzer Essentials (Intel Press, 2005), Intel® Threading Building Blocks (O’Reilly Media, 2007), Structured Parallel Programming (Morgan Kaufmann, 2012), Intel® Xeon Phi™ Coprocessor High Performance Programming (Morgan Kaufmann, 2013), Multithreading for Visual Effects (A K Peters/CRC Press, 2014), and High Performance Parallelism Pearls – Multicore and Many-core Programming Approaches (Morgan Kaufmann, 2014).

Jim Jeffers

Jim Jeffers

Jim Jeffers is a Principal Engineer and Engineering Manager in Intel’s Technical Computing Group. Jim joined Intel in 2008, focusing on many-core Intel® Xeon Phi™ product family development and parallel computing. Jim has over 25 years of software design and technical leadership experience in high performance computing, visual computing, digital television, and data communications. His prior work includes high performance graphics device driver development, contributions to Microsoft’s DirectX design, video streaming and conferencing products, and development of the virtual image insertion television technology behind American football’s “Electronic First Down Line”. Jim is the coauthor, with James Reinders, of “Intel® Xeon Phi™ Coprocessor High Performance Programming” (Morgan Kaufmann, 2013) and “High Performance Parallelism Pearls – Multicore and Many-core Programming Approaches” (Morgan Kaufmann, 2014). Jim currently leads Intel’s technical computing visualization engineering team and holds three granted US patents.
