The newest book by James Reinders and Jim Jeffers, “High Performance Parallelism Pearls”, distills the experience of sixty-nine HPC experts into twenty-eight chapters designed to teach the world about the performance capabilities of the massively parallel Intel® Xeon Phi™ family of products. Source code for numerous working examples, selected for their educational content, applicability, and success, can be downloaded along with all figures for self-study or classroom use. Published by Morgan Kaufmann, the Reinders/Jeffers book will be available by November 13, 2014 for purchase at Supercomputing 2014, online, and via retail outlets.
Learning from others is what “High Performance Parallelism Pearls” is all about. Not content with merely writing about their research, the chapter authors also provide working examples from their fields that readers can build and run on both Intel Xeon Phi coprocessors and multicore processors. Over the next few weeks TechEnablement will highlight individual chapters and explain why each one is considered a “pearl”. The book spans a wide range of application areas, including Finance, Finite Element Analysis, Computational Fluid Dynamics, Computational Chemistry, Deep-learning, and N-body simulation, as well as topics like power consumption, performance tuning, and code optimization. The idea is to teach as many people as possible how to write highly scalable programs that are useful on both multicore processors and many-core devices (Intel Xeon Phi coprocessors).
The penetration into the HPC arena by the Knights Corner-based Intel Xeon Phi coprocessors has been staggering since they were introduced in November 2012. As noted in the first chapter,
“By mid-2013, the cumulative number of FLOPs contributed by Intel Xeon Phi coprocessors in TOP 500 machines exceeded the combined FLOPs contributed by all the graphics processing units (GPUs) installed as floating-point accelerators in the TOP 500 list. In fact, the only device type contributing more FLOPs to TOP 500 supercomputers were Intel Xeon processors.”
Announced procurements such as the Cori supercomputer at NERSC and the 42 PF/s NNSA Trinity supercomputer indicate that the forthcoming Knights Landing-based Intel Xeon Phi family will likely be even more successful.
High Performance Parallelism Pearls, 1st Edition
- Introduction
- Towards an efficient Godunov’s scheme on Phi
- Better Concurrency and SIMD on HBM
- Case Study: Analyzing and Optimizing Concurrency
- Plesiochronous Phasing Barriers
- Parallel Evaluation of Fault Tree Expressions
- Deep-learning and Numerical Optimization
- Optimizing Gather/Scatter Patterns
- A Many-Core Implementation of the Direct N-body Problem
- N-body Methods on Intel® Xeon Phi™ Coprocessors
- Dynamic Load Balancing using OpenMP 4.0
- Concurrent Kernel Offloading
- Heterogeneous Computing with MPI
- Power Analysis on the Intel® Xeon Phi™ Coprocessor
- Integrating Intel Xeon Phis into a Cluster
- Native File systems
- NWChem: Quantum Chemistry Simulations at Scale
- Efficient nested parallelism on large-scale systems
- Performance optimization of Black-Scholes pricing
- Host and Coprocessor Data Transfer through the COI
- High Performance Ray Tracing with Embree
- Portable Performance with OpenCL
- Characterization And Optimization Methodology Applied To Stencil Computations
- Profiling-guided optimization of cache performance
- Heterogeneous MPI optimization with ITAC
- Scalable Out-of-core Solvers on a Cluster
- Sparse matrix-vector multiplication: parallelization and vectorization
- Morton Order Improves Performance
CERN Honorary Staff Member and Industry Luminary Sverre Jarp wrote in the Foreword:
“You might ask: Why we [CERN] were so enthusiastic? What we saw in the design of the Intel Xeon Phi included an excellent vector instruction set architecture (ISA). By having vector mask registers separate from the vector data registers, the architecture was able to handle both data flow and control flow in programs in a much more optimal way. Intel has been pleased by strong community enthusiasm; their recent announcement of the AVX-512 instruction set will push this vector architecture into their multicore processors as well.”
Sverre’s words are prophetic as AVX-512 moves into new Intel Xeon designs. It is also likely that plug-and-play Intel Xeon Phi devices will appear that require only a network (or IB) connection plus power and then, “Voilà!”, simultaneously bring online a standalone SMP system and an MPI compute node.
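To make the Foreword's point about separate mask registers more concrete, here is a minimal sketch in plain Python that emulates, lane by lane, how a per-element branch can be turned into a compare that produces a mask followed by a masked operation. This is an illustration only, not AVX-512 or Intel Xeon Phi code and not taken from the book: the 8-lane width and the helper names `compare_gt` and `masked_sqrt` are assumptions made for the example.

```python
import math

LANES = 8  # assumed vector width, chosen only for illustration


def compare_gt(vec, threshold):
    """Emulate a vector compare: set one mask bit per lane where vec > threshold."""
    mask = 0
    for lane, value in enumerate(vec):
        if value > threshold:
            mask |= 1 << lane
    return mask


def masked_sqrt(dest, src, mask):
    """Emulate a merge-masked operation: lanes whose mask bit is set receive
    sqrt(src); all other lanes keep the value already held in dest."""
    return [
        math.sqrt(s) if (mask >> lane) & 1 else d
        for lane, (d, s) in enumerate(zip(dest, src))
    ]


# Scalar equivalent of what follows:
#     for i in range(LANES):
#         if x[i] > 0.0:
#             y[i] = math.sqrt(x[i])
x = [4.0, -1.0, 9.0, 0.0, 16.0, -25.0, 1.0, 36.0]
y = [0.0] * LANES

mask = compare_gt(x, 0.0)    # control flow becomes data held in a mask
y = masked_sqrt(y, x, mask)  # a single predicated operation, no per-lane branch
```

The mask lives apart from the data vectors `x` and `y`, so the outcome of the comparison (control flow) never overwrites or occupies a data register, which is the separation the quote describes.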
Bookmark this article and watch our Twitter and RSS feeds, as the TechEnablement review of each chapter will be hyperlinked here after publication.
Chapter Authors
James Reinders is involved in multiple engineering, research and educational efforts to increase use of parallel programming throughout the industry. He joined Intel Corporation in 1989, and has contributed to numerous projects including the world’s first TeraFLOP/s supercomputer (ASCI Red) and the world’s first TeraFLOP/s microprocessor (Intel® Xeon Phi™ coprocessor). James has been an author on numerous technical books, including VTune™ Performance Analyzer Essentials (Intel Press, 2005), Intel® Threading Building Blocks (O’Reilly Media, 2007), Structured Parallel Programming (Morgan Kaufmann, 2012), Intel® Xeon Phi™ Coprocessor High Performance Programming (Morgan Kaufmann, 2013), Multithreading for Visual Effects (A K Peters/CRC Press, 2014), and High Performance Parallelism Pearls – Multicore and Many-core Programming Approaches (Morgan Kaufmann, 2014).
Jim Jeffers is a Principal Engineer and Engineering Manager in Intel’s Technical Computing Group. Jim joined Intel in 2008, focusing on many-core Intel® Xeon Phi™ product family development and parallel computing. Jim has over 25 years of software design and technical leadership experience in high performance computing, visual computing, digital television, and data communications. His prior work includes high performance graphics device driver development, contributions to Microsoft’s DirectX design, video streaming and conferencing products, and development of the virtual image insertion television technology behind American football’s “Electronic First Down Line”. Jim is the coauthor, with James Reinders, of “Intel® Xeon Phi™ Coprocessor High Performance Programming” (Morgan Kaufmann, 2013) and “High Performance Parallelism Pearls – Multicore and Many-core Programming Approaches” (Morgan Kaufmann, 2014). Jim currently leads Intel’s technical computing visualization engineering team and holds three granted US patents.