This chapter of High Performance Parallelism Pearls, written by Andrey Vladimirov, uses Intel VTune Amplifier XE reports to show where to apply optimization to matrix transposition, a small, self-contained workload of great practical value. The optimization process relies exclusively on programming in a high-level language and on the OpenMP framework. The result is portable code that runs on both CPU (processor) and MIC (coprocessor) architectures and can be recompiled for future generations of Intel architectures.
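To make the workload concrete, here is a minimal sketch of an in-place transposition of a square matrix parallelized with OpenMP. It is an illustrative baseline only, not the chapter's code: the function name `transpose_in_place`, the row-major `float` layout, and the static schedule are all assumptions made for this example.

```c
#include <stddef.h>

/* Hypothetical baseline: transpose an n x n row-major matrix in place.
 * Names and layout are assumptions for illustration, not the chapter's code. */
void transpose_in_place(float *a, long n)
{
    /* Row i swaps its below-diagonal elements with their mirror images
     * above the diagonal. Each (i, j) pair is visited exactly once, so the
     * outer-loop iterations touch disjoint elements and can run in parallel.
     * Without an OpenMP-enabled compiler the pragma is ignored and the loop
     * runs serially with the same result. */
    #pragma omp parallel for schedule(static)
    for (long i = 1; i < n; i++) {
        for (long j = 0; j < i; j++) {
            float tmp = a[i * n + j];
            a[i * n + j] = a[j * n + i];
            a[j * n + i] = tmp;
        }
    }
}
```

The same source can be built for a processor or a coprocessor target by enabling OpenMP in the compiler (for example, `-fopenmp` with GCC). A naive loop like this one is the sort of starting point that a profiler report can then be used to refine.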
Using VTune, the chapter showcases the performance monitoring functionality of Intel Xeon Phi coprocessors, which not only detects bottlenecks but also points out overall performance issues and possible ways to resolve them. From the profiling tool's reports, the programmer learns where to change the source code to improve application performance. In short, this chapter demonstrates the familial relationship between Intel processors and coprocessors and the Intel software development tools.
Andrey Vladimirov, PhD, is the Head of HPC Research at Colfax International. His primary research interest is the application of modern computing technologies to computationally demanding scientific problems. Prior to joining Colfax, Andrey was involved in theoretical astrophysics research at the Ioffe Institute (Russia), North Carolina State University, and Stanford University (USA), where he studied cosmic rays, collisionless plasmas and the interstellar medium using computer simulations.
See the overview article “Teaching The World About Intel Xeon Phi” for a list of TechEnablement links explaining why each chapter is considered a “Parallelism Pearl”, plus information about James Reinders and Jim Jeffers, the editors of High Performance Parallelism Pearls.