Author Kerry Evans writes in his High Performance Parallelism Pearls chapter, “There are many facets to performance optimization but three issues to deal with right from the beginning are memory access, vectorization, and parallelization. Unless we can optimize these, we cannot achieve peak performance.” Specifically, this chapter examines a method of mapping multidimensional data into a single dimension while maintaining locality, using Morton (or Z-curve) ordering, and the effect it has on the performance of two common linear algebra operations: matrix transpose and matrix multiply. The transpose and multiply codes are also tuned to take advantage of the caches, vector hardware, and threading of Intel Xeon processors and Intel Xeon Phi coprocessors.
A Morton ordering maps multidimensional data to one dimension while preserving locality of the data points. It was introduced in 1966 by G. M. Morton.
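As a rough illustration of the idea (not the chapter's tuned code), one common way to compute a 2D Morton index is to interleave the bits of the row and column indices. The function names, the 16-bit default width, and the choice to put column bits in the even positions are assumptions made for this sketch.

```python
def morton_encode_2d(row, col, bits=16):
    """Interleave the low `bits` bits of row and col into one Morton index.

    Assumed convention for this sketch: column bits land in even bit
    positions and row bits in odd bit positions.
    """
    code = 0
    for i in range(bits):
        code |= ((col >> i) & 1) << (2 * i)      # column bit i -> bit 2i
        code |= ((row >> i) & 1) << (2 * i + 1)  # row bit i    -> bit 2i+1
    return code


def morton_decode_2d(code, bits=16):
    """Inverse of morton_encode_2d: recover (row, col) from a Morton index."""
    row = 0
    col = 0
    for i in range(bits):
        col |= ((code >> (2 * i)) & 1) << i
        row |= ((code >> (2 * i + 1)) & 1) << i
    return row, col


def morton_layout(n):
    """Return an n x n grid whose entries are the Morton index of each cell."""
    return [[morton_encode_2d(r, c) for c in range(n)] for r in range(n)]
```

Calling morton_layout(4) shows why the ordering is called a Z-curve: each 2x2 block holds four consecutive indices, so elements that are neighbors in two dimensions tend to stay close together in the one-dimensional array.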
Figure: Matrix transpose results
Figure: Matrix multiply results
Kerry is a software engineer working primarily with customers on optimization of medical imaging software on Intel Xeon processors and Intel Xeon Phi coprocessors.
Click to see the overview article “Teaching The World About Intel Xeon Phi” that contains a list of TechEnablement links about why each chapter is considered a “Parallelism Pearl” plus information about James Reinders and Jim Jeffers, the editors of High Performance Parallelism Pearls.