TechEnablement

Education, Planning, Analysis, Code

Morton Order Improves Performance

November 11, 2014 by Rob Farber

Author Kerry Evans writes in his High Performance Parallelism Pearls chapter, “There are many facets to performance optimization but three issues to deal with right from the beginning are memory access, vectorization, and parallelization. Unless we can optimize these, we cannot achieve peak performance.” Specifically, the chapter examines a method of mapping multidimensional data into a single dimension while maintaining locality, using Morton (Z-curve) ordering, and the effect this has on the performance of two common linear algebra problems: matrix transpose and matrix multiply. The transpose and multiply codes are also tuned to take advantage of the cache, vector hardware, and threading of Intel Xeon processors and Intel Xeon Phi coprocessors.

A Morton ordering maps multidimensional data to one dimension while preserving locality of the data points. It was introduced in 1966 by G. M. Morton.

Morton order for an 8 × 8 2D grid, subpartitioned into 4 × 4 blocks. (Figure courtesy Morgan Kaufmann)
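To make the mapping concrete, here is a minimal sketch of a 2D Morton index computed by bit interleaving. It is an illustration only, not the chapter's code: the function names (spread_bits, compact_bits, morton_encode, morton_decode) are hypothetical, and the choice of placing column bits in the even positions and row bits in the odd positions is an assumption that may differ from the convention used in the figure.

```c
#include <assert.h>
#include <stdint.h>

/* Spread the low 16 bits of v so that bit i moves to bit 2*i. */
static uint32_t spread_bits(uint32_t v)
{
    v &= 0x0000FFFFu;
    v = (v | (v << 8)) & 0x00FF00FFu;
    v = (v | (v << 4)) & 0x0F0F0F0Fu;
    v = (v | (v << 2)) & 0x33333333u;
    v = (v | (v << 1)) & 0x55555555u;
    return v;
}

/* Inverse of spread_bits: gather the even-position bits into the low 16 bits. */
static uint32_t compact_bits(uint32_t v)
{
    v &= 0x55555555u;
    v = (v | (v >> 1)) & 0x33333333u;
    v = (v | (v >> 2)) & 0x0F0F0F0Fu;
    v = (v | (v >> 4)) & 0x00FF00FFu;
    v = (v | (v >> 8)) & 0x0000FFFFu;
    return v;
}

/* Assumed convention: column bits in even positions, row bits in odd positions. */
static uint32_t morton_encode(uint32_t row, uint32_t col)
{
    assert(row <= 0xFFFFu && col <= 0xFFFFu);  /* 16 bits per coordinate */
    return (spread_bits(row) << 1) | spread_bits(col);
}

/* Recover (row, col) from a Morton index produced by morton_encode. */
static void morton_decode(uint32_t code, uint32_t *row, uint32_t *col)
{
    *col = compact_bits(code);
    *row = compact_bits(code >> 1);
}
```

Under this convention, the first two rows of a grid map to indices 0, 1, 4, 5, ... and 2, 3, 6, 7, ..., so each aligned 2 × 2 tile occupies four consecutive positions in memory. The same holds recursively for 4 × 4 and larger tiles, which is where the locality comes from.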

  Matrix transpose results

Intel Xeon Phi Matrix transpose results. (Figure courtesy Morgan Kaufmann)

Xeon matrix transpose results. (Figure courtesy Morgan Kaufmann)
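As a rough illustration of why a Morton layout suits transposition, consider a square matrix whose side is a power of two, stored entirely in Morton order. Swapping row and column is then the same as swapping the even and odd bits of each index. The sketch below is an untuned, single-threaded example under those assumptions. It is not the chapter's implementation, which is tuned for cache, vector hardware, and threading, and the names swap_row_col_bits and morton_transpose are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Exchange the even-position and odd-position bits of a Morton index,
 * which maps the index of element (row, col) to that of (col, row). */
static uint32_t swap_row_col_bits(uint32_t code)
{
    return ((code & 0x55555555u) << 1) | ((code >> 1) & 0x55555555u);
}

/* In-place transpose of an n x n matrix stored in Morton order.
 * Assumes n is a power of two and n <= 32768 so n*n fits in 32 bits. */
static void morton_transpose(double *a, uint32_t n)
{
    assert(n != 0 && (n & (n - 1)) == 0 && n <= 32768u);
    uint32_t count = n * n;
    for (uint32_t i = 0; i < count; i++) {
        uint32_t j = swap_row_col_bits(i);
        if (i < j) {                 /* visit each off-diagonal pair once */
            double tmp = a[i];
            a[i] = a[j];
            a[j] = tmp;
        }
    }
}
```

Because both elements of each swapped pair share the same high-order index bits, they fall within the same Morton tile at every tile size that contains both of them. That keeps the memory traffic of the transpose localized, unlike the long strided accesses of a row-major transpose.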

  Matrix multiply results

Intel Xeon Phi matrix multiplication results. (Figure courtesy Morgan Kaufmann)

Xeon matrix multiply performance results. (Figure courtesy Morgan Kaufmann)

Chapter Author

Kerry Evans

Kerry is a software engineer working primarily with customers on optimization of medical imaging software on Intel Xeon processors and Intel Xeon Phi coprocessors.

Click to see the overview article “Teaching The World About Intel Xeon Phi” that contains a list of TechEnablement links about why each chapter is considered a “Parallelism Pearl” plus information about James Reinders and Jim Jeffers, the editors of High Performance Parallelism Pearls.

Filed Under: Featured article, Featured news, News Tagged With: HPC, Intel, Intel Xeon Phi, x86
