TechEnablement

Education, Planning, Analysis, Code

Power Profiling Shows Simple Changes To Save Megawatts of Power On Leadership Supercomputers

February 10, 2015 by Rob Farber

A challenge with profiling applications lies in interpreting the profile results. In particular, most programmers give power profile plots no more than a cursory glance. The following example waterfall plot shows the power utilization for an NWChem run on Intel Xeon Phi coprocessors:

[Figure: waterfall plot of power utilization for an NWChem run on Intel Xeon Phi coprocessors]

My recent column in Scientific Computing, “Using Profile Information for Optimization, Energy Savings and Procurements”, notes that profiling is a big-data task, but one where the rewards can be significant, including potentially saving megawatts of power on a leadership-class system and/or reducing the time to solution so more scientists can utilize these precious resources.

For example, the current fastest supercomputer in the world, the 33 PF/s RMAX (54 PF/s RPEAK) Tianhe-2, achieves its number one ranking through the use of 48,000 Intel Xeon Phi 31S1P coprocessors. This system has a peak power consumption of 24 megawatts (24 million watts). Instrumenting such systems at scale is important for understanding application power efficiency, as even simple application configuration and software changes have the potential to save literally megawatts of power. Even a 20-watt power savings per Intel Xeon Phi coprocessor translates to roughly a megawatt (48,000 × 20 W = 960 kW) for applications that use all the Tianhe-2 devices.
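The arithmetic behind that estimate can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the function name is made up for illustration, and the inputs are the device count, per-device savings, and 24 MW peak figure quoted above.

```python
def fleet_power_savings_mw(devices: int, watts_saved_per_device: float) -> float:
    """Return the aggregate power savings in megawatts (hypothetical helper)."""
    return devices * watts_saved_per_device / 1_000_000


# Figures quoted in the article: 48,000 coprocessors, 20 W saved on each.
savings_mw = fleet_power_savings_mw(48_000, 20.0)

# Compare against the 24 MW peak power consumption quoted for the system.
fraction_of_peak = savings_mw / 24.0

print(f"{savings_mw:.2f} MW saved, {fraction_of_peak:.1%} of the 24 MW peak")
# prints "0.96 MW saved, 4.0% of the 24 MW peak"
```

The same helper makes it easy to see how smaller per-device savings scale: 5 W per coprocessor would still amount to roughly a quarter of a megawatt across the full machine.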

A number of power saving studies on Intel Xeon Phi exist in the literature.

For example, the paper “Energy Evaluation for Applications with Different Thread Affinities on the Intel Xeon Phi” by Lawson et al. measured energy consumption as a function of thread affinity and number of threads on an Intel Xeon Phi. Setting the thread affinity and thread count on Intel Xeon Phi coprocessors is easily accomplished by defining a couple of shell environment variables; no code modifications are required. The Lawson paper showed that “varying thread affinity may improve both performance and energy, which is the most apparent under the compact affinity tests when the number of threads is larger than three per core. The energy savings reached as high as 48% for the CG NAS benchmark”.
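As a rough illustration of how such a sweep might be scripted, the sketch below generates one environment per affinity and thread-count combination. It assumes the Intel OpenMP runtime variables `KMP_AFFINITY` and `OMP_NUM_THREADS` are the ones being set (the article does not name them), and the function name, the core count of 60, and the `./my_app` binary are placeholders rather than details from the Lawson paper.

```python
import os
from itertools import product

# Affinity types accepted by the Intel OpenMP runtime on Xeon Phi (assumed here).
AFFINITIES = ("compact", "scatter", "balanced")
# Each Xeon Phi core supports up to four hardware threads.
THREADS_PER_CORE = (1, 2, 3, 4)


def sweep_environments(cores, base_env=None):
    """Yield (affinity, threads_per_core, env) for every combination.

    `cores` is the number of cores the application should use; the caller
    supplies it, since it depends on the coprocessor model.
    """
    base = dict(os.environ if base_env is None else base_env)
    for affinity, tpc in product(AFFINITIES, THREADS_PER_CORE):
        env = dict(base)  # copy so each run gets an independent environment
        env["KMP_AFFINITY"] = affinity
        env["OMP_NUM_THREADS"] = str(cores * tpc)
        yield affinity, tpc, env


# Illustrative use: 60 cores is a placeholder value, not a measured setup.
for affinity, tpc, env in sweep_environments(cores=60, base_env={}):
    print(affinity, tpc, env["OMP_NUM_THREADS"])
    # A real sweep would launch the unmodified application here, e.g.
    # subprocess.run(["./my_app"], env=env), and record power while it runs.
```

Because only the environment changes between runs, the application binary itself stays untouched, which is the point the paragraph above makes.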

Other savings can be significant, if not as dramatic, especially when multiplied by the Tianhe-2's 48,000 Intel Xeon Phi coprocessors. Shao and Brooks investigated the Linpack benchmark suite using an instruction-level energy model. They observed increases in energy efficiency as high as 10% on Linpack and between 1% and 5% on real applications. A microbenchmarking study by Choi et al. found that the Intel Xeon Phi offers energy benefits for highly irregular data processing workloads. Apparently the Xeon Phi requires an order of magnitude less energy per access during random memory access operations, which is a boon for sparse matrix and graph algorithms.
