
TechEnablement

Education, Planning, Analysis, Code


Dynamic Load Balancing using OpenMP 4.0

October 17, 2014 by Rob Farber

Gilles Civario and Michael Lysaght from ICHEC show how to take advantage of the OpenMP 4.0 standard on Xeon processors and Intel Xeon Phi coprocessors to portably and efficiently maximize the performance of an N-body kernel across all of the available hardware. The chapter authors point out that the sample code can be used as a template applicable to countless real-life codes. By adapting this template to their own algorithm, scientists and code developers alike can exploit the full potential of the hardware they have access to in a simple and straightforward manner.


A simple and portable method is introduced to dynamically balance workload between all the computing resources available on a heterogeneous platform, including the Intel Xeon Phi coprocessors. Speed-ups of up to 4x in single precision and 6.5x in double precision are reported, relative to using the two conventional Xeon processors on their own. Moreover, the method uses only a handful of standard OpenMP 4.0 features throughout, which makes it portable and future-proof. The dynamic load-balancing method described here can be readily adapted to all sorts of computational problems where the Intel Xeon Phi coprocessor can be exploited, which should make it of interest to a wide community of developers.
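To make the idea of dynamic balancing concrete, here is a minimal sketch in C of one common rebalancing step. It is not the chapter authors' code: the function names `rebalance_shares` and `split_range` are hypothetical, and the rule used (give each device a share proportional to the throughput it showed on the previous iteration) is an assumption about how such a scheme can work, not a description of the published method.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not taken from the chapter.
 * share[d]   : fraction of the work device d handled last iteration
 * elapsed[d] : wall-clock seconds device d needed for that fraction
 * On return, share[d] is proportional to the observed throughput
 * (work fraction per second) and the shares sum to 1. */
void rebalance_shares(size_t n_dev, const double *elapsed, double *share)
{
    double total = 0.0;
    assert(n_dev > 0);
    for (size_t d = 0; d < n_dev; ++d) {
        assert(elapsed[d] > 0.0);   /* assumes every device did measurable work */
        share[d] /= elapsed[d];     /* observed throughput of device d */
        total += share[d];
    }
    assert(total > 0.0);
    for (size_t d = 0; d < n_dev; ++d)
        share[d] /= total;          /* normalize so the shares sum to 1 */
}

/* Hypothetical helper: turn shares into contiguous index ranges.
 * Device d processes items begin[d] .. begin[d + 1] - 1.
 * begin must have room for n_dev + 1 entries; the last device
 * absorbs any rounding remainder. */
void split_range(size_t n_items, size_t n_dev, const double *share, size_t *begin)
{
    double acc = 0.0;
    assert(n_dev > 0);
    begin[0] = 0;
    for (size_t d = 0; d + 1 < n_dev; ++d) {
        acc += share[d];
        size_t cut = (size_t)(acc * (double)n_items);
        begin[d + 1] = cut > n_items ? n_items : cut;
    }
    begin[n_dev] = n_items;
}
```

In an OpenMP 4.0 code, each range would typically be handed either to the host threads or to a `#pragma omp target device(d)` region, with the time per device measured using `omp_get_wtime()` and fed back into `rebalance_shares` before the next iteration. A device that took twice as long as another for an equal share would receive half as much work the next time around.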


Speed‐up (in single and double precision) of several versions of an N-Body code relative to two Xeon E5‐2660 v2 Ivy Bridge processors (higher is better). (courtesy Morgan Kaufmann)

Chapter Authors

Gilles Civario

Gilles Civario joined ICHEC in June 2008, where he is now a Senior Software Architect. His main role is to design and implement tailored hardware and software solutions for users of the National Service and for ICHEC’s technology transfer client companies. Gilles is also involved in the broader aspects of the Centre’s mission, where his expertise is valuable in areas such as code installation, debugging, optimisation and hardware evaluation. He is particularly involved in all aspects related to novel architectures and their programming languages. As such, Gilles has gained wide recognition from industry developers and users alike for driving forward the development of novel architectures, such as NVIDIA CUDA-enabled GPUs and Intel Xeon Phi coprocessors. Gilles holds two Master’s degrees in Scientific Computing and Applied Mathematics from the University of Franche-Comté, France.

Michael Lysaght

Michael Lysaght leads the Novel Technologies Activity and the Intel Parallel Computing Centre at the Irish Centre for High End Computing (ICHEC), where he has a particular focus on supporting the Irish scientific user community and Irish industry in the exploitation of emerging multi-/many-core technologies. In conjunction with his role at ICHEC, Michael also leads the WP7 ‘Exploitation of HPC Tools and Techniques’ activity as part of the EU’s PRACE 3IP project. Michael joined ICHEC in 2011 after working in the UK as an HPC application expert in HECToR’s distributed Computational Science and Engineering program, where he worked on refactoring and optimising community codes for the UK research community. Prior to this he worked for three years as a UK EPSRC Postdoctoral Research Fellow in theoretical atomic physics at Queen’s University Belfast, where he pioneered the development of Time-Dependent R-Matrix Theory and associated parallel applications, including the TDRM and RMT codes. Michael obtained his PhD in physics in 2006 from University College Dublin.

Click to see the overview article “Teaching The World About Intel Xeon Phi” that contains a list of TechEnablement links about why each chapter is considered a “Parallelism Pearl” plus information about James Reinders and Jim Jeffers, the editors of High Performance Parallelism Pearls.


