
TechEnablement

Education, Planning, Analysis, Code


Optimizing for Reacting Navier‐Stokes Equations

October 8, 2014 by Rob Farber Leave a Comment

Antonio Valles and Weiqun Zhang note that the optimizations discussed in their High Performance Parallelism Pearls chapter "significantly improved concurrency on both Intel Xeon Phi coprocessors and Intel Xeon processors" by transforming a fine-grain thread-parallel approach into a coarser-grain approach that is mindful of memory allocation, and by improving vectorization. They observe that many applications are not properly optimized to exploit the large number of hardware threads per node available to hybrid (MPI + OpenMP) codes. The chapter also briefly demonstrates how new OpenMP analysis features in VTune Amplifier XE are applied to SMC, a minimalist LBNL combustion code that acts as a computational proxy for the full code, which solves the multicomponent, reacting, compressible Navier-Stokes equations.


Specifically, the chapter's code optimizations start with the adoption of a coarse-grained OpenMP approach (referred to as a ThreadBox), followed by two optimizations (stack allocation and blocking) that address its side effects, plus a final optimization that restructures loops for SIMD vectorization in Fortran.

SMC Simulation (courtesy Morgan Kaufmann)

Chapter Authors

Weiqun Zhang

Weiqun Zhang is a member of the Center for Computational Sciences and Engineering at Lawrence Berkeley National Laboratory. He received his B.S. in physics from the University of Science and Technology of China, and his Ph.D. in astronomy and astrophysics from the University of California, Santa Cruz. His research interests lie in high-performance computing, numerical methods for partial differential equations, and applications to science and engineering fields including combustion and astrophysics.

Antonio Valles

Antonio Valles is a Senior Software Engineer at Intel Corporation, currently focused on performance analysis and optimization for Intel Xeon Phi coprocessors. Antonio has analyzed and optimized software at Intel since 1997, spanning the client, mobile, and HPC segments. He loves to code and has written multiple internal post-Si and pre-Si tools to help analyze and optimize applications. He received his B.S. in Electrical Engineering from Arizona State University in 1997.

Click to see the overview article “Teaching The World About Intel Xeon Phi” that contains a list of TechEnablement links about why each chapter is considered a “Parallelism Pearl” plus information about James Reinders and Jim Jeffers, the editors of High Performance Parallelism Pearls.


Filed Under: Featured article, Featured news, News, Xeon Phi Tagged With: HPC, Intel, Intel Xeon Phi, x86



© 2023 · techenablement.com