
Scalable Out-Of-Core Solvers On A Cluster

November 7, 2014 by Rob Farber

This chapter documents the implementation of a parallel distributed-memory out-of-core (OOC) solver for performing LU and Cholesky factorizations of a large dense matrix on clusters equipped with Intel Xeon Phi coprocessors.
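For context (and not the chapter's actual distributed implementation), the sketch below shows the core loop of a blocked, right-looking Cholesky factorization in C using MKL's LAPACKE and CBLAS interfaces. In the out-of-core approach the chapter describes, the trailing matrix would live in host memory and only the active panel would be resident on the coprocessor; the function name, block size, and in-core layout here are illustrative assumptions.

```c
/* Minimal sketch of a blocked, right-looking Cholesky factorization
 * (lower triangular, column-major) using MKL.  In an out-of-core
 * variant, the trailing submatrix stays in host memory and only the
 * current panel is staged onto the coprocessor.  Illustrative only. */
#include <mkl.h>

/* Factor the n x n SPD matrix A (leading dimension lda) in place as L*L^T. */
int blocked_cholesky(int n, double *A, int lda, int nb)
{
    for (int k = 0; k < n; k += nb) {
        int kb = (n - k < nb) ? (n - k) : nb;   /* current block width  */
        int m  = n - k - kb;                    /* rows below the block */

        /* Factor the diagonal block A(k:k+kb, k:k+kb). */
        int info = LAPACKE_dpotrf(LAPACK_COL_MAJOR, 'L', kb,
                                  &A[k + (size_t)k * lda], lda);
        if (info != 0) return info;

        if (m > 0) {
            /* Panel solve: A(k+kb:n, k:k+kb) := A(k+kb:n, k:k+kb) * L_kk^{-T}. */
            cblas_dtrsm(CblasColMajor, CblasRight, CblasLower,
                        CblasTrans, CblasNonUnit, m, kb, 1.0,
                        &A[k + (size_t)k * lda], lda,
                        &A[(k + kb) + (size_t)k * lda], lda);

            /* Trailing update: A(k+kb:n, k+kb:n) -= panel * panel^T.
             * This is the step an OOC solver would apply tile by tile
             * against data streamed from host memory. */
            cblas_dsyrk(CblasColMajor, CblasLower, CblasNoTrans, m, kb, -1.0,
                        &A[(k + kb) + (size_t)k * lda], lda, 1.0,
                        &A[(k + kb) + (size_t)(k + kb) * lda], lda);
        }
    }
    return 0;
}
```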

The code was ported from CUDA, where it used high-level library routines from CUBLAS:

  • This matches well with the offload model for the coprocessor using the Intel Language Extensions for Offload (Intel LEO) #pragma compiler directives (see the sketch after this list)
  • “The translation of CUBLAS calls to using equivalent operations in MKL is quite straight-forward.”
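As a hedged illustration of those two points (not code from the chapter), the sketch below shows how a CUBLAS DGEMM call might be translated into an MKL call executed on the coprocessor through an Intel LEO offload region. The function name, array shapes, and the choice of DGEMM as the kernel are assumptions made purely for illustration.

```c
/* Hypothetical sketch: a CUBLAS DGEMM translated to MKL inside an
 * Intel LEO offload region.  Names, sizes, and the kernel choice are
 * illustrative; they are not taken from the chapter's source code. */
#include <mkl.h>

/* C := A * B on the coprocessor; all matrices n x n, column-major. */
void dgemm_on_coprocessor(int n, const double *A, const double *B, double *C)
{
    /* The CUDA original would look roughly like:
     *   cublasDgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
     *               &alpha, dA, n, dB, n, &beta, dC, n);
     */
    const double alpha = 1.0, beta = 0.0;

    /* Intel LEO: copy the operands to the coprocessor, run MKL there,
     * and copy the result back to the host. */
    #pragma offload target(mic) \
        in(A : length(n * n)) in(B : length(n * n)) out(C : length(n * n))
    {
        cblas_dgemm(CblasColMajor, CblasNoTrans, CblasNoTrans,
                    n, n, n, alpha, A, n, B, n, beta, C, n);
    }
}
```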
[Figure: performance per MIC (image courtesy Morgan Kaufmann)]

Out-of-core performance results are expressed in terms of the memory ratio, ρ, between the host and the device.
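For intuition, one plausible reading of ρ is how many device-memory-sized pieces the host-resident matrix must be streamed through the coprocessor in. The arithmetic below uses made-up sizes and is only an illustration of that reading, not the chapter's exact definition.

```c
/* Illustrative arithmetic for the host/device memory ratio rho.
 * The matrix order and coprocessor memory size are made-up numbers. */
#include <stdio.h>

int main(void)
{
    long long n          = 100000;                             /* matrix order (illustrative)   */
    long long host_bytes = n * n * (long long)sizeof(double);  /* dense double-precision matrix */
    long long dev_bytes  = 8LL * 1024 * 1024 * 1024;           /* e.g. 8 GB of coprocessor RAM  */

    double rho = (double)host_bytes / (double)dev_bytes;       /* memory ratio, host vs. device */
    printf("n = %lld, rho = %.1f\n", n, rho);
    return 0;
}
```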

[Figure: host memory ratio results (image courtesy Morgan Kaufmann)]

Strong scaling study on Beacon

The processor grid is 2ρ × ρ, where ρ² nodes are used and ρ is up to 6. (Image courtesy Morgan Kaufmann)

Chapter Authors

Eduardo D’Azevedo

Eduardo D’Azevedo is a staff scientist in the Computational Mathematics Group at Oak Ridge National Laboratory. He obtained his B.Math in 1983, M.Math in 1985, and PhD in 1989, all from the Department of Computer Science in the Faculty of Mathematics at the University of Waterloo, Ontario, Canada. D’Azevedo’s current research interests include developing highly scalable parallel solvers that can take advantage of acceleration on graphics processing units (GPUs) and enhancing the parallel efficiency of scientific applications running on leadership supercomputers. D’Azevedo has contributed to several projects in materials science and fusion in the Scientific Discovery through Advanced Computing (SciDAC) program. He developed the out-of-core and compact storage extensions for the ScaLAPACK library and has made fundamental contributions in optimal mesh generation.

Ki Sing Chan

Ki Sing Chan is an undergraduate student at the Chinese University of Hong Kong majoring in Mathematics and Information Engineering with a minor in Computer Science, and is expected to graduate in 2015. His first research experience took place at Oak Ridge National Laboratory in Tennessee during the summer break of 2013, where the work focused on the implementation of a Cholesky factorization algorithm for large dense matrices; he continued that research part-time after returning to Hong Kong. In the summer of 2014 he worked at Stanford University on a DNA sequencing research project. He is interested in becoming a software engineer while growing increasingly interested in the idea of a research career.

Shi-Quan Su

As an HPC consultant, Shiquan works closely with NICS users from various disciplines. The support he offers spans many levels, from software installation to performance improvement, and he specializes in helping users migrate their codes to the novel platforms available at NICS, such as the multi-GPGPU (general-purpose graphics processing unit) cluster. Before joining NICS, Shiquan was a postdoctoral research associate at Oak Ridge National Laboratory and Louisiana State University, where he worked on state-of-the-art large-scale Quantum Monte Carlo simulations of high transition-temperature superconducting materials. Shiquan obtained his BSc in Physics from Nanjing University in 2002, his MPhil in Physics from Peking University in 2005, and his PhD in Physics from the Chinese University of Hong Kong in 2008.

Kwai Wong

Dr. Kwai Wong is the deputy director of the Campus Programs group at JICS. He is the director of the CFD Laboratory in the Department of Mechanical, Aerospace, and Biomedical Engineering at the University of Tennessee, Knoxville. Kwai’s background includes numerical linear algebra, parallel computing, computational fluid dynamics, and the finite element method. Dr. Wong holds a BS in Aerospace Engineering, an MS in Math, and a PhD in Engineering Science from The University of Tennessee.

Click to see the overview article “Teaching The World About Intel Xeon Phi”, which contains a list of TechEnablement links explaining why each chapter is considered a “Parallelism Pearl”, plus information about James Reinders and Jim Jeffers, the editors of High Performance Parallelism Pearls.
