TechEnablement

Education, Planning, Analysis, Code

Data Transfer Using The Intel COI Library

October 30, 2014 by Rob Farber

This short chapter introduces the Intel COI library, discusses the pros and cons of different data buffers, and provides benchmarks of transfer latency and bandwidth between the host and the coprocessor. Any non-trivial application will likely need to share data between the host and the coprocessor. This valuable information is …

Performance Optimization Of Black-Scholes Pricing On Intel Xeon Phi

October 29, 2014 by Rob Farber

Who would have thought that a mere two hundred lines of code provide so many capabilities! The chapter authors (Iosif Meyerov, Alexander Sysoyev, Nikita Astafiev, and Ilya Burylov) apply their optimization expertise for Intel Xeon and Intel Xeon Phi to calculate the fair prices of a set of European options. They chose the Black-Scholes calculation for the following …

Efficient Nested Parallelism On Large Scale Systems

October 28, 2014 by Rob Farber

Choosing the right threading library is critical for application performance, as different threading libraries provide significantly different performance behavior, especially on complex systems such as the Intel Xeon Phi coprocessor and NUMA Intel Xeon processor machines. Unfortunately, choosing the right threading library is not enough; additional application …

NWChem Quantum Chemistry Simulations at Scale

October 27, 2014 by Rob Farber

This chapter describes the performance of NWChem's CCSD(T) method running on a large-scale hybrid cluster of 460 dual-socket Xeon E5-2600 series nodes, each of which is equipped with two Intel Xeon Phi 5110P coprocessor cards (a total of 62.5k hybrid cores). The chapter authors describe how, without any low-level programming, offload transfers and compute kernels have been …

Author Call for Volume 2 Of High Performance Parallelism Pearls

October 24, 2014 by Rob Farber

James Reinders and Jim Jeffers have opened up proposal submissions for another Intel Xeon and Intel Xeon Phi Pearls book tentatively titled, High Performance Parallelism Pearls – Multicore and Many-core Programming Approaches! It is expected that the submission deadline will be March 7, 2015. Proposal submission can be made here. Don't miss this opportunity to contribute …

Native File Systems on Intel Xeon Phi

October 24, 2014 by Rob Farber

A teraflop/s computational capability is useless without data. The Intel Xeon Phi family supports a number of file systems including Lustre, NFS, Fraunhofer BeeGFS® (formerly FHGFS), and the Panasas® PanFS® file system. The chapter author, Michael Hebenstreit, also discusses the importance of a correct network setup. He notes in his chapter summary (courtesy Morgan …

Integrating Intel Xeon Phi Coprocessors into a Cluster Environment

October 23, 2014 by Rob Farber

The chapter authors build on the standard Intel MPSS documentation, which provides the information required for workstation installs but not the techniques needed for successful deployment in a cluster environment. Based on multiple authors' many years of experience managing HPC clusters and specific experience with the Intel Xeon Phi coprocessor family since the …

Power Analysis on the Intel Xeon Phi Coprocessor

October 22, 2014 by Rob Farber

Power has become the limiting factor on how far we can scale an HPC cluster today. Some cluster installations are running upwards of 20,000,000 watts (20 MW) of power to solve large HPC applications. Power has now taken center stage as a key challenge we need to address in order to scale a cluster to new levels of high performance. The chapter author, Claude J. …

Heterogeneous Computing with MPI On Intel Xeon Phi

October 21, 2014 by Rob Farber

The chapter authors discuss the hardware heterogeneity found in modern clusters and then analyze a typical Intel Xeon Phi coprocessor-accelerated node on the Stampede cluster at TACC, with an eye towards how MPI is used in similar clusters and the positioning of an MPI task within the node. The performance through different communication pathways is highlighted using micro …

Concurrent Kernel Offloading On Intel Xeon Phi

October 20, 2014 by Rob Farber

Chapter 12 of High Performance Parallelism Pearls discusses optimizing performance when offloading concurrent kernels (e.g., task parallelism) to the Intel Xeon Phi coprocessor. The authors state, "Our ultimate optimization target in this chapter is to improve the computational throughput of multiple small-scale workloads on the Intel Xeon Phi coprocessor by concurrent kernel …
