Featured article Archives - Page 17 of 21

Integrating Intel Xeon Phi Coprocessors into a Cluster Environment

October 23, 2014 by Rob Farber Leave a Comment

The chapter authors build on the standard Intel MPSS documentation that provides the information required for workstation installs, but does not provide techniques needed for successful deployment in a cluster environment. Based on multiple authors' many years of experience managing HPC clusters and specific experience with the Intel Xeon Phi coprocessor family since the … [Read more...]

Power Analysis on the Intel Xeon Phi Coprocessor

October 22, 2014 by Rob Farber Leave a Comment

Power has become the limiting factor today on how far we can scale an HPC cluster today. Some cluster installations today are running upwards of 20,000,000 watts (20MW) of power to solve large HPC applications. Power has now taken center-stage as a key challenge we need to address in order to scale a cluster to new levels of high performance. The chapter author, Claude J. … [Read more...]

Heterogeneous Computing with MPI On Intel Xeon Phi

October 21, 2014 by Rob Farber Leave a Comment

The chapter authors discuss the hardware heterogeneity found in modern clusters and then analyze a typical Intel Xeon Phi coprocessor accelerated node on the Stampede cluster at TACC, with an eye towards how MPI is used in similar clusters, and the positioning an MPI task within the node. The performance through different communication pathways is highlighted using micro … [Read more...]

Concurrent Kernel Offloading On Intel Xeon Phi

October 20, 2014 by Rob Farber Leave a Comment

Chapter 12 of High Performance Parallelism Pearls discusses optimizing performance when offloading concurrent kernels (e.g. task-parallelism) to the Intel Xeon Phi coprocessor. The authors state, "Our ultimate optimization target in this chapter is to improve the computational throughput of multiple small-scale workloads on the Intel Xeon Phi coprocessor by concurrent kernel … [Read more...]

Dynamic Load Balancing using OpenMP 4.0

October 17, 2014 by Rob Farber Leave a Comment

Gilles Civario and Michael Lysaght from ICHEC show how to take advantage of the OpenMP 4.0 standard on Xeon and Intel Xeon Phi coprocessors to portably and efficiently maximize an N-body kernel on the entire available hardware. The chapter authors point out that the sample code can be used as a template applicable for countless of real live codes. By adapting this template to … [Read more...]

N-body Methods on Intel Xeon Phi Coprocessors

October 16, 2014 by Rob Farber Leave a Comment

The chapter authors (Rio Yokota and Mustafa Abdul Jabbar) achieve roughly 1.5 TF/s single-precision performance when running an optimized direct N-body kernel on an Intel Xeon Phi coprocessor. This level of performance was achieved through the use of OpenMP, SIMD directives, and _mm512 intrinsics. The authors note the strong scalability of the execution was close to ideal, … [Read more...]

A Many-Core Implementation Of The Direct N-body Problem

October 15, 2014 by Rob Farber Leave a Comment

Chapter 9 of High Performance Parallelism Pearls presents several optimizations that are usually necessary to obtain good performance on an Intel Xeon Phi coprocessor that include: introducing a softening factor, exploring the impact of single- vs. double-precision, Improving tililing, utilizing an SoA (Structure of Arrays) layout, generating code that does not maintain IEEE … [Read more...]

Optimizing Gather/Scatter Patterns On Intel Xeon Phi

October 14, 2014 by Rob Farber Leave a Comment

Many modern microarchitectures rely on single-instruction multiple-data (SIMD) execution to provide high compute capabilities in an energy efficient manner. Such microarchitectures including those employed by the most recent Intel Xeon processors and Intel Xeon Phi coprocessors are optimized and/or better suited to dealing with contiguous loads and stores than non-contiguous … [Read more...]

Deep-Learning And Numerical Optimization

October 13, 2014 by Rob Farber Leave a Comment

The massively parallel mapping and code described in this chapter is generic and can be applied to a broad spectrum of numerical optimization and machine-learning algorithms ranging from neural networks to support vector machines to expectation maximization and independent components analysis. Many of these techniques are heavily used in lucrative data-mining and social media … [Read more...]

Parallel Evaluation Of Fault Tree Expressions

October 10, 2014 by Rob Farber Leave a Comment

Readers are guided through a progression from a scalar fault tree code to one mapped effectively to Intel Xeon Phi with the open-source ispc (Intel SPMD Program Compiler). Fault trees express failure relationships between systems using Boolean logic to evaluate the vulnerability of systems based on component reliability, system redundancy, physical protection, and other — … [Read more...]

« Previous Page