TechEnablement

Education, Planning, Analysis, Code

Archives for Featured news

CreativeC GPU And Intel Xeon Phi Cluster For SC14 Class Runs Mobile In Van

November 14, 2014 by Rob Farber

Our all-day class at SC14 on Sunday, November 16, “From ‘Hello World’ to Exascale Using x86, GPUs and Intel Xeon Phi Coprocessors” (tut106s1), received more than double our expected enrollment! Students will be able to run on both Intel Xeon Phi and GPU supercomputers at TACC via an XSEDE allocation (thank you very much) and on a CreativeC supercomputer and visualization cluster …

Under $200 Intel Xeon Phi

November 13, 2014 by Rob Farber

For a limited time, Intel is selling the Intel® Xeon Phi™ Coprocessor 31S1P for under $200. This offer is designed to let software developers cost-effectively purchase systems or clusters from OEMs to modernize their code for greater levels of performance. See one of the OEMs at this link, or your Intel rep, for eligibility requirements. Additionally, as part of this developer …

Morton Order Improves Performance

November 11, 2014 by Rob Farber

Author Kerry Evans writes in his High Performance Parallelism Pearls chapter, “There are many facets to performance optimization but three issues to deal with right from the beginning are memory access, vectorization, and parallelization. Unless we can optimize these, we cannot achieve peak performance.” Specifically, this chapter examines a method of mapping multidimensional …

Sparse matrix-vector multiplication: parallelization and vectorization

November 10, 2014 by Rob Farber

The chapter authors (Albert-Jan N. Yzelman, Dirk Roose, and Karl Meerbergen) note that “Current hardware trends lead to an increasing width of vector units as well as to decreasing effective bandwidth-per-core. For sparse computations these two trends conflict.” For this reason they designed a usable and efficient data structure for vectorized sparse computations on …

Scalable Out-Of-Core Solvers On A Cluster

November 7, 2014 by Rob Farber

This chapter documents the implementation of a parallel distributed-memory out-of-core (OOC) solver for performing LU and Cholesky factorizations of a large dense matrix on clusters equipped with Intel Xeon Phi coprocessors. The code was ported from CUDA with high-level library routines in CUBLAS. This matches well with the offload model for the coprocessor using the …

Heterogeneous MPI Optimization With ITAC

November 6, 2014 by Rob Farber

This chapter focuses on the workload balance of MPI applications running in a heterogeneous cluster environment consisting of Intel Xeon processors and Intel Xeon Phi coprocessors, using a financial industry application that calculates Asian option payoffs. Three cases are considered: unbalanced symmetric MPI code, manual balancing with pre-calculated performances of the cluster …

Profiling Guided Optimization On Intel Xeon Phi

November 5, 2014 by Rob Farber

This chapter in High Performance Parallelism Pearls by Andrey Vladimirov focuses on the use of Intel VTune Amplifier XE reports to understand where to apply optimization on matrix transposition, a small and self-contained workload of great practical value. The optimization process applied to the code relies exclusively on programming in a high-level language plus utilization of …

Characterization And Optimization Methodology Applied To Stencil Computations

November 4, 2014 by Rob Farber

The chapter discusses a characterization and optimization methodology applied to a 3D finite differences (3DFD) algorithm used to solve the constant- or variable-density isotropic acoustic wave equation (Iso3DFD). From an unoptimized version to the most optimized, the authors achieved a six-fold performance improvement on Intel Xeon E5-2697v2 processors and a nearly thirty-fold …

Portable Performance with OpenCL On Intel Xeon Phi

November 3, 2014 by Rob Farber

This High Performance Parallelism Pearl shows the potential for using the OpenCL™ standard parallel programming language to deliver portable performance on Intel Xeon Phi coprocessors, Xeon processors, and many-core devices such as GPUs from multiple vendors. This portable performance can be delivered from a single program without needing multiple versions of the code, an …

High Performance Ray Tracing With Embree On Intel Xeon Phi

October 31, 2014 by Rob Farber

Ray tracing is a technique for generating images of synthetic scenes. Because ray tracing simulates the physics of light transport in the real world, it can be used to achieve high quality and even photorealistic results. The chapter authors in High Performance Parallelism Pearls describe how the Intel Embree ray tracing kernel library can be used to achieve high performance …
