• Home
  • News
  • Tutorials
  • Analysis
  • About
  • Contact

TechEnablement

Education, Planning, Analysis, Code

  • CUDA
    • News
    • Tutorials
    • CUDA Study Guide
  • OpenACC
    • News
    • Tutorials
    • OpenACC Study Guide
  • Xeon Phi
    • News
    • Tutorials
    • Intel Xeon Phi Study Guide
  • OpenCL
    • News
    • Tutorials
    • OpenCL Study Guide
  • Web/Cloud
    • News
    • Tutorials
You are here: Home / Archives for Intel

Concurrent Kernel Offloading On Intel Xeon Phi

October 20, 2014 by Rob Farber Leave a Comment

Chapter 12 of High Performance Parallelism Pearls discusses optimizing performance when offloading concurrent kernels (e.g. task-parallelism) to the Intel Xeon Phi coprocessor. The authors state, "Our ultimate optimization target in this chapter is to improve the computational throughput of multiple small-scale workloads on the Intel Xeon Phi coprocessor by concurrent kernel … [Read more...]

Augmented Reality Company Magic Leap Bought by Google for $500M

October 18, 2014 by Rob Farber Leave a Comment

The reports are that the highly secretive augmented reality company Magic Leap has been acquired by Google for $500M. The acquisition of Magic Leap signals the chocolate factory wishes to become the Hersey's chocolatier for the augmented reality industry and demonstrates the demand for augmented reality applications. TechEnablement is observing a movement towards … [Read more...]

Dynamic Load Balancing using OpenMP 4.0

October 17, 2014 by Rob Farber Leave a Comment

Gilles Civario and Michael Lysaght from ICHEC show how to take advantage of the OpenMP 4.0 standard on Xeon and Intel Xeon Phi coprocessors to portably and efficiently maximize an N-body kernel on the entire available hardware. The chapter authors point out that the sample code can be used as a template applicable for countless of real live codes. By adapting this template to … [Read more...]

Free Colfax Intensive Intel Xeon Phi Training Slides

October 16, 2014 by Rob Farber Leave a Comment

Colfax International is making the 280 page slide deck  by Andrey Vladimirov  and Vadim Karpusenko titled "Parallel Programming and Optimization with Intel Xeon Phi Coprocessors" available free to those who wish to register at this URL. This training is an intensive course for developers wishing to leverage the Intel MIC architecture. and increase their knowledge of multi-core … [Read more...]

OpenCL Included! Intel Integrated Native Developer Experience

October 16, 2014 by Rob Farber Leave a Comment

Intel just released INDE. the Intel® Integrated Native Developer Experience cross-platform development suite that is claimed to provide a complete, consistent set of C++ and Java tools, libraries, and samples for environment setup, code creation, compilation, debugging, and analysis on Intel® architecture-based devices and select capabilities on ARM-based Android devices. The … [Read more...]

N-body Methods on Intel Xeon Phi Coprocessors

October 16, 2014 by Rob Farber Leave a Comment

The chapter authors (Rio Yokota and Mustafa Abdul Jabbar) achieve roughly 1.5 TF/s single-precision performance when running an optimized direct N-body kernel on an Intel Xeon Phi coprocessor. This level of performance was achieved through the use of OpenMP, SIMD directives, and _mm512 intrinsics. The authors note the strong scalability of the execution was close to ideal, … [Read more...]

A Many-Core Implementation Of The Direct N-body Problem

October 15, 2014 by Rob Farber Leave a Comment

Chapter 9 of High Performance Parallelism Pearls presents several optimizations that are usually necessary to obtain good performance on an Intel Xeon Phi coprocessor that include: introducing a softening factor, exploring the impact of single- vs. double-precision, Improving tililing, utilizing an SoA (Structure of Arrays) layout, generating code that does not maintain IEEE … [Read more...]

Optimizing Gather/Scatter Patterns On Intel Xeon Phi

October 14, 2014 by Rob Farber Leave a Comment

Many modern microarchitectures rely on single-instruction multiple-data (SIMD) execution to provide high compute capabilities in an energy efficient manner. Such microarchitectures including those employed by the most recent Intel Xeon processors and Intel Xeon Phi coprocessors are optimized and/or better suited to dealing with contiguous loads and stores than non-contiguous … [Read more...]

Deep-­Learning And Numerical Optimization

October 13, 2014 by Rob Farber Leave a Comment

The massively parallel mapping and code described in this chapter is generic and can be applied to a broad spectrum of numerical optimization and machine-learning algorithms ranging from neural networks to support vector machines to expectation maximization and independent components analysis. Many of these techniques are heavily used in lucrative data-mining and social media … [Read more...]

Intel Xeon Phi Provides Cambridge 30x Speedup in Production COSMOS WALLS Code

October 10, 2014 by Rob Farber Leave a Comment

Professor Paul Shellard, the COSMOS Director at Cambridge University reports a 30x speedup of the heavily utilized production WALLS code and he notes "Our expectation is that all our cosmological field theory codes, like WALLS, will have similarly large speed-ups when optimized and ported to Xeon Phi."  Currently the project is transferring a larger portion of the CMB analysis … [Read more...]

« Previous Page
Next Page »

Tell us you were here

Recent Posts

Farewell to a Familiar HPC Friend

May 27, 2020 By Rob Farber Leave a Comment

TechEnablement Blog Sunset or Sunrise?

February 12, 2020 By admin Leave a Comment

The cornerstone is laid – NVIDIA acquires ARM

September 13, 2020 By Rob Farber Leave a Comment

Third-Party Use Cases Illustrate the Success of CPU-based Visualization

April 14, 2018 By admin Leave a Comment

More Tutorials

Learn how to program IBM’s ‘Deep-Learning’ SyNAPSE chip

February 5, 2016 By Rob Farber Leave a Comment

Free Intermediate-Level Deep-Learning Course by Google

January 27, 2016 By Rob Farber Leave a Comment

Intel tutorial shows how to view OpenCL assembly code

January 25, 2016 By Rob Farber Leave a Comment

More Posts from this Category

Top Posts & Pages

  • Dynamic Load Balancing using OpenMP 4.0
  • Register For Lustre's Brent Gorda Parallel Storage and Big Data HP-Cast
  • MAGMA LU Decompositions, Factorizations, and Eigensolvers for Intel Xeon Phi Coprocessors Released
  • PyFR - Python/GPU Combustion Code Shortlisted for Several HPCWire Readers Choice Awards
  • Learn how to program IBM's 'Deep-Learning' SyNAPSE chip

Archives

© 2026 · techenablement.com