AMD and PathScale announced at SC14 that they have joined the OpenACC standards committee. OpenACC provides an efficient, performance-portable path for developing massively parallel programs across a wide range of accelerators, including GPUs, many-core coprocessors, and multi-core CPUs. OpenACC has been gaining traction for parallel programming. Such a move appears … [Read more...]
DIY Augmented Reality Pinball Using Webcam, Projector, And Whiteboard
Using a webcam, projector, and whiteboard, engineers at Technion, the Israel Institute of Technology, showed how to create a pinball game in which the user draws the ramps, places computer-generated flippers and bumpers, and adds real obstacles. Use the techniques discussed in my book, CUDA Application Design and Development, and working … [Read more...]
Gary Grider Sees End Of Parallel File Systems Before He Reaches Retirement Age
HPC storage luminary Gary Grider noted in his talk "The Future of Supercomputing" at the SC14 Seagate HPC User Group meeting that he funded the development of the Lustre file system (which can stream data at a TB/s), and he believes it is possible we will see the end of such file systems before he reaches retirement age. Instead, Gary sees object-based storage similar to … [Read more...]
Seismic Changes in the Animation Industry
TechEnablement caught up with DreamWorks CTO Lincoln Wallen after his plenary invited talk at SC14. We had the opportunity to ask Lincoln about our observation of seismic changes happening within the animation industry as technology enables small and mid-sized businesses to create studio quality animated characters for television, augmented reality, and eventually for movies … [Read more...]
Inside The IBM NVIDIA Volta Plus NVLink 2017 Delivery For $325M DOE Procurement
The U.S. Department of Energy unveiled plans to build two GPU-accelerated leadership-class supercomputers (Summit at ORNL and Sierra at LLNL) in a combined $325M USD procurement. The systems, to be installed in 2017, will be based on next-generation IBM POWER servers incorporating NVIDIA® Volta GPU accelerators plus NVLink™ high-speed GPU interconnect technology. The announcement by U.S. … [Read more...]
Game Changers at SC14 – Obsidian and Landsvirkjun
TechEnablement has identified two game-changing "must watch" SC14 attendees: (1) Obsidian Strategics and (2) Iceland's Landsvirkjun National Power Company. In combination, these two organizations have the potential to convert the world's hunger for HPC and the Internet from climate-destroying coal to renewable energy. As Jeff Goodell observed, "coal was supposed to be the engine … [Read more...]
CreativeC GPU And Intel Xeon Phi Cluster For SC14 Class Runs Mobile In Van
Our all-day class at SC14 on Sunday November 16, “From ‘Hello World’ to Exascale Using x86, GPUs and Intel Xeon Phi Coprocessors” (tut106s1), received more than double our expected enrollment! Students will be able to run on both Intel Xeon Phi and GPU supercomputers at TACC via an XSEDE allocation (thank you very much) and on a CreativeC supercomputer and visualization cluster … [Read more...]
Under $200 Intel Xeon Phi
For a limited time, Intel is selling the Intel® Xeon Phi™ Coprocessor 31S1P for under $200. This offer is designed to let software developers cost-effectively purchase systems or clusters from OEMs to modernize their code for greater levels of performance. See one of the OEMs at this link, or contact your Intel rep for eligibility requirements. Additionally, as part of this developer … [Read more...]
Morton Order Improves Performance
Author Kerry Evans writes in his High Performance Parallelism Pearls chapter, "There are many facets to performance optimization but three issues to deal with right from the beginning are memory access, vectorization, and parallelization. Unless we can optimize these, we cannot achieve peak performance." Specifically, this chapter examines a method of mapping multidimensional … [Read more...]
Sparse matrix-vector multiplication: parallelization and vectorization
The chapter authors (Albert-Jan N. Yzelman, Dirk Roose, and Karl Meerbergen) note that, "Current hardware trends lead to an increasing width of vector units as well as to decreasing effective bandwidth-per-core. For sparse computations these two trends conflict." For this reason they designed a usable and efficient data structure for vectorized sparse computations on … [Read more...]