Intel engineer Robert Ioffe has posted an OpenCL QuickSort tutorial that uses nested parallelism and work-group scan functions. In particular, the tutorial shows how to use the OpenCL™ 2.0 enqueue_kernel functions, which queue kernels from the device without host intervention (much like dynamic parallelism), plus work_group_scan_exclusive_add and … [Read more...]
MAGMA LU Decompositions, Factorizations, and Eigensolvers for Intel Xeon Phi Coprocessors Released
MAGMA MIC 1.3.1 now provides implementations of MAGMA's one-sided (LU, QR, and Cholesky) and two-sided (Hessenberg, bi- and tridiagonal reductions) dense matrix factorizations, as well as linear and eigenproblem solvers, for Intel Xeon Phi Coprocessors. The MAGMA MIC 1.3.1 release adds orthogonal transformations … [Read more...]
Fine-Tuning Vectorization and Memory Traffic on Intel Xeon Phi Coprocessors
Andrey Vladimirov at Colfax International has posted source code and a paper, "Fine-Tuning Vectorization and Memory Traffic on Intel Xeon Phi Coprocessors: LU Decomposition of Small Matrices", on the Colfax site. Andrey notes, "Benchmarks show that the discussed optimizations improve the application performance on the coprocessor by a factor of 2.8 compared to the unoptimized … [Read more...]
Attend or Submit to the 3rd IWOCL May 12-13, 2015 at Stanford University
The 3rd IWOCL (International Workshop on OpenCL) takes place at Stanford University, California from Tuesday 12 to Wednesday 13 May 2015. Workshops are held on the Tuesday, followed by the one-day conference on Wednesday. The conference is accompanied by a poster session and table-top displays by sponsors. OpenCL Papers, Workshops and Posters The IWOCL 2015 call for … [Read more...]
Free IEEE OpenACC Webinar Using the PGI Compiler
Register here to view the Dec. 11, 2014 IEEE webinar on OpenACC by Michael Wolfe, a compiler engineer at PGI, who presents the latest PGI support for C++ features and looks at the roadmap for more complete PGI OpenACC support in the future. Michael also shows some significant performance enhancements that should impact all OpenACC programmers. He closes with a short … [Read more...]
Register for TACC Webcast Teaching Parallel R Using Intel Xeon Phi
Register to learn about using R - and Intel Xeon Phi accelerated R - in your HPC applications via this TACC webcast. For more information about accelerated R see the pdf of the TACC presentation, "High-Performance R". This workshop will introduce participants to data intensive computing using R on Stampede. Prior experience with R is necessary in order to benefit from the … [Read more...]
Unlike Oculus – Microsoft HoloLens Lets You Move
By itself, the 3D Microsoft HoloLens is interesting, but the big news is that the device is real-world - you can wear it while standing and walking around! http://youtu.be/aThCr0PsyuA?t=1m40s In comparison, Oculus Rift DK2 is still a closed-world, not-recommended-to-move visor. https://www.youtube.com/watch?v=0Ignn19Ajvs See our summary visor review for more … [Read more...]
Jeremy Howard – A Deep-Learning Education and Why It Can Take Your Job
An excellent TED.com talk by Jeremy Howard explains how deep learning can let computers see, hear, and comprehend sentences with human accuracy. He then looks ahead to the implications for jobs in the developed world. http://www.ted.com/talks/jeremy_howard_the_wonderful_and_terrifying_implications_of_computers_that_can_learn … [Read more...]
Intel’s $2.1B 2014 Revenue Shows Internet of Things is Here to Stay
IoT (Internet of Things) is here to stay as demonstrated by Intel's $2.1B revenue from IoT reported in the latest 2014 earnings report. IoT is characterized by a diversity of sensors, wireless radios, and truly miniature processors that enhance what devices around us can sense and do. For example, TechEnablement wrote about the new Curie module, which fits a sensor hub, a Quark … [Read more...]
Facebook Open-Sources Torch for Deep-Learning Neural Networks
Facebook has made Torch, an open source development environment for numerics, machine learning, and computer vision with a particular emphasis on deep learning and convolutional nets, available to everyone. The latest release includes GPU-optimized modules for large convolutional nets (ConvNets), as well as networks with sparse activations that are commonly used in Natural … [Read more...]