Achieve a 7.4x speedup with 8 GPUs over the performance of a single GPU through the use of task-based parallelism and concurrent kernels! Traditional GPU programming typically views the GPU as a monolithic device that runs a single parallel kernel across the entire device. This approach is fantastic when one kernel can provide enough work to keep the GPU busy. The conundrum is … [Read more...]
You are here: Home / Archives for Cirrascale
