The paper, "Accelerating Phylogenetic Inference on GPUs: an OpenACC and CUDA comparison" by University of Barcelona and Intel Barcelona Research Center claim near-CUDA performance for OpenACC - within 10% - that can be achieved when accelerating a Phylogenetic Tree code based on the popular MrBayes Markov chain Monte Carlo (MCMC) package. Comparing with state-of-art … [Read more...]
Inside NVIDIA’s Unified Memory: Multi-GPU Limitations and the Need for a cudaMadvise API Call
The CUDA 6.0 Unified Memory offers a “single-pointer-to-data” model that is similar to CUDA’s zero-copy mapped memory. Both make it trivially easy for the programmer to access memory on the CPU or GPU, but applications that use mapped memory have to perform a PCI bus transfer occur every time a memory access steps outside of a cache line while a kernel running in a Unified … [Read more...]