Analysis of Phylogenetic Tree Code Shows OpenACC Within 10% Of Native CUDA

The paper “Accelerating Phylogenetic Inference on GPUs: an OpenACC and CUDA comparison”, by researchers at the University of Barcelona and the Intel Barcelona Research Center, claims that OpenACC can achieve near-CUDA performance (within 10%) when accelerating a phylogenetic tree code based on the popular MrBayes Markov chain Monte Carlo (MCMC) package. Compared with state-of-the-art GPU implementations, the OpenACC and CUDA versions showed performance gains of up to 5.2x and 5.7x, respectively. Aside from modifications to the array storage, the authors note that only 18 lines of code had to be added to parallelize 7 functions with OpenACC. These results are within 5% of the recent OpenACC versus hand-optimized CUDA performance comparison performed by University of Illinois at Urbana-Champaign researchers on benchmarks chosen from the Rodinia benchmark suite (link).
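To give a feel for why so few added lines can be enough, here is a minimal sketch of a directive-annotated loop in C. It is not taken from the paper or from MrBayes: the function name `scale_site_likelihoods`, its parameters, and the flat array layout are all hypothetical, chosen only to show how one OpenACC directive can mark an existing loop for offload.

```c
/* Hypothetical kernel, NOT MrBayes code: multiplies each per-site value
   by a common factor. The flat float arrays are an assumed layout. */
void scale_site_likelihoods(const float *restrict in, float *restrict out,
                            int n_sites, float factor)
{
    /* A single directive asks the compiler to parallelize the loop.
       The data clauses name the array sections copied to the device
       (copyin) and back to the host (copyout). */
    #pragma acc parallel loop copyin(in[0:n_sites]) copyout(out[0:n_sites])
    for (int i = 0; i < n_sites; ++i) {
        out[i] = in[i] * factor;
    }
}
```

A compiler without OpenACC support ignores the pragma and builds the same function as ordinary serial C, which is part of what keeps the source changes small.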

Surprisingly, the University of Barcelona and Intel Barcelona researchers did not provide a comparison against a multi-core processor running the OpenACC code. A nice feature of OpenACC is that it can produce efficient code for both GPUs and multi-core processors. Standards-compliant OpenACC applications can run on the host simply by specifying the device type in the ACC_DEVICE_TYPE environment variable.
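As a rough illustration of how a program could report where it is running, the sketch below uses the OpenACC runtime call `acc_get_device_type()` and the standard `acc_device_host` constant. The helper name `openacc_target` and its return strings are hypothetical, and the `_OPENACC` guard is there so the file still builds with a compiler that has no OpenACC support.

```c
#ifdef _OPENACC
#include <openacc.h>   /* OpenACC runtime API, present only in OpenACC builds */
#endif

/* Hypothetical helper: returns a short label for the current device type. */
const char *openacc_target(void)
{
#ifdef _OPENACC
    /* acc_device_host is the standard constant for host execution;
       every other device type is lumped together as "other" here. */
    return acc_get_device_type() == acc_device_host ? "host" : "other";
#else
    return "none (built without OpenACC)";
#endif
}
```

With an OpenACC build, one might then launch the program as `ACC_DEVICE_TYPE=host ./app` (where `app` is a placeholder name) and check that the helper reports `host`. The accepted values of the variable are implementation-defined, so the exact spelling should be checked against the compiler's documentation.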
