
TechEnablement

Education, Planning, Analysis, Code


Analysis of Phylogenetic Tree Code Shows OpenACC Within 10% Of Native CUDA

September 28, 2014 by Rob Farber

The paper, “Accelerating Phylogenetic Inference on GPUs: an OpenACC and CUDA comparison,” by researchers at the University of Barcelona and the Intel Barcelona Research Center, claims that OpenACC can come within 10% of native CUDA performance when accelerating a phylogenetic tree code based on the popular MrBayes Markov chain Monte Carlo (MCMC) package. Compared with state-of-the-art GPU implementations, the OpenACC and CUDA versions showed performance gains of up to 5.2x and 5.7x, respectively. Aside from modifications to the array storage, the authors note that only 18 lines of code were needed to parallelize 7 functions with OpenACC. These results are within 5% of the recent OpenACC versus hand-optimized CUDA performance comparison performed by University of Illinois at Urbana-Champaign researchers on benchmarks chosen from the Rodinia benchmark suite (link).

Surprisingly, the University of Barcelona and Intel Barcelona researchers did not provide a comparison against a multi-core processor running the OpenACC code. A nice feature of OpenACC is that it can produce efficient code for both GPUs and multi-core processors. Standards-compliant OpenACC applications can run on the host simply by specifying the device type through the ACC_DEVICE_TYPE environment variable.
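Retargeting at run time looks roughly like the following (the binary name is hypothetical, and the accepted device-type strings vary by compiler; this is a sketch of the mechanism, not a tested recipe):

```shell
# Run the same OpenACC binary on the GPU...
export ACC_DEVICE_TYPE=nvidia
./mrbayes_acc dataset.nex

# ...then on the multi-core host, with no recompilation
export ACC_DEVICE_TYPE=host
./mrbayes_acc dataset.nex
```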
