Facebook Open Source GPU FFT 1.5x Faster Than NVIDIA CUFFT

January 2, 2015 by Rob Farber

Facebook has written a Fast Fourier Transform implementation (fbfft) that is 1.5x faster than the NVIDIA CUFFT implementation at sizes 8-64. The paper “Fast Convolutional Nets with fbfft: A GPU Performance Evaluation” attributes the performance increase to an FFT layout that avoids explicit zero padding (which can eliminate data copies), the use of autotuning, and clipping to conditionally load values (which allows more efficient control flow than explicit loop prologues and epilogues).
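
The “clipping to conditionally load a value” idea can be pictured as follows: instead of peeling a separate loop prologue and epilogue to handle the partial tile at an array boundary, every iteration clamps or predicates its load so the loop body stays uniform. The kernel below is only an illustrative sketch of that pattern under assumed names and shapes; it is not taken from the fbfft source.

    // Illustrative sketch only -- not fbfft code. A predicated ("clipped")
    // load: out-of-range indices read as zero instead of being handled by
    // a separate loop prologue/epilogue, so the loop body is the same for
    // every thread and every iteration.
    __global__ void gather_clipped(const float* __restrict__ in,
                                   float* __restrict__ out,
                                   int n, int tile)
    {
        int base = blockIdx.x * tile;
        for (int i = threadIdx.x; i < tile; i += blockDim.x) {
            int idx = base + i;
            // Conditional load: elements past the end contribute zero,
            // which also matches an implicit zero padding of the input.
            out[base + i] = (idx < n) ? in[idx] : 0.0f;
        }
    }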

The Facebook AI Research authors, Nicolas Vasilache, Jeff Johnson, Michael Mathieu, Soumith Chintala, Serkan Piantino, and Yann LeCun, “observed an overall mean speedup of 1.51× with standard deviation 0.21 and geometric mean 1.49×. The minimum speedup was 1.21×, despite sometimes performing more computations with fbfft which can only interpolate to a power of 2. These experiments exercise the zero-copy padding and lower memory footprints of fbfft compared to cuFFT.” The authors are working on additional optimizations such as tiling and bit-twiddling elision.
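
The caveat about “performing more computations” follows from fbfft only handling power-of-2 transform sizes: a problem whose padded size is not a power of 2 is rounded up to the next one, while cuFFT can also use mixed-radix sizes. A minimal sketch of that rounding, using a hypothetical helper that is not part of the fbfft API:

    // Hypothetical helper, not from the fbfft repository: round a
    // transform size up to the next power of 2. A 27-point problem,
    // for example, would be computed as a 32-point transform, doing
    // extra arithmetic that a mixed-radix size could avoid.
    __host__ __device__ static inline int next_pow2(int n)
    {
        int p = 1;
        while (p < n) p <<= 1;
        return p;  // next_pow2(27) == 32, next_pow2(64) == 64
    }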

fbcudnn Speedups (image courtesy arXiv.org)

For more information, see the arXiv.org paper “Fast Convolutional Nets with fbfft: A GPU Performance Evaluation” or the Facebook GitHub repository for fbcudnn.
