Facebook Open-Sources Torch for Deep-Learning Neural Networks

January 19, 2015 by Rob Farber

Facebook has made Torch, an open-source development environment for numerics, machine learning, and computer vision with a particular emphasis on deep learning and convolutional nets, available to everyone. The latest release includes GPU-optimized modules for large convolutional nets (ConvNets), as well as networks with sparse activations that are commonly used in natural language processing applications. The ConvNet modules include a fast FFT-based convolutional layer covered in an earlier TechEnablement article, “Facebook Open Source GPU FFT 1.5x Faster Than NVIDIA CUFFT”.
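
Why does an FFT help here? By the convolution theorem, convolution in the spatial domain becomes pointwise multiplication in the frequency domain, and a mini-batch of images and filters can share the cost of the transforms. The NumPy sketch below illustrates only that underlying identity for a single circular 2-D convolution; it is not Facebook's batched, GPU-resident implementation.

    import numpy as np

    def fft_conv2d(image, kernel):
        # Circular 2-D convolution via the convolution theorem:
        # conv(a, b) = IFFT(FFT(a) * FFT(b)).
        h, w = image.shape
        padded = np.zeros((h, w))
        padded[:kernel.shape[0], :kernel.shape[1]] = kernel  # zero-pad kernel
        spectrum = np.fft.fft2(image) * np.fft.fft2(padded)  # pointwise product
        return np.real(np.fft.ifft2(spectrum))

    # Sanity check against a direct circular convolution.
    rng = np.random.default_rng(0)
    img, ker = rng.standard_normal((8, 8)), rng.standard_normal((3, 3))
    ref = np.zeros((8, 8))
    for i in range(8):
        for j in range(8):
            for u in range(3):
                for v in range(3):
                    ref[i, j] += img[(i - u) % 8, (j - v) % 8] * ker[u, v]
    print(np.allclose(fft_conv2d(img, ker), ref))  # True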

Torch includes a number of other CUDA-based modules and containers, including:

  • Containers that allow the user to parallelize training over multiple GPUs using either the data-parallel model (mini-batch split over GPUs) or the model-parallel model (network split over multiple GPUs).
  • An optimized Lookup Table that is often used when learning embeddings of discrete objects (e.g., words) and in neural language models.
  • A hierarchical SoftMax module to speed up training over an extremely large number of classes.
  • Cross-map pooling (sometimes known as MaxOut), often used for certain types of visual and text models.
  • A GPU implementation of 1-bit SGD based on the paper by Frank Seide et al. (the core idea is sketched after this list).
  • A significantly faster Temporal Convolution layer, which computes the 1-D convolution of an input with a kernel and is typically used in ConvNets for speech recognition and natural language applications. The latest version improves upon the original Torch implementation by utilizing the same BLAS primitives in a significantly more efficient regime; observed speedups range from 3x to 10x on a single GPU, depending on the input sizes, kernel sizes, and strides (see the second sketch after this list).
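
To unpack the 1-bit SGD item above: the core idea of Seide et al. is to communicate only the sign of each gradient value between GPUs while keeping the quantization error locally and folding it into the next step, so the error averages out over time. Here is a minimal NumPy sketch of one such step; the function name and the per-tensor scaling choice are illustrative assumptions, not the Torch module's API.

    import numpy as np

    def one_bit_sgd_step(params, grad, residual, lr=0.01):
        # Fold the previous step's quantization error back in, then
        # quantize the corrected gradient to sign * (one scale per tensor).
        # Only `quantized` (1 bit/value plus the scale) would cross the wire.
        g = grad + residual
        scale = np.mean(np.abs(g))          # illustrative scaling choice
        quantized = scale * np.sign(g)
        residual = g - quantized            # error feedback for the next step
        params -= lr * quantized
        return params, residual

    # Toy usage: minimize sum(params**2) with quantized gradients.
    params, residual = np.ones(4), np.zeros(4)
    for _ in range(200):
        params, residual = one_bit_sgd_step(params, 2 * params, residual)
    print(params)  # shrinks toward zero despite the 1-bit gradients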
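
And to unpack the Temporal Convolution speedup: a standard way to drive BLAS in a more efficient regime is to lower the 1-D convolution to one large matrix multiplication (the im2col trick), replacing many small BLAS calls with a single big GEMM. The sketch below shows that lowering under assumed shapes; the blocking strategy of the actual Torch kernel is not shown.

    import numpy as np

    def temporal_conv(x, weight, stride=1):
        # x: (seq_len, in_dim), weight: (out_dim, kernel_w, in_dim).
        # Unroll each input window into one row ("im2col") so a single
        # large matrix multiply -- one optimized BLAS GEMM call -- does
        # all the arithmetic for every timestep and filter at once.
        seq_len, in_dim = x.shape
        out_dim, kw, _ = weight.shape
        out_len = (seq_len - kw) // stride + 1
        cols = np.stack([x[t * stride : t * stride + kw].ravel()
                         for t in range(out_len)])      # (out_len, kw*in_dim)
        return cols @ weight.reshape(out_dim, kw * in_dim).T

    x = np.random.randn(100, 16)     # 100 timesteps, 16 input features
    w = np.random.randn(32, 5, 16)   # 32 filters of width 5
    print(temporal_conv(x, w, stride=2).shape)  # (48, 32)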

Soumith Chintala claims in the Facebook research blog post that “Torch is widely used at a number of academic labs as well as at Google/DeepMind, Twitter, NVIDIA, AMD, Intel, and many other companies”. For more information see http://torch.ch/.

Soumith Chintala (image courtesy GitHub)

  • Interested readers can also find the TechEnablement deep-learning teaching code that achieved 13 PF/s average sustained performance in the farbopt GitHub repository. More about the parallel mapping that delivers petaflop performance on GPUs and Intel Xeon Phi can be found here.
  • NVIDIA also provides the cuDNN deep-learning library.

 
