Facebook Open-Sources Torch for Deep-Learning Neural Networks

January 19, 2015 by Rob Farber

Facebook has made Torch, an open-source development environment for numerics, machine learning, and computer vision with a particular emphasis on deep learning and convolutional nets, available to everyone. The latest release includes GPU-optimized modules for large convolutional nets (ConvNets), as well as networks with sparse activations that are commonly used in natural language processing applications. The ConvNet modules include a fast FFT-based convolutional layer covered in an earlier TechEnablement article, “Facebook Open Source GPU FFT 1.5x Faster Than NVIDIA CUFFT”.
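
Why does an FFT help here? By the convolution theorem, convolution in the spatial domain becomes pointwise multiplication in the frequency domain, and a mini-batch of images and filters can share the cost of the transforms. The NumPy sketch below illustrates only that underlying identity for a single circular 2-D convolution; it is not Facebook's batched, GPU-resident implementation.

    import numpy as np

    def fft_conv2d(image, kernel):
        # Circular 2-D convolution via the convolution theorem:
        # conv(a, b) = IFFT(FFT(a) * FFT(b)).
        h, w = image.shape
        padded = np.zeros((h, w))
        padded[:kernel.shape[0], :kernel.shape[1]] = kernel  # zero-pad kernel
        spectrum = np.fft.fft2(image) * np.fft.fft2(padded)  # pointwise product
        return np.real(np.fft.ifft2(spectrum))

    # Sanity check against a direct circular convolution.
    rng = np.random.default_rng(0)
    img, ker = rng.standard_normal((8, 8)), rng.standard_normal((3, 3))
    ref = np.zeros((8, 8))
    for i in range(8):
        for j in range(8):
            for u in range(3):
                for v in range(3):
                    ref[i, j] += img[(i - u) % 8, (j - v) % 8] * ker[u, v]
    print(np.allclose(fft_conv2d(img, ker), ref))  # True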

Torch includes a number of other CUDA-based modules and containers, including:

  • Containers that allow the user to parallelize training over multiple GPUs using either the data-parallel model (mini-batch split over GPUs) or the model-parallel model (network split over multiple GPUs).
  • An optimized Lookup Table that is often used when learning embeddings of discrete objects (e.g., words) and in neural language models.
  • A hierarchical SoftMax module to speed up training over an extremely large number of classes.
  • Cross-map pooling (sometimes known as MaxOut), often used for certain types of visual and text models.
  • A GPU implementation of 1-bit SGD based on the paper by Frank Seide et al. (the core idea is sketched after this list).
  • A significantly faster Temporal Convolution layer, which computes the 1-D convolution of an input with a kernel and is typically used in ConvNets for speech recognition and natural language applications. The latest version improves upon the original Torch implementation by utilizing the same BLAS primitives in a significantly more efficient regime; observed speedups range from 3x to 10x on a single GPU, depending on the input sizes, kernel sizes, and strides (see the second sketch after this list).
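
To unpack the 1-bit SGD item above: the core idea of Seide et al. is to communicate only the sign of each gradient value between GPUs while keeping the quantization error locally and folding it into the next step, so the error averages out over time. Here is a minimal NumPy sketch of one such step; the function name and the per-tensor scaling choice are illustrative assumptions, not the Torch module's API.

    import numpy as np

    def one_bit_sgd_step(params, grad, residual, lr=0.01):
        # Fold the previous step's quantization error back in, then
        # quantize the corrected gradient to sign * (one scale per tensor).
        # Only `quantized` (1 bit/value plus the scale) would cross the wire.
        g = grad + residual
        scale = np.mean(np.abs(g))          # illustrative scaling choice
        quantized = scale * np.sign(g)
        residual = g - quantized            # error feedback for the next step
        params -= lr * quantized
        return params, residual

    # Toy usage: minimize sum(params**2) with quantized gradients.
    params, residual = np.ones(4), np.zeros(4)
    for _ in range(200):
        params, residual = one_bit_sgd_step(params, 2 * params, residual)
    print(params)  # shrinks toward zero despite the 1-bit gradients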
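
And to unpack the Temporal Convolution speedup: a standard way to drive BLAS in a more efficient regime is to lower the 1-D convolution to one large matrix multiplication (the im2col trick), replacing many small BLAS calls with a single big GEMM. The sketch below shows that lowering under assumed shapes; the blocking strategy of the actual Torch kernel is not shown.

    import numpy as np

    def temporal_conv(x, weight, stride=1):
        # x: (seq_len, in_dim), weight: (out_dim, kernel_w, in_dim).
        # Unroll each input window into one row ("im2col") so a single
        # large matrix multiply -- one optimized BLAS GEMM call -- does
        # all the arithmetic for every timestep and filter at once.
        seq_len, in_dim = x.shape
        out_dim, kw, _ = weight.shape
        out_len = (seq_len - kw) // stride + 1
        cols = np.stack([x[t * stride : t * stride + kw].ravel()
                         for t in range(out_len)])      # (out_len, kw*in_dim)
        return cols @ weight.reshape(out_dim, kw * in_dim).T

    x = np.random.randn(100, 16)     # 100 timesteps, 16 input features
    w = np.random.randn(32, 5, 16)   # 32 filters of width 5
    print(temporal_conv(x, w, stride=2).shape)  # (48, 32)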

Soumith Chintala claims in the Facebook research blog post that “Torch is widely used at a number of academic labs as well as at Google/DeepMind, Twitter, NVIDIA, AMD, Intel, and many other companies”. For more information see http://torch.ch/.

Soumith Chintala (image courtesy GitHub)

  • Interested readers can also find the TechEnablement deep-learning teaching code that achieved 13 PF/s average sustained performance in the farbopt GitHub repository. More about the parallel mapping that delivers petaflop performance on GPUs and Intel Xeon Phi can be found here.
  • NVIDIA also provides the cuDNN deep-learning library.

 
