
TechEnablement

Education, Planning, Analysis, Code


Automatically Caption Images With Neural Networks and Vector Space Math

December 4, 2014 by Rob Farber

Imagine a magic algorithm that can create captions that accurately describe an image. The Google authors of “Show and Tell: A Neural Image Caption Generator” claim to have created a machine-learning algorithm that approaches human accuracy. If true, the value is clear: conventional text-based search methods can return relevant images as well as text, and machine-translation services can render the generated captions in other languages such as Chinese. For consumers, the implications are manifold, as shoppers can use text search to more accurately find products. Meanwhile, search providers can learn tremendous amounts about consumers from the images they post. (For more information on this latter benefit, see our earlier article, “Monetizing Image Recognition By Looking at the Background”.)

The authors (Oriol Vinyals, Alexander Toshev, Samy Bengio, and Dumitru Erhan) note their model is “based on a deep recurrent architecture that combines recent advances in computer vision and machine translation and that can be used to generate natural sentences describing an image”. Recurrent neural network architectures are computationally expensive to train because the network must be iterated to allow information to flow through both the feed-forward and recurrent (feedback) connections. In today’s technology world, massively parallel, energy-efficient GPUs or Intel Xeon Phi coprocessors are generally required to provide the floating-point capability for training.
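The data flow of such a captioner (a vision model encodes the image, then a recurrent decoder emits one word per step) can be sketched as follows. This is only a toy skeleton: the vocabulary, the sizes, and the untrained random weights are all made up, and a random projection stands in for the paper's deep CNN image encoder.

```python
import numpy as np

# Toy encoder-decoder captioner. Everything here is illustrative:
# VOCAB, HID, and all weight matrices are invented for the sketch.
rng = np.random.default_rng(42)
VOCAB = ["<start>", "<end>", "a", "dog", "on", "grass"]
HID = 8

W_img = rng.standard_normal((HID, 16)) * 0.1        # stand-in for the CNN encoder
W_h = rng.standard_normal((HID, HID)) * 0.1         # recurrent weights
W_x = rng.standard_normal((HID, len(VOCAB))) * 0.1  # word-input weights
W_out = rng.standard_normal((len(VOCAB), HID)) * 0.1

def one_hot(i):
    v = np.zeros(len(VOCAB))
    v[i] = 1.0
    return v

def generate_caption(pixels, max_len=10):
    h = np.tanh(W_img @ pixels)          # the image conditions the initial state
    word = VOCAB.index("<start>")
    caption = []
    for _ in range(max_len):
        h = np.tanh(W_h @ h + W_x @ one_hot(word))  # one recurrent step per word
        word = int(np.argmax(W_out @ h))            # greedy decoding
        if VOCAB[word] == "<end>":
            break
        caption.append(VOCAB[word])
    return caption

print(generate_caption(rng.standard_normal(16)))
```

The paper's actual model uses a trained CNN plus an LSTM decoder and also considers beam search rather than pure greedy decoding; the skeleton only shows how the image state feeds the word-by-word recurrence.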

LSTM: the memory block contains a cell c which is controlled by three gates. In blue we show the recurrent connections – the output m at time t − 1 is fed back to the memory at time t via the three gates; the cell value is fed back via the forget gate; the predicted word at time t − 1 is fed back in addition to the memory output m at time t into the Softmax for word prediction. (Image courtesy Arxiv.org)
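The gating arithmetic in the caption above can be written out directly. Here is a minimal NumPy sketch of one LSTM step (toy sizes and random weights; the word-embedding input and the Softmax word-prediction layer are omitted):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, m_prev, c_prev, W):
    """One LSTM step: input x, previous output m_prev, previous cell c_prev.
    Each weight matrix maps the concatenated [x, m_prev] to one gate."""
    z = np.concatenate([x, m_prev])
    i = sigmoid(W["i"] @ z)                    # input gate
    f = sigmoid(W["f"] @ z)                    # forget gate: feeds the old cell back
    o = sigmoid(W["o"] @ z)                    # output gate
    c = f * c_prev + i * np.tanh(W["c"] @ z)   # new cell value
    m = o * np.tanh(c)                         # memory output m at time t
    return m, c

# Toy dimensions: 4-dim input, 3-dim hidden state (so [x, m] has 7 entries).
rng = np.random.default_rng(0)
W = {k: rng.standard_normal((3, 7)) * 0.1 for k in "ifoc"}
m, c = np.zeros(3), np.zeros(3)
for t in range(5):                             # unroll a few recurrent steps
    m, c = lstm_step(rng.standard_normal(4), m, c, W)
```

Because m is the output gate times tanh of the cell, every component of the output stays bounded between -1 and 1, which is part of what makes the recurrence trainable.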

The model is trained to maximize the likelihood of the target description sentence given the training image. Experiments on several datasets demonstrate the accuracy of the model and the fluency of the language it learns solely from image descriptions.
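Concretely, the objective the paper describes maximizes, over model parameters θ, the summed log-probability of each correct caption word given the image I and the preceding words of the caption S = (S_0, …, S_N):

```latex
\theta^{*} = \arg\max_{\theta} \sum_{(I,S)} \log p(S \mid I;\, \theta),
\qquad
\log p(S \mid I) = \sum_{t=0}^{N} \log p(S_t \mid I, S_0, \ldots, S_{t-1})
```

Each conditional term is what the LSTM's Softmax layer produces at step t, so training reduces to backpropagating a per-word cross-entropy loss through the unrolled recurrence.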

Qualitative and quantitative measures indicate that the trained models are frequently quite accurate. For instance, the current state-of-the-art BLEU score (the higher the better) on the Pascal dataset is 25; the trained model yields 59, while human performance is around 69. Similar improvements in BLEU score occur on the Flickr30k dataset, from 55 to 66, and on SBU, from 19 to 27.
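BLEU works by comparing the n-grams of a candidate sentence against a reference. A much-simplified single-sentence version (real BLEU is computed over a corpus, uses up to 4-grams, multiple references, and careful smoothing) conveys the idea:

```python
from collections import Counter
import math

def bleu(candidate, reference, max_n=2):
    """Toy BLEU: geometric mean of clipped n-gram precisions times a
    brevity penalty that punishes candidates shorter than the reference."""
    precisions = []
    for n in range(1, max_n + 1):
        cand = Counter(tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1))
        ref = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
        clipped = sum(min(c, ref[g]) for g, c in cand.items())  # clip repeated n-grams
        total = max(sum(cand.values()), 1)
        precisions.append(max(clipped, 1e-9) / total)
    bp = min(1.0, math.exp(1 - len(reference) / len(candidate)))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

ref = "a dog runs on the grass".split()
print(round(100 * bleu(ref, ref)))   # identical sentences score 100
```

Scores are conventionally reported scaled by 100, which is why the Pascal numbers above range from 25 to 69 rather than 0 to 1.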

A selection of evaluation results, grouped by human rating. (Image courtesy arxiv.org)

It will be interesting to see how this technology evolves at major search sites like Google, and at companies utilizing the IBM SyNAPSE chip.


© 2018 · techenablement.com
