
TechEnablement

Education, Planning, Analysis, Code


Automatically Caption Images With Neural Networks and Vector Space Math

December 4, 2014 by Rob Farber

Imagine a magic algorithm that can create captions that accurately describe an image. The Google authors of “Show and Tell: A Neural Image Caption Generator” claim to have created a machine-learning algorithm that approaches human accuracy. If true, the value is clear, as conventional text-based search methods can then include relevant images as well as text, and machine-translation services can handle Chinese characters as well. For consumers, the implications are manifold, as shoppers can use text search to more accurately find products. Meanwhile, search providers can learn tremendous amounts about consumers from the images they post. (For more information on this latter benefit, see our earlier article, “Monetizing Image Recognition By Looking at the Background”.)

The authors (Oriol Vinyals, Alexander Toshev, Samy Bengio, and Dumitru Erhan) note their model is “based on a deep recurrent architecture that combines recent advances in computer vision and machine translation and that can be used to generate natural sentences describing an image”. Recurrent neural network architectures are computationally expensive to train because the network must be iterated to allow information to flow through both the feed-forward and the recurrent (feedback) connections. In today’s technology world, massively parallel, energy-efficient GPUs or Intel Xeon Phi coprocessors are generally required to provide the floating-point capability for training.
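At inference time, the architecture the paper describes amounts to a vision network that primes a recurrent decoder, which then emits one word at a time, feeding each prediction back in. A toy greedy-decoding sketch of that loop — the `encode` and `step` functions here are hypothetical stand-ins for the trained networks, not the paper's actual models:

```python
def caption_image(image, encode, step, end_token="<end>", max_len=20):
    """Greedy decoding sketch: image features prime the recurrent state,
    then each predicted word is fed back until an end token appears."""
    state = encode(image)                # vision features initialize the decoder
    word, words = "<start>", []
    while len(words) < max_len:
        word, state = step(word, state)  # one recurrent step: prev word + state
        if word == end_token:
            break
        words.append(word)
    return " ".join(words)

# Hypothetical stubs, just to exercise the loop:
fake_encode = lambda image: 0                       # "features" = step counter
fake_step = lambda word, s: (["a", "dog", "runs", "<end>"][s], s + 1)
print(caption_image(None, fake_encode, fake_step))  # -> a dog runs
```

The real model would replace `fake_step` with an LSTM that also conditions on the image features, and typically uses beam search rather than pure greedy decoding.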

LSTM: the memory block contains a cell c which is controlled by three gates. In blue we show the recurrent connections – the output m at time t − 1 is fed back to the memory at time t via the three gates; the cell value is fed back via the forget gate; the predicted word at time t − 1 is fed back in addition to the memory output m at time t into the Softmax for word prediction. (Image courtesy Arxiv.org)
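The gating the caption describes can be written down directly. A minimal single-step sketch with scalar toy values — the real model uses vector-valued states and trained weight matrices, and the weight dictionary `W` here is an assumption for illustration only:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, m_prev, c_prev, W):
    """One toy LSTM step. Each gate sees the current input x and the
    previous memory output m_prev; W maps each gate name to a
    (input weight, recurrent weight, bias) triple of toy scalars."""
    i = sigmoid(W["i"][0] * x + W["i"][1] * m_prev + W["i"][2])    # input gate
    f = sigmoid(W["f"][0] * x + W["f"][1] * m_prev + W["f"][2])    # forget gate
    o = sigmoid(W["o"][0] * x + W["o"][1] * m_prev + W["o"][2])    # output gate
    g = math.tanh(W["g"][0] * x + W["g"][1] * m_prev + W["g"][2])  # candidate value
    c = f * c_prev + i * g   # cell: forget part of the old memory, admit new input
    m = o * math.tanh(c)     # memory output, fed back at the next time step
    return m, c
```

In the full model, `m` at each step feeds a Softmax over the vocabulary to predict the next word, which is why the figure shows both the memory output and the previous predicted word flowing back in.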

The model is trained to maximize the likelihood of the target description sentence given the training image. Experiments on several datasets demonstrate the accuracy of the model and the fluency of the language it learns solely from image descriptions.
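In the paper's notation, this maximum-likelihood objective sums the log probability of each sentence S given its image I, which the recurrent model factors word by word via the chain rule:

```latex
\theta^{*} = \arg\max_{\theta} \sum_{(I,S)} \log p(S \mid I; \theta),
\qquad
\log p(S \mid I; \theta) = \sum_{t=0}^{N} \log p(S_t \mid I, S_0, \ldots, S_{t-1}; \theta)
```

Each conditional term is what the LSTM's Softmax output models at time step t, so training reduces to backpropagating a per-word cross-entropy loss through the unrolled network.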

Qualitative and quantitative measures indicate that the trained models are frequently quite accurate. For instance, the current state-of-the-art BLEU score (the higher the better) on the Pascal dataset is 25; the trained model yields 59, while human performance is around 69. Similar improvements in BLEU score occur on the Flickr30k dataset, from 55 to 66, and on SBU, from 19 to 27.
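BLEU works by comparing the n-grams of a candidate sentence against a reference, with a penalty for overly short candidates. A simplified sentence-level sketch — real evaluations use corpus-level BLEU-4 with smoothing and multiple references, so this toy version only illustrates the scoring idea behind the numbers above:

```python
from collections import Counter
import math

def bleu(candidate, reference, max_n=2):
    """Toy BLEU: geometric mean of clipped n-gram precisions (n = 1..max_n)
    times a brevity penalty. candidate/reference are lists of words."""
    precisions = []
    for n in range(1, max_n + 1):
        cand = Counter(tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1))
        ref = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
        clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
        precisions.append(clipped / max(sum(cand.values()), 1))
    if min(precisions) == 0:
        return 0.0  # any missing n-gram order zeroes the geometric mean
    bp = 1.0 if len(candidate) > len(reference) else math.exp(1 - len(reference) / len(candidate))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

print(round(bleu("a dog runs".split(), "a dog runs".split()), 2))  # perfect match -> 1.0
```

Scores are conventionally reported scaled by 100, which is how a perfect match of 1.0 corresponds to the quoted ceiling near 69 for humans (human references disagree with each other, so even humans do not score 100).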

A selection of evaluation results, grouped by human rating. (Image courtesy arxiv.org)

It will be interesting to see how this technology evolves at major search sites like Google and at companies utilizing the IBM SyNAPSE chip.

Filed Under: Analysis, Featured article, Featured news Tagged With: deep-learning, GPU, HPC, Intel Xeon Phi, Synapse


© 2021 · techenablement.com