Deep-Learning Behind Microsoft Cross-Language Real-Time Skype Translator

Deep learning lies at the heart of Microsoft's Skype Translator, a near real-time speech-to-speech machine translation tool that enables voice conversations between people who speak different languages. A beta version of Skype Translator is reportedly slated for release sometime in 2014. According to Microsoft, machine learning on large datasets culled from social media made Skype Translator possible. For TechEnablement, the success of this project highlights the importance of fast machine-learning algorithms and hardware, as in our exascale-capable tutorial code for GPUs and Intel Xeon Phi.

The workflow for Skype Translator appears to be similar to the following. (Disclaimer: I am not affiliated with Microsoft or this project.)

Putative Skype translate speech-to-speech workflow
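
For readers who think in code, a speech-to-speech translator is commonly described as a chain of three stages: speech recognition, text translation, and speech synthesis. The sketch below only illustrates that chaining. Every function name, the toy lexicon, and the trick of treating audio as UTF-8 bytes are assumptions made for illustration, not Microsoft's components or API.

```python
from typing import Dict

# Hypothetical toy lexicon standing in for a trained translation model.
TOY_LEXICON: Dict[str, str] = {"hello": "hallo", "world": "welt"}


def recognize_speech(audio: bytes) -> str:
    """Stand-in for speech recognition: pretend the audio is UTF-8 text."""
    return audio.decode("utf-8")


def translate_text(text: str, lexicon: Dict[str, str]) -> str:
    """Stand-in for machine translation: word-by-word lookup.

    Words missing from the lexicon pass through unchanged.
    """
    return " ".join(lexicon.get(word, word) for word in text.lower().split())


def synthesize_speech(text: str) -> bytes:
    """Stand-in for text-to-speech: encode the text back into bytes."""
    return text.encode("utf-8")


def speech_to_speech(audio: bytes) -> bytes:
    """Chain the three stages: recognize, translate, synthesize."""
    recognized = recognize_speech(audio)
    translated = translate_text(recognized, TOY_LEXICON)
    return synthesize_speech(translated)


if __name__ == "__main__":
    print(speech_to_speech(b"Hello world"))
```

In a real system each stand-in would be replaced by a trained model, which is where the deep-learning and conversational-data work described in this post comes in.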

Understanding the difference between written and spoken language is key to producing natural-sounding phrases. Microsoft enlisted the more conversational style of writing found in social media to help. The task of culling conversational phrases from social media fell to, among others, the Machine Translation team, based in Redmond and led by Arul Menezes.

“The technology is only as good as the data,” Menezes says. “One big focus has been to scale up the amount and kinds of data that go into the machine-learning training of these systems.” (quote and image courtesy of Microsoft [link])

Arul Menezes

Microsoft Research has been increasing the amount of conversational data used to fine-tune the model-based training approach. The work began with a set of 24 hours of such data, and the amount has since increased significantly.

Video discussion of deep-learning neural networks

Demo of Skype Translate in action:

