Potatoes have ears as well as eyes – or at least potato-chip bags have ears. Researchers from MIT, Microsoft, and Adobe have developed an algorithm that can reconstruct an audio signal by analyzing minute vibrations of objects depicted in video. The ACM paper "The Visual Microphone: Passive Recovery of Sound from Video" describes the algorithm. Many surfaces flexible enough to vibrate at audio frequencies appear to be usable, including potato-chip bags, aluminum foil, the surface of a glass of water, and even the leaves of a potted plant.

Essentially, the technique passes successive frames of video through a battery of image filters, which measure small fluctuations (such as changing color values at boundaries) at several different orientations (horizontal, vertical, and diagonal) and several different scales. The research team reports that intelligible audio can be recovered using a high-speed camera, while useful information such as speaker gender, the number of speakers, and potentially speaker identities can also be recovered using a commodity camera. In particular, the rolling-shutter effect of commodity camera sensors, in which each row of a frame is captured at a slightly different instant rather than the whole frame at once, can be exploited so that even commodity cameras provide useful high-frequency information. Eavesdropping is an obvious application, but Alexei Efros, an associate professor of electrical engineering and computer science at the University of California, Berkeley, believes the technique could also be used to characterize the properties of materials.
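To make the filter-bank idea concrete, here is a minimal sketch, not the authors' code: the paper itself uses a complex steerable pyramid, whereas this toy version convolves each frame with complex Gabor filters at four orientations and three scales (wavelengths), measures the response-power-weighted phase change relative to a reference frame, and sums the changes into one "audio" sample per frame. The synthetic chip-bag demo at the bottom is invented for illustration.

```python
# Simplified sketch of the visual-microphone filter-bank idea.
# NOT the authors' implementation: a Gabor bank stands in for the
# complex steerable pyramid used in the paper.
import numpy as np
from scipy.signal import fftconvolve

def gabor(theta, wavelength, size=15):
    """Complex 2-D Gabor kernel with carrier along angle `theta`."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    u = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x**2 + y**2) / (2.0 * (size / 6.0) ** 2))
    return envelope * np.exp(2j * np.pi * u / wavelength)

ORIENTATIONS = [0, np.pi / 4, np.pi / 2, 3 * np.pi / 4]  # h, diag, v, diag
WAVELENGTHS = [4, 8, 16]                                  # three scales

def motion_signal(frames):
    """One motion sample per frame: weighted phase change vs. frame 0."""
    bank = [gabor(t, w) for t in ORIENTATIONS for w in WAVELENGTHS]
    ref = [fftconvolve(frames[0], k, mode='same') for k in bank]
    samples = []
    for frame in frames:
        total = 0.0
        for k, r0 in zip(bank, ref):
            r = fftconvolve(frame, k, mode='same')
            dphi = np.angle(r * np.conj(r0))   # local phase change
            w = np.abs(r0) ** 2                # weight by response power
            total += np.sum(w * dphi) / np.sum(w)
        samples.append(total)
    sig = np.array(samples)
    return sig - sig.mean()                    # remove DC offset

# Synthetic demo: a random texture translated sub-pixel sinusoidally,
# mimicking a surface (e.g. a chip bag) vibrating at a pure tone.
rng = np.random.default_rng(0)
texture = rng.random((64, 64))
n_frames, period = 60, 10
frames = []
for t in range(n_frames):
    s = 0.3 * np.sin(2 * np.pi * t / period)   # 0.3-pixel vibration
    xs = np.arange(64) + s
    frames.append(np.array([np.interp(xs, np.arange(64), row)
                            for row in texture]))

signal = motion_signal(frames)
```

Run on the synthetic frames, `signal` oscillates with the same 10-frame period as the injected vibration, which is exactly the recovery step the paper scales up to real video and real sound.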
Looking for the next cool Tegra K1 application? This one would certainly attract interest. Also, beware your webcam.
- Abe Davis et al., “The Visual Microphone: Passive Recovery of Sound from Video,” ACM Transactions on Graphics (TOG), Volume 33, Issue 4, July 2014, Article No. 79. DOI: 10.1145/2601097.2601119.
- Video: http://youtu.be/FKXOucXB4a8

Interested? Check out our article on “Robots that See Through Solid Walls With WiFi.”