Submission deadlines are Feb. 11, 2015 (single microphone) and Feb. 19, 2015 (multiple microphone). The ASpIRE challenge asks solvers to develop innovative speech recognition systems that can be trained on conversational telephone speech yet still work well on far-field microphone data recorded in noisy, reverberant rooms. Participants will have the opportunity to evaluate their techniques on a common set of challenging data that contains significant room noise and reverberation. Whereas the Babel program seeks to develop agile and robust technology that can be rapidly applied to any human language, this challenge focuses on English-language speech recognition.
There are two evaluation conditions:
- The Single Microphone (single-mic) Condition tests the ability to mitigate noise and reverberation given one randomly selected microphone recording of speech captured in several rooms with a variety of microphones. Single-mic evaluation data will be made available at 10:00 A.M. Eastern Standard Time on 04-Feb-2015, and submissions must be received by 11:59 P.M. Eastern Standard Time on 11-Feb-2015 to be eligible for an award.
- The Multiple Microphone (multi-mic) Condition tests the ability to mitigate noise and reverberation given all of the microphone recordings of speech captured in several rooms with a variety of microphones. Multi-mic evaluation data will be made available at 10:00 A.M. Eastern Standard Time on 12-Feb-2015, and submissions must be received by 11:59 P.M. Eastern Standard Time on 19-Feb-2015 to be eligible for an award.
In both conditions, word error rate (WER) will be the objective measure of performance. Solvers may participate in either or both conditions. Separate monetary awards will be given for the best system in the single-microphone condition ($30,000) and in the multiple-microphone condition ($20,000). To win, the best system in each condition must achieve a WER at least 1% lower than that of the second-best system.
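The announcement does not spell out how WER is computed, so here is a minimal sketch of the conventional definition: align the hypothesis to the reference transcript at the word level, count substitutions (S), deletions (D), and insertions (I), and divide by the number of reference words N, i.e. WER = (S + D + I) / N. The helper `wins_by_margin` is hypothetical and assumes the 1% winning margin is absolute (one percentage point of WER); the announcement does not say whether the margin is absolute or relative. Real scoring tools also normalize text (case, punctuation, and so on), which this sketch omits.

```python
import math


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Conventional WER: (substitutions + deletions + insertions) / reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Row of the word-level edit-distance table for the previous reference
    # prefix; initially, turning an empty reference into hyp[:j] costs j insertions.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)  # deleting all of ref[:i] costs i
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


def wins_by_margin(best_wer: float, runner_up_wer: float, margin: float = 0.01) -> bool:
    """Hypothetical rule check, ASSUMING an absolute margin (WER as a fraction)."""
    gap = runner_up_wer - best_wer
    # isclose guards against floating-point error when the gap is exactly the margin.
    return gap > margin or math.isclose(gap, margin)


# One deleted word out of six reference words gives a WER of 1/6.
print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
print(wins_by_margin(0.20, 0.21))
```

Under the absolute-margin assumption, a system at 20% WER beats a runner-up at 21% but not one at 20.5%. If the rule were relative instead, the check would compare `gap / runner_up_wer` against the margin.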
TechEnablement has covered recent advances in audio collection such as recovering speech from a potato chip bag viewed through soundproof glass.