Common Voice Project

This is a slight deviation from my usual posts. I was exploring speech recognition for edge devices and came across the open-source Common Voice project from Mozilla (the team behind Firefox). Everyone in tech should know about this initiative. Common Voice is a massive multilingual collection of recorded speech for training voice-related AI. It’s entirely open source, i.e. the raw datasets, the code bases and even the ML models generated from them. The speech-to-text engine built alongside it is called Mozilla DeepSpeech, and it’s based on the 2014 Deep Speech paper from Baidu. (Deep Speech and Deep Speech 2 are great papers if you want to read the gory technical details.) I have linked the first paper below.

To date, the project has recorded over 20,000 hours of data in more than 96 languages. That’s pretty large. Of the recorded data, more than 15,000 hours are validated. The beautiful part is that the entire validation is crowd-sourced: folks across the world spend a couple of minutes here and there listening to recordings and checking that each one actually matches its transcription.
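To make the crowd-sourced validation idea concrete, here is a minimal sketch of how listener votes could turn a recording into a "validated" one. The `Clip` structure, the function names and the two-vote threshold are my own assumptions for illustration, not the project's actual code or rules.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Clip:
    """One recorded clip plus its review votes (hypothetical structure)."""
    sentence: str
    duration_s: float
    up_votes: int = 0    # listeners who said the audio matches the text
    down_votes: int = 0  # listeners who said it does not


def is_validated(clip: Clip, min_up_votes: int = 2) -> bool:
    # Assumed rule: enough listeners agreed, and they outnumber
    # the listeners who disagreed.
    return clip.up_votes >= min_up_votes and clip.up_votes > clip.down_votes


def validated_hours(clips: list[Clip]) -> float:
    # Sum the duration of validated clips only and convert seconds to hours.
    return sum(c.duration_s for c in clips if is_validated(c)) / 3600.0
```

With a rule like this, every extra listener who spends a few minutes voting pushes more recorded hours over the line into the validated set.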

Having such a massive database will help researchers develop great speech recognition applications going forward. My sincere request to everyone reading this: go to the link below and spend 3-5 minutes, maybe once a week, helping with recording or validation, whenever you are free or bored. The UI is very simple and easy to use. You just listen to the audio and click to say whether what was spoken matches the written text below it. Kinda like a captcha for audio. Or take the time to record or validate in your local language. If even a few of you contribute, it will go a long way towards extending human knowledge. This is one open-source project that can massively impact the future of speech recognition.

Links:

https://commonvoice.mozilla.org/

https://github.com/mozilla/DeepSpeech

https://arxiv.org/pdf/1412.5567.pdf

PS: Fun fact: there are only 17 hours of recorded Hindi audio on the platform, whereas English tops the list at 3,100 hours of recorded data.
