Which API to choose for speech recognition in movies?
My English leaves much to be desired, so I came up with the idea of translating videos on the fly using a speech recognition API plus a translation API. I need the smartest service available, one that can recognize speech mixed with extraneous sounds (music, crowd noise, street noise, etc.).
Has the technology matured enough for this, or am I aiming too high?
Automatic subtitles on YouTube clearly show that even ordinary English speech (bloggers talking with no particular background noise) needs quite heavy correction after recognition.
And an API such as Yandex's is suitable mainly for one-word queries, for example navigating a telephone voice menu, which is what such APIs are actually used for. That is, recognition at the level of "yes", "no", and numbers.
Or else there are systems tuned for one specific thing, such as addresses, where you can run a fuzzy search.
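To illustrate the fuzzy-search idea for a narrow domain like addresses, here is a minimal sketch using Python's standard `difflib`. The address list, the function name `match_address`, and the 0.6 cutoff are all made up for the example; a real system would load its candidates from a database and tune the threshold.

```python
import difflib

# Hypothetical list of valid addresses (illustration only).
KNOWN_ADDRESSES = [
    "12 Lenin Street",
    "7 Garden Avenue",
    "45 Victory Square",
]

def match_address(recognized, candidates=KNOWN_ADDRESSES, cutoff=0.6):
    """Return the known address closest to the recognizer output, or None."""
    # Compare case-insensitively, but return the original spelling.
    lowered = {c.lower(): c for c in candidates}
    hits = difflib.get_close_matches(
        recognized.lower(), list(lowered), n=1, cutoff=cutoff
    )
    return lowered[hits[0]] if hits else None
```

So a slightly garbled recognition result such as `"12 lenin stret"` can still be mapped to a valid entry, while input that resembles nothing in the list is rejected. This only works because the set of acceptable answers is small and known in advance, which is not the case for free-form movie dialogue.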
Are subtitles for wimps?
You really are aiming high. If you want to translate colloquial English, it is hardly possible at all.
It comes down to how people actually talk. There are a lot of reductions, a fair amount of slang, and idioms that also get slurred and can only be made out by ear. Unless you build a speech-to-phoneme converter and then add some ML on top, based on some patterns.
Question:
It is doable.
What remains is to find a ready-made API and/or build your own neural network and train it.
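For what it's worth, the glue around such an API is the easy part. Below is a rough sketch of the recognize-then-translate pipeline that writes subtitles in SRT format. The `recognize` and `translate` arguments are placeholders for whatever service or model you end up choosing; `Segment`, `subtitle_pipeline`, and the other names are invented for this example and do not belong to any real API.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Segment:
    start: float  # seconds from the beginning of the video
    end: float
    text: str

def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def to_srt(segments: List[Segment]) -> str:
    """Render segments as numbered SRT blocks separated by blank lines."""
    blocks = []
    for i, seg in enumerate(segments, 1):
        times = f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}"
        blocks.append(f"{i}\n{times}\n{seg.text}\n")
    return "\n".join(blocks)

def subtitle_pipeline(
    audio: bytes,
    recognize: Callable[[bytes], List[Segment]],  # assumed: speech-to-text backend
    translate: Callable[[str], str],              # assumed: translation backend
) -> str:
    """Recognize speech, translate each segment, return SRT text."""
    segments = recognize(audio)
    translated = [Segment(s.start, s.end, translate(s.text)) for s in segments]
    return to_srt(translated)
```

Keeping the two backends as plain callables means you can try a ready-made API first and swap in your own trained model later without touching the rest. The quality of the result still depends entirely on the recognizer.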