How to do fast speech recognition for a smart speaker?


Speech recognition

Semyon, 2021-02-06 21:14:08

Hello, I am building a smart speaker with my own hands, along the lines of Yandex Station but without AI. It will be a prototype for production and should convert speech to text and simply write it to a text file. Everything would be fine, but speech recognition through Google, Yandex and pocketsphinx takes too long; unfortunately, I could not find anything else, and Google and Yandex are not suitable for commercial use. I am considering open-source solutions because the speaker's code will be under the GPL license. Maybe there is some hardware recognition module or a clever library? Ideally the speaker would be built on an stm32, but I am ready to use something else, even an RPi if necessary; I will consider any options.


3 answer(s)


nshmyrev, 2021-02-11
@Hitreno

For offline recognition on an RPi (a 4 is better than a 3), you can use Vosk:
https://github.com/alphacep/vosk-api
It works well. Here is a demo:
https://www.youtube.com/watch?v=iRwBIrWJlcI
For details, you can ask in the Telegram chat:
https://t.me/speech_recognition_ru
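
To give an idea of what this looks like for the task in the question (speech in, text file out), here is a minimal sketch using the Vosk Python bindings. It assumes a 16-bit mono PCM WAV recording and a Vosk model already unpacked into a local directory. The function names and file paths are made up for illustration, and a real speaker would probably feed microphone audio instead of a file.

```python
import json
import wave


def append_utterance(result_json, out_path):
    """Append the recognized text from one Vosk JSON result to a text file.

    Returns the text that was written ("" if nothing was recognized).
    """
    text = json.loads(result_json).get("text", "").strip()
    if text:
        with open(out_path, "a", encoding="utf-8") as out:
            out.write(text + "\n")
    return text


def transcribe_wav(wav_path, model_path, out_path):
    """Recognize a 16-bit mono PCM WAV file offline and log each utterance."""
    # Imported here so the helper above can be used without Vosk installed.
    from vosk import Model, KaldiRecognizer

    model = Model(model_path)  # directory with an unpacked Vosk model
    with wave.open(wav_path, "rb") as wav:
        rec = KaldiRecognizer(model, wav.getframerate())
        while True:
            data = wav.readframes(4000)
            if not data:
                break
            # AcceptWaveform returns True once an utterance has ended.
            if rec.AcceptWaveform(data):
                append_utterance(rec.Result(), out_path)
        # Flush whatever is still buffered at the end of the file.
        append_utterance(rec.FinalResult(), out_path)
```

A call such as `transcribe_wav("command.wav", "model", "recognized.txt")` (hypothetical paths) would then append one line per recognized utterance to the text file.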


Dr. Bacon, 2021-02-06
@bacon

Speech recognition cannot be done on an stm32, and besides that, you would also need to understand the meaning of the text, which is orders of magnitude harder. So the answer is no.


rPman, 2021-02-06
@rPman

If you use Google, then dedicate any piece of hardware running Google Chrome to recognition, with a daemon running in it that uses the Speech API for speech recognition and synthesis.
Google and Yandex have the best offline recognition algorithms for Russian, although the latter is not really 'open' (of course, these engines are proprietary, and in general they are among the most powerful cyber-espionage tools in the world).
When I say offline, I mean that Google will not send the voice traffic over the network (and even that is not guaranteed), but a network connection is still required. I suppose this was done so that nobody could use the engine for commercial purposes, and of course for control: who, when, where.
Android has 100% offline voice engines; for example, Google Translate works and recognizes speech perfectly and very quickly. But getting an API specifically for the Google engine... I clearly remember that a few years ago people were digging around in its libraries, and then that was shut down.