Microcontrollers
Art_Sh82, 2020-01-12 10:28:42

How to create a dataset of voice commands?

Hello.
I am building a voice command recognition system based on a TensorFlow Lite neural network. It runs on an MCU with a Cortex-M4 core. I am interested in the correct procedure for preparing a dataset of recorded voice commands and then training the network. So far I have done the following: I collected (recorded) samples of each command being spoken, converted all the files to the same format, and then trained the network on this dataset. The recordings were made in silence. The system works and the commands are recognized, but only in silence. How can I make the system more tolerant of ambient noise?
What is the correct procedure in general for creating a dataset and training the network? I have read that noise is sometimes mixed in, that is, recordings of noise are added to the dataset, but I could not find any solid information on how this is done. Thanks in advance for your replies.
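One common form of the noise mixing mentioned above is to add each noise recording to a command recording at a random signal-to-noise ratio, and train on both the clean and the noisy copies. A minimal sketch, assuming 16 kHz float audio in [-1, 1]; the function names and the SNR range are my own, not from any particular library:

```python
import numpy as np

def mix_noise(speech, noise, snr_db):
    """Mix a noise clip into a speech clip at a target SNR in dB.
    Both arrays are float samples in [-1, 1] of the same length."""
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12  # avoid division by zero
    # Scale the noise so that speech_power / scaled_noise_power hits the target SNR.
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return np.clip(speech + scale * noise, -1.0, 1.0)

def augment(speech_clips, noise_clips, snr_range=(5.0, 20.0), rng=None):
    """Produce one noisy copy of each command clip at a random SNR.
    Assumes every noise clip is at least as long as every speech clip."""
    rng = rng or np.random.default_rng()
    out = []
    for speech in speech_clips:
        noise = noise_clips[rng.integers(len(noise_clips))]
        snr = rng.uniform(*snr_range)
        out.append(mix_noise(speech, noise[: len(speech)], snr))
    return out
```

The clean originals are kept in the training set alongside the noisy copies, so the network sees both conditions.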


1 answer(s)
vanyamba-electronics, 2020-01-13

I will illustrate the problem with pattern recognition.
Let's say there is a neural network that recognizes a drawn cross. If there are two strokes, one neuron fires; if the strokes intersect, another neuron fires. When both neurons fire, the output says "there is a cross".
Now let's add noise: three strokes instead of two. How should the network behave in this case? In theory, if the strokes intersect there is a cross, but this way we get a false positive, for example when the picture is not a cross but the letter H.
In some cases it is enough to detect the presence of a cross as an abstract shape, but what if this figure is the password? Then a false positive is unacceptable: the system must not grant access for both a cross and an H.
It is the same in this case. You need to filter the noise by amplitude and only then recognize the command. If the command cannot be recognized, for example because two people are talking at that moment, then it is simply not recognized. Otherwise there will be false positives, and the system will keep intruding whenever people talk near it.
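The amplitude filtering and "reject rather than guess" steps described above could look like this minimal sketch (the thresholds and function names are my own assumptions, not part of the answer):

```python
import numpy as np

def is_loud_enough(samples, energy_threshold=1e-3):
    """Amplitude gate: only pass clips whose mean energy exceeds a threshold."""
    return bool(np.mean(samples ** 2) > energy_threshold)

def decide(probs, labels, confidence_threshold=0.8):
    """Reject the prediction unless the top class is confident enough.
    Returning None (no command) is safer than a false positive."""
    best = int(np.argmax(probs))
    if probs[best] < confidence_threshold:
        return None
    return labels[best]
```

A clip would first pass through `is_loud_enough`, then through the network, and finally through `decide`, so that quiet or ambiguous audio never triggers a command.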
You can make the system smarter and have it recognize different kinds of noise. Say a command is given while a car is driving by: that is one case. Or a command is given while a child is crying: that is another case.
Such a system should work much better, but there can be quite a few such cases. Whether the microcontroller's capabilities will be enough is the question.
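One way to handle "different noises" without enumerating every case is to add explicit non-command classes to the label set, so noise-only clips get their own label instead of being forced into a command. The label names and directory layout below are assumptions for illustration; TensorFlow's speech-commands examples use a similar _silence_/_unknown_ scheme:

```python
# Hypothetical label set: commands plus explicit non-command classes.
COMMANDS = ["on", "off", "start", "stop"]
LABELS = COMMANDS + ["_silence_", "_unknown_"]

def label_clip(path):
    """Assign a label from an assumed data/<label>/<clip>.wav layout.
    Anything outside the known labels is folded into _unknown_."""
    folder = path.split("/")[-2]
    return folder if folder in LABELS else "_unknown_"
```

Noise recordings (traffic, crying, background speech) would go into the `_silence_`/`_unknown_` folders, so the network learns to output "no command" for them rather than a false positive.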
