How to process audio before feeding it to a neural network?
I am writing a neural network for sound recognition (a study project).
The sound clips range from 0.5 to 3 seconds.
I have figured out how to split the sound: I simply divide each clip into equal parts (already done), each about 0.1 seconds long.
But I don't know how to process these frames adequately before feeding them to the network. In raw form it does not work: no matter how many neurons I use, the error stays too large.
The question is how to make similar sounds distinguishable.
Something like passing the signal through a formula so that the sound becomes cleaner and more pronounced, suitable for a neural network.
A link to an article or a formula in the answer: anything will do.
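For reference, the framing step described above can be sketched like this (the sample rate and frame length are assumptions; adjust them to your data):

```python
import numpy as np

def split_into_frames(signal, sample_rate=16000, frame_seconds=0.1):
    """Split a signal into equal, non-overlapping frames of frame_seconds each.
    Any leftover samples at the end that do not fill a whole frame are dropped."""
    frame_len = int(sample_rate * frame_seconds)
    n_frames = len(signal) // frame_len
    return np.reshape(signal[:n_frames * frame_len], (n_frames, frame_len))

clip = np.random.randn(16000)          # a 1-second clip at 16 kHz
frames = split_into_frames(clip)
print(frames.shape)                    # (10, 1600)
```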
You need to reduce the dimensionality of the data by encoding the inputs: extract qualitative and quantitative features from each frame.
Transform each frame into the frequency domain (read about the FFT), slice the resulting spectrum into bands, and feed the intensity of each band to the network input.
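A minimal sketch of that pipeline with numpy (the frame length, number of bands, and window choice are assumptions, not part of the original answer):

```python
import numpy as np

def band_energies(frame, n_bands=16):
    """FFT-based features: window the frame, take the magnitude spectrum,
    split it into n_bands equal-width bands, return the mean intensity of each."""
    windowed = frame * np.hanning(len(frame))      # taper edges to reduce leakage
    spectrum = np.abs(np.fft.rfft(windowed))       # magnitude spectrum
    bands = np.array_split(spectrum, n_bands)      # equal-width frequency bands
    return np.array([b.mean() for b in bands])

frame = np.random.randn(1600)                      # one 0.1 s frame at 16 kHz
features = band_energies(frame)
print(features.shape)                              # (16,)
```

The 16-dimensional feature vector replaces the 1600 raw samples, which is exactly the dimensionality reduction the answer describes; in practice you would tune the number of bands and perhaps use perceptual (e.g. mel-spaced) bands instead of equal-width ones.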
An extended version is to do the same, but with wavelets instead of the FFT.
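As one possible reading of the wavelet variant, here is a numpy-only Haar wavelet decomposition that turns a frame into per-scale energies (a sketch; real projects would typically use a library such as PyWavelets, and the level count here is an assumption):

```python
import numpy as np

def haar_features(frame, levels=5):
    """Haar wavelet features: at each level the signal is split into pairwise
    averages (approximation) and differences (detail); the energy of each
    detail band, plus the final approximation, gives one feature per scale."""
    x = np.asarray(frame, dtype=float)
    feats = []
    for _ in range(levels):
        if len(x) < 2:
            break
        if len(x) % 2:                        # pad to even length
            x = np.append(x, x[-1])
        avg = (x[0::2] + x[1::2]) / np.sqrt(2)
        det = (x[0::2] - x[1::2]) / np.sqrt(2)
        feats.append(np.sum(det ** 2))        # energy in this detail band
        x = avg
    feats.append(np.sum(x ** 2))              # energy of the final approximation
    return np.array(feats)

frame = np.random.randn(1600)
print(haar_features(frame).shape)             # (6,)
```

Unlike fixed-width FFT bands, the wavelet scales are dyadic, so they resolve short transients and longer tonal content at different time resolutions.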