How to process audio before feeding it to a neural network?
I am writing a neural network for sound recognition (a study project).
The sound clips range from 0.5 to 3 seconds.
I have figured out how to split the sound: I simply divide each clip into equal parts (already done), each about 0.1 seconds long.
But I don't know how to process these frames adequately before feeding them to the network. In raw form it does not work: no matter how many neurons I use, the error stays too large.
The question is how to make similar sounds distinguishable.
Something like passing the signal through a formula so that the sound becomes cleaner and more pronounced, suitable for a neural network.
A link to an article or a formula in the answer: anything will do.
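For reference, the framing step described above can be sketched like this (the sample rate and frame length are assumptions; adjust them to your data):

```python
import numpy as np

def split_into_frames(signal, sample_rate=16000, frame_seconds=0.1):
    """Split a signal into equal, non-overlapping frames of frame_seconds each.
    Any leftover samples at the end that do not fill a whole frame are dropped."""
    frame_len = int(sample_rate * frame_seconds)
    n_frames = len(signal) // frame_len
    return np.reshape(signal[:n_frames * frame_len], (n_frames, frame_len))

clip = np.random.randn(16000)          # a 1-second clip at 16 kHz
frames = split_into_frames(clip)
print(frames.shape)                    # (10, 1600)
```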
You need to reduce the dimensionality of the data by encoding the inputs: extract qualitative and quantitative features from each frame.
Transform each frame into the frequency domain (read about the FFT), slice the resulting spectrum into bands, and feed the intensity of each band to the network input.
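A minimal sketch of that pipeline with numpy (the frame length, number of bands, and window choice are assumptions, not part of the original answer):

```python
import numpy as np

def band_energies(frame, n_bands=16):
    """FFT-based features: window the frame, take the magnitude spectrum,
    split it into n_bands equal-width bands, return the mean intensity of each."""
    windowed = frame * np.hanning(len(frame))      # taper edges to reduce leakage
    spectrum = np.abs(np.fft.rfft(windowed))       # magnitude spectrum
    bands = np.array_split(spectrum, n_bands)      # equal-width frequency bands
    return np.array([b.mean() for b in bands])

frame = np.random.randn(1600)                      # one 0.1 s frame at 16 kHz
features = band_energies(frame)
print(features.shape)                              # (16,)
```

The 16-dimensional feature vector replaces the 1600 raw samples, which is exactly the dimensionality reduction the answer describes; in practice you would tune the number of bands and perhaps use perceptual (e.g. mel-spaced) bands instead of equal-width ones.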
An extended version is to do the same, but with wavelets instead of the FFT.
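As one possible reading of the wavelet variant, here is a numpy-only Haar wavelet decomposition that turns a frame into per-scale energies (a sketch; real projects would typically use a library such as PyWavelets, and the level count here is an assumption):

```python
import numpy as np

def haar_features(frame, levels=5):
    """Haar wavelet features: at each level the signal is split into pairwise
    averages (approximation) and differences (detail); the energy of each
    detail band, plus the final approximation, gives one feature per scale."""
    x = np.asarray(frame, dtype=float)
    feats = []
    for _ in range(levels):
        if len(x) < 2:
            break
        if len(x) % 2:                        # pad to even length
            x = np.append(x, x[-1])
        avg = (x[0::2] + x[1::2]) / np.sqrt(2)
        det = (x[0::2] - x[1::2]) / np.sqrt(2)
        feats.append(np.sum(det ** 2))        # energy in this detail band
        x = avg
    feats.append(np.sum(x ** 2))              # energy of the final approximation
    return np.array(feats)

frame = np.random.randn(1600)
print(haar_features(frame).shape)             # (6,)
```

Unlike fixed-width FFT bands, the wavelet scales are dyadic, so they resolve short transients and longer tonal content at different time resolutions.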