Detecting voice in a (telephone) audio file and cutting it out: which Python library to use?

xmaster83, 2015-01-27 04:21:15

Python

I need to cut the speech in a telephone conversation into separate short WAV files. I have already separated the caller's and the recipient's channels; now the question is how to cut out the voice segments that remain. Is there a Python library that handles this?
Thanks

1 answer(s)

Ololesha Ololoev, 2015-02-05
@alexeygrigorev

I did it without a library. You can use the energy of the signal within a window: if the energy exceeds a certain threshold, the window contains voice. Telephone recordings are often noisy, so it makes sense to first center the signal in each window, i.e. subtract the window's mean from every sample.
Roughly like this:

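window = signal[i:(i+win_len)]                  # one window of win_len samples (NumPy array)
energy = ((window - window.mean()) ** 2).sum()  # energy after subtracting the window mean
voice = energy > threshold                      # True if the window is treated as voice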
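
To apply the per-window check above to a whole channel, something like the following sketch could work. It assumes non-overlapping windows and accepts any sequence of samples (a plain list or a NumPy array). The names `window_energy` and `voice_flags` and the default threshold are made up for illustration and are not from my original code.

```python
def window_energy(window):
    """Energy of one window after subtracting its mean (removes a constant offset)."""
    samples = [float(x) for x in window]  # floats avoid integer overflow on int16 data
    mean = sum(samples) / len(samples)
    return sum((x - mean) ** 2 for x in samples)


def voice_flags(signal, win_len=512, threshold=1.0):
    """Return one boolean per non-overlapping window: True if its energy > threshold.

    Assumption: windows do not overlap and a trailing partial window is ignored.
    The default threshold is a placeholder; it has to be tuned per recording.
    """
    flags = []
    for i in range(0, len(signal) - win_len + 1, win_len):
        flags.append(window_energy(signal[i:i + win_len]) > threshold)
    return flags
```

Because the mean is subtracted, a window with a constant offset but no variation has zero energy and is not flagged as voice.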

The algorithm is a state machine with two states, "silence" and "voice".
It switches to the "voice" state only after several consecutive windows containing voice, so that clicks and other artifacts in the recording are not picked up.
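A minimal sketch of such a two-state machine, working on a list of per-window True/False voice flags. The rule for switching back to "silence" is not described above; here it is assumed to be symmetric (several consecutive silent windows). The function name and the parameters `min_voice` and `min_silence` are hypothetical.

```python
def find_voice_segments(flags, min_voice=10, min_silence=10):
    """Return (start, end) window indices (end exclusive) of voiced segments.

    flags: iterable of booleans, one per window (True = energy above threshold).
    Assumption: leaving the "voice" state needs min_silence silent windows in a row.
    """
    flags = list(flags)
    segments = []
    state = "silence"
    run = 0        # consecutive windows that contradict the current state
    start = None   # first window of the segment currently being collected
    for idx, is_voice in enumerate(flags):
        if state == "silence":
            run = run + 1 if is_voice else 0
            if run >= min_voice:
                state = "voice"
                start = idx - run + 1  # segment begins where the voiced run began
                run = 0
        else:
            run = run + 1 if not is_voice else 0
            if run >= min_silence:
                state = "silence"
                segments.append((start, idx - run + 1))  # stop before the silent run
                run = 0
    if state == "voice":
        segments.append((start, len(flags)))  # recording ended during voice
    return segments
```

For example, with `min_voice=3` and `min_silence=2`, the flags `[False, True, False, True, True, True, True, False, True, False, False, True]` give `[(3, 9)]`: the isolated voiced windows at positions 1 and 11 are ignored as clicks. Multiply the indices by the window length to get sample offsets for cutting.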
In my case t = 10 consecutive windows and a window width of 512 samples gave the best results, but yours may differ. I don't remember exactly what silence level I used. You can take a stretch of silence, measure its energy, compare it with the energy of a stretch of voice, and use the average of the two as the threshold.
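One way to pick the threshold along these lines, as a rough sketch: it assumes you have manually picked one stretch of pure silence and one stretch of speech from the recording, and it averages the window energies within each stretch before taking the midpoint. The function names are made up for illustration.

```python
def mean_window_energy(samples, win_len=512):
    """Average mean-subtracted energy over non-overlapping windows of a stretch."""
    energies = []
    for i in range(0, len(samples) - win_len + 1, win_len):
        window = [float(x) for x in samples[i:i + win_len]]
        mean = sum(window) / len(window)
        energies.append(sum((x - mean) ** 2 for x in window))
    if not energies:
        raise ValueError("stretch is shorter than one window")
    return sum(energies) / len(energies)


def midpoint_threshold(silence, voice, win_len=512):
    """Threshold halfway between the typical silence and voice window energies."""
    return (mean_window_energy(silence, win_len)
            + mean_window_energy(voice, win_len)) / 2
```

The energy is a sum over the window, so the threshold depends on the window length: if you change `win_len`, recompute it.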