How does autotune work?

Robotex, 2012-10-12 02:26:01

Mathematics

I have often seen applications that automatically adjust what is sung into the microphone so that the voice lands on the correct notes. How do they work? What algorithms are they based on?

2 answer(s)

merlin-vrn, 2012-10-12
@merlin-vrn

1. Gate
Detects whether there is a signal at the input. It is usually adaptive. When there is no signal, the analysis and correction logic is bypassed: the input is simply delayed by the algorithm's latency (tens of ms) and copied to the output.
2. Analysis
This part is fairly simple. An FFT is taken, and the tonal part (the harmonics produced by the vocal cords) is separated from the formants. The fundamental frequency of the harmonic part is then measured.
Next, the algorithm works out how this frequency should be changed. (Several approaches are possible here.)
3. Operations
The tonal part is shifted while the formants stay in place. Then everything is reassembled.
The whole algorithm introduces some delay, which depends on the FFT window size. The windows also overlap. Overall, the delay is usually more than twice the window length.
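The gate and analysis steps above can be sketched roughly as follows. This is only an illustration, not how any particular product does it: the gate uses a fixed RMS threshold (real gates are usually adaptive), the "fundamental" is simply taken as the strongest FFT peak in an assumed vocal range, and the formant separation and resynthesis are left out entirely. All function names, the threshold, and the frequency limits are made up for the example.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def gate_open(frame, threshold=0.01):
    """Hypothetical fixed-threshold gate: True if the frame's RMS level exceeds the threshold."""
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    return rms > threshold

def estimate_pitch(frame, sample_rate, fmin=60.0, fmax=1000.0):
    """Frequency of the strongest spectral peak between fmin and fmax (assumed vocal range).

    Simplification: the strongest peak is not always the fundamental of a real voice.
    """
    n = len(frame)
    # Hann window to reduce spectral leakage
    windowed = [s * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]
    mags = [abs(c) for c in fft(windowed)[: n // 2]]
    lo = max(1, int(fmin * n / sample_rate))
    hi = min(n // 2 - 2, int(fmax * n / sample_rate))
    k = max(range(lo, hi + 1), key=lambda i: mags[i])
    # Parabolic interpolation around the peak bin for sub-bin accuracy
    a, b, c = mags[k - 1], mags[k], mags[k + 1]
    denom = a - 2 * b + c
    offset = 0.5 * (a - c) / denom if denom != 0 else 0.0
    return (k + offset) * sample_rate / n

def min_latency_ms(window_size, sample_rate):
    """Rule of thumb from the answer: the delay is at least about two window lengths."""
    return 2 * window_size / sample_rate * 1000.0

# Synthetic test frame: a 440 Hz tone with a weaker second harmonic
sample_rate, n = 8000, 1024
frame = [math.sin(2 * math.pi * 440 * i / sample_rate)
         + 0.5 * math.sin(2 * math.pi * 880 * i / sample_rate) for i in range(n)]
if gate_open(frame):
    print("estimated pitch (Hz):", estimate_pitch(frame, sample_rate))
print("latency lower bound (ms):", min_latency_ms(n, sample_rate))
```

The pure-Python FFT is here only to keep the sketch dependency-free; a real implementation would use an optimized FFT library and process overlapping frames.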
The modes of operation are:
- Automatic correction, where the frequency is replaced by that of the nearest exact note. For example, if you sing 1/8 of a tone above the note "Do" (C) of the first octave, the sound is lowered to match that note exactly.
- Guided correction (MIDI mode). The program receives the audio together with a MIDI stream of notes, and the voice is pulled to those notes. You can even sing in a monotone and the program will turn it into a melody.
- You can simply add a copy "three semitones higher". The vocalist sings alone but sounds as if accompanied by a backing vocal.
- You can also sing alone while holding a chord on a MIDI keyboard. All notes of the chord are sent to the program, which runs several shifting processes in parallel and sums the results. The effect is that you sing the chord in chorus with yourself. Used properly, the results are amazing.
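For each of the modes listed above, the only thing that changes is how the target frequency is chosen; the shift itself is then just a frequency ratio. A small sketch, assuming equal temperament with A4 = 440 Hz, standard MIDI note numbers (60 = middle C), and reading "1/8 higher" as one eighth of a whole tone (a quarter of a semitone). The function names are invented for the example.

```python
import math

A4 = 440.0  # assumed reference tuning

def freq_to_midi(freq):
    """Fractional MIDI note number for a frequency (69 = A4)."""
    return 69 + 12 * math.log2(freq / A4)

def midi_to_freq(note):
    """Equal-tempered frequency of a MIDI note number."""
    return A4 * 2 ** ((note - 69) / 12)

def auto_ratio(freq):
    """Automatic mode: ratio that moves freq onto the nearest semitone."""
    return midi_to_freq(round(freq_to_midi(freq))) / freq

def guided_ratio(freq, midi_note):
    """Guided (MIDI) mode: ratio that moves freq onto the given note."""
    return midi_to_freq(midi_note) / freq

def harmony_ratio(semitones):
    """Fixed-interval copy, e.g. 3 for 'three semitones higher'."""
    return 2 ** (semitones / 12)

def chord_ratios(freq, midi_notes):
    """Chord mode: one ratio per held note; the shifted copies are then summed."""
    return [guided_ratio(freq, n) for n in midi_notes]

# A voice a quarter of a semitone above middle C (MIDI note 60)
sung = midi_to_freq(60) * 2 ** (0.25 / 12)
print("automatic:", sung * auto_ratio(sung))          # pulled back down to middle C
print("guided to E4:", sung * guided_ratio(sung, 64))
print("harmony +3 semitones ratio:", harmony_ratio(3))
print("C major chord ratios:", chord_ratios(sung, [60, 64, 67]))
```

Each ratio would be handed to the pitch-shifting stage, which moves the tonal part by that factor while leaving the formants where they are.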
In general, this approach (formant synthesis) is normally used to distort sound, as in a vocoder. Here, however, we start from a natural voice and keep its formants, so the result also sounds closer to a natural voice.
Something like this.

mydoom, 2012-10-12
@mydoom

I once wondered about this myself but never bothered to google it, so this is off the top of my head; if I'm wrong, please correct me.
The algorithm is roughly this: take a recording, strip out the background to leave only the voice, apply a Fourier transform, and pull the harmonics that make up the voice to the desired frequencies, which correspond to the steps of the scale we want.
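The "steps of the scale" part can be illustrated with a short sketch that snaps a detected frequency to the nearest note of a chosen scale rather than to the nearest semitone. It assumes equal temperament with A4 = 440 Hz and uses C major as an example scale; the function name and the scale representation (pitch classes 0 to 11, with 0 = C) are choices made for this example only.

```python
import math

C_MAJOR = (0, 2, 4, 5, 7, 9, 11)  # pitch classes of the example scale (0 = C)

def snap_to_scale(freq, scale=C_MAJOR, a4=440.0):
    """Return the frequency of the scale note closest (in semitones) to freq."""
    midi = 69 + 12 * math.log2(freq / a4)  # fractional MIDI note number
    base = math.floor(midi)
    # Look one octave either side so a scale note is always in range
    candidates = [n for n in range(base - 12, base + 13) if n % 12 in scale]
    nearest = min(candidates, key=lambda n: abs(n - midi))
    return a4 * 2 ** ((nearest - 69) / 12)

# 280 Hz lies between C#4 and D4; C#4 is not in C major, so it is pulled to D4
print(snap_to_scale(280.0))
```

Every harmonic of the voice would then be scaled by the same ratio, target frequency divided by detected frequency, so that the harmonic structure stays intact.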