Programming
Gol, 2013-02-02 19:54:58

Audio transposition?

I'm writing a sound effects server for the virtual machine; it needs to emulate the sound of a car engine (as well as the horn and other sounds). Everything is based on ALSA.
There is a buffer into which I place a sample of the engine sound. I have figured out how to mix sounds (for example, the horn sounding at the same time as the engine): it is a plain byte-by-byte addition of the two sample buffers, engine and horn, followed by limiting the amplitude so that there is no crackling. But I can't get transposition of the engine sound to work. Transposition is a change of pitch, i.e. the faster the car goes, the higher the sound. I tried keeping only every 2nd (or 3rd, 4th) byte of the sample buffer, but the result is garbage.
Please tell me which direction to dig in. All of this has to be done on the fly. Vague memories of the FFT (Fast Fourier Transform) come to mind, but I don't understand how exactly to apply it.
UPD: Everything is done; it works and doesn't even lag.


2 answer(s)
merlin-vrn, 2013-02-03
@merlin-vrn

If you're not afraid of reading source code, grab libmodplug and see how it is done there. The principle has already been described to you by MTonly.
What you did was too drastic. Taking every second sample is a primitive special case of resampling (though perfectly correct in theory): it shifts the sound exactly one octave up. Not a little, but a whole octave, because the frequency doubles.
If you need a factor other than two, the resampling has to be done differently. Instead of taking every second sample, produce, say, every 100 output samples from 101 input samples; that makes the sound slightly higher. The simplest option is linear interpolation: both the input and the output samples are evenly spaced, but their "grids" do not coincide, so for each output sample you calculate the weights with which the two neighbouring input samples contribute.
Example:
Let the PCM samples be 15 123 53 234 54 52 35 151… We want to play them a fifth higher, i.e. for every three input samples we output two.
Then the output will be:
15, (123+53)/2, 234, (54+52)/2, 35, and so on. The "intermediate" samples, which fall at points where the original has nothing, I calculate by linear interpolation. Since the new samples lie exactly halfway between two originals, the two neighbours enter with equal weights.
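The example above can be reproduced with a few lines of Python. This is only a minimal sketch of linear-interpolation resampling, not the way libmodplug does it; the function name `resample_linear` and the `step` parameter (how many input samples to advance per output sample) are made up for illustration.

```python
def resample_linear(samples, step):
    """Read `samples` at positions 0, step, 2*step, ... with linear interpolation.

    step > 1 raises the pitch (fewer output samples), step < 1 lowers it.
    `step` is an illustrative parameter name, not taken from any library.
    """
    out = []
    last = len(samples) - 1
    n = 0
    while True:
        pos = n * step              # read position on the input grid
        if pos > last:
            break
        i = int(pos)                # input sample at or just before pos
        frac = pos - i              # how far pos lies towards sample i + 1
        if frac == 0.0:
            out.append(float(samples[i]))
        else:
            out.append(samples[i] * (1.0 - frac) + samples[i + 1] * frac)
        n += 1
    return out


pcm = [15, 123, 53, 234, 54, 52, 35, 151]
print(resample_linear(pcm, 1.5))    # a fifth up: [15.0, 88.0, 234.0, 53.0, 35.0]
print(resample_linear(pcm, 2))      # an octave up: [15.0, 53.0, 54.0, 35.0]
```

With `step = 2` the function degenerates into "take every second sample", which is the octave case discussed above.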
Another example with the same sequence: we want to make seven samples out of these eight. Output:
15, (123*5+53*1)/6, (53*4+234*2)/6, (234*3+54*3)/6, (54*2+52*4)/6, (52*1+35*5)/6, 151,…
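The weights in this example (5/6 and 1/6, then 4/6 and 2/6, and so on) follow from placing the seven output samples evenly between the first and the last input sample. A small sketch that derives them with exact fractions, assuming that mapping; the helper name `interpolation_weights` is hypothetical:

```python
from fractions import Fraction


def interpolation_weights(n_in, n_out):
    """Return (index, left_weight, right_weight) for every output sample.

    Output k is read at position k * (n_in - 1) / (n_out - 1), so the first
    and the last input samples are reproduced exactly (an assumption that
    matches the worked example, not a general rule).
    """
    step = Fraction(n_in - 1, n_out - 1)
    table = []
    for k in range(n_out):
        pos = k * step
        i = pos.numerator // pos.denominator   # integer part of the position
        frac = pos - i                         # fractional part, exact
        table.append((i, 1 - frac, frac))
    return table


pcm = [15, 123, 53, 234, 54, 52, 35, 151]
for i, w_left, w_right in interpolation_weights(len(pcm), 7):
    # The right neighbour is only needed when its weight is non-zero.
    right = pcm[i + 1] if w_right else 0
    print(i, w_left, w_right, float(pcm[i] * w_left + right * w_right))
```

Each printed row shows the index of the left neighbour, the two weights, and the interpolated value.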
Or, in general, advance the read position by 1+a per output sample, where a is a small fraction (for example a = 0.008, which shortens the sound to about 0.992 of its length):
15, 123*(1-a)+53*a, 53*(1-2a)+234*2a,… All that remains is to handle correctly the moment when n*a exceeds one.
It helps to draw a timing diagram for a given number of samples and then see what has to be calculated and how.
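One common way to deal with the fractional part growing past one is to keep the read position as an integer index plus a fraction in [0, 1) and to carry whole samples from the fraction into the index. The sketch below does that for a sample that is played in a loop; the looping, the class name `LoopedResampler` and its `render` method are assumptions made for illustration. Because the state is kept between calls, the step can change from one buffer to the next, which is what an engine sound following the car's speed needs.

```python
class LoopedResampler:
    """Linear-interpolation reader over a looped sample (illustrative sketch)."""

    def __init__(self, loop):
        self.loop = loop
        self.index = 0      # integer part of the read position
        self.frac = 0.0     # fractional part, always kept in [0, 1)

    def render(self, count, step):
        """Produce `count` output samples, advancing `step` inputs per output."""
        out = []
        n = len(self.loop)
        for _ in range(count):
            a = self.loop[self.index]
            b = self.loop[(self.index + 1) % n]     # wrap around at the loop end
            out.append(a * (1.0 - self.frac) + b * self.frac)
            self.frac += step
            whole = int(self.frac)                  # whole samples accumulated
            self.index = (self.index + whole) % n   # carry them into the index
            self.frac -= whole                      # back into [0, 1)
        return out


voice = LoopedResampler([0, 10, 20, 30])
print(voice.render(3, 1.5))   # [0.0, 15.0, 30.0]
print(voice.render(1, 1.0))   # [5.0], continuing halfway between 0 and 10
```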
For more accurate interpolation there are more advanced algorithms, in particular FIR (finite impulse response) filters; they produce fewer spurious overtones when the pitch is changed this way.
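As a rough illustration of the FIR approach, here is a sketch of a Hann-windowed sinc interpolator. It is one textbook variant and not necessarily what libmodplug or rubberband do; the function name `resample_sinc` and the `half_width` parameter (number of taps on each side) are assumptions. It is far too slow in pure Python for real-time use and is meant only to show the idea.

```python
import math


def sinc(x):
    return 1.0 if x == 0.0 else math.sin(math.pi * x) / (math.pi * x)


def resample_sinc(samples, step, half_width=8):
    """Resample with a Hann-windowed sinc kernel (a simple FIR interpolator)."""
    # When pitching up (step > 1), lower the cutoff to limit aliasing.
    cutoff = min(1.0, 1.0 / step)
    out = []
    n = 0
    while n * step <= len(samples) - 1:
        pos = n * step
        centre = int(math.floor(pos))
        acc = 0.0
        # Weighted sum over the input samples around the read position.
        for k in range(centre - half_width + 1, centre + half_width + 1):
            if 0 <= k < len(samples):
                d = pos - k
                window = 0.5 * (1.0 + math.cos(math.pi * d / half_width))
                acc += samples[k] * cutoff * sinc(cutoff * d) * window
        out.append(acc)
        n += 1
    return out


pcm = [15, 123, 53, 234, 54, 52, 35, 151]
print(resample_sinc(pcm, 1.5))
```

With `step = 1` the kernel is zero at every neighbouring sample, so the input is returned practically unchanged, which is a handy sanity check.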
And after that you will give up implementing this yourself and start using the rubberband library :)

MT, 2013-02-02
@MTonly

The general principle of sample-based synthesis is that the higher the pitch needed, the faster the sample should be played back. Doubling the playback speed raises the pitch by an octave. For intermediate semitones, a fractional factor should be used.
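Assuming the usual 12-tone equal temperament, the fractional factor for a shift of n semitones is 2^(n/12). A minimal sketch (the function name `playback_ratio` is made up for illustration):

```python
def playback_ratio(semitones):
    """Playback speed factor for a pitch shift in 12-tone equal temperament."""
    return 2.0 ** (semitones / 12.0)


print(playback_ratio(12))            # 2.0, one octave up
print(round(playback_ratio(7), 4))   # 1.4983, a fifth (close to 3/2)
print(round(playback_ratio(1), 4))   # 1.0595, one semitone up
```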
