Audio
MaxBat, 2021-08-10 17:28:50

How do I create an audio autoencoder for voices / a deepfake system for audio data?

I want to create autoencoder models of this type.

Human audio 1 -------> encoder -------> latent space 1 -------> decoder 1 -------> Human audio 1
Human audio 2 -------> encoder -------> latent space 2 -------> decoder 2 -------> Human audio 2

audio "mobile", so to speak.

I have dug through a pile of information and still don't understand what to use. Tell me where to start, how this can be implemented, and with which tools.


1 answer
dmshar, 2021-08-10
@MaxBat

In general, the essence of the question is not clear, especially since you say you searched through a pile of information and found nothing (???).
Are you looking for information on autoencoders? On audio processing? Or just "where to start" in general?
Well, let's start with the autoencoder. I can't recall good book-length literature, but there are plenty of articles on the topic:
https://towardsdatascience.com/autoencoders-overvi...
https://towardsdatascience.com/introduction-to-aut...
Using TensorFlow:
https://www.machinelearningmastery.ru/implementing...
https://russianblogs.com/article/28481357544/
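For orientation only, a minimal Keras/TensorFlow autoencoder along the lines of those articles could look like the sketch below. It assumes the audio has already been converted into fixed-size feature vectors (for example, flattened mel-spectrogram frames); the dimensions and the dummy data are made up for illustration:

import numpy as np
from tensorflow.keras import layers, models

# Assumption: audio is already preprocessed into fixed-size feature vectors,
# e.g. flattened mel-spectrogram frames of length 1024.
input_dim = 1024
latent_dim = 64

# Encoder: compress the input into a small latent vector.
encoder = models.Sequential([
    layers.Input(shape=(input_dim,)),
    layers.Dense(512, activation="relu"),
    layers.Dense(latent_dim, activation="relu"),
], name="encoder")

# Decoder: reconstruct the input from the latent vector.
decoder = models.Sequential([
    layers.Input(shape=(latent_dim,)),
    layers.Dense(512, activation="relu"),
    layers.Dense(input_dim, activation="linear"),
], name="decoder")

autoencoder = models.Sequential([encoder, decoder], name="autoencoder")
autoencoder.compile(optimizer="adam", loss="mse")

# Dummy data just to show the training call; replace with real features.
x_train = np.random.rand(256, input_dim).astype("float32")
autoencoder.fit(x_train, x_train, epochs=10, batch_size=32)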
Using PyTorch:
https://towardsdatascience.com/beginner-guide-to-v...
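The same idea in PyTorch, again only a sketch with made-up dimensions and a dummy batch in place of real audio features:

import torch
from torch import nn

# Assumption: inputs are fixed-size feature vectors (e.g. flattened spectrogram frames).
INPUT_DIM, LATENT_DIM = 1024, 64

class AudioAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder: input -> latent
        self.encoder = nn.Sequential(
            nn.Linear(INPUT_DIM, 512), nn.ReLU(),
            nn.Linear(512, LATENT_DIM), nn.ReLU(),
        )
        # Decoder: latent -> reconstruction
        self.decoder = nn.Sequential(
            nn.Linear(LATENT_DIM, 512), nn.ReLU(),
            nn.Linear(512, INPUT_DIM),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = AudioAutoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Dummy batch just to show the training loop; replace with real features.
x = torch.rand(32, INPUT_DIM)
for _ in range(10):
    optimizer.zero_grad()
    loss = loss_fn(model(x), x)   # reconstruction loss: output vs. input
    loss.backward()
    optimizer.step()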
With practical implementation examples:
https://towardsdatascience.com/autoencoders-introd...
There are even separate discussions on how to use the encoder part of an autoencoder on its own in Keras and TensorFlow:
https://coderoad.ru/39551478/How-to-use-separately...
https://coderoad.ru/51566573/Tensorflow-Keras-use...
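What those threads boil down to: give the latent layer a name (or build the encoder as its own model), then wrap a new Model around it after training. A standalone sketch of that trick, with made-up dimensions and random data in place of real audio:

import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# A small autoencoder built with the functional API; the latent layer is named
# so the encoder can be pulled out after training.
inputs = layers.Input(shape=(1024,))
latent = layers.Dense(64, activation="relu", name="latent")(inputs)
outputs = layers.Dense(1024, activation="linear")(latent)

autoencoder = tf.keras.Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="mse")

x = np.random.rand(256, 1024).astype("float32")   # dummy data for illustration
autoencoder.fit(x, x, epochs=1, batch_size=32)

# The trick from those threads: build a new Model from the input to the latent layer.
encoder_only = tf.keras.Model(inputs=autoencoder.input,
                              outputs=autoencoder.get_layer("latent").output)
codes = encoder_only.predict(x[:8])
print(codes.shape)   # (8, 64)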
So it is not clear what exactly you failed to find.
There is plenty to study in the links above, especially if you don't yet "understand what to do". Study them first, then try to ask a more specific question.
