How do I run this neural net?
I'm not good at programming at all.
I need help getting this neural net up and running.
Please explain it step by step.
https://github.com/respeecher/vae_workshop
This neural net is launched like this:
1) install the prerequisites;
2) run it.
Usually you launch it with a kick.
Thread author, since you asked the question: will you come back and show how it turned out in the end, or don't you care?
Requirements
tensorflow >= 1.5
tflearn (I'm sorry)
librosa
scipy
Usage
1) Feature extraction
Prepare a set of audio files in some folder (say, audio/) and split it into two subfolders, audio/train and audio/test. The files can be in either .wav or .ogg format. Extract features using extract_features.py:
$ cd vae_workshop
$ python extract_features.py ../audio/ ../audio_feat/
The command above will extract both compressed mel and full linear log-magnitude spectrograms and store them as .npz files in audio_feat/. See python extract_features.py -h for more details.
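The README doesn't show what extract_features.py computes internally, but the two feature types it names can be sketched in plain numpy. This is an illustrative sketch, not the script's actual code; the FFT size, hop length, sample rate, and mel-band count below are assumptions:

```python
import numpy as np

def stft_mag(wav, n_fft=1024, hop=256):
    """Frame the signal, apply a Hann window, take the magnitude FFT."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(wav) - n_fft) // hop
    frames = np.stack([wav[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))       # (frames, n_fft//2 + 1)

def mel_filterbank(sr=22050, n_fft=1024, n_mels=80):
    """Triangular filters spaced evenly on the mel scale."""
    hz2mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel2hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(hz2mel(0.0), hz2mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel2hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fb[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fb[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    return fb

sr = 22050
t = np.arange(sr) / sr                               # one second of a 440 Hz tone
wav = np.sin(2 * np.pi * 440.0 * t)

mag = stft_mag(wav)
lin = np.log(mag + 1e-5)                             # "full linear log-magnitude", 513-dim
mel = np.log(mag @ mel_filterbank().T + 1e-5)        # "compressed mel", 80-dim
print(lin.shape, mel.shape)
```

The linear spectrogram keeps all 513 frequency bins; the mel version projects them onto 80 perceptually spaced bands, which is why a separate inverter network is needed later to undo the compression.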
2) Train the VAE
Run the VAE training script:
$ python train_beta_vae.py checkpoints/vae/ ../audio_feat/train/ ../audio_feat/test/
On a 1080 Ti it takes about 1.5 hours to train the network for 300k iterations. While it is training, it is worth watching the learning dynamics in TensorBoard:
$ tensorboard --logdir checkpoints/
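The train_beta_vae.py script itself is not reproduced here, but the β-VAE objective it is named after is standard: reconstruction error plus a β-weighted KL divergence between the diagonal-Gaussian encoder posterior and a standard normal prior, which has a closed form. A minimal numpy sketch (batch size, latent dimension, and β are illustrative assumptions):

```python
import numpy as np

def beta_vae_loss(x, x_recon, mu, log_var, beta=4.0):
    """beta-VAE objective: reconstruction error + beta * KL(q(z|x) || N(0, I)).

    For a diagonal Gaussian q with mean mu and log-variance log_var,
    KL = -0.5 * sum(1 + log_var - mu^2 - exp(log_var)).
    """
    recon = np.sum((x - x_recon) ** 2, axis=-1)                    # per-example MSE
    kl = -0.5 * np.sum(1 + log_var - mu ** 2 - np.exp(log_var), axis=-1)
    return np.mean(recon + beta * kl)

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 80))      # a batch of 80-dim mel frames
mu = np.zeros((8, 16))            # encoder output: 16-dim latent mean
log_var = np.zeros((8, 16))       # encoder output: 16-dim latent log-variance

# With mu = 0 and log_var = 0 the posterior equals the prior, so KL is
# exactly zero; perfect reconstruction makes the whole loss zero.
print(beta_vae_loss(x, x, mu, log_var))   # -> 0.0
```

Setting β > 1 (as in the β-VAE paper) trades reconstruction quality for a more disentangled latent space; β = 1 recovers the plain VAE ELBO.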
3) Train the mel spectrogram inverter
A mel spectrogram is usually about 100 frames per second at 80 dimensions per frame. This is a very compressed representation, so to get nice-sounding waveforms one can use a separately trained neural network (a CBHG block, as in Wang, Yuxuan et al., "Tacotron: Towards End-to-End Speech Synthesis", 2017) to convert from, e.g., an 80-dim mel spectrogram representation to a 513-dim power spectrum, which can then be converted back to a waveform using the Griffin-Lim approximation (Daniel Griffin and Jae Lim, "Signal Estimation from Modified Short-Time Fourier Transform", 1984).
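For reference, the Griffin-Lim approximation mentioned above can be sketched in plain numpy. This is an illustrative implementation, not the repo's code (which presumably uses its librosa/scipy dependencies); the FFT size and hop length are assumptions:

```python
import numpy as np

def stft(x, n_fft=1024, hop=256):
    """Windowed short-time Fourier transform, frames along axis 0."""
    w = np.hanning(n_fft)
    n = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * w for i in range(n)])
    return np.fft.rfft(frames, axis=1)

def istft(S, n_fft=1024, hop=256):
    """Inverse STFT via windowed overlap-add with squared-window normalization."""
    w = np.hanning(n_fft)
    frames = np.fft.irfft(S, n=n_fft, axis=1) * w
    out = np.zeros((S.shape[0] - 1) * hop + n_fft)
    norm = np.zeros_like(out)
    for i, f in enumerate(frames):
        out[i * hop:i * hop + n_fft] += f
        norm[i * hop:i * hop + n_fft] += w ** 2
    return out / np.maximum(norm, 1e-8)

def griffin_lim(mag, n_iter=32):
    """Iteratively recover a phase consistent with the given magnitude spectrogram."""
    angles = np.exp(2j * np.pi * np.random.default_rng(0).random(mag.shape))
    for _ in range(n_iter):
        wav = istft(mag * angles)                 # impose target magnitude, resynthesize
        angles = np.exp(1j * np.angle(stft(wav))) # keep only the resulting phase
    return istft(mag * angles)
```

Each iteration pins the magnitude to the target and replaces the phase with that of the STFT of the current waveform estimate, so the result converges toward a waveform whose spectrogram magnitude matches the predicted one.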
Run the training script:
$ python train_inverter.py checkpoints/inverter/ ../audio_feat/train/ ../audio_feat/test/
With a 2-hour single speaker dataset, it takes about 40k iterations to reach reasonable quality. This model is much larger, so it can take several hours to train.
4) Run the notebook and test the models
$ jupyter notebook