Python
FerroPanda, 2018-04-21 14:08:17

How to calculate how much video memory is needed to train a neural network?

Is there any information on how much video-card memory a given network configuration needs for comfortable training? Or has anyone monitored memory usage while training their own network? If so, I'd like to see the network configuration, the minibatch size used during training, and how much memory it consumed.
My reason for asking: I'm just starting out with neural networks (Python + Keras) and I'm planning to buy an Nvidia card. At this early stage I don't plan to write anything overly complicated, so the question is whether there's any point in chasing gigabytes right away. If typical examples like MNIST or CIFAR-10 take no more than 1-2 GB during training, then there's no point in buying a card with 6-8-11 GB; in a year and a half I'll need to upgrade anyway, and by then the appropriate cards will have matured.
In principle, the most demanding setup I foresee is an input layer of up to 10k neurons, a hidden layer of up to 30k neurons with ReLU, and an output layer of 5-10 neurons with softmax. The training set is 100k-500k examples. If someone can tell me how much video memory this realistically needs, I'll be grateful. =)
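As a rough sanity check on the configuration above, you can estimate the weight memory from the layer sizes alone. This is a back-of-the-envelope sketch (dense layers, float32), not an exact Keras measurement; training will need several times more for gradients, optimizer state, and activations:

```python
# Rough parameter count for the layer sizes mentioned in the question:
# 10k inputs -> 30k ReLU hidden -> 10 softmax outputs (upper end of the ranges).

def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out

hidden = dense_params(10_000, 30_000)  # 300,030,000 parameters
output = dense_params(30_000, 10)      # 300,010 parameters
total = hidden + output

# float32 = 4 bytes per parameter
weights_gib = total * 4 / 1024**3
print(f"{total:,} parameters, ~{weights_gib:.2f} GiB for float32 weights alone")
```

So even before any training data is loaded, the weights of this network alone take on the order of 1.1 GiB.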


2 answer(s)
FerroPanda, 2018-12-02
@FerroPanda

I'll answer my own question. The required amount of memory is roughly the size of the training sample plus a little more. If there is more data than the video card has memory, kernel performance drops, because data is constantly swapped between RAM and video memory. In that case GPU utilization falls by a factor of 2-3, sometimes more.
If your training sample is 10+ GB, then on a 1060 the card may well stay loaded close to its maximum: the GPU is slow enough that it keeps up with processing whatever has been transferred. A 1080 or above in the same situation will constantly sit idle waiting for data, so the training time will be roughly the same.
If the sample is around 5-6 GB or less, a 1080 will be significantly faster.
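To apply this rule of thumb, you can estimate the training-sample size directly from its dimensions. A minimal sketch, assuming a dense float32 dataset at the upper end of the numbers in the question (500k examples, 10k features):

```python
# Quick check: does a dense float32 training set fit in VRAM?
examples = 500_000        # upper end of the question's 100k-500k range
features = 10_000         # one value per input neuron
bytes_per_value = 4       # float32

dataset_gib = examples * features * bytes_per_value / 1024**3
print(f"~{dataset_gib:.1f} GiB")
```

At these sizes the raw data alone is ~18.6 GiB, well past any consumer card, so by this answer's logic you would expect constant host-to-device transfers regardless of which GPU you buy.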

nedurov, 2020-01-02
@nedurov

It depends on the device you will train on.
The size of the training sample usually doesn't matter much, since it can be fed in parts, but the model itself can consume a lot of memory. On a video card, where computation is parallelized, parts of the model are duplicated in memory for the calculations. On a CPU there is less parallelization, so the model takes up less memory. This duplication is needed because the model in memory must stay unchanged throughout a training step; which batches you train on has much less effect on the amount of reserved memory.
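One concrete way the model "can eat significantly" more than its raw weight size: during training the framework keeps extra per-parameter tensors alongside the weights. A hedged sketch, assuming float32 and an Adam-style optimizer with two slot variables per parameter (the exact multiplier varies by framework and optimizer):

```python
# Why training memory is several times the raw weight size:
# weights + gradients + optimizer state all live on the device at once.

def training_bytes(n_params, bytes_per_param=4, optimizer_slots=2):
    # one copy for weights, one for gradients, plus optimizer slot tensors
    # (Adam keeps two moment estimates per parameter)
    copies = 1 + 1 + optimizer_slots
    return n_params * bytes_per_param * copies

params = 300_000_000  # roughly the dense model from the question
gib = training_bytes(params) / 1024**3
print(f"~{gib:.1f} GiB before activations and framework overhead")
```

So a ~1.1 GiB set of weights can reserve ~4.5 GiB during training, before counting activations (which do scale with batch size) and framework overhead.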
