Neural networks
BestJS, 2018-11-05 05:25:06

How do you run a neural network on large volumes of data?

I am currently using a ready-made neural network from a third-party developer.
The data for the network is stored in a JSON file, which seems to be the convention in every neural network I have come across so far.
The files already weigh about 2 GB, and they will keep growing.
What if I end up with several terabytes of data?
It seems foolish to keep such a volume in a single file, and there certainly isn't enough RAM for it.
The first idea that comes to mind is to use a database. But that immediately raises the question: how do I make the neural network work with a database, and which DB should I use?
Please share your thoughts on this: what should I study, what should I look at?

1 answer
dmshar, 2018-11-05
@BestJS

1. The difference between 2 GB and "several terabytes" is enormous. Are you sure terabyte volumes will actually be reached in the foreseeable future?
2. In the NTFS file system the theoretical maximum file size is about 16 exabytes. In practice it is somewhat less, but that should be more than enough for you.
3. Whether keeping the data in a file is foolish or not depends not on the amount of information but on what you want to do with it. If you simply store it in the right format and then feed it to your network, switching to any database will gain you nothing except slower work and HIGHER resource consumption.
4. If you do move to a database, you have two paths. Either reformat your data before loading it into the network (perhaps not all at once, but in parts) into a format the network accepts, or write your own code for working with the database and integrate it into the library you use (fortunately, many networks are available as open source). The options do not differ much from each other in complexity; a rough sketch of the first path follows this list.
5. "There is not enough RAM" only applies if you use an in-memory class of algorithm. You need to find (or write) a network that does not have this drawback; such questions can be googled with the keyword "streaming algorithm". A minimal streaming sketch also follows this list.
6. If you really do end up in the Big Data area with the need to work in streaming mode (and you are clearly not there yet), then you will have to look at Hadoop and Spark.
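
Point 4, first path: a minimal sketch in Python of pulling data out of a database in parts rather than all at once. sqlite3 is used purely as a stand-in DB, and the samples table with features/target columns is invented for illustration:

import sqlite3

def db_batches(db_path, batch_size=1000):
    # Yield rows from a (hypothetical) samples table without loading the whole table into RAM.
    conn = sqlite3.connect(db_path)
    try:
        cur = conn.cursor()
        cur.execute("SELECT features, target FROM samples")  # hypothetical schema
        while True:
            rows = cur.fetchmany(batch_size)  # fetch only the next chunk
            if not rows:
                break
            yield rows
    finally:
        conn.close()

Each yielded chunk can then be converted into whatever format the network's loading code expects before the next chunk is fetched.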
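
Point 5: a sketch of the streaming idea, assuming the data is stored one JSON object per line (JSON Lines) and the network supports some incremental train-on-batch style update (as, for example, Keras models do). The file name and field names are invented:

import json

def batches(path, batch_size=256):
    # Yield lists of parsed samples, reading the file lazily line by line.
    batch = []
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            batch.append(json.loads(line))
            if len(batch) == batch_size:
                yield batch
                batch = []
    if batch:
        yield batch  # final, possibly smaller batch

# Example usage (hypothetical model and field names):
# for epoch in range(num_epochs):
#     for batch in batches("training_data.jsonl"):
#         xs = [s["input"] for s in batch]
#         ys = [s["target"] for s in batch]
#         model.train_on_batch(xs, ys)  # incremental update, no full load

At no point does more than one batch need to sit in memory, which is exactly what the "streaming algorithm" keyword is about.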
