Where to store an almost infinite number of records (111 * 10^29)?

SQL

Alex Wells, 2016-04-27 17:10:29

Hello. I need to store entries of the form key => value. The number of records is 111 * 10^29 (according to my algorithm). The key is a number or a string up to 35 characters long; the value is a string up to 60 characters long. Question: where can such a huge amount of data be stored? Searching this database may take 20-25 seconds. I understand that the data volume is huge, but could the database be kept in RAM (and how much memory would it take in different languages)? Is that possible?
Thanks in advance.

7 answer(s)

Rsa97, 2016-04-27
@Rsa97

What are you planning to store it on?
95 * 111 * 10^29 bytes (35 + 60 bytes per record) ≈ 10^33 bytes ≈ 10^21 terabytes
So you buy a hundred million million million (10^20) 10 TB hard drives and get your own storage.
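The estimate can be reproduced with a few lines of Python. This is a back-of-the-envelope sketch assuming one byte per character, decimal terabytes, and no per-record, index, or replication overhead, so the real figure would only be larger.

```python
# Storage estimate using the sizes from the question.
# Assumptions: ASCII text (1 byte per character), decimal units,
# no per-record overhead, indexes or replicas counted.
KEY_BYTES = 35
VALUE_BYTES = 60
RECORDS = 111 * 10**29

total_bytes = (KEY_BYTES + VALUE_BYTES) * RECORDS
terabytes = total_bytes / 10**12            # 1 TB = 10^12 bytes
drives_10tb = total_bytes / (10 * 10**12)   # number of 10 TB drives

print(f"total bytes : {total_bytes:.3e}")
print(f"terabytes   : {terabytes:.3e}")
print(f"10 TB drives: {drives_10tb:.3e}")
```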

Dark Hole, 2016-04-27
@abyrkov

On an infinite hard drive, of course. And how did you come up with such nonsense?

ThunderCat, 2016-04-28
@ThunderCat

100,000,000,000,000,000,000,000,000,000,000,000 bytes...
won't fit on a flash drive (

Alexey, 2016-04-27
@alsopub

You can store such a large amount of data in the same place you generate it from: the algorithm.
P.S. Such is the question, such is the answer.
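A minimal sketch of that idea, assuming each value is a deterministic function of its key: nothing is stored, and a "lookup" simply recomputes the value. `generate_value` is a hypothetical placeholder (here a SHA-256 digest truncated to 60 hex characters), since the asker's actual algorithm is not shown.

```python
import hashlib


def generate_value(key: str) -> str:
    """Hypothetical stand-in for the asker's algorithm: deterministically
    derives a 60-character value from the key."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()  # 64 hex chars
    return digest[:60]


def lookup(key: str) -> str:
    # No storage at all: the "database" is the function itself,
    # so every lookup is just a recomputation.
    return generate_value(key)


print(lookup("12345"))
print(lookup("12345") == lookup("12345"))  # same key, same value every time
```

This only works if the values really can be derived from the key; if the algorithm must be run forward from the start to reach a given record, the cost moves from storage to computation time.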

MetaDone, 2016-04-27
@MetaDone

https://aws.amazon.com/documentation/dynamodb/

Maxim Kudryavtsev, 2016-04-28
@kumaxim

First, check your algorithm again. Most likely it produces plenty of duplicates: if not 100% of the data, then at least some pieces will certainly repeat. I do not believe that all of it is truly unique.
The first thing to do is move the common pieces of your data into a separate field. Do you know the tree data structure? The common piece is stored at the top of the tree, each node stores links to its child nodes holding the remaining unique data, and so on. In principle, you can have as many nesting levels as you need.
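One common way to store shared pieces only once is a prefix tree (trie), where keys with a common prefix share the nodes for that prefix. The sketch below is a swapped-in illustration of that idea, not the answerer's exact design; the class names and sample keys are hypothetical.

```python
from typing import Dict, Optional


class TrieNode:
    __slots__ = ("children", "value")

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}  # one child per next character
        self.value: Optional[str] = None           # set only where a key ends


class Trie:
    """Prefix tree: keys that share a prefix share the nodes for it."""

    def __init__(self) -> None:
        self.root = TrieNode()
        self.node_count = 1  # includes the root

    def insert(self, key: str, value: str) -> None:
        node = self.root
        for ch in key:
            if ch not in node.children:
                node.children[ch] = TrieNode()
                self.node_count += 1
            node = node.children[ch]
        node.value = value

    def get(self, key: str) -> Optional[str]:
        node: Optional[TrieNode] = self.root
        for ch in key:
            node = node.children.get(ch)
            if node is None:
                return None
        return node.value


t = Trie()
for k in ["1000000001", "1000000002", "1000000003"]:
    t.insert(k, "value-" + k[-1])

print(t.get("1000000002"))  # value-2
print(t.node_count)         # 13 nodes instead of 30 stored characters
```

The shared prefix "100000000" is stored once (9 nodes) and each key adds only one node for its last character.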
As for where to store it: nothing better than hard drives has been invented for this yet. In your case it would make more sense to use hybrid SATA + SSD + RAM storage: the most frequently accessed data goes in Redis (i.e. RAM), data that is used fairly often goes on SSD, and rarely needed data goes on SATA drives. Write the access-frequency algorithm yourself, deciding for your task what counts as often, not very often, and rarely.
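A minimal sketch of such a frequency calculation, assuming access counts are collected per key over some period. The thresholds and tier names are hypothetical placeholders; as the answer says, what counts as "often" depends on the task.

```python
from collections import Counter

# Hypothetical thresholds (accesses per period); tune them for the task.
HOT_THRESHOLD = 100  # at or above -> RAM (e.g. Redis)
WARM_THRESHOLD = 10  # at or above -> SSD; anything below -> SATA HDD


def assign_tiers(access_log):
    """Count accesses per key and map each key to a storage tier."""
    counts = Counter(access_log)
    tiers = {}
    for key, n in counts.items():
        if n >= HOT_THRESHOLD:
            tiers[key] = "ram"
        elif n >= WARM_THRESHOLD:
            tiers[key] = "ssd"
        else:
            tiers[key] = "hdd"
    return tiers


log = ["a"] * 150 + ["b"] * 20 + ["c"] * 3
print(assign_tiers(log))  # {'a': 'ram', 'b': 'ssd', 'c': 'hdd'}
```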
As for providers, DigitalOcean has plans with hybrid SATA + SSD drives; take a closer look at them. I also advise you to look at Docker: I think you will need 10+ machines for storage, and Docker will make managing their configuration easier.
Regarding retrieval and search time, google topics like "storing trees", "tree search", etc. Avoid complete graphs, and even avoid cycles. I will say more: DO NOT build a complete graph or allow cycles in a graph of this size, or you will simply shoot yourself in the foot.
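One way to enforce the "no cycles" rule is to check every new parent-to-child link before adding it: the link closes a cycle exactly when the parent is already reachable from the child. This is a sketch with a hypothetical in-memory adjacency dict; at the data sizes discussed here the check would have to run against whatever store actually holds the links.

```python
from typing import Dict, List


def creates_cycle(children: Dict[str, List[str]], parent: str, child: str) -> bool:
    """Return True if adding the edge parent -> child would close a cycle,
    i.e. if `parent` is already reachable from `child` (iterative DFS)."""
    stack = [child]
    seen = set()
    while stack:
        node = stack.pop()
        if node == parent:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(children.get(node, []))
    return False


def add_edge(children: Dict[str, List[str]], parent: str, child: str) -> None:
    if creates_cycle(children, parent, child):
        raise ValueError(f"edge {parent} -> {child} would create a cycle")
    children.setdefault(parent, []).append(child)


tree: Dict[str, List[str]] = {}
add_edge(tree, "root", "a")
add_edge(tree, "a", "b")
print(creates_cycle(tree, "b", "root"))  # True
print(creates_cycle(tree, "root", "c"))  # False
```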