PHP
Nestratov, 2014-03-20 14:16:57

How to organize work with a huge amount of information?

Hello. Please help me find a solution. I have codes like as2dSd9, about 50,000,000 of them. Where should I store this information? Perhaps in some kind of database? What is the best way to organize it? The data will be accessed from PHP and, probably, C#.


8 answers
Andrey Dugin, 2014-03-24
@Nestratov

A structure like Radix Tree, in my opinion, fits perfectly:
en.wikipedia.org/wiki/Radix_tree
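For illustration, a minimal PHP sketch of the idea as a plain trie-based set; the class and method names here are invented. A true radix tree additionally compresses chains of single-child nodes, which matters for memory at 50 million keys:

<?php
// Minimal trie-based set: insert codes, then test membership.
// A real radix tree would also merge chains of single-child nodes.
class TrieSet
{
    private array $root = [];

    public function add(string $code): void
    {
        $node = &$this->root;
        foreach (str_split($code) as $ch) {
            if (!isset($node[$ch])) {
                $node[$ch] = [];
            }
            $node = &$node[$ch];
        }
        $node['$'] = true; // end-of-key marker
    }

    public function contains(string $code): bool
    {
        $node = $this->root;
        foreach (str_split($code) as $ch) {
            if (!isset($node[$ch])) {
                return false;
            }
            $node = $node[$ch];
        }
        return isset($node['$']);
    }
}

$set = new TrieSet();
$set->add('as2dSd9');
var_dump($set->contains('as2dSd9')); // bool(true)
var_dump($set->contains('as2dSd8')); // bool(false)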

Rpsl, 2014-03-20
@Rpsl

If complex queries are not required and RAM allows, you can store this in nothing more than a key-value storage.
For example, put it in Redis, with the code as the key, and then just check whether such a key exists.
You could also put it in MongoDB or MySQL, but those will be more expensive solutions in terms of memory and access speed, because they have to maintain an index.
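A minimal sketch of that scheme with the phpredis extension; the host, port, and sample code value are assumptions:

<?php
// Store every code as a key with a dummy value, then test for existence.
$redis = new Redis();
$redis->connect('127.0.0.1', 6379); // assumed local instance

$redis->set('as2dSd9', 1);          // load step, once per code

if ($redis->exists('as2dSd9')) {    // O(1) membership check
    echo "code is known\n";
}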

Timur Sergeevich, 2014-03-20
@MyAlesya

Well, in MySQL you can)

Andrey Fedorov, 2014-03-20
@4b65696e

I would recommend Redis: a fast key-value storage, with the code as the key. That is, if you need speed.
Though I think (My|Pg)SQL will also cope with an index on 50 million records.
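For the SQL variant, a sketch assuming MySQL via PDO; the database, table, and column names are invented:

<?php
// One indexed table; the PRIMARY KEY makes lookups by code fast.
$pdo = new PDO('mysql:host=127.0.0.1;dbname=codes_db', 'user', 'pass');

$pdo->exec('CREATE TABLE IF NOT EXISTS codes (
    code VARCHAR(16) NOT NULL PRIMARY KEY
) ENGINE=InnoDB');

$stmt = $pdo->prepare('SELECT 1 FROM codes WHERE code = ? LIMIT 1');
$stmt->execute(['as2dSd9']);
$found = (bool) $stmt->fetchColumn();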

Andrew, 2014-03-20
@kaasius

Following up on the previous speakers, I would do this:
1. The main storage is Redis. If you need to store attributes along with each code, look towards Redis hashes. If you just need to check the presence of a key in the database, you can, as suggested above, store the data in the format key -> 1 (see the sketch below).
2. A "just in case" storage in SQL. That is, if active data updates are expected, keep in mind that Redis is not an ACID storage, so part of the data may be lost on a failure. If the data is not actively updated, you can do without the additional storage.

Ivan Starkov, 2014-03-20
@icelaba

With tasks like this, the question is always the same: what do you choose, speed or memory? Lately it has become easier to decide: memory is now very cheap, so if speed matters there is no point in saving on it; buying memory is cheaper than building something very complicated.
So if speed is important (millions of key-existence checks per second), you can store the data even in a plain text file, and for the checks build an in-memory hash (map, set) and load the values at server start. 50 million keys is not that much: for a standard std::set it comes to roughly (implementation dependent) (sizeof(key) + sizeof(_Rb_tree_node_base)) * 50000000 = (20 + 16) * 50000000, i.e. about 2 gigabytes.
If a rate of several thousand checks per second is enough, then Redis or any other key-value storage, as others have correctly advised, will do: you immediately get storage and an API for any language. But Redis is also essentially an in-memory database, i.e. it loves memory.
Hence, if memory is important and speed is not at all (hundreds of checks per second), then welcome to the world of SQL and similar databases.
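A minimal sketch of the load-once, check-in-memory approach in PHP, using an associative array as a set; the file name is an assumption. Note that PHP arrays carry far more per-entry overhead than a C++ std::set, so the 2 GB estimate above does not transfer; this only shows the shape of the approach:

<?php
// Load all codes (one per line) into an associative array used as a set.
$set = [];
$fh = fopen('codes.txt', 'rb'); // assumed path
while (($line = fgets($fh)) !== false) {
    $set[rtrim($line)] = true;
}
fclose($fh);

// A membership check is then a single hash lookup.
$exists = isset($set['as2dSd9']);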

LucemFerre, 2014-03-20
@LucemFerre

Based on the task, SQL will most likely be the best solution. If your as2dSd9-style codes are unique, the right move is to make this field the primary key; then lookups by code will be fast enough.
If the codes are not unique and speed is critical, you can use partitioning, i.e. splitting the data across multiple tables. For example, take the first character of the code and give each character its own table; that cuts the amount of data per table by a factor of tens. There are different ways to partition: you can take the ASCII code of that same first character and split tables by the remainder of dividing it by some number, so changing that number changes the number of tables. The point is to bring the amount of data per table down to an acceptable lookup speed. And don't forget an index on the search field ;) (see the sketch below)
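A sketch of that routing idea in PHP; the partition count, table naming, and connection details are assumptions:

<?php
// Route each code to one of N tables by the ASCII code of its first character.
const PARTITIONS = 16; // tune so that each table stays comfortably small

function tableFor(string $code): string
{
    return 'codes_' . (ord($code[0]) % PARTITIONS);
}

$pdo = new PDO('mysql:host=127.0.0.1;dbname=codes_db', 'user', 'pass');

// Table names cannot be bound as parameters, hence the sprintf.
$stmt = $pdo->prepare(
    sprintf('SELECT 1 FROM %s WHERE code = ? LIMIT 1', tableFor('as2dSd9'))
);
$stmt->execute(['as2dSd9']);
$found = (bool) $stmt->fetchColumn();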
There will also be no problems working with this data from either PHP or C#.

afiskon, 2014-03-25
@afiskon

50 million is not that much. It will easily fit into a single PostgreSQL instance on normal hardware.
