Database design
Andrey Surzhikov, 2019-05-02 17:14:55

Which database should I choose to store social network group membership?

I have 10,000 VKontakte group IDs.
Every day, all members (user_id) of these groups are downloaded via the API.
That comes to about 20 million records per day.
Each record has this form:
user_id, group_id, date
Then, every day, the new members of each group are calculated (members who were not in the group yesterday but are in it today).
Question:
What type of database should I use so that writing the records and counting the new members is as fast as possible?


1 answer(s)
mamokino, 2019-05-03
@mamokino

So VKontakte just hands you 20 million records that easily, with no limits on rate or frequency? Impressive.
The database matters less than how well you use it.
A key-value store such as Tarantool, a relational DBMS such as MySQL, or a document store such as MongoDB would all work.
If you do want to do the calculation inside the DBMS, I would pick a relational one. It is convenient for this: the aggregation and grouping functions that compute totals are quite efficient, as long as you remember to create indexes on the grouped fields (here, group_id and date). The write speed may turn out to be too slow for you, in which case use bulk load / bulk insert when inserting. To avoid making the database repeat these calculations every time, write the calculated totals at the end of each day to a separate table with the structure (date, group_id, count).
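The relational approach above could be sketched roughly as follows. This is only an illustration under assumptions: it uses SQLite through Python's standard sqlite3 module instead of MySQL so that it runs without a server, and the table names (membership, daily_new_members), the index name, and the function names are hypothetical. Dates are assumed to be stored as ISO strings.

```python
import sqlite3


def create_schema(conn):
    # Raw membership rows plus the small summary table (date, group_id, count).
    # Table and index names are hypothetical.
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS membership (
            user_id  INTEGER NOT NULL,
            group_id INTEGER NOT NULL,
            date     TEXT    NOT NULL
        );
        -- Composite index covering the grouped fields and the join key.
        CREATE INDEX IF NOT EXISTS idx_membership_group_date
            ON membership (group_id, date, user_id);
        CREATE TABLE IF NOT EXISTS daily_new_members (
            date     TEXT    NOT NULL,
            group_id INTEGER NOT NULL,
            count    INTEGER NOT NULL,
            PRIMARY KEY (date, group_id)
        );
    """)


def bulk_insert(conn, rows):
    # rows: iterable of (user_id, group_id, date) tuples.
    # One transaction and executemany instead of one commit per row.
    with conn:
        conn.executemany(
            "INSERT INTO membership (user_id, group_id, date) VALUES (?, ?, ?)",
            rows,
        )


def store_new_member_counts(conn, yesterday, today):
    # Count users present today but absent yesterday, per group,
    # and save the totals to the summary table.
    # Groups with no new members get no row in this sketch.
    with conn:
        conn.execute(
            """
            INSERT OR REPLACE INTO daily_new_members (date, group_id, count)
            SELECT t.date, t.group_id, COUNT(*)
            FROM membership AS t
            LEFT JOIN membership AS y
              ON y.group_id = t.group_id
             AND y.user_id = t.user_id
             AND y.date = ?
            WHERE t.date = ? AND y.user_id IS NULL
            GROUP BY t.date, t.group_id
            """,
            (yesterday, today),
        )
```

After the daily run, reports only need to read daily_new_members, so the large membership table is queried once per day.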
But if speed really matters to you, the better solution is to do this calculation in the server's RAM without any DBMS. It is an easy task, and modern servers have more than enough memory to hold all of this. The speed will be fantastic.
In fact, if you think about it, you can calculate the number you need immediately after receiving each response from the VK API. For that you only need to keep an array / hash table of 10,000 elements in the server's RAM, which is a trivial size.
The database is then needed only to save the final calculated figures, in the same table described above with the structure (date, group_id, count).
You do not even need to implement the hash table yourself. It is one of the most popular data structures: either it is already built into your programming language's standard library, or several third-party libraries provide a ready-made implementation. It goes by different names in different languages: collection, map, associative array, and so on.
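As a minimal sketch of the in-memory approach, here is one way it might look in Python, where the built-in hash table is dict. The class name NewMemberCounter and its update method are hypothetical. The table has one entry per group_id (10,000 entries in this case), but this sketch assumes each entry holds the set of user IDs from the previous upload, since some record of yesterday's members is needed to tell new members from existing ones. It also assumes that the first upload for a group has no baseline and reports 0.

```python
class NewMemberCounter:
    """Keeps the previous member set of each group in RAM (hypothetical helper)."""

    def __init__(self):
        # group_id -> set of user_ids seen in the previous upload
        self.previous = {}

    def update(self, group_id, member_ids):
        """Call once per group when the API response arrives.

        Returns the number of members who are in the group now
        but were not in it at the previous upload.
        """
        current = set(member_ids)
        previous = self.previous.get(group_id)
        # Assumption: with no earlier upload there is nothing to compare against.
        new_count = 0 if previous is None else len(current - previous)
        # Today's members become the baseline for the next upload.
        self.previous[group_id] = current
        return new_count
```

Each returned count, together with the date and group_id, is what would be saved to the (date, group_id, count) table. Whether all the member sets fit comfortably in memory depends on the language and the data structure used, so that is worth measuring on real data.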
