C
C
Cloudo582014-08-19 17:20:44
Python
Cloudo58, 2014-08-19 17:20:44

What technologies should be used to frequently update a large number of rows in a database?

The essence of the task is as follows: it is necessary to keep up to date the database of a certain group in the social. networks (posts on the wall + comments on posts, likes on comments) and build a rating for users participating in the life of the group.
It is necessary to issue a place in the rating at the request of the user, while the rating should be updated, say, 1 time per hour.
What technologies (database, synchronization mechanism) are better to use for this?
I tried postgres as a database, update - update comment tables by cron.
As a result, it turns out that you need to recalculate the overall rating after each update. Because the comment table has over 10 million entries, the query runs for about a minute, with only a very small percentage of the entries being changed.
Now I'm looking towards mongodb and redis.
Prompt, please, a competent way of maintenance in an actual condition of such DB.
I would like users not to notice delays in getting their place in the ranking due to blocking at the time of updating the ranking.

Answer the question

In order to leave comments, you need to log in

2 answer(s)
A
Aleksey Kuzmin, 2014-08-19
@AlekseyKuzmin

option 1. change storage structures. The rating is stored in another table. Spread comments across multiple shards.
option 2. update the rating of a particular user when he leaves a comment, and not all users in a row.
option 3. - option 1 + 2 and queue tasks for updating the rating in the same radish with the replacement of old tasks. So the rating will be updated, but not more than once every 5 minutes (for example). If a user leaves 10 comments in 5 minutes, the rating will be updated only 1 time. The system will be more stable. there will be delays in updating the rating, but most importantly, everything else will work.

A
Alexey Cheremisin, 2014-08-20
@leahch

У Вас выбрана плохая схема обновления рейтингов. При такой схеме, чем больше комментариев, тем с каждым разом будет труднее их обновлять, ведь придется каждый раз пробегать все записи.
Зачем пересчитывать рейтинг каждый раз для ВСЕХ записей? Попробуйте сделать рейтинг счетчиками, а уж счетчики пересчитывать в проценты или что там у Вас на лету.
К примеру, изначально:
запись 1 - счетчик 1
запись 2 - счетчик 1
запись 3 - счетчик 1
Далее, на запись 2 было сделано последовательно 10 комментариев (при каждом комментарии обновляем счетчик записи), стало
запись 1 - счетчик 1
запись 2 - счетчик 11
запись 3 - счетчик 1
комментарий 1, для записи 2, счетчик, 1
...
comment 10, for record 2, counter, 1
Now, to compile a rating, it is enough for us to find only the maximum by counter, taking it as 100 percent, in the current case it will be 11. And we don’t need to update anything by cron, we don’t need any enumeration of records.
For example, for 5 records:
record 1 - counter 33
record 2 - counter 78
record 3 - counter 3
record 4 - counter 22
record 5 - counter 15
We have a maximum of MAX = 78, we calculate the rating using the formula record record counter * ( 100 / MAX )
Total when you select the maximum and any of the entries will be
entry 1 - counter 33 - rating 42
entry 2 - counter 78 - rating 100
entry 3 - counter 3 - rating 4
entry 4 - counter 22 - rating 28
record 5 - counter 15 - rating 19
E... This scheme will give not only an absolute rating, but also a relative one by choosing the maximum from a selection of records (sorry for the tautology). In other words, if records are somehow grouped by tags, then searching for the maximum by counter with a selection by tag will rank the records relative to this selection.
PS. Phew, that's washed down something ... Ask questions.

Didn't find what you were looking for?

Ask your question

Ask a Question

731 491 924 answers to any question