PostgreSQL
Alexey Kovalenko, 2022-02-08 11:46:28

How to organize data storage for a social network news feed?

How to organize data storage for a news feed?
The database is CockroachDB (PostgreSQL-compatible). Main entities: users, user relationships (friends, subscribers), objects (products), spheres, user subscriptions to spheres, posts, collections.
As I understand it, there are two main approaches:
1. Build the feed on the fly from the RDBMS. This approach is not feasible here: the query would touch dozens of tables with dozens of joins, and the result set would contain completely heterogeneous data, which an RDBMS is not well suited to.
2. Prepare a feed for each user separately. That is, when an event occurs in some area, it has to be written to the feed of each of, say, a million subscribers. I think this method is more suitable, but a few things worry me: the write load on the database, the amount of data to store, and the messiness of handling all the edge cases: subscriptions, unsubscribes, deleted posts, and so on.
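To make the difference concrete, here is a minimal in-memory sketch of both approaches. It is only an illustration: the dictionaries stand in for real tables, and all names (`follow`, `publish`, `read_feed_pull`, `read_feed_push`) are hypothetical, not part of any database API.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for the real tables.
following = defaultdict(set)        # user   -> authors the user subscribes to
followers = defaultdict(set)        # author -> users subscribed to the author
posts_by_author = defaultdict(list) # author -> [(timestamp, post_id)]
feeds = defaultdict(list)           # user   -> precomputed [(timestamp, post_id)]


def follow(user, author):
    following[user].add(author)
    followers[author].add(user)


def publish(author, post_id, ts):
    """Store the post, then copy it into every subscriber's feed (approach 2)."""
    posts_by_author[author].append((ts, post_id))
    for user in followers[author]:
        # One write per subscriber: this loop is the write load in question.
        feeds[user].append((ts, post_id))


def read_feed_pull(user, limit=10):
    """Approach 1: assemble the feed at read time from every source."""
    items = []
    for author in following[user]:
        items.extend(posts_by_author[author])
    return [post_id for _, post_id in sorted(items, reverse=True)[:limit]]


def read_feed_push(user, limit=10):
    """Approach 2: read the feed that was prepared at write time."""
    return [post_id for _, post_id in sorted(feeds[user], reverse=True)[:limit]]
```

In this sketch approach 1 pays at read time (one lookup per followed source), while approach 2 pays at write time (one insert per subscriber) and additionally has to repair `feeds` on unsubscribes and deletions, which the sketch does not handle.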
Which database should I choose to implement approach 2 or something similar? Cassandra, HBase, Aerospike, or Scylla?


1 answer(s)
hint000, 2022-02-08
@kovalit

2. Prepare a feed for each user separately. That is, an event occurs in some area, and it needs to be recorded in the feed for a million subscribers.
The trouble with this option is that, of a million subscribers, half have not opened the feed in over six months, another 200 thousand have not opened it in over a month, and another 200 thousand in over a week, yet all this time your database keeps churning uselessly for the full million.
There is no easy solution here (social networks have plenty of hard problems). You need to combine at least the first and second methods (and perhaps a third and a fourth). Doing this efficiently (without needing as much capacity for a million subscribers as VKontakte would spend on 10 million) takes genuinely smart algorithms, not just advice about a suitable DBMS. Offhand, the second method suits events from friends better, and the first suits events from spheres, though that is not certain. You could also apply the second method only to users who logged in no more than a day ago, and so on.
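As a rough illustration of that last idea, here is a minimal in-memory sketch that pushes new posts only to subscribers seen within the last day and rebuilds the feed on read for everyone else. It is one possible combination under stated assumptions, not a recommended design: the class `HybridFeed`, its methods, and the `ACTIVE_WINDOW` constant are all hypothetical, timestamps are plain seconds, and storage is ordinary Python dictionaries rather than a real database.

```python
from collections import defaultdict

ACTIVE_WINDOW = 24 * 3600  # assumed cutoff: "logged in no more than a day ago"


class HybridFeed:
    def __init__(self):
        self.following = defaultdict(set)  # user   -> authors
        self.followers = defaultdict(set)  # author -> users
        self.posts = defaultdict(list)     # author -> [(timestamp, post_id)]
        self.feeds = defaultdict(list)     # user   -> prepared [(timestamp, post_id)]
        self.last_seen = {}                # user   -> timestamp of last visit

    def follow(self, user, author):
        self.following[user].add(author)
        self.followers[author].add(user)

    def login(self, user, now):
        self.last_seen[user] = now

    def is_active(self, user, now):
        # Users never seen count as inactive.
        return now - self.last_seen.get(user, float("-inf")) <= ACTIVE_WINDOW

    def publish(self, author, post_id, now):
        """Store the post and push it only to recently active subscribers.

        Returns the number of per-user feed writes performed.
        """
        self.posts[author].append((now, post_id))
        written = 0
        for user in self.followers[author]:
            if self.is_active(user, now):
                self.feeds[user].append((now, post_id))
                written += 1
        return written

    def read(self, user, now, limit=10):
        """Serve the prepared feed, rebuilding it first for a returning user."""
        if not self.is_active(user, now):
            # The user missed pushes while inactive: rebuild from the sources.
            items = []
            for author in self.following[user]:
                items.extend(self.posts[author])
            self.feeds[user] = items
        self.last_seen[user] = now
        ordered = sorted(self.feeds[user], reverse=True)
        return [post_id for _, post_id in ordered[:limit]]
```

With the numbers from the answer above, a post to a million subscribers would cause at most roughly 100 thousand feed writes here instead of a million, at the cost of one heavier read when a dormant user returns. The sketch ignores unsubscribes, deleted posts, and trimming old feed entries.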
