Answer the question
In order to leave comments, you need to log in
How to competently build sql queries in highly loaded databases?
1. There is a table for several million users, and there is also a table with movie titles. You need to select pairs of users who have at least 7 movies in common. I'm thinking of making separate tables for users and for movies and linking them many-to-many. Is this the optimal architecture, or can you think of something better? And how in this case should the optimal sql-query to the database look like?
2. You need to select users from a certain city, with the active status, and who have at least 4 movies. Again, what would be the optimal query?
3. There is a log table where data is stored about when the user was active (two timestamps - 1. became active 2. ceased to be active). It is necessary to calculate how many active users there were for a certain period of time.
Everything is complicated by the fact that there is a lot of data and there can also be a lot of requests, you need to deal with it.
Answer the question
In order to leave comments, you need to log in
1. You think correctly, only if you are looking for such pairs among ALL users and ALL films, then it will not do without a full scan of all 3 tables. Caching will help, either functional indexes, or some pivot tables controlled by triggers or materialized view.
2. Is Active a bool or a status from a list? In any case, we need an index by city_id and either by the active field, or a partial index where active = true. (I hope you have PostgreSQL)
3. Just where with 2 conditions? Or is there a catch?
Queries should not be complicated here, but you need to provide the right indexes to make it work quickly.
If you need help specifically with queries, create https://www.db-fiddle.com/ with sample data, at least 10 lines each, so that the structure is clear
The ID of active users must be stored separately. Otherwise, constantly recalculating the index for this field will be a fierce death
1-2 Try graph DBMS. They are better suited for related entities.
guides.neo4j.com/sandbox/recommendations
3. Probably analytical DBMS.
Didn't find what you were looking for?
Ask your questionAsk a Question
731 491 924 answers to any question