S
S
sl02020-05-13 19:43:45
PHP
sl0, 2020-05-13 19:43:45

How to competently build sql queries in highly loaded databases?

1. There is a table for several million users, and there is also a table with movie titles. You need to select pairs of users who have at least 7 movies in common. I'm thinking of making separate tables for users and for movies and linking them many-to-many. Is this the optimal architecture, or can you think of something better? And how in this case should the optimal sql-query to the database look like?

2. You need to select users from a certain city, with the active status, and who have at least 4 movies. Again, what would be the optimal query?

3. There is a log table where data is stored about when the user was active (two timestamps - 1. became active 2. ceased to be active). It is necessary to calculate how many active users there were for a certain period of time.

Everything is complicated by the fact that there is a lot of data and there can also be a lot of requests, you need to deal with it.

Answer the question

In order to leave comments, you need to log in

3 answer(s)
A
Alexander Filippenko, 2020-05-13
@alexfilus

1. You think correctly, only if you are looking for such pairs among ALL users and ALL films, then it will not do without a full scan of all 3 tables. Caching will help, either functional indexes, or some pivot tables controlled by triggers or materialized view.
2. Is Active a bool or a status from a list? In any case, we need an index by city_id and either by the active field, or a partial index where active = true. (I hope you have PostgreSQL)
3. Just where with 2 conditions? Or is there a catch?
Queries should not be complicated here, but you need to provide the right indexes to make it work quickly.
If you need help specifically with queries, create https://www.db-fiddle.com/ with sample data, at least 10 lines each, so that the structure is clear

A
AlexBergal, 2020-05-14
@AlexBergal

The ID of active users must be stored separately. Otherwise, constantly recalculating the index for this field will be a fierce death

R
Roman Mirilaczvili, 2020-05-14
@2ord

1-2 Try graph DBMS. They are better suited for related entities.
guides.neo4j.com/sandbox/recommendations
3. Probably analytical DBMS.

Didn't find what you were looking for?

Ask your question

Ask a Question

731 491 924 answers to any question