N
N
nokimaro2019-10-29 09:41:56
MySQL
nokimaro, 2019-10-29 09:41:56

Sorting pages by the number of views per day, week, month?

I'm trying to solve the problem of building top pages based on views per day/week/month.
There are 20 million pages in total and about 1 million views are made per day.
The data is stored in MySQL and therefore the solution of the problem is on it.
Below is the current solution and its disadvantages.
And the question is, is it possible to do it differently (more optimally) on MySQL / SQL, or do it better using other technologies? Have you had experience in solving similar problems and what did you use?
The current decision such:
Is 2 tables, actually pages (items) and . a table for counting the number of page views for a given day.
In order not to make an extra JOIN, denormalization was done and the view counters for sorting items were duplicated in the items table.
I can display items with a simple query with sorting

ORDER BY items.count_views_day DESC
ORDER BY items.count_views_week DESC
ORDER BY items.count_views_month DESC

Simplified table structure
CREATE TABLE `items` 
( 
     `item_id`           BIGINT NOT NULL, 
     `count_views_day`   INT NOT NULL, 
     `count_views_week`  INT NOT NULL, 
     `count_views_month` INT NOT NULL, 
     PRIMARY KEY (`item_id`)
) 

CREATE TABLE `item_views` 
( 
    `item_id`       BIGINT NOT NULL , 
    `day_timestamp` INT NOT NULL , 
    `count_views`   INT NOT NULL , 
)

When a pageview is done, I do an UPDATE of the counters in items and I do an INSERT or UPDATE of the counter in item_views.
Once a day, I update the counters in the items table by summing the data from item_views for the required period.
In fact, it turns out that the top is built by the number of views for the desired period (day/week/month) + (today's views).
The disadvantage of my solution is that the daily update of the counters in items causes locks and takes a long time.
Update request below. Repeats 3 times for a day/week/month, by changing the dates in the BETWEEN condition
UPDATE items SET count_views_week = (
   SELECT IFNULL(SUM(item_views.count_views), 0)
   FROM item_views
   WHERE ( 
     item_views.day_timestamp BETWEEN ... AND ...
   ) AND item_views.item_id = items.item_id
)

Locks are tightened, because while items.count_views_week is being updated, all other requests to increase the counter are blocked. In fact, the query will update every row in items, which is currently 20 million rows.
I would be glad to hear your opinion and suggestions for optimization.

Answer the question

In order to leave comments, you need to log in

1 answer(s)
A
Andrey, 2019-10-29
@VladimirAndreev

Do you need top pages in real time?
Can't count once an hour?

Didn't find what you were looking for?

Ask your question

Ask a Question

731 491 924 answers to any question