MySQL

Alexey Zhurbitsky, 2013-03-26 11:46:21

How do I write this MySQL query?

The data is in the following form:

+------------+------+-------+------+
| date       | uid  | page  | hit  |
+------------+------+-------+------+
| 2013-03-25 | 1    | main  | 10   |
| 2013-03-25 | 1    | about | 2    |
| 2013-03-25 | 2    | main  | 1    |
| 2013-03-26 | 1    | main  | 3    |
+------------+------+-------+------+

Dump:
CREATE TABLE page_stat(
    `id` int NOT NULL auto_increment,
    `date` date,
    `uid` int,
    `page` varchar(64),
    `hit` int,
    PRIMARY KEY (`id`)
);

INSERT INTO page_stat (date, uid, page, hit) values
('2013-03-25', 1, 'main', 10),
('2013-03-25', 1, 'about', 2),
('2013-03-25', 2, 'main', 1),
('2013-03-26', 1, 'main', 3);

That is, for each day a record is kept of how many users visited each section.
With the count of visits to each section, everything is clear:
SELECT page, sum(hit) hits FROM page_stat GROUP BY page;

Result
+-------+------+
| page  | hits |
+-------+------+
| about |    2 |
| main  |   14 |
+-------+------+

Now you need to count the number of unique users for each section.
The result should be like this:
+-------+------+
| page  | uniq |
+-------+------+
| about |    1 |
| main  |    2 |
+-------+------+

The best I could come up with is
SELECT page, count(DISTINCT uid) uniq FROM page_stat GROUP BY page;

But on real data this query takes more than a minute to run.
As far as I understand, such a query implicitly executes, for each page, something like
SELECT DISTINCT uid FROM page_stat WHERE page=...
What other options are there for computing unique-user counts? Changing the data storage structure is allowed.
UPD: an index on (page, uid) was what was needed.


3 answers
gaelpa, 2013-03-26
@blo

No indexes? As far as I understand, an index on (page, uid) should speed up your query.
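A minimal sketch of this suggestion, assuming the table definition from the question (the index name is illustrative):

```sql
-- Composite index covering both columns the slow query touches.
-- With it, COUNT(DISTINCT uid) GROUP BY page can be answered by
-- scanning the index alone, without reading table rows.
ALTER TABLE page_stat ADD INDEX idx_page_uid (page, uid);

-- Verify that the optimizer actually uses it:
EXPLAIN SELECT page, COUNT(DISTINCT uid) AS uniq
FROM page_stat
GROUP BY page;
```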

Stdit, 2013-03-26
@Stdit

If all statistics queries are known in advance, pre-aggregation (denormalization via triggers) can help. For example, attach an AFTER INSERT trigger to the page_stat table that writes a row into a page_uid (page, uid) table if that pair is not already there. Querying page_uid is already cheaper, although the count still has to be recomputed. To avoid even that, put an AFTER INSERT trigger on page_uid as well, which, whenever a genuinely new row appears, increments that page's counter in a third table, page_uniques_count (page, count). The data already in the database has to be run through a query and loaded into these service tables before the triggers are enabled.

Things get more complicated if you also need unique counts for a specific (not arbitrary) period of time (say, per day or per month). This can be handled by adding a date, rounded to the desired interval, to the service tables. Each interval may need its own counter table, but larger intervals can be derived from smaller ones (for example, monthly from daily) by an ordinary GROUP BY aggregation, which in this case will be much faster than working with the raw page_stat data.
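A sketch of this trigger scheme in MySQL (the answer names the tables but not the exact DDL, so the trigger body and names here are an illustrative assumption):

```sql
-- Service tables: one holds the distinct (page, uid) pairs,
-- the other the per-page unique-user counter.
CREATE TABLE page_uid (
    `page` varchar(64) NOT NULL,
    `uid`  int NOT NULL,
    PRIMARY KEY (`page`, `uid`)
);

CREATE TABLE page_uniques_count (
    `page`  varchar(64) NOT NULL PRIMARY KEY,
    `count` int NOT NULL DEFAULT 0
);

DELIMITER //
CREATE TRIGGER page_stat_ai AFTER INSERT ON page_stat
FOR EACH ROW
BEGIN
    -- INSERT IGNORE adds the (page, uid) pair only if it is new.
    INSERT IGNORE INTO page_uid (page, uid) VALUES (NEW.page, NEW.uid);
    -- ROW_COUNT() = 1 means the pair was actually inserted,
    -- i.e. this uid is new for this page: bump the counter.
    IF ROW_COUNT() = 1 THEN
        INSERT INTO page_uniques_count (page, `count`) VALUES (NEW.page, 1)
        ON DUPLICATE KEY UPDATE `count` = `count` + 1;
    END IF;
END//
DELIMITER ;
```

After backfilling the service tables from the existing data, the per-page unique count becomes a cheap lookup: SELECT page, `count` FROM page_uniques_count;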

UZER2006, 2013-03-26
@UZER2006

Have you tried nested queries with grouping? I'm rather weak in this area, so I'm not sure it will help, but it might be worth a try.

SELECT page, COUNT(uid) AS uniq FROM (SELECT page, uid FROM page_stat GROUP BY page, uid) AS A GROUP BY page;

In general, in a similar case (some joint statistics across a CDR table and another table), I made a separate table with the necessary data, updated by a trigger on INSERT into CDR.
