How expensive is the GROUP BY() + SUM() operation?

E

entermix2018-07-30 18:47:44

MySQL

entermix, 2018-07-30 18:47:44

Let's say there is a table:
statistics

date (int), user_id(int), param1(int), param2(int), param3(int) ...

Indexes are configured:
uniq_user_id_date
fk_user_id
We make approximately the following selection:

$sql_date_format = '%Y-%m';
// $sql_date_format = '%Y-%m-%u';
// $sql_date_format = '%Y-%m-%d %H:00:00';
// ...
        $statistics_obj = DB::select()
            ->select(
                ['date', 'date'],
                [DB::expr('DATE_FORMAT(FROM_UNIXTIME(`date`), :date_format)', [
                    ':date_format' => $sql_date_format,]), 'date_formatted',],
                [DB::expr('SUM(`param1`)'), 'param1']
            )
            ->from(ORM::factory('Statistic')->table_name())
            ->where('user_id', '=', $user->id)
            ->and_where('date', 'BETWEEN', [$this->_start, $this->_end])
            ->group_by('date_formatted')
            ->as_object()
            ->execute();

Let's assume that the records in the table contain per-minute statistics for each user. What is the probability that such selections will be slow if there are millions of records in the table?
UPD: This information will be used to build graphs (live statistics in the personal account). Graphs and tables (by hours/by weeks/by months/by years)

Reply

Answer the question

In order to leave comments, you need to log in

4 answer(s)

A

ApeCoder, 2018-07-30
@entermix

It doesn't matter how many records are in the table at all. You have a condition by user and date and an index by this pair, respectively, the grouping will be only within the date range and by this user.
The grouping is done by a formatted date, MariaDB is unlikely to be able to figure out that ordering by date will help, so it will most likely group this as an unordered set. I think if you group by original date before grouping can be faster - you can format later.
As an alternative, index the formatted date
Populate the table with data that is approximately similar in distribution and look at the query plan.

M

m0nym, 2018-07-30
@m0nym

What does slow mean in your mind?
What kind of iron?
What indexes?
And guess what - generate random data and check.

L

Lazy @BojackHorseman MySQL, 2018-07-30
Tag

"slowly" is a relative term. it’s impossible for production under load, but for reports it will work once a week

A

Alexander Aksentiev, 2018-07-30
@Sanasol

Sooner or later, the selections become too fat for building reports/graphs/whatever on the fly.
Maybe it's better to immediately choose something more suitable for this
https://habr.com/company/oleg-bunin/blog/329062/