PostgreSQL
Larisa .•º, 2020-11-18 22:30:29

What are the ways to optimize the sum calculation on big data?

I have a query whose essence is simply to sum data according to given conditions.
The table to aggregate holds more than 5 million records.
The selection conditions vary, for example:

- selection by sex
- selection by age
- selection by region

If I create a materialized view, that flexibility disappears: there is no way to pass the condition to it.
Are there any other ways to optimize aggregation?


2 answer(s)
galaxy, 2020-11-18
@barolina

In the general case there is no such way: judging by the wording of the question, you want to select arbitrary records and sum arbitrary columns.
If you give up a little flexibility, you can pre-aggregate the data for later analytics (in effect, create a series of materialized views). To make this concrete, take a raw sales table (I made it denormalized on purpose; in reality it may be several tables joined in the query):

sales
------------------
datetime
client_first_name
client_last_name
client_email
client_region
client_city
client_age
client_gender
product_name
product_category
product_type
product_manufacturer
product_price
quantity
shop_id
shop_region
shop_city
seller_first_name
seller_last_name
seller_department
...

You probably won't run reports on this table with arbitrary filters and groupings on a regular basis. Suppose, for example, that you want to see the data broken down by customer region/gender/age, product category/type, and store. Then you can build a pre-aggregated table grouped by the required fields:
SELECT client_region, client_gender, client_age, product_category, product_type, shop_id,
       SUM(quantity) AS total_quantity, SUM(product_price * quantity) AS total_amount
  FROM sales
 GROUP BY client_region, client_gender, client_age, product_category, product_type, shop_id;
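
As a minimal sketch of how such a pre-aggregation stays filterable, the query above can be wrapped in a materialized view, with the conditions applied when reading from it rather than when defining it. The view name sales_agg, the column aliases, and the filter values are hypothetical; only the sales columns come from the table above.

```sql
-- Hypothetical name: sales_agg. Built once, then refreshed on a schedule.
CREATE MATERIALIZED VIEW sales_agg AS
SELECT client_region, client_gender, client_age,
       product_category, product_type, shop_id,
       SUM(quantity)                 AS total_quantity,
       SUM(product_price * quantity) AS total_amount
  FROM sales
 GROUP BY client_region, client_gender, client_age,
          product_category, product_type, shop_id;

-- The conditions are passed when querying the view.
-- Re-aggregating is safe here because a SUM of partial sums equals the total.
SELECT client_region, SUM(total_amount) AS amount
  FROM sales_agg
 WHERE client_gender = 'F'              -- example value; the encoding is assumed
   AND client_age BETWEEN 25 AND 35
 GROUP BY client_region;

-- Pick up rows added to sales since the last build.
REFRESH MATERIALIZED VIEW sales_agg;
```

Any filter on a column that appears in the GROUP BY can be applied this way; a filter on a column that was aggregated away (for example client_city) would still need the raw table.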

Also, if you don't need to filter by time down to the second and your reports are daily at the finest, you can additionally group by the truncated datetime field (GROUP BY date_trunc('day', datetime)).
To avoid multiplying tables, you can organize such storage (usually called a data warehouse) as a star schema.
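
The daily grouping mentioned above might look roughly like this. The names sales_daily and sale_day and the date range are illustrative assumptions, not part of the original schema.

```sql
-- Hypothetical daily rollup: one row per day and customer group.
CREATE MATERIALIZED VIEW sales_daily AS
SELECT date_trunc('day', datetime)   AS sale_day,
       client_region, client_gender, client_age,
       SUM(quantity)                 AS total_quantity,
       SUM(product_price * quantity) AS total_amount
  FROM sales
 GROUP BY date_trunc('day', datetime),
          client_region, client_gender, client_age;

-- Coarser periods can be derived from the daily rows.
SELECT date_trunc('month', sale_day) AS sale_month,
       SUM(total_amount)             AS amount
  FROM sales_daily
 WHERE sale_day >= DATE '2020-01-01'
   AND sale_day <  DATE '2021-01-01'
 GROUP BY 1
 ORDER BY 1;
```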
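
A star schema for the sales example could be sketched as one fact table surrounded by small dimension tables. All table names, surrogate keys, and column types below are assumptions made for illustration.

```sql
-- Dimension tables: one row per distinct combination of attributes.
CREATE TABLE dim_client (
    client_id     serial PRIMARY KEY,
    client_region text,
    client_gender text,
    client_age    integer
);

CREATE TABLE dim_product (
    product_id       serial PRIMARY KEY,
    product_category text,
    product_type     text
);

-- Fact table: pre-aggregated measures keyed by the dimensions.
CREATE TABLE fact_sales (
    sale_day       date    NOT NULL,
    client_id      integer NOT NULL REFERENCES dim_client,
    product_id     integer NOT NULL REFERENCES dim_product,
    shop_id        integer NOT NULL,
    total_quantity bigint  NOT NULL,
    total_amount   numeric NOT NULL
);

-- A report joins the fact table only to the dimensions it filters or groups by.
SELECT c.client_region, SUM(f.total_amount) AS amount
  FROM fact_sales f
  JOIN dim_client c USING (client_id)
 WHERE c.client_gender = 'F'            -- example value; the encoding is assumed
 GROUP BY c.client_region;
```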

Dimonchik, 2020-11-18
@dimonchik2013

Indexes and a columnar database.
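
As one possible reading of the index suggestion, a composite covering index on the filter columns might let PostgreSQL answer the sum from the index alone. This is a sketch under assumptions: the index name is hypothetical, the INCLUDE clause requires PostgreSQL 11 or later, and whether the planner actually picks an index-only scan depends on the data and on how recently the table was vacuumed.

```sql
-- Hypothetical covering index: the filter columns form the key,
-- the summed columns ride along as non-key payload.
CREATE INDEX sales_region_gender_age_idx
    ON sales (client_region, client_gender, client_age)
    INCLUDE (quantity, product_price);

-- Check the chosen plan instead of assuming the index is used.
EXPLAIN
SELECT SUM(product_price * quantity)
  FROM sales
 WHERE client_region = 'North'          -- example values
   AND client_gender = 'F'
   AND client_age BETWEEN 25 AND 35;
```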
