How to organize the attribute storage structure?
How should the storage of object attributes be organized in a relational database for many different kinds of goods and services: smartphones, hotels, movies, etc.?
Object attributes cannot be put into separate columns, as there are too many of them.
The first option is to use PostgreSQL's NoSQL features, i.e. store attribute values as JSONB. But there is an unacceptable problem with indexing: a GIN index only helps with exact (containment) matches. For example:
-- the containment query can use a GIN index on attributes
SELECT *
FROM products
WHERE products.attributes @> '{"length": 10}';

-- but a range comparison on the same key cannot
SELECT *
FROM products
WHERE (attributes->>'length')::int4 > 10;
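If only a few attributes need range filtering, a common workaround (not mentioned in the question) is a btree expression index per attribute. A minimal sketch, assuming the "length" key is numeric wherever it is present:

-- btree expression index: the planner can use it for range predicates
-- on this particular JSONB key
CREATE INDEX ix_products_attr_length ON products (((attributes->>'length')::int4));

The range query above can then be answered with an index scan, at the cost of one such index per attribute you want to filter by range.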
The second option is an EAV-style table products_attributes(product_id, attribute_id, value), where a filter on several attributes turns into an intersection of per-attribute selections:

SELECT product_id
FROM products_attributes
WHERE attribute_id = 348612462852833281 AND value BETWEEN 1 AND 10
INTERSECT
SELECT product_id
FROM products_attributes
WHERE attribute_id = 348612464425861121 AND value BETWEEN 10 AND 20
INTERSECT
SELECT product_id
FROM products_attributes
WHERE attribute_id = 372655158259580929 AND value BETWEEN 20 AND 30;
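The same filter can also be written as a single aggregation over products_attributes; a sketch of an equivalent query (not from the original question), which stays correct even if a product has duplicate rows for an attribute:

-- a product qualifies only if it satisfies all three attribute predicates
SELECT product_id
FROM products_attributes
WHERE (attribute_id = 348612462852833281 AND value BETWEEN 1 AND 10)
   OR (attribute_id = 348612464425861121 AND value BETWEEN 10 AND 20)
   OR (attribute_id = 372655158259580929 AND value BETWEEN 20 AND 30)
GROUP BY product_id
HAVING count(DISTINCT attribute_id) = 3;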
In general, arguing in the abstract, without the data, is rather pointless; you have to look at the actual query plans and execution times.
For example, for the statistics you cited here somewhere (that comment seems to have been lost), I get quite acceptable results (in my opinion; whether they would satisfy you is another matter):
create table products (
id serial primary key,
name text not null
);
create table products_attributes (
id serial primary key,
attribute_id int not null,
product_id int not null references products (id),
value double precision
);
-- 10 million products
insert into products (name)
select 'p' || n from generate_series(1, 10000000) n;
-- on average 10 attributes per product
-- 20 distinct attributes in total
-- values are random numbers in the range 0-999
insert into products_attributes(attribute_id, product_id, value)
select a,
p.id,
trunc(random() * 1000)
from products p, generate_series(1, 20) a
where random() < 0.5;
create index ix_attr_attribute_id on products_attributes(attribute_id);
create index ix_attr_product_id on products_attributes(product_id);
create unique index uk_attr_attr_product on products_attributes(product_id, attribute_id);
create index ix_attr_value on products_attributes(value);
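A side note, not part of the original answer: the unique index on (product_id, attribute_id) already serves lookups by product_id alone, so the single-column ix_attr_product_id is arguably redundant here:

-- uk_attr_attr_product (product_id, attribute_id) covers product_id lookups,
-- including the foreign-key checks, so this index can usually be dropped
DROP INDEX ix_attr_product_id;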
-- some statistics
select attribute_id, count(*) from products_attributes group by 1 order by 1;
attribute_id | count
--------------+---------
1 | 5001345
2 | 5001937
3 | 4998754
4 | 4998706
5 | 4999357
6 | 5004465
7 | 4999215
...
select product_id, count(*) from products_attributes group by 1 order by 2 desc limit 20;
product_id | count
------------+-------
4769292 | 20
5366802 | 20
7241348 | 20
3019891 | 20
7789046 | 20
1688646 | 19
1585970 | 19
...
SELECT count(*) FROM products_attributes WHERE attribute_id = 1 AND value BETWEEN 1 AND 400;
count
---------
1999212
(1 row)
SELECT count(*) FROM products_attributes WHERE attribute_id = 2 AND value BETWEEN 1 AND 400;
count
---------
1999385
(1 row)
SELECT count(*) FROM products_attributes WHERE attribute_id = 3 AND value BETWEEN 20 AND 30;
count
-------
55318
(1 row)
explain analyze
select * from products
where id in (
select product_id from products_attributes
where attribute_id = 1 AND value BETWEEN 1 AND 400
)
and id in (
select product_id from products_attributes
where attribute_id = 2 AND value BETWEEN 1 AND 400
)
and id in (
select product_id from products_attributes
where attribute_id = 3 AND value BETWEEN 20 AND 30
);
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Gather (cost=164603.86..347381.12 rows=2047 width=12) (actual time=680.688..3753.927 rows=2189 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Nested Loop (cost=163603.86..346176.42 rows=853 width=12) (actual time=652.714..3713.617 rows=730 loops=3)
-> Nested Loop (cost=163603.43..341334.66 rows=762 width=12) (actual time=652.439..3294.369 rows=730 loops=3)
-> Parallel Hash Join (cost=163602.86..313081.39 rows=4167 width=8) (actual time=649.437..1742.210 rows=3689 loops=3)
Hash Cond: (products_attributes_1.product_id = products_attributes_2.product_id)
-> Parallel Index Scan using ix_attr_attribute_id on products_attributes products_attributes_1 (cost=0.57..147444.27 rows=775173 width=4) (actual time=0.577..880.079 rows=666462 loops=3)
Index Cond: (attribute_id = 2)
Filter: ((value >= '1'::double precision) AND (value <= '400'::double precision))
Rows Removed by Filter: 1000851
-> Parallel Hash (cost=163306.17..163306.17 rows=23690 width=4) (actual time=647.483..647.483 rows=18439 loops=3)
Buckets: 65536 Batches: 1 Memory Usage: 2752kB
-> Parallel Index Scan using ix_attr_attribute_id on products_attributes products_attributes_2 (cost=0.57..163306.17 rows=23690 width=4) (actual time=18.296..639.541 rows=18439 loops=3)
Index Cond: (attribute_id = 3)
Filter: ((value >= '20'::double precision) AND (value <= '30'::double precision))
Rows Removed by Filter: 1647812
-> Index Scan using uk_attr_attr_product on products_attributes (cost=0.57..6.78 rows=1 width=4) (actual time=0.420..0.420 rows=0 loops=11066)
Index Cond: ((product_id = products_attributes_1.product_id) AND (attribute_id = 1))
Filter: ((value >= '1'::double precision) AND (value <= '400'::double precision))
Rows Removed by Filter: 0
-> Index Scan using products_pkey on products (cost=0.43..6.35 rows=1 width=12) (actual time=0.572..0.572 rows=1 loops=2189)
Index Cond: (id = products_attributes.product_id)
Planning Time: 4.481 ms
JIT:
Functions: 93
Options: Inlining false, Optimization false, Expressions true, Deforming true
Timing: Generation 14.220 ms, Inlining 0.000 ms, Optimization 2.852 ms, Emission 50.889 ms, Total 67.961 ms
Execution Time: 3762.035 ms
(29 rows)
explain analyze
select * from products
where id in (
select product_id from products_attributes
where attribute_id = 1 AND value BETWEEN 1 AND 400
)
and id in (
select product_id from products_attributes
where attribute_id = 2 AND value BETWEEN 1 AND 400
)
and id in (
select product_id from products_attributes
where attribute_id = 3 AND value BETWEEN 20 AND 30
) limit 20;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=2.14..7292.14 rows=20 width=12) (actual time=3.743..47.724 rows=20 loops=1)
-> Nested Loop (cost=2.14..746133.39 rows=2047 width=12) (actual time=3.741..47.714 rows=20 loops=1)
-> Nested Loop (cost=1.70..734511.89 rows=1829 width=12) (actual time=3.313..44.710 rows=20 loops=1)
-> Nested Loop (cost=1.14..666709.45 rows=10000 width=8) (actual time=2.377..42.276 rows=95 loops=1)
-> Index Scan using ix_attr_attribute_id on products_attributes products_attributes_2 (cost=0.57..209246.27 rows=56856 width=4) (actual time=1.250..14.969 rows=511 loops=1)
Index Cond: (attribute_id = 3)
Filter: ((value >= '20'::double precision) AND (value <= '30'::double precision))
Rows Removed by Filter: 46859
-> Index Scan using uk_attr_attr_product on products_attributes products_attributes_1 (cost=0.57..8.05 rows=1 width=4) (actual time=0.053..0.053 rows=0 loops=511)
Index Cond: ((product_id = products_attributes_2.product_id) AND (attribute_id = 2))
Filter: ((value >= '1'::double precision) AND (value <= '400'::double precision))
Rows Removed by Filter: 0
-> Index Scan using uk_attr_attr_product on products_attributes (cost=0.57..6.78 rows=1 width=4) (actual time=0.025..0.025 rows=0 loops=95)
Index Cond: ((product_id = products_attributes_1.product_id) AND (attribute_id = 1))
Filter: ((value >= '1'::double precision) AND (value <= '400'::double precision))
Rows Removed by Filter: 0
-> Index Scan using products_pkey on products (cost=0.43..6.35 rows=1 width=12) (actual time=0.149..0.149 rows=1 loops=20)
Index Cond: (id = products_attributes.product_id)
Planning Time: 9.959 ms
Execution Time: 47.796 ms
(20 rows)
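A note on both plans above: the range on value is applied as a Filter after the scan of ix_attr_attribute_id, which is where the large "Rows Removed by Filter" counts come from. A composite index would let the planner evaluate both predicates as index conditions; a sketch, not something the answer above actually measured:

-- with (attribute_id, value) in one btree, both the equality on attribute_id
-- and the range on value become index conditions instead of a post-scan filter
CREATE INDEX ix_attr_attribute_value ON products_attributes (attribute_id, value);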
Stop doing wild things and learn that for search you should use search engines like Elasticsearch. And you can also store the relational data as JSON.