A
Python
Andrew, 2014-06-24 15:53:14

What is a fast database for an online store with more than 50 thousand products and search?

Good afternoon!
I am writing a Django project, an online store with a large catalog of 50,000+ products. Search will run across many parameters.
Right now I store the searchable attributes as an array column in Postgres and search on that. The search itself is fast enough, but collecting the available filter options for the current result set takes a very long time.
Yandex Market is an example of what I want: you click on phones, it shows them, and on the right it lists all the parameter values still available for filtering.
I use Redis for caching, but as ridiculous as it sounds, it only makes things worse: Redis is not designed for caching large objects.
Any ideas?
I don't use EAV. I select products with queries like tags @> array['param1', 'param2', 'param3']:
SELECT unnest(tags) FROM items WHERE tags @> array['param1', 'param2', 'param3'] GROUP BY 1
24 GB RAM. Users specify between 1 and 6 search parameters at a time; the more parameters they specify, the faster the search runs. ManyToMany is bad because a parameter like "diameter" can have 10-15 values at once: to find a product whose diameter is 10, 11, or 12, the ORM produces a join for each diameter value, which slows the system down a lot.
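The array-containment logic described above can be sketched in plain Python (the data and field names are illustrative, not the actual schema; in production this runs inside Postgres via tags @> array[...]):

```python
# Faceted tag search sketch: keep items whose tag set contains all the
# selected params, then collect the filter values still available.
# Data and names are illustrative only.

items = [
    {"id": 1, "tags": {"diameter-10", "color-red", "brand-a"}},
    {"id": 2, "tags": {"diameter-10", "diameter-11", "color-blue", "brand-a"}},
    {"id": 3, "tags": {"diameter-12", "color-red", "brand-b"}},
]

def search(selected):
    """Return matching items and the facet values still available."""
    selected = set(selected)
    # tags @> array[...] is set containment: selected must be a subset
    matches = [it for it in items if selected <= it["tags"]]
    facets = set()
    for it in matches:
        facets |= it["tags"]
    return matches, facets - selected

matches, facets = search(["diameter-10", "brand-a"])
```

The second return value is what the "available options on the right" panel would show; in SQL it corresponds to the unnest ... GROUP BY 1 query above.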
It turned out the problem was Redis holding a list of over a million items; I had overlooked that. The results are now positive: without any caches, products are returned in 350-400 ms together with the available filters, and in 200-250 ms for products alone.
Are there prospects for speeding this up even further? :)
To anyone else struggling with this: Solr + facets, and you will be happy.
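For reference, a Solr facet request looks roughly like this (assuming a hypothetical core named products; q, rows, facet=true, and facet.field are standard Solr query parameters):

```shell
# Hypothetical core "products"; Solr returns the matching documents plus
# value counts for the faceted field in one request.
curl 'http://localhost:8983/solr/products/select?q=category:phones&rows=20&facet=true&facet.field=tags&wt=json'
```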


3 answer(s)
D
Dimitriy, 2014-06-24
@xmdy

Use a nested query to deduplicate before applying unnest:

Subquery
explain (analyze) select unnest(tags) from (SELECT tags FROM t5 WHERE tags @> array['param-1'] group by 1) as t group by 1;
                                                          QUERY PLAN                                                           
-------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=14655.30..14660.38 rows=1000 width=84) (actual time=490.721..490.722 rows=10 loops=1)
   ->  Subquery Scan on t  (cost=14647.62..14652.80 rows=1000 width=84) (actual time=490.690..490.706 rows=55 loops=1)
         ->  HashAggregate  (cost=14647.62..14647.73 rows=10 width=84) (actual time=490.676..490.676 rows=10 loops=1)
               ->  Seq Scan on t5  (cost=0.00..13459.00 rows=475450 width=84) (actual time=0.033..181.649 rows=475324 loops=1)
                     Filter: (tags @> '{param-1}'::text[])
                     Rows Removed by Filter: 24676
 Total runtime: 490.843 ms
(7 rows)

Original
explain (analyze) SELECT unnest(tags) FROM t5 WHERE tags @> array['param-1'] group by 1;
                                                      QUERY PLAN                                                       
-----------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=368857.88..368862.95 rows=1000 width=84) (actual time=1196.758..1196.759 rows=10 loops=1)
   ->  Seq Scan on t5  (cost=0.00..249995.38 rows=47545000 width=84) (actual time=0.020..680.547 rows=2501231 loops=1)
         Filter: (tags @> '{param-1}'::text[])
         Rows Removed by Filter: 24676
 Total runtime: 1196.790 ms
(5 rows)

With set enable_seqscan to off;
explain (analyze) select unnest(tags) from (SELECT tags FROM t5 WHERE tags @> array['param-1'] group by 1) as t group by 1;
                                                                      QUERY PLAN                                                                      
------------------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=19433.16..19438.24 rows=1000 width=84) (actual time=411.248..411.249 rows=10 loops=1)
   ->  Subquery Scan on t  (cost=19425.49..19430.66 rows=1000 width=84) (actual time=411.220..411.233 rows=55 loops=1)
         ->  HashAggregate  (cost=19425.49..19425.59 rows=10 width=84) (actual time=411.205..411.206 rows=10 loops=1)
               ->  Bitmap Heap Scan on t5  (cost=5084.74..18236.86 rows=475450 width=84) (actual time=74.696..126.809 rows=475324 loops=1)
                     Recheck Cond: (tags @> '{param-1}'::text[])
                     ->  Bitmap Index Scan on t5_tagx_gist  (cost=0.00..4965.87 rows=475450 width=0) (actual time=73.514..73.514 rows=475324 loops=1)
                           Index Cond: (tags @> '{param-1}'::text[])
 Total runtime: 411.337 ms
(8 rows)

Initial data:
create table t5 as (select i id, rand_array() tags from generate_series(1,500000) as i);
create index t5_tagx_gin on t5 using gin (tags);

create or replace function rand_array() returns text[] as $$
begin
 -- the original used rand(10), which is not a PostgreSQL built-in;
 -- assuming a random array length of 1..10 here
 return (select array_agg(t1.col1)
         from (select format('param-%s', i) col1
               from generate_series(1, 1 + (random() * 9)::int) as i) as t1);
end;
$$ language plpgsql strict;

PS: i5 2410M, 8 GB RAM, PostgreSQL 9.3.4 on x86_64-unknown-linux-gnu
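The reason the subquery wins is visible in the plans: the inner GROUP BY collapses the ~475k matching rows down to the handful of distinct tag arrays before unnest expands them, so far fewer values are produced and re-aggregated. A rough Python analogy (illustrative numbers, not the benchmark data above):

```python
# Why deduplicating before "unnest" helps: expand only the distinct
# arrays, not one copy per matching row. Numbers are illustrative.

rows = [("param-1", "param-2")] * 400 + [("param-1", "param-3")] * 100

# Original query: unnest every row, then deduplicate -> 1000 values expanded
expanded_all = [tag for tags in rows for tag in tags]

# Subquery variant: deduplicate arrays first -> only 4 values expanded
distinct_arrays = set(rows)
expanded_distinct = [tag for tags in distinct_arrays for tag in tags]

# Both approaches yield the same final set of facet values
assert set(expanded_all) == set(expanded_distinct)
```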

O
OnYourLips, 2014-06-24
@OnYourLips

Postgres.
For products and their parameters, use the JSON type so you can avoid EAV.
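A minimal sketch of that approach (column and parameter names are illustrative; note that jsonb and its @> containment operator require PostgreSQL 9.4+, while the asker's 9.3 has only the plain json type, which cannot be GIN-indexed the same way):

```sql
-- Hypothetical schema sketch: store parameters as a JSON document
-- on the product row instead of EAV tables.
ALTER TABLE items ADD COLUMN params jsonb;

CREATE INDEX items_params_gin ON items USING gin (params);

-- Find products with diameter 10 made of steel:
SELECT id FROM items WHERE params @> '{"diameter": 10, "material": "steel"}';
```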

F
FacedSID, 2014-06-29
@FacedSID

I don't know why EAV didn't suit you. In my setup, the values for each field live in separate tables. When selecting by one parameter, only the table responsible for that parameter is joined. So a search by three parameters joins three tables with a combined volume of about 3,000 records, instead of one table of about 200,000 records joined several times, once per parameter. That is quite acceptable, plus SQL queries and the objects themselves are cached in APC.
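The per-parameter-table layout described above would look roughly like this (all table and column names are illustrative, not the answerer's actual schema):

```sql
-- One small table per parameter; each is joined only when that
-- filter is actually used in the search.
SELECT p.id
FROM products p
JOIN param_diameter d ON d.product_id = p.id AND d.value IN (10, 11, 12)
JOIN param_color    c ON c.product_id = p.id AND c.value = 'red'
JOIN param_brand    b ON b.product_id = p.id AND b.value = 'acme';
```

Each joined table stays small (only rows for products that have that parameter), which is where the 3,000-vs-200,000 figure comes from.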
