PostgreSQL
mkone112, 2021-05-04 03:11:08

How can I quickly, and with minimal error, count the number of records in a table with billions of rows?

I am working on a Django project that uses PostgreSQL as the database. The database contains thousands of tables, and many of them can hold tens of billions of records.
The problem is that count() runs for half an hour.
I tried an approximate record count like:

SELECT reltuples FROM pg_class WHERE relname = 'table';

But the result was 300 times smaller than the real count, which is too large an error.
Is there a way to quickly count the number of records without slowing down the write speed?


6 answer(s)
tested life, 2021-05-04
@krasszen2

Not sure, but maybe it will help? Also, read the comments there.

Boris Korobkov, 2021-05-04
@BorisKorobkov

SELECT reltuples FROM pg_class WHERE relname = 'table';

This is the fastest way.
The error can (and should) be reduced by running VACUUM or ANALYZE. See https://www.postgresql.org/docs/9.2/row-estimation... for details.
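A minimal sketch of that combination (the table name big_table is a placeholder): refresh the statistics, then read the estimate. Note the single quotes around the name, since in SQL double quotes denote identifiers, not string literals.

```sql
-- Refresh the planner statistics (samples the table; no full scan)
ANALYZE big_table;

-- Read the updated estimate; reltuples is a float, so cast it
SELECT reltuples::bigint AS estimated_rows
FROM pg_class
WHERE relname = 'big_table';
```

The fresher the statistics, the smaller the error; autovacuum normally keeps them up to date, but on huge append-heavy tables an explicit ANALYZE after bulk loads helps.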

Romses Panagiotis, 2021-05-04
@romesses

TimescaleDB

It looks like you need TimescaleDB, a DBMS optimized for time series.
This query should solve exactly your problem (when using TimescaleDB):
SELECT h.schema_name,
    h.table_name,
    h.id AS table_id,
    h.associated_table_prefix,
    row_estimate.row_estimate
   FROM _timescaledb_catalog.hypertable h
     CROSS JOIN LATERAL ( SELECT sum(cl.reltuples) AS row_estimate
           FROM _timescaledb_catalog.chunk c
             JOIN pg_class cl ON cl.relname = c.table_name
          WHERE c.hypertable_id = h.id
          GROUP BY h.schema_name, h.table_name) row_estimate
ORDER BY schema_name, table_name;

https://github.com/timescale/timescaledb/issues/525
And check out other TSDBs as well.

Added
Try it with the EXPLAIN trick:
CREATE FUNCTION row_estimator(query text) RETURNS bigint
   LANGUAGE plpgsql AS
$$DECLARE
   plan jsonb;
BEGIN
   -- Ask the planner for its estimate without executing the query
   EXECUTE 'EXPLAIN (FORMAT JSON) ' || query INTO plan;

   -- The estimate lives at [0]."Plan"."Plan Rows" in the JSON plan
   RETURN (plan->0->'Plan'->>'Plan Rows')::bigint;
END;$$;
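With the function installed, getting an estimate for any query is a single call (the table name here is a placeholder):

```sql
-- Planner's row estimate for an arbitrary query, without running it
SELECT row_estimator('SELECT * FROM big_table');
```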

https://www.cybertec-postgresql.com/en/postgresql-...
https://wiki.postgresql.org/wiki/Count_estimate
https://www.citusdata.com/blog/2016/10/12/count-pe...

Vladimir Kudrya, 2021-05-04
@Mugenzo

I don't know how adequate this option is, but what if we stop computing the count in the database at all and instead hook into the inserts and deletes themselves: whenever a record is successfully added, we increment a number in a separate table created specifically to store the record count, and whenever a record is deleted, we decrement it.
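A minimal sketch of that idea in plain PostgreSQL, with all names hypothetical. One caveat: a single counter row turns every insert and delete into an update of the same row, so under heavy concurrent writes it can become a lock hotspot.

```sql
-- Side table holding one counter row per tracked table
CREATE TABLE row_counts (
    table_name text PRIMARY KEY,
    n          bigint NOT NULL DEFAULT 0
);
INSERT INTO row_counts VALUES ('big_table', 0);

-- Trigger function: increment on INSERT, decrement on DELETE
CREATE FUNCTION count_rows() RETURNS trigger
   LANGUAGE plpgsql AS
$$BEGIN
   IF TG_OP = 'INSERT' THEN
      UPDATE row_counts SET n = n + 1 WHERE table_name = TG_TABLE_NAME;
   ELSE  -- DELETE
      UPDATE row_counts SET n = n - 1 WHERE table_name = TG_TABLE_NAME;
   END IF;
   RETURN NULL;  -- the return value of an AFTER trigger is ignored
END;$$;

-- PostgreSQL 11+ syntax; on older versions write EXECUTE PROCEDURE
CREATE TRIGGER big_table_count
AFTER INSERT OR DELETE ON big_table
FOR EACH ROW EXECUTE FUNCTION count_rows();
```

Reading the count then becomes a single-row lookup: SELECT n FROM row_counts WHERE table_name = 'big_table';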

Pretor DH, 2021-05-05
@PretorDH

Let's analyze this; Google is of no help in your case.
How to Define an Auto Increment Primary Key in Pos...
With tables this large, it is important to preserve index continuity.
That means deleting records from such tables is bad form, and it follows that the number of records equals the value of the next index. So it is enough to know the current value of the id sequence.
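Under those assumptions (a serial/identity primary key and no deletes), the check is a single sequence read; the sequence name below is the hypothetical default for a table big_table with an id column. Note that rolled-back inserts still consume sequence values, so this can overestimate.

```sql
-- Instant, but only a valid count if rows are never deleted
-- and failed inserts are rare
SELECT last_value FROM big_table_id_seq;
```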
