How to optimize SELECT COUNT query?

PostgreSQL

Anton, 2016-11-14 11:27:21

I always thought that naming a column inside COUNT makes the query run faster. I decided to check and was surprised:
SELECT count(*) FROM public.news;

1 row retrieved starting from 1 in 252ms (execution: 245ms, fetching: 7ms)
1 row retrieved starting from 1 in 231ms (execution: 227ms, fetching: 4ms)

SELECT count(id) FROM public.news;

1 row retrieved starting from 1 in 343ms (execution: 340ms, fetching: 3ms)
1 row retrieved starting from 1 in 300ms (execution: 296ms, fetching: 4ms)

The query that names a column always takes longer. I ran each query several times at short intervals (a few seconds apart).
So my question is: how can I optimize a row-count query so that it runs as fast as possible?

2 answer(s)

Alexey Nemiro, 2016-11-14
@AlekseyNemiro

SELECT COUNT(*) FROM scans all rows and counts every one of them.
SELECT COUNT(id) FROM scans all rows but counts only those in which the specified column (here, id) is not NULL, so it has an extra check to do per row.
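A small illustration of the difference, as a sketch only: the temporary table count_demo and its columns are made up for this example and are not part of the question's schema.

```sql
-- Hypothetical demo table: three rows, one of them with a NULL title.
CREATE TEMP TABLE count_demo (id integer, title text);
INSERT INTO count_demo VALUES (1, 'a'), (2, NULL), (3, 'c');

-- count(*) counts rows; count(title) skips rows where title IS NULL.
SELECT count(*)     AS all_rows,
       count(title) AS non_null_titles
FROM count_demo;
-- all_rows = 3, non_null_titles = 2
```

For a column declared NOT NULL (such as a primary key) both forms return the same number, so the only thing count(id) adds is work.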
Not naming a column is the best option for PostgreSQL:
https://wiki.postgresql.org/wiki/Slow_Counting
If things get really bad, one option is to maintain your own counter.
Here is an excerpt from the Russian-language PostgreSQL Wiki:
count(*) is slow because no index is used. PostgreSQL has to check the visibility of every row, so it performs a sequential scan of the entire table. If you want, you can track the number of rows in the table with triggers, but this slows down writes to the table.
You can get an estimate. The reltuples column of the pg_class table holds the value recorded by the last ANALYZE on that table. On a large table this value is accurate to within thousandths of a percent, which is quite sufficient for many purposes.
An "exact" count often does not stay exact for long anyway. Because of MVCC, the count is accurate only as of the moment the SELECT count(*) query started (or as limited by the transaction isolation level) and may be out of date by the time the query completes. If modifying transactions run all the time, two count(*) queries that complete at the same time may show different values if a modifying transaction committed between their start times.
https://wiki.postgresql.org/wiki/FAQ_...
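The trigger-based counter mentioned in the excerpt above could look roughly like this. It is a minimal sketch, not a tested production setup: the side table news_row_count, the function news_count_trg and the trigger news_count are hypothetical names chosen for illustration.

```sql
-- Hypothetical side table holding the current row count of public.news.
-- Run the setup in one transaction while no one else writes to the table,
-- so the initial count and the trigger stay in sync.
CREATE TABLE news_row_count (n bigint NOT NULL);
INSERT INTO news_row_count SELECT count(*) FROM public.news;

CREATE FUNCTION news_count_trg() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        UPDATE news_row_count SET n = n + 1;
    ELSIF TG_OP = 'DELETE' THEN
        UPDATE news_row_count SET n = n - 1;
    END IF;
    RETURN NULL;  -- the return value is ignored for AFTER row triggers
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER news_count
AFTER INSERT OR DELETE ON public.news
FOR EACH ROW EXECUTE PROCEDURE news_count_trg();

-- Reading the count is now a single-row lookup.
SELECT n FROM news_row_count;
```

Every insert or delete now also updates the single counter row, which is exactly the write slowdown the excerpt warns about: concurrent writers queue up on that row. This sketch also does not handle TRUNCATE, which would need its own statement-level trigger.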

Sergey Gornostaev, 2016-11-14
@sergey-gornostaev

There is a hack. Very fast, but slightly inaccurate.
For example, on a table with 31560 rows it just returned 31558, but it ran 20 times faster. The accuracy depends on how often ANALYZE is run on the table.
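The query behind this hack is not shown in the post. One commonly used estimate that matches the description (fast, slightly off, dependent on ANALYZE) reads reltuples from pg_class. Treat it as an assumption about what was meant, not as the author's exact query:

```sql
-- Planner's row estimate for public.news, as of the last VACUUM/ANALYZE.
-- No table scan is involved, only a catalog lookup.
SELECT reltuples::bigint AS estimated_rows
FROM pg_class
WHERE oid = 'public.news'::regclass;
```

If the estimate has drifted too far, running ANALYZE public.news; refreshes it.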