How are collisions resolved in PostgreSQL hashes?

B

beduin012021-05-07 18:01:34

PostgreSQL

beduin01, 2021-05-07 18:01:34

If the index type is hash, what happens if a collision occurs? How does the DB handle this situation?

Reply

Answer the question

In order to leave comments, you need to log in

3 answer(s)

R

rPman, 2021-05-07
@beduin01

A hash index for a database does not require hash uniqueness (i.e. it can be a 2-byte hash with 65536 values for the database in a million rows for each value there will be many collisions), this mechanism is needed more to divide the effort of scanning the entire table by the possible number of hash values, i.e. in the index for records with the same hash, the query will go through all of them (if the hash has a collision for id equal to 1 and 10, then this hash will have 2 records in the index table and when searching for the right one, both of them will be scanned).
here or here there is a bit about it

D

Dimonchik, 2021-05-07
@dimonchik2013

no way, but what are the collisions? request - hash - return by hash, you yourself sacrifice accuracy for the sake of speed
where accuracy is B-Tree

M

Melkij, 2021-05-07
@melkij

When writing data, tuple ids are simply stored as they are in the index. It doesn't matter if it's a different version of the string or if the hash function has collided with other data.
When reading hash, the index tells the executor that the returned tuple id must be rechecked against real data (sets xs_recheck for indexscan or selects bitmap scan with recheck condition) - the mechanism lossy search.