PostgreSQL
Nikita Kolosov, 2016-03-30 11:02:02

How do I select rows from one table that are not present in another table?

There is a table x with the fields id and title, and a table y with the fields id and x_id.
I need to select all ids from table x that do not appear in table y (as y.x_id).
My solution:

select id from x where id not in (select x_id from y);

It is very slow (both tables have more than 1,000,000 rows).
Question: is there a way to select this data quickly?
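A minimal sketch of the query from the question, runnable with Python's built-in sqlite3 module. SQLite stands in for PostgreSQL here and the rows are hypothetical. Besides the expected result, it shows a known pitfall of NOT IN: if y.x_id contains even one NULL, the query returns no rows at all, because comparing an id with NULL is unknown rather than true.

```python
import sqlite3

# Hypothetical sample data; SQLite (bundled with Python) stands in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE x (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE y (id INTEGER PRIMARY KEY, x_id INTEGER);
    INSERT INTO x (id, title) VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd');
    INSERT INTO y (id, x_id) VALUES (10, 1), (11, 3);
""")

# The query from the question, with ORDER BY added for a deterministic result.
query = "SELECT id FROM x WHERE id NOT IN (SELECT x_id FROM y) ORDER BY id"

missing = [row[0] for row in conn.execute(query)]
print(missing)  # [2, 4]

# Pitfall: one NULL in y.x_id makes every "id NOT IN (...)" test unknown,
# so no row of x qualifies any more.
conn.execute("INSERT INTO y (id, x_id) VALUES (12, NULL)")
missing_with_null = [row[0] for row in conn.execute(query)]
print(missing_with_null)  # []
```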

1 answer(s)
Evgeny Bykov, 2016-03-30
@bizon2000

To improve overall performance, you need to make the DBMS use a merge rather than a hash join or a nested loop over an index.
To do this, we combine the rows of the two tables into one set:

SELECT id FROM x
UNION ALL
SELECT id FROM x
UNION ALL
SELECT x_id FROM y

In this set, each id appears once (if it is in y but not in x), twice (if it is in x but not in y), or three times (if it is in both tables).
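The counting argument can be checked in plain Python, with a Counter standing in for the UNION ALL set. The ids are hypothetical and, as the answer assumes, unique within each table.

```python
from collections import Counter

# Hypothetical ids; each list is assumed to contain no duplicates.
x_ids = [1, 2, 3, 4]
y_x_ids = [1, 3, 5]

# Same multiset as: SELECT id FROM x UNION ALL SELECT id FROM x UNION ALL SELECT x_id FROM y
counts = Counter(x_ids + x_ids + y_x_ids)

# 1 = only in y, 2 = only in x, 3 = in both tables
print(sorted(counts.items()))  # [(1, 3), (2, 2), (3, 3), (4, 2), (5, 1)]

only_in_x = sorted(i for i, n in counts.items() if n == 2)
print(only_in_x)  # [2, 4]
```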
Then we group by id and keep only the groups that contain exactly two rows:
SELECT id
    FROM (SELECT id FROM x
          UNION ALL
          SELECT id FROM x
          UNION ALL
          SELECT x_id FROM y
         ) AS t
    GROUP BY id
    HAVING COUNT(*) = 2
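A sketch that runs the grouped query end to end, again assuming SQLite via Python's sqlite3 module as a stand-in for PostgreSQL and hypothetical sample rows. The derived table gets an alias (t, an arbitrary name) because PostgreSQL versions before 16 reject a subquery in FROM without one.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE x (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE y (id INTEGER PRIMARY KEY, x_id INTEGER);
    INSERT INTO x (id, title) VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd');
    INSERT INTO y (id, x_id) VALUES (10, 1), (11, 3), (12, 5);
""")

# The answer's query; ORDER BY is added only to make the output deterministic.
query = """
    SELECT id
        FROM (SELECT id FROM x
              UNION ALL
              SELECT id FROM x
              UNION ALL
              SELECT x_id FROM y
             ) AS t
        GROUP BY id
        HAVING COUNT(*) = 2
        ORDER BY id
"""
only_in_x = [row[0] for row in conn.execute(query)]
print(only_in_x)  # [2, 4]  (5 occurs only in y, so its count is 1)
```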

Such a query does not require indexes and will be very efficient even on very large tables. Of course, the solution assumes that id is unique in table x and x_id is unique in table y. If id is not unique in table x, it has to be deduplicated first, e.g. by using SELECT DISTINCT id FROM x instead of SELECT id FROM x. The same applies to column x_id of table y.
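A sketch of why deduplication matters when the columns are not unique, under the same assumptions (SQLite stand-in, hypothetical rows; the helper name only_in_x is made up for illustration). With id 2 duplicated in x and x_id 4 duplicated in y, the plain query returns the wrong ids, while applying DISTINCT to every branch gives the correct answer.

```python
import sqlite3

# Hypothetical data with duplicates; SQLite stands in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE x (id INTEGER, title TEXT);   -- id is not unique here
    CREATE TABLE y (id INTEGER, x_id INTEGER); -- x_id is not unique here
    INSERT INTO x (id, title) VALUES (1, 'a'), (2, 'b'), (2, 'b2'), (3, 'c');
    INSERT INTO y (id, x_id) VALUES (10, 1), (11, 4), (12, 4);
""")


def only_in_x(distinct):
    """Hypothetical helper: run the UNION ALL / HAVING COUNT(*) = 2 query,
    optionally deduplicating each branch with DISTINCT."""
    d = "DISTINCT " if distinct else ""
    query = f"""
        SELECT id
            FROM (SELECT {d}id FROM x
                  UNION ALL
                  SELECT {d}id FROM x
                  UNION ALL
                  SELECT {d}x_id FROM y
                 ) AS t
            GROUP BY id
            HAVING COUNT(*) = 2
            ORDER BY id
    """
    return [row[0] for row in conn.execute(query)]


without_distinct = only_in_x(False)
with_distinct = only_in_x(True)
print(without_distinct)  # [3, 4]  wrong: id 2 is counted 4 times, id 4 twice
print(with_distinct)     # [2, 3]  ids of x that do not appear in y
```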
