How to exclude duplicate records?

MySQL

4ainik, 2017-09-29 17:21:55

Not a trivial task:
There are two tables with identical structure but different content. That they differ is obvious only because the record counts differ, so I need to find out the real scale of the disaster, and for that I need to exclude the records that are the same, i.e. present in both tables at once.
The problem is that each table holds about 1 million records :(
Is there a way to filter out the discrepancies with an SQL query? If so, how?
I know tables can be joined with something like:
select * from t1,t2 inner join ... but that is not quite it; my task is the opposite.

3 answer(s)

Immortal_pony, 2017-09-29
@4ainik

This is how you can get the rows of some_table that are not in other_table:

SELECT
    `some_table`.*
FROM
    `some_table`
    LEFT JOIN `other_table` ON (`other_table`.`id` = `some_table`.`id`)
WHERE
    `other_table`.`id` IS NULL

A million rows is not a problem for this query, especially if the column used in the join is indexed.
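
A rough sketch of how the indexing point could be checked, assuming both tables really have a comparable `id` column. The index name `idx_other_table_id` is made up for illustration, and the index is unnecessary if `id` is already a primary key:

```sql
-- Hypothetical index name; skip this step if `id` is already indexed
CREATE INDEX idx_other_table_id ON `other_table` (`id`);

-- The `key` column of the EXPLAIN output shows whether the index is used
EXPLAIN
SELECT `some_table`.*
FROM `some_table`
    LEFT JOIN `other_table` ON (`other_table`.`id` = `some_table`.`id`)
WHERE `other_table`.`id` IS NULL;

-- To see only the scale of the difference, count the unmatched rows
SELECT COUNT(*) AS only_in_some_table
FROM `some_table`
    LEFT JOIN `other_table` ON (`other_table`.`id` = `some_table`.`id`)
WHERE `other_table`.`id` IS NULL;
```

Counting first is cheaper to read than a full listing when the difference may be large.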

gill-sama, 2017-09-29
@gill-sama

If the data structure is the same, then you should use a set-difference operator (EXCEPT); it
works by analogy with UNION, only the other way around: it removes the rows that occur in both tables from the result, which is what you need
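
In standard SQL the operator that behaves this way is EXCEPT. Note that MySQL only supports it from version 8.0.31, so on older servers a fallback is needed. A minimal sketch, using the table names t1 and t2 from the question; the columns `id` and `name` in the fallback are assumptions and should be replaced with the real column list:

```sql
-- MySQL 8.0.31+: rows of t1 that have no identical row in t2.
-- All columns are compared, and duplicate rows are collapsed (EXCEPT DISTINCT is the default).
SELECT * FROM t1
EXCEPT
SELECT * FROM t2;

-- Older MySQL: a similar filter with NOT EXISTS.
-- <=> is MySQL's NULL-safe equality, so NULLs in both tables count as equal.
-- Unlike EXCEPT, this does not collapse duplicate rows of t1.
SELECT t1.*
FROM t1
WHERE NOT EXISTS (
    SELECT 1
    FROM t2
    WHERE t2.`id` <=> t1.`id`
      AND t2.`name` <=> t1.`name`   -- repeat for every column to compare
);
```

Swapping t1 and t2 gives the rows that exist only in t2.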

terrier, 2017-09-29
@terrier

SELECT * FROM t1 LEFT JOIN t2 USING (<some_field>) WHERE t2.<some_field> IS NULL

(and the same with t1 and t2 swapped, to get the rows that exist only in t2)
The JOIN should run noticeably faster than EXCEPT, which is similar in meaning (but check this on real data)
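
Both directions can also be combined into one result. This is only a sketch: it assumes the shared field is called `id`, and the `only_in` label column is added purely for illustration:

```sql
-- Rows present only in t1
SELECT 't1' AS only_in, t1.`id`
FROM t1
    LEFT JOIN t2 USING (`id`)
WHERE t2.`id` IS NULL

UNION ALL

-- Rows present only in t2
SELECT 't2' AS only_in, t2.`id`
FROM t2
    LEFT JOIN t1 USING (`id`)
WHERE t1.`id` IS NULL;
```

UNION ALL is enough here because the two halves cannot overlap, so there is no need to pay for duplicate removal.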