Answer the question
In order to leave comments, you need to log in
Sampling of duplicate email addresses in sql?
How can I filter out duplicate database email addresses given Gmail's automatic bullet and punctuation capability?
For example, I want these addresses to be detected as duplicates:
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
Points, pluses can be in different places, also numbers can be various.
[email protected] and [email protected] are actually the same user.
Answer the question
In order to leave comments, you need to log in
Well, the query, of course, without an index, but you can run it once
SET SESSION group_concat_max_len = 10000000;
select replace(replace(email, '.',''),'+',''), group_concat(email), count(1) cnt
from table
group by replace(replace(email, '.',''),'+','')
having cnt > 1
order by cnt desc
and without a script it is possible, purely sql?Can. You make the actions performed in the script as a function. And add a calculated field to the table with the result of this function. Or use it for grouping.
Didn't find what you were looking for?
Ask your questionAsk a Question
731 491 924 answers to any question