M
M
maxilian2019-10-16 18:31:43
SQL
maxilian, 2019-10-16 18:31:43

Sampling of duplicate email addresses in sql?

How can I filter out duplicate database email addresses given Gmail's automatic bullet and punctuation capability?
For example, I want these addresses to be detected as duplicates:
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
Points, pluses can be in different places, also numbers can be various.
[email protected] and [email protected] are actually the same user.

Answer the question

In order to leave comments, you need to log in

2 answer(s)
F
FanatPHP, 2019-10-16
@maxilian

Well, the query, of course, without an index, but you can run it once

SET SESSION group_concat_max_len = 10000000;
select replace(replace(email, '.',''),'+',''), group_concat(email), count(1) cnt
from table 
group by replace(replace(email, '.',''),'+','') 
having cnt > 1
order by cnt desc

K
Konstantin Tsvetkov, 2019-10-16
@tsklab

and without a script it is possible, purely sql?
Can. You make the actions performed in the script as a function. And add a calculated field to the table with the result of this function. Or use it for grouping.

Didn't find what you were looking for?

Ask your question

Ask a Question

731 491 924 answers to any question