Ernest Faizullin, 2016-08-16 14:05:07
SQL

How can I compare strings in SQL while ignoring stop words?

I have a list of stop words that should be ignored when comparing titles, for example:

'концерт' (concert)
'группа' (band)
'группы' (of the band)
'альбом' (album)
'альбома' (of the album)
'песни' (songs)
'презентация' (presentation)

When comparing concert names, all of the following titles must be recognized as the same concert:
'Группа ZebraHead' (ZebraHead band)
'Песни группы ZebraHead' (Songs by the band ZebraHead)
'ZebraHead'
'Концерт группы ZebraHead' (Concert by the band ZebraHead)
'ZebraHead. Презентация альбома' (ZebraHead. Album presentation)

The table contains thousands of concert names, and some of them are similar: these need to be identified and merged by assigning each such row a group_id (the smallest id in its group).
I tried the Levenshtein function, but the table has many rows, so it runs very slowly and sometimes hangs entirely.
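A minimal sketch of the grouping described above, assuming the titles can be normalized outside the database (plain Python here; `normalize` and `assign_group_ids` are hypothetical names). It swaps the fuzzy technique for exact matching on a normalized key: instead of comparing every pair of titles with Levenshtein (quadratic in the number of rows), each title is reduced to the set of its non-stop words, and rows with equal keys share a group_id equal to their smallest id.

```python
import re

# Stop words from the question, in lower case.
STOP_WORDS = {"концерт", "группа", "группы", "альбом", "альбома", "песни", "презентация"}


def normalize(title: str) -> tuple[str, ...]:
    """Lower-case the title, split it into words and drop the stop words.

    The remaining words are deduplicated and sorted, so word order is ignored.
    """
    words = re.findall(r"\w+", title.lower())  # \w matches Cyrillic letters too
    return tuple(sorted({w for w in words if w not in STOP_WORDS}))


def assign_group_ids(rows: list[tuple[int, str]]) -> dict[int, int]:
    """Return {row id: group_id}, where group_id is the smallest id with the same key."""
    keys = {row_id: normalize(title) for row_id, title in rows}
    smallest: dict[tuple[str, ...], int] = {}
    for row_id, key in keys.items():
        if key:  # a title made only of stop words stays in its own group
            smallest[key] = min(smallest.get(key, row_id), row_id)
    return {row_id: smallest[key] if key else row_id for row_id, key in keys.items()}


rows = [
    (1, "Группа ZebraHead"),
    (2, "Песни группы ZebraHead"),
    (3, "ZebraHead"),
    (4, "Концерт группы ZebraHead"),
    (5, "ZebraHead. Презентация альбома"),
    (6, "Концерт Metallica"),
]
groups = assign_group_ids(rows)
print(groups)  # {1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 6}
```

Each title is processed once and keys are looked up in a dictionary, so this scales roughly linearly with the number of rows. The trade-off is that typos and transliterations produce different keys and are not merged.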

1 answer
Pretor DH, 2016-08-16
@PretorDH

Maybe it's easier to look for the word the titles have in common (ZebraHead)?
If the texts are short (titles only):
- split the titles into words and put them into a table, each word with the id of the row it came from;
- throw out the stop words by matching this table against the stop-word table;
- replace typos, transliterations and spelling variants with one canonical form;
- group by the remaining words.
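The steps above could be sketched with SQLite through Python's built-in `sqlite3` module; that choice is an assumption, since the question does not name the DBMS. The table names (`concerts`, `stop_words`, `word_variants`, `concert_words`) and the sample transliteration 'Зебрахед' → 'zebrahead' are hypothetical. Words are split and lower-cased in Python because SQLite's `lower()` only folds ASCII letters. In this sketch a concert's group_id is the smallest id among concerts sharing at least one remaining word with it.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE concerts (id INTEGER PRIMARY KEY, title TEXT NOT NULL, group_id INTEGER);
    CREATE TABLE stop_words (word TEXT PRIMARY KEY);
    -- hypothetical table of known misspellings / transliterations
    CREATE TABLE word_variants (variant TEXT PRIMARY KEY, canonical TEXT NOT NULL);
    CREATE TABLE concert_words (concert_id INTEGER NOT NULL, word TEXT NOT NULL);
""")
conn.executemany("INSERT INTO stop_words VALUES (?)", [(w,) for w in (
    "концерт", "группа", "группы", "альбом", "альбома", "песни", "презентация")])
conn.execute("INSERT INTO word_variants VALUES ('зебрахед', 'zebrahead')")
conn.executemany("INSERT INTO concerts (id, title) VALUES (?, ?)", [
    (1, "Группа ZebraHead"),
    (2, "Песни группы ZebraHead"),
    (3, "ZebraHead"),
    (4, "Концерт группы ZebraHead"),
    (5, "ZebraHead. Презентация альбома"),
    (6, "Концерт группы Зебрахед"),
    (7, "Metallica"),
])

# Step 1: one row per (concert, word), split and lower-cased in Python.
for concert_id, title in conn.execute("SELECT id, title FROM concerts").fetchall():
    conn.executemany("INSERT INTO concert_words VALUES (?, ?)",
                     [(concert_id, w) for w in set(re.findall(r"\w+", title.lower()))])

# Step 2: throw out the stop words.
conn.execute("DELETE FROM concert_words WHERE word IN (SELECT word FROM stop_words)")

# Step 3: replace known variants with their canonical spelling.
conn.execute("""
    UPDATE concert_words
    SET word = (SELECT canonical FROM word_variants WHERE variant = concert_words.word)
    WHERE word IN (SELECT variant FROM word_variants)
""")

# Step 4: group by word. group_id = smallest id among concerts sharing a word
# with this one (itself included); concerts with no words left keep their own id.
conn.execute("""
    UPDATE concerts
    SET group_id = COALESCE(
        (SELECT MIN(cw2.concert_id)
         FROM concert_words AS cw1
         JOIN concert_words AS cw2 ON cw2.word = cw1.word
         WHERE cw1.concert_id = concerts.id),
        id)
""")

groups = dict(conn.execute("SELECT id, group_id FROM concerts ORDER BY id"))
print(groups)  # {1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 1, 7: 7}
```

This only merges direct neighbours: if A shares a word with B and B shares a different word with C, C is not pulled into A's group, so a full merge would need a recursive query or repeated passes. Grouping on a single shared word can also over-merge titles that have a common word missing from the stop-word list.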
Or draw on how search engines' linguistic analyzers work; this article may help: https://habrahabr.ru/post/114997/
But that approach will not be fast.
P.S. I do think the architecture itself needs to change slightly.