Search engines
Pesetsu, 2019-08-10 19:13:25

How do I make site search tolerate different spellings of a word (like Google)?

Hello colleagues and visitors.
There is a table with a column; let's call it title. One row has title = Hiroshima, with the alternate spelling Hiroshima (Romaji). User N searches for "Herashima", meaning "Hiroshima", and of course finds nothing. As a workaround, you could store every variant: "Hiroshima, herashima, hiroshima, or even herashima". But there can be hundreds of thousands of such records, so listing every possible spelling is not an option: first, it would take a lot of time, and second, you would need to know how each user hears a given name in order to cover all the variants.
This happens mainly because these words are loose transliterations by ear from Asian languages, mostly Japanese. There are other causes, but they do not matter here.
I am looking for an algorithm that will correct misspelled words to the right ones, as Google does.

3 answer(s)
Sergey Gornostaev, 2019-08-10
@sergey-gornostaev

Steven Skiena, The Algorithm Design Manual, chapter 8, page 300. But it is better not to reinvent the wheel: just use a full-text search engine with fuzzy search support.
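
To give a feel for what a fuzzy match computes, here is a minimal sketch of edit distance (Levenshtein), the classic dynamic-programming approach to approximate string matching. The function names, the sample titles, and the max_distance threshold are assumptions made for illustration; a real search engine uses an index rather than scanning every title.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of single-character
    insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def closest_titles(query, titles, max_distance=2):
    """Hypothetical helper: titles within max_distance edits, nearest first."""
    q = query.lower()
    scored = [(edit_distance(q, t.lower()), t) for t in titles]
    return [t for d, t in sorted(scored) if d <= max_distance]


print(closest_titles("Herashima", ["Hiroshima", "Nagasaki", "Osaka"]))
```

A linear scan like this costs one distance computation per row, so with hundreds of thousands of records it is only practical as a re-ranking step over a small candidate set.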

skrimafonolog, 2019-08-15
@skrimafonolog

"Do it yourself, the way Google does it": forget that right away.
Google is one of the richest companies in the world, and search is a major part of its business, so the best minds in the field work on its search. You cannot compete with them.
1) You can connect Google search to your site.
2) You can write the conversion rules yourself and index not the original words but the words already processed by your rules (the rules can only be chosen by hand).
How the rules are made:
Highload conference talk "Why isn't it found!"
For an example, see an implementation of the Soundex algorithm.
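
As a rough illustration of option 2, here is a sketch of standard American Soundex used as the "conversion rule": titles are indexed by their phonetic code, and the query is converted with the same rule before lookup. The helper build_index and the sample titles are assumptions for illustration. Soundex was designed for English surnames, so for transliterated Japanese you would likely need your own rules.

```python
from collections import defaultdict

# Consonant groups of standard American Soundex.
_CODES = {}
for _letters, _digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(word):
    """Return the 4-character Soundex code of word ("" if it has no letters)."""
    letters = [c for c in word.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not break a run of equal codes
        code = _CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code is kept
    return (result + "000")[:4]


def build_index(titles):
    """Hypothetical helper: map each phonetic code to the titles sharing it."""
    index = defaultdict(list)
    for title in titles:
        index[soundex(title)].append(title)
    return index


index = build_index(["Hiroshima", "Nagasaki", "Osaka"])
print(soundex("Herashima"), index[soundex("Herashima")])
```

Because both spellings reduce to the same code, the lookup is a single exact-match query on a precomputed column, which scales to hundreds of thousands of rows.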

Arthur, 2019-08-10
@ar2rsoft

Sphinx, Elasticsearch, and similar engines.
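
For instance, with Elasticsearch a fuzzy lookup can be expressed as a match query with the fuzziness parameter. The sketch below only builds the request body; the field name title comes from the question, while the function name and the idea of sending the body to an index's _search endpoint are assumptions about your setup.

```python
import json


def fuzzy_title_query(text):
    """Build an Elasticsearch query body that tolerates a few typos.

    "AUTO" lets the engine choose the allowed edit distance
    from the length of each term.
    """
    return {
        "query": {
            "match": {
                "title": {
                    "query": text,
                    "fuzziness": "AUTO",
                }
            }
        }
    }


body = fuzzy_title_query("Herashima")
print(json.dumps(body, indent=2))
```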
