What program or service can compare words and phrases across a large amount of data?

big data

leks325, 2014-08-28 10:08:25

There is an Excel database with the correct titles of 100,000 books, and a second table with requests from users who are looking for these books but do not always name them correctly, sometimes with spelling errors. What program or service can compare these two tables and find, for each request, the closest correct title?

3 answers

xmoonlight, 2014-08-28
@leks325

Either write a fuzzy search algorithm yourself or use an existing one.
For example, build a hash for every title like this:
[distinct characters, most frequent first, then the remaining ones in order of appearance]:[total number of characters]
Example title: Cognitive
Hash: icogntve:9
User input: cognitife
Hash: icogntfe:9
If there is no match, drop one character from the right of the input hash at each iteration and compare it with the same number of leading characters of each title hash:
1. icogntfe:9 == icogntve:9 - not found
2. icogntfe == icogntve - not found
3. icogntf == icogntv - not found
4. icognt == icognt - FOUND (other titles may match as well).
From the results, prefer the titles closest in number of characters; in this example, 9.
Then display the first N matches, for example the first 5 similar titles.
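A minimal Python sketch of this hash-and-trim lookup, assuming titles are compared case-insensitively, only letters and digits are counted, and letters with equal counts keep their order of first appearance. The names `signature`, `build_index` and `fuzzy_lookup` are hypothetical. Each lookup scans the whole index, which is simple but slow when there are many requests.

```python
from collections import Counter


def signature(text: str) -> str:
    """Distinct letters/digits, most frequent first; ties keep first-appearance order."""
    letters = [ch for ch in text.lower() if ch.isalnum()]
    counts = Counter(letters)  # dict subclass: keys stay in first-appearance order
    # sorted() is stable, so letters with equal counts keep their original order.
    return "".join(sorted(counts, key=lambda ch: -counts[ch]))


def build_index(titles):
    """Precompute (hash, character count, title) for every correct title."""
    return [(signature(t), sum(ch.isalnum() for ch in t), t) for t in titles]


def fuzzy_lookup(query, index, top_n=5):
    q_sig = signature(query)
    q_len = sum(ch.isalnum() for ch in query)
    # Step 1: exact hash and exact character count.
    hits = [t for sig, n, t in index if sig == q_sig and n == q_len]
    if hits:
        return hits[:top_n]
    # Next steps: compare progressively shorter prefixes of the query hash.
    for k in range(len(q_sig), 0, -1):
        prefix = q_sig[:k]
        found = [(abs(n - q_len), t) for sig, n, t in index if sig.startswith(prefix)]
        if found:
            # Prefer titles whose character count is closest to the query's.
            found.sort(key=lambda pair: pair[0])
            return [t for _, t in found[:top_n]]
    return []


index = build_index(["Cognitive", "Cosmos", "Ice"])
print(fuzzy_lookup("cognitife", index))
```

Since the hash starts with the most frequent letter, a typo in that letter (for example "cognetive", where "e" becomes the most frequent) changes the first character of the hash, and trimming from the right never reaches the correct title. That is the main weakness of this scheme compared with trigram matching.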

yttrium, 2014-08-28
@yttrium

elasticsearch.org
sphinxsearch.com
xapian.org
PostgreSQL (pg_trgm, fts)
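pg_trgm compares strings by the trigrams (three-character sequences) they share. The Python sketch below approximates that idea outside the database: text is lowercased, split into words of letters and digits, and each word is padded with two leading spaces and one trailing space, as the pg_trgm documentation describes; the score is shared trigrams divided by all distinct trigrams. It is an illustration, not pg_trgm's exact implementation. `best_matches` is a hypothetical helper, and its `threshold=0.3` default is an assumption chosen to mirror pg_trgm's default similarity threshold.

```python
import re


def trigrams(text: str) -> set:
    """Trigram set in the spirit of pg_trgm: lowercase words, padded '  word '."""
    grams = set()
    for word in re.findall(r"[^\W_]+", text.lower()):  # runs of letters/digits
        padded = "  " + word + " "
        for i in range(len(padded) - 2):
            grams.add(padded[i:i + 3])
    return grams


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by all distinct trigrams (Jaccard index)."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def best_matches(query, titles, top_n=5, threshold=0.3):
    """Titles scoring at least `threshold`, best first."""
    scored = [(similarity(query, t), t) for t in titles]
    scored = [pair for pair in scored if pair[0] >= threshold]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [t for _, t in scored[:top_n]]


titles = ["War and Peace", "Anna Karenina", "Crime and Punishment"]
print(best_matches("War and Pease", titles))
```

Scoring every title for every request is a full scan; inside PostgreSQL, a GIN or GiST trigram index on the title column lets the `%` operator avoid it.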

Mikhail Lyalin, 2014-08-28
@mr_jok

Drop the table matching altogether and instead show users hints with similar titles, topics, etc.
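One way to show such hints is prefix autocomplete: as the user types, suggest titles that start with what has been typed, so the request is picked from the list of correct titles instead of being matched afterwards. A minimal Python sketch, assuming case-insensitive matching; `TitleSuggester` is a hypothetical name. Because it only matches prefixes, a typo in the first letters still gives no hints, so it could be combined with a fuzzy method such as trigrams.

```python
import bisect


class TitleSuggester:
    """Case-insensitive prefix suggestions over a sorted list of titles."""

    def __init__(self, titles):
        # (lowercased key, original title) pairs, sorted by key.
        self._entries = sorted((t.lower(), t) for t in titles)
        self._keys = [key for key, _ in self._entries]

    def suggest(self, typed: str, limit: int = 5):
        prefix = typed.lower()
        # All keys sharing the prefix form a contiguous run starting here.
        start = bisect.bisect_left(self._keys, prefix)
        results = []
        for key, title in self._entries[start:start + limit]:
            if not key.startswith(prefix):
                break
            results.append(title)
        return results


s = TitleSuggester(["War and Peace", "Anna Karenina", "War of the Worlds", "Walden"])
print(s.suggest("war"))
```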