What program or service can compare words and phrases across a large amount of data?
I have an Excel database with the correct titles of 100,000 books, and a second database of requests from users who are looking for these books. Users don't always name the books correctly, and their requests include grammatical errors. What program or service should I use to compare these two tables and find the correct titles closest to each request?
Write a fuzzy search algorithm yourself, or use an existing one.
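As one example of an existing tool: Python's standard library includes `difflib.get_close_matches`, which ranks candidates by a similarity ratio. A minimal sketch, assuming the titles and requests have already been loaded from Excel into plain lists (for instance with `pandas.read_excel`); the sample titles are made up and `best_matches` is a hypothetical helper name:

```python
import difflib

# Hypothetical sample data standing in for the two Excel tables.
correct_titles = ["War and Peace", "Crime and Punishment", "The Master and Margarita"]
user_queries = ["war and piece", "crime and punishmnt", "master and margarita"]

def best_matches(query, titles, n=3, cutoff=0.6):
    """Return up to n titles whose similarity ratio to query is >= cutoff,
    best match first."""
    # Compare case-insensitively, then map back to the original spelling.
    lowered = {t.lower(): t for t in titles}
    hits = difflib.get_close_matches(query.lower(), list(lowered), n=n, cutoff=cutoff)
    return [lowered[h] for h in hits]

for q in user_queries:
    print(q, "->", best_matches(q, correct_titles))
```

`get_close_matches` compares the query against every candidate, so with 100,000 titles and many requests it may be slow; the indexed search engines listed further down are built for that scale.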
Try building hashes like this:
Example: Cognitive
Hash: onepzvtl:14
[characters ordered from most frequent to least][then the remaining characters]:[total number of characters]
Input: cognitive
Hash: shepzvtl:14
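A sketch of how such a hash could be computed. The answer does not say how ties between equally frequent characters are ordered or whether spaces and punctuation count, so this assumes ties keep their first-appearance order and only letters and digits are counted; `signature` is a hypothetical name:

```python
from collections import Counter

def signature(text):
    """Hypothetical reading of the hash: distinct characters ordered by
    frequency (most frequent first), then ':' and the character count."""
    # Assumption: only letters and digits are counted, case-insensitively.
    chars = [ch for ch in text.lower() if ch.isalnum()]
    # Counter.most_common orders equal counts by first appearance.
    order = "".join(ch for ch, _ in Counter(chars).most_common())
    return f"{order}:{len(chars)}"

print(signature("banana"))     # anb:6
print(signature("cognitive"))  # icogntve:9
```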
If there is no match, shorten the hashes from the right by one character on each iteration and compare again:
1. sheepzvtl:14 == shepzvtl:14 - not found
2. sheepzvtl == shepzvtl - not found
3. sheepzvtl == sheapzvtl - not found
....
N. she == she - FOUND (other titles may match as well).
Among the results, take the ones closest in total number of characters; in the example, closest to 14.
Display the first N matches, for example the 5 most similar.
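The lookup loop above, sketched under the same assumptions as the hash (ties ordered by first appearance, only letters and digits counted). Every prefix of each title's hash is indexed, so each truncation step is a single dictionary lookup; `build_index` and `search` are hypothetical names and the sample titles are made up:

```python
from collections import Counter, defaultdict

def signature(text):
    """Hypothetical hash as a pair: (characters by descending frequency, count)."""
    chars = [ch for ch in text.lower() if ch.isalnum()]
    order = "".join(ch for ch, _ in Counter(chars).most_common())
    return order, len(chars)

def build_index(titles):
    """Map every prefix of each title's character order to (length, title) pairs."""
    index = defaultdict(list)
    for title in titles:
        order, length = signature(title)
        for k in range(1, len(order) + 1):
            index[order[:k]].append((length, title))
    return index

def search(query, index, n=5):
    """Shorten the query hash from the right until some title shares the prefix,
    then return up to n titles closest to the query in character count."""
    order, length = signature(query)
    for k in range(len(order), 0, -1):
        candidates = index.get(order[:k])
        if candidates:
            ranked = sorted(candidates, key=lambda c: abs(c[0] - length))
            return [title for _, title in ranked[:n]]
    return []

titles = ["cognitive", "cognition", "positive"]
index = build_index(titles)
print(search("cognitiev", index))  # ['cognitive']
print(search("positiv", index))    # ['positive']
```

One weakness worth knowing: a typo that changes which character is most frequent changes the first character of the hash, and then no prefix matches at all (`search("cognitve", index)` returns an empty list here), so this works best alongside a similarity measure.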
elasticsearch.org
sphinxsearch.com
xapian.org
PostgreSQL (pg_trgm, full-text search)
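To illustrate what `pg_trgm` measures, here is a small Python approximation of trigram similarity: each word is lowercased and padded with two spaces in front and one behind, and the score is the number of shared trigrams divided by the number of distinct trigrams in both strings. It is not PostgreSQL's exact implementation; inside the database you would use the extension's `similarity()` function or `%` operator, optionally backed by a GIN index with `gin_trgm_ops`. The helper names and sample titles are hypothetical, and 0.3 mirrors pg_trgm's default similarity threshold:

```python
import re

def trigrams(text):
    """Approximate pg_trgm-style trigrams: lowercase words, each padded
    with two leading spaces and one trailing space."""
    grams = set()
    for word in re.findall(r"\w+", text.lower()):
        padded = "  " + word + " "
        for i in range(len(padded) - 2):
            grams.add(padded[i:i + 3])
    return grams

def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams of both strings."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

def closest(query, titles, n=5, threshold=0.3):
    """Return up to n titles scoring at least threshold, best first."""
    scored = [(similarity(query, t), t) for t in titles]
    scored = [st for st in scored if st[0] >= threshold]
    scored.sort(key=lambda st: st[0], reverse=True)
    return [t for _, t in scored[:n]]

print(closest("war and piece", ["War and Peace", "Crime and Punishment"]))
```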
Or abandon the tables altogether and show users suggestions of similar titles / topics, etc.