DemonIa, 2018-01-22 19:27:13
natural language processing

What is the best algorithm to use to determine string similarity?

Hello.
The essence of the problem: I have an Excel spreadsheet whose columns A and B contain the names of sports teams (they are not identical, but similar).
I need to compare these two columns and remove the values (in either column) that have no matching pair in the other column.
Example:
Column A:
Manchester - Ajax
Bundesliga - Neapoli
Column B:
Manchester (U 17) - Ajax (U 17)
Dynamo - Chernomorets
In this case, the unmatched entries "Bundesliga - Neapoli" and "Dynamo - Chernomorets" should be deleted.
I see two options here: either compare the strings directly using the Levenshtein distance, or split each string on spaces into an array of words and compare the overlap (intersection) of those word sets using the Tanimoto coefficient.
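For reference, both options can be sketched in TypeScript as a stand-in for the NodeJS target. The function names `levenshtein`, `levenshteinSimilarity`, `tokenize` and `tanimoto` are hypothetical. Normalising the Levenshtein distance as 1 - distance / (length of the longer string) is one common convention, not the only one. The Tanimoto coefficient on word sets is |A ∩ B| / (|A| + |B| - |A ∩ B|). Tokens are taken as runs of Unicode letters and digits, which is an assumption that differs slightly from splitting strictly on spaces: it drops the "-" separator and brackets.

```typescript
// Classic dynamic-programming Levenshtein distance (insert/delete/substitute, cost 1 each).
// Operates on UTF-16 code units, which is adequate for Latin or Cyrillic team names.
function levenshtein(a: string, b: string): number {
  // prev is the previous DP row: prev[j] = distance(first i-1 chars of a, first j chars of b).
  let prev: number[] = [];
  for (let j = 0; j <= b.length; j++) prev.push(j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr.push(Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost));
    }
    prev = curr;
  }
  return prev[b.length];
}

// Assumed normalisation: 1.0 means identical, 0.0 means nothing in common.
function levenshteinSimilarity(a: string, b: string): number {
  const longest = Math.max(a.length, b.length);
  return longest === 0 ? 1 : 1 - levenshtein(a, b) / longest;
}

// Assumed tokenisation: lowercase, then keep runs of Unicode letters/digits.
const WORD = new RegExp("[\\p{L}\\p{N}]+", "gu");
function tokenize(s: string): Set<string> {
  return new Set(s.toLowerCase().match(WORD) ?? []);
}

// Tanimoto (Jaccard) coefficient of the two word sets.
function tanimoto(a: string, b: string): number {
  const ta = tokenize(a);
  const tb = tokenize(b);
  if (ta.size === 0 && tb.size === 0) return 1;
  let common = 0;
  ta.forEach((t) => {
    if (tb.has(t)) common++;
  });
  return common / (ta.size + tb.size - common);
}

console.log(tanimoto("Manchester - Ajax", "Manchester (U 17) - Ajax (U 17)")); // 0.5
console.log(levenshteinSimilarity("Manchester - Ajax", "Manchester (U 17) - Ajax (U 17)").toFixed(3)); // 0.548
```

On the matching pair from the example, both scores are only moderate (0.5 and about 0.55) because the "(U 17)" suffixes count against the match. Stripping such qualifiers before comparing probably matters more than the choice between the two metrics.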
Excel is just for illustration; the final implementation will be in either PHP or NodeJS.
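The whole filtering task could then look roughly like the following TypeScript sketch for the NodeJS side. It assumes a hypothetical normalisation rule (drop anything in parentheses, such as "(U 17)", then lowercase), uses the Tanimoto coefficient on word sets, and treats an entry as paired if at least one entry in the other column scores at or above a threshold. The names `normalize`, `splitByPartner`, `filterUnpaired` and the default threshold of 0.6 are illustrative assumptions that would need tuning on real data.

```typescript
// Runs of Unicode letters/digits count as words (assumption).
const WORD = new RegExp("[\\p{L}\\p{N}]+", "gu");

// Hypothetical rule: strip "(...)" qualifiers like "(U 17)" and lowercase.
function normalize(name: string): string {
  return name.replace(/\([^)]*\)/g, " ").toLowerCase();
}

function tokens(name: string): Set<string> {
  return new Set(normalize(name).match(WORD) ?? []);
}

// Tanimoto (Jaccard) coefficient of two word sets.
function tanimoto(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 1;
  let common = 0;
  a.forEach((t) => {
    if (b.has(t)) common++;
  });
  return common / (a.size + b.size - common);
}

interface SplitResult {
  paired: string[];
  unpaired: string[];
}

// An entry of `side` is "paired" if any entry of `other` scores >= threshold.
// This is an O(n * m) all-pairs comparison, fine for spreadsheet-sized lists.
function splitByPartner(side: string[], other: string[], threshold: number): SplitResult {
  const otherTokens = other.map(tokens);
  const result: SplitResult = { paired: [], unpaired: [] };
  for (const name of side) {
    const t = tokens(name);
    const hasPartner = otherTokens.some((o) => tanimoto(t, o) >= threshold);
    (hasPartner ? result.paired : result.unpaired).push(name);
  }
  return result;
}

function filterUnpaired(columnA: string[], columnB: string[], threshold = 0.6) {
  return {
    a: splitByPartner(columnA, columnB, threshold),
    b: splitByPartner(columnB, columnA, threshold),
  };
}

const columnA = ["Manchester - Ajax", "Bundesliga - Neapoli"];
const columnB = ["Manchester (U 17) - Ajax (U 17)", "Dynamo - Chernomorets"];
const result = filterUnpaired(columnA, columnB);
console.log(result.a.unpaired); // [ 'Bundesliga - Neapoli' ]
console.log(result.b.unpaired); // [ 'Dynamo - Chernomorets' ]
```

Word-set overlap is used here because, once qualifiers are removed, team names tend to differ by whole words. A Levenshtein score on the normalised strings could be swapped in (or combined) if the real data also contains misspellings.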
Thank you.
