How can I develop an algorithm that picks the single best hypothesis out of several for each organization in database X?
Hello.
I need your help: which direction should I read up on, study, and try?
The task is as follows:
There is a CSV file of hypotheses: each row pairs an organization from database X with a candidate match from another source. The file has the columns id, name, address, r_id, r_name, r_addrees; the 'r_' prefix marks data from the other source.
As I understand it, the match should be judged on the name and address columns; the id has no effect on the result.
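To make the task concrete: every row is one hypothesis, rows sharing an id compete, and the row with the highest similarity wins. Below is a minimal sketch of that selection step, assuming the file is read with the standard `csv` module. The function names `score_row` and `best_per_id`, the sample rows, and the placeholder scorer built on `difflib` are illustrative assumptions, not part of the task itself. The address column of the second source is passed as a parameter because its header is spelled `r_addrees` above and may differ in the real file.

```python
import csv
import io
from difflib import SequenceMatcher


def score_row(row, r_addr_col="r_addrees"):
    """Placeholder similarity in [0, 1]: mean of the name and address ratios."""
    name = SequenceMatcher(None, row["name"].lower(), row["r_name"].lower()).ratio()
    addr = SequenceMatcher(None, row["address"].lower(), row[r_addr_col].lower()).ratio()
    return (name + addr) / 2


def best_per_id(rows, score=score_row):
    """Keep, for each id, the single hypothesis with the highest score."""
    best = {}
    for row in rows:
        s = score(row)
        # id is only the grouping key; it does not contribute to the score.
        if row["id"] not in best or s > best[row["id"]][0]:
            best[row["id"]] = (s, row)
    return {org_id: row for org_id, (_, row) in best.items()}


# Made-up sample data standing in for the real CSV file.
SAMPLE = """id,name,address,r_id,r_name,r_addrees
1,Acme Ltd,12 Main Street,a1,Acme Limited,12 Main St
1,Acme Ltd,12 Main Street,a2,Zenith Bakery,99 Ocean Avenue
2,Blue Cafe,5 Park Road,b1,Green Garage,78 Hill Lane
2,Blue Cafe,5 Park Road,b2,Blue Cafe,5 Park Rd
"""

chosen = best_per_id(csv.DictReader(io.StringIO(SAMPLE)))
for org_id, row in chosen.items():
    print(org_id, "->", row["r_id"], row["r_name"])
```

The scorer is passed in as an argument, so any other string-similarity function can replace the placeholder without touching the selection logic.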
I am considering the "Fuzzy Wuzzy" library for this problem. Is it a good fit, or are there other options?
UPD:
Problem solved. As expected, the Fuzzy Wuzzy library did the job. For faster processing, install python-Levenshtein alongside it.
A detailed description of my solution is on my page.
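For anyone landing here, a rough sketch of how Fuzzy Wuzzy can score one hypothesis; this is not the solution from my page, only an illustration. It uses `fuzz.token_sort_ratio`, which compares strings after sorting their words and returns an integer from 0 to 100. The 0.6/0.4 weights, the function name `hypothesis_score`, and the sample data are assumptions. The `difflib` fallback only approximates the library so that the snippet still runs when fuzzywuzzy is not installed.

```python
try:
    from fuzzywuzzy import fuzz

    def ratio(a, b):
        # Word-order-insensitive similarity, 0..100.
        return fuzz.token_sort_ratio(a, b)

except ImportError:
    from difflib import SequenceMatcher

    def _sorted_words(s):
        return " ".join(sorted(s.lower().split()))

    def ratio(a, b):
        # Rough stand-in: sort lowercase words, compare, scale to 0..100.
        matcher = SequenceMatcher(None, _sorted_words(a), _sorted_words(b))
        return round(100 * matcher.ratio())


def hypothesis_score(name, address, r_name, r_address, w_name=0.6, w_addr=0.4):
    """Weighted similarity of one hypothesis; the weights are illustrative guesses."""
    return w_name * ratio(name, r_name) + w_addr * ratio(address, r_address)


# Made-up hypotheses for a single organization.
org = ("Acme Ltd", "12 Main Street")
candidates = [
    ("Ltd Acme", "Main Street 12"),
    ("Zenith Bakery", "99 Ocean Avenue"),
]
best = max(candidates, key=lambda c: hypothesis_score(*org, *c))
print(best)
```

Weighting the name above the address is a judgment call; on real data the weights and a minimum acceptable score would need tuning against manually checked matches.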