How can I develop an algorithm that, for each organization in database X, chooses the single best match from several hypotheses?

Alex Lee, 2017-04-10 07:34:27

Python

Hello.
I need some help: which direction should I read up on, study, and try?
The task is as follows:
There is a CSV file of hypotheses about which organizations in database X correspond to organizations from another source. The file has several columns (id, name, address, r_id, r_name, r_address); the 'r_' prefix marks data about organizations from the other source.
As I understand it, I need to compare the name and address columns; the id columns have no effect on the result.
I am considering the FuzzyWuzzy library for this. Is it a good fit, or are there other options?
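A minimal sketch of the approach described above, assuming the column names listed in the question. It scores every hypothesis by fuzzy similarity of name and address, then keeps the highest-scoring r_id for each id. To run without extra packages it uses the standard library's difflib.SequenceMatcher as a stand-in for FuzzyWuzzy's fuzz.ratio (FuzzyWuzzy falls back to SequenceMatcher when python-Levenshtein is not installed). The 60/40 name/address weighting and the helper names (similarity, score, best_matches, load_hypotheses) are hypothetical choices for illustration.

```python
import csv
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0..100 similarity score for two strings, ignoring case.

    Stand-in for fuzzywuzzy's fuzz.ratio, built on the stdlib SequenceMatcher.
    """
    return 100 * SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def score(row: dict, name_weight: float = 0.6) -> float:
    """Weighted mix of name and address similarity (the weight is an assumption)."""
    name_score = similarity(row["name"], row["r_name"])
    address_score = similarity(row["address"], row["r_address"])
    return name_weight * name_score + (1 - name_weight) * address_score


def best_matches(rows) -> dict:
    """For each organization id, keep the r_id of the highest-scoring hypothesis."""
    best = {}  # id -> (score, r_id)
    for row in rows:
        s = score(row)
        current = best.get(row["id"])
        if current is None or s > current[0]:
            best[row["id"]] = (s, row["r_id"])
    return {org_id: r_id for org_id, (_, r_id) in best.items()}


def load_hypotheses(path: str) -> list:
    """Read the hypotheses CSV into a list of dicts keyed by column name."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


if __name__ == "__main__":
    # Two competing hypotheses for organization "1" (made-up sample rows).
    rows = [
        {"id": "1", "name": "Acme Ltd", "address": "12 Main St",
         "r_id": "a", "r_name": "ACME Ltd.", "r_address": "12 Main Street"},
        {"id": "1", "name": "Acme Ltd", "address": "12 Main St",
         "r_id": "b", "r_name": "Apex Inc", "r_address": "40 Oak Ave"},
    ]
    print(best_matches(rows))  # {'1': 'a'}
```

With FuzzyWuzzy installed, similarity() could be replaced by fuzz.ratio or fuzz.token_sort_ratio, which also return 0-100 scores; token_sort_ratio is often more forgiving when words in a name or address appear in a different order.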
UPD:
Problem solved. As expected, the FuzzyWuzzy library did the job. For faster processing, also install python-Levenshtein.
A detailed description of my solution is on my page.
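python-Levenshtein speeds things up by computing edit distance in C. As a rough illustration of what is being computed (not the library's actual code), here is the textbook Wagner-Fischer dynamic-programming algorithm in pure Python; levenshtein is a hypothetical helper name.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b (Wagner-Fischer algorithm)."""
    if len(a) < len(b):
        a, b = b, a  # iterate the shorter string in the inner loop to save memory
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))  # 3
```

The table only needs two rows at a time, so memory is proportional to the shorter string while time is proportional to the product of both lengths; that quadratic cost is why a C implementation helps when comparing many name and address pairs.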

1 answer

Dimonchik, 2017-04-10
@dimonchik2013

See https://habrahabr.ru/post/106207/ and the related articles on the same site; there are about 5-8 of them.