PHP
Ken Jee, 2017-09-24 22:18:25

How can I detect similar product names from different suppliers?

Here is the problem. An online store's product range is built from several Excel files provided by different suppliers. Suppose supplier A's file contains the product "Vacuum cleaner Typhoon black", and supplier B's file contains the same product under a slightly different name, "Vacuum cleaner Typhoon (color: black)". How can such matches be detected so that the database ends up with as few duplicate product entries as possible?

3 answers
Ilya Gerasimov, 2017-09-24
@Machez

For example, index the catalog with Sphinx and show possible duplicates at import time, or use a similar search tool.

TyzhSysAdmin, 2017-09-24
@POS_troi

This was a long time ago, back in the early 2000s.
The catalog was uploaded from 1C 7.7. Before that, all supplier price lists were loaded into the same 1C, and each supplier had its own directory of synonyms, for example:
"Vacuum cleaner Typhoon (color: black)" => "Vacuum cleaner Typhoon black".
When a price list was imported, each name was simply looked up in the synonym directory and converted to the common form.
It was a very crude, brute-force solution, but fortunately the products were car tires, and the names were more or less the same across suppliers.
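
A rough illustration of the synonym-directory idea in PHP (not the original 1C code). The supplier key `supplier_b`, the `$synonyms` array layout and the `canonicalName()` helper are all hypothetical names chosen for this sketch; in practice the directory would live in a database table and be filled in by hand as new spellings appear.

```php
<?php
// Hypothetical per-supplier synonym directory:
// supplier key => [supplier's spelling => canonical name].
// The single entry is the example from the question.
$synonyms = [
    'supplier_b' => [
        'Vacuum cleaner Typhoon (color: black)' => 'Vacuum cleaner Typhoon black',
    ],
];

/**
 * Return the canonical name for a supplier's product name,
 * or the (trimmed) name unchanged when no synonym is registered.
 */
function canonicalName(array $synonyms, string $supplier, string $name): string
{
    $name = trim($name);
    return $synonyms[$supplier][$name] ?? $name;
}

// Usage: normalize each row of a price list before inserting it.
echo canonicalName($synonyms, 'supplier_b', 'Vacuum cleaner Typhoon (color: black)'), PHP_EOL;
```

The lookup is an exact match, so it only catches spellings that have already been registered; anything new passes through unchanged and would need a manual review step.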

Eugene, 2017-09-25
@zolt85

In our project we compare two strings using the Levenshtein distance to keep duplicates out of the directory of organizations: one person writes Rosneft, another ROSNEFT, some put the name in quotes and some don't, and so on. It works quite well.
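
A minimal PHP sketch of this approach, using the built-in `levenshtein()` function. The `normalizeName()` and `isLikelyDuplicate()` helpers and the default threshold of 2 are assumptions made for illustration, not the code from the project described above. Note that `levenshtein()` compares bytes, so for non-ASCII names the distance can come out larger than the number of differing characters.

```php
<?php
/**
 * Normalize a name before comparison: lowercase it and replace every run
 * of characters that are not letters or digits with a single space.
 */
function normalizeName(string $name): string
{
    // Fall back to strtolower() when the mbstring extension is missing.
    $name = function_exists('mb_strtolower')
        ? mb_strtolower($name, 'UTF-8')
        : strtolower($name);
    $name = preg_replace('/[^\p{L}\p{N}]+/u', ' ', $name);
    return trim($name);
}

/**
 * Hypothetical helper: true when the normalized names are within
 * $maxDistance edits of each other. The threshold is an assumption
 * and should be tuned on real data.
 */
function isLikelyDuplicate(string $a, string $b, int $maxDistance = 2): bool
{
    return levenshtein(normalizeName($a), normalizeName($b)) <= $maxDistance;
}

var_dump(isLikelyDuplicate('Rosneft', '"ROSNEFT"'));
```

After normalization, case and quote differences disappear, so "Rosneft" and "ROSNEFT" in quotes compare as equal. The vacuum cleaner names from the question still differ by 6 edits (the inserted word "color" plus a space), so a small threshold would not match them; names that differ by whole words may need a larger threshold or a word-by-word comparison.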
