Cataloging
ace of spades, 2019-04-23 17:06:18

How to bring supplier price lists into a single format?

Here is the situation:
1. There are three price lists from different suppliers (there may be more in the future).
2. Each price list has two columns: product name and price. There are approximately 10,000 items.
3. All the tables contain the same products, but the names differ. For example: in one “Mars chocolate”, in another “Mars chocolate”, or “iphone6 smartphone” and “i phone 6 phone”. That is, the words may be rearranged, and spacing, abbreviations, and so on may vary.
Task: find the same products and give them a single identifier across all three tables.
Problem: how to determine matches as accurately as possible?
My idea for solving it: take one table as the base and look for matches in the other tables word by word (allowing transposed letters and a match to within n letters, but leaving the numbers untouched). In addition, use price as a matching criterion by setting a threshold: the prices should not differ by more than, say, 100 rubles.
Or calculate the percentage deviation, both by words and by price.
Is this realistic? Or is there a simpler way, and I'm overcomplicating it? Maybe there are ready-made solutions?
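The idea described above could be sketched roughly as follows. This is only an illustration, not a ready-made solution: the function names, the 0.75 similarity cutoff, and the sample prices are assumptions, and "leaving the numbers untouched" is interpreted here as "the numbers must match exactly". It uses only `difflib` from the Python standard library.

```python
import re
from difflib import SequenceMatcher

def split_tokens(name):
    """Split a product name into sorted word tokens and sorted number tokens."""
    # Letter runs and digit runs become separate tokens, so "iphone6" -> "iphone", "6".
    tokens = re.findall(r"[^\W\d_]+|\d+", name.lower())
    words = sorted(t for t in tokens if not t.isdigit())
    numbers = sorted(t for t in tokens if t.isdigit())
    return words, numbers

def name_similarity(name_a, name_b):
    """Similarity in [0, 1] of the word parts, ignoring word order and spaces."""
    words_a, _ = split_tokens(name_a)
    words_b, _ = split_tokens(name_b)
    return SequenceMatcher(None, "".join(words_a), "".join(words_b)).ratio()

def same_product(name_a, price_a, name_b, price_b,
                 min_similarity=0.75, max_price_diff=100):
    """Hypothetical matching rule: equal numbers, close price, similar words."""
    _, numbers_a = split_tokens(name_a)
    _, numbers_b = split_tokens(name_b)
    if numbers_a != numbers_b:          # numbers are compared exactly
        return False
    if abs(price_a - price_b) > max_price_diff:   # absolute threshold in rubles
        return False
    return name_similarity(name_a, name_b) >= min_similarity

# Example with made-up prices:
print(same_product("iphone6 smartphone", 30000, "i phone 6 phone", 30050))
print(same_product("iphone6 smartphone", 30000, "i phone 7 phone", 30050))
```

For the percentage variant, the absolute comparison could be replaced with something like `abs(price_a - price_b) / max(price_a, price_b)` checked against a relative threshold. Both cutoffs would need tuning on real data.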
And the ultimate goal is this: in the online store on WP, update the prices to the lowest one across all the price lists.
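Once every product has a single identifier, picking the lowest price is the easy part. A minimal sketch, assuming each price list has already been converted to a dictionary from identifier to price (the identifiers and prices below are invented, and writing the result back to the store is not shown):

```python
def lowest_prices(price_lists):
    """Return the minimum price per product id across all price lists.

    price_lists: iterable of dicts mapping product id -> price.
    """
    best = {}
    for prices in price_lists:
        for product_id, price in prices.items():
            if product_id not in best or price < best[product_id]:
                best[product_id] = price
    return best

# Three hypothetical supplier lists keyed by the shared identifier.
suppliers = [
    {"mars": 55, "iphone6": 30000},
    {"mars": 50, "iphone6": 30500},
    {"mars": 60},
]
print(lowest_prices(suppliers))
```

A product missing from one supplier's list is simply skipped for that supplier.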
If I have to order a script/plugin from a programmer, what would the approximate price be? Assuming it is done properly (it's something like ML), with the ability to load additional price lists.
Thanks to everyone who responds!

3 answers
Alexander Denisov, 2019-04-23
@Grinvind

Try this: https://habr.com/ru/post/428814/

Pychev Anatoly, 2019-04-23
@pton

I think any algorithm will produce some errors.
One option: remove all spaces and non-alphanumeric characters, convert everything to one case, and add up the character codes. You get a weak approximation of a hash. This way all matches can be found regardless of word order. If the items contain two words with the same root, such as "chocolate" and "chocolate", this method will report a mismatch. To account for that difference, you can first run an automatic replacement of words sharing a root with one specific value.
Then you can come up with further auto-replacements to get closer to the ideal.
But the possibility of error still remains.
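A minimal sketch of this character-sum idea, for illustration only. The function names and sample names are assumptions, and the auto-replacement step for same-root words is not shown.

```python
import re
from collections import defaultdict

def weak_hash(name):
    """Sum of character codes after lowercasing and dropping non-alphanumerics.

    The sum does not depend on character order, so rearranged words
    give the same value. Unrelated names can collide as well.
    """
    cleaned = re.sub(r"[\W_]+", "", name.lower())
    return sum(ord(ch) for ch in cleaned)

def group_by_hash(names):
    """Group names that share the same weak hash."""
    groups = defaultdict(list)
    for name in names:
        groups[weak_hash(name)].append(name)
    return dict(groups)

# Word order, case, and punctuation do not change the value.
print(weak_hash("Mars chocolate") == weak_hash("chocolate, MARS"))
# Collisions are possible: different strings with the same sum of codes.
print(weak_hash("ad") == weak_hash("bc"))
```

Because different strings can produce the same sum, each group is better treated as a list of candidates to check than as a confirmed match.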

Antonio Solo, 2019-04-29
@solotony

1) Run the strict matches first and merge them.
2) For the non-strict matches, build a "gluer": a workstation for reviewing candidates and building links by hand. 10,000 items is not much.
[Image: 5cc72212d22ed928168436.png]
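The two steps could be sketched like this: exact matches on a normalized key are merged automatically, and everything else goes into a review queue with a few suggested candidates. This is a rough sketch under assumptions: the function names, the cutoff of 0.6, and the number of candidates are illustrative, and the review interface itself is not shown.

```python
import re
from difflib import get_close_matches

def normalize(name):
    """Lowercase, split into letter and digit runs, and sort the tokens."""
    return " ".join(sorted(re.findall(r"[^\W\d_]+|\d+", name.lower())))

def split_matches(base_names, other_names, n_candidates=3, cutoff=0.6):
    """Return (strict, review).

    strict: base name -> name from the other list with an identical key.
    review: base name -> list of candidate names for manual linking.
    """
    other_by_key = {normalize(n): n for n in other_names}
    strict, review = {}, {}
    for name in base_names:
        key = normalize(name)
        if key in other_by_key:
            strict[name] = other_by_key[key]
        else:
            close = get_close_matches(key, list(other_by_key),
                                      n=n_candidates, cutoff=cutoff)
            review[name] = [other_by_key[k] for k in close]
    return strict, review

strict, review = split_matches(
    ["Mars chocolate", "iphone6 smartphone"],
    ["chocolate Mars", "i phone 6 phone"],
)
print(strict)
print(review)
```

The person reviewing then only has to confirm or reject the suggested candidates instead of searching the whole list.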
