How do I automatically recognize product names?
Good day!
I have a product database of car tires (~30,000 items).
In the existing product database, the names look like this: Michelin Pilot Super Sport 295/35 R19 104Y
There are also suppliers' price lists, usually in xls format, but each supplier formats product names differently, for example: 295/35 R19 Michelin Pilot Super Sport 104Y, 295/35R19 104Y Michelin Pilot Super Sport, or other variations. In addition, a supplier may add some internal designation to the product name or shorten the full model name to an abbreviation (for example, Michelin Pilot Alpin 4 275/30 R20 97W becomes Michelin PA4 275/30 R20 97W, etc.).
The task is to automatically and correctly find products in the database using the names from the suppliers' price lists.
Perhaps there is some tool or API for smart search. I thought about using the Ya.Market API to search for product names, but I am in Ukraine, and at the moment this service is blocked for us...
Thank you for your attention; I hope someone can help.
Screenshots of fragments of several suppliers' price lists, as examples:
The first thing I would do is collect a list of brands (e.g. Michelin, Pirelli, etc.). Next, parse the tire parameters (e.g. 275/30 R20) to narrow the search. After that, the rest is a small matter :)
P.S. If that's the only problem, then it's not a problem at all :)
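The brand-plus-size narrowing described above could look roughly like this. It is a minimal sketch, assuming the brand list (`BRANDS`) is hypothetical and that every name contains a size written as width/profile plus rim diameter (e.g. `295/35 R19` or `295/35R19`); `parse_key` and `candidates` are illustrative names, not an existing library.

```python
import re
from typing import Optional

# Hypothetical brand list; in practice it would be collected from your own database.
BRANDS = ["Michelin", "Pirelli", "Continental", "Bridgestone"]

# Width/profile and rim diameter; tolerates "295/35 R19", "295/35R19", "295/35 ZR19".
SIZE_RE = re.compile(r"(\d{3})\s*/\s*(\d{2})\s*Z?R\s*(\d{2})", re.IGNORECASE)


def parse_key(name: str) -> Optional[tuple[str, str]]:
    """Return (brand, normalized size), or None if either part is missing."""
    brand = next(
        (b for b in BRANDS if re.search(rf"\b{re.escape(b)}\b", name, re.IGNORECASE)),
        None,
    )
    size = SIZE_RE.search(name)
    if brand is None or size is None:
        return None
    width, profile, rim = size.groups()
    return brand, f"{width}/{profile} R{rim}"


def candidates(supplier_name: str, catalog: list[str]) -> list[str]:
    """Narrow the catalog to items with the same brand and size as the supplier name."""
    key = parse_key(supplier_name)
    if key is None:
        return []
    return [item for item in catalog if parse_key(item) == key]
```

For example, `candidates("295/35R19 104Y Michelin Pilot Super Sport", catalog)` keeps only Michelin tires of size 295/35 R19, regardless of word order; the remaining comparison then only has to deal with the model name.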
The task is quite interesting, but I cannot think of a clear solution.
Ideally, every product would have an SKU (article number), but if there isn't one, then I think it would be more practical to solve this by compiling a full-fledged unified database.
You can do it in several ways:
1) Add a table to the existing database with "price name" -> "product id" matches (I think this is the simplest and most accurate option, but rather laborious: whenever a new tire appears, you will need to add it to the table).
2) Parse the string (for example, with regular expressions), search the database by parts, and also search by abbreviation (a time-consuming option, and automatically generated abbreviations may collide).
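The two options can be combined: a manual lookup table that always wins, with an abbreviation-based fallback. This is a minimal sketch under several assumptions: the catalog contents are hypothetical, catalog names are assumed to follow the "Brand Model words SIZE ..." pattern, and the abbreviation rule (first letter of each model word, digits kept) is only a guess that happens to turn "Pilot Alpin 4" into "PA4". When two products share an abbreviation, the sketch returns no match rather than guessing.

```python
import re
from typing import Optional

# Hypothetical catalog: product id -> canonical name from your own database.
CATALOG = {
    101: "Michelin Pilot Super Sport 295/35 R19 104Y",
    102: "Michelin Pilot Alpin 4 275/30 R20 97W",
}

# Option 1: hand-maintained "price name" -> product id table (keys are normalized).
ALIASES: dict[str, int] = {}

SIZE_TOKEN = re.compile(r"^\d{3}/\d{2}")


def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not matter."""
    return " ".join(name.lower().split())


def abbreviated_form(name: str) -> str:
    """Option 2: 'Michelin Pilot Alpin 4 275/30 R20 97W' -> 'michelin pa4 275/30 r20 97w'."""
    tokens = name.split()
    size_idx = next((i for i, t in enumerate(tokens) if SIZE_TOKEN.match(t)), None)
    if size_idx is None or size_idx < 2:
        return normalize(name)  # no model words to abbreviate
    model = tokens[1:size_idx]
    # Keep digits as-is, take the first letter of every other model word.
    abbr = "".join(w if w.isdigit() else w[0] for w in model)
    return normalize(" ".join([tokens[0], abbr, *tokens[size_idx:]]))


def build_index(catalog: dict[int, str]) -> dict[str, list[int]]:
    """Map both the full and the abbreviated form of each name to product ids."""
    index: dict[str, list[int]] = {}
    for pid, name in catalog.items():
        for key in {normalize(name), abbreviated_form(name)}:
            index.setdefault(key, []).append(pid)
    return index


def match(price_name: str, index: dict[str, list[int]]) -> Optional[int]:
    key = normalize(price_name)
    if key in ALIASES:  # the manual table always wins
        return ALIASES[key]
    ids = index.get(key, [])
    # None when there is no hit or several products share an abbreviation (a collision).
    return ids[0] if len(ids) == 1 else None
```

Names that end up as `None` are exactly the ones worth adding to `ALIASES` by hand, which keeps the laborious part of option 1 limited to the cases the automatic rule cannot resolve.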
Could you share a piece of the database for testing? Maybe we can come up with something.
MySQL full-text search won't give an exact 1-to-1 match, of course, but it is still worth trying.
You can try splitting the product name into words and generating permutations.
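The permutation idea can be sketched as follows, assuming the catalog names are available as a set of strings and that the supplier name has already been split into the same tokens as the catalog (e.g. "295/35R19" separated into "295/35 R19"); `match_by_permutation` and `max_words` are hypothetical names. Because n words have n! orderings, the sketch gives up on long names.

```python
from itertools import permutations
from typing import Optional


def match_by_permutation(price_name: str, catalog: set[str], max_words: int = 8) -> Optional[str]:
    """Try every word order of the supplier name until one equals a catalog name."""
    words = price_name.split()
    if len(words) > max_words:  # 8 words already give 40,320 orderings
        return None
    for perm in permutations(words):
        candidate = " ".join(perm)
        if candidate in catalog:
            return candidate
    return None
```

A cheaper way to get the same order-insensitive comparison is to index each catalog name by its sorted word list (`tuple(sorted(name.split()))`) and look up the supplier name's sorted words directly, which avoids enumerating orderings at all.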
You will still have to analyze the "templates" (formats) of incoming product names. Maybe there are not many of them at all.
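One way to find out how many templates there really are is to reduce each name to a shape made of token classes and count the shapes. This is a sketch under assumptions: the brand set and the token classes (size, rim, load/speed index such as 104Y) are hypothetical and only approximate real tire notation; consecutive model words are collapsed into a single WORD.

```python
import re
from collections import Counter
from typing import Iterable

BRANDS = {"michelin", "pirelli"}  # hypothetical brand list

# Checked in order; anything unmatched is treated as a model word.
TOKEN_CLASSES = [
    ("SIZE", re.compile(r"^\d{3}/\d{2}(Z?R\d{2})?$", re.IGNORECASE)),
    ("RIM", re.compile(r"^Z?R\d{2}$", re.IGNORECASE)),
    ("LOADSPEED", re.compile(r"^\d{2,3}[A-Z]$", re.IGNORECASE)),
]


def template(name: str) -> str:
    """Turn '295/35R19 104Y Michelin Pilot Super Sport' into 'SIZE LOADSPEED BRAND WORD'."""
    shape: list[str] = []
    for token in name.split():
        if token.lower() in BRANDS:
            label = "BRAND"
        else:
            label = next((cls for cls, rx in TOKEN_CLASSES if rx.match(token)), "WORD")
        if label == "WORD" and shape and shape[-1] == "WORD":
            continue  # collapse runs of model words into one WORD
        shape.append(label)
    return " ".join(shape)


def count_templates(names: Iterable[str]) -> Counter:
    """How many price-list names follow each template."""
    return Counter(template(n) for n in names)
```

Running `count_templates(...).most_common()` over a supplier's name column shows which formats dominate; if only a handful appear, a small parser per template is usually enough.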
Write a separate module for each supplier that parses its tables and eventually brings everything into your own database with a well-thought-out structure. Parsing xls is no fun, though... You can automate converting the files to csv or xml; that will be easier.
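A per-supplier setup can be as small as a registry of row parsers that all produce the same normalized record. This is a minimal sketch assuming the price lists have already been converted to semicolon-separated CSV; the supplier keys and column names (`Name`, `Price`, `Model`, `Size`, `Cost`) are hypothetical and would be replaced by each supplier's real headers.

```python
import csv
import io
from typing import Callable, TextIO

Record = dict[str, str]
PARSERS: dict[str, Callable[[Record], Record]] = {}


def supplier(name: str):
    """Decorator that registers a row parser for one supplier."""
    def register(func: Callable[[Record], Record]) -> Callable[[Record], Record]:
        PARSERS[name] = func
        return func
    return register


@supplier("supplier_a")
def parse_supplier_a(row: Record) -> Record:
    # Hypothetical supplier A: the whole tire description is in one "Name" column.
    return {"name": row["Name"].strip(), "price": row["Price"].strip()}


@supplier("supplier_b")
def parse_supplier_b(row: Record) -> Record:
    # Hypothetical supplier B: model and size are in separate columns.
    return {"name": f'{row["Model"].strip()} {row["Size"].strip()}', "price": row["Cost"].strip()}


def load_price_list(supplier_name: str, fh: TextIO, delimiter: str = ";") -> list[Record]:
    """Read one supplier's CSV and convert every row to the common record format."""
    parse = PARSERS[supplier_name]
    return [parse(row) for row in csv.DictReader(fh, delimiter=delimiter)]


if __name__ == "__main__":
    sample = io.StringIO("Name;Price\nMichelin PA4 275/30 R20 97W;100\n")
    print(load_price_list("supplier_a", sample))
```

Adding a new supplier then means writing one more small decorated function. For the xls-to-csv step, one option is LibreOffice in headless mode (`soffice --headless --convert-to csv file.xls`).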