Algorithms
Viktor Vsk, 2014-11-29 01:30:20

What is the logically correct way to implement entity merging in a live system? (long post)

Given:
- An online store.
- Suppliers of goods that give access to a constantly updated price list. Different suppliers have different price list formats, different product codes, and different product names.

The question itself:
How do you handle, logically, correctly, and clearly, the case when:
Today both site.com/products/1 and site.com/products/2 work,
and tomorrow the products are merged and only site.com/products/1 works?
(We are not considering the algorithm for choosing the most appropriate price / availability / description / characteristics, etc.)
Should I create a model that tracks the changes for each pair of products and stores the redirects itself? Or is there a more sensible way?
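
To make the "model that stores redirects" idea concrete, here is a minimal sketch, not a definitive design. The class name MergeRedirects, its methods, and the product id 7 are hypothetical and used only for illustration. It stores one "old id -> new id" record per merge and follows the chain on lookup, so a product that is merged several times still resolves to the page that currently exists.

```python
class MergeRedirects:
    """Remembers which product id replaced which after a merge (hypothetical model)."""

    def __init__(self):
        # old product id -> id of the product it was merged into
        self._replaced_by = {}

    def resolve(self, product_id):
        """Follow the merge chain until a product that still exists is reached."""
        current = product_id
        while current in self._replaced_by:
            current = self._replaced_by[current]
        return current

    def merge(self, old_id, new_id):
        """Record that old_id was merged into new_id."""
        old_root = self.resolve(old_id)
        new_root = self.resolve(new_id)
        # Comparing the resolved ids prevents a cycle such as 1 -> 2 -> 1.
        if old_root != new_root:
            self._replaced_by[old_root] = new_root


redirects = MergeRedirects()


def redirect_target(product_id):
    """Return the path to redirect to, or None if the product page is still live."""
    final_id = redirects.resolve(product_id)
    return None if final_id == product_id else f"/products/{final_id}"


redirects.merge(old_id=2, new_id=1)  # site.com/products/2 was merged into /products/1
redirects.merge(old_id=1, new_id=7)  # assumed later merge, to show chained lookups
```

With this in place, a request for /products/2 would be answered with a permanent redirect to whatever redirect_target(2) returns. In a real store the dictionary would presumably be a database table.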
Detailed description of the task and statement of the question:
Imagine there is a system that constantly downloads prices from all suppliers and creates PriceItem entities.
For concreteness, let the PriceItem entity have the attributes name, price, stock, supplier, and code.
Next, we want these PriceItems to become proper Products that the user will see on the site. For simplicity, assume that filling in a product with photos, descriptions, characteristics, filters, etc. is done manually on the finished Product entity.
Let's reduce the task to keeping the Product's price and stock attributes constantly up to date (based on the PriceItem values).
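
A rough sketch of the two entities may help. The PriceItem fields are the ones listed above; everything else is an assumption for illustration: the refresh method name, the price_items list, and in particular the rule "cheapest in-stock offer sets the price, stock is the total", since the choice of price is explicitly out of scope of the question.

```python
from dataclasses import dataclass, field
from decimal import Decimal
from typing import List, Optional


@dataclass
class PriceItem:
    # Attributes named in the question.
    name: str
    price: Decimal
    stock: int
    supplier: str
    code: str


@dataclass
class Product:
    name: str
    price_items: List[PriceItem] = field(default_factory=list)
    price: Optional[Decimal] = None
    stock: int = 0

    def refresh(self):
        """Recompute price and stock from the linked PriceItems.

        Assumed rule (not from the question): only in-stock offers count,
        the cheapest one sets the price, and stock is their sum.
        """
        in_stock = [item for item in self.price_items if item.stock > 0]
        self.stock = sum(item.stock for item in in_stock)
        self.price = min((item.price for item in in_stock), default=None)
```

Calling refresh() after every price list import keeps the Product current no matter how many PriceItems end up linked to it.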
It would seem that you just need to create a Product entity for each PriceItem. But this is where the questions begin. Different suppliers may have slightly different names for the same product (Macbook 13" laptop and Macbook Air), and we would like only one Product entity to exist in the system, associated with two PriceItem entities from different suppliers.
Since the names are not identical, a plain equality comparison is not enough; some similarity metric is needed. Assume we have implemented an algorithm for comparing names.
I see two "fundamental" solutions:

1) Each time we add a new PriceItem to the system, we go through the entire Product list and try to find a matching entity. If no match is found, create a new Product based on the PriceItem; if one is found, add the PriceItem to the existing Product.

2) When processing price lists from suppliers, create a Product for each PriceItem. Later, in the background and in parallel, continuously look for matches and relink each matched PriceItem to the corresponding Product.
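
For illustration, the first option could look roughly like the sketch below. It stands in for "an algorithm for comparing names" with difflib.SequenceMatcher from the Python standard library. That choice, the 0.8 threshold, the attach_or_create name, and the sample product names in the usage lines are all assumptions, not something the question prescribes.

```python
from difflib import SequenceMatcher

SIMILARITY_THRESHOLD = 0.8  # assumed cut-off; would need tuning on real price lists


def similarity(a, b):
    """Case-insensitive similarity of two names, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def attach_or_create(products, item_name):
    """Option 1: scan every existing Product when a new PriceItem arrives.

    products maps a product name to the list of PriceItem names linked to it.
    Returns the name of the Product the item ended up under.
    """
    best_name, best_score = None, 0.0
    for name in products:  # full scan: this is the expensive part
        score = similarity(name, item_name)
        if score > best_score:
            best_name, best_score = name, score
    if best_name is not None and best_score >= SIMILARITY_THRESHOLD:
        products[best_name].append(item_name)  # found: link to the existing Product
        return best_name
    products[item_name] = [item_name]  # not found: create a new Product
    return item_name


catalog = {}
attach_or_create(catalog, "Macbook Air 13")    # creates a Product
attach_or_create(catalog, 'MacBook Air 13"')   # links to the existing Product
attach_or_create(catalog, "Dell XPS 15")       # creates a second Product
```

The loop over all products runs once per incoming PriceItem, so the import time grows with the size of the catalog.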

Cons of the first solution:
1) Matching similar products blocks the processing of supplier prices (which should run regularly, and the more often and faster, the better).
2) It is still a fairly lengthy operation, and it feels wrong to run it for every new PriceItem when there may be millions of them.
Pros of the first solution:
1) It would seem that with this approach the probability of ending up with two different Products that are actually the same product, just from different suppliers with slightly different names, is extremely small. However, I suspect it is not that good in practice: the algorithms will still change over time, and if today two PriceItems were attributed to Product A, tomorrow, after the algorithms change, one of them may well belong to Product B. So, in my opinion, the gain in accuracy is not that large.
Pros of the second approach:
- The process can run continuously and be parallelized as much as possible.
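
A single background pass of the second approach might be sketched as follows. It is only an illustration under assumptions: difflib.SequenceMatcher with a 0.8 threshold plays the role of the name comparison, the lowest id is arbitrarily chosen as the survivor, and the function names and sample data are hypothetical.

```python
from difflib import SequenceMatcher


def looks_same(a, b, threshold=0.8):
    """Assumed matching rule: case-insensitive similarity above a threshold."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def background_merge_pass(products):
    """Merge duplicate Products in place; return {merged_id: surviving_id}.

    products maps product id -> {"name": str, "price_item_ids": list}.
    """
    merged_into = {}
    ids = sorted(products)
    for keep_id in ids:
        if keep_id in merged_into:
            continue  # already absorbed by an earlier product
        for other_id in ids:
            if other_id <= keep_id or other_id in merged_into:
                continue
            if looks_same(products[keep_id]["name"], products[other_id]["name"]):
                # Relink the duplicate's PriceItems to the surviving Product.
                products[keep_id]["price_item_ids"].extend(
                    products[other_id]["price_item_ids"]
                )
                merged_into[other_id] = keep_id
    for merged_id in merged_into:
        del products[merged_id]
    return merged_into


store = {
    1: {"name": "Macbook Air 13", "price_item_ids": [10]},
    2: {"name": 'MacBook Air 13"', "price_item_ids": [20]},
    3: {"name": "Dell XPS 15", "price_item_ids": [30]},
}
merged = background_merge_pass(store)
```

The returned mapping is exactly the information needed to send site.com/products/2 to site.com/products/1 after the merge. The pass never touches the import itself, so it can be rerun whenever the comparison algorithm changes.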
If I'm overcomplicating this and there are simpler ways to implement the "fundamental" mapping of PriceItem to Product, I'll be glad to hear them.

1 answer(s)
DancingOnWater, 2014-11-30

Alas, I'm not an expert in this area, but your problem looks like a Big Data problem.
