HTML
sivabur, 2015-05-11 16:33:30

What algorithm can find the price of a product on an arbitrary (HTML) page?

Parse the microdata markup.
If there is none:
1. Parse the number that stands next to "rub", "UAH" or "$" (take the first match in the code).
2. Parse the number nearest to the word "price" (take the first match in the code).
3. Parse the contents of elements with the class "price" or "cena" (take the first match in the code).
What other options are there, and which is best?
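The chain described in the question can be sketched with Python's standard library alone. This is a minimal illustration under assumptions: the currency spellings, the number pattern and the names `PriceCollector` and `find_price` are made up here, and every step takes the first match, exactly as the question proposes. It therefore inherits the weaknesses raised in the answers, such as an old price appearing before the real one.

```python
import re
from html.parser import HTMLParser

# Assumed currency markers for "rub", "UAH" and "$" (Russian/Ukrainian and Latin spellings).
CURRENCY = r"(?:руб\.?|rub\.?|грн\.?|uah|\$)"
# A number with optional thousands separators and decimals, e.g. "1 299,50".
NUMBER = r"\d+(?:[ \u00a0]\d{3})*(?:[.,]\d{1,2})?"


class PriceCollector(HTMLParser):
    """Gathers microdata values, price/cena class texts and all text, in document order."""

    def __init__(self):
        super().__init__()
        self.microdata = []    # values of itemprop="price" (content attribute or text)
        self.class_texts = []  # first text node after an element with class price/cena
        self.texts = []        # all non-empty text nodes
        self._want_itemprop = False
        self._want_class = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if a.get("itemprop") == "price":
            if a.get("content"):
                self.microdata.append(a["content"])
            else:
                self._want_itemprop = True
        if {"price", "cena"} & set((a.get("class") or "").split()):
            self._want_class = True

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        self.texts.append(text)
        if self._want_itemprop:
            self.microdata.append(text)
            self._want_itemprop = False
        if self._want_class:
            self.class_texts.append(text)
            self._want_class = False


def first_number(text):
    m = re.search(NUMBER, text)
    return m.group(0) if m else None


def find_price(html):
    p = PriceCollector()
    p.feed(html)
    # 0. Microdata (itemprop="price") wins if present.
    if p.microdata:
        return first_number(p.microdata[0])
    full = " ".join(p.texts)
    # 1. First number next to a currency marker (before or after it).
    m = re.search(rf"({NUMBER})\s*{CURRENCY}|{CURRENCY}\s*({NUMBER})", full, re.IGNORECASE)
    if m:
        return m.group(1) or m.group(2)
    # 2. First number shortly after the word "price" / "цена".
    m = re.search(rf"(?:price|цена)\D{{0,20}}?({NUMBER})", full, re.IGNORECASE)
    if m:
        return m.group(1)
    # 3. Text of the first element with class "price" or "cena".
    for text in p.class_texts:
        number = first_number(text)
        if number:
            return number
    return None
```

For example, `find_price('<p>Цена: 750</p>')` falls through to step 2 and returns the string `"750"`. Returning the raw matched text, rather than a float, leaves the handling of thousands separators and decimal commas to the caller.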



2 answer(s)
tzps, 2015-05-11
@tzps

If we are talking literally about _any_ page, then I'm afraid this problem has no simple solution with an acceptable error rate.
This class of task is called Named Entity Recognition (NER), and monetary values are one of the common entity types.
If the search space is narrower, for example pages built on one particular template of a popular online-store engine, you can get by with regex parsing. But in general it's NER, with all its "charms".
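For the narrow case, one regex per known store template is often enough. The sketch below assumes two hypothetical templates (`shop_a`, `shop_b`) whose markup is invented for illustration; a real deployment would need one pattern per actual template, and the pattern breaks as soon as the store changes its layout.

```python
import re

# Hypothetical templates: the names and markup are made up for illustration.
TEMPLATE_PATTERNS = {
    "shop_a": re.compile(r'<span class="product-price">\s*(\d[\d\s]*)\s*руб', re.IGNORECASE),
    "shop_b": re.compile(r'data-price="(\d+(?:\.\d+)?)"'),
}


def price_from_template(html, template):
    """Return the price as a float, or None if the template's pattern does not match."""
    m = TEMPLATE_PATTERNS[template].search(html)
    if not m:
        return None
    # Remove thousands separators such as "1 299" before converting.
    return float(re.sub(r"\s", "", m.group(1)))
```

For example, `price_from_template('<div data-price="349.90">', "shop_b")` returns `349.9`.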

Yustas Alexu, 2015-05-11
@Yuxus

1) Instead of "rub." there is often just "r.", which complicates the task somewhat. "r." may also stand next to the old price or next to the amount when buying on credit, and that number may appear in the code earlier than the real price.
2) The word "price" can also come before the numbers.
3) It is not always a class: it can be an itemprop attribute, for example, or the word "price" can be part of a more complex class name such as item-price. In addition, the word "price" can appear in the name of the online store itself.
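The points above can be folded into a slightly more careful heuristic, sketched here under assumptions. It prefers `itemprop="price"`, then any class whose name contains "price" (so `item-price` counts), skips classes carrying the tokens "old" or "credit" (a hypothetical skip list), and only then falls back to the first number followed by a ruble marker, accepting the Russian spellings "руб." and "р." (the "rub." and "r." above) as well as "₽". The names `CandidateCollector` and `best_price` are illustrative. The text fallback can still pick an old or credit price, and nothing here addresses the store-name problem.

```python
import re
from html.parser import HTMLParser

NUM = r"\d+(?:[ \u00a0]\d{3})*(?:[.,]\d{1,2})?"
# Point 1: rubles may be written "руб.", "руб", "р." or "₽".
RUB = rf"({NUM})\s*(?:руб\.?|р\.|₽)"
# Hypothetical class-name tokens marking a price that is not the current one.
SKIP_TOKENS = {"old", "credit"}


class CandidateCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.candidates = []  # (source, number) pairs in document order
        self.texts = []       # all non-empty text nodes
        self._pending = None  # source label waiting for a text node with a number

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        cls = (a.get("class") or "").lower()
        tokens = set(re.split(r"[\s_-]+", cls))
        if a.get("itemprop") == "price":
            # Point 3: microdata may carry the value in an attribute.
            if not (a.get("content") and self._add("itemprop", a["content"])):
                self._pending = "itemprop"
        elif tokens & SKIP_TOKENS:
            self._pending = None  # e.g. class="old-price": ignore what follows
        elif "price" in cls:
            # Point 3: compound names such as "item-price" also count.
            self._pending = "class"

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        self.texts.append(text)
        # Point 2: a label such as "Цена:" may precede the number, so keep waiting.
        if self._pending and self._add(self._pending, text):
            self._pending = None

    def _add(self, source, text):
        m = re.search(NUM, text)
        if m:
            self.candidates.append((source, m.group(0)))
        return bool(m)


def best_price(html):
    c = CandidateCollector()
    c.feed(html)
    for wanted in ("itemprop", "class"):
        for source, value in c.candidates:
            if source == wanted:
                return value
    # Fallback: first number followed by a ruble marker (may still hit an old price).
    m = re.search(RUB, " ".join(c.texts), re.IGNORECASE)
    return m.group(1) if m else None
```

Collecting all candidates with their source before choosing, instead of returning the first match, makes it easy to add further rules (for instance, preferring the larger font or the element closest to the "Buy" button) without rewriting the parser.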
