Mikhail Maximov, 2019-04-04 17:32:09
Parsing

How to write a program to track changes on a website?

Good day to all!
There is a web portal that hosts a register of organizations, presented as a table. The register changes as the market situation changes: organizations are added, removed, and so on.
What should I google and what should I study to write a program that monitors the site for changes and can then process that information? What is the best way to write it? Are there ready-made solutions for this?

1 answer(s)
sim3x, 2019-04-04
@mmaximov97

BeautifulSoup: don't use it.
If the server is well-behaved, it will return the time of the last change (the Last-Modified response header).
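A minimal sketch of that check using only the Python standard library, assuming the portal actually sends a Last-Modified header (dynamically generated pages often don't). It makes a conditional GET with If-Modified-Since, so a server that supports it can answer 304 Not Modified without sending the page again. The function names are illustrative, not from any library.

```python
import urllib.error
import urllib.request
from email.utils import parsedate_to_datetime
from typing import Optional, Tuple


def is_newer(last_modified: Optional[str], previous: Optional[str]) -> bool:
    """Compare two HTTP dates; a missing value counts as 'changed'."""
    if last_modified is None or previous is None:
        return True
    return parsedate_to_datetime(last_modified) > parsedate_to_datetime(previous)


def fetch_if_modified(url: str, previous: Optional[str]) -> Tuple[Optional[bytes], Optional[str]]:
    """Conditional GET.

    Returns (body, Last-Modified) when the server sends a new copy,
    or (None, previous) when it answers 304 Not Modified.
    """
    request = urllib.request.Request(url)
    if previous:
        # Ask the server to send the page only if it changed after `previous`
        request.add_header("If-Modified-Since", previous)
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return response.read(), response.headers.get("Last-Modified")
    except urllib.error.HTTPError as err:
        if err.code == 304:  # urllib reports 304 Not Modified as an HTTPError
            return None, previous
        raise
```

Store the returned Last-Modified string between runs and pass it back as previous. If a server ignores If-Modified-Since and always answers 200, is_newer can compare the old and new header values; if the header is missing altogether, this check tells you nothing and you have to compare the content itself.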
If it doesn't, use requests:
fetch the page;
hash its content and compare it with the previous hash to check whether the page has changed;
keep the results in a simple database (a set of JSON or YAML files).
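A minimal sketch of the hash-and-compare step, assuming a single JSON file that maps each URL to the SHA-256 hash of its last seen content (the answer only says "a set of JSON or YAML files"; this layout and the function names are hypothetical). It takes bytes that were already downloaded, so it doesn't care how the page was fetched.

```python
import hashlib
import json
from pathlib import Path


def content_hash(content: bytes) -> str:
    """SHA-256 of the raw page bytes as a hex string."""
    return hashlib.sha256(content).hexdigest()


def has_changed(url: str, content: bytes, db_path: Path) -> bool:
    """Compare the page hash with the stored one and update the store.

    The 'database' is one JSON file mapping URL -> hash (an assumed layout).
    Returns True on the first run for a URL, since nothing is stored yet.
    """
    db = json.loads(db_path.read_text(encoding="utf-8")) if db_path.exists() else {}
    new_hash = content_hash(content)
    changed = db.get(url) != new_hash
    if changed:
        db[url] = new_hash
        db_path.write_text(json.dumps(db, indent=2), encoding="utf-8")
    return changed
```

With requests the call would look like has_changed(url, requests.get(url, timeout=30).content, Path("hashes.json")). Hashing the whole page also reacts to banners, counters or timestamps, so in practice it is more reliable to hash only the extracted register table.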
Scrapy: worth it if there are many (thousands of) pages; it does most of this work by itself.
HTML parser: lxml.
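Once a change is detected, the register table itself has to be parsed to find out what changed (the additions and exclusions mentioned in the question). A minimal sketch with lxml, assuming the register is the first table on the page with one organization per row of td cells; the real portal's markup is unknown, so the XPath expressions will probably need adjusting. The function names are hypothetical.

```python
from typing import List, Set, Tuple

from lxml import html

Row = Tuple[str, ...]


def parse_register(page: str) -> Set[Row]:
    """Return the register rows as a set of tuples of cell texts.

    Assumes (hypothetically) that the register is the first <table> on the
    page, one organization per <tr>, data in <td> cells.
    """
    doc = html.fromstring(page)
    tables = doc.xpath("//table")
    if not tables:
        raise ValueError("no <table> found on the page")
    rows: Set[Row] = set()
    # .//tr works with or without <tbody>; [td] skips header rows made only of <th>
    for tr in tables[0].xpath(".//tr[td]"):
        rows.add(tuple(td.text_content().strip() for td in tr.xpath("./td")))
    return rows


def diff_register(old: Set[Row], new: Set[Row]) -> Tuple[List[Row], List[Row]]:
    """Return (added, removed) rows, i.e. inclusions and exclusions."""
    return sorted(new - old), sorted(old - new)
```

Because rows are compared as whole tuples, an organization whose details were edited shows up as one removal plus one addition; keying rows by a stable identifier column would let you report edits separately.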
