How can I automate data collection from websites?
Here is the task:
Collect reviews and ratings from various services, preferably automatically, and build reports from that data. Not every site offers an API for this, so the information will have to be scraped directly from the pages.
Of course, I could write a bunch of separate parsers and then spend a long, painful time fixing them after every design change, but I suspect a ready-made solution exists for this kind of task.
Does anyone know of one?
A question similar to yours got this answer: https://code.google.com/p/boilerpipe/
I think it will help you.
Alternatively, to avoid struggling too much with the parser itself: if you use PHP, I recommend the phpQuery library; for Java, jsoup. As for tracking layout changes: add a handler that detects when a NULL shows up in the extracted data and sends you an email, and then you act on it. That works as long as there aren't too many resources to track.
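A minimal sketch of the "NULL means the layout changed" idea, under stated assumptions: it uses Python's standard html.parser instead of phpQuery or jsoup, and matches elements by tag name plus a single CSS class rather than full CSS selectors. The rule names ("rating", "review"), the class names, and the email address are hypothetical placeholders. Keeping the per-site rules in one mapping means a redesign ideally only requires editing that mapping.

```python
from html.parser import HTMLParser
from email.message import EmailMessage

# Hypothetical per-site rules: field name -> (tag, CSS class).
# When the site's layout changes, ideally only this mapping needs editing.
RULES = {
    "rating": ("span", "rating"),
    "review": ("div", "review-text"),
}


class FieldExtractor(HTMLParser):
    """Collects the text of the first element matching each (tag, class) rule."""

    def __init__(self, rules):
        super().__init__()
        self.rules = rules
        self.results = {name: None for name in rules}  # None = not found yet
        self._active = None  # field whose element we are currently inside
        self._depth = 0      # nesting depth of same-named tags inside it

    def handle_starttag(self, tag, attrs):
        if self._active is not None:
            # Track nested tags with the same name so we close on the right one.
            if tag == self.rules[self._active][0]:
                self._depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        for name, (want_tag, want_class) in self.rules.items():
            if self.results[name] is None and tag == want_tag and want_class in classes:
                self._active, self._depth = name, 1
                self.results[name] = ""
                break

    def handle_endtag(self, tag):
        if self._active is not None and tag == self.rules[self._active][0]:
            self._depth -= 1
            if self._depth == 0:
                self.results[self._active] = self.results[self._active].strip()
                self._active = None

    def handle_data(self, data):
        if self._active is not None:
            self.results[self._active] += data


def extract(html, rules=RULES):
    """Returns {field: text or None} for one page."""
    parser = FieldExtractor(rules)
    parser.feed(html)
    parser.close()
    return parser.results


def missing_fields(results):
    """Fields that came back empty: a likely sign the page layout changed."""
    return [name for name, value in results.items() if not value]


def build_alert(site, missing, to_addr="admin@example.com"):
    """Builds (but does not send) a notification email; the address is a placeholder."""
    msg = EmailMessage()
    msg["Subject"] = f"Parser check failed for {site}"
    msg["To"] = to_addr
    msg.set_content("No data for fields: " + ", ".join(missing))
    # To actually send it, something like smtplib.SMTP(host).send_message(msg).
    return msg


if __name__ == "__main__":
    old = '<div class="review-text">Great service</div><span class="rating">4.5</span>'
    new = '<div class="review-body">Great service</div><span class="score">4.5</span>'
    print(missing_fields(extract(old)))  # []
    missing = missing_fields(extract(new))
    if missing:
        print(build_alert("example.com", missing)["Subject"])
```

In the demo the second page simulates a redesign (renamed classes), so both fields come back empty and an alert is built. Treating an empty string the same as None is a deliberate choice here: an element that still exists but has lost its text usually also means the rules need updating.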