Parsing
alexmsk, 2013-12-17 16:37:36

How do I organize automated data collection from websites?

I have the following task:
Collect reviews and ratings from various services, preferably automatically, and build reports from them. Not all of these sites have an API that allows this, so the information will have to be scraped directly from the pages.
Of course, I could write a bunch of separate parsers and then painfully fix them after every design change, but I have a feeling that a ready-made solution exists for tasks like this.
Does anyone know of one?


2 answers
mikiAsano, 2013-12-17
@mikiAsano

A similar question to yours received this answer: https://code.google.com/p/boilerpipe/
I think it will help you.

Edward, 2013-12-21
@evsmusic

As an option, to avoid struggling with the parser itself: if you use PHP, I recommend the phpQuery library; for Java, jsoup. As for tracking layout changes: attach a handler that notices when NULL comes back in the extracted data, sends an email notification, and then you go and fix the parser. This works, of course, only if there are not too many sites to track.
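To illustrate the "handler that notices NULL" idea, here is a minimal sketch in Python rather than PHP or Java. Everything in it is an assumption made for illustration: the site name `example-reviews`, the `SITE_RULES` table, the HTML class names, and the `notify` callback (which stands in for sending an email). It uses regular expressions only to stay dependency-free; in practice a DOM library such as phpQuery or jsoup would do the extraction.

```python
import re
from typing import Callable, Dict, List, Optional

# Hypothetical per-site rules: field name -> regex with one capture group.
# Keeping rules in data means a layout change needs a rule edit, not new code.
SITE_RULES: Dict[str, Dict[str, str]] = {
    "example-reviews": {
        "rating": r'<span class="rating">([\d.]+)</span>',
        "review": r'<p class="review">(.*?)</p>',
    },
}


def extract(site: str, html: str) -> Dict[str, Optional[str]]:
    """Apply each rule for the site; a field that no longer matches becomes None."""
    result: Dict[str, Optional[str]] = {}
    for field, pattern in SITE_RULES[site].items():
        match = re.search(pattern, html, re.DOTALL)
        result[field] = match.group(1).strip() if match else None
    return result


def missing_fields(record: Dict[str, Optional[str]]) -> List[str]:
    """Return the names of fields that came back None or empty."""
    return sorted(name for name, value in record.items() if not value)


def check_and_alert(
    site: str,
    record: Dict[str, Optional[str]],
    notify: Callable[[str], None],
) -> bool:
    """Call notify() when any field is missing; return True if the record is complete."""
    missing = missing_fields(record)
    if missing:
        notify(f"[{site}] layout may have changed; empty fields: {', '.join(missing)}")
        return False
    return True
```

A possible usage: pass `print` as `notify` while testing, and swap in a function that sends mail (for example via the standard `smtplib` module) once the collector runs unattended.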
