Java
bychok300, 2017-01-18 12:00:36

How to organize automatic parsing?

I have a Java web application that parses HTML, but it handles only one site per run. How can I make it take a list of URLs and the tags to extract, and work through the whole list in one go?
As I understand it, I could simply put the URLs in a collection and iterate over it, opening each URL in turn and scraping what I need.
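
The collection-and-loop idea could look roughly like the sketch below. `BatchParser`, `parseAll` and the `parsePage` callback are hypothetical names, not anything from the original application; `parsePage` stands in for whatever single-site parsing code already exists, assumed here to take a URL plus a list of tag names and return the extracted strings.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

// Hypothetical helper: runs an existing single-page parser over a list of URLs.
class BatchParser {

    /**
     * Calls parsePage once per URL and collects the results keyed by URL,
     * in the same order as the input list.
     */
    static Map<String, List<String>> parseAll(
            List<String> urls,
            List<String> tags,
            BiFunction<String, List<String>, List<String>> parsePage) {
        Map<String, List<String>> results = new LinkedHashMap<>();
        for (String url : urls) {
            try {
                results.put(url, parsePage.apply(url, tags));
            } catch (RuntimeException e) {
                // One broken site should not abort the whole batch:
                // record an empty result and move on to the next URL.
                results.put(url, Collections.emptyList());
            }
        }
        return results;
    }
}
```

It would be called as `BatchParser.parseAll(urls, tags, parser::parse)`, where `parser::parse` is a placeholder for the method that already handles one site.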


2 answer(s)
protven, 2017-01-18
@protven

Don't reinvent the wheel: use an existing web scraping framework, a so-called crawler.
For example, https://github.com/yasserg/crawler4j or https://github.com/DigitalPebble/storm-crawler. They split the crawl into as many threads as needed.
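
To show what "as many threads as needed" amounts to, here is a minimal sketch using only `java.util.concurrent`, not the API of either crawler linked above. `ParallelBatchParser`, `parseAll` and `parsePage` are hypothetical names; `parsePage` is assumed to be an existing function that parses one URL. A real crawler framework adds politeness delays, robots.txt handling and retries on top of this.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper: parses a list of URLs on a fixed-size thread pool.
class ParallelBatchParser {

    static <R> Map<String, R> parseAll(
            List<String> urls, int threads, Function<String, R> parsePage) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit one task per URL; at most `threads` of them run at once.
            List<Future<R>> futures = new ArrayList<>();
            for (String url : urls) {
                Callable<R> task = () -> parsePage.apply(url);
                futures.add(pool.submit(task));
            }
            // Collect results in input order; pages that failed are skipped.
            Map<String, R> results = new LinkedHashMap<>();
            for (int i = 0; i < urls.size(); i++) {
                try {
                    results.put(urls.get(i), futures.get(i).get());
                } catch (ExecutionException ex) {
                    System.err.println("Skipping " + urls.get(i) + ": " + ex.getCause());
                } catch (InterruptedException ex) {
                    // Restore the interrupt flag and stop waiting for more results.
                    Thread.currentThread().interrupt();
                    break;
                }
            }
            return results;
        } finally {
            // Results are collected (or we were interrupted): stop the workers.
            pool.shutdownNow();
        }
    }
}
```

The pool size is the only tuning knob here; the per-page parsing code stays the same as in the single-threaded version.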

emp1re, 2017-07-21
@emp1re

Clustering / scaling and, as already suggested, threads.
