What are the "legal" ways to scrape a site using Jsoup?
Hello everyone!
I recently started using Jsoup for web scraping.
A question came up.
Right now I'm just testing the application on, roughly speaking, a single connection, pulling the links out of one page of the site.
Next, though, I need to follow each link and do some processing there, i.e. open a new connection every time. I'm afraid that with that many connections the site may simply ban my IP address (and it's not just a fear: this has already happened once :D).
So the question is: what do I need to do to avoid getting banned?
1. One colleague mentioned using proxies when connecting. And indeed, after digging around the internet, I found a way to attach a proxy to the connection.
2. A second colleague said something about using session keys or cookies when connecting; I don't remember exactly.
Since the first method is clear to me, I'd like to know more about the second one, if it actually answers my question. And if anyone knows other ways to "legally" talk to a site and send it a large number of requests using Jsoup, so to speak, so that it doesn't decide I'm a bad program :)
I hope I've explained this clearly. Thank you!
P.S. Yes, this site does have an API, but I'm interested in working with Jsoup.
"send it a large number of requests": no site will be happy about that, which is only logical, so the options are:
1. Scrape when the site has the least traffic (at night, for example; it all depends on the site).
2. Sleep between requests, i.e. don't hammer the site, but pull the data out in batches.
3. Use a proxy, but still follow points 1 and 2.
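Points 1 and 2 could look roughly like the sketch below. It is only an illustration, not a tested crawler: the class name `PoliteCrawler`, the `Fetcher` interface, the 23:00 to 06:00 quiet window and the delay values are all made up for the example, and the right numbers depend on the site. The fetcher is pluggable so the sketch runs without any library on the classpath.

```java
import java.io.IOException;
import java.time.LocalTime;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

class PoliteCrawler {

    /** Hypothetical hook for whatever actually downloads a page. */
    interface Fetcher<T> {
        T fetch(String url) throws IOException;
    }

    /** True if now is inside the quiet window [start, end); the window may wrap past midnight. */
    static boolean isOffPeak(LocalTime now, LocalTime start, LocalTime end) {
        if (start.isBefore(end)) {
            return !now.isBefore(start) && now.isBefore(end);
        }
        // Window wraps midnight, e.g. 23:00 to 06:00.
        return !now.isBefore(start) || now.isBefore(end);
    }

    /** Base delay plus random jitter in [0, jitterMs], so requests are not evenly spaced. */
    static long nextDelayMs(long baseMs, long jitterMs) {
        return baseMs + ThreadLocalRandom.current().nextLong(jitterMs + 1);
    }

    /** Fetches the URLs one at a time, sleeping before every request except the first. */
    static <T> List<T> crawl(List<String> urls, Fetcher<T> fetcher, long baseMs, long jitterMs)
            throws IOException, InterruptedException {
        List<T> results = new ArrayList<>();
        for (int i = 0; i < urls.size(); i++) {
            if (i > 0) {
                Thread.sleep(nextDelayMs(baseMs, jitterMs));
            }
            results.add(fetcher.fetch(urls.get(i)));
        }
        return results;
    }

    public static void main(String[] args) throws Exception {
        // Assumed quiet window; pick one that matches the target site's traffic.
        if (!isOffPeak(LocalTime.now(), LocalTime.of(23, 0), LocalTime.of(6, 0))) {
            System.out.println("Outside the quiet window, running anyway for the demo.");
        }
        List<String> urls = Arrays.asList("https://example.com/a", "https://example.com/b");
        // Stand-in fetcher that only measures the URL length; no network access here.
        List<Integer> lengths = crawl(urls, String::length, 500, 500);
        System.out.println(lengths);
    }
}
```

With Jsoup on the classpath, the stand-in fetcher could be replaced with `url -> Jsoup.connect(url).get()`, and for point 3 a proxy can be attached to the same connection via `Connection.proxy(host, port)`. The delays still apply in that case.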