How does a smart web crawler work?
Hello everyone! I've been given the task of writing a server-side web crawler in PHP or Python that will traverse the web and collect links, simply every link it can find.
I'm curious how this is actually implemented. If we naively download pages and extract links from the HTML, the results will be mediocre, frankly, because a site can load its links via AJAX (into the page body). There are also sites with infinite loops that kill naive software: when you visit the site, a working link is dynamically generated that leads to a page with another dynamically generated link, and so on ad infinitum. Can you recommend a ready-made solution, or explain the best way to build this? Thanks :)
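To illustrate the two problems mentioned above, here is a minimal sketch of a crawler in plain Python (standard library only; the URL limits, timeouts, and function names are my own placeholder assumptions, not a ready-made product). A `visited` set plus a depth cap and a page cap are what guard against the infinite dynamic-link traps described in the question. Note that this sketch only sees links present in the static HTML; links injected via AJAX would require driving a real browser engine, e.g. with Selenium or Playwright.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collects the href attribute of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(html, base_url):
    """Return absolute http(s) URLs found in the static HTML of a page."""
    parser = LinkExtractor()
    parser.feed(html)
    urls = set()
    for href in parser.hrefs:
        absolute = urljoin(base_url, href)  # resolve relative links
        if urlparse(absolute).scheme in ("http", "https"):
            urls.add(absolute.split("#")[0])  # drop in-page fragments
    return urls


def crawl(seed_url, max_pages=100, max_depth=3):
    """Breadth-first crawl; the visited set and depth cap break
    the 'infinitely generated link' traps mentioned above."""
    visited = set()
    queue = deque([(seed_url, 0)])
    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        if url in visited or depth > max_depth:
            continue
        visited.add(url)
        try:
            html = urlopen(url, timeout=5).read().decode("utf-8", "replace")
        except Exception:
            continue  # skip unreachable or non-decodable pages
        for link in extract_links(html, url):
            if link not in visited:
                queue.append((link, depth + 1))
    return visited
```

A production crawler would add politeness (robots.txt, per-host rate limits) and URL canonicalization on top of this, but the visited-set/depth-limit idea stays the same.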