Search engines
Matthew777, 2021-10-25 23:35:40

Where do search robots get URLs to crawl?

I have no idea where search engines get the page addresses they parse and index. Do they brute-force URLs, or is there some central place that lists the addresses of all pages published on the web? And can I find out all the available pages for a specific domain?

3 answer(s)
DevMan, 2021-10-25
@Matthew777

Site owners themselves submit a list of their site's pages to the search engine for initial indexing.
On top of that, the robot periodically parses the links on pages it already has in its index, follows them, parses the links on those pages, and so on.
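
To make the "parse links, then follow them" loop concrete, here is a rough sketch. It is not how any particular search engine works; `crawl`, `LinkExtractor`, the `fetch` callback and the `PAGES` dictionary are all made up for illustration, and the "web" is an in-memory dict rather than real HTTP requests.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkExtractor(HTMLParser):
    """Collects absolute URLs from <a href="..."> tags on one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                absolute = urljoin(self.base_url, value)
                # Drop "#fragment" so the same page is not queued twice.
                self.links.append(urldefrag(absolute).url)


def crawl(seed_urls, fetch, max_pages=100):
    """Breadth-first discovery: start from seeds, follow links found on each page.

    `fetch` is a caller-supplied function (url -> HTML string); hypothetical here.
    """
    queue = deque(seed_urls)
    seen = set(seed_urls)
    order = []
    while queue and len(order) < max_pages:
        url = queue.popleft()
        order.append(url)
        parser = LinkExtractor(url)
        parser.feed(fetch(url))
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order


# Toy in-memory "web" standing in for real HTTP requests (assumption).
PAGES = {
    "https://example.com/": '<a href="/a">A</a> <a href="/b#top">B</a>',
    "https://example.com/a": '<a href="/b">B</a>',
    "https://example.com/b": '<a href="/">home</a>',
}

discovered = crawl(["https://example.com/"], lambda url: PAGES.get(url, ""))
print(discovered)
```

The seed list plays the role of the pages the owner submitted; everything else is found only because some already-known page links to it. A page that nothing links to would never show up this way.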

Dmitry, 2021-10-26
@pro100taa

I'll add that some CMSs (WordPress, for example) have a built-in ping that notifies update services whenever a page is created or updated.
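
For a rough idea of what such a ping looks like: as far as I know, these pings are usually XML-RPC calls to a method named `weblogUpdates.ping` that takes the site name and site URL. Treat that as an assumption, the answer above does not name the protocol. The sketch only builds the request body and does not send it anywhere; `build_ping_request` and the example site values are made up.

```python
import xmlrpc.client


def build_ping_request(site_name, site_url):
    """Serialize an XML-RPC call body for a weblogUpdates.ping-style notification.

    Assumption: the update service expects (site name, site URL) as parameters.
    """
    return xmlrpc.client.dumps((site_name, site_url), methodname="weblogUpdates.ping")


# Example values are placeholders, not a real site.
payload = build_ping_request("Example Blog", "https://example.com/")
print(payload)
```

Actually sending it would mean POSTing this body to whatever update service URL the CMS is configured with, for example via `xmlrpc.client.ServerProxy`.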

Whois, 2021-10-26
@andrey_id123456789

Can I find out all available pages for a specific domain?

A list like that is called a sitemap, for example:
https://qna.habr.com/sm-questions.xml
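
If you want to pull the page list out of a sitemap yourself, something like the sketch below should do, assuming the file follows the standard sitemaps.org XML format (`<urlset>` with `<url><loc>` entries). `sitemap_locations` and the `SAMPLE` document are made up for illustration; a real script would download the XML first.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org 0.9 schema.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_locations(xml_text):
    """Return every <loc> value found in a sitemap (or sitemap index) document."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.iterfind(".//sm:loc", SITEMAP_NS)
        if loc.text
    ]


# Hand-written sample standing in for a downloaded sitemap (assumption).
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about</loc></url>
</urlset>"""

urls = sitemap_locations(SAMPLE)
print(urls)
```

Keep in mind that a sitemap only lists what the site owner chose to put in it, so it is not guaranteed to cover every page on the domain.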
