How to collect all URLs of a site using Node.js?
Hello!
I am writing a parser in Node.js and have run into a problem: I can't find a way to collect all the URLs of a site.
For example, there is a site example.com, and inside it there are various URLs such as example.com/article1-100.
I want to pull all such addresses into an array and then parse their content with request and cheerio.
I came up with an option where the individual parts of the address (article, 1, 2, 100) are kept in an array and substituted into the main URL during the search, but that has to be set up separately for each site.
Is it possible to search for a site's URLs more universally, by entering only the main example.com? I looked in the direction of regular expressions, but it's not entirely clear how they could be used here. Please advise.
Thanks
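
For reference, one way to collect same-site links with the request and cheerio packages mentioned in the question is a simple breadth-first crawl that follows every internal link it finds. This is only a sketch; startUrl is a placeholder for the real entry page.

```javascript
// Minimal breadth-first crawler sketch. Only the request and cheerio
// packages mentioned in the question are used; startUrl is a placeholder.
const request = require('request');
const cheerio = require('cheerio');

const startUrl = 'https://example.com/';   // assumption: the site's entry page
const origin = new URL(startUrl).origin;

const found = new Set([startUrl]);         // every same-site URL seen so far
const queue = [startUrl];                  // pages that still need to be fetched

function crawlNext() {
  const page = queue.shift();
  if (!page) {
    console.log([...found]);               // the final array of site URLs
    return;
  }
  request(page, (err, res, body) => {
    if (!err && res.statusCode === 200) {
      const $ = cheerio.load(body);
      $('a[href]').each((i, el) => {
        try {
          // resolve relative links against the page they were found on
          const link = new URL($(el).attr('href'), page);
          link.hash = '';                  // drop #fragments, they point to the same page
          if (link.origin === origin && !found.has(link.href)) {
            found.add(link.href);
            queue.push(link.href);
          }
        } catch (e) { /* ignore hrefs that are not valid URLs */ }
      });
    }
    crawlNext();                           // move on to the next queued page
  });
}

crawlNext();
```

The resulting array can then be fed into the article parser. Note that the request package is deprecated, so something like axios or node-fetch could be swapped in with only minor changes.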
You can dig through search engines: for example, a Google search for "site:example.com" will show all indexed pages from that site.
The only catch is the limit of 1000 results, but the query can be refined by specifying subsections: "site:example.com/some_path/".
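
If you go the search-engine route, a very rough sketch with the same request and cheerio packages might look like the following. This is only illustrative: Google's result markup changes often and automated queries are frequently blocked, so the /url?q= link format used below is an assumption, not a stable API.

```javascript
// Rough sketch of pulling indexed pages from a "site:" query.
// Assumption: result links may be wrapped as /url?q=<real-url>&...
const request = require('request');
const cheerio = require('cheerio');

const site = 'example.com';                          // assumption: target site
const query = encodeURIComponent(`site:${site}`);

request(
  {
    url: `https://www.google.com/search?q=${query}`,
    headers: { 'User-Agent': 'Mozilla/5.0' }         // plain requests are often rejected
  },
  (err, res, body) => {
    if (err || res.statusCode !== 200) return console.error(err || res.statusCode);
    const $ = cheerio.load(body);
    const urls = new Set();
    // keep only links that point back at the target site
    $('a[href]').each((i, el) => {
      const href = $(el).attr('href');
      const match = href.match(/^\/url\?q=([^&]+)/);
      const target = match ? decodeURIComponent(match[1]) : href;
      if (target.includes(site)) urls.add(target);
    });
    console.log([...urls]);
  }
);
```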