How to get a list of domains that a given site links to?
I wanted to get a list of sites in a specific geographic area and industry. The idea: take a list of sites I already know, point a crawler at them, let it discover which sites they link to, then repeat the process on those, and so on.
However, it turns out that the venerable wget has no such feature. You can make it download the entire Internet starting from a given domain, but there is no option to simply output the list of domains a site links to.
Is there a way to get wget to do this, or is there a lightweight crawler that does exactly that (i.e. finds the sites linked from the current site)?
Search the downloaded pages for every href="([^"]+)" match with a regular expression, extract the domain from each captured URL, and save the domains somewhere.
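A minimal sketch of that approach in Python (the function name, the regex, and the sample HTML are mine, not from the answer; a real crawler would also need fetching, deduplication, and politeness delays):

```python
import re
from urllib.parse import urlparse

# Naive pattern from the answer: grab everything inside href="..."
HREF_RE = re.compile(r'href="([^"]+)"')

def linked_domains(html, base_domain=None):
    """Return the set of domains referenced by href="..." attributes.

    If base_domain is given, it is excluded so that only external
    (outbound) domains remain.
    """
    domains = set()
    for url in HREF_RE.findall(html):
        netloc = urlparse(url).netloc  # empty for relative links
        if netloc and netloc != base_domain:
            domains.add(netloc)
    return domains

if __name__ == "__main__":
    # Illustrative snippet; relative links yield no netloc and are skipped.
    html = '''<a href="https://example.com/page">x</a>
    <a href="http://other.org/">y</a>
    <a href="/relative/link">z</a>'''
    print(sorted(linked_domains(html, base_domain="example.com")))
    # → ['other.org']
```

Feeding each newly discovered domain's pages back through the same function gives the breadth-first expansion described in the question. Note that a regex over raw HTML is a rough heuristic; an HTML parser handles single-quoted and unquoted attributes that this pattern misses.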