Python
VitaliySm, 2015-02-11 15:25:08

How to scrape many different sites?

I have a spider that starts from one general page and has to follow links to pages on many different domains. start_urls will always contain a single page. How should I set allowed_domains? Listing every domain there is not an option, because there can be a lot of them.


1 answer(s)
VitaliySm, 2015-02-11
@VitaliySm

The problem was that I had set allowed_domains = ["domain"], so the spider did not follow links to external sites. Setting allowed_domains = [] solved the problem: with an empty list, requests to other domains are no longer filtered out.
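
To show why the empty list helps, here is a rough model of an offsite check in plain Python. It is only a sketch of the idea, not Scrapy's actual code: the function name is_allowed and the example.com / example.org URLs are made up for illustration. In the spider itself it should be enough to write allowed_domains = [] or leave the attribute out.

```python
from urllib.parse import urlparse


def is_allowed(url, allowed_domains):
    """Simplified model of an offsite filter (hypothetical helper, not Scrapy's API).

    Returns True if a request to `url` would be let through.
    """
    # An empty or missing list means "no restriction": every host passes.
    if not allowed_domains:
        return True
    host = (urlparse(url).hostname or "").lower()
    # Otherwise the host must be a listed domain or one of its subdomains.
    return any(host == d or host.endswith("." + d) for d in allowed_domains)


# With a domain listed, links to other sites are dropped.
restricted = ["example.com"]  # hypothetical domain
blocked = is_allowed("http://example.org/page", restricted)
kept = is_allowed("http://sub.example.com/page", restricted)

# With an empty list, the same external link goes through.
opened = is_allowed("http://example.org/page", [])
```

The trade-off is that with no restriction the spider will follow every link it is given, so it may be worth limiting the crawl in some other way, for example by depth.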
