Parsing
Maxim Osadchy, 2017-05-21 21:21:52

How to scrape all images from a site?

I need to go through every page of a site (there is no XML sitemap), find the src of each image that has a certain class, and save those images to a single folder. Optionally, I would like to name each picture after the text of a tag on its page, for example the h1.
Please don't judge too harshly: this is the first time I have run into this kind of task, and I have no experience with it.
This is not a "do it for me" question; any links or recommendations would be greatly appreciated.


1 answer(s)
RidgeA, 2017-05-21
@RidgeA

The algorithm is simple:
1. Take the first page of the site.
2. Parse it: extract the information you need and the links to other pages of the same site, and save both.
3. Mark the current page as parsed.
4. Go back to step 1 with any link collected in step 2 that has not been parsed yet, and repeat until no unparsed links remain.
A good place to start reading is https://habrahabr.ru/post/301426/
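The steps above can be sketched roughly like this in Python, using only the standard library. Treat it as a minimal illustration rather than a finished scraper: the names `PageParser`, `crawl`, `save_images`, the `max_pages` limit and the UTF-8 decoding are all my own assumptions, and a real site may need delays between requests, robots.txt handling, or a proper HTML library.

```python
import os
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class PageParser(HTMLParser):
    """Collects <a href> links, <img src> with a given class, and the first <h1> text."""

    def __init__(self, img_class):
        super().__init__()
        self.img_class = img_class
        self.links, self.images = [], []
        self.title = ""
        self._in_h1 = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "img" and attrs.get("src"):
            # class="photo big" should still match img_class="photo"
            if self.img_class in (attrs.get("class") or "").split():
                self.images.append(attrs["src"])
        elif tag == "h1" and not self.title:
            self._in_h1 = True

    def handle_endtag(self, tag):
        if tag == "h1":
            self._in_h1 = False

    def handle_data(self, data):
        if self._in_h1:
            self.title += data


def fetch_html(url):
    """Default fetcher; decoding as UTF-8 is an assumption about the site."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(start_url, img_class, fetch=fetch_html, max_pages=1000):
    """Return {image_url: h1_text} for matching images on pages of the same host."""
    host = urlparse(start_url).netloc
    queue, seen, found = deque([start_url]), {start_url}, {}
    pages = 0
    while queue and pages < max_pages:
        url = queue.popleft()              # step 1: take a page
        pages += 1
        try:
            html = fetch(url)
        except OSError:                    # network/HTTP errors: skip the page
            continue
        parser = PageParser(img_class)     # step 2: parse it
        parser.feed(html)
        parser.close()
        for src in parser.images:
            found.setdefault(urljoin(url, src), parser.title.strip())
        for href in parser.links:          # steps 3-4: queue only unseen same-site links
            link = urldefrag(urljoin(url, href)).url
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return found


def save_images(found, folder="images"):
    """Download every found image into one folder, named after the page's h1."""
    os.makedirs(folder, exist_ok=True)
    for n, (img_url, title) in enumerate(found.items(), 1):
        ext = os.path.splitext(urlparse(img_url).path)[1] or ".jpg"
        safe = "".join(c if c.isalnum() else "_" for c in title) or "image"
        path = os.path.join(folder, f"{safe}_{n}{ext}")
        with urlopen(img_url, timeout=10) as resp, open(path, "wb") as out:
            out.write(resp.read())
```

Usage would be something like `save_images(crawl("https://example.com/", "photo"))`, where both the URL and the class name are placeholders. The `seen` set is what "mark the page as parsed" amounts to here: a link is added to the queue only once, so the loop ends when the site has no unvisited pages left.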
