Parsing
mRelby, 2021-06-10 17:22:25

How do I parse a site (its content) from the web archive?

Good day to all!

The question is right in the title. What is the best way today to pull content (or the whole site) from the web archive?
If anyone has experience with this, please share your tips.

Thanks in advance.

P.S. Maybe there is a Python library for this? That would be even better.

2 answer(s)
weranda, 2021-06-12
@weranda

If you want to copy everything, there is a tool called Wayback Machine Downloader. If you want to parse the pages, that is, take them apart, there are plenty of options, for example lxml (it seems to be used inside BeautifulSoup and Scrapy).
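
Since the question asks about Python, here is a minimal sketch of the "parse" route. It assumes the Internet Archive's Wayback Availability API (`https://archive.org/wayback/available`) and its usual JSON shape with `archived_snapshots.closest`. To stay dependency-free it uses the standard library's `html.parser` in place of lxml. The function and class names (`availability_url`, `closest_snapshot`, `LinkCollector`, `extract_links`, `fetch_snapshot_links`) are made up for illustration.

```python
import json
import urllib.parse
import urllib.request
from html.parser import HTMLParser

# Assumed endpoint of the Wayback Availability API.
AVAILABILITY_API = "https://archive.org/wayback/available"


def availability_url(site, timestamp=None):
    """Build the API query URL; timestamp is YYYYMMDD or YYYYMMDDhhmmss."""
    params = {"url": site}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_API + "?" + urllib.parse.urlencode(params)


def closest_snapshot(api_response):
    """Return the closest snapshot URL from a decoded response, or None."""
    closest = api_response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(html):
    """Parse an HTML string and return the link targets in document order."""
    collector = LinkCollector()
    collector.feed(html)
    return collector.links


def fetch_snapshot_links(site, timestamp=None):
    """Network helper: find the closest snapshot and list its links."""
    with urllib.request.urlopen(availability_url(site, timestamp)) as resp:
        snapshot = closest_snapshot(json.load(resp))
    if snapshot is None:
        return []
    with urllib.request.urlopen(snapshot) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    return extract_links(html)
```

Calling `fetch_snapshot_links("example.com", "20210610")` would return the links found in the snapshot closest to that date, or an empty list if nothing is archived. Links in archived pages are usually rewritten to point back into the archive, so crawling a whole site this way means following those rewritten links page by page.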

Igor, 2021-06-17
@hurgadan

As an option, https://github.com/puppeteer/puppeteer for site parsing. Though I don't really know what you mean by "web archive".
