What tools can be used to save individual web pages (a personal web archive)?
I built a bookmarking service, www.relater.ru, for my own needs (and perhaps for others), and I want to add the ability to save a copy of each bookmarked page in case the original later becomes unavailable.
The first thing that came to mind was to simply download the page with wget and then process it (or keep it in the archive as is).
Ideally, though, I would like to save the page the way Facebook or VKontakte do, keeping only the text of the article. I can't figure out how to extract the main content (the article text) from a page while excluding the clutter (header, footer, menu elements).
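For the first option (keeping the raw download), here is a minimal Python sketch of how a snapshot could be stored on disk. It is only an illustration: the function names (`snapshot_path`, `save_snapshot`, `fetch`), the directory layout, and the hash-based file naming are my assumptions, not something relater.ru or wget prescribes. It saves only the HTML document, not the images or stylesheets it references.

```python
import hashlib
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def snapshot_path(archive_dir: Path, url: str, timestamp: int) -> Path:
    """Build a file path for one snapshot of `url` (hypothetical layout)."""
    # Group snapshots by host name; fall back when the URL has no host.
    host = urlparse(url).hostname or "unknown-host"
    # Hash the full URL so that any path or query string maps to a safe file name.
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()[:16]
    return archive_dir / host / f"{digest}-{timestamp}.html"


def save_snapshot(archive_dir: Path, url: str, body: bytes, timestamp: int) -> Path:
    """Write the raw page bytes to the archive and return the file path."""
    path = snapshot_path(archive_dir, url, timestamp)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(body)
    return path


def fetch(url: str, timeout: float = 10.0) -> bytes:
    """Download the page body, roughly what a plain `wget URL` would fetch."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()
```

A call such as `save_snapshot(Path("archive"), url, fetch(url), int(time.time()))` would then store one timestamped copy per bookmark, so repeated saves of the same URL do not overwrite each other.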
Solution found: https://github.com/feelinglucky/php-readability (largely influenced by @MonkAlbino's answer).
"What algorithm does Readability use for extracting..." on the English-language Stack Overflow.
"How it is done: parsing a hundred ..." on Habr, from Mail.ru.
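To give a rough idea of what "pulling out" the article text can look like, here is a small Python sketch using only the standard library. It is not the algorithm used by php-readability or Readability; it is a much simpler heuristic that I am assuming for illustration: skip well-known non-content containers, collect `<p>` paragraphs, and keep only those that are long enough and not mostly link text. The names `ParagraphCollector` and `extract_article_text` and both thresholds are hypothetical.

```python
from html.parser import HTMLParser

# Containers assumed to hold clutter rather than article text.
SKIPPED_TAGS = {"script", "style", "noscript", "nav", "header", "footer", "aside", "form"}


class ParagraphCollector(HTMLParser):
    """Collect <p> text, ignoring anything nested inside SKIPPED_TAGS."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.skip_depth = 0       # > 0 while inside a skipped container
        self.link_depth = 0       # > 0 while inside <a> within a paragraph
        self.in_paragraph = False
        self.text_parts = []
        self.link_chars = 0
        self.paragraphs = []      # list of (text, characters inside links)

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self.skip_depth += 1
        elif tag == "p" and self.skip_depth == 0:
            self.flush_paragraph()  # a new <p> implicitly closes the previous one
            self.in_paragraph = True
        elif tag == "a" and self.in_paragraph:
            self.link_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag == "p":
            self.flush_paragraph()
        elif tag == "a" and self.link_depth > 0:
            self.link_depth -= 1

    def handle_data(self, data):
        if self.in_paragraph and self.skip_depth == 0:
            self.text_parts.append(data)
            if self.link_depth > 0:
                self.link_chars += len(data.strip())

    def flush_paragraph(self):
        # Collapse whitespace runs so lengths are comparable across pages.
        text = " ".join("".join(self.text_parts).split())
        if self.in_paragraph and text:
            self.paragraphs.append((text, self.link_chars))
        self.text_parts, self.link_chars = [], 0
        self.in_paragraph, self.link_depth = False, 0


def extract_article_text(html, min_chars=80, max_link_density=0.5):
    """Return paragraphs that look like article text, separated by blank lines."""
    collector = ParagraphCollector()
    collector.feed(html)
    collector.close()
    collector.flush_paragraph()  # handle a final paragraph that was never closed
    kept = []
    for text, link_chars in collector.paragraphs:
        link_density = link_chars / len(text)
        if len(text) >= min_chars and link_density <= max_link_density:
            kept.append(text)
    return "\n\n".join(kept)
```

Filtering by tag name only helps on pages that use semantic HTML5 elements. Readability-style libraries go further and score containers by class names, text length, and link density, which is why a ready-made library is likely the better choice for a real service.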