Parsing
26info, 2014-01-30 13:49:20

What tools can I use to save specific web pages (my own web archive)?

For my own needs (and maybe not only mine), I built a bookmarking service, www.relater.ru, and I want to add the ability to save a copy of each bookmarked page, in case the original page later becomes unavailable.
The first thing that came to mind was to simply download the page with wget and then work with it (or keep it in the archive as is).
...but ideally I would like to save the page the way Facebook or VKontakte do, that is, just the text of the article. However, I can't figure out how to "pull out" the content (the article text) from the page while excluding the clutter (header, footer, menu items).
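The wget idea amounts to fetching the raw HTML and storing it under a stable key. Here is a minimal Python sketch, assuming a hypothetical archive layout of `<archive_dir>/<sha1 of URL>/<UTC timestamp>.html` (the function name `archive_page` and the layout are illustrative, not part of relater.ru). Like a plain wget call without extra options, it saves only the HTML, not images or stylesheets.

```python
import hashlib
import urllib.request
from datetime import datetime, timezone
from pathlib import Path


def archive_page(url: str, archive_dir: Path) -> Path:
    """Download url and store its raw bytes as a timestamped snapshot.

    Hypothetical layout: <archive_dir>/<sha1(url)>/<UTC timestamp>.html,
    so repeated snapshots of the same bookmark end up side by side.
    """
    key = hashlib.sha1(url.encode("utf-8")).hexdigest()
    target_dir = archive_dir / key
    target_dir.mkdir(parents=True, exist_ok=True)

    # Keep the bytes exactly as received; decoding is left to later processing.
    with urllib.request.urlopen(url, timeout=30) as response:
        body = response.read()

    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    target = target_dir / f"{stamp}.html"
    target.write_bytes(body)
    return target
```

Calling `archive_page("https://example.com/article", Path("archive"))` at bookmark time returns the path of the saved snapshot, which can be stored with the bookmark record.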


3 answer(s)
26info, 2014-02-01
@26info

Solution found: https://github.com/feelinglucky/php-readability, largely thanks to @MonkAlbino's answer.

Michael Danilov, 2014-01-31
@MonkAlbino

What algorithm does Readability use for extracting... (on the English-language Stack Overflow)
How the parsing of a hundred ... is done (on Habr, from Mail.ru)
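The Readability approach mentioned in these answers scores blocks of a page by how much paragraph-like text they contain and keeps the best-scoring container. Below is a heavily simplified Python sketch of that idea, using only the standard library's `html.parser`. It is loosely modeled on Readability's paragraph scoring, not a port of Readability or php-readability: the `SKIP_TAGS` list, the 25-character threshold, the score weights and the name `extract_main_text` are all assumptions for illustration.

```python
from html.parser import HTMLParser

# Elements that usually hold site chrome rather than article text (assumption).
SKIP_TAGS = {"script", "style", "noscript", "nav", "header", "footer", "aside", "form"}
# Elements that never get a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class _Node:
    def __init__(self, tag, parent):
        self.tag = tag
        self.parent = parent
        self.score = 0.0
        self.paragraphs = []  # text of every scored <p> inside this element


class _ContentScorer(HTMLParser):
    def __init__(self):
        super().__init__()
        self.root = _Node("#root", None)
        self.nodes = [self.root]
        self.current = self.root
        self.skip_depth = 0   # > 0 while inside a SKIP_TAGS element
        self.p_text = None    # text chunks of the open <p>, or None

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        node = _Node(tag, self.current)
        self.nodes.append(node)
        self.current = node
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag == "p" and self.skip_depth == 0:
            self.p_text = []

    def handle_endtag(self, tag):
        # Find the nearest open element with this tag; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is self.root:
            return
        # Close every element up to and including the matching one.
        while True:
            closing = self.current
            self.current = closing.parent
            self._close(closing)
            if closing is node:
                break

    def handle_data(self, data):
        if self.p_text is not None and self.skip_depth == 0:
            self.p_text.append(data)

    def _close(self, node):
        if node.tag in SKIP_TAGS:
            self.skip_depth -= 1
        elif node.tag == "p" and self.p_text is not None:
            text = " ".join("".join(self.p_text).split())  # normalize whitespace
            self.p_text = None
            self._score_paragraph(node, text)

    def _score_paragraph(self, p_node, text):
        if len(text) < 25:  # too short to be article prose (assumed threshold)
            return
        # One point per paragraph, one per comma, one per 100 chars (max 3).
        score = 1 + text.count(",") + min(len(text) // 100, 3)
        parent = p_node.parent
        parent.score += score
        if parent.parent is not None:
            parent.parent.score += score / 2
        # Record the text on every ancestor so any winner can return it.
        node = parent
        while node is not None:
            node.paragraphs.append(text)
            node = node.parent


def extract_main_text(html: str) -> str:
    """Return the paragraphs of the highest-scoring container, or ''."""
    parser = _ContentScorer()
    parser.feed(html)
    parser.close()
    best = max(parser.nodes, key=lambda n: n.score)
    return "\n\n".join(best.paragraphs)
```

In practice this would run on a page already downloaded (for example with wget), with the extracted text stored next to the raw HTML so the full snapshot is still available when the heuristic picks the wrong block.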

Alexander, 2014-01-30
@covorp

How to save an HTML page to .rtf format using PHP for later printing?
