HTML
best_santa, 2014-09-22 09:24:16

How do I extract the useful content from an HTML page?

I've already racked my brain over this.
I dug through GitHub, Habr, Google...
But I didn't find a clear, adequate solution.
The essence of the task:
I need to extract the useful content of an HTML page with its formatting preserved.
What is this for?
It's for one (seemingly) simple task: storing pages in an alternative way (a kind of bookmark). I bookmark things a lot, and very often those bookmarks later become useless: sometimes the server is down, sometimes the images and attachments on a free CDN have died, and sometimes the site no longer exists at all.
I lose 20-30 percent of my bookmarks with useful information...
Programming languages:
PHP is strongly preferred.
AS3 (Flash) or JS would also work.
A Firefox plugin would be very handy.
There are no strict requirements on the language, as long as the solution can be ported to PHP, JS, or Flash.
What I found:
Mostly "solutions" built on naive regular expressions. That approach won't work because it isn't universal; if I needed a parser for one specific site, I wouldn't be asking.
There is also PEAR Text_Diff for PHP. In theory it fits; in practice it would have to be rewritten and adapted for the task.
There are also theoretical write-ups that I didn't like: first, they are pure theory with no source code; second, the approach breaks down if the article is packed with formatting.
Note 1:
I once came across a Firefox plugin that handled this task very easily, but I can't find it now and don't even remember its approximate name. That was a long time ago. With the plugin enabled, every page on the web opened in the same style and showed only the title and the formatted text, with none of the clutter such as menus, banners, headers, footers, and other junk.
Finding this plugin would be the perfect solution!
Note 2:
There was also a Firefox plugin that, on click, copied the selection with its formatting into WordPress (judging by the description). I never installed it and have no idea how it works, but I suspect it could be used for my task if I don't find a more adequate and faster solution. I don't remember its name either, and it's definitely not in the Firefox add-on repository.
Note 3:
Search engines somehow process a page and pick out the useful text. I don't know exactly how; I assume it's quite complicated, with a bunch of formulas and neural network training... If it's not as hard as I think, I'd be grateful for any information.
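Note 3 may be less hard than it looks: reader-mode tools such as Mozilla's Readability.js (the library behind Firefox's Reader View) rely on hand-written heuristics rather than trained models. The core idea can be sketched briefly: credit each `<p>` with its non-link text, add that score to the paragraph's parent and half of it to the grandparent, and take the highest-scoring container as the article. The Python below (the asker prefers PHP or JS, but the logic ports directly) is a simplified sketch under these assumptions, not Readability's actual algorithm; the `NOISE` tag list, the 1/2 grandparent weight, and all names (`TreeBuilder`, `main_block`, ...) are hypothetical choices.

```python
from html.parser import HTMLParser

# Assumed noise containers: text inside them never counts as content.
NOISE = {"script", "style", "nav", "header", "footer", "aside", "form"}
# Void elements never get a closing tag, so they are not pushed onto the tree.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class Node:
    def __init__(self, tag, attrs, parent):
        self.tag = tag
        self.attrs = dict(attrs)
        self.parent = parent
        self.children = []
        self.text = 0        # characters of text directly inside this node
        self.link_text = 0   # the part of self.text that sits inside an <a>


class TreeBuilder(HTMLParser):
    """Builds a lightweight tree, tolerating unclosed tags."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root", [], None)
        self.cur = self.root

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        node = Node(tag, attrs, self.cur)
        self.cur.children.append(node)
        self.cur = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this name; ignore stray end tags.
        node = self.cur
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.cur = node.parent

    def handle_data(self, data):
        size = len(data.strip())
        if not size:
            return
        in_link = False
        node = self.cur
        while node is not None:
            if node.tag in NOISE:
                return  # text inside menus, scripts, footers, etc. is ignored
            in_link = in_link or node.tag == "a"
            node = node.parent
        self.cur.text += size
        if in_link:
            self.cur.link_text += size


def totals(node):
    """Total text and link text in the subtree rooted at node."""
    text, links = node.text, node.link_text
    for child in node.children:
        t, l = totals(child)
        text += t
        links += l
    return text, links


def main_block(html):
    """Return the node that most likely holds the article, or None."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    scores = {}

    def walk(node):
        for child in node.children:
            if child.tag == "p":
                text, links = totals(child)
                useful = text - links  # link-only paragraphs (menus) score 0
                scores[node] = scores.get(node, 0) + useful
                if node.parent is not None:
                    scores[node.parent] = scores.get(node.parent, 0) + useful / 2
            walk(child)

    walk(builder.root)
    return max(scores, key=scores.get) if scores else None
```

On a real page, `main_block(html)` returns the winning `Node` (check `node.tag` and `node.attrs`); saving the article then means serializing only that subtree. Because the score is driven by paragraph text minus link text, link-heavy menus and short banner blocks lose to the body of the article.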
Note 4:
Another way: copy the selection in the browser, paste it from the clipboard into OpenOffice, and then save it as HTML. The result is a clean, nice-looking document with formatting; all that remains is to add your own styles. But it takes a lot of time. That's why I thought of Flash: it can work with the clipboard...
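The OpenOffice round trip in Note 4 effectively keeps formatting tags and throws away layout and styling. A similar cleanup can be scripted. Below is a minimal Python sketch (again, the logic ports to PHP or JS) using only the standard `html.parser` module. The whitelist `KEEP`, the attribute table `KEEP_ATTRS`, and the function `clean` are hypothetical choices for illustration, not an existing library's API. It is also not a security-grade sanitizer: for example, it keeps `href` values as-is, including `javascript:` URLs.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist of formatting/structural tags worth keeping.
KEEP = {"h1", "h2", "h3", "h4", "h5", "h6", "p", "br", "b", "strong", "i",
        "em", "u", "ul", "ol", "li", "blockquote", "pre", "code", "a", "img",
        "table", "tr", "td", "th"}
# Only these attributes survive; class/style/id are dropped so your own CSS applies.
KEEP_ATTRS = {"a": {"href"}, "img": {"src", "alt"}}
DROP_CONTENT = {"script", "style"}  # drop these tags together with their text
VOID = {"br", "img"}                # kept tags that never get a closing tag


class Cleaner(HTMLParser):
    def __init__(self):
        super().__init__()
        self.out = []
        self.drop = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.drop += 1
            return
        if self.drop or tag not in KEEP:
            return  # unknown wrappers (div, span, ...) vanish, their text stays
        allowed = KEEP_ATTRS.get(tag, set())
        parts = [f"<{tag}"]
        for name, value in attrs:
            if name in allowed:
                safe = escape(value or "", quote=True)
                parts.append(f' {name}="{safe}"')
        self.out.append("".join(parts) + ">")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.drop = max(0, self.drop - 1)
            return
        if self.drop or tag not in KEEP or tag in VOID:
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.drop:
            self.out.append(escape(data, quote=False))


def clean(html):
    """Return the HTML reduced to whitelisted tags and attributes."""
    cleaner = Cleaner()
    cleaner.feed(html)
    cleaner.close()
    return "".join(cleaner.out)
```

For example, `clean('<div class="x"><h2 style="color:red">Hi</h2></div>')` yields `<h2>Hi</h2>`: the wrapper and inline style disappear, while the heading, paragraphs, lists, links, and images keep their structure, ready for your own stylesheet.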
Thanks in advance to everyone who answers.


3 answers
Ranwise, 2014-09-22
@best_santa

Check out Evernote: it has a browser plugin.

Eugene, 2014-09-22
@blasheevich

The ScrapBook plugin for Firefox: it has a highlighter and lets you delete selected parts of a saved page.

itech523, 2019-01-16
@itech523

I use the UnMHT add-on for the old Firefox (I manually select what to save); the page is saved as a single .mht file. There are Android apps for reading .mht files. I use the Web Editable Switch add-on to cut and paste content (removing ads) in the .mht file. With these add-ons it is also convenient to keep only what you need when creating a PDF at https://www.printfriendly.com/. I stay on the old Firefox because these add-ons were not ported to the new Rust-based version.
