How to parse the body text of a web page?

Parsing

Ivan, 2016-10-31 05:51:08

I need to extract the actual article text from the HTML of a web page, without the footer, sidebars, menus, and other extraneous text. I assume the main text usually takes up more space than the other elements. Please suggest how to isolate it from the rest of the page. I don't need the images or the tags inside the article; they can be removed.

3 answers

DevMan, 2016-10-31
@iwqn

Any DOM parser will do the job.
They are no harder to use than jQuery, and parsing HTML with regular expressions is pure nonsense.
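A minimal sketch of the heuristic from the question (the main text takes up more space than everything else), using only Python's standard `html.parser`. The names `MainTextParser` and `extract_main_text`, and the tag lists `SKIP` and `CONTAINERS`, are illustrative assumptions rather than part of any library, and real pages will likely need a more careful scoring rule:

```python
from html.parser import HTMLParser

# Assumed tag lists for this sketch; adjust for the pages you parse.
SKIP = {"script", "style", "nav", "footer", "aside", "header", "form"}
CONTAINERS = {"article", "main", "section", "div", "td", "body"}
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class MainTextParser(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.stack = []       # open elements as (tag, container_id or None)
        self.skip_depth = 0   # > 0 while inside a SKIP element
        self.texts = {}       # container_id -> list of text chunks
        self.next_id = 0

    def handle_starttag(self, tag, attrs):
        if tag in VOID:       # images and other void tags are dropped
            return
        cid = None
        if tag in CONTAINERS:
            cid = self.next_id
            self.next_id += 1
            self.texts[cid] = []
        self.stack.append((tag, cid))
        if tag in SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed tags.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                for open_tag, _ in self.stack[i:]:
                    if open_tag in SKIP:
                        self.skip_depth -= 1
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = " ".join(data.split())
        if not text or self.skip_depth:
            return
        # Credit the text to the nearest enclosing container.
        for _, cid in reversed(self.stack):
            if cid is not None:
                self.texts[cid].append(text)
                break


def extract_main_text(html):
    """Return the text of the container holding the most characters."""
    parser = MainTextParser()
    parser.feed(html)
    parser.close()
    if not parser.texts:
        return ""
    best = max(parser.texts.values(), key=lambda c: sum(len(s) for s in c))
    return " ".join(best)
```

Calling `extract_main_text(page_html)` returns plain text with inline tags and images dropped. Each piece of text is credited only to its nearest container, so an article split across many nested `div` elements may be only partly recovered.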

murlogen, 2016-10-31
@murlogen

If the site supports micro-markup (for Facebook and the like), you're in luck.
The text can be pulled out in one step.
It comes out clean.
It is exactly what the site's author intended.
I would start by checking whether the page has micro-markup.
If there is none, the less accurate option is to parse the page manually; the other answers describe how to do that.
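A minimal sketch of the micro-markup check, assuming it refers to Open Graph `meta` tags (`og:title`, `og:description`) and a schema.org `itemprop="articleBody"` element; a given site may use neither. The names `MicroMarkupParser` and `extract_micro_markup` are hypothetical, and only the standard `html.parser` module is used:

```python
from html.parser import HTMLParser

VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class MicroMarkupParser(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.og = {}            # Open Graph properties, e.g. "og:title"
        self.body_chunks = []   # text found inside the articleBody element
        self.stack = []         # open tags inside the articleBody element
        self.in_body = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta":
            prop = a.get("property") or ""
            if prop.startswith("og:") and a.get("content") is not None:
                self.og[prop] = a["content"]
            return
        if tag in VOID:
            return
        if self.in_body:
            self.stack.append(tag)
        elif "articleBody" in (a.get("itemprop") or "").split():
            self.in_body = True
            self.stack = [tag]

    def handle_endtag(self, tag):
        if self.in_body and tag in self.stack:
            # Pop through the matching tag, tolerating unclosed children.
            while self.stack.pop() != tag:
                pass
            if not self.stack:
                self.in_body = False

    def handle_data(self, data):
        if self.in_body and self.stack[-1] not in ("script", "style"):
            text = " ".join(data.split())
            if text:
                self.body_chunks.append(text)


def extract_micro_markup(html):
    """Return Open Graph properties and the articleBody text, if present."""
    parser = MicroMarkupParser()
    parser.feed(html)
    parser.close()
    body = " ".join(parser.body_chunks) or None
    return {"og": parser.og, "article_body": body}
```

If `article_body` comes back as `None` and the `og` dictionary is empty, the page has no markup this sketch recognizes, and manual parsing is the fallback.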

Alexander Taratin, 2016-10-31
@Taraflex

php-readability. The question is which port to choose.