Parsing
Ivan, 2016-10-31 05:51:08

How do I extract the main body text of a web page?

I need to get the actual text of the article from the HTML of a web page, without the footer, aside, menus, and other extra text. I think the main text usually takes up more space than the other elements. How can it be isolated from everything else? I don't need the images or tags inside the article; they can be deleted.
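The heuristic in the question (the main text takes up more space than anything else) can be sketched with Python's standard-library `html.parser`: credit the text length of every `<p>` element to the element that contains it, then return the paragraphs of the highest-scoring container. This is a minimal sketch assuming reasonably well-formed HTML in which the article text sits in `<p>` tags; the names `ParagraphScorer` and `extract_main_text` are hypothetical. Images and inline tags disappear on their own because only text is kept.

```python
from html.parser import HTMLParser

# Void elements have no closing tag, so they are never pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ParagraphScorer(HTMLParser):
    """Credits the text length of each <p> to the element that contains it."""

    def __init__(self):
        super().__init__()
        self.stack = []        # (tag, element id) for each currently open element
        self.next_id = 0
        self.scores = {}       # container id -> total length of its <p> text
        self.paragraphs = {}   # container id -> list of its <p> texts
        self.current_p = None  # text pieces of the <p> being read, or None

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if tag == "p":
            self.current_p = []
        self.stack.append((tag, self.next_id))
        self.next_id += 1

    def handle_endtag(self, tag):
        # Simplification: ignore close tags that do not match the innermost open tag.
        if not self.stack or self.stack[-1][0] != tag:
            return
        self.stack.pop()
        if tag == "p" and self.current_p is not None:
            text = " ".join("".join(self.current_p).split())  # collapse whitespace
            parent = self.stack[-1][1] if self.stack else -1
            self.scores[parent] = self.scores.get(parent, 0) + len(text)
            self.paragraphs.setdefault(parent, []).append(text)
            self.current_p = None

    def handle_data(self, data):
        # Text inside inline tags (<a>, <b>, ...) still belongs to the open <p>.
        if self.current_p is not None:
            self.current_p.append(data)


def extract_main_text(html: str) -> str:
    """Return the paragraphs of the container with the most <p> text."""
    scorer = ParagraphScorer()
    scorer.feed(html)
    scorer.close()
    if not scorer.scores:
        return ""
    best = max(scorer.scores, key=scorer.scores.get)
    return "\n\n".join(scorer.paragraphs[best])


page = """<html><body>
<nav><p>Home</p><p>About</p></nav>
<article><h1>Title</h1>
<p>First paragraph of the <a href="#">article</a> body.</p>
<img src="photo.png">
<p>Second, longer paragraph of the article body.</p>
</article>
<footer><p>Copyright 2016</p></footer>
</body></html>"""
print(extract_main_text(page))
```

On the sample page the `<article>` wins over the `<nav>` and `<footer>`, so only its two paragraphs are printed. Note that the `<h1>` is skipped because only `<p>` text is scored; a real page may need other block tags counted too.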

3 answers
DevMan, 2016-10-31
@iwqn

Any DOM parser will do the job.
Using one is no harder than jQuery. And parsing HTML with regular expressions is utter nonsense.
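A sketch of the parser-based approach in Python. The standard library has no lenient HTML DOM builder (third-party libraries such as BeautifulSoup or lxml.html provide one), so to stay dependency-free this sketch uses the event-based `html.parser` tokenizer and skips the whole subtree of elements that are usually page chrome. The `NOISE_TAGS` list and the names `TextWithoutNoise` and `strip_page` are assumptions for illustration; many sites mark menus with `div` classes rather than semantic tags, so the list would need tuning per site.

```python
from html.parser import HTMLParser

# Assumed list of elements whose whole subtree is page chrome, not article text.
NOISE_TAGS = {"head", "script", "style", "noscript", "nav", "header",
              "footer", "aside", "form"}
# Closing one of these starts a new line of output.
BLOCK_TAGS = {"p", "div", "li", "h1", "h2", "h3", "h4", "h5", "h6",
              "article", "section", "blockquote", "pre", "tr"}


class TextWithoutNoise(HTMLParser):
    def __init__(self):
        super().__init__()
        self.noise_depth = 0  # > 0 while inside a noise element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in NOISE_TAGS:
            self.noise_depth += 1
        elif tag == "br":
            self.chunks.append("\n")
        # <img> and all other tags are dropped; only text content survives.

    def handle_endtag(self, tag):
        if tag in NOISE_TAGS:
            self.noise_depth = max(0, self.noise_depth - 1)
        elif tag in BLOCK_TAGS:
            self.chunks.append("\n")

    def handle_data(self, data):
        if self.noise_depth == 0:
            self.chunks.append(data)


def strip_page(html: str) -> str:
    parser = TextWithoutNoise()
    parser.feed(html)
    parser.close()
    lines = "".join(parser.chunks).splitlines()
    # Collapse runs of whitespace inside each line and drop empty lines.
    cleaned = (" ".join(line.split()) for line in lines)
    return "\n".join(line for line in cleaned if line)


sample = ("<body><nav>Menu</nav><article><h1>Title</h1>"
          "<p>Hello <b>world</b>.</p><img src='a.png'><p>Second</p></article>"
          "<aside>Ads</aside><footer>(c) 2016</footer>"
          "<script>var x = 1;</script></body>")
print(strip_page(sample))
```

The sample prints `Title`, `Hello world.` and `Second` on separate lines; the menu, sidebar, footer, script and image are gone.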

murlogen, 2016-10-31
@murlogen

If the site supports micro-markup (for Facebook, etc.), then you're in luck.
The text comes out in one go.
It looks clean.
It is exactly what the site's author intended.
I would start by trying to detect micro-markup.
If there is no micro-markup, a less accurate method is to parse the page manually; the other respondents describe how to do that.
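A sketch of checking for micro-markup first, again with Python's standard-library `html.parser`. It assumes the site uses schema.org microdata with an element marked `itemprop="articleBody"`, and it also collects Open Graph `<meta property="og:...">` tags (the markup Facebook reads); `og:description` is normally a short summary rather than the full article. The names `MarkupReader` and `read_markup` are hypothetical, and the depth counting assumes every non-void element inside the article body has a closing tag. If the returned text is empty, the page has no such markup and manual parsing is needed.

```python
from html.parser import HTMLParser

VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class MarkupReader(HTMLParser):
    def __init__(self):
        super().__init__()
        self.og = {}          # Open Graph property -> content
        self.body_depth = 0   # nesting depth inside the itemprop="articleBody" element
        self.body_chunks = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # valueless attributes (e.g. itemscope) map to None
        prop = attrs.get("property") or ""
        if tag == "meta" and prop.startswith("og:"):
            self.og[prop] = attrs.get("content") or ""
        if tag in VOID_TAGS:
            return
        if self.body_depth:
            self.body_depth += 1
        elif "articleBody" in (attrs.get("itemprop") or "").split():
            self.body_depth = 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.body_depth:
            self.body_depth -= 1

    def handle_data(self, data):
        if self.body_depth:
            self.body_chunks.append(data)


def read_markup(html: str):
    """Return (article body text, Open Graph dict) found via micro-markup."""
    reader = MarkupReader()
    reader.feed(html)
    reader.close()
    text = " ".join("".join(reader.body_chunks).split())
    return text, reader.og


page = ('<html><head><meta property="og:title" content="Parsing">'
        '<meta property="og:type" content="article"></head><body>'
        '<nav>Menu</nav><div itemscope itemtype="https://schema.org/Article">'
        '<h1 itemprop="headline">Parsing</h1><div itemprop="articleBody">'
        '<p>Body <b>text</b>.</p><img src="a.png"></div></div>'
        '<footer>Footer</footer></body></html>')
print(read_markup(page))
```

For the sample page this yields the body text `Body text.` together with the `og:title` and `og:type` values; the menu and footer never enter the result because they sit outside the `articleBody` element.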

Alexander Taratin, 2016-10-31
@Taraflex

php-readability (a PHP port of Readability). Which port to choose?