How to parse the body text of a web page?
I need to extract the actual article text from a web page's HTML, without the footer, sidebars, menus, and other extraneous text. My assumption is that the main text usually takes up more space than the other elements. Please suggest how to isolate it from the rest of the page. I don't need the images or the tags inside the article; they can be dropped.
Any DOM parser will do the job.
Using one is no harder than using jQuery. Parsing HTML with regular expressions, on the other hand, is a really bad idea.
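As a rough illustration of the parser-based approach combined with the "main text takes up the most space" idea from the question, here is a minimal sketch using only Python's standard `html.parser`. The names `MainTextParser` and `extract_main_text`, the list of tags to skip, and the rule "pick the container with the most paragraph text" are all assumptions made for this example, not a proven algorithm; real pages may well need a dedicated library.

```python
from html.parser import HTMLParser

# Assumed lists: elements that rarely hold article text, and elements
# treated as candidate containers for the article body.
SKIP = {"script", "style", "noscript", "nav", "footer", "aside", "header", "form"}
CONTAINERS = {"article", "main", "section", "div", "td", "body"}
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class MainTextParser(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.stack = []       # open elements as (tag, container_id or None)
        self.skip_depth = 0   # > 0 while inside a SKIP element
        self.next_id = 0
        self.blocks = {}      # container_id -> list of paragraph strings
        self.current = None   # text chunks of the <p> currently being read

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return  # void elements (img, br, ...) never get an end tag
        if tag == "p":
            self._finish_paragraph()  # tolerate an unclosed previous <p>
        cid = None
        if tag in CONTAINERS:
            cid, self.next_id = self.next_id, self.next_id + 1
        self.stack.append((tag, cid))
        if tag in SKIP:
            self.skip_depth += 1
        elif tag == "p" and self.skip_depth == 0:
            self.current = []

    def handle_endtag(self, tag):
        if not any(t == tag for t, _ in self.stack):
            return  # stray end tag
        while self.stack:  # pop up to the matching open tag
            open_tag, _ = self.stack.pop()
            if open_tag in SKIP:
                self.skip_depth -= 1
            if open_tag == "p":
                self._finish_paragraph()
            if open_tag == tag:
                break

    def handle_data(self, data):
        if self.current is not None and self.skip_depth == 0:
            self.current.append(data)

    def _finish_paragraph(self):
        if self.current is None:
            return
        text = " ".join("".join(self.current).split())  # collapse whitespace
        self.current = None
        if text:
            # Credit the paragraph to its nearest enclosing container.
            cid = next((c for _, c in reversed(self.stack) if c is not None), -1)
            self.blocks.setdefault(cid, []).append(text)


def extract_main_text(html):
    """Return the paragraphs of the container holding the most paragraph text."""
    parser = MainTextParser()
    parser.feed(html)
    parser.close()
    parser._finish_paragraph()
    if not parser.blocks:
        return ""
    best = max(parser.blocks.values(), key=lambda ps: sum(len(p) for p in ps))
    return "\n\n".join(best)
```

Calling `extract_main_text(page_html)` returns plain text with inline tags and images already dropped. The heuristic is deliberately crude: a long comment thread or a link-heavy block inside a plain `div` can still outweigh the article.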
If the site supports micro-markup for Facebook and similar services, you're in luck.
It can be pulled out in no time.
The result looks clean.
And it is exactly what the site's author intended.
I would start by checking whether the page has micro-markup.
If there is no micro-markup, the less accurate fallback is to parse the page manually; the other respondents have already described how to do that.
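To make the "check for micro-markup first" step concrete, here is a small sketch that collects Open Graph `<meta>` tags, again with the standard `html.parser`. It assumes that "micro-markup for Facebook" means Open Graph properties such as `og:title` and `og:description`; the names `OpenGraphParser` and `extract_open_graph` are made up for this example. Keep in mind that these tags usually carry a title and a summary rather than the full article body.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collects <meta property="og:..." content="..."> values."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        # Open Graph uses "property"; some sites put it in "name" instead.
        key = attributes.get("property") or attributes.get("name")
        content = attributes.get("content")
        if key and content and key.startswith("og:"):
            self.meta.setdefault(key, content)  # keep the first occurrence


def extract_open_graph(html):
    """Return a dict of Open Graph properties found in the page."""
    parser = OpenGraphParser()
    parser.feed(html)
    parser.close()
    return parser.meta
```

If `extract_open_graph(page_html)` comes back empty, that is the signal to fall back to manual parsing of the page structure.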