PHP
Kuti, 2016-07-19 11:43:50

How to properly implement a parser for news sites listed on Yandex News?

I need to get a clean title (no SEO embellishments; you can look at the value of the tag) and clean text (only the text of the article, nothing from the footer, etc.).
The task would be simple with a single site: I would mark the key extraction points and be done (for example, I would know that the text of the article is in the #blablabla div). But there will be many sites, so some kind of universal solution is needed. The title is easy to get, since all sites on Yandex News have an h1. So how do I get the clean text of an article without knowing which site the content is copied from? Or will I have to write a separate parser for each site?
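
For reference, the per-site approach described above could look roughly like the sketch below. It is only an illustration in Python using the standard library: the host `news.example.com`, the `SITE_RULES` table and the sample markup are hypothetical, and `blablabla` is the placeholder id from the question. The title comes from the h1, and the text comes from the one element whose id is registered for that site.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical per-site rules: host -> id of the element holding the article body.
SITE_RULES = {
    "news.example.com": "blablabla",
}


class ArticleExtractor(HTMLParser):
    """Collects h1 text as the title and text inside the element with body_id."""

    def __init__(self, body_id):
        super().__init__()
        self.body_id = body_id
        self.in_h1 = False
        self.container_tag = None  # tag name of the body element once found
        self.depth = 0             # nesting of same-named tags inside the body
        self.title_parts = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self.in_h1 = True
        if self.container_tag is None:
            if dict(attrs).get("id") == self.body_id:
                self.container_tag = tag
                self.depth = 1
        elif self.depth and tag == self.container_tag:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "h1":
            self.in_h1 = False
        if self.depth and tag == self.container_tag:
            self.depth -= 1

    def handle_data(self, data):
        if self.in_h1:
            self.title_parts.append(data)
        elif self.depth:
            self.text_parts.append(data)


def extract_article(url, html):
    """Return (title, text) using the rule registered for the URL's host."""
    host = urlparse(url).hostname
    body_id = SITE_RULES.get(host)
    if body_id is None:
        raise KeyError(f"no extraction rule for {host}")
    parser = ArticleExtractor(body_id)
    parser.feed(html)
    parser.close()
    # Join the collected fragments and collapse runs of whitespace.
    title = " ".join(" ".join(parser.title_parts).split())
    text = " ".join(" ".join(parser.text_parts).split())
    return title, text


SAMPLE_HTML = """
<html><head><title>Headline | Example News | Best site</title></head>
<body>
<h1>Headline</h1>
<div id="blablabla"><p>First paragraph.</p><div class="ad"></div><p>Second paragraph.</p></div>
<div id="footer">Copyright notice</div>
</body></html>
"""

if __name__ == "__main__":
    print(extract_article("https://news.example.com/a/1", SAMPLE_HTML))
```

Every new site needs its own entry in `SITE_RULES`, which is exactly the maintenance cost the question is trying to avoid.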


2 answer(s)
IceJOKER, 2016-07-19
@IceJOKER

RSS
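
A minimal sketch of what this one-word suggestion might mean in practice, written in Python with the standard library. It assumes each site publishes an RSS 2.0 feed and that the feed carries the article body in a `yandex:full-text` element under the namespace `http://news.yandex.ru`; both are assumptions to check against the real feeds, and the sample feed below is made up. When the full-text element is missing, the sketch falls back to `description`.

```python
import xml.etree.ElementTree as ET

# Namespace URI assumed for the yandex: prefix; verify against the actual feed.
YANDEX_NS = "http://news.yandex.ru"


def parse_feed(xml_text):
    """Return a list of {'title', 'link', 'text'} dicts, one per <item>."""
    root = ET.fromstring(xml_text)
    articles = []
    for item in root.iter("item"):
        full_text = item.findtext(f"{{{YANDEX_NS}}}full-text")
        articles.append({
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
            # Fall back to <description> when the feed has no full text.
            "text": (full_text or item.findtext("description") or "").strip(),
        })
    return articles


# Made-up feed used only to exercise the function.
SAMPLE_FEED = """<rss xmlns:yandex="http://news.yandex.ru" version="2.0">
  <channel>
    <title>Example News</title>
    <item>
      <title>First headline</title>
      <link>https://news.example.com/a/1</link>
      <description>Teaser for the first article.</description>
      <yandex:full-text>Full text of the first article.</yandex:full-text>
    </item>
    <item>
      <title>Second headline</title>
      <link>https://news.example.com/a/2</link>
      <description>Short description only.</description>
    </item>
  </channel>
</rss>"""

if __name__ == "__main__":
    for article in parse_feed(SAMPLE_FEED):
        print(article["title"], "->", article["text"])
```

If the feeds do contain the full text, the same function works for every site, with no per-site HTML rules.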

trevoga_su, 2016-07-19
@trevoga_su

1. Stealing is bad. Everything is already littered with copy-paste as it is.
2. There is no universal way.
