Programming
blueboar2, 2013-03-17 09:15:29

How can I automatically extract the text of a news article?

Is it possible to write a program that automatically determines where the news article is on a news web page and extracts it as plain text?

Obviously the accuracy won't be 100%. Has anyone actually done this?


2 answers
ValdikSS, 2013-03-17
@ValdikSS

There is such a thing as Readability, and it has an API.

Resager, 2013-03-17
@Resager

The "artificial intelligence" tag made me smile. It would help to have an example: which news, from which site? You can write a set of regular expressions for each site (search Google for "regular expressions") and use them to pull the text out of the HTML. And besides, RSS still exists.
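
To make the per-site idea concrete, here is a rough sketch in Python using only the standard library. The host names, the markup, and the names `SITE_PATTERNS` and `extract_news_text` are all made up for illustration; every real site would need its own expression, written after looking at its HTML.

```python
import html
import re

# Hypothetical per-site patterns: these hosts and this markup are
# assumptions for the example, not taken from any real site.
SITE_PATTERNS = {
    "news.example.com": re.compile(
        r'<div class="article-body">(.*?)</div>', re.DOTALL
    ),
    "daily.example.org": re.compile(
        r"<article[^>]*>(.*?)</article>", re.DOTALL
    ),
}

TAG_RE = re.compile(r"<[^>]+>")


def extract_news_text(host, page_html):
    """Return the article text for a known host, or None if nothing matches."""
    pattern = SITE_PATTERNS.get(host)
    if pattern is None:
        return None
    match = pattern.search(page_html)
    if match is None:
        return None
    # Replace remaining tags with spaces, decode entities, collapse whitespace.
    text = TAG_RE.sub(" ", match.group(1))
    text = html.unescape(text)
    return " ".join(text.split())


sample = (
    '<html><body><div class="menu">Home</div>'
    '<div class="article-body"><p>Rain &amp; wind expected.</p>'
    "<p>Stay indoors.</p></div></body></html>"
)
print(extract_news_text("news.example.com", sample))
```

This only works while the site keeps its markup unchanged, and the non-greedy pattern stops at the first closing tag, so an article body with nested `<div>` elements would be cut short. For anything beyond a handful of sites, a real HTML parser is probably the safer choice.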
