How do I collect articles from news sites?
I need to solve the following problem:
1. The news sites I follow publish new articles every day.
2. I need to extract the article text from a specific container on each page.
3. Each article must be saved as a .txt file labeled with the date and time it was collected.
4. The files should be saved either to a folder on my computer or to cloud storage (such as mail.ru).
5. Collection should run once a day, or manually at the press of a button.
How can I set up this kind of collection?
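Requirements 3 and 5 could look roughly like this in Node.js (TypeScript, built-in modules only). This is a minimal sketch, not a finished tool: the helper names `stampedFileName`, `saveArticle` and `runDaily` are hypothetical, the file name format is an assumption, and `runDaily` only works while the process stays alive (a cron job or a Windows Task Scheduler entry is the usual alternative). For cloud storage, one option, assuming the provider offers a desktop sync client, is to point `dir` at the synced folder.

```typescript
import { mkdir, writeFile } from "node:fs/promises";
import * as path from "node:path";

// Pad a number to two digits, e.g. 5 -> "05".
function pad2(n: number): string {
  return String(n).padStart(2, "0");
}

// Hypothetical helper: builds a name like "article_2024-01-05_09-03-07.txt"
// from the collection time (local time). Colons are avoided because Windows
// does not allow them in file names. The format itself is an assumption.
function stampedFileName(base: string, when: Date): string {
  const date = `${when.getFullYear()}-${pad2(when.getMonth() + 1)}-${pad2(when.getDate())}`;
  const time = `${pad2(when.getHours())}-${pad2(when.getMinutes())}-${pad2(when.getSeconds())}`;
  return `${base}_${date}_${time}.txt`;
}

// Hypothetical helper: saves one article's text into `dir` (created if missing)
// and returns the full path of the written file.
async function saveArticle(dir: string, base: string, text: string, when = new Date()): Promise<string> {
  await mkdir(dir, { recursive: true });
  const file = path.join(dir, stampedFileName(base, when));
  // The collection time is also written inside the file (ISO 8601, UTC).
  await writeFile(file, `Collected: ${when.toISOString()}\n\n${text}`, "utf8");
  return file;
}

// Hypothetical scheduler: runs the job now, then every 24 hours while the process is alive.
function runDaily(job: () => Promise<void>): ReturnType<typeof setInterval> {
  const DAY_MS = 24 * 60 * 60 * 1000;
  const runSafely = () => job().catch((err) => console.error("Collection failed:", err));
  runSafely();
  return setInterval(runSafely, DAY_MS);
}
```

A manual run (the "button") can simply call the same collection job once instead of going through `runDaily`.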
Many news sites provide an RSS feed. You can fetch it with Node.js, parse it, and upload the results to the cloud.
If you parse the pages themselves, you can use Puppeteer: https://github.com/GoogleChrome/puppeteer
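A minimal sketch of the RSS route in TypeScript (Node 18+, which has a global `fetch`). It assumes a simple RSS 2.0 feed and uses regular expressions, which is fragile; a real XML or RSS parser (for example the npm package rss-parser) is the safer choice for production feeds. `extractItemLinks` and `fetchFeedLinks` are hypothetical names. The sketch only collects article URLs; the full text still has to be downloaded from each page.

```typescript
// Hypothetical helper: extracts <link> values from the <item> elements of an
// RSS 2.0 feed. Regex-based, so it assumes simple, well-formed markup.
// The channel-level <link> is skipped because only <item> bodies are searched.
function extractItemLinks(xml: string): string[] {
  const links: string[] = [];
  const itemRe = /<item\b[^>]*>([\s\S]*?)<\/item>/gi;
  for (const item of xml.matchAll(itemRe)) {
    // Accept both <link>url</link> and <link><![CDATA[url]]></link>.
    const link = /<link>\s*(?:<!\[CDATA\[)?([\s\S]*?)(?:\]\]>)?\s*<\/link>/i.exec(item[1]);
    if (link) links.push(link[1].trim());
  }
  return links;
}

// Hypothetical helper: downloads the feed and returns the article URLs.
async function fetchFeedLinks(feedUrl: string): Promise<string[]> {
  const res = await fetch(feedUrl);
  if (!res.ok) throw new Error(`Feed request failed: ${res.status}`);
  return extractItemLinks(await res.text());
}
```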
RSS feeds usually contain only the first couple of lines of each article, precisely so that the full text can't be scraped from the feed.
Without the ads :) /h...
https://itnext.io/scraping-with-nodejs-and-cheerio...
`request` fetches the HTML pages, and with cheerio you can easily extract the content blocks using CSS selectors.
Puppeteer and other headless browsers are usually not needed for this, unless the page builds its content with JavaScript.
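A sketch of the request + cheerio approach in TypeScript. The `request` package has been deprecated since 2020, so this version uses the `fetch` built into Node 18+ instead; cheerio has to be installed separately (`npm install cheerio`). The function names and the selector `div.article-body` are hypothetical, so substitute the selector of the container on your target site. Joining the `<p>` elements is an assumption about how the article is marked up.

```typescript
import * as cheerio from "cheerio";

// Hypothetical helper: returns the text of the first element matching
// `containerSelector`. Non-empty <p> elements inside it are joined with blank
// lines; if there are none, the container's whole text is returned.
function extractArticleText(html: string, containerSelector: string): string {
  const $ = cheerio.load(html);
  const container = $(containerSelector).first();
  if (container.length === 0) return "";
  const paragraphs = container
    .find("p")
    .map((_, el) => $(el).text().trim())
    .get()
    .filter((t) => t.length > 0);
  return paragraphs.length > 0 ? paragraphs.join("\n\n") : container.text().trim();
}

// Hypothetical helper: downloads a page and pulls out the article text.
async function fetchArticleText(url: string, containerSelector: string): Promise<string> {
  const res = await fetch(url);
  if (!res.ok) throw new Error(`Request failed: ${res.status} ${url}`);
  return extractArticleText(await res.text(), containerSelector);
}
```

Everything outside the chosen container (menus, ad blocks) is ignored, which is what makes the CSS selector approach simpler than running a headless browser.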