Parsing
r4khic, 2019-09-10 06:24:26

How to start implementing the idea for a universal parser?

Good afternoon! I've been given the task of designing a web application: a universal parser for news portals that extracts the title, the publication date, and the article text. Ideally, you paste in a link to a news article and the parser just works. I realize this example is extreme.
Something along these lines: example #1, example #2.
I'd like to hear opinions on the implementation. How exactly would you approach this kind of task, and which parsing methods would you recommend so that it works for most news sites? That is the main focus of the question: parsing methods that generalize across most news resources, so the parser is as close to universal as possible.
For the backend I'm thinking of Python with the following libraries and frameworks:
- requests for sending HTTP requests;
- the Scrapy framework for extracting data from web pages;
- PyMySQL for saving the parsed results.
I haven't decided on the frontend tools yet.
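One generic starting point (a sketch, not a full solution): many news sites expose Open Graph `<meta>` tags and an HTML5 `<time datetime="...">` element, which a heuristic extractor can target before falling back to site-specific rules. The example below uses only the Python standard library for illustration (the sample HTML is made up); in the stack above, the same idea would live inside a Scrapy spider:

```python
from html.parser import HTMLParser

class ArticleMetaParser(HTMLParser):
    """Collects title/description from Open Graph <meta> tags and the
    publication date from the first <time datetime=...> element."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("property", "").startswith("og:"):
            # og:title -> "title", og:description -> "description", etc.
            self.meta[attrs["property"][3:]] = attrs.get("content", "")
        elif tag == "time" and "datetime" in attrs:
            self.meta.setdefault("published", attrs["datetime"])

def extract_article(html: str) -> dict:
    parser = ArticleMetaParser()
    parser.feed(html)
    return parser.meta

# Hypothetical sample page, for demonstration only.
sample = """
<html><head>
<meta property="og:title" content="Example headline">
<meta property="og:description" content="Short summary.">
</head><body>
<article><time datetime="2019-09-10">10 Sep 2019</time>Body text...</article>
</body></html>
"""
print(extract_article(sample))
# {'title': 'Example headline', 'description': 'Short summary.', 'published': '2019-09-10'}
```

Metadata heuristics cover the easy two-thirds of the task (title and date); extracting the article body reliably is where site-specific modules become unavoidable.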
P.S.: Yes, I understand the task is difficult. But as they say, behind a complex task lie great experience and knowledge. Peace and kindness to all :)


2 answers
Lone Ice, 2019-09-10
@daemonhk

Nothing universal exists, never has and never will; otherwise it all turns into a combine harvester whose wheels eventually fall off.
What do you want to parse? Shops, sites, video hosting, XML files for exchange with 1C? First, decide what you want and are ready to offer the market; second, break your super-duper-mega parser into modules, each responsible for its own section.
And then tomorrow AliExpress, Alibaba and the rest change their data structure and design, and everyone who bought a paid parser is left empty-handed.
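The modular structure suggested here can be sketched as a per-domain registry, so each site gets its own extractor and a broken or missing module fails loudly instead of silently returning bad data. The domain name and extractor below are hypothetical:

```python
from urllib.parse import urlparse

# Registry mapping a site's domain to its dedicated extractor function.
PARSERS = {}

def register(domain):
    """Decorator that registers an extractor for one specific site."""
    def wrap(func):
        PARSERS[domain] = func
        return func
    return wrap

@register("example-news.com")  # hypothetical site
def parse_example_news(html: str) -> dict:
    # Site-specific selectors would go here; stubbed for illustration.
    return {"title": "stub", "source": "example-news.com"}

def parse(url: str, html: str) -> dict:
    """Dispatch to the site-specific parser for the URL's domain."""
    domain = urlparse(url).netloc
    if domain not in PARSERS:
        # Fail explicitly rather than guess with a generic fallback.
        raise ValueError(f"No parser module registered for {domain}")
    return PARSERS[domain](html)

print(parse("https://example-news.com/article/1", "<html>...</html>"))
# {'title': 'stub', 'source': 'example-news.com'}
```

When a site changes its layout, only that one module breaks and needs updating; the registry and the other modules are untouched.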

Evgen, 2019-09-10
@Verz1Lka

Take a look at this: https://scrapinghub.com/autoextract
