Which Python library should I use to parse HTML?
Hey!
There are N sites with roughly the same information. On each of them the data is displayed in a table, and I want to collect all of it on one site in a single table. You get the idea: something like a news aggregator.
Which library is best suited for this?
How can I avoid getting banned when doing this?
beautifulsoup4 - www.crummy.com/software/BeautifulSoup/index.html
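A minimal sketch of turning one site's table into rows with beautifulsoup4, assuming the table has a header row of th cells followed by td rows. The markup, the id="news" attribute and the parse_table helper are hypothetical placeholders, not taken from any real site; the built-in "html.parser" backend is used so no extra parser is needed.

```python
from __future__ import annotations

from bs4 import BeautifulSoup

# Hypothetical markup standing in for one source site's table.
HTML = """
<table id="news">
  <tr><th>Date</th><th>Title</th></tr>
  <tr><td>2024-01-01</td><td>First item</td></tr>
  <tr><td>2024-01-02</td><td>Second item</td></tr>
</table>
"""


def parse_table(html: str) -> list[dict[str, str]]:
    """Turn the first <table> in `html` into a list of dicts keyed by header text."""
    soup = BeautifulSoup(html, "html.parser")  # stdlib-backed parser
    rows = soup.find("table").find_all("tr")
    # First row holds the column names.
    headers = [th.get_text(strip=True) for th in rows[0].find_all("th")]
    result = []
    for tr in rows[1:]:
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        result.append(dict(zip(headers, cells)))
    return result


if __name__ == "__main__":
    for row in parse_table(HTML):
        print(row)
```

Each source site would need its own variant of this function, since the table layouts differ; the output (a list of dicts) is what you would then merge into your own table.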
For Python 3: Grab. I work with it and use sqlalchemy under the hood. It works out great.
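The storage side of that setup could look roughly like the sketch below: rows scraped from several sites go into one combined table, with a column recording which site each row came from. This only shows SQLAlchemy Core (1.4/2.0-style API) with an in-memory SQLite database; the table layout, the store helper and the site names are assumptions for illustration, and the scraping part (Grab or anything else) is left out.

```python
from sqlalchemy import Column, Integer, MetaData, String, Table, create_engine, insert, select

metadata = MetaData()

# One combined table for all sources; `source` records which site a row came from.
items = Table(
    "items",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("source", String, nullable=False),
    Column("title", String, nullable=False),
)


def store(engine, source: str, titles: list) -> None:
    """Insert scraped titles from one (hypothetical) source site."""
    with engine.begin() as conn:  # transaction commits when the block exits cleanly
        conn.execute(insert(items), [{"source": source, "title": t} for t in titles])


engine = create_engine("sqlite://")  # in-memory SQLite, just for the demo
metadata.create_all(engine)

store(engine, "site-a.example", ["First item", "Second item"])
store(engine, "site-b.example", ["Third item"])

with engine.connect() as conn:
    all_rows = conn.execute(
        select(items.c.source, items.c.title).order_by(items.c.id)
    ).all()
```

Swapping the URL in create_engine for a real database is the only change needed to persist the aggregated table between runs.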
I was in a similar situation (there were about 10 source sites, each with a different data structure) and used requests, lxml and XPath expressions.
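One way to handle many sources with different layouts using lxml is to keep a separate set of XPath expressions per site and run the same extraction code over all of them. A sketch under that assumption: the SITE_RULES names, the XPaths and the sample pages are hypothetical, and in practice page_html would be the text of a response fetched with requests.

```python
from lxml import html

# Per-site XPath rules (hypothetical): each source gets its own expressions,
# because every site structures its data differently.
SITE_RULES = {
    "site-a": {"row": "//table[@id='news']//tr[td]", "title": "./td[2]/text()"},
    "site-b": {"row": "//div[@class='item']", "title": ".//h2/text()"},
}


def extract(page_html: str, rules: dict) -> list:
    """Apply one site's XPath rules and return the stripped titles found."""
    tree = html.fromstring(page_html)
    titles = []
    for row in tree.xpath(rules["row"]):
        found = row.xpath(rules["title"])  # relative to the current row element
        if found:
            titles.append(found[0].strip())
    return titles


PAGE_A = """<html><body><table id="news">
<tr><th>Date</th><th>Title</th></tr>
<tr><td>2024-01-01</td><td> First item </td></tr>
</table></body></html>"""

PAGE_B = """<html><body>
<div class="item"><h2>Second item</h2></div>
<div class="item"><h2>Third item</h2></div>
</body></html>"""
```

Adding an eleventh site then means adding one entry to the rules dictionary rather than writing a new parser.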
Regarding "How can I avoid getting banned when doing this?": if you use a synchronous library (requests), then in my opinion you don't need to worry too much about being blocked, as long as the servers hosting the sites are properly configured and you don't hit the sites too often. Just in case, you can set an inconspicuous User-Agent.
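The advice above (sequential requests, not too often, an ordinary User-Agent) can be sketched with a requests.Session that sends a fixed User-Agent header and waits a minimum interval between requests. The PoliteFetcher class, the 5-second default and the User-Agent string are assumptions chosen for illustration; none of this guarantees a site won't block you.

```python
import time

import requests


class PoliteFetcher:
    """Sequential fetcher that waits at least `min_interval` seconds between requests."""

    def __init__(self, user_agent: str, min_interval: float = 5.0):
        self.session = requests.Session()  # reuses connections between requests
        self.session.headers.update({"User-Agent": user_agent})
        self.min_interval = min_interval
        self._last = None  # monotonic timestamp of the previous request

    def _wait(self) -> None:
        """Sleep until `min_interval` has passed since the previous request."""
        if self._last is not None:
            remaining = self.min_interval - (time.monotonic() - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()

    def get(self, url: str) -> str:
        """Fetch `url` after throttling; raise on HTTP error status codes."""
        self._wait()
        resp = self.session.get(url, timeout=10)
        resp.raise_for_status()
        return resp.text


# Hypothetical browser-like User-Agent string.
UA = "Mozilla/5.0 (X11; Linux x86_64; rv:120.0) Gecko/20100101 Firefox/120.0"
```

Usage would be something like fetcher = PoliteFetcher(UA) followed by fetcher.get(url) for each source page in turn; the interval is the knob to raise if a site starts refusing requests.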