Books
Nikita Koshcheev, 2021-06-03 06:40:07

What to read about parsing?

I'm passionate about data parsing in Python. What books do you recommend on this topic?


7 answer(s)
datka, 2021-06-03
@datka

The fundamentals of HTML and CSS. The documentation for BeautifulSoup and Requests. The documentation on working with lists, dicts, and loops in Python. The Selenium documentation and the Chrome/Firefox developer console guides. YouTube videos. Google. At the very least, you need to understand how a website works.

BadCats, 2021-06-04
@BadCats

I'll add to datka's answer, which is the practical and purely applied one: you can also dig into the theory of formal languages and grammars (how compilers and interpreters work). That may let you write more universal code that is less tightly tied to page layout, because you can parse HTML/XML at the level of tokens and lexemes, even with custom classes and attributes on elements. But this is a very complex area, and it may well turn out that "the game is not worth the candle."
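
To make the "level of tokens" idea concrete, here is a minimal sketch using the standard-library html.parser module, which reports start tags, text, and end tags as a stream of events. The sample markup, the class name "price", and the data-sku attribute are made up for illustration; the point is only that the code keys on tag and attribute tokens rather than on where the element sits in the layout.

```python
from html.parser import HTMLParser

# Hypothetical sample markup: the two prices sit in different layouts,
# but both carry the (made-up) class "price" and a data-sku attribute.
SAMPLE_HTML = """
<div class="card"><span class="price" data-sku="A1">10.50</span></div>
<section><p><span data-sku="B2" class="price big">7.25</span></p></section>
"""


class PriceCollector(HTMLParser):
    """Collects the text of every <span> whose class list contains "price"."""

    def __init__(self):
        super().__init__()
        self.items = []          # list of {"sku": ..., "price": ...} dicts
        self._in_price = False   # True while inside a matching <span>
        self._open_sku = None    # data-sku of the span being read

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if tag == "span" and "price" in classes:
            self._in_price = True
            self._open_sku = attr_map.get("data-sku")

    def handle_data(self, data):
        # Text tokens arrive here; keep only non-empty text inside a price span.
        if self._in_price and data.strip():
            self.items.append({"sku": self._open_sku, "price": float(data)})

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_price = False


def extract_prices(html):
    """Feed the markup through the event-based parser and return the items."""
    parser = PriceCollector()
    parser.feed(html)
    parser.close()
    return parser.items


print(extract_prices(SAMPLE_HTML))
```

Both prices are found even though the surrounding elements differ, which is the kind of layout independence described above. Real pages are messier, so treat this as a starting point rather than a complete solution.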

dmshar, 2021-06-03
@dmshar

Ryan Mitchell. Modern Website Scraping with Python. 2nd international edition. St. Petersburg: Piter, 2021.
Anish Chapagain. Hands-On Web Scraping with Python. 2019.
Katharine Jarmul, Richard Lawson. Python Web Scraping: Fetching Data from the Web. Packt Publishing, 2017.
Richard Lawson. Web Scraping with Python. Packt Publishing, 2015.

acwartz, 2021-06-04
@acwartz

Look into computer vision and neural networks. They are really valuable for parsing and give you a lot, because they still work when a site protects itself against all of the approaches above, for example by serving its data as a video stream or by rendering it with WebGL somewhere on the backend. And that's not all...

Victor, 2021-06-04
@Levhav

Look into parser generators. For example, find some Python analogues of bison/flex.
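
For context, third-party libraries such as PLY and Lark are often named as Python counterparts to these tools. The sketch below does not use a generator at all: it is a hand-written, standard-library-only illustration of the two stages such tools produce, a regex-based lexer (the flex role) and a grammar-driven parser (the bison role). The tiny arithmetic grammar and all names in it are chosen purely for the example.

```python
import re

# Lexer stage: one regular expression with a named group per token kind.
TOKEN_RE = re.compile(
    r"(?P<NUM>\d+(?:\.\d+)?)|(?P<OP>[-+*/()])|(?P<WS>\s+)|(?P<BAD>.)"
)


def tokenize(text):
    """Turn the input string into a list of (kind, text) tokens."""
    tokens = []
    for match in TOKEN_RE.finditer(text):
        kind = match.lastgroup
        if kind == "WS":
            continue  # whitespace separates tokens but is not one itself
        if kind == "BAD":
            raise SyntaxError(f"unexpected character {match.group()!r}")
        tokens.append((kind, match.group()))
    return tokens


class Parser:
    """Recursive-descent parser for this illustrative grammar:

    expr   := term (("+" | "-") term)*
    term   := factor (("*" | "/") factor)*
    factor := NUM | "-" factor | "(" expr ")"
    """

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        if self.pos < len(self.tokens):
            return self.tokens[self.pos]
        return ("EOF", "")

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    def expr(self):
        value = self.term()
        while self.peek() in (("OP", "+"), ("OP", "-")):
            _, op = self.advance()
            right = self.term()
            value = value + right if op == "+" else value - right
        return value

    def term(self):
        value = self.factor()
        while self.peek() in (("OP", "*"), ("OP", "/")):
            _, op = self.advance()
            right = self.factor()
            value = value * right if op == "*" else value / right
        return value

    def factor(self):
        kind, text = self.advance()
        if kind == "NUM":
            return float(text)
        if (kind, text) == ("OP", "-"):
            return -self.factor()
        if (kind, text) == ("OP", "("):
            value = self.expr()
            if self.advance() != ("OP", ")"):
                raise SyntaxError("expected ')'")
            return value
        raise SyntaxError(f"unexpected token {text!r}")


def evaluate(text):
    """Tokenize, parse, and evaluate an arithmetic expression."""
    parser = Parser(tokenize(text))
    value = parser.expr()
    if parser.peek()[0] != "EOF":
        raise SyntaxError("trailing input")
    return value


print(evaluate("2 + 3 * (4 - 1)"))
```

A parser generator would derive code of roughly this shape from a grammar description, so writing one small parser by hand is a reasonable way to understand what the generated code is doing.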

Gor1950, 2021-06-04
@Gor1950

A practical course on website parsing in Python
The course focuses on learning and practically applying the core syntactic constructs of Python while parsing data from websites.
It is designed for beginners who want to gain experience with the language and learn to use it fluently.
In my opinion, web scraping is the most accessible activity for a beginner to build experience with before taking on something more serious. It can also earn you money.
For experienced people, this course is unlikely to be useful.
Contents:
0. Preparation: install pip3, BeautifulSoup, lxml, requests
1. Introduction. Basic example of working with BeautifulSoup
2. Parsing multiple data and exporting to csv file
3. Parsing tabular data
4-1. Working with site pagination (method 1)
4-2. Working with site pagination (method 2)
5. Reading and writing data to csv files
6. Advanced techniques for working with the BeautifulSoup library
7. Saving data to a PostgreSQL database using the Peewee ORM
8. Parsing data loaded via AJAX (part 1)
9. Parsing data in several processes
10. Parsing data loaded with jQuery
11. Parsing data loaded via AJAX (part 2)
12. Using a proxy
13. Conclusion and a couple of tips for those who still decide to freelance with parsing data from sites
14. Parsing data loaded by AJAX requests (jQuery) on the example of the Steam site. UPDATE
15. Authorization with Requests and use of sessions. UPDATE
The cost is 1200 rubles.
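
As a taste of the CSV topics in that list (exporting parsed data to a CSV file, then reading it back), here is a minimal sketch using only the standard-library csv module. It is not taken from the course; the rows, field names, and URLs are made-up stand-ins for data already extracted from a page, and an in-memory buffer is used instead of a real file so the example runs anywhere.

```python
import csv
import io

# Made-up rows standing in for data already extracted from a page.
rows = [
    {"title": "Item one", "price": "10.50", "url": "https://example.com/1"},
    {"title": "Item, two", "price": "7.25", "url": "https://example.com/2"},
]


def rows_to_csv(rows, fieldnames):
    """Serialize a list of dicts to CSV text, header row first."""
    buffer = io.StringIO(newline="")  # let the csv module control line endings
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def csv_to_rows(text):
    """Parse CSV text back into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text, newline="")))


csv_text = rows_to_csv(rows, ["title", "price", "url"])
print(csv_text)
print(csv_to_rows(csv_text) == rows)
```

The second title contains a comma on purpose: the csv module quotes it on the way out and restores it on the way in, which is the main reason to use it instead of joining strings by hand. To write to disk, the same writer can be given a file opened with open(path, "w", newline="", encoding="utf-8").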

Michael, 2021-06-05
@id194695811

Mining Social Media: Finding Stories in Internet Data
Web Scraping with Python, 2nd Edition
