Looking for literature, articles or websites about PHP web scraping (and web scraping in general)
Hello!
I am writing my diploma thesis on the topic "Development of a universal parser in PHP".
I am looking for any literature and articles on this topic. If you know of any, please share.
The title of the thesis is missing the word "site": "Development of a universal site parser in PHP".
And the word "universal", I think, is superfluous)
You will hardly find any literature on this topic; the articles are scattered and vary in quality. Try searching. I highly recommend writing a few parsers in other languages as well. Look at how the Grab library for Python works. Its principles can be carried over to PHP.
In general, site parsing consists of approximately the following steps:
The parser itself may consist of the following parts:
Libraries are used to parse HTML (for example https://github.com/Imangazaliev/DiDOM ). HTML is not parsed with regular expressions, although they are, of course, used to parse other kinds of data.
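As a minimal sketch of parsing HTML with a DOM library rather than by hand: the example below uses PHP's built-in DOMDocument and DOMXPath instead of DiDOM, so nothing has to be installed. The sample markup, the `product` class and the `extractLinks` function name are made up for illustration.

```php
<?php
// Hypothetical sample markup; a real parser would get this from an HTTP response.
$html = <<<MARKUP
<html><body>
  <div class="product"><a href="/item/1">First item</a></div>
  <div class="product"><a href="/item/2">Second item</a></div>
</body></html>
MARKUP;

// Returns a list of ['href' => ..., 'text' => ...] for every product link.
function extractLinks(string $html): array
{
    $dom = new DOMDocument();

    // Real pages are often malformed, so collect libxml warnings silently
    // instead of printing them, then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $links = [];
    // The XPath expression is an assumption about the page layout.
    foreach ($xpath->query('//div[@class="product"]/a') as $a) {
        $links[] = [
            'href' => $a->getAttribute('href'),
            'text' => trim($a->textContent),
        ];
    }
    return $links;
}

$links = extractLinks($html);
print_r($links);
```

With a library like DiDOM the same idea is usually written with CSS selectors instead of XPath, but the principle is the same: build a DOM tree, then query it.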
Sometimes you need to execute JS, for example with PhantomJS.
To get past captchas, people use services such as anti-gate / anti-captcha.
Sometimes you need to log in or bypass the protection built on cookies.
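A rough sketch of keeping a logged-in session with PHP's curl extension: one handle is reused for all requests and the cookies are stored in a cookie jar file. The function names `createSession` and `request`, the URLs and the form field names in the commented usage are all hypothetical; a real site will have its own login form and may need extra fields such as a CSRF token.

```php
<?php
// Create a curl handle that remembers cookies between requests.
function createSession(string $cookieFile)
{
    $session = curl_init();
    curl_setopt_array($session, [
        CURLOPT_RETURNTRANSFER => true,        // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,        // follow redirects after login
        CURLOPT_COOKIEFILE     => $cookieFile, // enable the cookie engine and load saved cookies
        CURLOPT_COOKIEJAR      => $cookieFile, // save cookies here when the handle is closed
    ]);
    return $session;
}

// Send a GET request, or a POST request if form fields are given.
// Returns the response body as a string, or false on failure.
function request($session, string $url, array $postFields = [])
{
    curl_setopt($session, CURLOPT_URL, $url);
    if ($postFields !== []) {
        curl_setopt($session, CURLOPT_POST, true);
        curl_setopt($session, CURLOPT_POSTFIELDS, http_build_query($postFields));
    } else {
        // Switch the handle back to GET after an earlier POST.
        curl_setopt($session, CURLOPT_HTTPGET, true);
    }
    return curl_exec($session);
}

// Usage (hypothetical URLs and field names):
// $session = createSession(sys_get_temp_dir() . '/cookies.txt');
// request($session, 'https://example.com/login', ['login' => 'user', 'password' => 'secret']);
// $page = request($session, 'https://example.com/private/page');
```

Because the same handle is reused, the cookies set by the login response are sent automatically with the next request.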
Multicurl (PHP's curl_multi_* functions) is used to download many pages in parallel.
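Roughly, parallel downloading with multicurl can look like the sketch below. It uses only the standard curl_multi_* functions; the function name `fetchAll`, the timeout value and the choice to return null for failed transfers are my own assumptions, and the URLs are assumed to be unique.

```php
<?php
// Download several URLs at the same time.
// Returns an array: URL => response body, or null if the transfer failed.
function fetchAll(array $urls, int $timeoutSeconds = 10): array
{
    $multi = curl_multi_init();
    $handles = [];

    foreach ($urls as $url) {
        $ch = curl_init($url);
        curl_setopt_array($ch, [
            CURLOPT_RETURNTRANSFER => true,
            CURLOPT_FOLLOWLOCATION => true,
            CURLOPT_TIMEOUT        => $timeoutSeconds,
        ]);
        curl_multi_add_handle($multi, $ch);
        $handles[$url] = $ch;
    }

    // Drive all transfers until none of them is still running.
    do {
        $status = curl_multi_exec($multi, $running);
        if ($running > 0) {
            // Wait for network activity instead of spinning in a busy loop.
            curl_multi_select($multi, 1.0);
        }
    } while ($running > 0 && $status === CURLM_OK);

    // Collect the handles whose transfer ended with an error.
    $failed = [];
    while (($info = curl_multi_info_read($multi)) !== false) {
        if ($info['result'] !== CURLE_OK) {
            $failed[] = $info['handle'];
        }
    }

    $results = [];
    foreach ($handles as $url => $ch) {
        $results[$url] = in_array($ch, $failed, true) ? null : curl_multi_getcontent($ch);
        curl_multi_remove_handle($multi, $ch);
    }
    curl_multi_close($multi);

    return $results;
}

// Usage (hypothetical URLs):
// $pages = fetchAll(['https://example.com/page/1', 'https://example.com/page/2']);
```

Note that this is concurrent I/O inside a single PHP process rather than real threads, which is normally enough, since a scraper spends most of its time waiting on the network.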
In general, PHP is not the most suitable language for web scraping. After all, it is intended for other purposes. Python + Grab will be much more convenient and productive here. The same goes for almost any desktop language that has the necessary libraries.
There is a book, "PHP Web Scraping" by Jacob Ward. You can download it here: it-ebooks.info/book/4297