Do you need literature, articles or websites about PHP web scraping (and web scraping in general)?

PHP

Victoria, 2016-03-10 08:33:25

Hello!
I am writing my diploma thesis on the topic "Development of a universal parser in PHP".
I am looking for any literature and articles on this topic. Can anyone recommend something?

3 answer(s)

Silm, 2016-03-10
@Silm

The thesis title is still missing the word "site": "Development of a universal site parser in PHP".
And I think the word "universal" is superfluous)
You will hardly find books on this topic; the articles are scattered and vary in quality. Try searching. I highly recommend writing a few parsers in other languages. See how the Grab library for Python works. Its principles can be transferred to PHP.
In general, site parsing consists of approximately the following steps:
The parser itself may consist of the following parts:
Libraries are used to parse HTML (for example, https://github.com/Imangazaliev/DiDOM ). HTML is not parsed with regular expressions, although they are, of course, used to parse other data.
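The point about using an HTML parser rather than pattern matching can be sketched as follows. This is only an illustration in Python with the standard-library `html.parser` module, not DiDOM or Grab; the class name `LinkExtractor`, the helper `extract_links` and the sample markup are all made up for the example. The same idea (walk the parsed document, collect the elements you need) carries over to a PHP DOM library.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects (href, text) pairs for every <a href="..."> in a document."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (href, text) pairs
        self._href = None    # href of the <a> currently open, if any
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) pairs; order does not matter
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    """Hypothetical helper: parse a string and return its links."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical sample markup, not taken from any real site.
SAMPLE = (
    '<ul><li><a href="/a">First <b>item</b></a></li>'
    '<li><a class="x" href="/b">Second</a></li></ul>'
)
print(extract_links(SAMPLE))
```

Nested tags inside the link and a different attribute order do not affect the result, which is exactly where a regular expression written against one page layout tends to break.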
Sometimes you need to execute JS, for example with PhantomJS.
Captchas are bypassed with services such as anti-gate / anti-captcha.
Sometimes you need to log in or bypass the protection built on cookies.
Multi-cURL (PHP's curl_multi_* functions) is used to download pages in parallel, which is what "multi-threaded" parsing usually means in PHP.
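A rough sketch of parallel downloading, for comparison. It is written in Python and uses a thread pool from `concurrent.futures` instead of multi-cURL, so it shows the idea rather than the PHP API; `fetch`, `fetch_all` and the `fetcher` parameter are hypothetical names chosen for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch(url, timeout=10):
    """Download one page and return its body as bytes."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def fetch_all(urls, fetcher=fetch, max_workers=8):
    """Fetch a list of URLs concurrently.

    Returns a dict mapping each URL to its body, or to the exception
    that was raised while fetching it.
    """
    def safe(url):
        try:
            return fetcher(url)
        except Exception as exc:  # one failed page must not abort the batch
            return exc

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map() yields results in input order, so zip() pairs them up
        return dict(zip(urls, pool.map(safe, urls)))


# Usage (the URLs are placeholders):
# pages = fetch_all(["https://example.com/1", "https://example.com/2"])
```

The `fetcher` argument is there so that the download step can be swapped out, for example for a version that sends cookies or goes through a proxy, without touching the scheduling code.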
In general, PHP is not the most suitable language for web scraping; it is, after all, intended for other purposes. Python + Grab will be much more convenient and productive here, as will almost any general-purpose language that has the necessary libraries.

warnerbrowsers, 2016-03-10
@warnerbrowsers

There is a book, "PHP Web Scraping" by Jacob Ward. You can download it here: it-ebooks.info/book/4297

Eugene, 2016-03-10
@Nc_Soft

This is a utopia.