PHP
romaaa32, 2018-10-09 22:27:54

What is the best way to scrape websites?

1) Should I use cURL, or is there something better?
2) After fetching a page (for example, via cURL), should I extract the data I need right away, or write the page to a file and parse it afterwards? Is there any difference in RAM consumption between the two approaches? (A minimal sketch of both variants is below.)
3) Should I use regular expressions or something like PHP Simple HTML DOM Parser? If the latter, what are its advantages? There isn't much data to extract from each page and execution speed isn't really important; what interests me is RAM consumption.
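
For reference, this is roughly what I mean in (1) and (2): a minimal cURL fetch plus the save-to-file variant. The URL and file name are just placeholders.

<?php
// Minimal cURL fetch: the page body is returned as a string in memory
$ch = curl_init('https://example.com/page.html'); // placeholder URL
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);   // return the body instead of printing it
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);   // follow redirects
$html = curl_exec($ch);
curl_close($ch);

// Variant from (2): dump the raw page to disk and parse it in a separate pass
file_put_contents('page.html', $html);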

3 answers
Maxim Osadchy, 2018-10-09
@romaaa32

1) I use Guzzle; it's the same cURL under the hood, but in a convenient wrapper (see the sketch after this list).
2) I first save the links to a file and then go through them; if there are only a few pages and they aren't heavy, I don't bother saving them.
3) I use regular expressions only when I can't get the data with the library's ordinary methods, for example on sites with a table-based layout and no classes or identifiers. I use the phpQuery library; it's faster than the one you mentioned.
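
Roughly, the Guzzle + phpQuery combination from (1) and (3) looks like this; the URL and CSS selectors are placeholders, not from a real site, and both packages are assumed to be installed via Composer:

<?php
require 'vendor/autoload.php'; // assumes guzzlehttp/guzzle and a phpQuery package are installed

use GuzzleHttp\Client;

// Fetch the page with Guzzle (cURL under the hood)
$client = new Client(['timeout' => 10]);
$html   = (string) $client->request('GET', 'https://example.com/catalog')->getBody(); // placeholder URL

// Parse it with phpQuery using CSS selectors
$doc = phpQuery::newDocumentHTML($html);
foreach (pq('div.item') as $item) {              // placeholder selector
    $title = pq($item)->find('h2')->text();      // placeholder selector
    $price = pq($item)->find('.price')->text();  // placeholder selector
    echo trim($title) . ' - ' . trim($price) . PHP_EOL;
}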

xmoonlight, 2018-10-10
@xmoonlight

Nightmare (a headless browser library for Node.js)

AleksandraSoy, 2020-02-23
@AleksandraSoy

I just signed up for a service that collects data of any complexity for me from any source. I recommend it for those who often need to parse data. It seems to support one-time use as well. https://sssoydoff.wixsite.com/scraper
