How to parse Google search results without getting blocked (PHP + cURL)?

TillTill, 2016-10-15 18:37:36

I parse Google search results (only the first page), and after about 30 requests Google starts showing a captcha. Is it possible to parse without getting blocked and without using a proxy? I need to process about 1,500 requests in no more than three hours.
I have already added pauses between requests and sent browser-like headers.
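
For reference, 1,500 requests in three hours works out to one request every 7.2 seconds on average. A minimal Python sketch of such a schedule with random jitter is shown here; the function name `request_delays` and the ±30% jitter are illustrative assumptions, and pacing alone does not guarantee that the captcha will not appear.

```python
import random

def request_delays(total_requests, window_seconds, jitter=0.3, seed=None):
    """Yield one delay per request; delays average window_seconds / total_requests."""
    rng = random.Random(seed)
    base = window_seconds / total_requests  # average gap between requests
    for _ in range(total_requests):
        # Randomize each gap by +/- jitter so requests are not evenly spaced
        yield base * rng.uniform(1 - jitter, 1 + jitter)

# 1,500 requests spread over three hours (10,800 seconds)
delays = list(request_delays(1500, 3 * 60 * 60, seed=1))
average_delay = sum(delays) / len(delays)
```

In a real run, each value would be passed to `time.sleep()` before the next request. The time spent on the requests themselves also counts toward the three-hour limit, so the base gap would need to be somewhat shorter in practice.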

6 answers

mrRiver, 2021-07-16
@mrRiver

There is the XMLRiver service: about 1,500 requests can be collected in 10 minutes.

sim3x, 2016-10-15
@sim3x

It can't be done.
Use an API
Use a bunch of proxies
Use a real browser
...
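
The "Use an API" option could look roughly like the sketch below. The answer does not say which API is meant; this example assumes Google's Custom Search JSON API, which needs an API key and a search engine ID (`YOUR_API_KEY` and `YOUR_ENGINE_ID` are placeholders) and is subject to quotas. Results from a programmable search engine may differ from the regular Google results page.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_ENDPOINT = "https://www.googleapis.com/customsearch/v1"

def build_search_url(query, api_key, engine_id, num=10):
    """Build the request URL; the API returns at most 10 results per call."""
    params = {"key": api_key, "cx": engine_id, "q": query, "num": num}
    return f"{API_ENDPOINT}?{urlencode(params)}"

def search(query, api_key, engine_id):
    """Return (title, link) pairs for the first page of results."""
    url = build_search_url(query, api_key, engine_id)
    with urlopen(url, timeout=10) as response:
        data = json.load(response)
    # "items" is absent when there are no results
    return [(item["title"], item["link"]) for item in data.get("items", [])]

# Placeholder credentials: only the URL is built here, no request is sent
example_url = build_search_url("php curl", "YOUR_API_KEY", "YOUR_ENGINE_ID")
```

The same request can be made from PHP with cURL, since it is a plain HTTPS GET that returns JSON.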

Rou1997, 2016-10-15
@Rou1997

No. Moreover, a proxy address, and even more so a VPS/VDS address, may already be on a blacklist, in which case the captcha appears on almost every second request.

Vitaly, 2016-10-15
@vshvydky

Pay for a captcha-solving service and don't worry about it.

Golover, 2016-10-18
@Golover

A few days ago an article was published: "How to check which URLs have been indexed by Google using Python".
It deals with parsing by a list of URLs, but it can be tweaked to parse by search query.
link

librevlad, 2019-08-31
@librevlad

Parsing from a single IP without getting blocked won't work, but you can set up a pool of proxies in a single geolocation to get consistent results. Alternatively, you can pay for a ready-made service, for example serpentine.io.
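
A minimal sketch of spreading queries over such a proxy pool in round-robin order. The proxy addresses are hypothetical (taken from a documentation-only IP range) and the function name `assign_proxies` is an illustrative assumption; a real pool would also need health checks and handling of blocked proxies.

```python
from itertools import cycle

# Hypothetical proxies, all assumed to be located in the same region
PROXIES = [
    "http://203.0.113.10:8080",
    "http://203.0.113.11:8080",
    "http://203.0.113.12:8080",
]

def assign_proxies(queries, proxies):
    """Pair each query with the next proxy in round-robin order."""
    rotation = cycle(proxies)
    return [(query, next(rotation)) for query in queries]

plan = assign_proxies(["query 1", "query 2", "query 3", "query 4"], PROXIES)
```

With three proxies, the fourth query goes back to the first proxy, so each address carries roughly a third of the load.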