How do I parse an HTTPS site correctly?
Hello, I'm writing a parser for an HTTPS site
using Guzzle with a proxy,
but I still get blocked, and the block lifts after about 10 minutes. Without a proxy, I get a certificate error instead.
$response = $client->get('https://site/page/' . $i, ['verify' => false, 'delay' => 3000, 'proxy' => $proxy]);
$response->getStatusCode(); // returns 200 even when blocked
$response->getReasonPhrase(); // OK, but when blocked: Fatal error: Uncaught GuzzleHttp\Exception\ConnectException
First of all, disguise yourself as a browser as much as possible: send headers similar to a browser's with each request, and make sure the User-Agent you send is a browser's and not something like "php-crawler". Clear the cookie jar after fetching each page (this very often helps). Pause between page fetches; experiment with anything from a few seconds to minutes, and make the pauses random. As for the certificate, you can turn off certificate verification:
$this->client = new GuzzleClient(['verify' => false ]);
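To tie the advice together, here is a rough sketch, assuming Guzzle is installed via Composer and autoloaded by the calling script. The helper names (browserHeaders, randomPause, fetchPages), the header values, and the 3 to 15 second range are illustrative assumptions, not anything taken from the question. It also catches the ConnectException from the question instead of letting it become a fatal error.

```php
<?php
// Sketch only: assumes the caller has already loaded Composer's autoloader.

use GuzzleHttp\Client;
use GuzzleHttp\Cookie\CookieJar;
use GuzzleHttp\Exception\ConnectException;

// Hypothetical helper: headers resembling what a desktop browser sends.
function browserHeaders(): array
{
    return [
        'User-Agent'      => 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
        'Accept'          => 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Language' => 'en-US,en;q=0.9',
    ];
}

// Hypothetical helper: random pause length in seconds, between $min and $max inclusive.
function randomPause(int $min = 3, int $max = 15): int
{
    return random_int($min, $max);
}

// Hypothetical helper: fetches pages 1..$lastPage and returns the HTML keyed by
// page number, with null for any page whose connection failed.
function fetchPages(string $baseUrl, int $lastPage, string $proxy): array
{
    $jar = new CookieJar();
    $client = new Client([
        'headers' => browserHeaders(),
        'cookies' => $jar,
        'proxy'   => $proxy,
        'verify'  => false, // as in the answer: disables certificate verification
    ]);

    $pages = [];
    for ($i = 1; $i <= $lastPage; $i++) {
        try {
            $response = $client->get($baseUrl . $i);
            $pages[$i] = (string) $response->getBody();
        } catch (ConnectException $e) {
            // The connection failed (for example, while blocked): record it and move on.
            $pages[$i] = null;
        }

        $jar->clear(); // drop all cookies after each page

        if ($i < $lastPage) {
            sleep(randomPause()); // random pause before the next request
        }
    }

    return $pages;
}
```

It could be called as fetchPages('https://site/page/', 10, $proxy). Since the status code is reported to be 200 even when blocked, you would still need to inspect the returned HTML to recognize a block page.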