PHP
jetf0x, 2020-04-30 10:36:29

How do I set the page load strategy for ChromeDriver in php-webdriver?

I am writing an online-store scraper in PHP. The task is to get the HTML of the product catalog without waiting for the whole site to finish loading.

For this I use php-webdriver with Chrome (ChromeDriver 81.0.4044.69). To avoid getting banned by the store, I go through a proxy, which slows page loading down somewhat. The problem is that the catalog HTML appears within the first few seconds of loading, but after $driver->get('....') no further commands run until the store page has loaded completely, with all its scripts, styles, images, etc. To speed up the scraper, I would like to stop loading as soon as the catalog HTML appears and move straight on to parsing the HTML I already have.

use Facebook\WebDriver\Chrome\ChromeOptions;
use Facebook\WebDriver\Remote\DesiredCapabilities;
use Facebook\WebDriver\Remote\RemoteWebDriver;

require_once __DIR__ . '/vendor/autoload.php';

$host = 'http://localhost:4444';
$options = new ChromeOptions();

$options->addArguments([
    '--window-size=1500,800',
    '--proxy-server=socks4://PROXY_IP:PROXY_PORT',
]);

$desiredCapabilities = DesiredCapabilities::chrome();
$desiredCapabilities->setCapability(ChromeOptions::CAPABILITY, $options);

$driver = RemoteWebDriver::create($host, $desiredCapabilities);
$content = $driver->get('https://www.some-online-store.ru/')->getPageSource();
// get() waits for the page to load completely, and only then does execution continue
$catalog = ...; // HTML parsing: I extract the catalog fragment I need from $content
file_put_contents('Catalog.html', $catalog);
$driver->quit();
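For the elided parsing step on the $catalog line, a minimal sketch using PHP's built-in DOMDocument and DOMXPath (ext-dom) might look like this. The function name extractCatalog and the default class name catalog are hypothetical; the real store's markup will differ, so the XPath query would need adjusting.

```php
<?php
// Hypothetical helper: return the outer HTML of the first element whose class
// attribute contains $className as a whole word, or '' if there is none.
// $className must be a plain class name (no quotes), since it goes into XPath.
function extractCatalog(string $html, string $className = 'catalog'): string
{
    if (trim($html) === '') {
        return '';
    }

    $doc = new DOMDocument();
    // Scraped pages are rarely valid HTML, so collect libxml warnings
    // internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    // The XML encoding hint makes libxml read the input as UTF-8; without a
    // charset meta tag it may otherwise assume ISO-8859-1.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Match the class as a whole word inside a space-separated class list.
    $query = sprintf(
        "//*[contains(concat(' ', normalize-space(@class), ' '), ' %s ')]",
        $className
    );
    $nodes = $xpath->query($query);
    if ($nodes === false || $nodes->length === 0) {
        return '';
    }
    return (string) $doc->saveHTML($nodes->item(0));
}
```

With this, the elided line could become something like $catalog = extractCatalog($content, 'catalog');, where 'catalog' stands in for whatever class the store actually uses.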


The best way to solve this without hacks is to use the page load strategy in eager or none mode. I know this has been possible since ChromeDriver 77.0. I found a solution for other programming languages, but I don't have enough experience to implement the same thing in php-webdriver. That solution also mentions:
1) a workaround based on a page-load timeout, but there is no guarantee that the element I need will have loaded within an explicitly set time;
2) a strategy of waiting until the element I need has loaded, but again it's not clear how to implement this in php-webdriver.
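A minimal sketch combining both ideas in php-webdriver: set the page load strategy through the standard W3C capability name pageLoadStrategy, then use an explicit wait for the catalog element. It assumes php-webdriver is installed via Composer, a Selenium server or ChromeDriver is listening at the host from the question, and the server honors that capability. The function name fetchCatalogHtml and the CSS selector are hypothetical; this is an illustration, not the library's documented recipe.

```php
<?php
// Assumes the caller has loaded vendor/autoload.php (php-webdriver via Composer).
// `use` statements only create aliases; they do not load any classes.
use Facebook\WebDriver\Chrome\ChromeOptions;
use Facebook\WebDriver\Remote\DesiredCapabilities;
use Facebook\WebDriver\Remote\RemoteWebDriver;
use Facebook\WebDriver\WebDriverBy;
use Facebook\WebDriver\WebDriverExpectedCondition;

// Hypothetical helper: open $url with the given page load strategy and return
// the page source as soon as an element matching $catalogSelector exists.
function fetchCatalogHtml(
    string $host,
    string $url,
    string $catalogSelector,
    string $strategy = 'eager', // W3C values: 'normal', 'eager' or 'none'
    array $chromeArguments = ['--window-size=1500,800'],
    int $timeoutSeconds = 30
): string {
    $options = new ChromeOptions();
    $options->addArguments($chromeArguments);

    $capabilities = DesiredCapabilities::chrome();
    $capabilities->setCapability(ChromeOptions::CAPABILITY, $options);
    // 'pageLoadStrategy' is the standard W3C WebDriver capability name.
    $capabilities->setCapability('pageLoadStrategy', $strategy);

    $driver = RemoteWebDriver::create($host, $capabilities);
    try {
        // With 'eager', get() returns after DOMContentLoaded; with 'none' it
        // returns almost immediately, so the explicit wait below matters.
        $driver->get($url);
        // Poll until the catalog element is present (or time out with an exception).
        $driver->wait($timeoutSeconds)->until(
            WebDriverExpectedCondition::presenceOfElementLocated(
                WebDriverBy::cssSelector($catalogSelector)
            )
        );
        return $driver->getPageSource();
    } finally {
        // Close the browser even if the wait times out.
        $driver->quit();
    }
}
```

Usage, after require_once __DIR__ . '/vendor/autoload.php': fetchCatalogHtml('http://localhost:4444', 'https://www.some-online-store.ru/', '.catalog', 'none', ['--window-size=1500,800', '--proxy-server=socks4://PROXY_IP:PROXY_PORT']), where '.catalog' is a placeholder selector. With 'none', the explicit wait replaces a fixed timeout, which addresses the concern in point 1.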

6 answer(s)
xmoonlight, 2020-04-30
@xmoonlight

Plain PHP cURL can abort a transfer mid-stream as soon as you have received what you need.
See the cURL documentation on php.net.
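A minimal sketch of that cURL approach, assuming the ext-curl extension is available: a CURLOPT_WRITEFUNCTION callback collects the response and, once a marker string has arrived, returns a value different from the chunk length, which makes cURL abort the transfer. The function name fetchUntilMarker and the choice of marker are hypothetical. Unlike a browser, this fetches raw HTML only and runs no JavaScript, so it helps only if the catalog is already present in the server's response.

```php
<?php
// Hypothetical helper: download $url with cURL and abort the transfer as soon
// as $marker (for example the closing tag of the catalog block) has arrived.
// Returns everything received up to and including the first $marker, or
// whatever was received if the marker never appears.
function fetchUntilMarker(string $url, string $marker, ?string $proxy = null): string
{
    $buffer = '';
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
    if ($proxy !== null) {
        // e.g. 'socks4://PROXY_IP:PROXY_PORT' (placeholder, as in the question)
        curl_setopt($ch, CURLOPT_PROXY, $proxy);
    }
    // cURL calls this for each received chunk. Returning a number different
    // from the chunk length makes cURL stop the transfer with a write error.
    curl_setopt($ch, CURLOPT_WRITEFUNCTION, function ($handle, string $chunk) use (&$buffer, $marker): int {
        $buffer .= $chunk;
        return strpos($buffer, $marker) === false ? strlen($chunk) : 0;
    });
    // curl_exec() returns false after a deliberate abort; that is expected here.
    curl_exec($ch);

    // Drop whatever arrived in the same chunk after the marker.
    $pos = strpos($buffer, $marker);
    return $pos === false ? $buffer : substr($buffer, 0, $pos + strlen($marker));
}
```

Choosing a marker that appears right after the catalog (such as a distinctive closing tag or the start of the footer) keeps the download as short as possible.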

jetf0x, 2020-05-18
@jetf0x

I asked the library's maintainers on GitHub, and they answered here: https://github.com/php-webdriver/php-webdriver/iss...

Maxim, 2013-03-19
@javax

Jira

Softlink, 2013-03-19
@Softlink

Take a look at Redmine.
Screenshot of the dashboard: clip2net.com/s/4LUhA2
I don't know about Gmail integration, but it does have a calendar.
Repositories are integrated with the tracker; if I'm not mistaken, it can be linked to any repository.
Plus, it's free.

Vladimir Sokolovsky, 2013-03-19
@inlanger

www.assembla.com. It definitely integrates with GitHub, and you can also host your own repositories there.
Bug tracking is done with tickets.
Pretty much everything else seems to be there too, except a Gantt chart and integration with Google services. But there are plenty of other charts :)

Puma Thailand, 2013-03-19
@opium

In general, Redmine has all of this through plugins.
