JavaScript
Alexander, 2021-02-03 10:18:28

How can I figure out why a page does not load in headless mode (Puppeteer) when I run it on a server without X?

I'm writing a script that fetches the latest information about container movements.
I ran into difficulties with the zim.com website: when headless mode is turned on, the site identifies the bot and denies access.
I set a long delay before execution and I don't need to fetch much information, so this should not create problems for the site.

I was able to get around this by setting the user agent and overriding some properties via evaluateOnNewDocument.
On my local computer this helped and everything works correctly, but when I run the script on the server, the page does not load at all.
How can I determine the cause? Please advise, I have been struggling with this for several days.
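
One way to narrow this down might be to log what the page does while it loads on the server. Below is a minimal sketch, assuming the `page` object created in the script further down; `attachDiagnostics` is a hypothetical helper name, and the 400 status threshold is an arbitrary choice for illustration. It relies only on Puppeteer's `console`, `pageerror`, `requestfailed` and `response` page events.

```javascript
// Hypothetical helper (not part of the original script): logs what the page
// is doing so that a failed load on the server leaves a trace.
function attachDiagnostics(page, log = console.log) {
    // Messages the site itself prints to the browser console.
    page.on('console', (msg) => log(`[console:${msg.type()}] ${msg.text()}`));

    // Uncaught exceptions thrown inside the page.
    page.on('pageerror', (err) => log(`[pageerror] ${err.message}`));

    // Requests that never completed (DNS, TLS, dropped connection, ...).
    page.on('requestfailed', (req) => {
        const failure = req.failure();
        log(`[requestfailed] ${req.url()} ${failure ? failure.errorText : ''}`);
    });

    // Responses with an error status, e.g. a 403 from bot protection.
    page.on('response', (res) => {
        if (res.status() >= 400) {
            log(`[response] ${res.status()} ${res.url()}`);
        }
    });
}
```

Calling `attachDiagnostics(page)` right after `browser.newPage()` and before `page.goto(...)` should show whether the server run gets an error status, a failed request, or a script error inside the page.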

The script is run like this:
node bin/sealines/zim --container=FSCU8147907

const fs = require('fs').promises;
const puppeteer = require('puppeteer');
const path = require('path');
const params = require('optimist').argv;
const url = 'https://www.zim.com/tools/track-a-shipment';
const containerNumber = params.container;
const cookies = [
    {
        name: 'OptanonAlertBoxClosed',
        value: "2021-01-27T16:30:01.824Z",
        url: 'https://zim.com',
        domain: '.zim.com'
    },
];

let promise = (async () => {
    try {
        console.log('START SCRIPT');
        const browser = await puppeteer.launch({
            args: ['--no-sandbox'],
            headless: true,
            devtools: true,
            timeout: 120000 // the site sometimes needs more than 60 seconds, so double the timeout
        });
        const page = await browser.newPage();
        page.setDefaultTimeout(120000);

        await page.setExtraHTTPHeaders({
            'Accept-Language': 'en-GB,en-US;q=0.9,en;q=0.8'
        });
        await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36');

        await page.setCookie(...cookies);

        // Pass the Webdriver Test.
        await page.evaluateOnNewDocument(() => {
            Object.defineProperty(navigator, 'webdriver', {
                get: () => false,
            });

            // this one helps get past the protection
            const originalQuery = window.navigator.permissions.query;
            return window.navigator.permissions.query = (parameters) => (
                parameters.name === 'notifications' ?
                    Promise.resolve({state: Notification.permission}) :
                    originalQuery(parameters)
            );
        });

        console.log(url);
        const response = await page.goto(url, {
            waitUntil: 'networkidle0', // stricter than networkidle2, so listing both is redundant
            timeout: 20000
        });

        await page.screenshot({
            path: path.resolve(__dirname) + '/first.jpeg',
            type: 'jpeg',
            quality: 100,
        });

        await page.waitForSelector('#ConsNumber');
        await page.type('#ConsNumber', containerNumber);
        await page.keyboard.press('Enter');
        await page.waitForNavigation({waitUntil: 'networkidle0'});

        const infoZim = await page.evaluate(() => {
            const el = document.getElementById('etaDate');
            const date = el ? el.innerText.replace('ETA:', "").trim() : null;
            const lastStopTD = $('.routing-table tr').last().children('td');
            const lastCity = $(lastStopTD).last().prev().prev().text().trim();
            const lastDate = $(lastStopTD).last().prev().text().trim();
            let latest = {}
            if (lastCity && lastDate) {
                latest.lastCity = lastCity
                latest.lastDate = lastDate
            } else {
                latest = null
            }
            const result = {
                date: date,
                latest: latest
            }

            console.log(result);
            return JSON.stringify(result);
        })

        await page.screenshot({
            path: path.resolve(__dirname) + '/screen.jpeg',
            type: 'jpeg',
            quality: 100,
        });
        await browser.close()
        return infoZim;

    } catch (err) {
        console.log(err)
    }
})();
promise.then(res => console.log(res)).catch(e => console.log(e))
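
The user agent in the script is hard-coded to Chrome 61, which may not match the Chromium build that Puppeteer drives on the server. Whether zim.com checks for such a mismatch is an assumption, not something confirmed here. A minimal sketch of deriving the user agent from the running browser instead, where `toRegularUserAgent` is a hypothetical helper name:

```javascript
// Hypothetical helper: turns the default headless user agent into one that
// looks like regular Chrome while keeping the real version number.
function toRegularUserAgent(headlessUserAgent) {
    return headlessUserAgent.replace('HeadlessChrome/', 'Chrome/');
}

// Usage inside the script (needs a launched Puppeteer browser):
// const ua = toRegularUserAgent(await browser.userAgent());
// await page.setUserAgent(ua);
```

This keeps the reported version in step with the browser that actually runs, so the local machine and the server each report their own real build.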
