Node.js
Alexander Vzlomed, 2020-11-24 14:53:02

What proxy connection modules are there for Puppeteer?

Good afternoon. I need to parse a large amount of information from one site, which has about 500 thousand pages. I decided to use Puppeteer + Electron + some application methods for this. The problem is that after a while the site bans my IP for an hour (not a complete ban, but it is no longer possible to auto-register new accounts, and there is a restriction on one account). So I need to change my IP on every link click, or at least write my own logic to cycle through proxies when an error occurs.

After looking at the Puppeteer documentation, I found that it offers only one option: setting a proxy through the launch config (... args:['--proxy-server=*proxy*']), and only when the browser starts. I then tried add-ons (puppeteer-proxy, puppeteer-page-proxy, etc.), but whether through my own clumsiness or some other problem, they stubbornly refused to work the way I needed: the module appeared to be connected and threw no errors, but as soon as it tried to connect through my proxies, the site failed to load with the error ERR: FAILED, and a promise with an error was printed in the console. Here is an excerpt of what I tried; if necessary, I can post the whole code in the comments.

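const puppeteer = require('puppeteer');
// Assumed import: proxyRequest is taken from the puppeteer-proxy package named above
const { proxyRequest } = require('puppeteer-proxy');

let scrape = async () => {
        let browser = await puppeteer.launch({headless: false})
        let page = await browser.newPage()

        // Intercept every request so it can be routed through the proxy
        await page.setRequestInterception(true);

        page.on('request', async (request) => {
            await proxyRequest({
                page,
                proxyUrl: 'http://127.0.0.1:3000',
                request,
            });
        });
        // the parsing script continues from here
}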
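
For comparison, here is a minimal sketch of the launch-time route: since `--proxy-server` can only be set when the browser starts, one option is to start a fresh browser for each attempt and move to the next proxy when an attempt fails. The proxy addresses and the names `makeProxyRotator`, `withProxy`, and `scrapeWithRotation` are hypothetical and used only for illustration; only `puppeteer.launch`, `browser.newPage`, `page.goto`, and `browser.close` are Puppeteer calls.

```javascript
// Hypothetical proxy list; replace with real proxy addresses.
const PROXIES = ['http://127.0.0.1:3000', 'http://127.0.0.1:3001'];

// Returns a function that yields the proxies in round-robin order.
function makeProxyRotator(proxies) {
  if (!proxies.length) throw new Error('proxy list is empty');
  let index = 0;
  return () => proxies[index++ % proxies.length];
}

// Launches a fresh browser bound to `proxy`, opens `url`,
// runs `handler(page)`, and always closes the browser afterwards.
async function withProxy(proxy, url, handler) {
  const puppeteer = require('puppeteer'); // loaded only when a browser is needed
  const browser = await puppeteer.launch({ args: [`--proxy-server=${proxy}`] });
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: 'domcontentloaded' });
    return await handler(page);
  } finally {
    await browser.close();
  }
}

// Retries with the next proxy on any error, up to `maxAttempts` times.
async function scrapeWithRotation(url, handler, nextProxy, maxAttempts = 3) {
  let lastError;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await withProxy(nextProxy(), url, handler);
    } catch (err) {
      lastError = err; // remember the failure and try the next proxy
    }
  }
  throw lastError;
}
```

It could be called as `scrapeWithRotation(url, (page) => page.title(), makeProxyRotator(PROXIES))`. Starting a browser per attempt is slow, so for a site of this size it may be more practical to keep one browser per proxy and reuse it for a batch of pages.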
