Parsing
Alexander Vzlomed, 2020-11-24 23:19:12

How to use a proxy with Puppeteer?

Good afternoon. I need to scrape a large amount of information from a single site, which has about 500,000 pages. I decided to use Puppeteer + Electron + some application methods for this. But I ran into a problem: after a while, the site bans my IP for an hour (not completely, but I can no longer auto-register accounts, and I'm limited to one account). I need to change my IP on every click on a link, or at least iterate through a list of proxies myself when an error occurs.

After reading the Puppeteer documentation, I realized it offers only one way to set a proxy: through the launch config (... args:['--proxy-server=*proxy*']), and only when the browser starts. I tried add-ons (puppeteer-proxy, puppeteer-page-proxy, etc.), but, whether through my own clumsiness or some other problem, they stubbornly refused to work the way I needed. The module seemed to load and gave no errors, but as soon as my proxies were applied, the site failed to load with ERR: FAILED; a rejected promise was printed to the console, but I couldn't figure out why. Here is an excerpt of what I tried; if needed, I can post the full code in the comments:

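const puppeteer = require('puppeteer');
const { proxyRequest } = require('puppeteer-proxy');

let scrape = async () => {
        let browser = await puppeteer.launch({headless: false})
        let page = await browser.newPage()

        // Intercept every request so puppeteer-proxy can re-send it through the proxy
        await page.setRequestInterception(true);

        page.on('request', async (request) => {
            await proxyRequest({
                page,
                proxyUrl: 'http://127.0.0.1:3000',
                request,
            });
        });
        // ...the parsing script follows here...
};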
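
The fallback mentioned in the question, iterating through proxies yourself when an error occurs, can be done without add-ons: since Chromium accepts --proxy-server only at launch, relaunch the browser for each proxy. A minimal sketch, assuming a plain array of proxy URLs; withProxyRotation and fetchPageVia are hypothetical helper names, and treating HTTP 403/429 as a ban is an assumption about the target site:

```javascript
// A minimal sketch: try each proxy in turn, relaunching the browser with
// --proxy-server for every attempt, and move on when an attempt fails.

/**
 * Calls `task(proxy)` for each proxy in order and returns the first
 * successful result. If every proxy fails, throws an error listing them all.
 */
async function withProxyRotation(proxies, task) {
  const failures = [];
  for (const proxy of proxies) {
    try {
      return await task(proxy);
    } catch (err) {
      // e.g. net::ERR_FAILED, a timeout, or a ban detected by the task
      failures.push(`${proxy}: ${err.message}`);
    }
  }
  throw new Error(`All ${proxies.length} proxies failed:\n${failures.join('\n')}`);
}

/**
 * Hypothetical task: start a fresh browser bound to `proxy` (Chromium only
 * accepts --proxy-server at launch), load `url`, and return its HTML.
 */
async function fetchPageVia(proxy, url) {
  // Loaded lazily so the rotation helper above works without Puppeteer installed.
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch({
    headless: false,
    args: [`--proxy-server=${proxy}`],
  });
  try {
    const page = await browser.newPage();
    const response = await page.goto(url, { waitUntil: 'domcontentloaded' });
    // Assumption: the site signals a ban or rate limit with HTTP 403 or 429,
    // so those are turned into errors and the next proxy is tried.
    if (response && (response.status() === 403 || response.status() === 429)) {
      throw new Error(`HTTP ${response.status()}`);
    }
    return await page.content();
  } finally {
    // Always close the browser, whether the attempt succeeded or failed.
    await browser.close();
  }
}
```

Usage, with made-up proxy addresses: `const html = await withProxyRotation(['http://10.0.0.1:8080', 'http://10.0.0.2:8080'], (p) => fetchPageVia(p, 'https://example.com/page/1'));`. Relaunching a browser per attempt is slow at 500,000 pages; newer Puppeteer releases also accept a `proxyServer` option when creating a browser context, which may avoid the relaunch, so check the documentation for the version you use.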

5 answer(s)
Nazar Mokrinsky, 2016-10-10
@nakoneti

There are actually several aspects to this, but the very first thing to understand is that 100% anonymity is a myth.
If you want relative anonymity, take the most popular version of Windows (including the locale), install it in a virtual machine that reaches the network only through Tor, and install the currently most popular browser version on it (for example, the second-latest version of Chrome). In the virtual machine, fix the screen resolution to the currently most popular one, and don't install browser extensions or plug-ins, change the default settings, or add extra fonts: nothing that could change the browser fingerprint (on the other hand, if most users have Flash Player, for example, you need to install it too). You can check the result afterwards, for example, here: https://panopticlick.eff.org/ . Always use disposable browser profiles.
In this scenario, it will be very difficult to distinguish you from lots of other similar users (except that your traffic will come from Tor exit nodes).
For incoming data, it's a little easier to give a general rule: don't use services that ISPs or governments can get access to. That limits you to Tor/I2P (and perhaps some other) sites, Tox (and similar fully peer-to-peer tools without centralized servers) for chat/audio/video communication, and similar systems.
TL;DR: be more specific about what exactly you want, because 100% anonymity is possible only without any device at all (even when a device is switched off or asleep, that doesn't mean it's doing nothing). To get a sense of the scale of the problem, look at talks on hardware security, for example: https://www.youtube.com/watch?v=rcwngbUrZNg or https://www.youtube.com/watch?v=E6zOqznGn5o

CityCat4, 2016-10-10
@CityCat4

OMG, why do questions like this pop up here with such enviable regularity? With your threat model, where the adversary is the state, it is impossible. Neither one Tor, nor two Tors, nor even a bagel will save you.
You turn on the router, and the ISP logs "Node X turned on its communication equipment." You connect to Tor, and the ISP logs "Node X connected to a Tor server." If the router does anything else, even something trivial like syncing the time or checking for new firmware, the ISP records all of it. You turn off the router, and the ISP logs "Node X spent this much time on the Tor network." If the state takes an interest in you, it obtains this information and, without wasting much time, sends a couple of men in black suits to your door. No protection against thermorectal cryptanalysis has been invented yet, so you will hand over all your keys voluntarily and cheerfully, and tell them where you went, and so on.

younghacker, 2016-12-03
@younghacker

Definition:
Anonymity: namelessness, being unknown; withholding or hiding one's name.
There should be nothing on the path between you and the Internet that can tie you to your connection. And you can't behave the way you normally do. Meanwhile, your usual activity should be recorded at the same time, as a false trail.
Who you are can be found out through:
1) Financial footprint
purchases of the devices and services that give you Internet access
money, goods, and services you receive via the Internet
2) Electronic footprint
IP, MAC, time, built-in cameras, Wi-Fi, GSM, GPS, microphone.
operating systems, programs, plug-ins and more.
the people scurrying around with mobile phones running software that collects information not only about the phone's owner but also about the surrounding radio environment: GPS coordinates, GSM base stations, Wi-Fi hotspots, Bluetooth devices, etc. A blonde is talking on her phone, and its camera quietly records you glancing in her direction. Not because she is a spy, but because she installs everything on her phone indiscriminately.
3) Metadata trail
handwriting: the speed and characteristic features of how you work online. Your typing style has its own signature: spelling errors, corrected typos, punctuation, etc. In any browser with JS enabled, the Google search box sends the query string to Google's servers continuously while you type. Assume that information about how you type is sent to the Internet. Google does everything it can to recognize your face even under a mask. Don't forget the mouse or touchpad.
the information you search for without the anonymous mask may give you away when you try to do the same thing with the mask on. You need clearly defined rules about what not to do, and a clearly limited set of actions. Your anonymous life should be like a spy's. It takes self-discipline, work, and constantly acquiring knowledge and applying it in practice. In practice, it is very hard never to slip up when you are watched 24 hours a day, and to do so effortlessly.
and let's not forget that your friends will carefully add your name, date of birth, relationship, and photo next to your nickname or phone number and upload it all to Apple or Google, and every application with access to the address book (and nearly all of them dig into it) knows this right away.
You can steal an Internet connection or buy a SIM card with GPRS from Gypsies, but how can you hide from the video cameras carefully placed all over the world? RFID chips from banks, libraries, and the subway are carefully placed in your pockets. Identity cards are becoming biometric, and carrying one in public places is required by law.
The more modern the phone or computer, the more likely it is to have a factory backdoor at the chip level, or a backdoor from a reseller or delivery service. If you think installing Tails or Kali Linux solves the problem, you are mistaken; you would also need to build a computer out of vacuum tubes :). Or you carry your phone with you, and it tells the provider where you are 24 hours a day. It gives away your daily habits: here Vasya goes to work, here he comes back. But suddenly Vasya disappears from the radar, although at this time he usually travels along route A or B. Strange. An anomaly. Now, if all this information ends up in one hand and is analyzed, what happens? The circle of suspects narrows sharply. Vasya is spotted on cameras in Mitino buying a SIM card from Gypsies, or standing near the library in a car with a laptop on his lap.
And the fact that Vasya uses Tor, a VPN, and an unusual operating system is no secret to the provider. It just doesn't care about Vasya for the time being. Recorded traffic can be decrypted later, too.
So I'll sign under the words of АртемЪ:
If you want anonymity on the Internet, don't use the Internet.

Александр Таратин, 2016-10-10
@Taraflex

Will servers outside the country with traffic routed through them, plus using Tor or the Tails operating system, save you?

People wearing tinfoil hats stand out most in a crowd of other individuals.