Ivan Yakushenko, 2019-07-25 17:29:42
Python

How can a site detect Selenium automation?

Let's say there is a site, example.com, where I'm trying to automate the registration process. The success rate is extremely low. What I have tried:
1. Proxies. Public lists, private lists, Luminati, Microleaves: all with little or no success. I settled on spinning up Google Cloud micro instances and running a proxy through them; with that, registration occasionally succeeds.
2. The User-Agent and OS change randomly, of course, and only current versions are used.
3. There is a lot of randomness in the script itself: the mouse hovers over each input field, the field is clicked, and the user data is not entered all at once but roughly like this:

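import random
import time
from selenium.webdriver import ActionChains
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as ec
from selenium.webdriver.support.ui import WebDriverWait

fn_form = WebDriverWait(driver, 10).until(ec.visibility_of_element_located((By.XPATH, '//input[@name="firstname"]')))
ActionChains(driver).move_to_element(fn_form).perform()  # hover over the field first
time.sleep(random.uniform(0.1, 0.5))
fn_form.click()
for character in user_info['first_name']:  # type one character at a time
    fn_form.send_keys(character)
    time.sleep(random.uniform(0.1, 0.3))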

Besides hovering and clicking, I also tried moving between fields by pressing Tab:
driver.find_element(By.XPATH, '//body').send_keys(Keys.TAB)
time.sleep(random.uniform(0.1, 0.5))

There is nothing to scroll, because the page fits exactly on one screen.
4. I tried running Selenium with my own browser profile, in case there is a fingerprint check. The success was dubious.
5. I looked through all the requests in the Network tab. All parameters and cookies sent are 100% identical to those sent on a successful registration, whether done manually or through Selenium itself. The only difference is that at some point after clicking the "Registration" button, several requests with the following content appear in the Network tab:
?cid=102
Request URL: https://example.com/reg/submit/?cid=102
Request Method: POST
Status Code: 302
Remote Address: 34.94.235.219:3128
Referrer Policy: origin-when-cross-origin

https://example.com/sem_pixel/1/control/0/

Request URL: https://example.com/sem_pixel/1/control/0/
Request Method: GET
Status Code: 302
Remote Address: 34.94.235.219:3128
Referrer Policy: origin-when-cross-origin
access-control-allow-credentials: true
access-control-allow-methods: OPTIONS
access-control-allow-origin: https://example.com
access-control-expose-headers: X-FB-Debug, X-Loader-Length
cache-control: private, no-cache, no-store, must-revalidate
content-length: 0
content-security-policy: frame-ancestors 'self';
content-security-policy: default-src * data: blob: 'self';script-src *.example.com *.excdn.net *.example.com *.google-analytics.com *.virtualearth.net *.google.com 127.0.0.1:* *.spotilocal.com:* 'unsafe-inline' 'unsafe-eval' blob: data: 'self';style-src data: blob: 'unsafe-inline' *;connect-src *.example.com example.com *.fbcdn.net *.example.com *.spotilocal.com:*'self';
content-type: text/html; charset=utf-8
date: Thu, 25 Jul 2019 14:22:27 GMT
expires: Sat, 01 Jan 2000 00:00:00 GMT
location: https://example.com/confirmemail.php?next=https%3A...
pragma: no-cache
status: 302
strict-transport-security: max-age=15552000; preload; includeSubDomains
vary: Origin
x-content-type-options: nosniff
x-fb-debug: McXnUwM163ur8MkdlphCspXutCohCp808mkbnGwCnsYlKYPyhYm9xYhQTNyE7aCwv+DrQvYCQJBp5G9oz2FrOw==
x-xss-protection: 0
:authority: example.com
:method: GET
:path: /sem_pixel/1/control/0/
:scheme: https
accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3
accept-encoding: gzip, deflate, br
accept-language: en,en_US;q=0.9
cookie: datr=Drs5XX3GHZ5kfWz0XfWdTt3s; sb=Drs5XYusCGzt1jiFHJVyzodj; c_user=100040010132196; xs=47%3AitezblUuA4TlFw%3A2%3A1564064545%3A-1%3A-1; fr=5SJ6UZc1le4OQSjBD.AWWSJPijLCZbTY9q_4Bm_0NlvO8.BdObsh.jT.AAA.0.0.BdObsh.AWWqN68S
referer: https://example.com/login/save-device/?login_sourc...
upgrade-insecure-requests: 1
user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:62.0) Gecko/20100101 Firefox/62.0

The first request also occurs, identically, on a successful registration, except that there it has status 200; the second request appears only when registration fails.
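The comparison described in point 5 can be done mechanically rather than by eye. Below is a minimal sketch, assuming the headers of each request have been copied from the Network tab as raw "name: value" text. The function names parse_headers and diff_headers are hypothetical helpers, not part of Selenium or any library.

```python
def parse_headers(raw):
    """Parse 'name: value' lines (as copied from the Network tab) into a dict."""
    headers = {}
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue
        # HTTP/2 pseudo-headers start with ':' (e.g. ':method: GET'),
        # so search for the separator after the first character.
        sep = line.find(":", 1)
        if sep == -1:
            continue  # not a header line, e.g. a bare URL fragment
        name = line[:sep].strip().lower()
        value = line[sep + 1:].strip()
        if name in headers:
            # A header may repeat (content-security-policy does above);
            # keep every value so none is silently dropped.
            headers[name] += "\n" + value
        else:
            headers[name] = value
    return headers


def diff_headers(ok, failed):
    """Return {name: (value_in_ok, value_in_failed)} for each differing header."""
    diff = {}
    for name in sorted(set(ok) | set(failed)):
        a, b = ok.get(name), failed.get(name)
        if a != b:
            diff[name] = (a, b)
    return diff
```

Calling diff_headers(parse_headers(ok_text), parse_headers(failed_text)) on two captures lists only the headers whose values differ, with None marking a header that is missing on one side.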
Can anyone tell me what other options there are to "pretend to be a user"?

3 answer(s)
Ivan Yakushenko, 2019-07-26
@kshnkvn

I tried a bunch of different options. Here are some that might work for someone (they didn't for me):
1. Very doubtful, but some people say it worked for them:
Renaming the $cdc_ JavaScript variable that chromedriver adds to document. To do this, open the chromedriver.exe file in any hex editor (I used HxD) and change the name to any other. It didn't work for me, although chromedriver itself runs fine afterwards. I also tried renaming every variable that contains the word driver, but that was a bad idea: chromedriver stopped starting. It definitely can't be done without changing the sources, and I'm not sure even that would work.
2. This is a more effective option that gives at least some result. On this page, you can determine whether chromedriver is used or not, and when you open the page through Selenium, it really does show that webdriver is used. Adding the following piece of code helped bypass this identification:
But it still didn't help me.
I also found a very dubious and most likely simply non-working solution:
Running JS code that overrides navigator properties, including navigator.webdriver.
This is how it starts:

js code itself
const setProperty = () => {
    // Overwrite the 'languages' property to use a custom getter.
    Object.defineProperty(navigator, 'languages', {
        get: () => ['en-US', 'en', 'es'],
    });

    // Overwrite the 'plugins' property to use a custom getter.
    Object.defineProperty(navigator, 'plugins', {
        get: () => [1, 2, 3, 4, 5],
    });

    // Pass the webdriver test.
    Object.defineProperty(navigator, 'webdriver', {
        get: () => false,
    });

    // 'callback' is not defined in this snippet; call it only if the surrounding script provides one.
    if (typeof callback === 'function') {
        callback();
    }
};
setProperty();

The absurd part is that Chrome has no navigator.webdriver property at all: enter navigator in the browser console and you won't find it there. Firefox does have this property, but the code above does not change it, i.e. it simply does nothing: in Firefox, navigator.webdriver is always true when run through Selenium and false in normal (manual) mode.
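Instead of inspecting navigator by hand, the value that page scripts actually see can be read back through the driver, before and after running an override script. Below is a minimal sketch, assuming an already created Selenium driver object. The helper names webdriver_flag and flag_before_and_after are hypothetical; execute_script is the standard Selenium WebDriver method.

```python
def webdriver_flag(driver):
    """Return navigator.webdriver as seen by scripts on the current page.

    Selenium maps a JavaScript undefined/null result to None, so None means
    the property is absent or unset in this browser.
    """
    return driver.execute_script("return navigator.webdriver;")


def flag_before_and_after(driver, patch_js):
    """Run patch_js and return (before, after) values of navigator.webdriver.

    If both values are equal, the patch had no visible effect on the flag.
    """
    before = webdriver_flag(driver)
    driver.execute_script(patch_js)
    after = webdriver_flag(driver)
    return before, after
```

Passing the JS code above as patch_js shows directly whether the override changed anything in a given browser.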
UPD. I don't know how I missed it, but in the end it all came down to the reCAPTCHA v3 check. The check almost always passes under the following conditions:
1. The User-Agent is not overridden.
2. No proxy is used.
3. Notifications are not disabled.
4. Requests for permissions are not blocked.
5. This parameter is used here:
But with these settings it is not possible to register more than once from one IP. As I pointed out in my question, I used completely different proxies, from public lists to Google Cloud micro instances, so the problem is not the "quality" of the proxy but purely the fact that one is used.

Danila NV, 2020-02-10
@DanyaMo

You can also use Puppeteer instead of Selenium, with the puppeteer-extra-plugin-stealth plugin enabled.
At least it passes this test.

Dimonchik, 2019-07-25
@dimonchik2013

Nobody will spell it out for you: these days this is commercial information.
From what can be said: use headless mode less))
But everything is much more complicated than that.
