By what criteria can Amazon detect a "headless browser"?

Zimaell, 2019-08-24 17:24:57

Parsing

This question has been bothering me for more than a day. I dug through a couple of articles:
https://intoli.com/blog/not-possible-to-block-chro...
https://intoli.com/blog/making-chrome-headless-und ...
They supposedly describe how to make a headless browser indistinguishable from a regular one. I did everything as described there, and according to the tests in those same articles I pass all the criteria: when I run headless (Puppeteer), the test pages show the same results that I see in a regular browser.
And yet, when I log in from the same IP with a regular browser and with the headless one, the regular browser logs in normally, while the headless one gets a captcha (and even if you solve it, it serves another one). So Amazon does tell them apart; the question is how?
What other signals could there be?
What else does the headless browser need to send?
I thought it might be cookies, but I cleared all cookies and still logged in normally from the regular browser.
Or does even a "clean" regular browser still send something that makes the difference?
In general, this is a matter of principle; I am looking for at least some clues ...

1 answer

Ivan Shumov, 2019-08-24
@inoise

There won't be any clues. There is artificial intelligence at the front door, and it collects a huge number of metrics, both from the request and from the client. This is not the first time this has been discussed here. Amazon is very good at protecting itself from scraping.
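
If you still want to see where the two browsers differ on the client side, one option is to snapshot the same set of signals in each and diff them. This is only a rough sketch: `diffSignals` is a hypothetical helper, and the field names and values are made up for illustration. What Amazon actually inspects is not public.

```javascript
// Hypothetical helper: compare two snapshots of browser-visible signals
// and return the keys whose values differ.
function diffSignals(regular, headless) {
  const keys = new Set([...Object.keys(regular), ...Object.keys(headless)]);
  const diffs = [];
  for (const key of [...keys].sort()) {
    // JSON.stringify lets arrays and nested objects be compared by value.
    const a = JSON.stringify(regular[key]);
    const b = JSON.stringify(headless[key]);
    if (a !== b) {
      diffs.push({ key, regular: regular[key], headless: headless[key] });
    }
  }
  return diffs;
}

// Made-up example snapshots (assumed values, not real measurements).
const regularSnapshot = {
  webdriver: false,
  pluginCount: 3,
  languages: ["en-US", "en"],
  userAgent: "Mozilla/5.0 ... Chrome/76.0",
};
const headlessSnapshot = {
  webdriver: true,
  pluginCount: 0,
  languages: ["en-US", "en"],
  userAgent: "Mozilla/5.0 ... HeadlessChrome/76.0",
};

const differingKeys = diffSignals(regularSnapshot, headlessSnapshot).map((d) => d.key);
console.log(differingKeys);
```

The snapshots themselves could be collected with `page.evaluate` in Puppeteer and from the DevTools console in the regular browser, reading things like `navigator.webdriver`, `navigator.plugins.length` and `navigator.languages`. Keep in mind that this covers only the client side: request-level signals such as header order or the TLS handshake are not visible from page JavaScript, so an empty diff here does not mean the two browsers look the same to the server.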