JavaScript
AntonioK, 2012-02-27 17:25:15

HTTP request header: Origin?

Maybe someone else has figured this out before me?

The goal is to distinguish, on the server side, ordinary clicks on an `a href` link made by real people from AJAX and other tricky requests made by spider bots or by user scripts running in browsers.

In general, is the presence of the Origin header a sufficient reason to conclude that a request definitely does not meet the criterion “normal click on a normal link”?

The practical task is to track banner clicks and stop the fraudulent ones.
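The heuristic proposed here amounts to a one-line check. A minimal Python sketch, assuming the server framework hands over the request headers as a plain mapping (the function name `looks_like_plain_link_click` is hypothetical). The check is weak in both directions: under the Fetch specification a same-origin GET request issued by a script normally carries no Origin header either, and a bot can simply leave the header out.

```python
from typing import Mapping


def looks_like_plain_link_click(headers: Mapping[str, str]) -> bool:
    """Naive heuristic from the question: treat a request as a plain
    link click only if it carries no Origin header.

    Header names are compared case-insensitively, as HTTP requires.
    Every header is client-controlled, so this is trivially spoofable.
    """
    names = {name.lower() for name in headers}
    return "origin" not in names
```

For example, `{"User-Agent": "Mozilla/5.0"}` passes the check, while any mapping containing `Origin` (in any letter case) fails it.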

5 answers
TheHorse, 2012-02-27

> In general, is the presence of the Origin header a sufficient reason to conclude that a request does not meet the criterion “normal click on a regular link”?
No. Bots can send exactly the same headers.
Look into statistical analysis of the request population as a whole.
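One possible first step toward population-level statistics: count clicks per source and flag the sources whose counts are outliers. This is a minimal sketch, assuming clicks arrive as a list of source identifiers such as IP addresses. It uses the modified z-score based on the median absolute deviation (MAD) because, unlike mean and standard deviation, it is not dragged upward by the outliers themselves; the function name and the 3.5 cut-off (a common rule of thumb for this score) are illustrative choices, not part of the answer.

```python
from collections import Counter
from statistics import median
from typing import Iterable


def flag_outlier_sources(clicks: Iterable[str], threshold: float = 3.5) -> set[str]:
    """Return the click sources (e.g. IPs) whose click count is a robust outlier."""
    counts = Counter(clicks)
    if not counts:
        return set()
    values = list(counts.values())
    med = median(values)
    # Median absolute deviation: a spread estimate that resists outliers.
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # Most sources have identical counts; anything above the median stands out.
        return {src for src, n in counts.items() if n > med}
    # Modified z-score: 0.6745 scales MAD to be comparable with a standard deviation.
    return {src for src, n in counts.items() if 0.6745 * (n - med) / mad > threshold}
```

With per-IP counts of 3, 2, 4, 3 and 50, only the source with 50 clicks is flagged. In practice the same idea would be applied per time window and per banner rather than over all traffic at once.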

egorinsk, 2012-02-28

To distinguish bots from people, you can use the following (you described the details of the task only vaguely, so I am listing everything at once):
- checking support for cookies, Expires and Last-Modified;
- checking HTTPS support;
- checking that JS / Flash code can run and executes correctly;
- checking that the claimed user agent version matches the features the client actually supports;
- checking that static resources get loaded;
- checking information about local network interfaces through a Java applet;
- behavioral analysis of recorded user actions (for example, the user must move the mouse or the focus to a link before following it);
- statistical analysis (detecting trends);
- analysis of request sources (for example, requests from China to a Russian-language site; requests from IPs on spam lists; requests from data-center IPs; requests from computers with open proxy ports);
- comparing client-side information with the request parameters (for example, JavaScript reports a Russian locale and the Moscow time zone on the client, but the request comes from a Chinese IP with a German locale in the headers, which means a proxy plus a header-spoofing tool).
To keep the system from being bypassed, periodically change the set of factors you analyze, so that bots adapted to the old set give themselves away.
A smart system takes all of these factors into account, accumulating and analyzing them according to a set of rules. You are unlikely to manage the amount of work that, say, the Yandex advertising team puts in. But even such a system, I think, can be bypassed by someone who wants to and understands how it works; for example, by bringing in someone like egorinsk and paying him a lot of money.
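A rule set of the kind described above can be sketched as a weighted score. This is a minimal illustration, assuming each factor has already been measured elsewhere and reduced to a boolean; the field names, penalties and threshold are all hypothetical and would have to be tuned against real, labeled traffic.

```python
from dataclasses import dataclass


@dataclass
class ClickSignals:
    """Hypothetical per-click signals, one field per factor from the list."""
    cookies_supported: bool
    js_executed: bool
    static_resources_loaded: bool
    pointer_moved_before_click: bool
    ip_blocklisted_or_datacenter: bool
    client_locale_matches_headers: bool


# Hypothetical rules: (description, predicate, penalty). Weights are made up.
RULES = [
    ("no cookie support", lambda s: not s.cookies_supported, 2),
    ("JS check failed", lambda s: not s.js_executed, 3),
    ("static resources not loaded", lambda s: not s.static_resources_loaded, 2),
    ("no pointer movement before click", lambda s: not s.pointer_moved_before_click, 1),
    ("blocklisted or data-center IP", lambda s: s.ip_blocklisted_or_datacenter, 3),
    ("client locale disagrees with headers", lambda s: not s.client_locale_matches_headers, 2),
]


def bot_score(signals: ClickSignals) -> tuple[int, list[str]]:
    """Sum the penalties of every rule that fires and report which ones fired."""
    fired = [(name, penalty) for name, check, penalty in RULES if check(signals)]
    return sum(p for _, p in fired), [name for name, _ in fired]


def is_suspicious(signals: ClickSignals, threshold: int = 4) -> bool:
    """Flag the click once the accumulated penalty reaches the threshold."""
    score, _ = bot_score(signals)
    return score >= threshold
```

Keeping the rules in a plain list also fits the advice about rotation: swapping factors in and out means editing `RULES`, not the scoring logic.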
The method you propose is primitive and can be bypassed with a few lines of code.
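To illustrate how little it takes: a few lines of Python build a request whose headers look like a browser's and carry no Origin header. The URL and header values are placeholders, and the request is only constructed here, not sent.

```python
import urllib.request

# Placeholder banner URL; nothing is sent over the network below.
BANNER_URL = "https://example.com/banner/click?id=42"

req = urllib.request.Request(
    BANNER_URL,
    headers={
        # Every header value is just a string chosen by the client.
        "User-Agent": "Mozilla/5.0 (Windows NT 6.1; rv:10.0) Gecko/20100101 Firefox/10.0",
        "Referer": "https://example.com/some-page",
        "Accept": "text/html,application/xhtml+xml",
    },
)
# urllib.request.urlopen(req) would send it as a GET; no Origin header is
# included unless the script explicitly adds one.
```

Note that `urllib` normalizes stored header names with `str.capitalize()`, so the user agent is read back with `req.get_header("User-agent")`.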

tangro, 2012-02-27

It would be naive to treat the presence or absence of the Origin header as a sign of anything at all.

rPman, 2012-02-27

Collect click statistics labeled “human” or “probably bot” and build a table: rows are clicks, columns are click parameters (resource, time since session start, time since the previous click, presence of particular headers, …).
Then apply mathematical tools to analyze this table (for example, you can train a neural network on it).
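The table-plus-model approach can be sketched in a few dozen lines. As a stand-in for a neural network, this uses logistic regression (effectively a single neuron) trained by plain batch gradient descent. The three features, the `Click` type and the toy rows are hypothetical illustrations, not real click data.

```python
import math
from dataclasses import dataclass


@dataclass
class Click:
    """One row of a hypothetical click table."""
    secs_since_session_start: float
    secs_since_prev_click: float
    has_referer: bool


def features(c: Click) -> list[float]:
    # log1p compresses long time gaps so that no single feature dominates.
    return [math.log1p(c.secs_since_session_start),
            math.log1p(c.secs_since_prev_click),
            1.0 if c.has_referer else 0.0]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train(rows: list[Click], labels: list[int], epochs: int = 2000, lr: float = 0.1) -> list[float]:
    """Fit logistic regression (label 1 = bot); the last weight is the bias."""
    xs = [features(r) for r in rows]
    w = [0.0] * (len(xs[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(xs, labels):
            # zip(w, x) stops before the bias, which is added separately.
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + w[-1]) - y
            for i, xi in enumerate(x):
                grad[i] += err * xi
            grad[-1] += err
        w = [wi - lr * g / len(xs) for wi, g in zip(w, grad)]
    return w


def bot_probability(w: list[float], c: Click) -> float:
    x = features(c)
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + w[-1])


# Toy, made-up rows purely to show the call pattern.
humans = [Click(120, 30, True), Click(300, 45, True), Click(60, 12, True), Click(200, 20, True)]
bots = [Click(0.5, 0.2, False), Click(1.0, 0.1, False), Click(0.2, 0.3, False), Click(0.0, 0.0, False)]
weights = train(humans + bots, [0] * len(humans) + [1] * len(bots))
```

A real table would have many more rows and columns, and the labels themselves (“human” or “probably bot”) are the hard part; the model only generalizes whatever labeling it is given.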

bost84, 2012-02-27

Tokens can make life much harder for bots.
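One way to read the token idea: when the page is rendered, the server signs a short-lived, single-use token into each banner link, and it accepts a click only if the token verifies. This is a minimal sketch using HMAC-SHA256 from the Python standard library; the token format, the 300-second TTL and the in-memory nonce set are assumptions, not details from the answer. A bot that fetches the page can still harvest fresh tokens, so this raises the cost of fake clicks rather than preventing them.

```python
import hashlib
import hmac
import secrets
import time
from typing import Optional

SECRET_KEY = secrets.token_bytes(32)  # hypothetical server-side secret
TOKEN_TTL = 300  # seconds a rendered banner link stays valid (illustrative)
_used_nonces: set[str] = set()  # single process only; real setups need a shared store


def _sign(banner_id: str, nonce: str, issued_at: int) -> str:
    msg = f"{banner_id}|{nonce}|{issued_at}".encode()
    return hmac.new(SECRET_KEY, msg, hashlib.sha256).hexdigest()


def issue_token(banner_id: str, now: Optional[int] = None) -> str:
    """Called when the banner is rendered; the result goes into the link URL."""
    issued_at = int(time.time()) if now is None else now
    nonce = secrets.token_hex(8)
    return f"{nonce}.{issued_at}.{_sign(banner_id, nonce, issued_at)}"


def verify_click(banner_id: str, token: str, now: Optional[int] = None) -> bool:
    """Accept a click only with a valid, unexpired, not-yet-used token."""
    try:
        nonce, issued_str, sig = token.split(".")
        issued_at = int(issued_str)
    except ValueError:
        return False  # malformed token
    current = int(time.time()) if now is None else now
    if not 0 <= current - issued_at <= TOKEN_TTL:
        return False  # expired or issued in the future
    if not hmac.compare_digest(sig, _sign(banner_id, nonce, issued_at)):
        return False  # forged, or issued for a different banner
    if nonce in _used_nonces:
        return False  # replayed click
    _used_nonces.add(nonce)
    return True
```

The `now` parameter exists only to make the expiry logic easy to test; in normal use it is left as `None`.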
