Analytics
Alexander, 2014-02-24 19:35:22

How do I filter bots out of statistics?

There is an API, and there are logs of its requests. I need to analyze these logs, and I would like the final figures to be close to the number of real visitors.
Known bots (Yandex, Google, Bing, Mail.ru) were easy to cut off: a reverse DNS query tells us who owns the IP, and we filter by hostname mask (for example, Googlebot's hostnames end in googlebot.com).
But it is unclear what to do with lesser-known bots, which, as the statistics show, are also quite numerous.
Any thoughts on how to identify a bot from the <date-time> data?
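
For reference, the reverse DNS filtering described above could look roughly like the sketch below. Only the googlebot.com suffix comes from the post; the other suffixes, the function names, and the extra forward lookup (checking that the hostname resolves back to the same IP, so a forged PTR record does not pass) are assumptions for illustration.

```python
import socket

# Hostname suffixes treated as "known bots". Only ".googlebot.com" comes
# from the post; the rest are assumed values that should be checked against
# each search engine's own documentation.
KNOWN_BOT_SUFFIXES = (
    ".googlebot.com",
    ".yandex.ru",
    ".yandex.net",
    ".yandex.com",
    ".search.msn.com",
)


def is_known_bot_host(hostname):
    """Return True if the hostname ends in one of the known bot suffixes."""
    host = hostname.rstrip(".").lower()
    return host.endswith(KNOWN_BOT_SUFFIXES)


def is_known_bot_ip(ip, reverse=socket.gethostbyaddr,
                    forward=socket.gethostbyname_ex):
    """Reverse-resolve the IP, match the suffix, then confirm with a forward lookup.

    The resolvers are parameters so the logic can be tested offline.
    socket.gethostbyname_ex only handles IPv4 addresses.
    """
    try:
        hostname = reverse(ip)[0]
        if not is_known_bot_host(hostname):
            return False
        # The forward lookup must include the original IP.
        return ip in forward(hostname)[2]
    except OSError:
        # No PTR record or a failed lookup: not a confirmed bot.
        return False
```

Caching the result per IP keeps the number of DNS queries manageable on large logs.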

1 answer(s)
Andrew, 2014-02-24
@OLS

I would assume that bots follow almost deterministic URL request logic: you will always see the same sequence of operations from one IP. Build a directory of request sequences that you will treat as bot visits.
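
A minimal sketch of that idea, assuming each log line has already been parsed into a `(timestamp, ip, url)` tuple. The function names, the window length of 3, and the threshold of 3 repeats are all hypothetical choices to tune on real logs, not values from the question.

```python
from collections import Counter, defaultdict


def sequences_by_ip(records, length=3):
    """Count every run of `length` consecutive URLs requested by each IP.

    `records` is an iterable of (timestamp, ip, url) tuples.
    """
    urls = defaultdict(list)
    for _, ip, url in sorted(records):  # chronological order
        urls[ip].append(url)
    return {
        ip: Counter(
            tuple(visited[i:i + length])
            for i in range(len(visited) - length + 1)
        )
        for ip, visited in urls.items()
    }


def build_bot_directory(records, length=3, min_repeats=3):
    """Collect sequences that a single IP repeated at least `min_repeats` times."""
    directory = set()
    for counts in sequences_by_ip(records, length).values():
        directory.update(seq for seq, n in counts.items() if n >= min_repeats)
    return directory


def suspected_bot_ips(records, directory, length=3):
    """Return the IPs that produced at least one sequence from the directory."""
    return {
        ip
        for ip, counts in sequences_by_ip(records, length).items()
        if directory.intersection(counts)
    }
```

Once the directory is built from a training slice of the logs, it can be applied to new traffic, so an unknown IP that replays a known sequence is flagged even on its first pass. Popular human paths (for example, login followed by a dashboard) may need to be whitelisted by hand.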
