What methods do you know for detecting suspicious user activity?
Hello! I recently became interested in machine learning, in particular anomaly detection, but the sheer volume of information, the many different algorithms, and my shaky grasp of the mathematics have left me confused.
I am interested in detecting suspicious user actions in a system based on the user's previous actions. I would like to hear specific practical recommendations, ideally examples of solving a similar task, or at least the name of the algorithm (method) most often used for this.
As an example: detecting suspicious activity from a change in login time or in the user's average session length (provided, of course, that a pattern has been found), or from a suspicious IP change.
Let me even rephrase the question: how is this implemented at all? Maybe I'm wrong and such tasks don't require machine learning? But doing it properly by hand leads to too many if-else branches, because the alarm should not fire on every IP change but only relative to the user's previous statistics: perhaps there is not enough history yet, or for this particular person a change every week is normal. Ideally the system could also notice at some point that the person has become a homebody from a certain day on, and so forget their earlier travels.
Many thanks in advance for your time!
You start with a set of chains (sequences) of monitored parameters and, for each chain, statistics on how often it occurs on average.
An alert is raised as soon as you observe:
1. A chain that is too unique (i.e., it differs in more than half of its parameters).
2. An average occurrence rate for a chain that deviates, in either direction, by more than 50% relative to previous time intervals.
3. When analyzing sessions, a packet order within a session whose similarity deviates by more than 50% from all other user sessions.
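The three rules above could be sketched roughly like this. It is only an illustration under my own assumptions: a chain is a dict of parameter values, "occurrence" is a count per time interval, and the session comparison uses `difflib.SequenceMatcher` as a stand-in similarity measure (the answer does not say which one to use). All function names and the 0.5 thresholds as defaults are hypothetical.

```python
from difflib import SequenceMatcher
from statistics import mean


def differing_fraction(chain, other):
    """Fraction of parameters whose values differ between two chains."""
    keys = set(chain) | set(other)
    differing = sum(1 for k in keys if chain.get(k) != other.get(k))
    return differing / len(keys) if keys else 0.0


def is_too_unique(chain, known_chains, threshold=0.5):
    """Rule 1: the chain differs from every known chain in more than
    `threshold` of its parameters. With no history we cannot judge."""
    if not known_chains:
        return False
    return all(differing_fraction(chain, k) > threshold for k in known_chains)


def rate_deviates(current_count, previous_counts, threshold=0.5):
    """Rule 2: the current count deviates from the mean of previous
    intervals by more than `threshold`, in either direction."""
    if not previous_counts:
        return False
    baseline = mean(previous_counts)
    if baseline == 0:
        return current_count > 0
    return abs(current_count - baseline) / baseline > threshold


def order_deviates(session, other_sessions, threshold=0.5):
    """Rule 3: the event order is less than (1 - threshold) similar
    to every other session (assumed similarity: SequenceMatcher ratio)."""
    if not other_sessions:
        return False
    return all(
        SequenceMatcher(None, session, other).ratio() < 1 - threshold
        for other in other_sessions
    )
```

Each function returns `False` when there is no history to compare against, so a brand-new user does not trigger an alert on the first event.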
Congratulations! You have just taken on a colossal task. The largest corporations have spent thousands of man-hours on this topic over the years, and the number of false positives and false negatives in their protection systems is still annoyingly high.
There are lots of methods, too. Login time, working time, browser fingerprints, IP address ranges: these are the obvious ones, very simple and not very productive. In the end, a client on a business trip somewhere, with a local provider's IP and a work-issued laptop, is forced to fight your system. Now people are digging in other directions. Take the mouse, for example. It is not obvious to many, but mouse cursor movement patterns are unique to each person. And it is not only pointing at interface elements, i.e. productive movements; the so-called idle movements are also very telling, for example the way you twirl the cursor while waiting for something to load. The catch is that a person is not constant. As soon as you learn to tell a person apart from others by how they use the mouse, the rhythm they type in, and so on, bam, they get sick.
Why am I saying all this? I would be glad to hear from people who really know the subject myself, but I don't expect to see anything truly worthwhile in the answers.
In general, each user should be assigned a set of metrics that characterize their behavior. The classification itself is not difficult. What is harder is finding metrics that actually identify "suspicious activity".
And you haven't defined the term at all. It means something different to everyone.
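To make the "set of metrics per user" idea concrete, here is a minimal sketch, assuming each metric is a single number (login hour, session length, and so on) and that "suspicious" means "far from this user's own recent baseline". It uses an exponentially weighted mean and variance so that old behavior is gradually forgotten, which is my own choice of technique, not something stated in the thread. The names `RunningBaseline` and `observe`, the forgetting rate, the minimum history of 5 and the threshold of 3 deviations are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class RunningBaseline:
    """Exponentially weighted mean/variance of one metric for one user."""
    alpha: float = 0.1   # forgetting rate: higher forgets old behavior faster
    mean: float = 0.0
    var: float = 0.0
    n: int = 0           # number of observations seen so far

    def score(self, x, min_history=5):
        """Deviation of x from the baseline in standard deviations;
        0.0 while there is too little history to judge."""
        if self.n < min_history or self.var == 0:
            return 0.0
        return abs(x - self.mean) / self.var ** 0.5

    def update(self, x):
        if self.n == 0:
            self.mean = x
        else:
            diff = x - self.mean
            self.mean += self.alpha * diff
            self.var = (1 - self.alpha) * (self.var + self.alpha * diff * diff)
        self.n += 1


def observe(baselines, user, metric, value, threshold=3.0):
    """Score the value first, then fold it into the baseline.
    Updating even on suspicious values lets the baseline adapt
    when the user's habits genuinely change."""
    baseline = baselines[(user, metric)]
    suspicious = baseline.score(value) > threshold
    baseline.update(value)
    return suspicious


baselines = defaultdict(RunningBaseline)
```

Calling `observe(baselines, "alice", "login_hour", 9)` on every login would flag values far from Alice's usual hours, and if she permanently switches to a new schedule the alerts fade as the baseline drifts. Note that a raw hour value ignores the wrap-around at midnight; a real metric would have to handle that.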
Create your own parameter patterns based on the task at hand. In your case there are two main questions: what or whom exactly you are monitoring for anomalies, and what exactly you consider normal. The aforementioned corporations do of course have access to the world's best minds, but the world's best minds usually start with simple things and build the methodology from the basics: what we monitor, why we monitor it, what counts as an anomaly from our point of view, plus collection of statistics. If, for example, you are building your own site and want to track users the way VKontakte does (login history with browser + IP), then start with that.
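A login history of the "browser + IP" kind could start out as simple as the sketch below. The assumptions are mine: IPs are grouped into /24 (IPv4) or /64 (IPv6) networks so that small address changes within one provider do not count, time is a plain day number, and a combination not seen for `max_age_days` is treated as new again. The class name `LoginHistory` and its method are hypothetical.

```python
import ipaddress
from collections import defaultdict


class LoginHistory:
    def __init__(self, max_age_days=90):
        self.max_age_days = max_age_days
        # user -> {(browser, network): day the combination was last seen}
        self.seen = defaultdict(dict)

    @staticmethod
    def _key(browser, ip):
        # Group addresses by network so nearby IPs count as the same place.
        addr = ipaddress.ip_address(ip)
        prefix = 24 if addr.version == 4 else 64
        network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
        return browser, str(network)

    def check_and_record(self, user, browser, ip, day):
        """Return True if this browser+network pair is new (or expired)
        for the user; the very first login is never flagged."""
        key = self._key(browser, ip)
        history = self.seen[user]
        last = history.get(key)
        is_new = bool(history) and (
            last is None or day - last > self.max_age_days
        )
        history[key] = day
        return is_new
```

What to do with a `True` result (a notification, an extra confirmation step, just a log entry) is a separate decision and depends on what you consider an anomaly.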
You need to start by defining what "suspicious activity" is and what to do when it is detected :) For example, Stakhanovets can detect it, in the way it understands it. On a test machine I rename a directory containing many subdirectories, for example a Thunderbird profile, and I get an alert about suspicious file activity!
In general terms, this is unsupervised learning: the agent tries to answer the question of how similar the next example is to the examples in the training set. But what to encode and how is a biiig question.
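One simple way to ask "how similar is the next example to the training set" is the distance to its nearest neighbors. The sketch below is just one possible instance of that idea, not a recommendation from the thread: examples are assumed to be numeric feature tuples already brought to comparable scales, and the alert threshold is taken from the training data itself. The function names and `k=3` are hypothetical.

```python
import math


def knn_novelty(sample, training, k=3):
    """Mean Euclidean distance from `sample` to its k nearest
    training examples; larger means less similar to known behavior."""
    distances = sorted(math.dist(sample, t) for t in training)
    nearest = distances[:k]
    return sum(nearest) / len(nearest)


def leave_one_out_threshold(training, k=3):
    """Largest novelty score any training example gets when compared
    with the rest of the training set (a crude calibration)."""
    return max(
        knn_novelty(t, training[:i] + training[i + 1:], k)
        for i, t in enumerate(training)
    )


# Assumed features: (login hour, session length in hours).
training = [(9.0, 0.5), (9.5, 0.6), (10.0, 0.4), (9.2, 0.5), (9.8, 0.55)]
threshold = leave_one_out_threshold(training)

is_anomaly = knn_novelty((3.0, 0.5), training) > threshold
```

The hard part the answer points at stays untouched here: choosing which features to put into the tuples and how to scale them matters far more than the distance formula.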