data mining
Pavel Osipov, 2013-01-23 19:08:12

User Behavior Statistics Archive

Good afternoon.
As part of my PhD research, I work on detecting anomalous user behavior (anomaly detection) in information systems by building behavior models. The model itself already exists, and on our toy problem (a small, artificially generated data set) it shows good detection results. A full-fledged study, however, needs real data, and we do not have any.
So the question is: would anyone be willing to share data of this type? In return, I can share the results of the study and send the articles we have already published about our approach.

Ideally, we need statistics on the behavior of a large number of users, in roughly this form:
id - ideally, just an auto-incremented value
user_id
session_id
transaction_id
datetime/timestamp (optional)

Where
user_id is a unique user ID.
session_id is the ID of one session of a user's work in the system (the order of actions in the database should also match the order in which they were actually performed).
transaction_id is a unique identifier for one of the possible actions in the system. For example, fetching a person's profile is one type of transaction, regardless of whose profile is requested. Updating a profile is already a different transaction_id...
datetime/timestamp (optional) is needed mainly so that the models can be trained on data in the correct sequence, matching the order in which the actions happened in real life.
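
To make the requested layout concrete, here is a minimal sketch of the first table and of how the rows could be turned into per-session action sequences. It is only an illustration under assumptions: the table name `events`, the column name `ts`, the integer ID types, and the sample rows are all made up, and SQLite is used only because it ships with Python.

```python
import sqlite3
from collections import defaultdict


def create_events_table(conn):
    # Hypothetical table name "events"; columns follow the list above.
    conn.execute(
        """
        CREATE TABLE events (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            user_id INTEGER NOT NULL,
            session_id INTEGER NOT NULL,
            transaction_id INTEGER NOT NULL,
            ts TEXT  -- optional datetime/timestamp, may be NULL
        )
        """
    )


def session_sequences(conn):
    # Group transaction_ids by (user_id, session_id), keeping insertion
    # order via the auto-incremented id.
    sequences = defaultdict(list)
    rows = conn.execute(
        "SELECT user_id, session_id, transaction_id FROM events ORDER BY id"
    )
    for user_id, session_id, transaction_id in rows:
        sequences[(user_id, session_id)].append(transaction_id)
    return dict(sequences)


conn = sqlite3.connect(":memory:")
create_events_table(conn)
# Made-up sample rows: (user_id, session_id, transaction_id, ts)
conn.executemany(
    "INSERT INTO events (user_id, session_id, transaction_id, ts) "
    "VALUES (?, ?, ?, ?)",
    [
        (1, 10, 100, "2013-01-23 19:00:00"),
        (2, 20, 101, None),
        (1, 10, 101, "2013-01-23 19:00:05"),
        (1, 11, 100, None),
    ],
)
sequences = session_sequences(conn)
```

Ordering by `id` is enough here because the rows are assumed to be stored in the order the actions were performed; if that does not hold for a given data set, the timestamp column would have to be used instead.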

And the second table is:
user_id
user_role

Where
user_role is the role (or set of roles) of the user within the system. For example, a secretary, an ophthalmologist, a math teacher...
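
As a rough sketch of how the second table might be combined with the first, the snippet below builds a user-to-roles mapping and counts how often each transaction type occurs per role. Everything here is an assumption for illustration: the function names, the sample rows, and the idea of counting per role are not part of any existing tool or of the model described above.

```python
from collections import Counter, defaultdict


def roles_by_user(role_rows):
    # role_rows: iterable of (user_id, user_role); a user may have several roles.
    roles = defaultdict(set)
    for user_id, user_role in role_rows:
        roles[user_id].add(user_role)
    return dict(roles)


def transaction_counts_by_role(event_rows, roles):
    # event_rows: iterable of (user_id, session_id, transaction_id).
    # Each event is counted once for every role its user holds.
    counts = defaultdict(Counter)
    for user_id, _session_id, transaction_id in event_rows:
        for role in roles.get(user_id, ()):
            counts[role][transaction_id] += 1
    return dict(counts)


# Made-up sample data.
role_rows = [(1, "secretary"), (2, "ophthalmologist"), (2, "math teacher")]
event_rows = [(1, 10, 100), (1, 10, 101), (2, 20, 101), (1, 11, 100)]

roles = roles_by_user(role_rows)
counts = transaction_counts_by_role(event_rows, roles)
```

Storing one row per (user_id, user_role) pair keeps the table flat while still allowing a set of roles per user.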

Ideally, it would also be great to have both a data set known to be clean and a data set in which anomalous activity is present, for testing and cross-validation… Well, there is no harm in dreaming.

If anyone is interested, I will be eternally grateful. And of course, I will share the results of the study.


3 answer(s)
Sergey, 2013-01-23
@bondbig

If the project is meant to be commercial, bring it into a more or less usable form (work the UI and the integration with standard systems up to a sane state) and offer it to everyone for free at first. If testers come running, you will have a hard time fending them off, and data for debugging your algorithms will flow like a river.

alx49, 2013-01-28
@alx49

Hello Pavel!
Could you share articles that have already been published? Your topic is very interesting!
Thank you!

ezavialov, 2013-11-16
@ezavialov

Yandex runs the Internet Mathematics competition. They periodically publish parts of their logs of user behavior on search result pages there. For example, http://switchdetect.yandex.ru/datasets
