Analytics
The Whiz, 2018-07-19 17:31:22

What algorithm can detect anomalies on a chart?

I have data on visits to a number of pages of the site, during the last 30 days. Looks something like this:

Page 1: [1,2,0,4,6,1,7,4,7]
Page 2: [3,4,12,1,7,1,2,0]

There are many such pages. I need to pick out the pages that saw an unusual influx or loss of visitors at some point in time. Which algorithm, or sequence of algorithms, is best suited to this problem?
UPD: For now I'm looking at machine-learning anomaly detection algorithms, but perhaps there is a faster option. For example (thinking out loud), you could split each data array into several equal parts and compare the percentage fluctuations between them. If everything stays close to 0, we can assume there are no anomalies; if there is a jump somewhere, then something went wrong. Most likely, that is what I will do.
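The "split and compare" idea from the update could look roughly like the Python sketch below. The function names, the number of parts, and the 50% threshold are all assumptions made for illustration; nothing in the question fixes them.

```python
from statistics import mean

def chunk_changes(visits, parts=3):
    """Percentage change between the means of consecutive, roughly equal chunks."""
    size = len(visits) // parts
    if size == 0:
        raise ValueError("not enough data points for the requested number of parts")
    chunks = [visits[i * size:(i + 1) * size] for i in range(parts - 1)]
    chunks.append(visits[(parts - 1) * size:])  # the last chunk absorbs any remainder
    means = [mean(c) for c in chunks]
    changes = []
    for prev, cur in zip(means, means[1:]):
        if prev == 0:
            # Percentage change from zero is undefined; treat growth from 0 as infinite (assumption).
            changes.append(float("inf") if cur > 0 else 0.0)
        else:
            changes.append((cur - prev) / prev * 100)
    return changes

def has_jump(visits, parts=3, threshold_pct=50.0):
    """True if any chunk-to-chunk change exceeds the (hypothetical) threshold."""
    return any(abs(c) > threshold_pct for c in chunk_changes(visits, parts))
```

For example, `has_jump([10, 10, 10, 10, 10, 10, 30, 30, 30])` reports a jump, because the mean of the last third is 200% higher than the one before it. Note that averaging over chunks hides short spikes, and the result depends heavily on where the chunk boundaries fall.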


4 answer(s)
Roman Mirilaczvili, 2018-07-19
@modernstyle

You can calculate the mean and the standard deviation σ over a certain period of time (a window); if a value falls outside mean ± 3σ, it is probably an anomaly.
https://www.slideshare.net/YoshihiroIwanaga/anomal...
https://stackoverflow.com/questions/2303510/recomm...
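
A minimal sketch of the windowed 3σ rule described in this answer, using only the Python standard library. The window length of 7 days, the function name, and the handling of a perfectly flat window are assumptions, not something the answer specifies.

```python
from statistics import mean, stdev

def three_sigma_anomalies(visits, window=7, k=3.0):
    """Indices whose value lies outside mean ± k·σ of the preceding `window` values."""
    if window < 2:
        raise ValueError("window must contain at least 2 points to estimate σ")
    anomalies = []
    for i in range(window, len(visits)):
        history = visits[i - window:i]
        mu = mean(history)
        sigma = stdev(history)  # sample standard deviation of the window
        if sigma == 0:
            # Flat history: flag any deviation at all (an assumption of this sketch).
            if visits[i] != mu:
                anomalies.append(i)
        elif abs(visits[i] - mu) > k * sigma:
            anomalies.append(i)
    return anomalies
```

With `visits = [5, 6, 5, 7, 6, 5, 6, 40, 6, 5]` the function returns `[7]`, the position of the spike. Once the spike enters the window it inflates σ, so the following days are judged against a much wider band; with only 30 points per page that effect is worth keeping in mind.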

Evgeny Panin, 2018-07-19
@varenich

This approach will get you nowhere.
Learn cohort analysis and Lean metrics, also known as AARRR.

dmshar, 2018-07-19
@dmshar

You can, of course, reinvent the wheel. Or you can build real understanding, starting with the theory. It will also come in handy later in life, because the task you describe shows up in various forms in economics, information security, medicine, technical diagnostics, and marketing (including page-visit anomalies like yours), as well as in dozens of other subject areas. Having studied it, you will be of real interest as a specialist to dozens of employers in the future.
This theory goes by different names: "anomaly search and detection", "changepoint detection", "detection of discords and outliers", etc. To a first approximation, it all comes down to time series analysis, classification methods, and detecting changes in the models that describe the data. "Spikes", going beyond 3 sigma, and so on are only the most trivial and naive of the methods in use today (and "percentage fluctuations", of course, are not among them). Moreover, if you want to do this seriously, you need to study the parameters of the series themselves (not only the mean and variance), check the correlation between visits to different pages of the site, identify trends and seasonality, check for clustering in the data, and so on.
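
As one small step beyond mean and variance, here is a hedged Python sketch that accounts for weekly seasonality and uses robust statistics. It is not the method this answer prescribes, only an illustration: each day is compared with the same weekday one period earlier (seasonal differencing), and the differences are scored with the modified z-score based on the median and the median absolute deviation (MAD). The function name, the period of 7, and the conventional cutoff of 3.5 are assumptions.

```python
from statistics import median

def seasonal_outliers(visits, period=7, threshold=3.5):
    """Indices whose change versus one period earlier is a robust outlier."""
    # Seasonal differencing removes a repeating pattern of length `period`.
    diffs = [visits[i] - visits[i - period] for i in range(period, len(visits))]
    if not diffs:
        return []
    med = median(diffs)
    mad = median(abs(d - med) for d in diffs)
    if mad == 0:
        # More than half of the differences are identical:
        # flag anything that deviates from them (an assumption of this sketch).
        return [i + period for i, d in enumerate(diffs) if d != med]
    # Modified z-score: 0.6745 * |d - median| / MAD.
    return [i + period for i, d in enumerate(diffs)
            if 0.6745 * abs(d - med) / mad > threshold]
```

A single spike is reported twice: once on the day it happens and once one period later, when the spike becomes the reference value. Real trend handling, cross-page correlation, and clustering need considerably more than this.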
Or you can, of course, do it "quickly", as long as it looks reasonably smart and there is something to show the customer. Then yes: compute the mean and the deviations, draw a pretty chart, impress the customer, collect the reward, profit. Everyone chooses their own path.

Nurlan, 2018-07-19
@daager

Anomaly detection in network monitoring data... - theory, names of algorithms, useful links
