data mining
bellerofonte, 2017-04-10 11:04:26

How to efficiently estimate the median of an exponentially distributed quantity?

I have a time series of server response times. The times are approximately exponentially distributed (more precisely, closer to a gamma distribution). About 95% of the values fluctuate around some constant X, and the remaining 5% can exceed X by a factor of several thousand. The task is to update the estimate of X with every new value of the series. Estimating X simply as the mean of the series does not work. How can X be estimated with the minimum number of arithmetic operations? Building a histogram is the only thing that comes to mind, but maybe there are simpler or faster ways? This has to be implemented in C++, so advanced mathematical tools are not an option.

2 answer(s)
Nicholas, 2017-04-10
@healqq

Building a histogram is pretty easy, but it is not very informative, especially in your case.
Explain what you mean by the estimate, i.e. what it should tell you. If the number of long requests is what matters to you, you can use as the estimate the number of requests that deviate from the expected value (the mean) by more than some amount (see Chebyshev's inequality). If you are interested in something else, say so and we'll think about it.
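
To make the suggestion above concrete, here is a minimal sketch, not a definitive implementation. Chebyshev's inequality says that at most 1/k² of the values can lie more than k standard deviations from the mean, so counting such values gives a rough measure of how many "long" requests there are. The class name `DeviationCounter`, the threshold `k`, and the choice of Welford's method for the running mean and variance are all my assumptions, not something from the question.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical helper (name and interface are assumptions for illustration).
// Keeps a running mean and variance with Welford's method and counts samples
// that lie more than k standard deviations away from the current mean.
class DeviationCounter {
public:
    explicit DeviationCounter(double k) : k_(k) {}

    // Feeds one sample; returns true if it was counted as a far deviation.
    bool add(double x) {
        bool far = false;
        if (n_ >= 2) {  // need at least two samples for a variance estimate
            const double sigma = std::sqrt(m2_ / static_cast<double>(n_ - 1));
            far = std::fabs(x - mean_) > k_ * sigma;
        }
        if (far) ++far_count_;

        // Welford update: O(1) per sample, no stored history.
        ++n_;
        const double delta = x - mean_;
        mean_ += delta / static_cast<double>(n_);
        m2_ += delta * (x - mean_);
        return far;
    }

    std::size_t far_count() const { return far_count_; }
    std::size_t count() const { return n_; }
    double mean() const { return mean_; }

private:
    double k_;
    double mean_ = 0.0;
    double m2_ = 0.0;  // sum of squared deviations from the mean
    std::size_t n_ = 0;
    std::size_t far_count_ = 0;
};
```

Note that in this sketch the far values are still included in the running mean and variance, so a burst of very large values inflates the threshold. Whether that is acceptable depends on what the estimate is supposed to tell you.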

Sergey, 2017-04-10
@begemot_sun

> A simple estimate of X as the mean of the series is untenable.
Why not discard outliers that are N times larger or smaller than the current mean when computing the average?
I think you can find a recurrence formula for the arithmetic mean over a window of a given number of samples, or derive it yourself on paper.
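
A rough sketch of what this could look like, under my own assumptions: the class name `FilteredWindowMean`, the ring buffer, and the rule "accept everything until the window is full" are illustrative choices. The recurrence used is simply `sum += x - oldest`, so each update costs a few arithmetic operations.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch (names are assumptions for illustration).
// Sliding-window arithmetic mean that ignores samples more than `ratio`
// times above or below the current mean. Assumes window > 0 and
// positive samples (response times).
class FilteredWindowMean {
public:
    FilteredWindowMean(std::size_t window, double ratio)
        : buf_(window, 0.0), ratio_(ratio) {}

    // Feeds one sample and returns the current estimate.
    double add(double x) {
        if (count_ == buf_.size()) {
            // Window is full: apply the outlier filter.
            const double m = mean();
            if (x > m * ratio_ || x < m / ratio_) return m;  // skip outlier
            sum_ -= buf_[head_];  // evict the oldest accepted sample
        } else {
            ++count_;  // warm-up: accept unconditionally
        }
        buf_[head_] = x;
        sum_ += x;
        head_ = (head_ + 1) % buf_.size();
        return mean();
    }

    double mean() const {
        return count_ ? sum_ / static_cast<double>(count_) : 0.0;
    }

private:
    std::vector<double> buf_;  // ring buffer of accepted samples
    double ratio_;
    double sum_ = 0.0;
    std::size_t head_ = 0;     // next write position (oldest once full)
    std::size_t count_ = 0;    // accepted samples currently in the window
};
```

One caveat with this kind of filter: if an extreme value gets in during warm-up, the mean can end up so high that normal values are rejected as "too small" and the estimate never recovers. Seeding the window from values known to be typical, or falling back to a median-based estimate, would be ways around that.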
