How to evaluate the work of a function that calculates the probability of an event?
There is a function that estimates the probability that an event occurs (Yes or No). We run a series of such predictions and compare them with the actual outcomes.
For example:
Estimated probability   Event occurred
0.9                     Yes
0.4                     Yes
0.8                     No
0.1                     No
0.5                     Yes
0.8                     Yes
How can the quality of such a function be evaluated?
Alternatively, run the same test N times; each trial either produces event E or it doesn't. Count the trials in which the event occurs; call that number N'.
If the true probability of the event is P, then for infinitely large N the number of occurrences tends to N·P. So if after N trials the observed frequency satisfies

P − T ≤ N'/N ≤ P + T,

then the function has estimated the probability of the elementary event adequately. The larger N is, the smaller the tolerance T you can apply.
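A minimal sketch of this check in Python; the event simulator and the concrete tolerance T below are placeholder assumptions, not part of the original answer:

```python
import random

def frequency_check(predicted_p, run_trial, n_trials, tolerance):
    """Compare a predicted probability P against the observed frequency N'/N.

    run_trial is any zero-argument callable returning True when event E
    occurs; tolerance is the allowed deviation T of N'/N from P.
    """
    hits = sum(run_trial() for _ in range(n_trials))  # N'
    return abs(hits / n_trials - predicted_p) <= tolerance

# Example with a simulated event whose true probability is 0.7:
print(frequency_check(
    predicted_p=0.7,
    run_trial=lambda: random.random() < 0.7,
    n_trials=10_000,
    tolerance=0.02,  # the larger N, the smaller T can be
))
```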
LogLoss is not a good fit here, because your model produces binary answers rather than event probabilities. Look at ROC-AUC instead: unlike LogLoss it is bounded, lying in [0.5, 1] for any model no worse than random, and 0.5 is equivalent to flipping a coin.
UPD. I messed up: the events are binary, and the model does give probabilities.
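For illustration, here is how ROC-AUC could be computed on the sample from the question, assuming scikit-learn is available (1 stands for "Yes", 0 for "No"):

```python
from sklearn.metrics import roc_auc_score

y_true = [1, 1, 0, 0, 1, 1]               # event occurred?
y_prob = [0.9, 0.4, 0.8, 0.1, 0.5, 0.8]   # estimated probability

print(roc_auc_score(y_true, y_prob))      # ~0.69 here; 0.5 would be a coin flip
```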
If you mean the task of comparing two or more implementations of the function, then you most likely want LogLoss. Note that if you cannot apply the different functions to the same data (that is, each is checked on its own sample), the samples must be sufficiently large and homogeneous, otherwise the comparison will be unreliable.
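A sketch of such a comparison, again assuming scikit-learn; the second set of predictions is invented purely for illustration:

```python
from sklearn.metrics import log_loss

# Outcomes from the question (1 = "Yes") and two hypothetical implementations
# scored on the same sample; lower LogLoss is better.
y_true = [1, 1, 0, 0, 1, 1]
preds_a = [0.9, 0.4, 0.8, 0.1, 0.5, 0.8]  # the function from the question
preds_b = [0.7, 0.6, 0.3, 0.2, 0.6, 0.7]  # a made-up competitor

print(log_loss(y_true, preds_a))  # ~0.61
print(log_loss(y_true, preds_b))  # ~0.39, so B would win on this sample
```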
Then the question arises: what could the function's shortcomings be?
1. Bias towards "yes" or bias towards "no".
For all the events that happened, multiply the predicted probabilities of the corresponding outcome; do the same for all the events that didn't happen. Divide one product by the other; ideally you should get one (there is a sketch of this check after the list).
You can work with logarithms, so even very long series of statistics won't overflow: a double has about six bytes of mantissa and just under two bytes of exponent. You can also rescale on the fly: if the running product drops below, say, 1e−50, multiply it by 1e50 and keep the 50 orders of magnitude in mind.
If there is a lot of data, it is worth clustering the inputs and computing these statistics per cluster.
2. Overconfidence. The function says "0.9" when the real probability is at most 0.7. I think this can be handled by the same kind of clustering, applied to the function's output.
3. Indecisiveness. The function shows no statistical anomalies but simply works timidly, answering "who knows" (something near 0.5) too often. Vlad_Fedorenko suggests the area under the ROC curve; I would simply take the product of the probabilities of the respective outcomes. For example, with 6 runs we can say "the probability is always 0.5" and get 1/64 ≈ 0.016. Or we can say the probability is 2/3 for three of the runs and 1/3 for the other three; if the outcomes match those frequencies (two of the first three occur, one of the last three does), the result is 2^4/3^6 ≈ 0.022. What the normalizing coefficient should be, I can't say yet. A sketch of both product-based checks follows right after this list.
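A minimal Python sketch of the two product-based checks above (the ratio from point 1 and the likelihood product from point 3), done in log space as suggested; the function names and the sample outcomes are my own, chosen to reproduce the numbers from point 3:

```python
import math

def log_likelihood(probs, outcomes):
    # Sum of log-probabilities assigned to what actually happened;
    # working in logs avoids underflow on long series.
    return sum(math.log(p if happened else 1.0 - p)
               for p, happened in zip(probs, outcomes))

def log_bias_ratio(probs, outcomes):
    # Point 1: product over occurred events of p, divided by the product
    # over non-occurred events of (1 - p), taken as a log.
    # A value near 0 (ratio near 1) means no systematic bias.
    yes = sum(math.log(p) for p, h in zip(probs, outcomes) if h)
    no = sum(math.log(1.0 - p) for p, h in zip(probs, outcomes) if not h)
    return yes - no

# Point 3's worked example: two of the first three events occur,
# one of the last three does.
outcomes = [True, True, False, True, False, False]
always_half = [0.5] * 6
honest = [2/3, 2/3, 2/3, 1/3, 1/3, 1/3]

print(math.exp(log_likelihood(always_half, outcomes)))  # 1/64 ≈ 0.016
print(math.exp(log_likelihood(honest, outcomes)))       # 2^4/3^6 ≈ 0.022
print(log_bias_ratio(honest, outcomes))                 # 0.0: no bias
```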
UPD3. You can also try information entropy.
UPD4. This construction will perhaps also cope with overconfidence: if the function answers 0.9 and 0.1 for those same triples, the score comes out to (0.9 · 0.9 · 0.1)² ≈ 0.0066 < 0.01.
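A quick check of that arithmetic, under the same hypothetical outcomes as in the sketch above:

```python
import math

outcomes = [True, True, False, True, False, False]
overconfident = [0.9, 0.9, 0.9, 0.1, 0.1, 0.1]  # 0.9 and 0.1 for the two triples

product = math.exp(sum(math.log(p if h else 1.0 - p)
                       for p, h in zip(overconfident, outcomes)))
print(product)  # (0.9 * 0.9 * 0.1) ** 2 = 0.006561, below both 0.022 and 0.01
```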
UPD5. The ideal score is, of course, 1 (the function says 100% whenever the event happens and 0% whenever it doesn't).
UPD. Corrected point 1: I had meant it but left it out.
UPD2. Added point 3 on indecisiveness.