Machine learning
Igor Kolontar, 2017-05-04 10:58:23

Am I training the model correctly?

Hello. I need to build a prototype model that predicts the probability of fraud when customer orders are fulfilled. The input data are numeric and boolean parameters (the number of reassignments, whether it is due, etc.) that, in my opinion, can affect the likelihood of fraud. I uploaded a sample of 140 confirmed fraud cases to Azure. After training the model and testing it on the same sample (70/30 split), the evaluation shows that all values are true positives. When I test the model on a sample of 30,000 previously unseen claims, the evaluation reports that all rows are false positives.

I have read a lot of documentation and tried different combinations of algorithms and selected features, but I can't get a sane result. I understand that this is still very little training data and that at least 20 times more is needed. But am I doing the right thing, and how should I interpret the results? Is it that nothing can be achieved with such a small training sample, or am I setting up the model incorrectly in Azure ML Studio, and something sane can still be obtained, at least for a rough prototype?
The last algorithm I used was Two-Class Boosted Decision Tree.
I would be grateful at least for comments from people who also tried to do something like this. Thank you!


2 answer(s)
vasiliev, 2017-05-06
@Kotsubid

Did you send only fraud examples for training, with no "non-fraud" examples? Two-class classification needs both positive and negative examples. More data is desirable, of course, but even a small amount of correct data should not produce results like yours.
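To illustrate why a fraud-only training set would give exactly this picture, here is a minimal sketch in plain Python. It is not the Azure ML Studio pipeline: `train_majority` is a hypothetical stand-in for any classifier that has seen only one class, and the 30,000 new claims are assumed, purely for illustration, to all be labeled non-fraud.

```python
from collections import Counter


def train_majority(labels):
    """Hypothetical stand-in model: always predict the most common training label."""
    return Counter(labels).most_common(1)[0][0]


def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    return {
        "tp": sum(1 for t, p in pairs if t == positive and p == positive),
        "fp": sum(1 for t, p in pairs if t != positive and p == positive),
        "fn": sum(1 for t, p in pairs if t == positive and p != positive),
        "tn": sum(1 for t, p in pairs if t != positive and p != positive),
    }


# 140 confirmed fraud rows and no non-fraud rows, split 70/30.
train_labels = [1] * 98
test_labels = [1] * 42

# With one class in the training data, the only thing to learn is "fraud".
constant = train_majority(train_labels)

# Held-out part of the same sample: every row is fraud, every prediction is fraud.
same_sample = confusion_counts(test_labels, [constant] * len(test_labels))

# Assumption: the 30,000 new claims are all labeled non-fraud.
new_labels = [0] * 30000
new_claims = confusion_counts(new_labels, [constant] * len(new_labels))

print(same_sample)
print(new_claims)
```

Under these assumptions the held-out rows all count as true positives and the new claims all count as false positives, which matches the symptoms in the question. Adding confirmed non-fraud rows to the training sample gives the classifier something to separate.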

Sergey, 2017-05-04
@begemot_sun

You need more data; 140 rows is nothing.
Alternatively, you can generate data.
Take each row and change one parameter. That way, you can generate 10-20 new rows from each one.
But this is a crude, brute-force method.
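A rough sketch of that idea in plain Python, under several assumptions: the feature names `reassignments` and `is_due` are made up for the example, numeric values are treated as non-negative counts, and each generated row keeps the label of the row it came from (which may not hold for real fraud data).

```python
import random


def perturb_one_feature(features, rng, jitter=1):
    """Return a copy of `features` with exactly one value changed."""
    new_features = dict(features)
    name = rng.choice(sorted(new_features))
    value = new_features[name]
    if isinstance(value, bool):
        # Boolean parameter: flip it.
        new_features[name] = not value
    else:
        # Numeric parameter: move it by `jitter`, never below zero.
        step = rng.choice([-jitter, jitter])
        if value + step < 0:
            step = jitter
        new_features[name] = value + step
    return new_features


def augment(rows, copies_per_row=10, seed=0):
    """Generate `copies_per_row` perturbed copies of every (features, label) row."""
    rng = random.Random(seed)
    return [
        (perturb_one_feature(features, rng), label)
        for features, label in rows
        for _ in range(copies_per_row)
    ]


# Hypothetical rows: (features, label), where label 1 means fraud.
rows = [
    ({"reassignments": 3, "is_due": True}, 1),
    ({"reassignments": 0, "is_due": False}, 0),
]
augmented = augment(rows, copies_per_row=10)
print(len(augmented))
```

With only a couple of parameters per row, many of the generated rows will be duplicates of each other, so this adds less real information than the row count suggests.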
