How are such tasks solved and what tool is better?

D

diiimonn2019-02-12 17:03:05

Mathematics

diiimonn, 2019-02-12 17:03:05

Greetings!
There is a data set: Age, Gender, Language.
There is a knowledge that:
[50, F, ru] is not good.
[40, F, ru] - is the best combination.
[35, F, ru] - a little worse.
[35, M, ru] - even worse.
[20, M, en] - very bad.
It is necessary to apply a combination to the input and get the percentage of matching with the best one. I don't know how to approach the problem. I have PHP FANN available, but I can still work with Python.

Reply

Answer the question

In order to leave comments, you need to log in

1 answer(s)

R

Roman Mirilaczvili, 2019-02-12
@diiimonn

You need to encode the input data of the vector.
Age can be normalized in the range of 20..100 years (well, or less). So 100 years is 1.0 and 20 years is 0.0. Everything in between is a fraction of the maximum, given the lower threshold. By the way, for simplicity, any age above the maximum can be taken as a maximum. The same goes for the minimum.
Gender is encoded simply: 0 (M) or 1 (F)
Language is encoded based on a set of languages. If in the simplest case of only 2, then it is similar to gender coding. If more than 2, then in order to expand the set of languages, it makes sense to encode with a vector.
Suppose there is a set of languages Rus, Eng, Jap, then the Jap language is encoded as a vector (0, 0, 1), where the order of the languages is important.
Thus, for [50, F, ru]
Age: (50-20)/(100-20)=0.375
Gender: 1
Language: (1, 0, 0)
We build the data in order into the final input vector: (0.375, 1, 1, 0, 0).
Each verbal description of the result in terms of gradation from bad to excellent is given a score from 0 to 1 (in percent).
Thus, for each input vector, we obtain the result of the gradation evaluation.
If the data is not contradictory, then training on a set of representative data (80%), in the end, you can check the correctness of the trained model by testing the remaining 20% of the data.
In addition to machine learning, there are also statistical models, decision tables, decision trees, and many other interesting ways to solve a problem. I guess in banks when issuing a loan and calculating risks in insurance companies is done not only by machine learning, since it can only be true for a certain set of data.