What are the algorithms for finding the optimal sample/slice?

Database

ifaceman, 2014-04-29 10:50:06

Hello everyone!
There is a user database with n fields describing each user (gender, age, occupation, etc.). For any given user, an arbitrary subset of these fields is filled in, or none at all.
Each user also has some statistics (for example, the number of system logins per month).
So for any combination of parameters (for example: gender+age, gender+age+marital_status, ...+...+*) you can compute average statistics: say, 30-year-old men logged into the system 32 times a month on average. Each such combination defines a slice.
What algorithms are there for determining the slice that best matches a particular user? That is, knowing some of the data about a user, we want to estimate their statistics from the average of the most appropriate slice.
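
To make the setup concrete, here is a minimal sketch of how a single slice average could be computed. The records, the field names, and the `slice_average` helper are all made up for illustration; a real database would do this with a query rather than a list in memory.

```python
from statistics import mean

# Hypothetical user records; None marks a field the user left blank.
users = [
    {"gender": "m", "age": 30, "occupation": "engineer", "logins": 30},
    {"gender": "m", "age": 30, "occupation": None, "logins": 34},
    {"gender": "f", "age": 30, "occupation": "teacher", "logins": 20},
    {"gender": "m", "age": 45, "occupation": "engineer", "logins": 10},
]

def slice_average(records, stat, **conditions):
    """Average `stat` over the records matching every field=value condition."""
    matching = [
        r[stat] for r in records
        if all(r.get(field) == value for field, value in conditions.items())
    ]
    # An empty slice has no average; signal that with None.
    return mean(matching) if matching else None

# The gender+age slice for 30-year-old men.
print(slice_average(users, "logins", gender="m", age=30))
```

The open question is which combination of conditions to pass in for a given user: the more fields are used, the more specific the slice, but the fewer users fall into it.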

2 answers

Andrew, 2014-04-29
@ifaceman

k-Nearest Neighbors (kNN)
In terms of this algorithm, your task comes down to three questions:
1) how to tune the weights (significance) with which each parameter affects the distance between neighbors
2) which kernel to choose
3) how to determine the optimal k for that kernel
All three have concrete algorithmic answers, and there is a lot of literature on them.
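
As a rough illustration of where those three choices enter, here is a minimal sketch in plain Python. The weights, the Gaussian kernel, the bandwidth, the value of k, and the rule that a blank field contributes nothing to the distance are all assumptions picked for the example, not recommended settings.

```python
import math

def distance(a, b, weights):
    """Weighted distance over the fields that are known for both users."""
    total = 0.0
    for field, w in weights.items():
        x, y = a.get(field), b.get(field)
        if x is None or y is None:
            continue  # assumption: a blank field contributes nothing
        if isinstance(x, (int, float)) and isinstance(y, (int, float)):
            total += w * abs(x - y)   # numeric field: absolute difference
        else:
            total += w * (x != y)     # categorical field: 0 if equal, 1 if not
    return total

def knn_predict(query, records, stat, weights, k=3, bandwidth=5.0):
    """Kernel-weighted average of `stat` over the k nearest records."""
    nearest = sorted(records, key=lambda r: distance(query, r, weights))[:k]
    # Gaussian kernel: closer neighbours get more say in the estimate.
    kernel = [
        math.exp(-(distance(query, r, weights) / bandwidth) ** 2)
        for r in nearest
    ]
    return sum(kw * r[stat] for kw, r in zip(kernel, nearest)) / sum(kernel)

# Hypothetical data: gender is weighted ten times as heavily as a year of age.
users = [
    {"gender": "m", "age": 30, "logins": 30},
    {"gender": "m", "age": 32, "logins": 34},
    {"gender": "f", "age": 30, "logins": 20},
    {"gender": "m", "age": 60, "logins": 5},
]
weights = {"gender": 10.0, "age": 1.0}
query = {"gender": "m", "age": 31}

print(knn_predict(query, users, "logins", weights, k=2))
```

In practice the weights, bandwidth, and k would be chosen by cross-validation on users whose statistics are already known, rather than set by hand as here.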

Andrey Vershinin, 2014-04-29
@WolfdalE

Regression analysis