Is it possible to assess the relative importance of features?

big data

sha256, 2019-01-14 16:11:24

Hello everyone!
Please help me pin down the problem statement.
There is a dataset of sales in a clothing store. How can one evaluate how much each feature influenced the purchase decision in each specific case?
Men, for example, are guided by the material of the product, women are more focused on color. How to rank these conditions?
The first thing I thought of was:
Gradient Boosting feature_importance
But it doesn't fit, because it produces one estimate for the entire training set. I need an estimate for each individual example.

4 answers

Arseny Kravchenko, 2019-01-15
@sha256

https://github.com/TeamHG-Memex/eli5
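
For context: eli5 is a library for explaining individual predictions of a trained model, which is what the question asks for. The underlying idea can be sketched without any library: take one example, replace one feature with a neutral baseline value, and see how much the model's output moves. Everything below is an illustrative assumption and not eli5's API: `toy_model`, its scores, the baseline values and the name `local_importance` are made up.

```python
from typing import Callable, Dict

Example = Dict[str, str]


def toy_model(x: Example) -> float:
    """Hypothetical purchase score; stands in for any trained model."""
    score = 0.2
    if x["gender"] == "m" and x["material"] == "wool":
        score += 0.5
    if x["gender"] == "f" and x["color"] == "red":
        score += 0.5
    return score


def local_importance(
    predict: Callable[[Example], float],
    example: Example,
    baseline: Example,
) -> Dict[str, float]:
    """Per-example importance: drop in the score when one feature
    is replaced by its baseline value, all other features unchanged."""
    base_score = predict(example)
    scores = {}
    for name, neutral_value in baseline.items():
        perturbed = dict(example)
        perturbed[name] = neutral_value
        scores[name] = base_score - predict(perturbed)
    return scores


# Assumed "neutral" values for the features we want to rank.
BASELINE = {"material": "synthetic", "color": "grey"}

man = {"gender": "m", "material": "wool", "color": "red"}
woman = {"gender": "f", "material": "wool", "color": "red"}

man_scores = local_importance(toy_model, man, BASELINE)
woman_scores = local_importance(toy_model, woman, BASELINE)

# Rank features for each individual example, most influential first.
man_ranking = sorted(man_scores, key=man_scores.get, reverse=True)
woman_ranking = sorted(woman_scores, key=woman_scores.get, reverse=True)
```

The two examples have identical material and color, yet the rankings differ, because the score is computed per example rather than over the whole training set. The result depends on the choice of baseline, so treat it as a rough indication only.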

Therapyx, 2019-01-14
@Therapyx

Perhaps you mean this?
https://www.datarobot.com/wiki/feature-impact/
https://stats.stackexchange.com/questions/38831/wh...
https://en.wikipedia.org/wiki/Feature_selection
If so, then from my experience and from the links above, this is close to unrealistic. In our project the same question came up: how to identify the most important factors influencing the final result among thousands of different sources. As far as I know, no solution was found.
But in your specific case I would think hard about whether machine learning is needed at all.
You could also classify the product parameters and relate them to the final choice made by men and by women.

dmshar, 2019-01-14
@dmshar

I hope you are familiar with the concept of correlation. Correlation is not only the classical Pearson coefficient, measured on quantitative data; there are also variants adapted to rank data (Kendall and Spearman correlations), to nominal data, to dichotomous data, and to combinations of these.
Thus, for your example, it is possible to determine formally that in the group of men the binary attribute "bought / did not buy" correlates more strongly with the nominal attribute "material" than with the nominal attribute "color", and vice versa in the group of women.
The problem is well known and is covered in any modern course on statistical analysis.
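
As a minimal sketch of that comparison, one association measure for two nominal attributes is Cramér's V, computed from the chi-square statistic of their contingency table. The answer does not name a specific coefficient, so Cramér's V is my assumption here, and the rows below are invented toy data, not real sales.

```python
from collections import Counter
from math import sqrt


def cramers_v(xs, ys):
    """Cramér's V between two nominal sequences (no bias correction)."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    x_totals = Counter(xs)
    y_totals = Counter(ys)
    chi2 = 0.0
    for x, nx in x_totals.items():
        for y, ny in y_totals.items():
            expected = nx * ny / n
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(x_totals), len(y_totals)) - 1
    return sqrt(chi2 / (n * k)) if k > 0 else 0.0


# Made-up rows: (gender, material, color, bought)
ROWS = [
    ("m", "wool", "red", 1), ("m", "wool", "blue", 1),
    ("m", "cotton", "red", 0), ("m", "cotton", "blue", 0),
    ("f", "wool", "red", 1), ("f", "cotton", "red", 1),
    ("f", "wool", "blue", 0), ("f", "cotton", "blue", 0),
]

# Association of each attribute with "bought", separately per group.
association = {}
for gender in ("m", "f"):
    group = [row for row in ROWS if row[0] == gender]
    bought = [row[3] for row in group]
    association[gender] = {
        "material": cramers_v([row[1] for row in group], bought),
        "color": cramers_v([row[2] for row in group], bought),
    }
```

In this toy data "material" is fully associated with the purchase for men and "color" for women, so the ranking inside each group falls out directly. Note that this ranks attributes per group, not per individual purchase, and on real data with few rows per group the coefficient is unreliable.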

abbaboka, 2019-01-14
@abbaboka

Men, for example, are guided by the material of the product, women are more focused on color. How to rank these conditions?

I would start with an adequate definition of feedback,
not with the technical implementation.
Why do you think it is exactly as you wrote? On what grounds?
Why don't you consider intermediate values?