Which model to choose for estimating the salary of a technology developer?
I downloaded the Stack Overflow survey data: 60 MB of CSV. Of these, 6K records are from the Russian Federation and include developers' salaries, the technologies they use, and their age. Each record can list one or more technologies. Based on this data, I want to build a simple site where a user enters their technologies and age and gets an estimate of their market salary.
Problem: I don't know which model to use to fit this data. I tried converting the technology labels to binary columns:
js   css  java  ...  salary
0    0    1     ...  2k
1    0    0     ...  2.3k
etc.
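The binary-column conversion described above can be sketched as follows. This is only a minimal illustration using the standard library: the function name `encode_skills`, the record layout `(technologies, salary)`, and the third sample record are assumptions for the example, not part of the survey data.

```python
def encode_skills(records, vocabulary=None):
    """Turn each record's technology list into 0/1 columns plus the salary."""
    if vocabulary is None:
        # Assumption: the column set is every technology seen in the data.
        vocabulary = sorted({tech for techs, _ in records for tech in techs})
    rows = []
    for techs, salary in records:
        present = set(techs)
        flags = [1 if tech in present else 0 for tech in vocabulary]
        rows.append(flags + [salary])
    return vocabulary, rows


# Hypothetical records: (technologies, salary).
records = [
    (["java"], 2000),
    (["js"], 2300),
    (["js", "css"], 1800),  # made-up row showing a multi-technology case
]

columns, table = encode_skills(records)
print(columns + ["salary"])
for row in table:
    print(row)
```

Each row has one flag per technology, so a record with several technologies simply has several ones in it.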
Maybe take the most similar skill sets and average their salaries, weighting each by its “distance” from the requested set? That is, no ML, just search.
For example, suppose we are estimating the salary for the skill set [A, B, C]. The database contains these records with at least 2 of the required skills:
A, B, C: $X_1 (exact match, distance 0)
A, B, C, D: $X_2 (1 extra skill, distance 1)
A, C: $X_3 (1 skill missing, distance 1)
A, C, F: $X_4 (1 extra, 1 missing = distance 2)
"Distance" is the number of skills that differ (extra + missing). Each salary can then be discounted by its distance from the required set, for example by squaring the distance and dividing the salary by (1 + Dist^2).
Expected salary:
\[ S = \frac{1}{4}\left( \frac{X_1}{1+0^2} + \frac{X_2}{1+1^2} + \frac{X_3}{1+1^2} + \frac{X_4}{1+2^2} \right) \]
Or, to discount poorly matching records more sharply, divide by e to the power of Dist:
\[ S = \frac{1}{n}\left( \frac{X_1}{e^{0}} + \frac{X_2}{e^{1}} + \frac{X_3}{e^{1}} + \frac{X_4}{e^{2}} + \dots + \frac{X_n}{e^{\mathrm{Dist}_n}} \right) \]
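The search-based estimate above could look roughly like this in Python. It is a sketch under assumptions: the names `skill_distance` and `estimate_salary`, the `min_shared` threshold parameter, and all salary figures are invented for illustration. The `normalize` flag is also an addition that is not in the formulas above: with `normalize=False` the sum is divided by the number of matched records exactly as written, while `normalize=True` divides by the sum of the weights instead.

```python
import math


def skill_distance(wanted, candidate):
    """Number of differing skills: extra plus missing."""
    return len(set(wanted) ^ set(candidate))


def estimate_salary(wanted, records, weight="quadratic",
                    min_shared=2, normalize=False):
    """Distance-weighted salary estimate over records sharing enough skills."""
    wanted = set(wanted)
    weighted = []
    for skills, salary in records:
        skills = set(skills)
        if len(wanted & skills) < min_shared:
            continue  # too few required skills in common
        dist = skill_distance(wanted, skills)
        if weight == "exp":
            w = math.exp(-dist)          # salary / e^Dist
        else:
            w = 1.0 / (1.0 + dist ** 2)  # salary / (1 + Dist^2)
        weighted.append((w, salary))
    if not weighted:
        return None
    total = sum(w * salary for w, salary in weighted)
    # normalize=False follows the formulas literally (divide by n).
    denominator = sum(w for w, _ in weighted) if normalize else len(weighted)
    return total / denominator


# Hypothetical database: (skills, salary).
records = [
    (["A", "B", "C"], 3000),
    (["A", "B", "C", "D"], 3200),
    (["A", "C"], 2500),
    (["A", "C", "F"], 2800),
    (["A", "Z"], 1000),  # only 1 shared skill, so it is skipped
]

print(estimate_salary(["A", "B", "C"], records))
print(estimate_salary(["A", "B", "C"], records, normalize=True))
print(estimate_salary(["A", "B", "C"], records, weight="exp"))
```

With these made-up numbers the literal formula gives roughly 1600, which is below every salary that went into it, because the discounted salaries are still divided by the full record count. The normalized variant gives roughly 2900 and stays inside the range of the matched salaries.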
You need to solve a system of linear equations to find the technology complexity coefficients (age has nothing to do with it):
k11*x11+...+k1N*x1N=b1
.....
kN1*xN1+...+kNN*xNN=bN,
where kNN are the technology complexity coefficients, xNN ...
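One possible reading of the system above, sketched in Python: each equation is one survey record, each x is a 0/1 flag for whether the record uses a technology, b is the salary, and there is a single unknown coefficient per technology. That reading is an assumption, as is the use of least squares (via the normal equations) to handle more records than technologies. The function names and sample salaries are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a square system a * k = b by Gaussian elimination with pivoting."""
    n = len(a)
    # Augmented matrix [a | b], copied so the inputs stay untouched.
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: coefficients are not identifiable")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    k = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(m[row][c] * k[c] for c in range(row + 1, n))
        k[row] = (m[row][n] - tail) / m[row][row]
    return k


def fit_coefficients(x, salaries):
    """Least-squares fit: solve (x^T x) k = x^T b for one k per technology."""
    n = len(x[0])
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(n)]
           for i in range(n)]
    xtb = [sum(row[i] * s for row, s in zip(x, salaries)) for i in range(n)]
    return solve_linear_system(xtx, xtb)


def predict(coefficients, flags):
    """Estimated salary for one 0/1 technology vector."""
    return sum(k * f for k, f in zip(coefficients, flags))


# Hypothetical data, columns: js, css, java.
x = [
    [1, 0, 0],
    [0, 0, 1],
    [1, 1, 0],
    [1, 0, 1],
]
salaries = [2300, 2000, 2800, 4300]

coefficients = fit_coefficients(x, salaries)
print(coefficients)
print(predict(coefficients, [0, 1, 1]))
```

Solving the normal equations by hand keeps the sketch dependency-free. On real survey data a library least-squares routine would likely be more numerically robust, and technologies that always appear together would make the system singular.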