Machine learning
Chichi, 2016-01-26 06:19:22

Which regression algorithm to choose for noisy (scattered) data?

I want to build a regression with multiple variables (multiple features). In my data, I have n = 23 variables and m = 13000 training examples. Here is a graph of my training data (apartment size vs price):
9c15f08184ed46a98306e3237e8aeee5.png
All 13,000 training examples are plotted here. As you can see, the data is quite noisy. My question is which regression algorithm is more suitable in my case: is it reasonable to use simple linear regression, or is it better to use some non-linear regression algorithm?
For clarity, I will give examples. Here is an abstract example of linear regression:
345851461a5040f0830f1dc517a13873.png
And an abstract example of non-linear regression:
30fc6dd11c1b41c4965c0d887050aca7.png
And here are examples with hypothetical regression lines for my data:
d5c4dfe89ce94f1e99e84fb94a69697f.png
As far as I understand, plain linear regression on my data will produce a large total error (cost), since the data is noisy and scattered. On the other hand, there is no distinct non-linear dependence here either (for example, a sinusoidal one). Which regression algorithm is more reasonable to use in my case (apartment prices) in order to get a more accurate price prediction? And why is that algorithm (linear or non-linear) the more reasonable choice?
Addendum:
This is what the linear fit of the price on all 23 parameters looks like when plotted over the price vs. area data:
51e576e9520c4f85adf5e7bdc8d21c21.jpg
I don't know what a NON-linear dependence would look like in this case, or whether it would be more reasonable than a linear one.

3 answer(s)
Vasily, 2016-01-26
@Foolleren

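You could set up an equation of the form y = sum over i of (k1_i*x_i^2 + k2_i*x_i + k3_i), that is, a quadratic term, a linear term and a constant for each feature x_i, and fit the coefficients.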
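A minimal sketch of what such a per-feature quadratic fit might look like with NumPy. The data here is synthetic and the function names (`quadratic_design_matrix`, `fit_least_squares`) are made up for illustration; they are not from the thread. One assumption worth noting: the per-feature constants k3 cannot be told apart from each other, so the sketch folds them into a single intercept.

```python
import numpy as np

def quadratic_design_matrix(X):
    """Build columns [1, x_1..x_n, x_1^2..x_n^2] (no cross terms)."""
    X = np.asarray(X, dtype=float)
    ones = np.ones((X.shape[0], 1))  # single intercept replaces the sum of k3 terms
    return np.hstack([ones, X, X ** 2])

def fit_least_squares(A, y):
    """Ordinary least squares: minimize ||A @ coef - y||^2."""
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

# Synthetic stand-in for the real data (3 features instead of 23).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(500, 3))
y = 2.0 + 1.5 * X[:, 0] - 0.7 * X[:, 1] ** 2 + rng.normal(0.0, 0.05, size=500)

A = quadratic_design_matrix(X)
coef = fit_least_squares(A, y)
mse = float(np.mean((A @ coef - y) ** 2))
print("coefficients:", np.round(coef, 3))
print("training MSE:", mse)
```

The model is still linear in its coefficients, so ordinary least squares applies even though the curve is quadratic in each feature. With real features on very different scales, standardizing the columns first would probably help numerically.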

⚡ Kotobotov ⚡, 2016-01-26
@angrySCV

What on earth is on the graph? Are all the parameters mixed together?
It is quite obvious that each parameter contributes to the model with a certain weight; you need to fit the weights for these parameters.
Start with a linear relationship, then you can move on to polynomials of the 2nd or 3rd order if you really can't wait.
You can compute the total error (cost) for every model; whichever model has the smaller error is the better one.
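A rough sketch of that comparison, assuming the error is measured on rows held out from fitting (the answer does not say which data to score on; held-out rows are my assumption, since training error alone always favors the more flexible model). The data is synthetic and the helper names `poly_features` and `holdout_mse` are hypothetical.

```python
import numpy as np

def poly_features(X, degree):
    """Columns [1, X, X^2, ..., X^degree], per feature, no cross terms."""
    X = np.asarray(X, dtype=float)
    cols = [np.ones((X.shape[0], 1))]
    for d in range(1, degree + 1):
        cols.append(X ** d)
    return np.hstack(cols)

def holdout_mse(X, y, degree, n_train):
    """Fit on the first n_train rows, return mean squared error on the rest."""
    A_train = poly_features(X[:n_train], degree)
    A_test = poly_features(X[n_train:], degree)
    coef, *_ = np.linalg.lstsq(A_train, y[:n_train], rcond=None)
    resid = A_test @ coef - y[n_train:]
    return float(np.mean(resid ** 2))

# Synthetic stand-in: price grows linearly with area, plus heavy noise.
rng = np.random.default_rng(1)
n_train = 1500
area = rng.uniform(20.0, 120.0, size=(2000, 1))
price = 1000.0 * area[:, 0] + rng.normal(0.0, 20000.0, size=2000)

# Standardize with training-row statistics so the powers stay well scaled.
mu = area[:n_train].mean(axis=0)
sigma = area[:n_train].std(axis=0)
X = (area - mu) / sigma

errors = {d: holdout_mse(X, price, d, n_train) for d in (1, 2, 3)}
best_degree = min(errors, key=errors.get)
for d, e in errors.items():
    print(f"degree {d}: held-out MSE = {e:.3e}")
```

On data like this the three errors come out nearly identical, which is the practical signal that the extra polynomial terms buy nothing and the scatter is mostly irreducible noise rather than a missed curve.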

Alexander, 2016-01-26
@Grebenshchikov_Alex

Take the average price within each vertical slice of the plot (that is, for each small range of apartment size).
Then try changing the slice width to see the overall trend.
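One possible way to do this slicing with NumPy is sketched below. The function name `binned_means` and the sample data are invented for illustration, and equal-width bins are an assumption; the answer does not specify how to cut the slices.

```python
import numpy as np

def binned_means(x, y, bin_width):
    """Average y inside equal-width bins of x; empty bins are skipped.

    Returns (bin_centers, mean_y_per_bin).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    idx = np.floor((x - x.min()) / bin_width).astype(int)  # bin index per point
    counts = np.bincount(idx)
    sums = np.bincount(idx, weights=y)
    keep = counts > 0
    centers = x.min() + (np.arange(counts.size) + 0.5) * bin_width
    return centers[keep], sums[keep] / counts[keep]

# Synthetic stand-in for the area/price scatter.
rng = np.random.default_rng(2)
area = rng.uniform(20.0, 120.0, size=5000)
price = 1000.0 * area + rng.normal(0.0, 20000.0, size=5000)

for width in (5.0, 10.0, 20.0):
    centers, means = binned_means(area, price, width)
    print(f"bin width {width}: {centers.size} bins, "
          f"first mean {means[0]:.0f}, last mean {means[-1]:.0f}")
```

Plotting `means` against `centers` for a few widths shows whether the averaged trend looks like a straight line or bends, without committing to a model first.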
