# Thread: A Better Linear Regression

1. ## A Better Linear Regression

I am trying to fit a line to data that should follow y = 3x but contains one bad data point.
x = 0, 2, 4, 6, 8 and y = 0, 6, 12, 18, 8

The linear regression is y = 1.4x + 3.2, and it correctly minimizes the sum of squared errors, but I would like to find a regression that better fits the real data and gives less emphasis to the bad data point (x = 8). In the real world, it is not known which point is the bad one, so weighting is not an acceptable option.
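For anyone who wants to check the numbers above, here is a minimal sketch of the closed-form least squares fit in plain Python. The function names `ols_fit` and `sum_sq_error` are just illustrative choices, not from any library. It also compares the sum of squared errors of the fitted line against the "true" line y = 3x, which shows why least squares prefers the tilted line: squaring makes the single miss of 16 at x = 8 very expensive.

```python
# Ordinary least squares for a straight line y = slope * x + intercept,
# using the textbook closed-form formulas (no libraries needed).
xs = [0, 2, 4, 6, 8]
ys = [0, 6, 12, 18, 8]  # the last point is the suspected bad one


def ols_fit(xs, ys):
    """Return (slope, intercept) minimizing the sum of squared errors."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


def sum_sq_error(slope, intercept, xs, ys):
    """Sum of squared vertical distances from the points to the line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


slope, intercept = ols_fit(xs, ys)
print(f"y = {slope:.1f}x + {intercept:.1f}")  # y = 1.4x + 3.2

sse_ols = sum_sq_error(slope, intercept, xs, ys)   # about 102.4
sse_true = sum_sq_error(3.0, 0.0, xs, ys)          # 256.0
print(sse_ols, sse_true)
```

So under the squared-error criterion the line y = 3x really is worse (256 versus roughly 102.4), even though it passes exactly through four of the five points.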

If I look at the area between the curves, that area appears to be minimized by fitting through the correct data, which is the answer I am looking for.

Can anyone refer me to regression equations (a derivation would be great) that minimize the area between a linear fit and the experimental data (x independent)? I want to learn how to do this, so a MATLAB solution does not help.
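One closely related standard technique, offered here as an assumption about what "area between the curves" is getting at rather than an exact match, is least absolute deviations (LAD) regression: choose the slope a and intercept b that minimize the sum of |y_i - (a x_i + b)| instead of the sum of squares. A known property of this problem is that some optimal line passes through at least two of the data points, so for a small data set you can simply try every pair. The sketch below does that in plain Python; `lad_fit` and `sum_abs_error` are illustrative names, not library functions.

```python
from itertools import combinations

xs = [0, 2, 4, 6, 8]
ys = [0, 6, 12, 18, 8]  # the last point is the suspected bad one


def sum_abs_error(slope, intercept, xs, ys):
    """Sum of absolute vertical distances from the points to the line."""
    return sum(abs(y - (slope * x + intercept)) for x, y in zip(xs, ys))


def lad_fit(xs, ys):
    """Brute-force least absolute deviations fit.

    Tries the line through every pair of points and keeps the one with
    the smallest sum of absolute errors. This relies on the fact that an
    optimal LAD line can always be found among lines through two points.
    """
    best = None
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        if x1 == x2:
            continue  # vertical line, cannot be written as y = a*x + b
        slope = (y2 - y1) / (x2 - x1)
        intercept = y1 - slope * x1
        err = sum_abs_error(slope, intercept, xs, ys)
        if best is None or err < best[0]:
            best = (err, slope, intercept)
    err, slope, intercept = best
    return slope, intercept, err


lad_slope, lad_intercept, lad_err = lad_fit(xs, ys)
print(lad_slope, lad_intercept, lad_err)  # 3.0 0.0 16.0

# For comparison, the least squares line y = 1.4x + 3.2 under the same criterion:
ols_abs_err = sum_abs_error(1.4, 3.2, xs, ys)  # about 19.2
```

On this data the LAD fit recovers y = 3x without being told which point is bad. The pairwise search costs on the order of n^3 operations, so for larger data sets the usual routes are linear programming or iteratively reweighted least squares.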

2. In reply to TheWizard:

In real life, no one fits data generated from a given equation and ends up with exactly one bad number. If you do end up with a bad number and force it into the fit, the result gives nothing but false information.

Suppose that the numbers you obtained from an experiment are x = 0, 2, 4, 6, 8 and y = 0, 6, 12, 18, 8, and you know that they are as close as anticipated to your hypothesis. Then you should already have a good feel for which equation to fit.

Suppose instead that you did your experiment with no clue about the outcome, and you got x = 0, 2, 4, 6, 8 and y = 0, 6, 12, 18, 8. Then you must first plot your scatter diagram. If the diagram tells you that the data are close to a straight line, you use the straight-line equation. To confirm the goodness of fit, you find the coefficient of correlation. If the coefficient of correlation is too poor, you can go to the next-degree curve and fit a quadratic. If the quadratic curve's coefficient of correlation is still poor, you can go up another degree, i.e. to the cubic curve. Continuing in this way, you can go all the way up to something as exotic as the logistic curve.
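As a rough illustration of the goodness-of-fit check described above, the sketch below computes the Pearson coefficient of correlation for the straight-line case in plain Python. The helper name `pearson_r` is an illustrative choice, and taking "coefficient of correlation" to mean Pearson's r is an assumption.

```python
import math

xs = [0, 2, 4, 6, 8]
ys = [0, 6, 12, 18, 8]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


r_all = pearson_r(xs, ys)             # roughly 0.66 with all five points
r_first4 = pearson_r(xs[:4], ys[:4])  # 1.0 for the first four points alone
print(r_all, r_first4)
```

With all five points r comes out at roughly 0.66, while the first four points on their own give a perfect 1.0, so on this data a single point accounts for the whole drop in correlation.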