# Thread: Linear Regression: Mean square error (MSE)

1. ## Linear Regression: Mean square error (MSE)

Simple linear regression model:
Y_i = β0 + β1*X_i + ε_i , i=1,...,n
where n is the number of data points and ε_i is the random error term.

Let σ^2 = V(ε_i) = V(Y_i)

Then an unbiased estimator of σ^2 is
s^2 = [1/(n-2)] ∑(e_i)^2

where e_i's are the residuals

s^2 is called the "mean square error" (MSE).
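The setup above can be sketched numerically. The following is a minimal illustration (the true coefficients, noise level, and seed are made up for the example): it fits the simple linear regression by the usual least-squares formulas, forms the residuals e_i, and computes s^2 = SSE/(n-2).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x = np.linspace(0.0, 10.0, n)
# Simulated data: true beta0 = 2, beta1 = 3, sigma = 1.5 (so sigma^2 = 2.25)
y = 2.0 + 3.0 * x + rng.normal(scale=1.5, size=n)

# Least-squares estimates of beta1 and beta0 (normal equations)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

# Residuals e_i = y_i - (b0 + b1 * x_i), then MSE = SSE / (n - 2)
e = y - (b0 + b1 * x)
s2 = np.sum(e ** 2) / (n - 2)
print(s2)  # should land in the neighborhood of sigma^2 = 2.25
```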

My concerns:
1) The GENERAL formula for the sample variance is s^2 = [1/(n-1)] ∑(y_i - ȳ)^2. It's defined on the first pages of my statistics textbook, and I've been using it again and again. Now I don't see how this general formula (which I thought always holds) reduces to the formula above. Why do we have (n-2) and the e_i's in the formula for s^2?

2) From what I've learnt in previous stat courses, the "mean square error" of a point estimator is by definition
MSE(θ hat) = E[(θ hat - θ)^2]

Is this the same MSE as the one above? Are they related at all?
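The definition MSE(θ̂) = E[(θ̂ - θ)^2] can be approximated by Monte Carlo, which makes it concrete. A small sketch (the distribution, sample size, and repetition count are arbitrary choices for illustration), using the sample mean as the estimator, where theory gives MSE = σ²/n since the sample mean is unbiased:

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, n, reps = 5.0, 2.0, 25, 20000

# Draw `reps` independent samples of size n; theta_hat = sample mean of each
xbar = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)

# Monte Carlo estimate of MSE(theta_hat) = E[(theta_hat - theta)^2]
mse_hat = np.mean((xbar - mu) ** 2)
print(mse_hat)  # theory: sigma^2 / n = 4.0 / 25 = 0.16
```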

Any help is greatly appreciated!

note: also under discussion in Talk Stats forum

2. My textbook also says that the sample variance s^2 = [1/(n-1)] ∑(y_i - ȳ)^2 has n-1 in the denominator because it has n-1 degrees of freedom.

And s^2 = [1/(n-2)] ∑(e_i)^2 has n-2 in the denominator because it has n-2 degrees of freedom.
Now I am puzzled... what are "degrees of freedom"? Why does the regression estimator have n-2 degrees of freedom? What is the simplest way to understand this?
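One way to see why the n-2 divisor matters is by simulation (the true line, σ², and repetition count below are made up for the example): across many simulated regressions, SSE/(n-2) averages out to σ², whereas SSE/n comes out too small, because estimating the two parameters β0 and β1 "uses up" two degrees of freedom.

```python
import numpy as np

rng = np.random.default_rng(2)
n, sigma2, reps = 10, 4.0, 20000
x = np.arange(n, dtype=float)

sse = np.empty(reps)
for r in range(reps):
    # Simulate from a known line with error variance sigma2 = 4.0
    y = 1.0 + 0.5 * x + rng.normal(scale=np.sqrt(sigma2), size=n)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()
    e = y - (b0 + b1 * x)
    sse[r] = np.sum(e ** 2)

mean_unbiased = np.mean(sse / (n - 2))  # approx sigma2 = 4.0
mean_biased = np.mean(sse / n)          # approx sigma2 * (n-2)/n = 3.2, too small
print(mean_unbiased, mean_biased)
```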

Thanks!