Thread: Deriving the Least Squares Estimates

1. Deriving the Least Squares Estimates

Hello everyone!

My question is about the steps one takes to derive the least squares estimators. It's part of a topic I am studying at uni. Obviously it's important that I do this myself and learn how, so I am only posting the first two questions; hopefully with some help I can do the rest myself and maybe check my answers with the brains in this forum.

Thanks

2. I have no idea what this has to do with least squares.

But $\displaystyle \sum_{k=1}^n\bar x=\bar x\sum_{k=1}^n 1=n\bar x$

and by definition $\displaystyle \bar x={\sum_{k=1}^nx_k\over n}$, so $\displaystyle n\bar x=\sum_{k=1}^nx_k$.

Thus $\displaystyle \sum_{k=1}^nx_k=n\bar x=\sum_{k=1}^n\bar x$.
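A quick numerical check of this identity (a sketch with arbitrary made-up numbers, not data from the question):

```python
# Verify sum(x_k) = n * x_bar = sum of x_bar taken n times.
x = [2.0, 5.0, 7.0, 11.0]
n = len(x)
x_bar = sum(x) / n

assert abs(sum(x) - n * x_bar) < 1e-12
# Summing the constant x_bar over k = 1..n gives the same total:
assert abs(sum(x_bar for _ in x) - sum(x)) < 1e-12
```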

3. Thanks very much for the help.

I know what you mean - this doesn't have relevance to least squares - yet. The question is sort of leading me by the hand through the process of deriving the least squares estimators.

This was only the beginning of the questions, and I didn't really think posting all of them would be the right thing to do.

4. the suspense is killing me

5. Haha!

I mean, I would feel bad posting the entire thing - sort of guilty, and that I should do it myself to learn.

I think what throws me off is the sigmas; they confuse me a lot. Looking at your answer, it's really easy to follow, but I just didn't even think to do that.

The part I am struggling with right now is:

Just getting my head around what it means is proving difficult. Are a and b just constants (could they be any number)? Is the idea to try to make the LHS (the bit on the left with the y's) equal the RHS? How would I start off doing this?

6. I figured this was next and I was going to state it yesterday.
All you need to do is expand the sum.
BUT, from $\displaystyle \sum_{k=1}^nx_k=\sum_{k=1}^n\bar x$ we have...

$\displaystyle \sum_{k=1}^n(x_k-\bar x) =\sum_{k=1}^nx_k-\sum_{k=1}^n\bar x=0$.
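The same kind of sanity check works for the deviation sum (again with arbitrary made-up numbers):

```python
# Deviations from the sample mean always sum to zero (up to rounding).
x = [3.0, 1.0, 4.0, 1.0, 5.0]
x_bar = sum(x) / len(x)

assert abs(sum(xk - x_bar for xk in x)) < 1e-12
```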

7. Deriving the Least Squares Estimates

Speaking of this topic, I am having trouble with this question:

Prove that by choosing $\displaystyle b_0$ and $\displaystyle b_1$ to minimize $\displaystyle \sum_{k=1}^n (y_i - b_0 - b_1x_1)^2$ you obtain the least squares estimators, namely:

$\displaystyle b_1=beta_1={\sum_{k=1}^n(x_i-\bar x)(y_i-\bar y)\over \sum_{k=1}^n(x_i-\bar x)^2}$

$\displaystyle b_0=beta_0=\bar y-b_i\bar x$

Wow, that syntax was a pain to work out...

Thanks for any help.

8. A few corrections...
you can't use both i and k; pick one index
it's $\displaystyle b_1$, not $\displaystyle b_i$
and $\displaystyle x_i$, not $\displaystyle x_1$...

Originally Posted by Campari

Prove that by choosing $\displaystyle b_0$ and $\displaystyle b_1$ to minimize $\displaystyle L=\sum_{i=1}^n (y_i - b_0 - b_1x_i)^2$ you obtain the least squares estimators, namely:

$\displaystyle b_1={\sum_{i=1}^n(x_i-\bar x)(y_i-\bar y)\over \sum_{i=1}^n(x_i-\bar x)^2}$

$\displaystyle b_0=\bar y-b_1\bar x$

AND most importantly $\displaystyle b_1\ne \beta_1$. $\displaystyle \beta_1$ is an unknown parameter and $\displaystyle b_1$ is a rv that estimates it.

IF you are asking how to show these are the LSEs just differentiate L wrt the two parameters and set equal to zero. This is just calc one.

9. Thanks for your help, MathEagle. I am sort of struggling with the concept, so to get the proof rolling I have to:

Minimise $\displaystyle SSE(\hat {\beta}_0, \hat {\beta}_1) = \sum_{i=1}^n (y_i-\hat {\beta}_0-\hat {\beta}_1x_i)^2$

Thus

$\displaystyle {\partial SSE \over \partial \hat {\beta}_0} = 2 \sum_{i=1}^n (y_i-\hat {\beta}_0-\hat {\beta}_1x_i) (-1) = 0$

$\displaystyle {\partial SSE \over \partial \hat {\beta}_1} = 2 \sum_{i=1}^n (y_i-\hat {\beta}_0-\hat {\beta}_1x_i) (-x_i) = 0$

Then solve for the OLS estimates $\displaystyle \hat {\beta}_0$ of $\displaystyle \beta_0$ and $\displaystyle \hat {\beta}_1$ of $\displaystyle \beta_1$, in order to obtain the least squares estimators given in the question?
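As a sketch of that plan (with made-up data, not the data from any question here), one can check numerically that the closed-form estimators satisfy both normal equations:

```python
# Made-up sample for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Closed-form least squares estimators from the question.
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / \
     sum((xi - x_bar) ** 2 for xi in x)
b0 = y_bar - b1 * x_bar

# Both normal equations (the two derivatives set to zero) should hold.
eq0 = sum(yi - b0 - b1 * xi for xi, yi in zip(x, y))          # from d/d(b0)
eq1 = sum((yi - b0 - b1 * xi) * xi for xi, yi in zip(x, y))   # from d/d(b1)
assert abs(eq0) < 1e-9 and abs(eq1) < 1e-9
```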

10. For the complete derivation of the least squares estimates, look at
Simple linear regression - Wikipedia, the free encyclopedia

I was learning this last week. The proof is very understandable and complete, and it also includes the second derivative test, which is great.

Good luck!

11. The best way to solve least squares for ANY model is to use matrices.

Writing the model $\displaystyle y=\beta_0+\beta_1x_1+\cdots+\beta_kx_k+\epsilon$ as $\displaystyle Y=X\beta+\epsilon$

where $\displaystyle Y$ is a column vector of your responses, $\displaystyle \beta$ a column vector of ALL of your parameters, and $\displaystyle X$ your design matrix,

the least squares solution is $\displaystyle \hat\beta =(X^tX)^{-1}X^tY$.

Now $\displaystyle \hat\beta$ is unbiased for $\displaystyle \beta$ and SSE $\displaystyle =Y^tY-\hat\beta^t X^tY$

and the unbiased estimator of $\displaystyle \sigma^2$ is MSE=SSE/(n-(k+1)), where k+1 is just the number of parameters.
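A minimal numerical sketch of the matrix route for the simple model $\displaystyle y=\beta_0+\beta_1x+\epsilon$ (made-up data; assumes NumPy is available):

```python
import numpy as np

# Made-up sample for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Design matrix: a column of ones (intercept) and the x column.
X = np.column_stack([np.ones_like(x), x])

# Least squares solution (X^t X)^{-1} X^t Y.
beta_hat = np.linalg.inv(X.T @ X) @ X.T @ y

# Agrees with the closed-form simple-regression estimators.
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
assert np.allclose(beta_hat, [b0, b1])

# SSE = Y^t Y - beta_hat^t X^t Y, and MSE = SSE / (n - (k+1)).
sse = y @ y - beta_hat @ X.T @ y
mse = sse / (len(x) - X.shape[1])
assert sse >= 0
```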

12. But back to your basic model of $\displaystyle y=\beta_0+\beta_1x+\epsilon$

the SSxy term can be written three ways since....

$\displaystyle \sum_{k=1}^n(x_k-\bar x) =0$ or better yet $\displaystyle \sum_{k=1}^n(y_k-\bar y) =0$.

SSxy can be written as $\displaystyle \sum_{k=1}^n (x_k-\bar x) (y_k-\bar y)$

or $\displaystyle \sum_{k=1}^n (x_k-\bar x) y_k$, which is useful when deriving the statistical properties of the betas,

or $\displaystyle \sum_{k=1}^n x_k(y_k-\bar y)$.
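A quick numerical check (made-up numbers) that the three forms of SSxy agree:

```python
# Made-up sample for illustration only.
x = [1.0, 2.0, 4.0, 7.0]
y = [1.5, 3.0, 5.5, 9.0]
x_bar = sum(x) / len(x)
y_bar = sum(y) / len(y)

form1 = sum((xk - x_bar) * (yk - y_bar) for xk, yk in zip(x, y))
form2 = sum((xk - x_bar) * yk for xk, yk in zip(x, y))
form3 = sum(xk * (yk - y_bar) for xk, yk in zip(x, y))

# All three agree because the deviation sums are zero.
assert abs(form1 - form2) < 1e-12 and abs(form1 - form3) < 1e-12
```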

13. Originally Posted by matheagle
and the unbiased estimator of $\displaystyle \sigma^2$ is MSE=SSE/(n-(k+1)), where k+1 is just the number of parameters.
Typically, to estimate $\displaystyle V(X_i)$, we use the sample variance $\displaystyle S^2 = {1\over n-1}\sum_{i=1}^n(X_i - \bar X)^2$.

Now, by the definition of variance, $\displaystyle V(\epsilon_i) = E[(\epsilon_i-E(\epsilon_i))^2]$, so to estimate $\displaystyle \sigma^2 = V(\epsilon_i)$ in the context of simple linear regression, shouldn't we use MSE $\displaystyle = S^2 = {1\over n-2}\sum_{i=1}^n(\epsilon_i - \bar\epsilon)^2$? This form looks much more analogous to the formula for the sample variance above.

However, I know that the estimator of $\displaystyle V(\epsilon_i)$ [which is the MSE] is based on the residuals $\displaystyle e_i$, not $\displaystyle \epsilon_i$. What is the reason behind it? What explains the discrepancy?

Thank you~
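One numerical side note (a sketch with made-up numbers): when the fitted model includes an intercept, the first normal equation forces the residuals $\displaystyle e_i$ to sum to zero, so $\displaystyle \bar e = 0$ and $\displaystyle \sum(e_i-\bar e)^2=\sum e_i^2$, which is why no mean term appears in the MSE formula.

```python
# Made-up sample for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.8, 4.2, 5.9, 8.3, 9.8]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Fit the simple regression with an intercept.
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / \
     sum((xi - x_bar) ** 2 for xi in x)
b0 = y_bar - b1 * x_bar

# Residuals e_i = y_i - (b0 + b1*x_i) sum to zero, unlike the
# unobservable errors eps_i.
e = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
assert abs(sum(e)) < 1e-9
```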

14. Nope, this is a totally different animal.

Here SSE $\displaystyle \sim\sigma^2\chi^2_{n-p}$ where n is the number of observations and p is the number of parameters.

BUT, if your model is $\displaystyle y=mx+\epsilon$, then MSE is SSE/(n-1).

It's the distributions that count.
We need our t distribution and F's in order to make tests and obtain confidence and prediction intervals.
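For the one-parameter model $\displaystyle y=mx+\epsilon$, a quick sketch (made-up data; NumPy assumed) of SSE/(n-1):

```python
import numpy as np

# Made-up sample for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.2, 3.9, 6.1, 8.2])

# Least squares slope for the no-intercept model y = m*x + eps.
m_hat = np.sum(x * y) / np.sum(x ** 2)

resid = y - m_hat * x
sse = np.sum(resid ** 2)
mse = sse / (len(x) - 1)   # n - p with p = 1 parameter

# The single normal equation sum(x_i * resid_i) = 0 holds.
assert abs(float(np.sum(x * resid))) < 1e-9
```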

15. $\displaystyle S^2 = {1\over n-1}\sum_{i=1}^n(X_i - \bar X)^2$ (estimator for $\displaystyle V(X_i)$; the GENERAL formula for the sample variance used throughout ch. 1-10 in Wackerly, which I believe ALWAYS holds)

$\displaystyle S^2 = {1\over n-2}\sum_{i=1}^n(Y_i - \hat Y_i)^2$ (estimate for $\displaystyle V(\epsilon_i) = V(Y_i)$)

Why are we using $\displaystyle \hat Y_i$ here instead of $\displaystyle \bar Y$ (the sample mean)? The sample mean $\displaystyle \bar Y$ is always the best unbiased estimator of the population mean $\displaystyle \mu = E(Y_i)$, so shouldn't we always use $\displaystyle \bar Y$ in calculating the sample variance? What makes $\displaystyle V(\epsilon_i)$ so different from ch. 1-10 in Wackerly?

Thank you~
