I have no idea what this has to do with least squares.
But
and by definition , so .
Thus .
Hello everyone!
My question is about the steps one takes to derive least squares estimators. It's a question that is part of a topic I am studying at uni. Obviously it's important that I do this myself and learn how to, so I am only posting the first 2 questions; hopefully with some help I can do the rest myself and maybe check my answers with the brains in this forum.
Thanks
Thanks very much for the help.
I know what you mean - this doesn't have relevance to least squares - yet. The question is sort of leading me by the hand through the process of deriving least squares estimators.
This was only the beginning of the questions, and I didn't really think posting all the questions would be the right thing to do.
Haha!
I mean I would feel bad posting the entire thing - sort of guilty, and I feel that I should do it myself to learn.
I think what throws me off is the sigmas; they confuse me a lot. Looking at your answer, it's really easy to follow, but I just didn't even think to do that.
The part I am struggling with right now is:
Just getting my head around what it means is proving to be difficult. Are a and b just constants (could be any number)? Is the idea to try to make the LHS (the bit on the left with the y's) equal the RHS? How would I start off doing this?
A few corrections:
You can't use both i and k as the index, only one.
It should be beta_1, not beta_i.
It should be x_i, not x_1.
AND most importantly, β is not the same thing as β hat: β is an unknown parameter and β hat is a rv that estimates it.
IF you are asking how to show these are the LSEs, just differentiate L wrt the two parameters and set both derivatives equal to zero. This is just calc one.
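For what it's worth, here is a rough sketch of what that calculus step leads to. It assumes the usual simple linear regression setup, with L = ∑(y_i - b0 - b1 x_i)^2; L is not written out in this thread, so that form, the function names and the toy data are all my own assumptions. Setting dL/db0 = 0 and dL/db1 = 0 and solving gives b1 = S_xy / S_xx and b0 = y bar - b1 x bar, and the code checks that both partial derivatives really are zero at that point.

```python
def fit_line(x, y):
    """Closed-form least squares estimates for y = b0 + b1*x (assumed model)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx           # slope: solves dL/db1 = 0
    b0 = y_bar - b1 * x_bar    # intercept: solves dL/db0 = 0
    return b0, b1


def gradient(x, y, b0, b1):
    """Partial derivatives of L = sum (y_i - b0 - b1*x_i)^2 wrt b0 and b1."""
    d_b0 = -2 * sum(yi - b0 - b1 * xi for xi, yi in zip(x, y))
    d_b1 = -2 * sum(xi * (yi - b0 - b1 * xi) for xi, yi in zip(x, y))
    return d_b0, d_b1


# Made-up data, purely for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_line(x, y)
g0, g1 = gradient(x, y, b0, b1)
print("b0, b1:", b0, b1)
print("gradient at the estimates:", g0, g1)  # both should be ~0 up to rounding
```

The second derivative test mentioned elsewhere in the thread is what confirms this stationary point is a minimum; the sketch only checks the first-order conditions.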
Thanks for your help, MathEagle. I am sort of struggling with the concept, so to get the proof rolling I have to:
Minimise
Thus
Then derive the OLS estimates of and in order to obtain the least squares estimators that were given in the question?
Thanks for your time.
For the complete derivation of the least squares estimates, look at
Simple linear regression - Wikipedia, the free encyclopedia
I was learning this last week. The proof is very understandable and complete, and it also includes the 2nd derivative test, which is great.
Good luck!
The best way to solve least squares for ANY model is to use matrices.
Writing the model as
Y = Xβ + ε,
where Y is a column vector of your responses, β a column vector of ALL of your parameters and X your design matrix,
the least squares solution is β hat = (X'X)^(-1) X'Y.
Now β hat is unbiased for β and SSE = (Y - Xβ hat)'(Y - Xβ hat),
and the unbiased estimator of σ^2 is MSE = SSE/(n-(k+1)), where k+1 is just the number of parameters.
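A small sketch of the matrix route, in case it helps to see it run. It assumes the standard normal-equations form (X'X) β hat = X'Y and solves that system by Gaussian elimination instead of forming the inverse explicitly. Everything here (helper names, toy data, the plain-Python matrix code) is hypothetical and only meant for illustration; with a real data set you would normally reach for a linear algebra library.

```python
def transpose(a):
    return [list(row) for row in zip(*a)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(p * q for p, q in zip(row, col)) for col in bt] for row in a]


def solve(a, b):
    """Solve a z = b (a square, b a column) by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(a[i]) + [b[i][0]] for i in range(n)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (m[i][n] - s) / m[i][i]
    return z


def least_squares(X, y):
    """Return (beta_hat, SSE, MSE) for the assumed model Y = X beta + eps."""
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = matmul(Xt, [[v] for v in y])
    beta_hat = solve(XtX, Xty)  # solves the normal equations
    fitted = [sum(xij * bj for xij, bj in zip(row, beta_hat)) for row in X]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    n, p = len(X), len(X[0])    # p = number of parameters (k+1)
    return beta_hat, sse, sse / (n - p)


# Made-up data; the column of ones in X carries the intercept.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
X = [[1.0, xi] for xi in x]

beta_hat, sse, mse = least_squares(X, y)
print("beta_hat:", beta_hat)
print("SSE:", sse, "MSE:", mse)
```

Adding more predictors only means adding more columns to X; the same function then handles any linear model, which is the point of the matrix form.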
Typically, to estimate V(X_i), we use the sample variance S^2 = (1/(n-1))[∑(X_i - X bar)^2].
Now, by the definition of variance, V(ε_i) = E[( ε_i-E(ε_i) )^2], so to estimate σ^2 = V(ε_i) in the context of simple linear regression, shouldn't we use MSE = S^2 = (1/(n-2))[∑(ε_i - ε bar)^2]? This form looks much more similar and analogous to the formula for the sample variance above.
However, I know that the estimator of V(ε_i) [which is the MSE] is based on the residuals e_i, not ε_i. What is the reason behind it? What explains the discrepancy?
Thank you~
Nope, this is a totally different animal.
Here MSE = SSE/(n-p), where n is the number of observations and p is the number of parameters.
BUT, if your model is Y_i = μ + ε_i, then MSE is SSE/(n-1).
It's the distributions that count.
We need our t distribution and F's in order to make tests and obtain confidence and prediction intervals.
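To make the n-1 versus n-2 point concrete, here is a rough numerical sketch. It takes the one-parameter case to be the intercept-only model Y_i = μ + ε_i (that reading is an assumption on my part), for which the fitted value is just Y bar and SSE/(n-1) coincides with the ordinary sample variance. For the straight-line model two parameters are estimated, the fitted values are Y_i hat, and the divisor becomes n-2. The data and variable names are made up.

```python
import statistics

# Made-up data, purely for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(y)

# Intercept-only model (1 parameter): every fitted value is y_bar.
y_bar = sum(y) / n
sse_mean_only = sum((yi - y_bar) ** 2 for yi in y)
mse_mean_only = sse_mean_only / (n - 1)

# Straight-line model (2 parameters): fitted values are b0 + b1*x_i.
x_bar = sum(x) / n
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
b0 = y_bar - b1 * x_bar
sse_line = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
mse_line = sse_line / (n - 2)

print("sample variance of y:      ", statistics.variance(y))
print("MSE, intercept-only model: ", mse_mean_only)  # same as the line above
print("MSE, straight-line model:  ", mse_line)
```

In both cases the recipe is the same: the sum of squared residuals divided by n minus the number of estimated parameters. The sample variance is the special case with one parameter.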
S^2 = (1/(n-1))[∑(X_i - X bar)^2] (estimator for V(X_i), the GENERAL formula for the sample variance used throughout ch.1-10 in Wackerly, which I believe ALWAYS holds)
S^2 = (1/(n-2))[∑(Y_i - Y_i hat)^2] (estimator for V(ε_i) = V(Y_i) )
Why are we using Y_i hat here instead of Y bar (the sample mean)? The sample mean Y bar is always the best unbiased estimator of the population mean μ = E(Y_i), so shouldn't we always use Y bar in calculating the sample variance? What makes V(ε_i) so different from ch.1-10 in Wackerly?
Thank you~