β
ε
note: also under discussion in "Talk Stats Forum"
For a linear model with one explanatory variable, $\displaystyle b_{0}$ and $\displaystyle b_{1}$ are the parameters.
As in any line equation, say $\displaystyle y=5x+8$,
5 is the slope and 8 is the y-intercept.
Without them, you don't have much. Just an arbitrary x value.
$\displaystyle \epsilon$ is the random error term, and the residuals are its observed counterpart: the differences between the actual and estimated y values when we fit a line of best fit.
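To make the residual idea concrete, here is a minimal sketch in plain Python. The five data points are made up for illustration (scattered around y = 5x + 8), and `fit_line` is just a helper name chosen for this example, not a library function.

```python
# Illustrative data only: made-up points scattered around y = 5x + 8.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [13.2, 17.9, 23.1, 27.8, 33.0]

def fit_line(xs, ys):
    """Return (b0, b1) for the least-squares line through the points."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
    sxx = sum((xi - x_bar) ** 2 for xi in xs)
    b1 = sxy / sxx            # estimated slope
    b0 = y_bar - b1 * x_bar   # estimated intercept
    return b0, b1

b0, b1 = fit_line(x, y)

# Residual = actual y minus the y value on the fitted line (signed).
fitted = [b0 + b1 * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

print("b0, b1:", b0, b1)
print("residuals:", residuals)
```

The estimates come out close to 8 and 5 but not exactly equal to them, and the residuals of a least-squares line with an intercept always sum to (numerically) zero.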
1) But why do we need to ESTIMATE β0 and β1? Just fit a least-squares line of best fit to the scatter plot, then we have a line, so we know the values of β0 and β1. Why are β0 and β1 unknown? This is what I don't seem to understand.
I can see the difference between population mean and sample mean. So I guess β0 and β1 are the true POPULATION parameters while b0 and b1 are the sample estimates? But what is the "population" in this case?
2) I think I have a pretty clear concept and picture in my mind of what a "residual" is. I can see a scatter plot with lots of points and a fitted line. The residual for each point is just the (signed) vertical deviation between that point and the fitted line.
However, I still don't understand what a random error (ε) is. What is the meaning of it? How can we calculate the value of ε? And how can it be displayed graphically?
Thank you for answering!
$\displaystyle \hat \beta_0$ and $\displaystyle \hat \beta_1$ are unbiased estimators of the unknown parameters $\displaystyle \beta_0$ and $\displaystyle \beta_1$.
It's just like $\displaystyle \mu$: you will never know it, but you can estimate it from some data via $\displaystyle \bar X$.
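One way to see what "unbiased" means here is a small simulation. This is only a sketch under assumed values: the "true" β0 = 8, β1 = 5 and error standard deviation 2 are invented for the demo (in real life you would not know them), and `fit_line` is a hypothetical helper name.

```python
import random

# Hypothetical "true" population values, chosen only for this demo.
BETA0, BETA1, SIGMA = 8.0, 5.0, 2.0

def fit_line(xs, ys):
    """Return (b0, b1) for the least-squares line through the points."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
    sxx = sum((xi - x_bar) ** 2 for xi in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

random.seed(0)  # fixed seed so the run is repeatable
x = [float(i) for i in range(1, 11)]  # the same 10 x values in every sample

b0_draws, b1_draws = [], []
for _ in range(2000):
    # One new sample: y = beta0 + beta1 * x + random error
    y = [BETA0 + BETA1 * xi + random.gauss(0.0, SIGMA) for xi in x]
    b0, b1 = fit_line(x, y)
    b0_draws.append(b0)
    b1_draws.append(b1)

mean_b0 = sum(b0_draws) / len(b0_draws)
mean_b1 = sum(b1_draws) / len(b1_draws)

print("range of b1 across samples:", min(b1_draws), max(b1_draws))
print("average of b0 and b1 over all samples:", mean_b0, mean_b1)
```

Each individual sample gives a different b0 and b1, but their averages over many samples land close to the assumed β0 and β1. That is the same sense in which $\displaystyle \bar X$ is unbiased for $\displaystyle \mu$.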
1) OK, so Yhat = b0 + b1*X is the sample regression equation based on our observed data points (observed sample) and
E(Y) = β0 + β1*X is the population regression equation based on the entire population.
For example, say we have height vs. age (Y vs. X). The population is ALL the data points from the ENTIRE population, and we can IMAGINE a population line of best fit going through all those data points, but we will never actually know what it is (and we will never know the exact values of β0 and β1). The sample would be, say, 10 data points, so the scatter plot will have 10 points, and the sample line of best fit is based on b0 and b1. Right?
1) I think I get it now. In short, we can't get β0 and β1 because we don't have the scatter plot for the ENTIRE population. We can only get b0 and b1 based on a particular SAMPLE.
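The population-versus-sample picture can be sketched with simulated data, which also shows how a random error differs from a residual. Everything numeric here is an assumption made up for illustration: a population line of height = 80 + 6 * age (cm, years) with error standard deviation 4, and a sample of 10 ages. `fit_line` is again a hypothetical helper name.

```python
import random

# Hypothetical population line for height (cm) vs. age (years); invented values.
BETA0, BETA1, SIGMA = 80.0, 6.0, 4.0

def fit_line(xs, ys):
    """Return (b0, b1) for the least-squares line through the points."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
    sxx = sum((xi - x_bar) ** 2 for xi in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

random.seed(1)
ages = [float(a) for a in range(2, 12)]  # a sample of 10 ages

# Random errors: vertical deviations from the POPULATION line.
# We can list them only because we generated the data ourselves.
errors = [random.gauss(0.0, SIGMA) for _ in ages]
heights = [BETA0 + BETA1 * a + e for a, e in zip(ages, errors)]

# Residuals: vertical deviations from the fitted SAMPLE line.
b0, b1 = fit_line(ages, heights)
residuals = [h - (b0 + b1 * a) for a, h in zip(ages, heights)]

sse_true_line = sum(e ** 2 for e in errors)
sse_fitted_line = sum(r ** 2 for r in residuals)

print("sample estimates b0, b1:", b0, b1)
for e, r in zip(errors, residuals):
    print("error:", round(e, 2), " residual:", round(r, 2))
print("sum of squares about true line:", sse_true_line)
print("sum of squares about fitted line:", sse_fitted_line)
```

With real data only the residuals can be computed, because the population line is unknown. The residuals act as estimates of the errors. Graphically, an error is the vertical distance from a point to the (unseen) population line, while a residual is the vertical distance to the fitted line. Since least squares picks the line with the smallest sum of squared deviations for this sample, the fitted line's sum of squares is never larger than the true line's.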
Another question:
3) "E(Y) = β0 + β1*X
Y hat = b0 + b1*X
where b0 and b1 are estimators of β0 and β1, respectively.
Then Y hat is clearly an estimator of E(Y)"
(i) Why is Y hat clearly an estimator of E(Y)?
(ii) Also, if Y hat is an estimator of E(Y), shouldn't the notation be
$\displaystyle \widehat{E(Y)}$
where the hat is taken over the whole E(Y)? Using the notation Y hat as an estimator of E(Y) doesn't seem consistent with the common usage of "hat": a hat above something usually means that it is estimating the thing under the hat, but here we have Y hat instead of "[E(Y)] hat". How come?
Thanks for answering!
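For (i), a short derivation may help. It is a sketch under the usual assumptions, which the quoted passage does not spell out: the model is Y = β0 + β1 x + ε with E(ε) = 0, x is treated as fixed, and b0 and b1 are unbiased as stated earlier in the thread.

```latex
\begin{align*}
Y &= \beta_0 + \beta_1 x + \epsilon, \qquad E(\epsilon) = 0
  && \text{(assumed model)} \\
E(Y) &= \beta_0 + \beta_1 x \\
\hat{Y} &= b_0 + b_1 x \\
E(\hat{Y}) &= E(b_0) + E(b_1)\,x = \beta_0 + \beta_1 x = E(Y)
  && \text{(using } E(b_0) = \beta_0,\ E(b_1) = \beta_1 \text{)}
\end{align*}
```

So Y hat is built by plugging the estimates b0 and b1 into the formula for E(Y), and on average it equals E(Y) at that x. For (ii), some textbooks do write $\displaystyle \widehat{E(Y)}$ or $\displaystyle \hat\mu_{Y|x}$ for exactly this reason. Y hat is a common shorthand, probably because the same number b0 + b1*X is also used as the point prediction for a single new Y at that X.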