# Thread: Regression analysis and the error term

1. ## Regression analysis and the error term

In a regression analysis of the form:

y = a + bx + E

where E is the error term, what is the error term composed of? I know it is the unexplained variation in observed y when using x to predict y, but is it the average variation or the total variation? For example, suppose you have a regression that explains most of the variation in observed y, and you have a million data points. The unexplained variation at each point is small, but the total is large because there are so many data points.

Now compare this to a regression equation with only a few data points but a large amount of unexplained variation. The average is large but the total is small, at least compared to the total for the million data points. Therefore the equation based on lots of data looks like it has a larger error.

This doesn't seem right to me, so is the error term based on the average error or some similar measure?

2. ## Re: Regression analysis and the error term

It may be clearer to write

$\displaystyle y_i = a + bx_i + e_i$

$\displaystyle e_i$ is the unexplained variation for that particular data point i. It's a random variable which (according to the model) takes different realisations at each data point.

Your comments about "total" errors being large or small in different scenarios are a bit misguided: the fitted residuals always satisfy $\displaystyle \sum e_i = 0$ in any OLS regression fit that includes an intercept, as this one does. (Try proving this if you don't believe me!)
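If it helps to see this numerically before attempting the proof, here is a minimal sketch in Python with NumPy. The data, the seed, the "true" coefficients and the helper name `ols_fit` are all made up for illustration; the only point is that the fitted residuals of a line with an intercept add up to zero, apart from floating-point rounding.

```python
import numpy as np

def ols_fit(x, y):
    """Closed-form OLS estimates (a, b) for the line y = a + b*x."""
    x_mean, y_mean = x.mean(), y.mean()
    b = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    a = y_mean - b * x_mean
    return a, b

# Hypothetical data: a made-up line plus random noise.
rng = np.random.default_rng(0)  # arbitrary seed
x = rng.uniform(0, 10, size=50)
y = 2.0 + 0.5 * x + rng.normal(0, 1, size=50)

a, b = ols_fit(x, y)
residuals = y - (a + b * x)    # one residual per data point
residual_sum = residuals.sum()  # zero up to rounding error

print("sum of residuals:", residual_sum)
```

Changing the sample size or the noise level should not change this: the sum stays at zero because the intercept estimate forces the fitted line through the point of means.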

3. ## Re: Regression analysis and the error term

Ah okay, I think I understand. So the error term refers to each individual data point, not the total error from all data points combined (which I assume would be zero from the equation you posted).
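On the original "average versus total" worry, a rough sketch may make the distinction concrete. It assumes the usual way of summarising the size of the errors, which is not spelled out in this thread: square the residuals, then either add them up (the total) or divide that total by n - 2 (a per-point average for a two-parameter line). The two data sets, the noise levels and the helper name `residual_summary` are invented for illustration.

```python
import numpy as np

def residual_summary(x, y):
    """Fit y = a + b*x by least squares; return (total, average) squared residual."""
    b, a = np.polyfit(x, y, 1)  # polyfit returns the slope first, then the intercept
    residuals = y - (a + b * x)
    total = float(np.sum(residuals ** 2))
    average = total / (len(x) - 2)  # assumption: divide by n - 2 for a two-parameter fit
    return total, average

rng = np.random.default_rng(1)  # arbitrary seed

# Hypothetical "good fit, many points": a million points with small noise.
x_large = rng.uniform(0, 10, size=1_000_000)
y_large = 1.0 + 2.0 * x_large + rng.normal(0, 0.1, size=x_large.size)

# Hypothetical "poor fit, few points": ten points with large noise.
x_small = rng.uniform(0, 10, size=10)
y_small = 1.0 + 2.0 * x_small + rng.normal(0, 5.0, size=x_small.size)

total_large, avg_large = residual_summary(x_large, y_large)
total_small, avg_small = residual_summary(x_small, y_small)

print("many points: total =", total_large, " average =", avg_large)
print("few points:  total =", total_small, " average =", avg_small)
```

Under these assumptions the total is larger for the big data set simply because it has more terms, while the per-point average is far smaller, which matches the intuition that the fit with a million points is the better one.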