Can someone please explain how to fit a simple regression model?
My professor believes watching an old 80's telecourse will teach us statistics, and that Excel will do all the work so he doesn't explain anything. My basic stats knowledge only goes so far on this one.
I have a data set of baseball stats (salary, hits, home runs, etc) for 346 players. I already have the mean, median, standard deviation, max and min for the data (provided by Excel).
The problem says to fit the simple regression model :
salary=β0 + β1 (home runs) + εi
Obtain the R2 results, the regression coefficients, the standard deviations of β0 and β1, and the t-statistics. Use the regression option in the data analysis package found in Excel's Tools menu.
When I use the regression option it spits out a huge sheet but I don't understand where to find the info. I also know there is a way to do this with an equation isn't there?
There is no textbook for the class, he says it is self explanatory using Excel.
Can anyone help me with this by providing some formulas, or explaining what β0 , β1 (home runs) , or εi are?
My reading of the problem
Here's a guess as to what the problem is asking for.
You have a lot of data saying what each player's salary is, and how many home runs they've hit.
A simple model would say that the more home runs the player hit, the higher the salary they should get. For example,
S = HR (HR = # home runs, S = salary in $1000)
would say that for each home run they hit, they get $1000. So a person who hit 5 home runs would get $5000, a person who hit 8 home runs gets $8000, etc.
You might have a more sophisticated model, e.g.,
S = $5000 + 2000($/HR)*HR.
So they all get a base salary, and for every home run they hit, get an extra $2000. In this model, a person who hit no home runs gets $5000, and a person who hits 7 home runs gets $19000.
Instead of making the model first, and then calculating salaries, we could start off with the data you have (try INSERT CHART in Excel) and figure out if there's a straight line through the data. If there is, it will give you a model of the form:
S = A + B*HR
Your A and B are regression coefficients. In the model of the previous paragraph, A = $5000 and B = $2000/HR.
Most real world data doesn't come on a nice straight line. Instead, there's a small difference between each data point and the nice straight line. We put all of this in a fudge factor called the error on each data point, possibly called Ei (e sub i). Then your model would be:
S = A + B*HR + Ei
and of course it would fit perfectly because the Ei takes care of all the inaccuracies.
So hopefully this matches the model your professor is asking you to use.