# Thread: Regression Help - Using Excel

1. ## Regression Help - Using Excel

I need some help with regression analysis and in particular with the figures which Excel displays.

I have a sheet which lists test scores and I'd like to create a formula for predicting what a candidates first year salary would be based on their test scores.

Code:
Candidate Maths IT   Science History Salary
----------------------------------------------------
Joe     78.7  82.5  81.7  95.0  $20,000 Mary 87.3 85.4 86.6 68.2$24,000
Alison  74.6  83.4  79.2  73.1  $19,000 Sarah 86.8 91.5 90.2 54.2$24,000
Alan    63.5  72.4  74.5  74.2  $17,000 Dane 78.3 62.8 61.0 85.2$19,000
Beth    92.5  97.6  95.4  72.3  $27,000 Eliza 84.7 88.3 87.7 75.5$23,000
Corey   79.3  87.2  87.1  82.2  $21,000 Don 91.3 85.2 86.3 95.4$26,000
What I am trying to do is to use regression to see if I can construct a formula to predict a candidates salary based on their test results.

Salary = A*Maths Score + B*IT Score + C*Science Score + D*History Score

1. How can I determine if each test score is significant? Excel will state figures such as Multiple R, R Squar, Standard Error, t Stat and P Value but what do they mean!

2. How can I determine if the results are independent.

e.g. It's quite likely that Students with high Maths scores could also have high IT scores and thus there could be an error due to both results not being independent.

Here are the figures from Excel

Code:
Regression Statistics
-----------------------
Multiple R	0.98368756
R Square	0.967641215
Standard Error	796.3869571
Observations	10
Code:
ANOVA
df	SS	MS	F	Significance F
Regression	4	94828839.07	23707209.77	37.37938615	0.00064401
Residual	5	3171160.927	634232.1854
Total	9	98000000
What are both tables telling me? Is this a good test or not?

Code:
Coefficients
Intercept	-8559.315329
Maths	313.8348267
IT	-203.3732385
Science	276.2479898
History	-12.81029006
I presume these are the figures I need to use to construct the formula?

Salary = -8559 + 313.8 * Maths -203.3 * IT + 276.2 * Science - 12.8 * History

Also on that last table is shown:

Code:
	Standard Error
Intercept	3539.055274
Maths	41.49049481
IT	148.8447209
Science	145.5849295
History	23.53289212

t Stat
Intercept	-2.418531124
Maths	7.564017449
IT	-1.366344989
Science	1.897504026
History	-0.544356809

P-value
Intercept	0.060226461
Maths	0.000640424
IT	0.230071381
Science	0.116235029
History	0.609583227
Does this tell me if some of the test scores should or should not be used?

TIA

2. I recommend graphing it, using the average test score and the salary, then let excel pick the line of best fit

3. Are these ordered pairs? Or more.

If they are ordered pairs, I have software that does method of least square. And then I can give you all the results quickly if you provide me with the list of points.

4. If I knew what an ordered pair was I could tell you

This is more of a basics learning exercise. I understand what regression is and what it can be used for. What I don't know is what the figures are telling me. I can't find out if xyz having a value of 0.1 is good or bad; I can't find out if abc having a value of 9.99 is good or bad either.

Using the example I gave above what do the figures say?

I think I'm right in assuming that the R Squared figure of 0.967 means that the the relationship between Y and all the X values is 96.7% good?

What I'm really trying to determine is

a) Is this a reliable method of estimating salary when only test score results are known.

b) Which individual test scores will best estimate the salary. In otherwords should I drop History and possibly look to include other test results such as Geography or Art.

c) Are there two results scores which are corelated and thus causing an error in the figures. e.g. is having Maths and IT a bad combination because it could be correlated and thus 'double counting'?

I just don't know what the numbers (the excel results std error, t-test, p-value etc.) are telling me, or what I should be looking for.

5. Originally Posted by biscayne
I need some help with regression analysis and in particular with the figures which Excel displays.

I have a sheet which lists test scores and I'd like to create a formula for predicting what a candidates first year salary would be based on their test scores.

Code:
Candidate Maths IT   Science History Salary
----------------------------------------------------
Joe     78.7  82.5  81.7  95.0  $20,000 Mary 87.3 85.4 86.6 68.2$24,000
Alison  74.6  83.4  79.2  73.1  $19,000 Sarah 86.8 91.5 90.2 54.2$24,000
Alan    63.5  72.4  74.5  74.2  $17,000 Dane 78.3 62.8 61.0 85.2$19,000
Beth    92.5  97.6  95.4  72.3  $27,000 Eliza 84.7 88.3 87.7 75.5$23,000
Corey   79.3  87.2  87.1  82.2  $21,000 Don 91.3 85.2 86.3 95.4$26,000
What I am trying to do is to use regression to see if I can construct a formula to predict a candidates salary based on their test results.

Salary = A*Maths Score + B*IT Score + C*Science Score + D*History Score

1. How can I determine if each test score is significant? Excel will state figures such as Multiple R, R Squar, Standard Error, t Stat and P Value but what do they mean!

2. How can I determine if the results are independent.

e.g. It's quite likely that Students with high Maths scores could also have high IT scores and thus there could be an error due to both results not being independent.

Here are the figures from Excel

Code:
Regression Statistics
-----------------------
Multiple R    0.98368756
R Square    0.967641215
Standard Error    796.3869571
Observations    10
Code:
ANOVA
df    SS    MS    F    Significance F
Regression    4    94828839.07    23707209.77    37.37938615    0.00064401
Residual    5    3171160.927    634232.1854
Total    9    98000000
What are both tables telling me? Is this a good test or not?

Code:
Coefficients
Intercept    -8559.315329
Maths    313.8348267
IT    -203.3732385
Science    276.2479898
History    -12.81029006
I presume these are the figures I need to use to construct the formula?

Salary = -8559 + 313.8 * Maths -203.3 * IT + 276.2 * Science - 12.8 * History

Also on that last table is shown:

Code:
   Standard Error
Intercept    3539.055274
Maths    41.49049481
IT    148.8447209
Science    145.5849295
History    23.53289212

t Stat
Intercept    -2.418531124
Maths    7.564017449
IT    -1.366344989
Science    1.897504026
History    -0.544356809

P-value
Intercept    0.060226461
Maths    0.000640424
IT    0.230071381
Science    0.116235029
History    0.609583227
Does this tell me if some of the test scores should or should not be used?

TIA
The R values are telling you that the model is highly predictive.

The p-values are telling you that most of the prediction comes from the
intercept and the maths score.

Rerun the model using only the intercept and maths scores in the model

Look at the residuals of this and the rerun model, are they significantly different
(they will be worse with the simpler model, so you will have to decide if the
difference is significant)

RonL