# Regression Help - Using Excel

• Nov 12th 2006, 04:26 AM
biscayne
Regression Help - Using Excel
I need some help with regression analysis and in particular with the figures which Excel displays.

I have a sheet which lists test scores and I'd like to create a formula for predicting what a candidates first year salary would be based on their test scores.

Code:

```Candidate Maths IT  Science History Salary ---------------------------------------------------- Joe    78.7  82.5  81.7  95.0  \$20,000 Mary    87.3  85.4  86.6  68.2  \$24,000 Alison  74.6  83.4  79.2  73.1  \$19,000 Sarah  86.8  91.5  90.2  54.2  \$24,000 Alan    63.5  72.4  74.5  74.2  \$17,000 Dane    78.3  62.8  61.0  85.2  \$19,000 Beth    92.5  97.6  95.4  72.3  \$27,000 Eliza  84.7  88.3  87.7  75.5  \$23,000 Corey  79.3  87.2  87.1  82.2  \$21,000 Don    91.3  85.2  86.3  95.4  \$26,000```
What I am trying to do is to use regression to see if I can construct a formula to predict a candidates salary based on their test results.

Salary = A*Maths Score + B*IT Score + C*Science Score + D*History Score

1. How can I determine if each test score is significant? Excel will state figures such as Multiple R, R Squar, Standard Error, t Stat and P Value but what do they mean!

2. How can I determine if the results are independent.

e.g. It's quite likely that Students with high Maths scores could also have high IT scores and thus there could be an error due to both results not being independent.

Here are the figures from Excel

Code:

```Regression Statistics -----------------------        Multiple R        0.98368756 R Square        0.967641215 Adjusted R Square        0.941754187 Standard Error        796.3869571 Observations        10```
Code:

```ANOVA         df        SS        MS        F        Significance F Regression        4        94828839.07        23707209.77        37.37938615        0.00064401 Residual        5        3171160.927        634232.1854                Total        9        98000000```
What are both tables telling me? Is this a good test or not?

Code:

```Coefficients Intercept        -8559.315329 Maths        313.8348267 IT        -203.3732385 Science        276.2479898 History        -12.81029006```
I presume these are the figures I need to use to construct the formula?

Salary = -8559 + 313.8 * Maths -203.3 * IT + 276.2 * Science - 12.8 * History

Also on that last table is shown:

Code:

```        Standard Error Intercept        3539.055274 Maths        41.49049481 IT        148.8447209 Science        145.5849295 History        23.53289212         t Stat Intercept        -2.418531124 Maths        7.564017449 IT        -1.366344989 Science        1.897504026 History        -0.544356809         P-value Intercept        0.060226461 Maths        0.000640424 IT        0.230071381 Science        0.116235029 History        0.609583227```
Does this tell me if some of the test scores should or should not be used?

TIA
• Nov 12th 2006, 05:03 AM
Quick
I recommend graphing it, using the average test score and the salary, then let excel pick the line of best fit
• Nov 12th 2006, 07:00 AM
ThePerfectHacker
Are these ordered pairs? Or more.

If they are ordered pairs, I have software that does method of least square. And then I can give you all the results quickly if you provide me with the list of points.
• Nov 12th 2006, 07:43 AM
biscayne
If I knew what an ordered pair was I could tell you :)

This is more of a basics learning exercise. I understand what regression is and what it can be used for. What I don't know is what the figures are telling me. I can't find out if xyz having a value of 0.1 is good or bad; I can't find out if abc having a value of 9.99 is good or bad either.

Using the example I gave above what do the figures say?

I think I'm right in assuming that the R Squared figure of 0.967 means that the the relationship between Y and all the X values is 96.7% good?

What I'm really trying to determine is

a) Is this a reliable method of estimating salary when only test score results are known.

b) Which individual test scores will best estimate the salary. In otherwords should I drop History and possibly look to include other test results such as Geography or Art.

c) Are there two results scores which are corelated and thus causing an error in the figures. e.g. is having Maths and IT a bad combination because it could be correlated and thus 'double counting'?

I just don't know what the numbers (the excel results std error, t-test, p-value etc.) are telling me, or what I should be looking for.
• Nov 12th 2006, 08:41 AM
CaptainBlack
Quote:

Originally Posted by biscayne
I need some help with regression analysis and in particular with the figures which Excel displays.

I have a sheet which lists test scores and I'd like to create a formula for predicting what a candidates first year salary would be based on their test scores.

Code:

```Candidate Maths IT  Science History Salary ---------------------------------------------------- Joe    78.7  82.5  81.7  95.0  \$20,000 Mary    87.3  85.4  86.6  68.2  \$24,000 Alison  74.6  83.4  79.2  73.1  \$19,000 Sarah  86.8  91.5  90.2  54.2  \$24,000 Alan    63.5  72.4  74.5  74.2  \$17,000 Dane    78.3  62.8  61.0  85.2  \$19,000 Beth    92.5  97.6  95.4  72.3  \$27,000 Eliza  84.7  88.3  87.7  75.5  \$23,000 Corey  79.3  87.2  87.1  82.2  \$21,000 Don    91.3  85.2  86.3  95.4  \$26,000```
What I am trying to do is to use regression to see if I can construct a formula to predict a candidates salary based on their test results.

Salary = A*Maths Score + B*IT Score + C*Science Score + D*History Score

1. How can I determine if each test score is significant? Excel will state figures such as Multiple R, R Squar, Standard Error, t Stat and P Value but what do they mean!

2. How can I determine if the results are independent.

e.g. It's quite likely that Students with high Maths scores could also have high IT scores and thus there could be an error due to both results not being independent.

Here are the figures from Excel

Code:

```Regression Statistics -----------------------    Multiple R    0.98368756 R Square    0.967641215 Adjusted R Square    0.941754187 Standard Error    796.3869571 Observations    10```
Code:

```ANOVA     df    SS    MS    F    Significance F Regression    4    94828839.07    23707209.77    37.37938615    0.00064401 Residual    5    3171160.927    634232.1854        Total    9    98000000```
What are both tables telling me? Is this a good test or not?

Code:

```Coefficients Intercept    -8559.315329 Maths    313.8348267 IT    -203.3732385 Science    276.2479898 History    -12.81029006```
I presume these are the figures I need to use to construct the formula?

Salary = -8559 + 313.8 * Maths -203.3 * IT + 276.2 * Science - 12.8 * History

Also on that last table is shown:

Code:

```  Standard Error Intercept    3539.055274 Maths    41.49049481 IT    148.8447209 Science    145.5849295 History    23.53289212     t Stat Intercept    -2.418531124 Maths    7.564017449 IT    -1.366344989 Science    1.897504026 History    -0.544356809     P-value Intercept    0.060226461 Maths    0.000640424 IT    0.230071381 Science    0.116235029 History    0.609583227```
Does this tell me if some of the test scores should or should not be used?

TIA

The R values are telling you that the model is highly predictive.

The p-values are telling you that most of the prediction comes from the
intercept and the maths score.

Rerun the model using only the intercept and maths scores in the model

Look at the residuals of this and the rerun model, are they significantly different
(they will be worse with the simpler model, so you will have to decide if the
difference is significant)

RonL