# Thread: [SOLVED] Multiple regression equation trouble

1. ## [SOLVED] Multiple regression equation trouble

Hello,
I've generated a multiple regression equation using a tool I found online. It takes a list of values for the dependent variable and a few independent variables and gives me a multiple regression equation. The equation is meant to predict defects in a software program. The dependent variable is defect count.

The equation I got is

0.32*Avg analysis hr - 0.12*Avg. Dev hrs + 0.00*Budget - 0.19*Duration + 19.40 (+/- 21.56)

These independent variables are project details.

What I'm trying to do is put one project's details back into the equation and get a value for defects.

My problem is that when I put these values in (to test the equation), I get a value that is nowhere near the actual number of defects that project had, even though this project was used to build the equation.

Any ideas?

The R squared of this equation is 91%.
The standard error is 21.56.

2. In reply to the post by vancottier above:

It sounds like either
• the project in question is an outlier (one with an unusual number of defects), or
• there is a mistake in the calculation.

Either way, to get comfortable with the regression equation and your calculations, I suggest that you calculate the residuals (actual defects minus predicted defects) for each project in your data and then inspect them, say with a graph or histogram. Since your R squared is a very strong 91%, most of the residuals should be small in relation to the standard deviation of the dependent variable. An R squared of 91% means precisely that

$\displaystyle \frac{\text{average squared residual}}{\text{variance of dependent variable}} = 1 - .91 = .09.$
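
For reference, here is a short sketch of where that ratio comes from. The notation is an assumption of this sketch and is not in the original post: $y_i$ is the actual defect count of project $i$, $\hat{y}_i$ is the value predicted by the equation, $\bar{y}$ is the mean defect count, and $n$ is the number of projects.

$\displaystyle R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2} \quad\Longrightarrow\quad \frac{\frac{1}{n}\sum_i (y_i - \hat{y}_i)^2}{\frac{1}{n}\sum_i (y_i - \bar{y})^2} = 1 - R^2.$

The identity is exact when the variance is computed by dividing by $n$. If your software reports the sample variance (dividing by $n-1$), the ratio comes out as $(1 - R^2)\,\frac{n-1}{n}$ instead, which is slightly smaller.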

You can do this calculation to check you are using the regression correctly. Also, because the equation includes a constant term, the residuals should average exactly 0 (up to rounding error).
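
The checks described above can be sketched in a few lines of Python with NumPy. This is only an illustration under stated assumptions: the data are synthetic stand-ins (not the poster's projects), the fit is redone with ordinary least squares rather than with the online tool, and all variable names are made up for the example.

```python
import numpy as np

# Synthetic stand-in data (NOT the real projects): 12 projects, 4 predictors
# playing the roles of analysis hrs, dev hrs, budget and duration.
rng = np.random.default_rng(0)
n_projects = 12
X = rng.uniform(10, 100, size=(n_projects, 4))
defects = X @ np.array([0.3, -0.1, 0.05, -0.2]) + 20 + rng.normal(0, 3, size=n_projects)

# Ordinary least squares with a constant term (the column of ones).
design = np.column_stack([X, np.ones(n_projects)])
coefs, *_ = np.linalg.lstsq(design, defects, rcond=None)

# Residuals: actual defects minus predicted defects, one per project.
predicted = design @ coefs
residuals = defects - predicted

# R squared from its definition.
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((defects - defects.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

# Check 1: the residuals average to zero (up to floating-point error).
mean_residual = residuals.mean()

# Check 2: average squared residual / variance equals 1 - R squared.
# np.var divides by n by default, which is what the identity needs.
ratio = np.mean(residuals ** 2) / np.var(defects)

# Extra check: how far predictions move if the coefficients are rounded
# to two decimals, as in the displayed equation.
rounded_coefs = np.round(coefs, 2)
max_shift = np.max(np.abs(design @ rounded_coefs - predicted))

print("mean residual:", mean_residual)
print("ratio:", ratio, "1 - R^2:", 1 - r_squared)
print("largest prediction change from rounding:", max_shift)
```

With your own data, replace `X` and `defects` with the real project values. If the tool can show more decimal places, it is worth using them when plugging values back in: a coefficient displayed as 0.00 can still contribute noticeably when the variable it multiplies (such as Budget) takes large values.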