1. Urgent regression problem

Hi there, I am attempting to predict a dependent variable from 3 independent variables. All 4 variables are log-normally distributed, so I have transformed each variable (dependent and independent) by adding a constant that brings its minimum value up to 1, so that the natural logarithm of the variable plus the constant can be taken.

I then regressed the results to get the following formula, shown below for just one independent variable:

ln(Y + k) = b*ln(X + a) + c, where a and k are the constants that bring the minimum of each variable up to 1.
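For concreteness, the shift-then-log setup described here can be sketched in Python. This is only an illustration: the data, the shift constants, and the use of a plain least-squares fit are all made up, not taken from the original post.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical log-normally distributed data; x is shifted so it can
# dip below 1, mimicking a variable whose minimum needs raising.
x = rng.lognormal(mean=0.0, sigma=1.0, size=200) - 0.5
y = 2.0 * x + rng.lognormal(mean=0.0, sigma=0.5, size=200)

# Constants that bring each variable's minimum up to 1,
# so the logarithm of (variable + constant) is defined.
a = 1.0 - x.min()
k = 1.0 - y.min()

# Regress ln(y + k) on ln(x + a) with an intercept:
# ln(y + k) = b*ln(x + a) + c
design = np.column_stack([np.log(x + a), np.ones_like(x)])
b, c = np.linalg.lstsq(design, np.log(y + k), rcond=None)[0]
print("b =", b, " c =", c)
```

After the shift, the smallest value of each variable is exactly 1, so its logarithm is 0 and every log is well-defined.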

I then rearranged and solved for Y:

Y = exp(b)*(X + a) + exp(c) - k

Is this correct?

When I compare predicted ln(Y + k) to actual ln(Y + k), I get very accurate results: a Pearson correlation coefficient of 0.917, a slope of 0.84 for predicted vs. original, and the sum of the predicted values divided by the sum of the original values roughly equal to 1. However, after solving for Y as shown above, the predicted values are much less accurate: a Pearson correlation of 0.737, a slope of 0.2, and a quotient of sums of 2.67.

This is really annoying, as the prediction of ln(Y + k) is nearly perfect, but after solving for Y the results are no longer accurate.

Any help would be absolutely fantastic

Cheers

Ben

2. If you are doing regular old linear regression, there is no need to transform the x variables on the predictor side of the equation, because the assumption of normality applies to the residual error term, so the y variable is usually the one that is transformed. That said, I don't think there is any harm in transforming the x variables; it just changes the interpretation of the regression coefficients.

As far as the exponentiating goes, it is not correct, because the exponential of a sum is not the sum of the exponentials. Check the laws of exponents.

I think your idea is to do the regression on the log-transformed variables and then exponentiate the resulting regression equation to interpret the coefficients on the original scale. You can't do that directly, because when you exponentiate, the terms no longer add; they multiply. When you log-transform the outcome, you are pretty much stuck interpreting the coefficients on the log scale of the outcome. At least I think . . .
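A tiny numerical check of the exponent law being pointed at here, and of what exponentiating the one-variable equation actually gives. All the numbers below are made up purely for illustration.

```python
import math

# The exponential of a sum is the product of the exponentials,
# not the sum of them:
u, v = 0.7, 1.3
assert math.isclose(math.exp(u + v), math.exp(u) * math.exp(v))
assert not math.isclose(math.exp(u + v), math.exp(u) + math.exp(v))

# So exponentiating ln(Y + k) = b*ln(X + a) + c gives a product:
# Y + k = exp(c) * (X + a)**b, hence Y = exp(c) * (X + a)**b - k.
b, c, a, k, X = 0.8, 0.5, 2.0, 3.0, 10.0
lhs = math.exp(b * math.log(X + a) + c)
rhs = math.exp(c) * (X + a) ** b
assert math.isclose(lhs, rhs)
print("both checks pass")
```

The second block is exactly why the additive back-transform in the first post (exp of each term, then added) breaks down: the intercept becomes a multiplicative factor, not an added one.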

3. Hi Bill, you're right about the exponentiating. I solved for Y yesterday as below; is this correct?

ln(Y + k) = b1*ln(X1 + a1) + b2*ln(X2 + a2) + c

Y = e^c * (X1 + a1)^b1 * (X2 + a2)^b2 - k
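A quick numerical sanity check that this rearrangement is algebraically consistent: back-transforming and then re-taking the log must reproduce the log-scale prediction exactly. The coefficients and shift constants below are made up for illustration.

```python
import math

# Made-up coefficients and shift constants
b1, b2, c = 0.6, 0.3, 0.2
a1, a2, k = 1.5, 2.0, 4.0
x1, x2 = 5.0, 7.0

# Predicted value on the log scale
log_pred = b1 * math.log(x1 + a1) + b2 * math.log(x2 + a2) + c

# Back-transformed prediction: Y = e^c * (x1+a1)^b1 * (x2+a2)^b2 - k
y = math.exp(c) * (x1 + a1) ** b1 * (x2 + a2) ** b2 - k

# The two must agree: ln(y + k) == log_pred
assert math.isclose(math.log(y + k), log_pred)
print("rearrangement is consistent")
```

Note this only verifies the algebra; it says nothing about how well the back-transformed predictions track the original-scale data, which is a separate question.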

The predicted results against the actual results are better now than before (Pearson correlation of 0.84, quotient of sums = 1.4), but still nowhere near as good as predicted ln(Y + k) against actual ln(Y + k).

If this equation is correct, then this is the best the model can do, but I'm kind of hoping it isn't so I can hope for better results! Any ideas?