# Thread: Determining important factors in a regression

1. ## Determining important factors in a regression

The question I'm confused about asks:
"According to the estimated regressions, which of the seven factors Pod, pinst, pfive, oboard, fsch, D/v, pacq are "important" in explaining the value of a firm q (the dependent variable)? As part of your answer, explain in what sense they are important."

We are given a table of 9 different regressions, including one regression for each of the factors listed above by itself, along with the other coefficients: the intercept and 4 other variables (which appear in all the regressions).
We are also given the t-statistics for all factors in all the regressions, and an R-squared value for each regression.

We also have another regression that includes all of these factors, and another one that includes pod^2.

Does anybody know how I go about telling which of these is most important?
The R-squared values are all very similar and only differ by 0.01. Some of the factors have a negative coefficient, and I know this means they have a negative correlation with q, but I'm not 100% sure whether using these two things is right...

I read another example that says that missing variables from the regression could lead to simultaneous causality bias or omitted variable bias, so should I be comparing the regressions that only have one factor with the regression that includes all of these factors?

2. ## Re: Determining important factors in a regression

Hey froggy124.

Have you tried fitting the model by deleting some of the factors, re-fitting the model, and comparing with the other models?

Also try doing what is called a Principal Components Analysis (PCA) on the data: this will create an orthogonal set (uncorrelated, but not necessarily independent) of random variables by "rotating" the data set, and it will sort the components in decreasing order of variance.

Higher variance means a higher contribution of that component to the variance of the entire data set, and lower variance means a lower contribution.

It is useful as an additional tool to answer your question.
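For instance, here is a quick sketch of that rotate-and-sort idea with plain numpy (the data is made up for illustration, not your firm-value data):

```python
import numpy as np

def pca(X):
    """Return component variances (eigenvalues) and directions of X,
    sorted by decreasing variance. Rows are observations, columns variables."""
    Xc = X - X.mean(axis=0)                 # center each variable
    cov = np.cov(Xc, rowvar=False)          # covariance matrix of the data
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh: for symmetric matrices
    order = np.argsort(eigvals)[::-1]       # sort by decreasing variance
    return eigvals[order], eigvecs[:, order]

rng = np.random.default_rng(0)
# Two strongly correlated variables plus one small independent one
x1 = rng.normal(size=500)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=500)
x3 = 0.2 * rng.normal(size=500)
X = np.column_stack([x1, x2, x3])

variances, components = pca(X)
ratios = variances / variances.sum()  # share of total variance per component
print(ratios)  # first component dominates because x1 and x2 move together
```

Because x1 and x2 are almost the same direction, nearly all the variance lands in the first component, which is exactly the kind of redundancy a PCA plot makes visible.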

The other thing is to test for interaction components between the factors, and to test for correlation between sets of random variables.

PCA also helps with correlation to some extent (in terms of identification) when you do a PCA plot to see how the variables are related.

The interaction components are just standard tests of whether the coefficient on the product of two factors is statistically significantly different from zero (or not). This also establishes the possibility of bias, and is used in experimental design analyses.

In terms of worrying about missing variables that contribute to causality, this is the reason we say that correlation doesn't imply causation: it's basically a pessimistic (but needed) view that if you don't have all the data that truly represents the process, you can't establish causality with certainty.

There are techniques called forward, backward, and step-wise regression that you should check out.

Finally, you also want to think about the data in the context of both the process involved and how well the data really represents that process: if it misrepresents the process for any reason, then that misalignment will cause bad inferences. This is the hardest part, because you can be really careful and have a lot of experience but still miss critical factors that are hidden between layers of things outside of your awareness.

It's definitely not an easy problem.
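The backward-elimination flavour of those step-wise techniques can be sketched like this, using numpy for the least-squares fit and scipy for the p-values (all data and variable names here are invented for illustration):

```python
import numpy as np
from scipy import stats

def ols_pvalues(X, y):
    """OLS fit; returns coefficients and two-sided p-values for H0: beta_j = 0."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)               # residual variance
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    p = 2 * stats.t.sf(np.abs(beta / se), df=n - k)
    return beta, p

def backward_eliminate(X, y, names, alpha=0.05):
    """Repeatedly drop the predictor with the largest p-value above alpha."""
    names = list(names)
    while True:
        beta, p = ols_pvalues(X, y)
        worst = int(np.argmax(p))
        if p[worst] <= alpha or len(names) == 1:
            return names
        X = np.delete(X, worst, axis=1)
        del names[worst]

rng = np.random.default_rng(1)
n = 200
x1, x2, junk = rng.normal(size=(3, n))
y = 1.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(size=n)  # "junk" is irrelevant
X = np.column_stack([np.ones(n), x1, x2, junk])
kept = backward_eliminate(X, y, ["const", "x1", "x2", "junk"])
print(kept)
```

The genuine predictors have huge t-statistics and survive every round, while an irrelevant column will usually be the first (and only) one dropped.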


4. ## Re: Determining important factors in a regression

Originally Posted by chiro
Hey froggy124.

Have you tried fitting the model by deleting some of the factors, re-fitting the model, and comparing with the other models? [...]

It's definitely not an easy problem.

Thanks for this chiro, will definitely look into PCA... I was reading the Stock and Watson book on econometrics, and it seems to talk about using hypothesis tests on the t-statistics and checking whether they are significant. Would this be another way of doing it?*

*I only ask because we have not covered PCA at all in class...

5. ## Re: Determining important factors in a regression

Well, you could use the t-statistic and its probability to check whether the regression coefficient for a predictor is close to zero (statistically, under some confidence level), and if it is, you take that predictor out of the model.

So as an example, in the model Y = B0 + B1*X1 + B2*X2 we can get a probability for B0 which is, say, 0.34 for H0: B0 = 0, and in this case we retain H0 and delete B0 from the model.
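As a concrete check of that rule, here is a small simulation (made-up data) where the true intercept really is zero, so its p-value will usually come out much larger than those of the real predictors:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 100
x1, x2 = rng.normal(size=(2, n))
y = 0.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)  # true intercept is zero

X = np.column_stack([np.ones(n), x1, x2])           # columns: B0, B1, B2
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
se = np.sqrt(resid @ resid / (n - 3) * np.diag(np.linalg.inv(X.T @ X)))
p = 2 * stats.t.sf(np.abs(beta / se), df=n - 3)     # two-sided p-values
print(p)  # p[0] is for H0: B0 = 0
```

Here p[1] and p[2] are essentially zero (strong evidence the slopes matter), while p[0] is typically large, which is the "retain H0 and drop the term" case chiro describes.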

6. ## Re: Determining important factors in a regression

Originally Posted by chiro
Well, you could use the t-statistic and its probability to check whether the regression coefficient for a predictor is close to zero (statistically, under some confidence level), and if it is, you take that predictor out of the model.

So as an example, in the model Y = B0 + B1*X1 + B2*X2 we can get a probability for B0 which is, say, 0.34 for H0: B0 = 0, and in this case we retain H0 and delete B0 from the model.
How would you get the probabilities, though? The p-values?

7. ## Re: Determining important factors in a regression

You get them from the regression output, or you calculate them from a t-table with the appropriate degrees of freedom, using the estimate and its standard error: (estimate - hypothesized value)/se follows a t distribution with the residual degrees of freedom.
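If you have software handy, scipy will do the t-table lookup for you (the 1.2 estimate and 0.5 standard error below are invented numbers for illustration):

```python
from scipy import stats

def two_sided_p(t_stat, df):
    """Two-sided p-value for a t statistic: P(|T| > |t|) under H0."""
    return 2 * stats.t.sf(abs(t_stat), df)

# e.g. a coefficient estimate of 1.2 with standard error 0.5 and 40 df
t = (1.2 - 0) / 0.5   # t = (estimate - hypothesized value) / standard error
print(two_sided_p(t, 40))
```

`stats.t.sf` is the upper-tail (survival) function, i.e. exactly what the body of a t-table gives you.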

8. ## Re: Determining important factors in a regression

Originally Posted by chiro
Well, you could use the t-statistic and its probability to check whether the regression coefficient for a predictor is close to zero (statistically, under some confidence level), and if it is, you take that predictor out of the model.

So as an example, in the model Y = B0 + B1*X1 + B2*X2 we can get a probability for B0 which is, say, 0.34 for H0: B0 = 0, and in this case we retain H0 and delete B0 from the model.
Ok cool, I know how to get the probabilities from the t-tables... The lecturer I have seems to give very limited t-tables, as they only go down to 1% (two-sided) and 0.5% (one-sided)!! And since some of the t-statistics are around 8 (with 80 degrees of freedom), it seems like I'm going to be rejecting a lot of them...

In your example above, the probability for B0 is 0.34? Is that not a relatively high probability, i.e. a 0.34-in-1 chance of happening? Why would you not say this is an important variable??

For example, if I had a t-statistic of 2.35 with 80 degrees of freedom, the tail probability would be 0.01 < P(t > 2.35) < 0.025. So this would be considered an important t-statistic, because it's in the tables the lecturer has given us...
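That bracketing can be checked directly in scipy rather than a t-table:

```python
from scipy import stats

t, df = 2.35, 80
one_sided = stats.t.sf(t, df)   # P(T > 2.35), the upper-tail probability
two_sided = 2 * one_sided       # P(|T| > 2.35)
print(one_sided, two_sided)
```

The one-sided tail probability does indeed land between 0.01 and 0.025, consistent with the table bounds.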

On another note... which t-statistics do you use, i.e. from which regression? The regression that contains all the factors you want to check are important, or each regression that contains only one of the factors you are checking?!

Apologies for all the questions, test tomorrow!!!

9. ## Re: Determining important factors in a regression

You will need to choose a significance level. What you do with that is calculate the confidence interval for that parameter, and if 0 is in that interval, you can retain the null hypothesis that the value is potentially zero and remove the parameter if you choose to.

Usually we choose 95% intervals, so under a 95% interval, if 0 is inside the interval we can retain H0.

However, if the standard error is really big, then the model itself is a bad fit to begin with.
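A small sketch of the interval check (the 0.8 estimate and 0.5 standard error are invented numbers, not from your regressions):

```python
from scipy import stats

def conf_interval(estimate, se, df, level=0.95):
    """Confidence interval for a single regression coefficient."""
    t_crit = stats.t.ppf(0.5 + level / 2, df)   # e.g. ~1.99 for 95%, 80 df
    return estimate - t_crit * se, estimate + t_crit * se

lo, hi = conf_interval(0.8, 0.5, 80)
print(lo, hi)
print("retain H0: beta = 0" if lo <= 0 <= hi else "reject H0")
```

With these numbers the interval straddles 0, so H0 is retained, which is the same conclusion you would reach from a two-sided t-test at the matching significance level.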

10. ## Re: Determining important factors in a regression

Thanks for that! Sounds good