# What type of regression should I use?

• Aug 24th 2011, 07:19 AM
nyalex
Based on my sample data shown below, I am trying to answer the following questions. I am not sure what sort of regression report I should use, or whether there is another type of report I could generate instead.

1) Which variables (columns C-F, in green) affect rank (column B, in orange) and by how much does each variable affect it (the weighting)?

2) Records that share the same category (column A, in orange) will never have the same rank. That being said, once I answer question #1, I'd still like to know whether the statistics differ by category. For example, if GC is weighted 50% for rank, does it have the same weight for items in the astronomy category as for items in the electronics category?

3) Some variables are yes/no. How do I include them in my analysis? Should I change them to 0s and 1s? Would that even give me a statistically significant result, given that the values do not vary much?

http://f.cl.ly/items/0w0C2r3Y46421z2...14.18%20AM.png

Any ideas on how to approach this would be very much appreciated. Thanks in advance.
• Sep 3rd 2011, 07:59 PM
bryangoodrich
Re: What type of regression should I use?
If your goal is to find out how much the independent variables (C-F) affect Rank, you may want to treat Rank as a factor (categorical) variable. Ordinary least squares (OLS) will not help you here, because it assumes a continuous dependent variable with errors that are roughly normal and of constant variance; a categorical dependent variable violates those assumptions. Instead, look to generalized linear models (GLMs), such as logistic and Poisson regression, which handle categorical outcomes and counts (non-negative integers), respectively. As for RPO, you would usually encode "yes" as 1 and "no" as 0; a text value has no numeric effect that a model could estimate or that you could interpret. From the look of it, GC and RF are also categorical variables. You will want to treat them appropriately, as factors rather than numeric variables, because they are not numeric.
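To make the encoding advice concrete, here is a minimal Python sketch, assuming the yes/no column (RPO) holds the strings "yes"/"no" and a categorical column such as GC holds text labels. The example values and the helper names `encode_yes_no` and `dummy_encode` are hypothetical, since the actual data is only in the linked screenshot. Dropping one level as the reference category is the usual way to avoid perfect collinearity with the model's intercept.

```python
# Sketch: turning yes/no and categorical predictors into numeric columns.
# Column names (RPO, GC) come from the thread; the values are hypothetical.

def encode_yes_no(value):
    """Map 'yes'/'no' (case-insensitive, surrounding spaces ignored) to 1/0."""
    v = value.strip().lower()
    if v == "yes":
        return 1
    if v == "no":
        return 0
    raise ValueError(f"expected yes/no, got {value!r}")

def dummy_encode(values):
    """One-hot encode a categorical column, dropping the alphabetically first
    level as the reference category (avoids collinearity with the intercept).
    Returns (reference_level, dummy_column_levels, encoded_rows)."""
    levels = sorted(set(values))
    reference, others = levels[0], levels[1:]
    rows = [[1 if v == level else 0 for level in others] for v in values]
    return reference, others, rows

rpo = ["yes", "no", "Yes", "no"]
gc = ["low", "high", "medium", "low"]  # hypothetical GC levels

print([encode_yes_no(v) for v in rpo])
ref, cols, rows = dummy_encode(gc)
print(ref, cols, rows)
```

Each dummy column's coefficient in a fitted model is then read as the effect of that level relative to the reference level.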

Frankly, the model required to answer your question properly is complex and takes considerable sophistication to set up and interpret. You might get away with treating Rank as a numeric variable, but you would then face serious issues with the remedial measures needed to make the model work and with interpreting the results. In all likelihood, a logistic regression of some sort would suit your needs; because ranks are ordered, an ordinal (proportional-odds) logistic regression is the natural variant to look at.
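As a rough illustration of what a logistic regression estimates, the sketch below fits a binary logistic model by plain gradient ascent in pure Python. It assumes Rank has been collapsed into a hypothetical 0/1 outcome (for example, "ranked in the top 10") and that the predictors are already numeric; the data is made up, and `fit_logistic` and `predict_proba` are illustrative helpers, not a library API. It is not the ordinal model a real analysis would likely use, and a statistics package (e.g. R's `glm` with `family = binomial`) would also report standard errors and p-values.

```python
import math

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.1, epochs=5000):
    """Fit P(y=1 | x) = sigmoid(b0 + b . x) by batch gradient ascent on the
    average log-likelihood. X is a list of numeric feature rows, y a list of 0/1."""
    n, k = len(X), len(X[0])
    b0, b = 0.0, [0.0] * k
    for _ in range(epochs):
        g0, g = 0.0, [0.0] * k
        for xi, yi in zip(X, y):
            # Residual between the observed outcome and the predicted probability.
            err = yi - sigmoid(b0 + sum(bj * xj for bj, xj in zip(b, xi)))
            g0 += err
            for j in range(k):
                g[j] += err * xi[j]
        b0 += lr * g0 / n
        b = [bj + lr * gj / n for bj, gj in zip(b, g)]
    return b0, b

def predict_proba(b0, b, x):
    """Predicted probability that the outcome is 1 for feature row x."""
    return sigmoid(b0 + sum(bj * xj for bj, xj in zip(b, x)))

# Hypothetical data: one 0/1 predictor (e.g. RPO) and a 0/1 "top-10" outcome.
X = [[0], [0], [0], [0], [1], [1], [1], [1]]
y = [0, 0, 1, 0, 1, 1, 0, 1]

b0, b = fit_logistic(X, y)
print(round(b0, 3), [round(v, 3) for v in b])
print(round(predict_proba(b0, b, [1]), 3))
```

Each coefficient in `b` is the change in the log-odds of the outcome per one-unit change in that predictor, which is the closest thing this model offers to the "weighting" asked about in question 1.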
• Sep 4th 2011, 07:09 AM
nyalex
Re: What type of regression should I use?
Quote:

Originally Posted by bryangoodrich
Instead, you would want to look to general linear models (GLMs) like logistic and Poisson regressions that work with categorical variables or counts (integers) for dependent variables.

Thank you for your thorough response. I will look into the logistic and Poisson regressions you suggested. A lot of the data still needs to be captured, and before I start building my database, I want to make sure that all of this can be calculated the way I expect.