1. ## Regression Model Building

Hi,

I have a statistics homework assignment which I'm not sure how to best
begin. My teacher gave me a set of data (car crashes at sites) that
includes thousands of observations of count data (one dependent
variable) with a large set of predictor variables (about 25). The
dependent variable data has many zeros (many places didn't have
crashes). I have to fit a parsimonious model that best explains the
variation in the dependent variable with the smallest set of
predictors.

I'm not asking how to do the regression, but rather how to attack this
problem. How do I decide which variables to keep? Do I start with all
of the variables and drop those with the lowest t-statistic
(highest p-value)? Or build up from variables which I think are
important? How should I consider the R-square value? What else should
I look for?

I have begun with a negative binomial regression and am trying out
various models, but I'm not sure how to get to the best model. I'm
using Stata to do the analysis.

Any insight would be greatly appreciated! Thanks.

2. In reply to Artemis:
Look up principal component analysis (PCA) and/or factor analysis.
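
To give a rough idea of what PCA does with overlapping predictors, here is a minimal sketch in plain Python (not Stata, which is what the original poster is using). Everything in it is an assumption for illustration: the data are synthetic, the three predictors `x1`, `x2`, `x3` are made up, and the helper names are hypothetical. It finds only the first principal component, using power iteration on the correlation matrix.

```python
import math
import random

rng = random.Random(42)

# Synthetic stand-in for three predictors (assumption: not the crash data).
# x2 is nearly a copy of x1, x3 is unrelated to both.
rows = []
for _ in range(200):
    x1 = rng.gauss(0, 1)
    x2 = x1 + rng.gauss(0, 0.2)
    x3 = rng.gauss(0, 1)
    rows.append([x1, x2, x3])


def standardize(data):
    """Center each column and scale it to unit sample variance."""
    n, p = len(data), len(data[0])
    means = [sum(r[j] for r in data) / n for j in range(p)]
    sds = [
        math.sqrt(sum((r[j] - means[j]) ** 2 for r in data) / (n - 1))
        for j in range(p)
    ]
    return [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in data]


def correlation_matrix(z):
    """Correlation matrix of already standardized columns."""
    n, p = len(z), len(z[0])
    return [
        [sum(r[i] * r[j] for r in z) / (n - 1) for j in range(p)]
        for i in range(p)
    ]


def first_component(c, iters=500):
    """Power iteration: leading eigenvector and eigenvalue of symmetric c."""
    p = len(c)
    v = [rng.gauss(0, 1) for _ in range(p)]  # random start vector
    for _ in range(iters):
        w = [sum(c[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    cv = [sum(c[i][j] * v[j] for j in range(p)) for i in range(p)]
    eigenvalue = sum(v[i] * cv[i] for i in range(p))
    return v, eigenvalue


z = standardize(rows)
corr = correlation_matrix(z)
loadings, eigenvalue = first_component(corr)

# The eigenvalues of a correlation matrix sum to the number of variables,
# so eigenvalue / p is the share of total variance on this component.
share = eigenvalue / len(corr)
print("loadings:", [round(x, 2) for x in loadings])
print("share of variance:", round(share, 2))
```

In this toy setup the first component loads mainly on `x1` and `x2`, which suggests the two carry much the same information and could be represented by one variable or one component score. Keep in mind that PCA looks only at the predictors and never at the crash counts, so it reduces redundancy among predictors but does not by itself say which ones explain the dependent variable.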

CB