Originally Posted by **Artemis**

Hi,

I have a statistics homework assignment that I'm not sure how best to begin. My teacher gave me a data set (car crashes at sites) that includes thousands of observations of count data (one dependent variable) with a large set of predictor variables (about 25). The dependent variable has many zeros (many sites had no crashes). I have to fit a parsimonious model that best explains the variation in the dependent variable with the smallest set of predictors.

I'm not asking how to do the regression, but rather how to attack this problem. How do I decide which variables to keep? Do I start with all of the variables and consider dropping those with the lowest t-statistics (highest p-values)? Or do I build up from the variables I think are important? How should I consider the R-squared value? What else should I look for?

I have begun with a negative binomial regression and am trying out various models, but I'm not sure how to get to the best model. I'm using Stata to do the analysis.

Any insight would be greatly appreciated! Thanks.