I am using the MATLAB 'glmfit' function with a binomial distribution. My y variable is Bernoulli (0 or 1) and my X variables are mostly continuous, with one categorical variable.
I have a large set of potential explanatory variables. The output of the function gives various statistics, including t-values, p-values and beta coefficients for each variable, plus dispersion parameters.
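For concreteness, here is a minimal sketch of the kind of call I mean. The data are simulated placeholders: `X`, `y` and the coefficient values are made up for illustration and are not my real variables.

```matlab
% Placeholder data: 200 observations, 3 continuous predictors (illustrative only)
rng(1);
n   = 200;
X   = randn(n, 3);
eta = 0.5 + 1.2*X(:,1) - 0.8*X(:,2);             % X(:,3) has no real effect
y   = double(rand(n, 1) < 1 ./ (1 + exp(-eta))); % Bernoulli 0/1 response

% Logistic regression; glmfit adds the intercept term itself
[b, dev, stats] = glmfit(X, y, 'binomial', 'link', 'logit');

% Per-coefficient summary: estimate, standard error, t statistic, p-value
disp([b, stats.se, stats.t, stats.p]);

% Model-level quantities I am asking about
fprintf('deviance = %.2f on %d residual d.f.\n', dev, stats.dfe);
fprintf('dispersion: s = %.3f, sfit = %.3f\n', stats.s, stats.sfit);
```

In my real data the categorical variable would first have to be turned into indicator columns before going into `X`; that step is left out here.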
First, I would like to know the best way of selecting which variables to keep and which to drop. Is there a test I can do before putting them into the regression, so that I start with a smaller number of variables? Or should I remove them one at a time after each regression, perhaps based on which beta coefficients are closest to 0, or on their t-values?
Second, once the model has been fitted, is the 'deviance of fit' the quantity I should be comparing between models? I am used to R-squared type values, but here I am getting values of around 500, and I am not sure how to judge whether that is good or not. There is also a dispersion parameter with values of around 0.44. What does this tell me, and is it a comparative measure between models?
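To make the comparison question concrete: my understanding, which may be wrong, is that the deviance has no absolute scale like R-squared and is only meaningful relative to another model fitted to the same data. For nested models, the drop in deviance can be referred to a chi-square distribution with degrees of freedom equal to the number of coefficients removed. A sketch of that comparison, again on made-up placeholder data:

```matlab
% Placeholder data (illustrative only, not my real variables)
rng(1);
n   = 200;
X   = randn(n, 3);
eta = 0.5 + 1.2*X(:,1) - 0.8*X(:,2);
y   = double(rand(n, 1) < 1 ./ (1 + exp(-eta)));

% Fit the full model and a nested model without the third predictor
[~, devFull]    = glmfit(X,         y, 'binomial');
[~, devReduced] = glmfit(X(:, 1:2), y, 'binomial');

% Deviance (likelihood-ratio) test for the dropped predictor
dDev = devReduced - devFull;     % drop in deviance, always >= 0 for nested fits
dDf  = 1;                        % one coefficient removed
p    = 1 - chi2cdf(dDev, dDf);   % large p: the dropped variable adds little

fprintf('deviance drop = %.3f, d.f. = %d, p = %.4f\n', dDev, dDf, p);
```

If this is the right idea, I assume I would repeat it for each candidate variable, but I do not know whether that is a sound selection procedure.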
Thanks for any help.

