Hi there,
I am trying to structure the validation of a binary logistic regression analysis. The extra dimension is that the variables are geographic. I want to be able to:
1) Examine the spatial coherence of the regression coefficients. My idea is to split the data into three sections (west, centre, east) and then perform stratified bootstrapping with geographic region as the stratum variable. This should let me look at how the regression coefficients (betas) vary between geographic regions, which would support (or not support) prediction outside the geographic areas included in the sample, and let me comment on the applicability of a global regression equation. Is this sensible?
2) If I use this method, I am unsure whether to use stepwise regression or forced entry to choose my independent variables. The advantage of stepwise is that I end up with around 6 or 7 variables in my final equation instead of around 20 (and this is after VIF exclusion!). However, the suite of variables I end up with might change due to (a) random variation or (b) geographic region. I think I can say something about (a) by running a second, this time simple (random, unstratified), bootstrap procedure, as this would let me investigate how the chosen suite of variables varies over a large number of runs (>1000). Am I understanding this correctly?
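The advantage of forced entry is that I get to see the variation in more variables, and because the set of variables is fixed in advance, the choice of variables won't be affected by random variation (although the coefficient estimates still will be).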
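
To make point 1 concrete, here is a minimal sketch of what I have in mind, in plain Python. Everything in it is an assumption for illustration: the data are simulated, there is a single predictor, the logistic fit is a small hand-written Newton-Raphson routine, and the names `fit_logistic`, `simulate` and `bootstrap_slopes` are made up. Note also that I resample within each region and fit a separate model per region. That is my reading of how to compare betas between regions, since a stratified bootstrap that pools the regions into one fit would only give me a single global coefficient per resample.

```python
import math
import random


def fit_logistic(xs, ys, iters=25):
    """Fit y ~ intercept + slope * x by Newton-Raphson; returns (b0, b1)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            z = max(-30.0, min(30.0, b0 + b1 * x))  # clamp to avoid overflow
            p = 1.0 / (1.0 + math.exp(-z))
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        if det < 1e-12:  # (near-)singular information matrix: stop updating
            break
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if max(abs(d0), abs(d1)) < 1e-8:
            break
    return b0, b1


def simulate(true_slope, n, rng):
    """Simulated (x, y) rows for one region; purely illustrative data."""
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        p = 1.0 / (1.0 + math.exp(-true_slope * x))
        rows.append((x, 1 if rng.random() < p else 0))
    return rows


def bootstrap_slopes(data_by_region, n_boot=200, seed=0):
    """Resample within each region and return a percentile interval per region."""
    rng = random.Random(seed)
    out = {}
    for region, rows in data_by_region.items():
        slopes = []
        for _ in range(n_boot):
            sample = [rng.choice(rows) for _ in rows]  # same size, with replacement
            xs = [r[0] for r in sample]
            ys = [r[1] for r in sample]
            slopes.append(fit_logistic(xs, ys)[1])
        slopes.sort()
        lo = slopes[int(0.025 * n_boot)]
        hi = slopes[int(0.975 * n_boot) - 1]
        out[region] = (lo, hi)
    return out


data_rng = random.Random(42)
# Assumed "true" slopes that differ by region, just to have something to detect.
regions = {
    "west": simulate(0.5, 200, data_rng),
    "centre": simulate(1.0, 200, data_rng),
    "east": simulate(2.0, 200, data_rng),
}
intervals = bootstrap_slopes(regions)
for name, (lo, hi) in intervals.items():
    print(f"{name}: approx. 95% bootstrap interval for slope = [{lo:.2f}, {hi:.2f}]")
```

If the per-region intervals overlap heavily, that would seem to argue for a global equation; clearly separated intervals would suggest the coefficient is not spatially stable. With my real data I would use around 20 predictors and a proper library fit rather than this toy routine.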
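
For point 2(a), this is roughly the check I am imagining: a simple bootstrap in which the variable selection is rerun on every resample and I count how often each variable is chosen. Again this is only a sketch under assumptions: the data are simulated with three candidate predictors, forward selection by AIC stands in for whatever stepwise rule the real software uses, the tiny ridge term on the Hessian is only there for numerical safety, and `solve`, `loglik`, `forward_select` and `inclusion_frequency` are hypothetical helper names.

```python
import math
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def prob(beta, v):
    """Logistic probability with the linear predictor clamped to avoid overflow."""
    z = max(-30.0, min(30.0, sum(b * vi for b, vi in zip(beta, v))))
    return 1.0 / (1.0 + math.exp(-z))


def loglik(rows, cols, iters=15):
    """Fit intercept + chosen columns by Newton-Raphson; return the log-likelihood."""
    k = len(cols) + 1
    beta = [0.0] * k
    design = [([1.0] + [xs[c] for c in cols], y) for xs, y in rows]
    for _ in range(iters):
        grad = [0.0] * k
        # Tiny ridge on the diagonal keeps the solve numerically safe.
        hess = [[1e-8 if i == j else 0.0 for j in range(k)] for i in range(k)]
        for v, y in design:
            p = prob(beta, v)
            w = p * (1.0 - p)
            for i in range(k):
                grad[i] += (y - p) * v[i]
                for j in range(k):
                    hess[i][j] += w * v[i] * v[j]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < 1e-6:
            break
    return sum(math.log(prob(beta, v)) if y else math.log(1.0 - prob(beta, v))
               for v, y in design)


def forward_select(rows, n_vars):
    """Forward selection: add the variable that lowers AIC most, stop when none does."""
    chosen = []
    best_aic = 2 * 1 - 2 * loglik(rows, chosen)
    while len(chosen) < n_vars:
        trials = [(2 * (len(chosen) + 2) - 2 * loglik(rows, chosen + [c]), c)
                  for c in range(n_vars) if c not in chosen]
        aic, c = min(trials)
        if aic >= best_aic:
            break
        chosen.append(c)
        best_aic = aic
    return chosen


def inclusion_frequency(rows, n_vars, n_boot, seed=0):
    """Share of simple bootstrap resamples in which each variable is selected."""
    rng = random.Random(seed)
    counts = [0] * n_vars
    for _ in range(n_boot):
        sample = [rng.choice(rows) for _ in rows]
        for c in forward_select(sample, n_vars):
            counts[c] += 1
    return [c / n_boot for c in counts]


data_rng = random.Random(1)
rows = []
for _ in range(120):
    xs = [data_rng.gauss(0.0, 1.0) for _ in range(3)]
    p = 1.0 / (1.0 + math.exp(-1.5 * xs[0]))  # only x0 truly matters (assumed)
    rows.append((xs, 1 if data_rng.random() < p else 0))

# 30 resamples keeps the demo quick; the real check would use >1000.
freq = inclusion_frequency(rows, n_vars=3, n_boot=30)
for i, f in enumerate(freq):
    print(f"x{i}: selected in {f:.0%} of resamples")
```

My thinking is that variables selected in nearly every resample are robust to random variation, while those that come and go are the ones I should be cautious about. Running the same count separately within west, centre and east might then say something about (b).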