# Thread: Is an un-bootstrapped model ever better than a bootstrapped one?

1. ## Is an un-bootstrapped model ever better than a bootstrapped one?

Hi,
I have built a transfer function (a WA-PLS model) that infers the water-table depth of peat bogs from the testate amoebae species found there, calibrated against water-table depths that have already been measured.
Now to the statistics part: I have two models, an un-bootstrapped model and a bootstrapped one (bootstrapped for 1000 cycles). The un-bootstrapped model gives the better results (r^2 = 0.7689 versus 0.6364). Should I use it because it makes the better predictions, or should I use the bootstrapped model simply because a validated model, even if it performs worse, is preferable?
mdc

2. Generally speaking, R^2 isn't the best thing to use for model selection - you can make R^2 large just by throwing in meaningless predictors. Why can't you cross-validate the un-bootstrapped model? If your goal is to make good predictions, you should probably look at something like an estimate of the mean squared prediction error.
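To see the point about R^2 concretely, here is a small sketch in Python rather than R, with entirely made-up data; the helper `lstsq_r2` is just a toy ordinary-least-squares fit, not part of any transfer-function package. Adding columns of pure noise to the predictor matrix can never decrease the in-sample R^2, which is exactly why R^2 alone is a poor guide for model selection:

```python
import random

def lstsq_r2(X, y):
    # Fit y = X @ beta by solving the normal equations (X'X) beta = X'y
    # with Gaussian elimination, then return the in-sample R^2.
    n, p = len(X), len(X[0])
    A = [[sum(X[i][j] * X[i][k] for i in range(n)) for k in range(p)] for j in range(p)]
    b = [sum(X[i][j] * y[i] for i in range(n)) for j in range(p)]
    for c in range(p):                      # forward elimination with pivoting
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        b[c], b[piv] = b[piv], b[c]
        for r in range(c + 1, p):
            f = A[r][c] / A[c][c]
            for k in range(c, p):
                A[r][k] -= f * A[c][k]
            b[r] -= f * b[c]
    beta = [0.0] * p                        # back-substitution
    for c in reversed(range(p)):
        beta[c] = (b[c] - sum(A[c][k] * beta[k] for k in range(c + 1, p))) / A[c][c]
    yhat = [sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
    ybar = sum(y) / n
    ss_res = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

rng = random.Random(0)
n = 50
x = [rng.uniform(0, 10) for _ in range(n)]
y = [2 * xi + rng.gauss(0, 1) for xi in x]
X1 = [[1.0, xi] for xi in x]                       # intercept + one real predictor
X2 = [row + [rng.gauss(0, 1) for _ in range(5)]    # plus 5 meaningless predictors
      for row in X1]
r2_small = lstsq_r2(X1, y)
r2_big = lstsq_r2(X2, y)    # never smaller than r2_small, despite the junk
```

The junk-augmented R^2 is guaranteed to be at least as large as the honest one, because least squares can always set the extra coefficients to zero and do no worse.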

It isn't clear (at least to me) what you mean by a bootstrapped model. Bootstrapping is a very general technique. What exactly is going on in the "bootstrapped" model?

3. Ah sorry, I'll try and be a bit more specific.
I have the two models: one is un-bootstrapped and the other is the same model cross-validated via bootstrapping. They both use exactly the same data.
I have the Root Mean Square Error values as well; they are also better for the un-bootstrapped model (6.9940 versus 9.5847 for the bootstrapped one).
As for the actual bootstrapping process, I'm performing it using a command in [R]. It uses the original set of samples I already have to create a 'new' sample subset, which is then tested against the original sample set to see how well it predicts water-table depth. It does this 1000 times and then generates the R^2 and RMSE values.
Sorry if that isn't what you meant; I'm an environmental scientist rather than a mathematician and am not completely clued up on the exact maths of bootstrapping.
Yes, my goal is to produce a model that makes good predictions, and I wondered whether it is acceptable to present a final model that has not been validated, given that validation actually makes it perform worse.
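For what it's worth, the resampling scheme described above can be sketched roughly as follows (in Python rather than R, with an ordinary least-squares line standing in for the real WA-PLS fit; the data, function names, and seeds are invented for illustration). Each cycle refits the model on a resample drawn with replacement and then scores it on the samples left out of that resample, so the resulting RMSE reflects performance on data the fitted model has not seen:

```python
import math
import random

def fit_ols(xs, ys):
    # Ordinary least-squares fit of y = a + b*x (toy stand-in for WA-PLS).
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b

def bootstrap_rmse(xs, ys, cycles=1000, seed=42):
    # Each cycle: resample (x, y) pairs with replacement, refit the model,
    # then predict the left-out ("out-of-bag") samples and pool their
    # squared prediction errors across all cycles.
    rng = random.Random(seed)
    n = len(xs)
    sq_errs = []
    for _ in range(cycles):
        idx = [rng.randrange(n) for _ in range(n)]
        oob = set(range(n)) - set(idx)
        if not oob:
            continue
        a, b = fit_ols([xs[i] for i in idx], [ys[i] for i in idx])
        sq_errs.extend((ys[i] - (a + b * xs[i])) ** 2 for i in oob)
    return math.sqrt(sum(sq_errs) / len(sq_errs))

rng = random.Random(0)
depths = [rng.uniform(5, 40) for _ in range(30)]    # invented water-table depths
signal = [d + rng.gauss(0, 3) for d in depths]      # invented species signal
a, b = fit_ols(signal, depths)
apparent = math.sqrt(sum((d - (a + b * s)) ** 2
                         for s, d in zip(signal, depths)) / len(depths))
validated = bootstrap_rmse(signal, depths)  # typically a little larger (worse)
```

The bootstrapped RMSE coming out larger than the apparent RMSE, as in your 9.58 versus 6.99, is the expected pattern rather than a sign that anything went wrong.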

Thanks for your help so far!

4. Okay, I think I'm beginning to understand. We expect a model's performance to drop under cross-validation because the model was built specifically to fit the data in hand, while bootstrapping is (ideally) similar to taking a new sample from the population. A model will generally not predict as well in practice as its fit to the training data suggests; hence the need for things like cross-validation.

In other words, before you bootstrap, the model reports r^2 ≈ 0.77, but interpreted as a measure of how well it will fit new data, that figure is overly optimistic. It is more an assessment of how well the model fits the data you already have (which it was specifically built to fit well) than of how well it will predict new data.
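A quick way to see this optimism directly (again a Python sketch with invented numbers, and a plain least-squares line in place of WA-PLS): fit the model on one sample, then score it both on that same sample and on a fresh sample drawn from the same population. The apparent r^2 is computed on data the model was built to fit; the fresh-sample r^2 is the honest quantity that cross-validation tries to estimate:

```python
import random

def fit_line(xs, ys):
    # Least-squares fit of y = a + b*x.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b

def r2(xs, ys, a, b):
    # r^2 of the fixed line (a, b) evaluated on the data (xs, ys).
    ybar = sum(ys) / len(ys)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - ybar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

def sample(rng, n=20):
    # One draw of n points from an invented population: y = 2x + noise.
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [2 * x + rng.gauss(0, 2) for x in xs]
    return xs, ys

rng = random.Random(1)
x_train, y_train = sample(rng)
x_new, y_new = sample(rng)                 # a fresh draw from the same population
a, b = fit_line(x_train, y_train)
r2_apparent = r2(x_train, y_train, a, b)   # computed on the fitting data
r2_fresh = r2(x_new, y_new, a, b)          # what you'd see on genuinely new data
```

On average the apparent r^2 overstates the fresh-sample r^2, for exactly the reason above: the line was chosen to minimise error on the training points, not on the new ones.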