I have a question that may or may not be simple. I have a dataset with one dependent variable and a large number of independent variables. I have split the dataset into two parts: one for building models and another for model testing/validation.

I have modelled the dataset with a range of different techniques, all fitted to the same fixed set of training points extracted from the dataset. The techniques are both linear (least squares, principal component regression, partial least squares regression) and nonlinear (neural networks, Gaussian process regression).

The models have then all been tested on the previously unseen data points (the points not used during model building).

At this point, I have a range of different predicted values for the test data points. Some models are clearly better than others, but many give similar R^2 and MSE values.
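To make the setup concrete, here is a minimal sketch of how the R^2 and MSE values are obtained on the held-out points. The names `y_test` and `predictions` and the numbers in them are made up for illustration; they are not my real data.

```python
from statistics import fmean

def mse(y_true, y_pred):
    """Mean squared error over the test points."""
    return fmean([(t - p) ** 2 for t, p in zip(y_true, y_pred)])

def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot, on the test points."""
    mean_y = fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical test targets and per-model predictions (illustrative values only).
y_test = [1.0, 2.0, 3.0, 4.0, 5.0]
predictions = {
    "model_a": [1.1, 1.9, 3.2, 3.8, 5.1],
    "model_b": [0.9, 2.2, 2.9, 4.1, 4.8],
}

# One MSE and one R^2 per model, all computed on the same test points.
scores = {
    name: {"mse": mse(y_test, pred), "r2": r_squared(y_test, pred)}
    for name, pred in predictions.items()
}
for name, s in scores.items():
    print(f"{name}: MSE={s['mse']:.4f}  R^2={s['r2']:.4f}")
```

Every model is scored on exactly the same test points, so the per-point errors of any two models are paired.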

I would like to formally approach this problem with a hypothesis test comparing the prediction accuracy of the models.

I'm not sure how to formulate the statistical test for this problem. As far as I can see, it's the F-test I should use? But I'm not sure, and would appreciate any advice! Also, I'd like to know if I'm completely off track here!

Is there a statistical test I can use on the residuals from each model to objectively compare the models, e.g. by comparing their variances?
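For concreteness, this is the kind of residual-based comparison I have in mind. It is a sketch only, and I am not claiming it is the right test: it compares two models through the per-point difference in squared error, using a paired sign-flip permutation test (my assumption, swapped in because the models share the same test points). The function name, the synthetic data, and the parameters `n_resamples` and `seed` are all hypothetical.

```python
import random
from statistics import fmean

def paired_sign_flip_test(y_true, pred_a, pred_b, n_resamples=2000, seed=0):
    """Two-sided sign-flip permutation test on paired squared-error differences.

    Null hypothesis (assumed): the per-point differences in squared error
    are symmetric about zero, i.e. neither model is systematically better.
    Returns (mean difference, p-value); a negative mean favours model A.
    """
    rng = random.Random(seed)
    diffs = [(t - a) ** 2 - (t - b) ** 2
             for t, a, b in zip(y_true, pred_a, pred_b)]
    observed = fmean(diffs)
    extreme = 0
    for _ in range(n_resamples):
        # Under the null, each difference is equally likely to have either sign.
        flipped = fmean([d if rng.random() < 0.5 else -d for d in diffs])
        if abs(flipped) >= abs(observed):
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (extreme + 1) / (n_resamples + 1)
    return observed, p_value

# Synthetic illustration: model A has smaller prediction noise than model B.
data_rng = random.Random(1)
y_test = [data_rng.gauss(0.0, 1.0) for _ in range(100)]
pred_a = [y + data_rng.gauss(0.0, 0.3) for y in y_test]
pred_b = [y + data_rng.gauss(0.0, 1.0) for y in y_test]

mean_diff, p_value = paired_sign_flip_test(y_test, pred_a, pred_b)
print(f"mean squared-error difference (A - B): {mean_diff:.4f}, p = {p_value:.4f}")
```

With more than two models this would have to be repeated pairwise, which presumably raises a multiple-comparisons issue as well.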