# Regression analysis Methodology - Is this the right approach?

• Aug 10th 2011, 06:45 AM
chogo
Regression analysis Methodology - Is this the right approach?
Hi, I have a set of 10 data sets sampled over 10 years. For each year I estimate a statistic from the data.

example

x = 0,1,2,3,4,5,6,7,8,9,10
y= 0.5,0.86,3,4,6,8.7,9.3,9.9,11 <-estimated statistic from my data

For each estimated value I have a bootstrap distribution.

I then fit several linear and nonlinear models to this data using least squares (e.g. linear, polynomial, exponential, logistic).

I am now thinking of using the chi-square goodness-of-fit formula to find the best fit:

$\chi^2 = \frac{1}{v} \sum_{i=1}^{10}\frac{(O_i-E_i)^2}{\sigma_i^2}$, where $v$ is the number of degrees of freedom and $\sigma_i^2$ is the variance of the bootstrap distribution for point $i$.

The model with $\chi^2$ closest to 1 is the model which best fits the data.
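To make the formula concrete, here is a minimal sketch of the calculation in plain Python. The function name `reduced_chi_square`, the numbers, and the choice of $v$ as the number of points minus the number of fitted parameters are assumptions made for illustration; they are not the data or code from this post.

```python
def reduced_chi_square(observed, expected, variances, n_params):
    """Sum of variance-weighted squared residuals divided by the degrees of freedom."""
    if not (len(observed) == len(expected) == len(variances)):
        raise ValueError("inputs must have equal length")
    # Assumption: v = number of data points minus number of fitted parameters.
    dof = len(observed) - n_params
    if dof <= 0:
        raise ValueError("need more data points than fitted parameters")
    total = sum((o - e) ** 2 / s2
                for o, e, s2 in zip(observed, expected, variances))
    return total / dof


# Made-up illustration values (not the data from the post).
observed = [1.0, 2.1, 2.9, 4.2, 5.0]         # O_i: estimated statistic
expected = [1.0, 2.0, 3.0, 4.0, 5.0]         # E_i: predictions of a hypothetical 2-parameter model
variances = [0.01, 0.01, 0.01, 0.04, 0.01]   # sigma_i^2: bootstrap variances

value = reduced_chi_square(observed, expected, variances, n_params=2)
print(value)
```

With these made-up numbers the result is approximately 1, the value the selection rule treats as ideal. Each candidate model would be scored the same way, using its own predictions and its own parameter count.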

-----------

Does this methodology sound valid? I think the chi-square test assumes that my error distributions (the bootstraps) are normal. Can anyone recommend something better, or offer any criticisms?
• Aug 10th 2011, 10:39 PM
CaptainBlack
Re: Regression analysis Methodology - Is this the right approach?
I think we will need more context and/or detail about what you are trying to do here.

CB
• Aug 11th 2011, 03:05 AM
chogo
Re: Regression analysis Methodology - Is this the right approach?

As I described, I have 10 data sets sampled at times $t_1, t_2, \ldots, t_{10}$. Specifically, each data set is genetic data from influenza patients at time $t_i$.

For each data set I calculate a statistic which summarizes the data. This statistic is just a simple counting method. Let's call the results from this statistic $y_1, y_2, \ldots, y_{10}$.

So now I am in a position to do some regression analysis on the data $[(t_1,y_1), \ldots, (t_{10},y_{10})]$.

OK, because I wanted to know the distribution of my statistic, I used a bootstrapping approach. Therefore, for each time point I have a vector holding the bootstrap distribution: $y^*_1, \ldots, y^*_{10}$.
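As a rough sketch of what bootstrapping one time point might look like, the plain-Python snippet below resamples a data set with replacement and recomputes the statistic each time. The statistic `count_positive`, the sample values, and the number of replicates are all hypothetical stand-ins, since the actual counting method is not specified here.

```python
import random
import statistics


def bootstrap_distribution(sample, statistic, n_boot=1000, seed=0):
    """Resample `sample` with replacement n_boot times and apply `statistic` to each resample."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(sample)
    return [statistic([rng.choice(sample) for _ in range(n)])
            for _ in range(n_boot)]


def count_positive(sample):
    """Hypothetical counting statistic: number of entries greater than zero."""
    return sum(1 for v in sample if v > 0)


# Made-up data for a single time point t_i (not real sequence data).
sample = [0, 1, 0, 2, 1, 0, 0, 3, 1, 0]

boot = bootstrap_distribution(sample, count_positive)   # the vector y*_i
sigma2 = statistics.variance(boot)                      # bootstrap variance for this time point
print(len(boot), sigma2)
```

Repeating this for each of the 10 time points would give one bootstrap vector and one variance per point.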

-----------------

My question/goal: I want to fit a series of models to my data and then test which of these models fits best. The outcome might be, for example, that an exponential model best fits my data.

I thought of using the chi-square goodness-of-fit test for each model, but I could also use leave-one-out cross-validation.
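For comparison, a leave-one-out cross-validation could be sketched as follows in plain Python. The two candidate models (a constant and a straight line), the helper names, and the data are assumptions chosen to keep the example dependency-free. The nonlinear models mentioned earlier would need a proper least-squares routine, and this unweighted version ignores the bootstrap variances.

```python
def fit_constant(xs, ys):
    """Least-squares constant model: predicts the mean of the training y values."""
    mean = sum(ys) / len(ys)
    return lambda x: mean


def fit_line(xs, ys):
    """Ordinary least-squares straight line; returns a predictor function."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def loocv_mse(xs, ys, fit):
    """Hold out each point in turn, refit on the rest, and average the squared prediction errors."""
    errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        predict = fit(train_x, train_y)
        errors.append((ys[i] - predict(xs[i])) ** 2)
    return sum(errors) / len(errors)


# Made-up (t_i, y_i) pairs for illustration only.
t = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [1.2, 1.9, 3.1, 4.2, 4.8, 6.1, 7.0, 8.2, 8.8, 10.1]

scores = {"constant": loocv_mse(t, y, fit_constant),
          "line": loocv_mse(t, y, fit_line)}
best = min(scores, key=scores.get)  # model with the lowest held-out error
print(scores, best)
```

The model with the smallest held-out error is preferred. Unlike the chi-square rule, this criterion does not rely on the errors being normal.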

Is this enough information? Thanks for your time. I hope this is slightly clearer.