Hi All,
Say I have a time series X of length T that is split into two subperiods: X1 = X(1),X(2),...,X(T-n) and X2 = X(T-n+1),X(T-n+2),...,X(T), where X1 is longer than X2.
Now, say I want to use the bootstrap to test whether the mean of X2 is the same as the mean of X1. Can I perform the following procedure:
1. Compute mean of X2, call it m2
2. Generate B bootstrap samples of X1
3a. For each bootstrap sample, I obtain the mean of a randomly selected subsample of the series that has the same length as X2, call it m1*, or...
3b. If the length of X1 is, say, more than 3 times that of X2, I can obtain 3 nonoverlapping means from each bootstrap sample of X1, call them m11*,m12* and m13*
4. Use these bootstrapped means to form a distribution (for 3a, there would be B bootstrapped means, but for 3b, there would be 3B bootstrapped means) and generate the one-tail p-value for X2 from this distribution, e.g. the fraction of m1* larger than m2
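For reference, steps 1 to 4 with option 3a could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: it uses plain i.i.d. resampling with NumPy, which ignores any serial dependence in the series, and the function name and parameters are made up for the example. Under i.i.d. resampling, drawing a bootstrap sample of X1 and then taking a random subsample of the length of X2 is equivalent to drawing len(X2) values from X1 with replacement, so the two steps are merged.

```python
import numpy as np

def subsample_mean_pvalue(x1, x2, n_boot=2000, seed=0):
    """One-tailed p-value for mean(x2) against bootstrapped means drawn from x1.

    Assumption: observations are treated as i.i.d. (no block structure).
    """
    rng = np.random.default_rng(seed)
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)

    m2 = x2.mean()  # step 1: mean of the short subperiod

    # Steps 2 and 3a merged: each row holds len(x2) indices into x1,
    # drawn with replacement.
    idx = rng.integers(0, len(x1), size=(n_boot, len(x2)))
    m1_star = x1[idx].mean(axis=1)  # one bootstrapped mean per row

    # Step 4: fraction of bootstrapped means at least as large as m2.
    p_value = float(np.mean(m1_star >= m2))
    return m2, m1_star, p_value
```

A call such as `subsample_mean_pvalue(x1, x2)` returns m2, the B bootstrapped means, and the one-tailed fraction described in step 4.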
I know that when deriving inferences from bootstrap resampling, one must use the same lengths for the bootstrap sample and the original sample. So ideally, X1 and X2 should have the same lengths. However, in practice, this will not always be the case. Would appreciate the help!
Best,
RZ
Hi CB... so it does not matter that X2 is shorter or has fewer observations than X1? What if the problem is horizon-dependent? For instance, say X2 has 10 observations and X1 has 100 observations and I want to know if X2 is from the same process as X1.
Clearly, I can try to compare their means, std. deviations, etc. as above, but let's just say I choose to use the sum of X2 (i.e. summing all 10 observations). How will I use the bootstrap to generate the appropriate distribution now? That is, can I generate N bootstrap samples of X1 and, from each sample, take any random sum of 10? Hope it's clear what I mean... Thanks again!
RZ
No, because we are generating our bootstrap samples with the appropriate lengths. Under the null hypothesis the difference in the means is zero, and we are testing the observed difference against a bootstrap sample of differences of means of samples of exactly those lengths.
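One possible reading of this, sketched in Python: pool the two subperiods (which imposes the null of a common mean), then repeatedly draw one sample with the length of X1 and one with the length of X2 and record the difference of their means. The pooling step, the i.i.d. resampling, and the function name are assumptions made for illustration; they are not spelled out in the reply above, and serially dependent data would need a block-type scheme instead.

```python
import numpy as np

def bootstrap_mean_diff_test(x1, x2, n_boot=5000, seed=0):
    """Bootstrap test of H0: mean(x2) - mean(x1) = 0, keeping both lengths.

    Assumption: i.i.d. resampling from the pooled data imposes the null.
    """
    rng = np.random.default_rng(seed)
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    n1, n2 = len(x1), len(x2)

    observed = x2.mean() - x1.mean()

    # Under the null both subperiods share one distribution, so pool them.
    pooled = np.concatenate([x1, x2])

    # Resample with exactly the original lengths n1 and n2.
    s1 = rng.choice(pooled, size=(n_boot, n1), replace=True)
    s2 = rng.choice(pooled, size=(n_boot, n2), replace=True)
    diffs = s2.mean(axis=1) - s1.mean(axis=1)

    # One-tailed p-value: how often a null difference is at least as large.
    p_value = float(np.mean(diffs >= observed))
    return observed, diffs, p_value
```

Because the short sample is always resampled at length n2, the extra variability of a mean over only a few observations is built into the reference distribution.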
You are going to do a test of some kind, so decide on a test statistic and generate a bootstrap sample of that statistic to use in the test (and by the way, a test will only allow you to reject a hypothesis; it won't confirm it).
> What if the problem is horizon-dependent? For instance, say X2 has 10 observations and X1 has 100 observations and I want to know if X2 is from the same process as X1.
This last paragraph is incoherent.
> Clearly, I can try to compare their means, std. deviations, etc. as above, but let's just say I choose to use the sum of X2 (i.e. summing all 10 observations). How will I use the bootstrap to generate the appropriate distribution now? That is, can I generate N bootstrap samples of X1 and, from each sample, take any random sum of 10? Hope it's clear what I mean... Thanks again!
> RZ
The best thing to do is try generating bootstrap samples and computing appropriate statistics and see how things go.
CB
Thanks for the response again, CB... OK, I'm gonna try the last paragraph one more time.
Assume I have 10 months' worth of daily data series, X1, and I define the 11th month's daily data series as X2 (hence, length of X1 is 10 times that of X2). Now, say I am interested in the monthly sum, i.e. aggregate or sum the daily series for each month, giving me 10 values for X1, call it the vector M1, and 1 value for X2, call it m2. Now I want to know whether m2 is generated by the same process as M1.
So if I want to use the bootstrap, can I generate B bootstrapped samples of X1 (each bootstrapped sample will have 10 monthly observations), then use the monthly observations from all the bootstrap samples (10B in total) to form the distribution of m1 and see where m2 falls on that distribution?
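To make the proposal concrete, here is a rough Python sketch of it. It assumes every month has the same number of daily observations (the hypothetical `days_per_month` argument) and resamples the daily values i.i.d., which would not preserve any day-to-day dependence; the function names are invented for the example.

```python
import numpy as np

def bootstrap_monthly_sums(x1_daily, days_per_month, n_boot=1000, seed=0):
    """Return n_boot * n_months bootstrapped monthly sums built from x1_daily.

    Assumptions: equal-length months and i.i.d. resampling of daily values.
    """
    rng = np.random.default_rng(seed)
    x1 = np.asarray(x1_daily, dtype=float)
    n_months = len(x1) // days_per_month
    x1 = x1[: n_months * days_per_month]  # drop any incomplete final month

    # Each bootstrap sample is n_months blocks of days_per_month resampled days.
    idx = rng.integers(0, len(x1), size=(n_boot, n_months, days_per_month))
    sums = x1[idx].sum(axis=2)  # shape (n_boot, n_months)
    return sums.ravel()         # the 10B monthly sums when n_months == 10

def upper_tail_fraction(m2, sums):
    """Fraction of bootstrapped monthly sums at least as large as m2."""
    return float(np.mean(np.asarray(sums) >= m2))
```

With 10 months of data, `bootstrap_monthly_sums(x1_daily, days_per_month)` gives the 10B values, and `upper_tail_fraction(m2, sums)` shows where m2 falls in that distribution.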
I hope I have made the problem clearer. If not, I apologise and please ignore it!
Thanks,
RZ