I am having trouble understanding the intuition behind the so-called stationary bootstrap. The original paper is by Politis and Romano (1994); you can find it here.
Before I explain what the stationary bootstrap is all about from my point of view, I would like to post my questions first:
- Is my understanding correct?
- Why is the block length variable? That is: Why does it follow a geometric distribution?
- What is the optimal value of p if my data are weakly dependent, but the dependence is still quite strong, i.e. the autocorrelation fades out very slowly and reaches zero at: for .
These are the steps of the stationary bootstrap as I understand it:
1.) Determine a variable block length using a geometric distribution with probability value p, which is supposed to be optimal at , .
2.) Keep determining blocks until the sum of the block lengths reaches N.
3.) Determine variable starting points from where the data will be extracted. The starting points are drawn from 1, ..., N.
4.) Resample the data, i.e. run the stationary bootstrap using the blocks determined in 2.) and the starting points determined in 3.).
5.) Re-run steps 1.)-4.) x times to get x new resampled series.
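To make the five steps concrete, here is a minimal Python sketch of the procedure as I described it. It is only an illustration, not the reference implementation from the paper: the function names, the value p=0.3, the seed, and the choice to truncate the last block so that the total length is exactly N are all my own assumptions.

```python
import random

def geometric_length(p, rng):
    """Draw a block length L >= 1 with P(L = k) = (1 - p)**(k - 1) * p."""
    length = 1
    while rng.random() >= p:  # keep extending the block with probability 1 - p
        length += 1
    return length

def stationary_bootstrap_indices(n, p, rng):
    """Return n zero-based indices built from wrapped blocks (steps 1-4)."""
    indices = []
    while len(indices) < n:
        start = rng.randrange(n)           # step 3: uniform starting point
        length = geometric_length(p, rng)  # step 1: geometric block length
        # Assumption: cut the final block so the lengths sum to exactly n (step 2).
        length = min(length, n - len(indices))
        # Step 4: copy the block, wrapping from the tail to the head.
        indices.extend((start + k) % n for k in range(length))
    return indices

def stationary_bootstrap(data, p, x, seed=None):
    """Step 5: repeat the resampling x times and return x resampled series."""
    rng = random.Random(seed)
    n = len(data)
    return [[data[i] for i in stationary_bootstrap_indices(n, p, rng)]
            for _ in range(x)]

data = list("abcdefghi")
resamples = stationary_bootstrap(data, p=0.3, x=5, seed=0)
for series in resamples:
    print("".join(series))
```

With this parametrisation the expected block length is 1/p, so a smaller p gives longer blocks on average.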
Let's go through an example to clarify. Table 1 below shows the original data. The data are labelled a, b, c, ..., which makes it easier to follow the procedure later on. You can imagine them as any values.
Table 1:
Obs.  original data
1     a
2     b
3     c
4     d
5     e
6     f
7     g
8     h
9     i
Now we determine random block lengths, that is run steps 1.) and 2.) and write the results in table 2.
Table 2: Blocks 2 4 3
As you can see the block lengths are variable and the sum is exactly equal to the number of observations.
We now determine random starting points for the blocks, drawn from a uniform distribution on 1, ..., N. The results are in table 3.
Table 3: Starting Points 8 7 3
Now step 4.). Resample the data using the results of table 2 and table 3. The output is in table 4:
Table 4:
original data  resampled data
a              h
b              i
c              g
d              h
e              i
f              a
g              c
h              d
i              e
Notes to Table 4: The first block in the column "resampled data" is h, i. That is because the first starting point in table 3 is 8, so we look at observation no. 8 in the original data and find h. Moreover, the first block length in table 2 is 2, so we resample the original data from observation no. 8 to no. 9 (length = 2). That is h and i. We continue with the next starting point in table 3, which is 7. The corresponding block length in table 2 is 4, so we resample the original data at obs. 7, 8, 9 and 1 (wrapping the tail to the head). That is g, h, i and a. For the last block we start at obs. 3 with block length 3, so we get c, d and e.
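The block extraction described in these notes can be checked with a few lines of Python. This is just a sketch of my example; the helper name resample_from_blocks is hypothetical, and the starting points are 1-based as in table 3.

```python
def resample_from_blocks(data, starts, lengths):
    """Concatenate wrapped blocks; `starts` are 1-based as in table 3."""
    n = len(data)
    out = []
    for start, length in zip(starts, lengths):
        for k in range(length):
            # Convert to a 0-based index and wrap the tail to the head.
            out.append(data[(start - 1 + k) % n])
    return out

data = list("abcdefghi")
resampled = resample_from_blocks(data, starts=[8, 7, 3], lengths=[2, 4, 3])
print(resampled)  # ['h', 'i', 'g', 'h', 'i', 'a', 'c', 'd', 'e']
```

The printed sequence matches the "resampled data" column of table 4.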
Now if we re-run everything x times (step 5), we will get an output like this:
Table 5:
original data  resample 1  res 2  res 3  res 4  ...  res x
a              h           e      f      b      ...  g
b              i           f      g      c      ...  h
c              g           g      a      i      ...  f
d              h           h      b      a      ...  g
e              i           a      b      b      ...  a
f              a           b      c      c      ...  a
g              c           f      d      d      ...  b
h              d           g      a      f      ...  c
i              e           c      b      g      ...  e
That is the stationary bootstrap as I understand it.