# Thread: Conditioning on Random Variables?

1. ## Conditioning on Random Variables?

Let T be a constant, and let N be a random variable.
Suppose {X_1,X_2,...} are independent, and each X_i follows continuous uniform(0,T) distribution.
I would like to compute Var[(NT-(X_1+X_2+...X_N))|N].

Attempt:
Var[(NT-(X_1+X_2+...X_N))|N]
=(-1)^2 *Var[((X_1+X_2+...X_N))|N] (since we are given N, NT is treated as a constant and I am using the fact that Var(aX+b)=a^2 *Var(X) )

=Var[((X_1+X_2+...X_N))|N]
=N Var(X_1) (since the X_i's are i.i.d.)
=N (T^2)/12

Am I right?? (in particular the reasoning in the step colored in red?)

[note: also under discussion in talk stats forum]

2. I checked over my work again and I think the answer should be correct.
But do we also need the assumption of independence between N and those X_i ?
If so, in which step do we actually have to USE this independence?

Thanks!

3. Originally Posted by kingwinner
I checked over my work again and I think the answer should be correct.
But do we also need the assumption of independence between N and those X_i ?
If so, in which step do we actually have to USE this independence?

Thanks!
I too think this is correct. The independence is used in this step: ${\rm Var}(X_1+\ldots+X_N|N)=N{\rm Var}(X_1)$. Indeed, not only are the $X_i$ i.i.d., but above all (that's what you're using) they are i.i.d. conditionally on $N$ (because they are independent of $N$), which gives ${\rm Var}(X_1+\ldots+X_N|N)=N{\rm Var}(X_1|N)$. And they are independent of $N$, hence ${\rm Var}(X_1|N)={\rm Var}(X_1)$.

NB: you could also say that $NT-(X_1+\cdots+X_N)=(T-X_1)+\cdots+(T-X_N)$ and the r.v.'s $T-X_i$ are again i.i.d. uniform on $[0,T]$ (we performed a symmetry, hence they have the same law as the $X_i$'s). Thus (again using the independence with $N$): ${\rm Var}(NT-(X_1+\ldots+X_N)|N)={\rm Var}(X_1+\cdots+X_N|N)$, which double-checks your step in red.
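
A quick numerical sanity check of this step, as a rough sketch rather than a proof. It assumes the $X_i$ are independent of $N$, so that conditioning on $N=k$ amounts to fixing $k$; the function name, the sample size and the value $T=2$ are illustrative choices, not part of the thread.

```python
import random
import statistics

def conditional_variance(k, T, trials=50_000, seed=0):
    """Estimate Var(k*T - (X_1 + ... + X_k)) for X_i i.i.d. uniform(0, T).

    Assumption: the X_i are independent of N, so conditioning on N = k
    is the same as plugging in the fixed value k.
    """
    rng = random.Random(seed)
    samples = [k * T - sum(rng.uniform(0, T) for _ in range(k))
               for _ in range(trials)]
    return statistics.pvariance(samples)

T = 2.0
for k in (1, 3, 10):
    estimate = conditional_variance(k, T)
    exact = k * T**2 / 12
    print(f"k={k}: simulated {estimate:.4f}, k*T^2/12 = {exact:.4f}")
```

The simulated values should sit close to $kT^2/12$, up to Monte Carlo noise.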

4. Originally Posted by Laurent
I too think this is correct. The independence is used in this step: ${\rm Var}(X_1+\ldots+X_N|N)=N{\rm Var}(X_1)$. Indeed, not only are the $X_i$ i.i.d., but above all (that's what you're using) they are i.i.d. conditionally on $N$ (because they are independent of $N$), which gives ${\rm Var}(X_1+\ldots+X_N|N)=N{\rm Var}(X_1|N)$. And they are independent of $N$, hence ${\rm Var}(X_1|N)={\rm Var}(X_1)$.
OK, then I think the idea is this:
P(Z_N|N=k)=P(Z_k|N=k)=P(Z_k), where the last equality is true only if we assume the independence between N and those Z_i, and in this case, we can drop the condition N=k in the last step.

But here we have something more complicated: Var[(NT-(X1+X2+...XN))|N]. There is NT in it, it is conditional on N, and NT & N are not independent (even though the X_i's and N are independent), then how can we drop the condition N in this case?

NB: you could also say that $NT-(X_1+\cdots+X_N)=(T-X_1)+\cdots+(T-X_N)$ and the r.v.'s $T-X_i$ are again i.i.d. uniform on $[0,T]$ (we performed a symmetry, hence they have the same law as the $X_i$'s). Thus (again using the independence with $N$): ${\rm Var}(NT-(X_1+\ldots+X_N)|N)={\rm Var}(X_1+\cdots+X_N|N)$, which double-checks your step in red.
1) Why are the r.v.'s (T-X_i) i.i.d. uniform[0,T] ? How do you know this?
2) Why does this imply that ${\rm Var}(NT-(X_1+\ldots+X_N)|N)={\rm Var}(X_1+\cdots+X_N|N)$?

Could you please explain a little more on these?

Thank you! I am learning a lot from you

5. Originally Posted by kingwinner
OK, then I think the idea is this:
P(Z_N|N=k)=P(Z_k|N=k)=P(Z_k), where the last equality is true only if we assume the independence between N and those Z_i, and in this case, we can drop the condition N=k in the last step.

But here we have something more complicated: Var[(NT-(X1+X2+...XN))|N]. There is NT in it, it is conditional on N, and NT & N are not independent (even though the X_i's and N are independent), then how can we drop the condition N in this case?
I liked your explanation in your first post: conditionally on N, NT is an additive constant, hence it doesn't affect the variance. Writing it your way: for any Z, ${\rm Var}(NT-Z|N=k)={\rm Var}(kT-Z|N=k)$, and ${\rm Var}(kT-Z)={\rm Var}(Z)$ under any probability measure since kT is a constant.

1) Why are the r.v.'s (T-X_i) i.i.d. uniform[0,T] ? How do you know this?
2) Why does this imply that ${\rm Var}(NT-(X_1+\ldots+X_N)|N)={\rm Var}(X_1+\cdots+X_N|N)$?
1) It is a symmetry of the distribution. You would agree that if X is Bernoulli of parameter 1/2, then 1-X is also Bernoulli of parameter 1/2 (it is like switching between tails and heads). This is almost the same. You can prove it from the distribution function for instance: since 0<X<T, you also have 0<T-X<T, and for any 0<t<T, P(T-X<t)=P(X>T-t)=P(T-t<X<T)=t/T, which is the distribution function of a uniform distribution on [0,T].

2) For any k, since X_1,...,X_k are independent, T-X_1,..., T-X_k still are independent. And they have same law as X_1,...,X_k because of 1). So that (X_1,...,X_k) has same joint distribution as (T-X_1,...,T-X_k). In particular, their sums have same distributions, and therefore same variance. Hence:

Var(NT-(X_1+...+X_N)|N=k)
= Var(kT-(X_1+...+X_k)) (using independence between X_1,... and N)
= Var((T-X_1)+...+(T-X_k))
= Var(X_1+...+X_k) (because of the above-mentioned equality in distribution).

But these points are unnecessary after your clever remark about Var(Z+b)=Var(Z). That was just a little remark.
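
Point 1) can also be checked empirically. The sketch below estimates $P(T-X<t)$ for $X$ uniform on $[0,T]$ and compares it with $t/T$, the uniform distribution function; the helper name and the values of $T$ and $t$ are arbitrary illustrative choices.

```python
import random

def empirical_cdf_of_reflection(t, T, trials=100_000, seed=1):
    """Estimate P(T - X < t) for X ~ uniform(0, T) by simulation."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(trials) if T - rng.uniform(0, T) < t)
    return hits / trials

T = 4.0
for t in (1.0, 2.0, 3.0):
    simulated = empirical_cdf_of_reflection(t, T)
    print(f"t={t}: simulated {simulated:.3f}, t/T = {t / T:.3f}")
```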

6. Is it possible to weaken the assumption of independence between N and those X_i ? i.e. can we replace the assumption of independence between N and those X_i by something that is easier to satisfy and leads to the same conclusion?

7. Originally Posted by kingwinner
Is it possible to weaken the assumption of independence between N and those X_i ? i.e. can we replace the assumption of independence between N and those X_i by something that is easier to satisfy and leads to the same conclusion?
Well, you always have ${\rm Var}(NT-(X_1+\cdots+X_N)|N)={\rm Var}(X_1+\cdots+X_N|N)$ (using your argument). Then you always have, for all $n$:
${\rm Var}(X_1+\cdots+X_N|N=n)={\rm Var}(X_1+\cdots+X_n|N=n)$ $={\rm Var}(X_1|N=n)+\cdots+{\rm Var}(X_n|N=n)+\sum_{i\neq j} {\rm Cov}(X_i,X_j|N=n)$.
(where ${\rm Cov}(X,Y|A)=E[(X-E[X|A])(Y-E[Y|A])|A]$)
Therefore, the conclusion still holds as soon as ${\rm Var}(X_i|N)={\rm Var}(X_i)$ for all $i$, and ${\rm Cov}(X_i,X_j|N)=0$ for all $i\neq j$.

For instance, this holds if $X_1,X_2,\ldots$ are independent conditionally on $N$ and ${\rm Var}(X_i|N)={\rm Var}(X_i)$; this is weaker, but it is not easy to find an example where this could apply...
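
To see that some condition of this kind is really needed, here is a made-up example (not from the thread) in which $N$ depends on $X_1$: take $N=1$ if $X_1<T/2$ and $N=2$ otherwise. Given $N=1$, $X_1$ is uniform on $[0,T/2]$, so ${\rm Var}(X_1|N=1)=T^2/48$ rather than $T^2/12$; given $N=2$, ${\rm Var}(X_1+X_2|N=2)=T^2/48+T^2/12=5T^2/48$ rather than $2T^2/12$. The simulation below is only a sketch of that hypothetical rule.

```python
import random
import statistics

def conditional_sum_variances(T, trials=200_000, seed=2):
    """Estimate Var(X_1 + ... + X_N | N = n) for n = 1, 2 under the
    hypothetical rule N = 1 if X_1 < T/2, else N = 2."""
    rng = random.Random(seed)
    sums_by_n = {1: [], 2: []}
    for _ in range(trials):
        x1, x2 = rng.uniform(0, T), rng.uniform(0, T)
        n = 1 if x1 < T / 2 else 2  # N depends on X_1: no independence here
        sums_by_n[n].append(x1 if n == 1 else x1 + x2)
    return {n: statistics.pvariance(s) for n, s in sums_by_n.items()}

T = 2.0
variances = conditional_sum_variances(T)
for n in (1, 2):
    print(f"N={n}: simulated {variances[n]:.4f}, "
          f"naive n*T^2/12 = {n * T**2 / 12:.4f}")
```

The simulated conditional variances should be clearly below the naive $nT^2/12$, so the formula $N\,T^2/12$ fails for this choice of $N$.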

8. Let's consider a similar problem in which we compute the expectation rather than the variance.
I was looking in my textbook, and it says:
Definition:
Let Xo,X1,X2,... be random variables and N ∈ {0,1,2,...} be a counting random variable. If {N=n} depends only on Xo,X1,...,Xn, then we call N a "stopping time" for the sequence. (note: {N≤n} can be used)

Wald's equation:
Let Xo=0, and X1,X2,... be i.i.d. with mean E(X1). Let N be a "stopping time". Then E(∑Xn)=E(X1)E(N) where the sum is from n=0 to n=N.

The result is exactly the same as in the case if we assume that N is independent of the Xi's, so the above discussion says that the assumption can be weakened?

I also don't understand the idea of a "stopping time" as defined. What does it mean by "{N=n} depends only on Xo,X1,...,Xn"?

Thank you!

9. Originally Posted by kingwinner
Wald's equation:
Let Xo=0, and X1,X2,... be i.i.d. with mean E(X1). Let N be a "stopping time". Then E(∑Xn)=E(X1)E(N) where the sum is from n=0 to n=N.

The result is exactly the same as in the case if we assume that N is independent of the Xi's, so the above discussion says that the assumption can be weakened?
Note that there is no conditioning on N here, hence this is not really a generalization of $E[X_1+\cdots+X_N|N]=NE[X_1]$.

I also don't understand the idea of a "stopping time" as defined. What does it mean by "{N=n} depends only on Xo,X1,...,Xn"?
A stopping time is what the name says: it is a time when you can decide to stop. In other words, suppose you discover the values X0,X1,... one after another; then in order to stop at time n (i.e. to decide whether N=n), you can only look at the values X0,...,Xn, not at the "future".
For instance, $N=\inf\{i\geq 0|X_1+\cdots+X_i>5\}$ is a stopping time: you can stop at time $N$ by waiting until the running sum exceeds 5.
On the other hand, $N=\sup\{i\leq 10|X_i<2\}$ is not a stopping time because you have to look at X0,...,X10 before you know where you should have stopped.
If you think about it, you'll see that the condition of "being able to stop at time N" is equivalent to "for all n, the event $\{N=n\}$ can be expressed in terms of X0,...,Xn".
The usual formal definition of a stopping time uses sigma-algebras (filtrations): for all $n$, $\{N=n\}\in\mathcal{F}_n$ where $\mathcal{F}_n=\sigma(X_0,\ldots,X_n)$ is the $\sigma$-algebra generated by $X_0,\ldots,X_n$.
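
The first example can be sketched in code: the loop decides whether to stop using only the values seen so far, which is exactly the stopping-time property. As a side check it also compares $E[X_1+\cdots+X_N]$ with $E[X_1]E[N]$ (Wald's equation). The uniform(0, T) steps, the value $T=2$ and the function name are assumptions made for illustration only.

```python
import random

def first_passage(threshold, T, rng):
    """Return (N, S_N) where N = inf{i : X_1 + ... + X_i > threshold}
    and the X_i are i.i.d. uniform(0, T)."""
    total, n = 0.0, 0
    while total <= threshold:
        total += rng.uniform(0, T)  # only past and present values are used
        n += 1
    return n, total

rng = random.Random(3)
T, threshold, trials = 2.0, 5.0, 100_000
results = [first_passage(threshold, T, rng) for _ in range(trials)]
mean_n = sum(n for n, _ in results) / trials
mean_sum = sum(s for _, s in results) / trials
print(f"E[S_N] ~ {mean_sum:.3f},  E[X_1] * E[N] ~ {T / 2 * mean_n:.3f}")
```

Here $N$ is clearly not independent of the $X_i$, yet the two averages should agree up to simulation noise.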

10. Originally Posted by Laurent
Note that there is no conditioning on N here, hence this is not really a generalization of $E[X_1+\cdots+X_N|N]=NE[X_1]$.

A stopping time is what the name says: it is a time when you can decide to stop. In other words, suppose you discover the values X0,X1,... one after another; then in order to stop at time n (i.e. to decide whether N=n), you can only look at the values X0,...,Xn, not at the "future".
For instance, $N=\inf\{i\geq 0|X_1+\cdots+X_i>5\}$ is a stopping time: you can stop at time $N$ by waiting until the running sum exceeds 5.
On the other hand, $N=\sup\{i\leq 10|X_i<2\}$ is not a stopping time because you have to look at X0,...,X10 before you know where you should have stopped.
If you think about it, you'll see that the condition of "being able to stop at time N" is equivalent to "for all n, the event $\{N=n\}$ can be expressed in terms of X0,...,Xn".
The usual formal definition of a stopping time uses sigma-algebras (filtrations): for all $n$, $\{N=n\}\in\mathcal{F}_n$ where $\mathcal{F}_n=\sigma(X_0,\ldots,X_n)$ is the $\sigma$-algebra generated by $X_0,\ldots,X_n$.
1) By saying that {N=n} depends only on Xo,X1,...,Xn, does it mean that N is a function of Xo,X1,...,Xn only? (i.e. N=N(Xo,X1,...,Xn)? Is n equal to N here?)

2) "If {N=n} depends only on Xo,X1,...,Xn, then we call N a stopping time for the sequence." <-----here, does this have to be true for ALL n=0,1,2,... in order for N to be a stopping time?

3) In the definition, it says that "If {N=n} depends only on Xo,X1,...,Xn, then we call N a stopping time for the sequence. (note: {N≤n} can also be used.)" <------Why can {N≤n} also be used?

Thanks for explaining!

11. Originally Posted by kingwinner
1) By saying that {N=n} depends only on Xo,X1,...,Xn, does it mean that N is a function of Xo,X1,...,Xn only? (i.e. N=N(Xo,X1,...,Xn)? Is n equal to N here?)
What would that even mean? No, it means that the event {N=n} can be expressed by conditions on X0,...,Xn, like $\{\varphi(X_0,\ldots,X_n)\in A\}$. Or, equivalently, $1_{\{N=n\}}=\psi(X_0,\ldots,X_n)$: the indicator function of the event {N=n} is a function of X0,...Xn only.

And so for every n.

3) In the definition, it says that "If {N=n} depends only on Xo,X1,...,Xn, then we call N a stopping time for the sequence. (note: {N≤n} can also be used.)" <------Why can {N≤n} also be used?
You could have figured this out: since $\{N=n\}=\{N\le n\}\setminus\{N\le n-1\}$, if $\{N \le n\}$ and $\{N\le n-1\}$ depend on X0,...Xn (resp. on X0,...,Xn-1), then both depend on X0,...,Xn, and therefore their difference as well.

12. Thanks Laurent, this clarifies the idea of a stopping time.

But now I have a concern about the original problem. The following is the original context that led to it, and thinking about the problem a second time is making me feel puzzled...

Let {N(t): t≥0} be a Poisson process of rate λ. The points are to be thought of as the arrival times of customers to a store which opens at time t=T. The customers arriving between t=0 and t=T have to wait until the store opens. Let Y be the total time that these customers have to wait. Calculate Var(Y).

N(T)=N
N(T)~Poisson(λT)
(T_1,T_2,...,T_N) is equal in joint distribution to (X_(1),X_(2),...,X_(N)), where the order statistics are coming from X_1,X_2,...,X_N which are i.i.d. uniform(0,T).
=> T_1+T_2+...+T_N is equal in distribution to X_(1)+X_(2)+...+X_(N) = X_1+X_2+...+X_N
Var(Y)=Var(total waiting time)
=Var[(T-T1)+(T-T2)+...+(T-TN)]
=Var(NT-X_1-X_2-...-X_N)
=E[Var(NT-X_1-X_2-...-X_N |N)] + Var[E(NT-X_1-X_2-...-X_N |N)]
and the red part leads to my original problem in the top post.

But here, are N and the Xi's really independent? The problem is that I don't think the times of occurrence of the points in a Poisson process & the number of points in [0,T] are independent. Are they? But if N and the Xi's are not independent, I have no idea how to continue with the calculations and compute Var(Y).

The theorem is: $(T_1,\ldots,T_N)$ is equal in joint distribution to $(X_{(1)},\ldots,X_{(N)})$, where $N$ is a Poisson random variable of parameter $\lambda T$, and $(X_{(1)},\ldots,X_{(N)})$ are the order statistics of the first $N$ r.v.'s of the sequence $(X_i)_{i\geq 1}$, which is a family of independent uniformly distributed r.v.'s on $[0,T]$, independent of $N$.
Then $X_{(1)}$ depends on $N$, but $X_1$ doesn't. Since $X_{(1)}+\cdots+X_{(N)}=X_1+\cdots+X_N$, you can reduce to independent random variables that are independent of $N$ and do what you were doing first.
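
Carrying the decomposition through under these assumptions: $E[{\rm Var}(NT-X_1-\cdots-X_N|N)]=E[N]\,T^2/12=\lambda T^3/12$ and ${\rm Var}(E[NT-X_1-\cdots-X_N|N])={\rm Var}(NT/2)=\lambda T^3/4$, so ${\rm Var}(Y)=\lambda T^3/3$. The sketch below checks this by simulating the Poisson process directly from exponential inter-arrival times, without using the uniform representation; the values of $\lambda$ and $T$ and the function name are arbitrary illustrative choices.

```python
import random
import statistics

def total_waiting_time(lam, T, rng):
    """Simulate a rate-lam Poisson process on [0, T] via exponential
    inter-arrival times and return Y = sum of (T - arrival time)."""
    t, y = rng.expovariate(lam), 0.0
    while t <= T:
        y += T - t                  # this customer waits until opening time T
        t += rng.expovariate(lam)   # next arrival
    return y

rng = random.Random(4)
lam, T, trials = 1.5, 2.0, 200_000
samples = [total_waiting_time(lam, T, rng) for _ in range(trials)]
simulated_var = statistics.pvariance(samples)
print(f"simulated Var(Y) ~ {simulated_var:.3f}")
print(f"lam * T^3 / 3     = {lam * T**3 / 3:.3f}")
```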