# Central Limit Theorem.

• May 24th 2009, 05:02 AM
panda*
Central Limit Theorem.
Another one of those review exercises questions!

A box contains an unknown number of white and black balls. We wish to estimate the proportion p of white balls in the box. To do so, we draw n successive balls with replacement. Let $\displaystyle Z_n$ be the proportion of white balls obtained after n drawings.

(i) Show that for all $\displaystyle \epsilon > 0$,
$\displaystyle \mathbb P (|Z_{n} - p| \geq \epsilon) \leq \frac{1}{4n \epsilon^2}$
(ii) Using the result in part (i), find the smallest value of n such that with probability greater than 0.95, the proportion $\displaystyle Z_{n}$ in the sample will estimate p within 0.1.

(iii) Same question as in (ii) using the central limit theorem.

Thank you again guys! Been great help, really!
• May 24th 2009, 05:39 AM
Moo
Hello,
Quote:

Originally Posted by panda*
Another one of those review exercises questions!

A box contains an unknown number of white and black balls. We wish to estimate the proportion p of white balls in the box. To do so, we draw n successive balls with replacement. Let $\displaystyle Z_n$ be the proportion of white balls obtained after n drawings.

(i) Show that for all $\displaystyle \epsilon > 0$,
$\displaystyle \mathbb P (|Z_{n} - p| \geq \epsilon) \leq \frac{1}{4n \epsilon^2}$

Use Chebyshev's inequality.
To compute the variance of $\displaystyle Z_n$, consider the rv $\displaystyle X_i$, which equals 1 if you get a white ball at the i-th drawing, 0 otherwise.
You can see that the $\displaystyle X_i$ are independent and that $\displaystyle Z_n=\frac 1n \sum_{i=1}^n X_i$
So now it's easy to find the variance.
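As a concrete illustration of this setup, here is a quick simulation sketch in Python (the true proportion `p_true`, the seed, and `n` are made-up illustration values, not given in the problem):

```python
import random

random.seed(1)
p_true = 0.4   # hypothetical true proportion of white balls (not from the problem)
n = 1000       # number of drawings, chosen arbitrarily

# X_i = 1 if the i-th draw (with replacement) is white, 0 otherwise
x = [1 if random.random() < p_true else 0 for _ in range(n)]

# Z_n = (1/n) * sum of the X_i = observed proportion of white balls
z_n = sum(x) / n
print(z_n)
```

For large n the printed proportion lands close to `p_true`, which is exactly the behaviour that the inequality in part (i) quantifies.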

Quote:

(ii) Using the result in part (i), find the smallest value of n such that with probability greater than 0.95, the proportion $\displaystyle Z_{n}$ in the sample will estimate p within 0.1.
Take the complement of the above inequality. It may help you see where you're going :
$\displaystyle \mathbb{P}(|Z_n-p|<\epsilon)\geq 1-\frac{1}{4n\epsilon^2}$

"will estimate p within 0.1" means that we let $\displaystyle \epsilon=0.1$
Then, find the smallest n such that $\displaystyle 1-\frac{1}{4n\epsilon^2}\geq 0.95$, that is $\displaystyle \frac{1}{4n\epsilon^2}\leq 0.05$

Quote:

(iii) Same question as in (ii) using the central limit theorem.
What does the central limit theorem say ?
It should be easy with the way I defined $\displaystyle Z_n$, shouldn't it ?

Quote:

Thank you again guys! Been great help, really!
Guys, guys... What about cows ??? :D

• May 25th 2009, 06:04 AM
panda*
Hello Moo! I tried the question with your advice and this is what I got so far! I am not sure if I am on the right track, so correct me if I am wrong! (:

For part (i)

According to Chebyshev's inequality, if X has mean $\displaystyle \mu$ and variance $\displaystyle \sigma^2$, then,
$\displaystyle \mathbb P (|X - \mu| \geq k\sigma) \leq \frac{1}{k^2}$
From the question,
$\displaystyle \mathbb P(|Z_n - p| \geq \epsilon) \leq \frac{1}{4n \epsilon^2}$

$\displaystyle \Rightarrow \mu = p$
$\displaystyle k = 2 \sqrt n \epsilon$
$\displaystyle \sigma = \frac{1}{2\sqrt n}$

$\displaystyle \sigma^2 = E[(Z_n - p)^2]$
$\displaystyle = \int^\infty_{-\infty} (z - p)^2 f(z) dz$
$\displaystyle \geq \int_{|z - p| \geq \epsilon} (z - p)^2 f(z) dz$
$\displaystyle \geq (2 \sqrt n \epsilon)^2 \sigma^2 \int_{|z - p| \geq \epsilon} f(z) dz$
$\displaystyle = 4n\epsilon^2 \sigma^2 \int_{|z - p| \geq \epsilon} f(z) dz$
$\displaystyle = 4n \epsilon^2 \sigma^2 \, \mathbb P(|Z_n - p| \geq \epsilon)$

By dividing both sides by $\displaystyle \sigma^2$,
$\displaystyle \Rightarrow \mathbb P(|Z_n - p| \geq \epsilon) \leq \frac{1}{4n\epsilon^2}$
For part (ii),
$\displaystyle \mathbb P(|Z_n - p| \leq \epsilon)$
$\displaystyle \Rightarrow \mathbb P(|Z_n - p| \leq \epsilon) \geq 1 - \frac{1}{4n\epsilon^2}$
$\displaystyle \Rightarrow 1 - \frac{1}{4n\epsilon^2} \geq 0.95$
$\displaystyle \Rightarrow \frac{1}{4n\epsilon^2} \leq 0.05$
$\displaystyle \Rightarrow 4n\epsilon^2 \geq 20$
Estimating p within 0.1 means we let $\displaystyle \epsilon = 0.01$,
$\displaystyle \Rightarrow 4n(0.01)^2 \geq 20$
$\displaystyle \Rightarrow 4n \geq 200000$
$\displaystyle \Rightarrow n \geq 50000$
$\displaystyle \Rightarrow n = 50001$.
Am I doing alright so far?

Hm, Central Limit Theorem says ...

Let $\displaystyle X_1, X_2, ...$ be independent, identically distributed random variables with $\displaystyle E(X_i) = \mu$ and $\displaystyle V(X_i) = \sigma^2$, and let $\displaystyle S_n = X_1 + X_2 + ... + X_n$. Then,
$\displaystyle Z_n = \frac {S_n - n\mu}{\sigma \sqrt n} \rightarrow N(0,1),$ as n $\displaystyle \rightarrow \infty$.
Not really sure how to proceed from here!
• May 25th 2009, 10:43 AM
Moo
Hi !
Quote:

Originally Posted by panda*
Hello Moo! I tried the question with your advice and this is what I got so far! I am not sure if I am on the right track, so correct me if I am wrong! (:

For part (i)

According to Chebyshev's inequality, if X has mean $\displaystyle \mu$ and variance $\displaystyle \sigma^2$, then,
$\displaystyle \mathbb P (|X - \mu| \geq k\sigma) \leq \frac{1}{k^2}$
From the question,
$\displaystyle \mathbb P(|Z_n - p| \geq \epsilon) \leq \frac{1}{4n \epsilon^2}$

$\displaystyle \Rightarrow \mu = p$
$\displaystyle k = 2 \sqrt n \epsilon$
$\displaystyle \sigma = \frac{1}{2\sqrt n}$

Okay, I had some problems with this part, but while quoting, I saw what you meant. You can't put spaces inside the LaTeX.

Quote:

$\displaystyle \sigma^2 = E[(Z_n - p)^2]$
$\displaystyle = \int^\infty_{-\infty} (z - p)^2 f(z) dz$
$\displaystyle \geq \int_{|z - p| \geq \epsilon} (z - p)^2 f(z) dz$
$\displaystyle \geq (2 \sqrt n \epsilon)^2 \sigma^2 \int_{|z - p| \geq \epsilon} f(z) dz$
$\displaystyle = 4n\epsilon^2 \sigma^2 \int_{|z - p| \geq \epsilon} f(z) dz$
$\displaystyle = 4n \epsilon^2 \sigma^2 \, \mathbb P(|Z_n - p| \geq \epsilon)$

By dividing both sides by $\displaystyle \sigma^2$,
$\displaystyle \Rightarrow \mathbb P(|Z_n - p| \geq \epsilon) \leq \frac{1}{4n\epsilon^2}$

But in these two previous quotes, I have to say that I don't agree... Or maybe I just didn't understand what you did (especially for the integrals) :(

You want to prove that it's $\displaystyle \leq \frac{1}{4n\epsilon^2}$; you're not asked to identify every element of it o.O

Anyway, use this version of Chebyshev's inequality :

$\displaystyle \mathbb{P}(|Z_n-\mathbb{E}(Z_n)|>\epsilon)\leq \frac{\text{Var}(Z_n)}{\epsilon^2}$

$\displaystyle \mathbb{E}(Z_n)=\mathbb{E}\left(\frac 1n\sum_{i=1}^n X_i\right)=\frac 1n \sum_{i=1}^n \mathbb{E}(X_i)$

But as I wrote before (or at least what I meant), the $\displaystyle X_i$ follow a Bernoulli distribution, with parameter p.

This means that $\displaystyle \sum_{i=1}^n\mathbb{E}(X_i)=n\mathbb{E}(X_1)=np$

So indeed, $\displaystyle \mathbb{E}(Z_n)=p$

From Chebyshev's inequality, we can say that $\displaystyle \mathbb{P}(|Z_n-p|>\epsilon)\leq \frac{\text{Var}(Z_n)}{\epsilon^2}$

Now, what is $\displaystyle \text{Var}(Z_n)$ ?

$\displaystyle \text{Var}(Z_n)=\text{Var}\left(\frac 1n\sum_{i=1}^n X_i\right)=\frac{1}{n^2}\text{Var}\left(\sum_{i=1}^n X_i\right)$

Since the $\displaystyle X_i$ are independent and identically distributed, we have :
$\displaystyle \text{Var}(Z_n)=\frac{1}{n^2}\cdot\left(n \text{Var}(X_1)\right)=\frac{pq}{n}$

where $\displaystyle q=1-p$, because the variance of a Bernoulli(p) distribution is pq.

Now, note that $\displaystyle \forall x\in[0,1] ~,~ x(1-x)\leq \frac 14$

From here, the inequality you're looking for just appears !
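The chain of steps above can be sanity-checked numerically: by Chebyshev, $\displaystyle \mathbb{P}(|Z_n-p|\geq\epsilon)\leq \frac{pq}{n\epsilon^2}\leq\frac{1}{4n\epsilon^2}$. A Monte Carlo sketch in Python (p, n, ε, the seed and the trial count are arbitrary illustration values):

```python
import random

random.seed(0)
p, n, eps, trials = 0.3, 100, 0.1, 10_000   # arbitrary illustration values

# Monte Carlo estimate of P(|Z_n - p| >= eps)
hits = 0
for _ in range(trials):
    z_n = sum(random.random() < p for _ in range(n)) / n
    if abs(z_n - p) >= eps:
        hits += 1

empirical = hits / trials
bound = 1 / (4 * n * eps**2)   # the part (i) bound: 0.25 here
print(empirical, "<=", bound)
```

The empirical frequency comes out far below the bound, which is typical: Chebyshev is crude, but it holds for any distribution of the $\displaystyle X_i$.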

Quote:

For part (ii),
$\displaystyle \mathbb P(|Z_n - p| \leq \epsilon)$
$\displaystyle \Rightarrow \mathbb P(|Z_n - p| \leq \epsilon) \geq 1 - \frac{1}{4n\epsilon^2}$
$\displaystyle \Rightarrow 1 - \frac{1}{4n\epsilon^2} \geq 0.95$
$\displaystyle \Rightarrow \frac{1}{4n\epsilon^2} \leq 0.05$
$\displaystyle \Rightarrow 4n\epsilon^2 \geq 20$
Estimating p within 0.1 means we let $\displaystyle \epsilon = 0.01$,
$\displaystyle \Rightarrow 4n(0.01)^2 \geq 20$
$\displaystyle \Rightarrow 4n \geq 200000$
$\displaystyle \Rightarrow n \geq 50000$
$\displaystyle \Rightarrow n = 50001$.
Am I doing alright so far?
Yes, except that it should be $\displaystyle \epsilon=0.1$, shouldn't it ?
And $\displaystyle n\geq 50000$ --> you can take n=50000.
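With $\displaystyle \epsilon=0.1$ as the problem intends, the same inequality $\displaystyle \frac{1}{4n\epsilon^2}\leq 0.05$ gives a much smaller answer; a quick computation sketch in Python:

```python
eps = 0.1     # "within 0.1", as the problem states
alpha = 0.05  # allowed failure probability: 1 - 0.95

# Smallest n with 1/(4*n*eps^2) <= alpha
n = 1
while 1 / (4 * n * eps**2) > alpha:
    n += 1
print(n)
```

This prints 500, i.e. the Chebyshev bound guarantees the 0.95 confidence with 500 drawings.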

Quote:

Hm, Central Limit Theorem says ...

Let $\displaystyle X_1, X_2, ...$ be independent, identically distributed random variables with $\displaystyle E(X_i) = \mu$ and $\displaystyle V(X_i) = \sigma^2$, and let $\displaystyle S_n = X_1 + X_2 + ... + X_n$. Then,
$\displaystyle Z_n = \frac {S_n - n\mu}{\sigma \sqrt n} \rightarrow N(0,1),$ as n $\displaystyle \rightarrow \infty$.
Not really sure how to proceed from here!
You can see that $\displaystyle S_n = nZ_n$ here, and that the $\displaystyle X_i$ satisfy all the required conditions.

Thus $\displaystyle \frac{X_1+\dots+X_n-np}{\sigma\sqrt{n}}=\sqrt{n}\cdot\frac{Z_n-p}{\sigma}$ converges to the standard normal distribution (in distribution).

But this means that the cumulative distribution function of $\displaystyle \sqrt{n}\cdot\frac{Z_n-p}{\sigma}$ converges to the cumulative distribution function of the standard normal distribution.

So $\displaystyle \mathbb{P}\left(\sqrt{n}\cdot\frac{Z_n-p}{\sigma} \in[a,b]\right)\xrightarrow[]{n\to\infty} \int_a^b \frac{1}{\sqrt{2\pi}}\cdot e^{-t^2/2} ~dt$

Does that help you ? Have you ever heard of confidence intervals ?

Sorry it's a bit long...
• May 26th 2009, 03:48 AM
panda*
It's okay Moo! Long is good, it just means it's more detailed. So part (iii) of this question is just asking me to show whatever you showed me above? Is that the solution they are asking for? Thank you again though!
• May 26th 2009, 09:20 AM
Moo
Quote:

Originally Posted by panda*
It's okay Moo! Long is good, it just means it's more detailed. So part (iii) of this question is just asking me to show whatever you showed me above? Is that the solution they are asking for? Thank you again though!

Hmm I don't get your question.. ? (Worried)
Sorry, I'm a bit slow sometimes :D

For part iii), use the fact that for a large n, $\displaystyle \mathbb{P}\left(\left|\sqrt{n}\cdot\frac{Z_n-p}{\sigma}\right|\geq\epsilon\right) \approx 1-\int_{-\epsilon}^\epsilon \frac{1}{\sqrt{2\pi}}\cdot e^{-t^2/2} ~dt$

(use a z-table)
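In place of a z-table, the same computation can be sketched in code: we need the smallest n with $\displaystyle 2\Phi\!\left(\frac{\epsilon\sqrt n}{\sigma}\right)-1\geq 0.95$, taking the worst case $\displaystyle \sigma=\frac 12$ since $\displaystyle \sigma^2=pq\leq\frac 14$. A Python sketch (the helper `phi` built from `math.erf` is my own, not part of the thread):

```python
import math

def phi(x):
    """Standard normal CDF, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

eps = 0.1     # estimate p within 0.1
sigma = 0.5   # worst case, since sqrt(p*(1-p)) <= 1/2

# Smallest n with P(|Z_n - p| < eps) ~ 2*phi(eps*sqrt(n)/sigma) - 1 >= 0.95
n = 1
while 2 * phi(eps * math.sqrt(n) / sigma) - 1 < 0.95:
    n += 1
print(n)
```

This prints 97: the CLT approximation needs far fewer drawings than the Chebyshev bound, which is the point of comparing (ii) and (iii).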
• May 27th 2009, 03:58 AM
panda*
Oh! So we just have to evaluate the integral, substituting the boundaries $\displaystyle \epsilon = 0.1$?

Anyway, the integral proof I used earlier to prove the inequality was given in my lecture notes for a continuous random variable, where $\displaystyle \sigma^2 = E[(X-\mu)^2]$. Is it better to use the way you suggested, or would the way I used work too?
• May 27th 2009, 04:59 AM
panda*
Regarding your proof, I tried it out, but there are a couple of places where I don't understand how we got there. For example,

How did you get

$\displaystyle \mathbb E(Z_n) = \mathbb E(\frac{1}{n} \sum^n_{i=1} X_i)$ and $\displaystyle Var(Z_n) = \frac{1}{n^2}(n\cdot Var(X_1))$?

Also, what is the relation of $\displaystyle \forall x \in [0,1], x(1-x) \leq \frac{1}{4}$ to the final step?

And I still don't know how to do part (iii)!

Thank you for your time again!