1. Central Limit Theorem.

Another one of those review exercise questions!

A box contains an unknown number of white and black balls. We wish to estimate the proportion p of white balls in the box. To do so, we draw n successive balls with replacement. Let $Z_n$ be the proportion of white balls obtained after n drawings.

(i) Show that for all $\epsilon > 0$,
$\mathbb P (|Z_{n} - p| \geq \epsilon) \leq \frac{1}{4n \epsilon^2}$
(ii) Using the result in part (i), find the smallest value of n such that with probability greater than 0.95, the proportion $Z_{n}$ in the sample will estimate p within 0.1.

(iii) Same question as in (ii) using the central limit theorem.

Thank you again, guys! You've been a great help, really!

2. Hello,
Originally Posted by panda*
Another one of those review exercise questions!

A box contains an unknown number of white and black balls. We wish to estimate the proportion p of white balls in the box. To do so, we draw n successive balls with replacement. Let $Z_n$ be the proportion of white balls obtained after n drawings.

(i) Show that for all $\epsilon > 0$,
$\mathbb P (|Z_{n} - p| \geq \epsilon) \leq \frac{1}{4n \epsilon^2}$
Use Chebyshev's inequality.
To compute the variance of $Z_n$, consider the rv $X_i$, which equals 1 if you get a white ball at the i-th drawing, 0 otherwise.
You can see that the $X_i$ are independent and that $Z_n=\frac 1n \sum_{i=1}^n X_i$
So now it's easy to find the variance.

(ii) Using the result in part (i), find the smallest value of n such that with probability greater than 0.95, the proportion $Z_{n}$ in the sample will estimate p within 0.1.
Take the complement of the above inequality; it may help you see where you're going:
$\mathbb{P}(|Z_n-p|<\epsilon)\geq 1-\frac{1}{4n\epsilon^2}$

"will estimate p within 0.1" means that we let $\epsilon=0.1$
Then, find the smallest n such that $1-\frac{1}{4n\epsilon^2}\geq 0.95$, that is $\frac{1}{4n\epsilon^2}\leq 0.05$
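As a quick numeric check of this last step (an illustrative sketch, not part of the exercise; `smallest_n` is a made-up helper name), one can solve $\frac{1}{4n\epsilon^2}\leq \alpha$ for the smallest integer n:

```python
import math

def smallest_n(eps, alpha):
    """Smallest integer n with 1/(4*n*eps**2) <= alpha,
    i.e. n >= 1/(4 * eps**2 * alpha)."""
    return math.ceil(1 / (4 * eps**2 * alpha))

print(smallest_n(0.1, 0.05))   # -> 500
```

With $\epsilon=0.1$ this gives $n=500$; with $\epsilon=0.01$ it would give $n=50000$.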

(iii) Same question as in (ii) using the central limit theorem.
What does the central limit theorem say?
It should be easy with the way I defined $Z_n$, shouldn't it?

Thank you again, guys! You've been a great help, really!
Guys, guys... What about cows ???

3. Hello Moo! I tried the question with your advice, and this is what I got so far! I am not sure if I am on the right track, so correct me if I am wrong! (:

For part (i)

According to Chebyshev's inequality, if X has mean $\mu$ and variance $\sigma^2$, then
$\mathbb P (|X - \mu| \geq k\sigma) \leq \frac{1}{k^2}$
From the question,
$\mathbb P(|Z_n - p| \geq \epsilon) \leq \frac{1}{4n \epsilon^2}$

$\Rightarrow \mu = p$
$k = 2 \sqrt n \epsilon$
$\sigma = \frac{1}{2\sqrt n}$

$\sigma^2 = E[(Z_n - p)^2]$
$= \int^\infty_{-\infty} (z - p)^2 f(z)\, dz$
$\geq \int_{|z - p| \geq \epsilon} (z - p)^2 f(z)\, dz$
$\geq (2 \sqrt n \epsilon)^2 \sigma^2 \int_{|z - p| \geq \epsilon} f(z)\, dz$
$= 4n\epsilon^2 \sigma^2 \int_{|z - p| \geq \epsilon} f(z)\, dz$
$= 4n \epsilon^2 \sigma^2\, \mathbb P(|Z_n - p| \geq \epsilon)$

Dividing both sides by $4n\epsilon^2\sigma^2$,
$\Rightarrow \mathbb P(|Z_n - p| \geq \epsilon) \leq \frac{1}{4n\epsilon^2}$
For part (ii),
$\mathbb P(|Z_n - p| \leq \epsilon)$
$\Rightarrow \mathbb P(|Z_n - p| \leq \epsilon) \geq 1 - \frac{1}{4n\epsilon^2}$
$\Rightarrow 1 - \frac{1}{4n\epsilon^2} \geq 0.95$
$\Rightarrow \frac{1}{4n\epsilon^2} \leq 0.05$
$\Rightarrow 4n\epsilon^2 \geq 20$
By estimating p within 0.1 means we let $\epsilon = 0.01$,
$\Rightarrow 4n(0.01)^2 \geq 20$
$\Rightarrow 4n \geq 200000$
$\Rightarrow n \geq 50000$
$\Rightarrow n = 50001$.
Am I doing alright so far?

Hm, Central Limit Theorem says ...

Let $X_1, X_2, \dots$ be independent, identically distributed random variables with $E(X_i) = \mu$ and $V(X_i) = \sigma^2$, and let $S_n = X_1 + X_2 + \dots + X_n$. Then,
$\frac{S_n - n\mu}{\sigma \sqrt n} \rightarrow N(0,1)$ as $n \rightarrow \infty$.
Not really sure how to proceed from here!

4. Hi !
Originally Posted by panda*
Hello Moo! I tried the question with your advice, and this is what I got so far! I am not sure if I am on the right track, so correct me if I am wrong! (:

For part (i)

According to Chebyshev's inequality, if X has mean $\mu$ and variance $\sigma^2$, then
$\mathbb P (|X - \mu| \geq k\sigma) \leq \frac{1}{k^2}$
From the question,
$\mathbb P(|Z_n - p| \geq \epsilon) \leq \frac{1}{4n \epsilon^2}$

$\Rightarrow \mu = p$
$k = 2 \sqrt n \epsilon$
$\sigma = \frac{1}{2\sqrt n}$
Okay, I had some problems with this part, but while quoting, I saw what you meant. You can't make spaces in the LaTeX.

$\sigma^2 = E[(Z_n - p)^2]$
$= \int^\infty_{-\infty} (z - p)^2 f(z)\, dz$
$\geq \int_{|z - p| \geq \epsilon} (z - p)^2 f(z)\, dz$
$\geq (2 \sqrt n \epsilon)^2 \sigma^2 \int_{|z - p| \geq \epsilon} f(z)\, dz$
$= 4n\epsilon^2 \sigma^2 \int_{|z - p| \geq \epsilon} f(z)\, dz$
$= 4n \epsilon^2 \sigma^2\, \mathbb P(|Z_n - p| \geq \epsilon)$

Dividing both sides by $4n\epsilon^2\sigma^2$,
$\Rightarrow \mathbb P(|Z_n - p| \geq \epsilon) \leq \frac{1}{4n\epsilon^2}$
But in these two previous quotes, I have to say that I don't agree... Or maybe I just didn't understand what you did (especially for the integrals)

You want to prove that it's $\leq \frac{1}{4n\epsilon^2}$, you're not asked to identify every element of it o.O

Anyway, use this version of Chebyshev's inequality :

$\mathbb{P}(|Z_n-\mathbb{E}(Z_n)|>\epsilon)\leq \frac{\text{Var}(Z_n)}{\epsilon^2}$

$\mathbb{E}(Z_n)=\mathbb{E}\left(\frac 1n\sum_{i=1}^n X_i\right)=\frac 1n \sum_{i=1}^n \mathbb{E}(X_i)$

But as I wrote before (or at least what I meant), the $X_i$ follow a Bernoulli distribution, with parameter p.

This means that $\sum_{i=1}^n\mathbb{E}(X_i)=n\mathbb{E}(X_1)=np$

So indeed, $\mathbb{E}(Z_n)=p$

From Chebyshev's inequality, we can say that $\mathbb{P}(|Z_n-p|>\epsilon)\leq \frac{\text{Var}(Z_n)}{\epsilon^2}$

Now, what is $\text{Var}(Z_n)$ ?

$\text{Var}(Z_n)=\text{Var}\left(\frac 1n\sum_{i=1}^n X_i\right)=\frac{1}{n^2}\text{Var}\left(\sum_{i=1}^n X_i\right)$

Since the $X_i$ are independent and identically distributed, we have:
$\text{Var}(Z_n)=\frac{1}{n^2}\cdot\left(n \text{Var}(X_1)\right)=\frac{pq}{n}$

where $q=1-p$, because the variance of a Bernoulli distribution is $pq$.

Now, note that $\forall x\in[0,1] ~,~ x(1-x)\leq \frac 14$, since $x(1-x)=\frac 14-\left(x-\frac 12\right)^2$.

From here, the inequality you're looking for just appears !
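If you want to convince yourself numerically, here is a small Monte Carlo sketch (not part of the exercise; `simulate` is a made-up helper name) that checks the bound $\mathbb{P}(|Z_n-p|\geq\epsilon)\leq\frac{1}{4n\epsilon^2}$ by actually drawing balls with replacement:

```python
import random

def simulate(p, n, eps, trials=20000, seed=1):
    """Estimate P(|Z_n - p| >= eps) by repeating the experiment:
    draw n balls with replacement, Z_n = proportion of white balls."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(trials):
        z = sum(rng.random() < p for _ in range(n)) / n
        if abs(z - p) >= eps:
            exceed += 1
    return exceed / trials

p, n, eps = 0.3, 50, 0.1
freq = simulate(p, n, eps)
bound = 1 / (4 * n * eps**2)   # the Chebyshev-based bound, 0.5 here
print(freq, "<=", bound)
```

The simulated frequency comes out well under the bound, as it should; Chebyshev-type bounds are valid for any $p$ but usually far from tight.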

For part (ii),
$\mathbb P(|Z_n - p| \leq \epsilon)$
$\Rightarrow \mathbb P(|Z_n - p| \leq \epsilon) \geq 1 - \frac{1}{4n\epsilon^2}$
$\Rightarrow 1 - \frac{1}{4n\epsilon^2} \geq 0.95$
$\Rightarrow \frac{1}{4n\epsilon^2} \leq 0.05$
$\Rightarrow 4n\epsilon^2 \geq 20$
By estimating p within 0.1 means we let $\epsilon = 0.01$,
$\Rightarrow 4n(0.01)^2 \geq 20$
$\Rightarrow 4n \geq 200000$
$\Rightarrow n \geq 50000$
$\Rightarrow n = 50001$.
Am I doing alright so far?
Yes, except that it should be $\epsilon=0.1$, shouldn't it ?
And from $n\geq 50000$, you can take $n=50000$.

Hm, Central Limit Theorem says ...

Let $X_1, X_2, \dots$ be independent, identically distributed random variables with $E(X_i) = \mu$ and $V(X_i) = \sigma^2$, and let $S_n = X_1 + X_2 + \dots + X_n$. Then,
$\frac{S_n - n\mu}{\sigma \sqrt n} \rightarrow N(0,1)$ as $n \rightarrow \infty$.
Not really sure how to proceed from here!
You can see that our $Z_n$ is just $\frac{S_n}{n}$, and that the $X_i$ satisfy all the required conditions.

Thus $\frac{X_1+\dots+X_n-np}{\sigma\sqrt{n}}=\sqrt{n}\cdot\frac{Z_n-p}{\sigma}$ converges to the standard normal distribution (in distribution).

But this means that the cumulative distribution function of $\sqrt{n}\cdot\frac{Z_n-p}{\sigma}$ converges to the cumulative distribution function of the standard normal distribution.

So $\mathbb{P}\left(\sqrt{n}\cdot\frac{Z_n-p}{\sigma} \in[a,b]\right)\xrightarrow[]{n\to\infty} \int_a^b \frac{1}{\sqrt{2\pi}}\cdot e^{-t^2/2} ~dt$
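To see this convergence numerically (an illustrative sketch; `exact_prob` is a made-up helper that sums the exact binomial probabilities of $S = nZ_n$):

```python
import math
from statistics import NormalDist

def exact_prob(n, p, a, b):
    """P(a <= sqrt(n)*(Z_n - p)/sigma <= b) computed exactly,
    where S = n*Z_n follows a Binomial(n, p) distribution."""
    sigma = math.sqrt(p * (1 - p))
    lo = n * p + a * sigma * math.sqrt(n)   # lower bound on the S-scale
    hi = n * p + b * sigma * math.sqrt(n)   # upper bound on the S-scale
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n + 1) if lo <= k <= hi)

n, p, a, b = 200, 0.5, -1.0, 1.0
normal = NormalDist().cdf(b) - NormalDist().cdf(a)   # standard normal mass on [a, b]
print(exact_prob(n, p, a, b), "vs", normal)
```

For n = 200 and p = 1/2 the two numbers already agree to within a few percent; the remaining gap is mostly the continuity correction.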

Does that help you ? Have you ever heard of confidence intervals ?

Sorry it's a bit long...

5. It's okay Moo! Long is good, it just means it's more detailed. So part (iii) of this question is just asking me to show what you have shown me above? Is that the solution they are asking for? Thank you again though!

6. Originally Posted by panda*
It's okay Moo! Long is good, it just means it's more detailed. So part (iii) of this question is just asking me to show what you have shown me above? Is that the solution they are asking for? Thank you again though!
Hmm, I don't get your question... ?
Sorry, I'm a bit slow sometimes

For part iii), use the fact that for a large n, $\mathbb{P}\left(|Z_n-p|\geq\epsilon\right)=\mathbb{P}\left(\left|\sqrt{n}\cdot\frac{Z_n-p}{\sigma}\right|\geq\frac{\sqrt{n}\,\epsilon}{\sigma}\right) \approx 1-\int_{-\sqrt{n}\epsilon/\sigma}^{\sqrt{n}\epsilon/\sigma} \frac{1}{\sqrt{2\pi}}\cdot e^{-t^2/2} ~dt$

(use a z-table)
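A sketch of that computation (assuming the worst case $\sigma=\sqrt{p(1-p)}\leq\frac 12$, since p is unknown; `clt_n` is an illustrative name). We need $\sqrt{n}\,\epsilon/\sigma$ to reach the two-sided 0.95 quantile of the standard normal, about 1.96:

```python
import math
from statistics import NormalDist

def clt_n(eps=0.1, conf=0.95, sigma=0.5):
    """Smallest n with P(|Z_n - p| < eps) >= conf under the normal
    approximation, using the worst-case sigma = sqrt(p*(1-p)) <= 1/2."""
    z = NormalDist().inv_cdf((1 + conf) / 2)   # two-sided quantile, about 1.96
    return math.ceil((z * sigma / eps) ** 2)   # n >= (z * sigma / eps)^2

print(clt_n())   # -> 97
```

So the CLT estimate is around $n=97$, far smaller than the $n=500$ that Chebyshev's bound requires for the same $\epsilon=0.1$; that is the point of part (iii).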

7. Oh! So we just have to evaluate the integral and subbing the boundaries of $\epsilon = 0.1$?

Anyway, as for the integral proof I used earlier to prove the inequality was provided in my lecture notes for a continuous random variable where, $\sigma^2 = E[(X-\mu)^2]$. Is it better to use the way you suggested or would the way I used work too?

8. Regarding your proof, I tried it out, and there are a couple of places where I don't understand how we got there. For example,

How did you get these:

$\mathbb E(Z_n) = \mathbb E\left(\frac{1}{n} \sum^n_{i=1} X_i\right)$ and $\text{Var}(Z_n) = \frac{1}{n^2}\left(n \cdot \text{Var}(X_1)\right)$?

Also, what is the relation of $\forall x \in [0,1], x(1-x) \leq \frac{1}{4}$ to the final step?

And I still don't know how to do part (iii)!

Thank you for your time again!