# Need help with Central Limit Theorem

• Sep 20th 2008, 09:31 AM
chopet
Regardless of the distribution of the initial population, as long as we take a sample of size n>30 from the population and form a sampling distribution by repeating the sampling m times:

1) It will be normal.
2) It will share the same mean as the population mean.
3) Its variance (σ²/n) will be smaller than the population variance by a factor of n.

Is this definition correct? So repeating the sampling m times is not important. Rather, it's the sample size n that's important?
• Sep 20th 2008, 10:52 AM
Laurent
Quote:

Originally Posted by chopet
Regardless of the distribution of the initial population, as long as we take a sample of size n>30 from the population and form a sampling distribution by repeating the sampling m times:

You don't need that $\displaystyle m$ in the statement. It works this way: Take $\displaystyle n$ independent random variables with the same distribution, of mean $\displaystyle \mu$ and variance $\displaystyle \sigma^2$ (or a sample of size $\displaystyle n$ from the population, if you prefer), where $\displaystyle n$ is large (in practical cases, $\displaystyle n\geq 30$ is often enough for the precision needed). Then the CLT says that the empirical mean $\displaystyle \frac{X_1+\cdots+X_n}{n}$ behaves almost like a normal random variable with mean $\displaystyle \mu$ and variance $\displaystyle {\sigma^2\over n}$.

Perhaps the $\displaystyle m$ was introduced to consider $\displaystyle m\times n$ random variables leading to $\displaystyle m$ empirical means (by grouping them $\displaystyle n$ at a time). This way you end up with $\displaystyle m$ independent random variables that are each approximately normal.
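The roles of $\displaystyle n$ and $\displaystyle m$ can be checked with a small simulation. This is only a sketch under assumptions that are not in the thread: the population is taken to be Exponential(1) (so $\displaystyle \mu = 1$ and $\displaystyle \sigma^2 = 1$), and the values of `n`, `m`, the seed, and the helper name `sample_means` are illustrative choices.

```python
import random
import statistics

def sample_means(m, n, rng):
    """Return m empirical means, each computed from n i.i.d. Exponential(1) draws."""
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(m)]

rng = random.Random(0)   # fixed seed so the run is reproducible
n, m = 30, 20000         # n = sample size, m = number of repeated samples
means = sample_means(m, n, rng)

mu, sigma2 = 1.0, 1.0    # mean and variance of the assumed Exponential(1) population
print("average of the sample means: ", statistics.fmean(means), "vs mu =", mu)
print("variance of the sample means:", statistics.variance(means), "vs sigma^2/n =", sigma2 / n)
```

Here `m` only controls how precisely the simulation estimates the distribution of the empirical mean; the distribution itself (its mean, its variance $\displaystyle \sigma^2/n$, and how close it is to normal) is governed by `n`.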
• Sep 20th 2008, 05:54 PM
chopet
Thanks. I get it now. The sample mean is an RV with these properties:
1) $\displaystyle E[ \overline{X}] = \mu$
2) $\displaystyle Var[ \overline{X}] = { \sigma^2 \over n}$

But at this stage we do not have enough information about the distribution of the sample mean (the sampling distribution). It takes whatever form the parent distribution has.

But CLT states that as long as n>30, the sampling distribution is approximately normal. Or in your words, $\displaystyle \overline{X}$ behaves like a normal rv.

On the other hand, sample variance $\displaystyle s^2$ also forms a distribution: the chi-square distribution with degrees of freedom n-1. Since the chi-square only has 1 parameter, we have completely characterised the distribution of the sample variance. And this one applies regardless of the size of n. (CLT does not apply here).

Am I correct?
• Sep 22nd 2008, 12:11 AM
Laurent
Your summary about the CLT for the sample mean is correct (except that the variance is $\displaystyle {\sigma^2\over n}$, remember it tends to 0).
However, in the same way that there is no reason for the distribution of the sample mean to be normal (even if this is approximately true), the distribution of the sample variance is not chi-square distributed, in general. If $\displaystyle X_1,\ldots,X_n$ are independent normal r.v. of mean $\displaystyle \mu$ and variance $\displaystyle \sigma^2$, then the sample variance $\displaystyle s^2=\frac{1}{n-1}\sum_{i=1}^n(X_i-\bar{X})^2$ is such that $\displaystyle \frac{n-1}{\sigma^2}s^2$ is a chi-square random variable with $\displaystyle n-1$ degrees of freedom. This may be used as an approximation in the case of a general distribution of $\displaystyle X$ (as long as it has a variance).
In fact, you can even use the CLT (or rather a slightly enhanced version of it) for the sample variance. Indeed, $\displaystyle \bar{X}$ converges almost surely to $\displaystyle \mu$, so that (thanks to the CLT and Slutsky's lemma) $\displaystyle s^2$ is approximately distributed like a normal random variable with mean $\displaystyle \sigma^2$ and variance $\displaystyle \frac{1}{n}Var((X-\mu)^2)$. The chi-square may be a better approximation than the Gaussian, but I don't know exactly.
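One way to probe which approximation does better is to simulate the spread of $\displaystyle s^2$ and compare it with what each approximation implies. The sketch below rests on assumptions not in the thread: two test populations with $\displaystyle \sigma^2 = 1$, Normal(0, 1) with $\displaystyle Var((X-\mu)^2) = 2$ and Exponential(1) with $\displaystyle Var((X-\mu)^2) = 8$; the names `sample_variance` and `variance_of_s2`, the seed, and the sizes are illustrative. With $\displaystyle \sigma^2 = 1$, the chi-square law implies $\displaystyle Var(s^2) = \frac{2}{n-1}$, while the normal approximation gives $\displaystyle \frac{1}{n}Var((X-\mu)^2)$.

```python
import random

def sample_variance(xs):
    """Unbiased sample variance s^2 with the 1/(n-1) normalisation."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

def variance_of_s2(draw, n, m, rng):
    """Simulate m samples of size n and return the empirical variance of s^2."""
    s2_values = [sample_variance([draw(rng) for _ in range(n)]) for _ in range(m)]
    return sample_variance(s2_values)

rng = random.Random(1)   # fixed seed so the run is reproducible
n, m = 30, 20000

# Assumed test populations, both with sigma^2 = 1.
# Second entry of each pair is Var((X - mu)^2) for that population.
cases = {
    "normal": (lambda r: r.gauss(0.0, 1.0), 2.0),
    "exponential": (lambda r: r.expovariate(1.0), 8.0),
}

results = {}
for name, (draw, var_sq_dev) in cases.items():
    simulated = variance_of_s2(draw, n, m, rng)
    chi_square = 2.0 / (n - 1)   # Var(s^2) implied by the chi-square law when sigma^2 = 1
    clt = var_sq_dev / n         # Var(s^2) implied by the normal (CLT) approximation
    results[name] = (simulated, chi_square, clt)
    print(f"{name}: simulated={simulated:.4f}  chi-square={chi_square:.4f}  CLT={clt:.4f}")
```

For these two populations, the chi-square value agrees with the simulation when the data are normal but understates the spread of $\displaystyle s^2$ for the exponential data, where the CLT value is much closer. That is consistent with the chi-square result being exact only for normal samples; other populations may behave differently.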
• Sep 22nd 2008, 02:34 AM
CaptainBlack
Quote:

Originally Posted by chopet
But CLT states that as long as n>30, the sampling distribution is approximately normal. Or in your words, $\displaystyle \overline{X}$ behaves like a normal rv.

The CLT says nothing about samples of size greater than 30. That such a sample size is often adequate to allow a normal approximation to be used is only a guide. There are plenty of distributions where a sample size of 30 is inadequate to allow us to use a normal approximation.

RonL
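As an illustration of a case where $\displaystyle n = 30$ is not enough, consider a rare-event population. This is a sketch with made-up values, not something from the thread: each observation is 1 with probability `p = 0.01` and 0 otherwise, and the seed and `m` are arbitrary. It compares how often the sample mean is exactly 0 with the probability that the matching normal approximation assigns to values at or below 0.

```python
import math
import random
import statistics

rng = random.Random(2)        # fixed seed so the run is reproducible
p, n, m = 0.01, 30, 20000     # p is a made-up illustration value

def sample_mean(rng):
    """Mean of n Bernoulli(p) observations."""
    return sum(1 for _ in range(n) if rng.random() < p) / n

means = [sample_mean(rng) for _ in range(m)]
frac_zero = sum(1 for x in means if x == 0.0) / m

# Normal approximation suggested by the CLT: mean p, variance p(1-p)/n.
approx = statistics.NormalDist(mu=p, sigma=math.sqrt(p * (1 - p) / n))
normal_below_zero = approx.cdf(0.0)

print("simulated P(sample mean = 0):      ", frac_zero)
print("exact (1 - p)^n:                   ", (1 - p) ** n)
print("normal approx. P(sample mean <= 0):", normal_below_zero)
```

Most of the simulated sample means sit on the single value 0, whereas the normal approximation spreads a sizeable share of its probability over negative values that the sample mean can never take. A much larger `n` would be needed before the approximation becomes reasonable for this population.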