# Math Help - standard error equation

1. ## standard error equation

Hi,

Can someone point out a good reference to understand the derivation (or proof) of the standard error of the sampling distribution of the sample mean, i.e. SE = SD/n^(1/2)?

Even though this is such a key equation, there are very few derivations online, and many of them invoke the variance sum law but leave out the covariance part of the equation. There are also steps introducing 1/n^2 into the formulas where the transition from sums squared to variances is not spelled out.

Any help would be appreciated, including, of course, the actual derivation.

Thank you.

2. ## Re: standard error equation

Hey parex.

Here is the basic idea: If your mean is (roughly) Normally distributed with mean mu and variance sigma^2, then the estimator of the mean will be N(mu,sigma^2/n). Here is the proof:

Let X_bar = 1/n * (X1 + X2 + ... + Xn).

E[X_bar]
= E[1/n * (X1 + X2 + ... + Xn)]
= 1/n * (E[X1] + E[X2] + ... + E[Xn])
= 1/n * n * mu = mu

Var[X_bar]
= Var[1/n * (X1 + X2 + ... + Xn)]
= 1/n^2 * (Var[X1] + Var[X2] + ... + Var[Xn])
= 1/n^2 * n*sigma^2 (Assuming all Xi's are independent which means covariance is 0)
= sigma^2/n

Since our variance is sigma^2/n, our standard error is the square root of this, which is sigma/SQRT(n), and that completes the proof of that result. Normality of X_bar holds exactly only if the Xi's are themselves Normal.

If you have a big enough sample, you use the central limit theorem for normality. If you are using an estimate of the standard deviation, then you use a t-distribution (but again if you have enough of a sample, you can use a normal distribution approximation regardless).
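
As a quick numerical check of the result above, here is a minimal simulation sketch. The population (exponential with rate 1, so mu = 1 and sigma = 1), the function name `sd_of_sample_means`, and the sample counts are illustrative choices, not part of the derivation. A skewed population is used on purpose, since the sigma/SQRT(n) result needs independence and a finite variance but not normality.

```python
import math
import random
import statistics

def sd_of_sample_means(n, num_samples=5000, seed=0):
    """Empirical SD of the means of many IID samples of size n.

    Assumed population: exponential with rate 1, so mu = 1 and sigma = 1.
    """
    rng = random.Random(seed)
    means = [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(num_samples)
    ]
    return statistics.pstdev(means)

sigma = 1.0
for n in (4, 25, 100):
    # Columns: n, simulated SD of X_bar, theoretical sigma / sqrt(n)
    print(n, round(sd_of_sample_means(n), 4), round(sigma / math.sqrt(n), 4))
```

The two printed columns should agree to within simulation noise, and the agreement tightens as `num_samples` grows.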

3. ## Re: standard error equation

Thank you. I follow your argument, except for when you say "Assuming all Xi's are independent which means covariance is 0".

Clearly, Var (Xbar) = Covar ([1/n * (X1 + X2 + X3 + ... + Xn)] , [1/n * (X1 + X2 + X3 + ... + Xn)]). In other words, the variance of the sample mean for samples of size n equals the covariance of that sample mean with itself.

Distributing, Var (Xbar) = 1/n^2 * (Var X1 + Var X2 + Var X3 + ... + Var Xn + 2*Cov (X1,X2) + 2*Cov (X1,X3) + ... + 2*Cov (Xn-1,Xn)). Each pair appears twice, once as Cov (Xi,Xj) and once as Cov (Xj,Xi), hence the factor of 2.

If the covariances are zero, then Var (Xbar) = 1/n^2 * (Var X1 + Var X2 + Var X3 + ... + Var Xn). Since all the X variables come from the same population, Var X1 = Var X2 = Var X3 = ... = Var Xn, and they all equal the variance of the population, sigma^2.

Hence Var (Xbar) = 1/n^2 * n * sigma^2 = 1/n * sigma^2; and the SD (Xbar) = sigma/n^(1/2).
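
Written out in full, the expansion reads as follows. This is a sketch in LaTeX of the standard identity, using sigma^2 for the common population variance and assuming it is finite; the only properties used are Var(aY) = a^2 Var(Y) and the bilinearity of covariance.

```latex
\begin{align*}
\operatorname{Var}(\bar X)
  &= \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
   = \frac{1}{n^2}\,\operatorname{Var}\!\left(\sum_{i=1}^{n} X_i\right)
   && \text{since } \operatorname{Var}(aY) = a^2 \operatorname{Var}(Y) \\
  &= \frac{1}{n^2}\left[\sum_{i=1}^{n}\operatorname{Var}(X_i)
     + 2\sum_{1 \le i < j \le n}\operatorname{Cov}(X_i, X_j)\right] \\
  &= \frac{1}{n^2}\left[n\sigma^2 + 0\right] = \frac{\sigma^2}{n}
   && \text{if } \operatorname{Cov}(X_i, X_j) = 0 \text{ for all } i \ne j
\end{align*}
```

The 1/n^2 comes from pulling the constant 1/n out of the variance, and the covariance sum vanishes only under the zero-covariance assumption.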

But, I don't understand how we can consider the covariance to be zero when the samples are taken from one single population - shouldn't they be highly correlated - and therefore have a high covariance?

4. ## Re: standard error equation

We typically assume that all sample observations are independent from each other.

If they are not, then we can't use any of the results that assume they are independent and we have to instead look at joint distributions which are a lot more complicated.

In general the variance satisfies Var[X+Y] = Var[X] + Var[Y] + 2*Cov[X,Y], so if your observations were not independent you would need to factor in all the covariance terms. Once you do that, you will get a result for the standard deviation (or standard error) of the mean.

Assuming that everything has a joint multivariate normal distribution, doing the above will give the distribution of the mean as Normal(mu,sigma^2) where sigma^2 may depend on the covariance terms.

Note that positive covariances make the result larger and negative covariances make it smaller.
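
To see the effect of the covariance terms numerically, here is a small sketch. The helper names `variance_of_mean` and `equicorrelated`, and the equal-correlation setup (every pair of observations shares the same correlation rho), are assumptions made for illustration only.

```python
def variance_of_mean(cov):
    """Var(X_bar) for X_1..X_n with covariance matrix `cov` (list of rows).

    Var(X_bar) = (1/n^2) * sum over all i, j of Cov(X_i, X_j),
    where the diagonal entries Cov(X_i, X_i) are the variances.
    """
    n = len(cov)
    return sum(sum(row) for row in cov) / n**2

def equicorrelated(n, sigma2, rho):
    """Covariance matrix with variance sigma2 and common correlation rho."""
    return [[sigma2 if i == j else rho * sigma2 for j in range(n)]
            for i in range(n)]

n, sigma2 = 10, 4.0
for rho in (-0.1, 0.0, 0.5, 1.0):
    # rho = 0 recovers sigma2 / n; rho = 1 gives sigma2 (no gain from averaging)
    print(rho, variance_of_mean(equicorrelated(n, sigma2, rho)))
```

With rho = 0 the result is sigma^2/n, while with perfectly correlated observations (rho = 1) averaging does not reduce the variance at all.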

5. ## Re: standard error equation

Yes, but we do set cov = 0 in deriving the SE of the sampling distribution of the sample means. It's not a choice, but a mathematical necessity, that allows us to arrive at SE = SD / n^(1/2). Unfortunately, I'm stuck, and can't see why the individual values of the random variables that will form samples are i.i.d. when they come from the same population. Shouldn't they instead be perfectly correlated? In which case the standard error formula wouldn't be so nice...

6. ## Re: standard error equation

In statistics we often assume a random sample that is IID (independent and identically distributed), and from that assumption it follows that Cov(Xi,Xj) = 0 if i != j.

Remember that the definition of independence is P(A = a, B = b) = P(A = a)P(B = b) for all values a and b. Intuitively, if observing the value of A doesn't in any way affect the value of B, then you have independence.

You can prove Cov(X,Y) = 0 if X and Y are independent in two steps. First use P(X = x, Y = y) = P(X = x)P(Y = y) and Fubini's theorem to prove E[XY] = E[X]E[Y]. Then, using Cov(X,Y) = E[XY] - E[X]E[Y], you have proven Cov(X,Y) = 0 if X is independent of Y.
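
For discrete X and Y the argument can be sketched in LaTeX as below. It assumes E[X], E[Y] and E[XY] are finite, so the double sum can be split into a product; in the continuous case the sums become integrals and Fubini's theorem justifies the same step.

```latex
\begin{align*}
E[XY] &= \sum_{x}\sum_{y} x\,y\,P(X = x, Y = y) \\
      &= \sum_{x}\sum_{y} x\,y\,P(X = x)\,P(Y = y)
         && \text{by independence} \\
      &= \left(\sum_{x} x\,P(X = x)\right)\left(\sum_{y} y\,P(Y = y)\right)
       = E[X]\,E[Y], \\
\operatorname{Cov}(X, Y) &= E[XY] - E[X]\,E[Y] = 0 .
\end{align*}
```

The converse does not hold in general: zero covariance does not by itself imply independence.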

If you assume that all measurements don't affect the values of the others, then you have an IID random sample and the result for the standard deviation (or standard error) is in line with the standard statistical results.

You cannot always assume this, and for this reason there are branches of statistics known as longitudinal, time series, and multivariate statistics that deal with various forms of correlation. These areas are a lot harder than the introductory material you are doing right now.

So if you want to see what it's like when things are not independent, then check out the above areas.

7. ## Re: standard error equation

Thank you. Please let's stay with the very basics for one more attempt at seeing this intuitively.

We draw elements (experiments?) from a population to construct samples, and we average the values of the elements in each sample.

Isn't what we are doing in this sampling distribution really an example of choosing elements without replacement? In picking an instance of X1 from the population, we eliminate that particular instance as a possibility for X2, and so on for the Xn elements of the sample. Hence X1 is not independent of the other X's.

I know my question has to stem from a fundamental misunderstanding, and I apologize for "wasting" your time with beginner's issues.

8. ## Re: standard error equation

Your question is a good one.

In terms of the sampling procedure, if you have an independent sample that is IID then sampling with or without replacement won't change the probabilities at all since they are independent.

But in other cases, this may not be true. For example, if I am sampling a ball from an urn and I don't put the ball back, then the draws won't be independent and you won't have the situation where P(A = a, B = b) = P(A = a)P(B = b).

Depending on the situation, we may use independence as a good enough approximation (especially if we have a lot of balls in the urn, and a lot of each color as well) or we may have to use the joint distribution function for all observations, which is what we typically do in the urn type situation.

With regard to sampling, the whole thing depends on the joint distribution. Typically when we are dealing with sampling issues specifically, we are dealing with cases where the nature of the population is such that independence is just not going to work in general.

If this is the case, then it's going to be because the population is possibly very small, or because individual probabilities are small enough that sampling without replacement has a massive effect on the likelihood (i.e. the probability given all values in the sample).
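
To make the small-population case concrete, here is a minimal sketch that computes Var(X_bar) exactly for draws without replacement from a toy population. The population values and the function name `exact_variance_of_mean` are hypothetical. The comparison uses the standard finite population correction, Var(X_bar) = sigma^2/n * (N - n)/(N - 1), which follows from each pair of draws having covariance -sigma^2/(N - 1).

```python
from itertools import permutations
from statistics import fmean, pvariance

def exact_variance_of_mean(population, n):
    """Exact Var(X_bar) when n items are drawn without replacement.

    Enumerates every equally likely ordered sample, so it is only
    practical for very small populations.
    """
    means = [fmean(sample) for sample in permutations(population, n)]
    return pvariance(means)

population = [1, 2, 3, 4, 5, 6]   # hypothetical toy population
N, n = len(population), 3
sigma2 = pvariance(population)    # population variance sigma^2

exact = exact_variance_of_mean(population, n)
iid_formula = sigma2 / n                       # assumes independent draws
corrected = sigma2 / n * (N - n) / (N - 1)     # finite population correction
print(exact, iid_formula, corrected)
```

The exact value matches the corrected formula and is smaller than sigma^2/n, because the covariances are negative. When N is much larger than n the factor (N - n)/(N - 1) is close to 1, which is why independence is a good approximation for large populations.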

Again your question is not a bad one and you aren't wasting my time: it's a very good one and understanding this will help you make sense of this subject and also make you realize that real situations are often more complicated than the toy IID scenarios that you are taught.

The reason you are taught this is because you need to understand the basic version before you go to the more complicated version: if you did the complicated version straight away, you'd be lost in the complexity and would probably be focusing on the wrong sorts of things. We do it this way because it's easier to build intuition and organize the concepts than if you saw the whole general case at once.

9. ## Re: standard error equation

Originally Posted by chiro
Here is the basic idea: If your mean is (roughly) Normally distributed with mean mu and variance sigma^2, then the estimator of the mean will be N(mu,sigma^2/n).
I assume you mean "random variable is (roughly) Normally. . ." And this is not a condition that needs to be satisfied. If the random variable itself is normal, then any sum or scaling (such as the mean) will itself be normal. However an estimator (like the sample mean) need not come from a normal distribution to itself be normal (it is in fact asymptotically normal).

Originally Posted by parex

I know my question has to stem from a fundamental misunderstanding, and I apologize for "wasting" your time with beginner's issues.

I think you are conflating topics, which is leading to some frustration. Depending on the "experimental unit" and what is being measured, observations from the same population can be independent. In most basic statistics courses, experiments involving people usually assume that each measurement on an individual person is independent. It simply makes the analysis easier. Who a person votes for, for instance, is generally taken to be an independent response (though it's quite clear that knowledge of, say, your friends' voting habits would probably inform us of yours).

10. ## Re: standard error equation

Sorry, I meant to say the mean of the distribution (but then it should have been sigma^2/n instead of sigma^2) due to the CLT.

11. ## Re: standard error equation

Thank you. I think I get it. ANDS!'s comments were also key to addressing my misunderstanding.