# Thread: Is there a statistic like standard deviation for bounded variables?

1. ## Confidence estimates for bounded variables?

Is there a sample statistic that gives us easy confidence intervals about the sample mean of bounded variables?

I'm looking at large sample sets for a variable that is non-negative. I can compute a standard deviation, but presumably I can't use anything like the 68-95-99.7 rule from it because the data are skewed.

Is there an analogous measure for skewed, non-parametric variables? Especially in the case where the variable can take any positive value?

2. ## Re: Is there a statistic like standard deviation for bounded variables?

If you sample from a normal distribution, the distribution of the sample mean is exactly normal. According to the central limit theorem, the distribution of the sample mean from ANY distribution (with finite variance) tends to a normal distribution. This applies even if the distribution has no negative values: after enough samples, the negative part of the normal approximation to the sample mean distribution will be negligible. With this in mind, if you have enough samples, the 68-95-99.7 rule will be accurate for the sample mean, using the standard deviation of the sample mean (the sample standard deviation divided by $\displaystyle \sqrt{n}$).
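As a rough check of this claim, here is a minimal simulation sketch. It assumes a Gamma distribution with shape 4 and scale 0.45 as a stand-in for a skewed, non-negative variable (that choice, the function name and the trial count are illustrative assumptions, not anything from the thread). It counts how often the interval "sample mean ± 1.96 standard errors" covers the true mean.

```python
import random
import statistics


def mean_interval_coverage(n, trials=2000, shape=4.0, scale=0.45, seed=0):
    """Fraction of trials where mean +/- 1.96 * s / sqrt(n) covers the true mean.

    The data are Gamma(shape, scale): non-negative and right-skewed.
    """
    rng = random.Random(seed)
    true_mean = shape * scale  # mean of a Gamma(shape, scale) variable
    hits = 0
    for _ in range(trials):
        sample = [rng.gammavariate(shape, scale) for _ in range(n)]
        m = statistics.fmean(sample)
        se = statistics.stdev(sample) / n ** 0.5  # standard error of the mean
        if abs(m - true_mean) <= 1.96 * se:
            hits += 1
    return hits / trials


for n in (5, 30, 200):
    print(f"n = {n:>3}: coverage of the nominal 95% interval = "
          f"{mean_interval_coverage(n):.3f}")
```

If the normal approximation is good, the printed coverage should sit close to 0.95 for the larger sample sizes; for very small n it would be expected to fall short.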

3. ## Re: Is there a statistic like standard deviation for bounded variables?

Sorry, Shakarri, I just restated the problem to make it more clear. In this case I'm looking at range statistics, and their samples have a sort of Gamma-like positive skewness.

The sample mean can't be normal, because samples are concentrated near 0, but can never be negative. I don't know how this breaks the central limit theorem, but obviously there's zero probability of a negative observation.

Here's a real result from 1MM simulations of this range statistic: Mean = 1.8, stdev = .9, skew = .6. If we believe this is normally distributed then that's saying there's a 2.2% chance of the variable being negative.
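The tail probability quoted above can be reproduced with the standard library. This is only a sketch of the arithmetic: it treats the variable itself as normal with the reported mean and standard deviation, which is exactly the assumption being questioned.

```python
from statistics import NormalDist

mean, stdev = 1.8, 0.9  # values reported from the simulation

# Zero lies (mean - 0) / stdev = 2 standard deviations below the mean,
# so the normal model puts Phi(-2) of its mass below zero.
p_negative = NormalDist(mu=mean, sigma=stdev).cdf(0.0)
print(f"P(X < 0) under the normal model: {p_negative:.4f}")  # 0.0228
```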

4. ## Re: Is there a statistic like standard deviation for bounded variables?

Let me restate this again: I have a large number of samples of a random variable. For any statistical inference we want to make I could simply supply the full empirical distribution function and make the calculation.

What I am hoping is that for variables with skewed but non-parameterized distributions there is a standard and more concise method for describing the probability of a sample being within some ranges on either side of the mean.
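One way to condense the empirical distribution without assuming a parametric form is to report a handful of empirical quantiles. The sketch below is only an illustration of that idea, not an established standard: the probabilities 2.5%, 16%, 84% and 97.5% are chosen because they are roughly where ±2 and ±1 standard deviations fall for a normal variable, and the Gamma sample is a made-up stand-in for the real data.

```python
import random
import statistics


def quantile_summary(sample, probs=(0.025, 0.16, 0.5, 0.84, 0.975)):
    """Empirical quantiles at the given probabilities (resolution 1/1000)."""
    # quantiles(..., n=1000) returns 999 cut points; cuts[k - 1] is the
    # k/1000 quantile of the sample.
    cuts = statistics.quantiles(sample, n=1000, method="inclusive")
    return {p: cuts[round(p * 1000) - 1] for p in probs}


rng = random.Random(1)
sample = [rng.gammavariate(4.0, 0.45) for _ in range(20_000)]  # stand-in data
mean = statistics.fmean(sample)

for p, q in quantile_summary(sample).items():
    print(f"{p:>6.1%} quantile: {q:.3f}  (mean {q - mean:+.3f})")
```

For a right-skewed sample the upper quantiles sit further from the mean than the lower ones, which is the asymmetry a single standard deviation cannot express.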

5. ## Re: Is there a statistic like standard deviation for bounded variables?

> Originally Posted by dbooksta
>
> Sorry, Shakarri, I just restated the problem to make it more clear. In this case I'm looking at range statistics, and their samples have a sort of Gamma-like positive skewness.
>
> The sample mean can't be normal, because samples are concentrated near 0, but can never be negative. I don't know how this breaks the central limit theorem, but obviously there's zero probability of a negative observation.
>
> Here's a real result from 1MM simulations of this range statistic: Mean = 1.8, stdev = .9, skew = .6. If we believe this is normally distributed then that's saying there's a 2.2% chance of the variable being negative.

The distribution of the sample mean TENDS to a normal distribution as the sample size gets larger. For any finite sample size, part of the normal approximation will lie below zero. 2.2% is quite small, although not negligible in some fields, and it becomes smaller as the sample size increases.
I have dealt with such conditions in the normal approximation to the binomial distribution for small chance of success.
Consider a binomial distribution with chance of success $\displaystyle \mu$ which has been sampled n times.
Suppose we get 1 success and n-1 failures from sampling.
The sample mean p is $\displaystyle \frac{1}{n}$.
The sample standard deviation s (of a single observation) is $\displaystyle \sqrt{p(1-p)}=\sqrt{\frac{1}{n} \times \frac{n-1}{n}}=\frac{1}{n}\sqrt{n-1}$.

The number of standard deviations between the sample mean and the lower limit of 0 successes is $\displaystyle \frac{p-0}{s}=\frac{\frac{1}{n}}{\frac{\sqrt{n-1}}{n}}=\frac{1}{\sqrt{n-1}}$

If we keep sampling but remain with only 1 success the number of standard deviations between the sample mean and 0 is
$\displaystyle \lim_{n\rightarrow \infty} \frac{1}{\sqrt{n-1}}=0$

The "distance" between the sample mean and the lower limit of 0 tends to 0 standard deviations. This means that, if we assume a normal distribution, 50% of that distribution lies below zero. In this extreme case the error in assuming a normal distribution is significant.
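The limit above can be checked numerically. This sketch follows the post's own definitions (p = 1/n and s = sqrt(p(1-p)), with s used as the yardstick); the function name is a hypothetical helper introduced only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def sds_above_zero(n):
    """Number of standard deviations s between p = 1/n and the lower limit 0."""
    p = 1 / n                # one success in n trials
    s = sqrt(p * (1 - p))    # standard deviation as defined in the post
    return p / s             # algebraically equal to 1 / sqrt(n - 1)


for n in (10, 100, 10_000, 1_000_000):
    z = sds_above_zero(n)
    below_zero = NormalDist().cdf(-z)  # normal-model mass below 0
    print(f"n = {n:>9,}: distance = {z:.4f} sd, "
          f"normal mass below 0 = {below_zero:.3f}")
```

As n grows the distance shrinks toward 0 and the mass that the normal model places below zero climbs toward one half.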
I have had some problems with this in the past, in research where the chance of success was around 0.0001. I developed a way to calculate confidence intervals assuming the sample mean is normally distributed but truncated at zero. It is not as concise as you had hoped, but I'll attach the paper here if you want to look into it. It is generalised for distributions which have both an upper and a lower limit. Since you have no upper limit, you can just use infinity for that and it will work out.
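The paper's method is not reproduced in this thread, so the following is not that method. It is only a generic sketch of the underlying idea: take a normal distribution, truncate it to [lower, upper], and read off a central interval from the truncated distribution's quantiles. The function name, its parameters and the choice of a central 95% interval are all assumptions made for illustration.

```python
from math import inf
from statistics import NormalDist


def truncated_normal_interval(mu, sigma, level=0.95, lower=0.0, upper=inf):
    """Central `level` interval of Normal(mu, sigma) truncated to [lower, upper]."""
    base = NormalDist(mu, sigma)
    f_lo = base.cdf(lower)                         # untruncated mass below `lower`
    f_hi = 1.0 if upper == inf else base.cdf(upper)
    tail = (1 - level) / 2

    def quantile(u):
        # Map a probability u of the truncated distribution back onto the
        # untruncated one, then invert the normal CDF.
        return base.inv_cdf(f_lo + u * (f_hi - f_lo))

    return quantile(tail), quantile(1 - tail)


# Illustrative use with the figures quoted earlier in the thread.
lo, hi = truncated_normal_interval(mu=1.8, sigma=0.9)
print(f"central 95% interval of the truncated normal: ({lo:.3f}, {hi:.3f})")
```

With no upper limit, `upper=inf` leaves the right tail untouched, so only the mass below zero is removed and the remainder is renormalised.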