Originally Posted by

**melysion** Hi there

I have a distribution plot and I need to know how to calculate the 95% confidence interval for that plot. I know that for a bell shaped distribution one uses the t-distribution, but my plot is not bell shaped at all - it looks more like a slope, where most of the population is to the right of the plot which then falls rapidly as we move left (a bit like a steep ski slope!).

How might I calculate the 95% confidence interval for this sort of plot?

All answers much appreciated

Hi, melysion.

There is a problem with your question because you ask for a confidence interval but don't say what parameter the confidence interval is for. A confidence interval is for a population parameter, such as the mean or variance, estimated from the corresponding sample statistic; it does not apply to a plot. See http://en.wikipedia.org/wiki/Confidence_interval.

By using the term t-distribution, you seem to be asking for a 95% confidence interval for the mean, but you are concerned about the non-normality of the underlying distribution. The Central Limit Theorem (CLT) says that, for any underlying distribution with finite variance, the sample mean will be approximately normally distributed when the sample is large enough. How large is large enough depends on the distribution, though, and the CLT itself gives no specific threshold.

Here's how I handled this problem. First, using your data I calculated the 95% confidence interval for the mean using the standard normality assumption. The formula for the 95% confidence limits is $\displaystyle \bar{x} \pm 1.96 s / \sqrt{n}$, where $\bar{x}$ is the sample mean, $s$ is the sample standard deviation, and $n$ is the sample size. (The 1.96 is from the normal distribution, which is essentially equal to the t-distribution for a sample size of 273.)

Code:

Mean 2.044
Standard deviation 11.027
Lower 95% limit .736
Upper 95% limit 3.352
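
For anyone who wants to check the arithmetic, here is a minimal Python sketch that plugs the summary figures above (mean 2.044, standard deviation 11.027, n = 273) into the normal-theory formula. The function name `normal_ci` is just an illustrative choice, not something from a library.

```python
import math

def normal_ci(mean, sd, n, z=1.96):
    """Normal-theory confidence limits: mean +/- z * sd / sqrt(n)."""
    half_width = z * sd / math.sqrt(n)
    return mean - half_width, mean + half_width

# Summary statistics quoted in the post.
lower, upper = normal_ci(mean=2.044, sd=11.027, n=273)
print(f"{lower:.3f} {upper:.3f}")  # prints 0.736 3.352
```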

Now to check on the effect of non-normality, I used a technique called bootstrapping or resampling. To do this I drew 100,000 samples with replacement of size 273 from your original sample. I calculated the mean of each of those 100,000 samples. I then calculated the 2.5th and 97.5th percentiles of those 100,000 means. This gives a robust estimate of the 95% confidence interval for the mean without assuming any particular form for the underlying distribution. Here are the results.

Code:

Resamples 100,000
Lower 95% limit .897
Upper 95% limit 3.480
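
The resampling procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the original 273 data values are not reproduced here, so `sample` is a made-up left-skewed stand-in, the function name `bootstrap_mean_ci` is hypothetical, and the percentiles are taken by a simple nearest-rank rule. The number of resamples is set to 2,000 rather than 100,000 so that it runs quickly in pure Python.

```python
import random
import statistics

def bootstrap_mean_ci(data, n_resamples=2_000, level=0.95, seed=0):
    """Percentile bootstrap confidence limits for the mean."""
    rng = random.Random(seed)
    n = len(data)
    # Resample with replacement, same size as the data, and record each mean.
    means = sorted(
        statistics.fmean(rng.choices(data, k=n)) for _ in range(n_resamples)
    )
    # Nearest-rank positions of the lower and upper percentiles.
    alpha = (1.0 - level) / 2.0
    lo_idx = round(alpha * (n_resamples - 1))
    hi_idx = round((1.0 - alpha) * (n_resamples - 1))
    return means[lo_idx], means[hi_idx]

# Hypothetical stand-in data: 273 left-skewed values (NOT the poster's data).
data_rng = random.Random(42)
sample = [5.0 - data_rng.expovariate(0.2) for _ in range(273)]

boot_lower, boot_upper = bootstrap_mean_ci(sample)
print(f"Lower 95% limit {boot_lower:.3f}")
print(f"Upper 95% limit {boot_upper:.3f}")
```

With real data, replace `sample` with the observed values and raise `n_resamples` if you want more stable percentile estimates.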

This shows that, because of the non-normality, both limits from the standard normal-theory interval fall below the bootstrap limits (.736 versus .897, and 3.352 versus 3.480), but not terribly. With a sample size of 273, the Central Limit Theorem does well considering its generality.

PS: Your post is under High School Mathematics. Is this really for high school? It seems too advanced for that.