# Thread: calculating confidence interval for non-bell-shaped plot

1. ## calculating confidence interval for non-bell-shaped plot

Hi there

I have a distribution plot and I need to know how to calculate the 95% confidence interval for that plot. I know that for a bell-shaped distribution one uses the t-distribution, but my plot is not bell-shaped at all: it looks more like a slope, where most of the population is at the right of the plot and the curve then falls rapidly as we move left (a bit like a steep ski slope!).

How might I calculate the 95% confidence interval for this sort of plot?

2. Originally Posted by melysion
Hi there

I have a distribution plot and I need to know how to calculate the 95% confidence interval for that plot. I know that for a bell-shaped distribution one uses the t-distribution, but my plot is not bell-shaped at all: it looks more like a slope, where most of the population is at the right of the plot and the curve then falls rapidly as we move left (a bit like a steep ski slope!).

How might I calculate the 95% confidence interval for this sort of plot?

You will need to be able to compute the values of the cumulative distribution function (CDF) of your distribution (or, better still, the inverse of the CDF). This is:

$F(x) = P(\text{observation} < x)$ = area to the left of $x$ on your plot.

Then the endpoints for a confidence interval are:

lower limit $x_1$ such that $F(x_1) = 0.025$, and upper limit $x_2$ such that $F(x_2) = 0.975$.

Then 0.95 (which is 95%) of your distribution will lie in $(x_1, x_2)$.
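
If you can evaluate $F$ but have no formula for its inverse, $x_1$ and $x_2$ can be found numerically. Here is a minimal Python sketch using bisection, assuming $F$ is continuous and non-decreasing; the exponential CDF with rate 0.5 (`example_cdf`) is a hypothetical stand-in for whatever distribution your plot actually shows, and `quantile` is an illustrative helper, not a library function.

```python
import math


def quantile(cdf, p, lo, hi, tol=1e-10):
    """Return x in [lo, hi] with cdf(x) approximately equal to p, by bisection.

    Assumes cdf is continuous and non-decreasing with cdf(lo) <= p <= cdf(hi).
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cdf(mid) < p:
            lo = mid  # the p-quantile lies to the right of mid
        else:
            hi = mid  # the p-quantile lies at or to the left of mid
    return 0.5 * (lo + hi)


# Hypothetical stand-in CDF: an exponential distribution with rate 0.5.
def example_cdf(x):
    return 1.0 - math.exp(-0.5 * x) if x > 0 else 0.0


x1 = quantile(example_cdf, 0.025, 0.0, 100.0)  # F(x1) = 0.025
x2 = quantile(example_cdf, 0.975, 0.0, 100.0)  # F(x2) = 0.975
print(f"95% of the distribution lies in ({x1:.3f}, {x2:.3f})")
# prints: 95% of the distribution lies in (0.051, 7.378)
```

Bisection is used only because it needs nothing but the ability to evaluate $F$; if you do have the inverse CDF in closed form, just evaluate it at 0.025 and 0.975.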

RonL

3. Thanks for your reply, but I don't really follow. I think I may need a 'baby steps' approach.

Would transforming the data beforehand help?

4. Originally Posted by melysion
Thanks for your reply, but I don't really follow. I think I may need a 'baby steps' approach.

Would transforming the data beforehand help?

It would be easier to explain if I could see the plot (and any other information you have).

RonL

5. Hi again

Well, it's in Excel format and I've attached it to this message.

Does that help?

Thanks again, it's much appreciated

6. Originally Posted by melysion
Hi again

Well, it's in Excel format and I've attached it to this message.

Does that help?

Thanks again, it's much appreciated

I think (0, 6.7) is about as good as you will get out of this.

RonL

7. Ok. But how did you get 6.35 please?

8. Originally Posted by melysion
Ok. But how did you get 6.35 please?

I worked out the total number of cases (that is, the sum of the frequencies in column B, which is 273) and multiplied it by 0.95 to get 259.35.

Then I added a column C with the cumulative sum of the frequencies in column B, and interpolated to find the value in column A at which that cumulative sum reaches 259.35.
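
The spreadsheet steps above (total, 95% of the total, running sum, linear interpolation) can be sketched in Python like this. The frequency table is purely hypothetical, since the attached spreadsheet isn't available here, and `value_at_cumulative_fraction` is an illustrative name; `column_a` and `column_b` mirror the spreadsheet columns.

```python
from itertools import accumulate


def value_at_cumulative_fraction(values, freqs, fraction):
    """Find the value (column A) at which the running total of the
    frequencies (column B) reaches `fraction` of the grand total,
    interpolating linearly between neighbouring rows.
    """
    total = sum(freqs)
    target = fraction * total               # e.g. 0.95 * total
    cumulative = list(accumulate(freqs))    # the "column C" running sum
    if target <= cumulative[0]:
        return values[0]
    for i in range(1, len(values)):
        if cumulative[i] >= target:
            # Linear interpolation between rows i-1 and i.
            share = (target - cumulative[i - 1]) / (cumulative[i] - cumulative[i - 1])
            return values[i - 1] + share * (values[i] - values[i - 1])
    return values[-1]


# Hypothetical frequency table (NOT the data from the thread).
column_a = [0, 1, 2, 3, 4, 5, 6, 7]
column_b = [80, 50, 30, 20, 8, 6, 4, 2]
print(round(value_at_cumulative_fraction(column_a, column_b, 0.95), 3))  # prints: 4.333
```

Here the total is 200, so the target is 190; the running sum passes 190 between the rows for 4 (188) and 5 (194), giving $4 + 2/6 \approx 4.333$.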

This is a one-sided confidence interval. To get a two-sided interval I would have had to fit the frequency table to a theoretical distribution, to give the required resolution at the low end.
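
The thread doesn't say which theoretical distribution would be fitted. As a hedged illustration only, this Python sketch assumes an exponential distribution (chosen because it has a simple inverse CDF), estimates its rate from a hypothetical frequency table, and reads off both 2.5% and 97.5% points from the fitted curve; `exponential_fit_interval` is an illustrative name, and real data might need a different family entirely.

```python
import math


def exponential_fit_interval(values, freqs, level=0.95):
    """Fit an exponential distribution to a frequency table and return the
    central interval holding `level` of the fitted distribution.

    Assumption: the data are roughly exponential. The rate is estimated as
    1 / (frequency-weighted mean), the maximum-likelihood estimate for
    ungrouped exponential data; treating each row's value as exact is a
    further simplification.
    """
    n = sum(freqs)
    mean = sum(v * f for v, f in zip(values, freqs)) / n
    rate = 1.0 / mean
    tail = (1.0 - level) / 2

    def inverse_cdf(p):
        # Exponential inverse CDF: F^{-1}(p) = -ln(1 - p) / rate
        return -math.log(1.0 - p) / rate

    return inverse_cdf(tail), inverse_cdf(1.0 - tail)


# A hypothetical frequency table (NOT the data from the thread).
column_a = [0, 1, 2, 3, 4, 5, 6, 7]
column_b = [80, 50, 30, 20, 8, 6, 4, 2]
low, high = exponential_fit_interval(column_a, column_b)
print(f"({low:.3f}, {high:.3f})")  # prints: (0.034, 4.980)
```

The point of fitting is that the smooth fitted curve supplies a lower endpoint even where the raw table has too few cases to interpolate reliably.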

RonL

9. Originally Posted by melysion
Hi there

I have a distribution plot and I need to know how to calculate the 95% confidence interval for that plot. I know that for a bell-shaped distribution one uses the t-distribution, but my plot is not bell-shaped at all: it looks more like a slope, where most of the population is at the right of the plot and the curve then falls rapidly as we move left (a bit like a steep ski slope!).

How might I calculate the 95% confidence interval for this sort of plot?

Hi, melysion.

There is a problem with your question: you ask for a confidence interval but don't say what parameter it is for. A confidence interval is an interval estimate of a population parameter, such as the mean or variance, computed from sample statistics such as the sample mean or sample variance; it does not apply to a plot as such. See http://en.wikipedia.org/wiki/Confidence_interval.

By mentioning the t-distribution, you seem to be asking for a 95% confidence interval for the mean. But you are concerned about the non-normality of the underlying distribution. The Central Limit Theorem (CLT) says that, whatever the underlying distribution (provided it has finite variance), the sample mean will be approximately normally distributed when the sample is large enough. But what counts as large enough depends on the distribution, and the CLT itself does not say how large that is.
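
One quick way to see "large enough" in action is to simulate. This Python sketch uses an exponential distribution as a hypothetical stand-in for skewed data (it is not your data) and measures the skewness of the sample mean for a few sample sizes; `skewness` and `skewness_of_sample_means` are illustrative helpers. For the exponential, the skewness of the mean of $n$ observations is $2/\sqrt{n}$, so the printed values should shrink roughly like that.

```python
import random
import statistics


def skewness(xs):
    """Sample skewness: mean cubed deviation over the cubed population SD."""
    m = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    return sum((x - m) ** 3 for x in xs) / (len(xs) * sd ** 3)


def skewness_of_sample_means(n, reps=5000, seed=1):
    """Draw `reps` samples of size n from an exponential distribution
    (a stand-in for skewed data) and return the skewness of their means."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
             for _ in range(reps)]
    return skewness(means)


for n in (5, 30, 273):
    print(n, round(skewness_of_sample_means(n), 2))
```

A normal distribution has skewness 0, so the closer the printed value is to 0, the better the normal approximation for the mean at that sample size.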

Here's how I handled this problem. First, using your data, I calculated the 95% confidence interval for the mean using the standard normality assumption. The formula for the 95% confidence limits is $\bar{x} \pm 1.96\, s / \sqrt{n}$, where $\bar{x}$ is the sample mean, $s$ the sample standard deviation and $n = 273$ the sample size. (The 1.96 is from the normal distribution, which is essentially equal to the t-distribution for a sample size of 273.)

| Statistic | Value |
|---|---|
| Mean | 2.044 |
| Standard deviation | 11.027 |
| Lower 95% limit | 0.736 |
| Upper 95% limit | 3.352 |

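
As an arithmetic check, the normal-theory limits can be reproduced in Python from the mean, standard deviation and sample size reported above. This sketch computes the multiplier with `statistics.NormalDist` rather than hard-coding 1.96; `normal_ci` is an illustrative helper name.

```python
import math
from statistics import NormalDist


def normal_ci(mean, sd, n, level=0.95):
    """Normal-theory confidence interval for a population mean:
    xbar +/- z * s / sqrt(n)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level=0.95
    half_width = z * sd / math.sqrt(n)
    return mean - half_width, mean + half_width


lo, hi = normal_ci(2.044, 11.027, 273)
print(f"({lo:.3f}, {hi:.3f})")  # prints: (0.736, 3.352)
```

The half-width is $1.96 \times 11.027/\sqrt{273} \approx 1.308$, which matches the limits in the table.
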
Now, to check on the effect of non-normality, I used a technique called bootstrapping (resampling). I drew 100,000 samples of size 273, with replacement, from your original sample and calculated the mean of each. I then took the 2.5th and 97.5th percentiles of those 100,000 means. This gives an estimate of the 95% confidence interval for the mean that does not rely on the normality assumption. Here are the results.

| Statistic | Value |
|---|---|
| Resamples | 100,000 |
| Lower 95% limit | 0.897 |
| Upper 95% limit | 3.480 |

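
The resampling procedure above (a percentile bootstrap) can be sketched in Python as follows. Your data aren't available here, so `sample` is a hypothetical stand-in of 273 values drawn from an exponential distribution, and the default of 10,000 resamples (rather than the 100,000 used above) just keeps the pure-Python run time short; `bootstrap_mean_ci` is an illustrative name.

```python
import random
import statistics


def bootstrap_mean_ci(sample, resamples=10_000, seed=0):
    """Percentile-bootstrap 95% confidence interval for the mean.

    Draws `resamples` samples of len(sample) with replacement, takes the
    mean of each, and returns the 2.5th and 97.5th percentiles of the means.
    """
    rng = random.Random(seed)
    n = len(sample)
    means = [statistics.fmean(rng.choices(sample, k=n)) for _ in range(resamples)]
    cuts = statistics.quantiles(means, n=40)  # cut points at 2.5%, 5%, ..., 97.5%
    return cuts[0], cuts[-1]


# Hypothetical stand-in data (NOT the data from the thread).
data_rng = random.Random(42)
sample = [data_rng.expovariate(0.5) for _ in range(273)]

lo, hi = bootstrap_mean_ci(sample)
print(f"mean {statistics.fmean(sample):.3f}, 95% bootstrap CI ({lo:.3f}, {hi:.3f})")
```

Because the bootstrap distribution of the mean inherits the skew of the data, its percentile interval need not be symmetric about the sample mean, which is how it can pick up the shift that the $\pm 1.96$ formula misses.
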
Both bootstrap limits are somewhat higher than the normal-theory limits (by about 0.16 at the lower end and 0.13 at the upper end), so because of the non-normality the standard interval sits slightly too low, but not by much. With a sample size of 273, the Central Limit Theorem does well considering its generality.

PS: Your post is under High School Mathematics. Is this really for high school? It seems too advanced for that.