Confidence interval from st. error when the mean must be positive

• October 24th 2012, 12:33 AM
aferraro
Confidence interval from st. error when the mean must be positive
Hello all,

(Statistical newbie here - learning some basic concepts)

Some background (you can probably skip to the 'important bit', but just in case...): I am calculating the mean number of stratospheric sudden warmings (SSWs) per year. An SSW is a winter atmospheric event which happens roughly 6 times a decade. We don't have many of them on record. The dataset I have has 29 SSWs over 45 years. This gives a mean of 0.64 per year. For a given winter, there are zero events about half the time, one event about 40% of the time and two events about 10% of the time. But there is a lot of variability in this. For example, there were no SSWs at all between 1990 and 1998!

This means the sampling error on the estimate of the mean is quite large. I can calculate the standard error simply enough. I have a table going from the year 1958 to 2002, with the number of events in each year. A typical string the number of events per winter is: 2 1 0 2 1 1 2 0 1 0 0 0 0 0 1. I calculate the standard error of the mean by calculating the standard deviation and dividing it by the square root of the number of independent observations (the number of years).

So far so good.

* Important bit *: I understand how the standard error (SE) of the mean relates to the confidence interval, so that the 95% interval is approx. +/- 2*SE. All well and good. But depending on the dataset I use, I can obtain a standard error where this confidence interval crosses zero. I interpret this as an indication that the sampling error is so large we can't even be sure whether the true mean is positive!

The problem is, this is physical nonsense. We can't have a negative number of events per year. Realistically (for physical reasons that aren't too important) we will probably only ever see 0, 1, 2 or 3 events in each year.

Even though the estimate of the mean can be assumed to be normally distributed, we have an additional piece of information: that is must be positive. How do I work this into my confidence interval? Otherwise I have error bars crossing zero which makes no physical sense. Any advice would be much appreciated.

Attached plots to help:
ssws - this just shows the raw data, i.e. the number of SSW events in each year.
ssws2 - this shows the number of SSWs occurring in the period 1958-2002 split by month. I basically want to put a confidence interval on this bar chart.
ssws3 - this shows the frequency with which we observe 0, 1 annd 2 SSW events in a given year. I know it's only 3 data points, but it could be said to resemble a positive-only normal distribution.
• October 24th 2012, 02:50 AM
awkward
Re: Confidence interval from st. error when the mean must be positive
Given your data, where the number of events each year has been 0, 1, or 2, I suspect that a Poisson distribution is a more reasonable assumption than a Normal. Here is a link to a page that shows how to compute confidence intervals on the mean for a Poisson distribution:

Poisson Confidence Limits

Disclaimer: I haven't tried it myself, just took a quick glance. But it looks like the confidence intervals do not extend into the negative numbers.
• October 24th 2012, 03:29 AM
aferraro
Re: Confidence interval from st. error when the mean must be positive
Thanks for the suggestion. I'm a little confused about that page. It seems to suggest that the confidence interval is based purely on the number of events observed. Let's say I want to put error bars on my plot 'ssws2' above. I have 5 events occurring in January. With this number of observed events the Poisson 95% confidence interval is [1.6235, 11.6683]. I'm curious as to how the number is calculated based on the number of observations and uses no information about the frequency. In my example, the 5 January events happened over 45 years, so the frequency of January events is 0.111. But what if the 5 events happened over only 10 years? The frequency is much higher, but the confidence interval on that number remains the same.

Oh wait, should I multiply the [1.6235, 11.6683] by the frequency to get the confidence interval for the frequency? In which case, in my example, the frequency and associated confidence interval would be: 0.018 < 0.111 < 1.295. Do I have it right?
• October 24th 2012, 04:52 PM
awkward
Re: Confidence interval from st. error when the mean must be positive
Sorry, I replied too hastily-- I now see that the confidence interval on the page I linked to is for a single observation only, which is not your situation.

I still think a Poisson distribution is appropriate, though, so here is what I would do. The variance of the Poisson is equal to the mean, so the standard error is $\sqrt{\lambda /n}$, and an approximate 95% confidence interval on the mean is $\lambda \pm 1.96 \sqrt{\lambda / n}$. This seems to work well for the data in your chart; at least it doesn't extend into the negative numbers.

The reason the confidence interval is approximate is that it is based on a Normal distribution as an approximation to the Poisson. If this bothers you , there are statistical packages that will give an "exact" confidence interval, i.e. not based on the Normal. For example, the function for this purpose in R is "poisson.test". You can download and install R for free.
• October 25th 2012, 08:23 AM
aferraro
Re: Confidence interval from st. error when the mean must be positive
So lambda is the mean? I find that the confidence interval does extend to negative numbers.

The most difficult month is November, where I have one event in 45 years. Therefore the mean is 0.02 events per November. The standard error is SQRT(0.02/45) = 0.02. If I multiply this by 1.96 the confidence interval will cross zero! If I use R's poisson.test I get the following confidence intervals for this example:

>poisson.test(1,T=45,conf.level=0.95)

[0.0006, 0.1238]

So that makes some sense. But it's not consistent with the method of calculating the standard error. Is that because the normal distribution approximation breaks down in my example? We are considering 1 event in 45 after all.
• October 25th 2012, 01:44 PM
awkward
Re: Confidence interval from st. error when the mean must be positive
Quote:

Originally Posted by aferraro
So lambda is the mean? I find that the confidence interval does extend to negative numbers.

The most difficult month is November, where I have one event in 45 years. Therefore the mean is 0.02 events per November. The standard error is SQRT(0.02/45) = 0.02. If I multiply this by 1.96 the confidence interval will cross zero! If I use R's poisson.test I get the following confidence intervals for this example:

>poisson.test(1,T=45,conf.level=0.95)

[0.0006, 0.1238]

So that makes some sense. But it's not consistent with the method of calculating the standard error. Is that because the normal distribution approximation breaks down in my example? We are considering 1 event in 45 after all.

Yes, I think you are correct; the Normal approximation does not work well for events that occur that rarely.
• October 26th 2012, 12:07 AM
aferraro
Re: Confidence interval from st. error when the mean must be positive
OK, I think I understand this now. I'll use the exact Poisson confidence interval. Thank you for your help!