I don't understand the method you posted (that doesn't mean it's wrong).
Have you been taught how to make a confidence interval?
I have some data (4 runs each of about 10 trials) which is binomial with n_hits/N_trials
n/N = 0/11, 0/9, 0/10, 0/10
So, I estimate the probability p = n/N = 0
But how can I calculate an uncertainty on this value?
I thought to try
total N_tot=40 and n_tot=1, so p_tot=1/40 = 0.025
(i.e., assume one of the trials happened to be successful instead of not)
Then s = sqrt[ 0.025 * (1-0.025)/40] = 0.0247 (approx. 1/40)
is this a more correct way of doing this?
Hope someone can help! Thanks.
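For what it's worth, the pooled calculation above is easy to check numerically. Here is a quick Python sketch of the same arithmetic (the single assumed success is, of course, an assumption, exactly as described):

```python
from math import sqrt

# Pooled data from the four runs: 0/11, 0/9, 0/10, 0/10
N_tot = 11 + 9 + 10 + 10           # 40 trials in total
n_tot = 1                          # pretend one trial had been a success
p_tot = n_tot / N_tot              # 1/40 = 0.025

# Usual binomial standard error: sqrt[p * (1 - p) / N]
s = sqrt(p_tot * (1 - p_tot) / N_tot)
print(p_tot, round(s, 4))          # 0.025 0.0247
```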
I actually mentioned this in another thread as an aside earlier today. Weird coincidence. Your intuition is actually pretty close to what is recommended.
A standard thing to do is to add a certain number of successes and failures - adding two of each is recommended and has a nice interpretation. Technically, for the interpretation to work you add z^2/2 ≈ 1.92 successes and failures (with z = 1.96 for 95% confidence), but people usually just round up to 2 so that things make sense. This is sometimes called the Agresti-Coull method (side note: I took categorical data analysis from Agresti). Adding any positive number of successes and failures also has a Bayesian interpretation as putting a Beta prior on the sample proportion, if you know what that means.
I'll stick with what's fashionable and suggest adding 2 successes and 2 failures. For example, if you observe 0 successes in 40 trials, you would tweak this to 2 successes in 44 trials. If p_tilde = 2/44 and n_tilde = 44, then you would estimate

s_tilde = sqrt[ p_tilde * (1 - p_tilde) / n_tilde ]

for your standard error. If you use this standard error and use p_tilde to estimate p, then you can make confidence intervals, which is where the motivation and justification for 2 and 2 comes from.
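In numbers, the "add 2 and 2" recipe for 0 successes in 40 trials works out like this (a small Python sketch of the arithmetic, not anyone's official implementation):

```python
from math import sqrt

z = 1.96                  # 95% normal quantile
x, n = 0, 40              # observed successes and trials

# Add 2 successes and 2 failures
x_t, n_t = x + 2, n + 4
p_t = x_t / n_t           # 2/44 ~ 0.0455

# Wald-style interval built from the adjusted counts
se = sqrt(p_t * (1 - p_t) / n_t)
lo, hi = p_t - z * se, p_t + z * se
print(round(p_t, 4), round(se, 4))     # 0.0455 0.0314
print(round(lo, 4), round(hi, 4))      # -0.0161 0.107
```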
The issue is that the estimate of p is 0, which makes the usual estimate of the standard error 0. If you naively try to calculate a confidence interval with the usual methods you get {0} which is useless. So the idea is to cook the books so that things work, except hopefully in a way that makes sense statistically.
There seems to be something philosophically devious going on here. I mean, the experiment produced 0 successes in every run, and we want to draw a confidence interval about the conclusion. I would say the conclusion is absolutely certain: we have no evidence of anything! I understand the aim of being able to draw a confidence interval, but it seems sneaky. As you say, you're cooking the books to make it work numerically. My concern is the empirical implications of that, which one might argue is the point of such statistics.
Saying that we are "adding two successes and two failures" is just a way for people to explain to themselves what they are doing. The correction as derived is perfectly reasonable. The usual confidence interval is based on inverting the lousy Z test; the correction comes from inverting the good one. More importantly, it has been shown to perform well.
If this sort of thing bothers you, you would probably find an advanced course in statistical inference quite troubling. Stein's paradox comes to mind.
I would have to see a concrete example to see how that works. If an experiment is performed to test some target, and it appears 0 times in a sample of supposedly suitable size (though possibly not), then I don't see what meaningful inference can be derived from that. Instead of letting the data speak for itself, so to speak, you fudge numbers into it to derive a standard error. But doesn't that just suggest "if we were to have a little bit larger sample, we would have seen a hit so that ..." Pragmatically, I would see it as the experiment is a bust, and we need to try again. Nothing wrong with new data. And on that note, there's nothing disconcerting about Stein's paradox. It certainly isn't pragmatically troubling like an experiment that results in a 0 probability.
I certainly hope that we can make a meaningful inference. With all those failures, it seems to me like we have quite strong evidence that p is close to 0. Since the variance is a function of p, we would also hope to conclude that the variance of our estimate of p is small.
Simulations have shown constructing the confidence interval like this works fine. And it should be obvious that it is going to perform better than {0}. The fact that you don't like what you perceive to be the motivation behind the correction doesn't change the fact that it performs better. Monte Carlo it yourself if you don't believe me (or look up the original paper which IIRC includes simulation results).
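If anyone does want to Monte Carlo it, here is one rough way to do so in Python. The true p, the z value, and the number of replications are all arbitrary choices for illustration, not from any paper:

```python
import random
from math import sqrt

random.seed(0)
z, n, p_true = 1.96, 40, 0.025
reps = 20000

def wald(x, n):
    # Usual Wald interval: p_hat +/- z * sqrt[p_hat(1-p_hat)/n]
    p = x / n
    se = sqrt(p * (1 - p) / n)
    return p - z * se, p + z * se

def add_two(x, n):
    # Same interval, but computed after adding 2 successes and 2 failures
    p = (x + 2) / (n + 4)
    se = sqrt(p * (1 - p) / (n + 4))
    return p - z * se, p + z * se

cov_wald = cov_adj = 0
for _ in range(reps):
    x = sum(random.random() < p_true for _ in range(n))
    lo, hi = wald(x, n)
    cov_wald += lo <= p_true <= hi
    lo, hi = add_two(x, n)
    cov_adj += lo <= p_true <= hi

# The Wald interval badly under-covers here (it collapses to {0}
# whenever x = 0); the adjusted interval sits near the nominal 95%.
print(cov_wald / reps, cov_adj / reps)
```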
There's nothing wrong, by the way, with always adding two successes and two failures regardless of how many successes you have.
If I ever told a medical experimenter that his data was completely worthless due to something like this, and that he needed to collect more, I think I would probably be fired from the project.
And the original paper's title is ... ?
Also, what is your definition of "perform better"? The data only supports there being a point estimate with 0 error, which is the point of altering the data. But if that is the measure of performance, you could also just say there is a small confidence interval of some made up sort. It would "perform better," too.
You misunderstand what my concern is, though. I understand the statistical benefit of cooking the books, as you say. My point is wholly pragmatic. If the experiment is to express something about a phenomenon, then obtaining a zero probability says nothing about it. Sure, we can infer it has a very low probability, but when it is exactly zero, no evidence was obtained; it is like trying to say "lack of evidence supports the conclusion ..." It could be that the phenomenon is unlikely. It could mean the sample size is too small. It could mean there is a problem with the experiment.
I'm not sure why you appear to be so skeptical, but here it is.
http://www.jstor.org/stable/2685469
Thanks for the paper. It was very informative. As I understood it, the adjustment is to help Wald intervals perform better (in terms of actually achieving 95% coverage), especially when p is near the extremes. If p is near an extreme, it is hard to treat it as the center of the interval, which is what the Wald interval attempts to do. The adjusted Wald helps with that issue.
None of this changes what I said, since I already agreed it has statistical benefit. Per the article, I would just use the score interval. My point was pragmatic, however. I'm skeptical because that makes for good science, and statistics applies to such scientific inferences. If our data show that we lack all evidence of some phenomenon, you can certainly use an adjusted Wald or some other method to construct a confidence interval around that, but it doesn't change the fact that our point estimate is at the corner. I'm skeptical about the empirical implications, not the numeric methods involved. The Wald adjustment is not really cooking the books, as you said, since it isn't about adjusting the sample to give us a better estimate, which is what it first appeared to be to me. It is a correction to the Wald interval itself for its poor coverage (given certain sample sizes or extreme estimates). The originator of this thread was really trying to force his data to say something it doesn't.
First of all, many thanks for the help and the interest from both of you: I've found this discussion very useful. I'm not trying to make my data say anything that isn't there. I am simply asking the legitimate question: "Given that I estimate p to be 0 from my limited number of trials, what are the chances that p is not exactly 0?" Clearly a reasonable estimate of this uncertainty can be made for any situation where I estimate p > 0 (i.e., n > 0), so why shouldn't I be able to do it for n = 0?
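For exactly this question (0 hits in N trials; how large could p plausibly be?) there is a standard closed-form answer: the exact one-sided upper confidence bound solves (1 - p)^N = alpha, and the well-known "rule of three" 3/N is a handy approximation to it. A quick illustration in Python, assuming a one-sided 95% level:

```python
# For x = 0 successes in N trials, the exact one-sided 95% upper bound
# on p solves (1 - p)^N = 0.05, i.e. p_up = 1 - 0.05**(1/N).
N = 40
alpha = 0.05
p_up = 1 - alpha ** (1 / N)
rule_of_three = 3 / N              # classic quick approximation to p_up
print(round(p_up, 4), rule_of_three)   # 0.0722 0.075
```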
Incidentally, I came up with another (more complicated) solution, and I will post this in a second :-)
Sorry... I have to attach a PDF because the LaTeX editor was giving all sorts of "unknown latex errors" for some reason. Here is another method that I had a go at. Probably questionable, but I would appreciate the feedback!
binomial.pdf
For reference, from the Agresti and Coull (1998) paper above, I would also compare with the Clopper-Pearson interval. I didn't bother to run the calculation manually, but I found an R function binCI(n, y) that is supposed to do it (n = # observations and y = # of successes). In fact, it has facilities for a number of methods. I'll print them all. The output is below:
As the paper details, the CP interval does not perform very well, but it is an alternative to the basic one. Note, I don't know half the methods used here or their appropriateness. Nor do I know if the function even works correctly. If you're interested, find their formulas and check them manually. I only leave this as reference.

Code:
> binCI(40, 0, method = "CP")       ## Clopper-Pearson
95 percent CP confidence interval
[ 0, 0.0881 ]
Point estimate 0

> binCI(40, 0, method = "AC")       ## Agresti-Coull
95 percent AC confidence interval
[ -0.01677, 0.1044 ]
Point estimate 0

> binCI(40, 0, method = "Blaker")   ## Blaker
95 percent Blaker confidence interval
[ 0, 0.0795 ]
Point estimate 0

> binCI(40, 0, method = "Score")    ## Wilson Score
95 percent Score confidence interval
[ 0, 0.08762 ]
Point estimate 0

> binCI(40, 0, method = "SOC")      ## Second-order corrected
95 percent SOC confidence interval
[ -0.005935, 0.05665 ]
Point estimate 0

> binCI(40, 0, method = "Wald")     ## Wald
95 percent Wald confidence interval
[ 0, 0 ]
Point estimate 0
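Three of those intervals are easy to verify by hand for the x = 0 case, if anyone wants a sanity check on the function. A Python sketch of the textbook formulas (not the binCI source):

```python
from math import sqrt

n, alpha, z = 40, 0.05, 1.96

# Clopper-Pearson with x = 0: lower bound is 0 and the upper bound
# solves (1 - p)^n = alpha/2.
cp_upper = 1 - (alpha / 2) ** (1 / n)

# Wilson score interval with x = 0 (p_hat = 0): lower bound is 0 and
# the upper bound simplifies to (z^2/n) / (1 + z^2/n).
score_upper = (z * z / n) / (1 + z * z / n)

# Agresti-Coull: Wald interval after adding z^2/2 successes and failures.
n_t = n + z * z
p_t = (0 + z * z / 2) / n_t
se = sqrt(p_t * (1 - p_t) / n_t)
ac = (p_t - z * se, p_t + z * se)

print(round(cp_upper, 4))                 # 0.0881
print(round(score_upper, 5))              # 0.08762
print(round(ac[0], 5), round(ac[1], 4))   # -0.01677 0.1044
```

These match the binCI output above, which is reassuring about the function.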