Re: Binomial error paradox

Hey Shakarri.

If you are trying to bound the standard error relative to the mean, then simply set up the inequality and solve for n.

If you are using a Wald test or a normal approximation, then the standard error for a binomial proportion is se = SQRT(p_hat*(1-p_hat)/n), where p_hat is the estimated proportion, which is just the mean of the sample data.

So you are looking at [1.96*se]/p_hat < t, where t is your threshold (3% = 0.03). Substituting for se and solving for n gives:

(1.96)^2*(1-p_hat)/(p_hat*t^2) < n, or equivalently

n > (1.96)^2*(1-p_hat)/(p_hat*t^2)

So you can find the first integer satisfying that condition and you have your sample size.

If you want to allow for the fact that p_hat can fluctuate within a specific range, then you will need to do this for the lower and upper bounds and combine the two results to get a value for n.
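
A minimal sketch of that calculation in Python (the bounds lo and hi are illustrative values, not taken from your data):

```python
import math

def required_n(p_hat, t, z=1.96):
    """Smallest integer n satisfying z*sqrt(p_hat*(1-p_hat)/n)/p_hat < t."""
    return math.floor(z**2 * (1 - p_hat) / (p_hat * t**2)) + 1

print(required_n(0.1, 0.03))  # single point estimate, 3% threshold

# If p_hat may sit anywhere in [lo, hi], take the worst case over the bounds:
lo, hi = 0.08, 0.12
print(max(required_n(lo, 0.03), required_n(hi, 0.03)))
```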

Re: Binomial error paradox

I am afraid you have misunderstood my question, but thanks for trying.

I am using a normal approximation, and 1.96 times the standard error is 0.004.

I am using the formula [1.96*se]/p_hat = t, as in your response, but to find t for the current sample size n.

The problem is that when I apply this equation to the chance of the event occurring (p_hat = 0.1), I get t = 0.004/0.1 = 0.04, which I consider too high, so more data would need to be gathered.

BUT

Applying the same formula to the chance of the event not occurring (p_hat = 0.9) gives t = 0.004/0.9 ≈ 0.0044, which is a low enough value of t, so no more data would need to be gathered.

t in this case is almost **10 times lower** than t for the chance of the event occurring.

So on one hand I am not sure if 0.1 is accurate for the chance of something happening, but on the other hand I am sure that 0.9 is accurate for the chance of it not happening. I hope you can see why this doesn't make sense.
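
Here are the two calculations side by side, using the numbers above, to make the asymmetry plain:

```python
margin = 0.004       # 1.96 * se from the current sample
p_hat = 0.1          # chance of the event occurring
q_hat = 1 - p_hat    # chance of it not occurring

print(margin / p_hat)  # 0.04    -> too high, gather more data
print(margin / q_hat)  # ~0.0044 -> low enough, stop
```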

Re: Binomial error paradox

Well, in terms of entropy, the maximum value is at p = 1/2, but what you could do is again use upper and lower bounds and combine the information to get an estimate.

You can make the bounds whatever you like and you can even make them dependent on an existing sample and each new observation you obtain.

The reason I mention entropy is that the point of highest entropy is the point of highest uncertainty. You also need a relatively smaller sample where the entropy is low, which is why knowing where the entropy is highest (and lowest) is something to really consider when you want to do these kinds of calculations.

Apart from using a guess based on a prior distribution, or updating your guess with each new observation added to your growing sample, I can't really suggest anything else.
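
To make the entropy point concrete, here is a quick sketch of the Bernoulli entropy curve (note that it is symmetric in p and 1 - p, which bears directly on your paradox):

```python
import math

def bernoulli_entropy(p):
    """Shannon entropy (in bits) of a Bernoulli(p) variable."""
    if p <= 0 or p >= 1:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(p, round(bernoulli_entropy(p), 4))
# Maximum of 1 bit at p = 0.5; H(p) = H(1 - p), so p and q are
# equally uncertain in entropy terms.
```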

Re: Binomial error paradox

This is not about the upper and lower bounds; it is about the reference point used when looking at the relative error. It isn't clear what the relative error should be relative to.

Forget about the specific numbers. Suppose the chance of something happening is p, the chance of it not happening is q = 1 - p, and I am taking a 95% confidence interval.

The difference between the upper bound and the mean (1.96 times the standard error) is 1.96*(pq/n)^(1/2).

This error relative to p is [1.96*(pq/n)^(1/2)]/p

The error relative to q is [1.96*(pq/n)^(1/2)]/q

Which relative error is correct? If p < 0.5 then the first figure is higher than the second. But why should it be higher? Why would I be less certain about the chance of something happening than about the chance of it not happening? Since q is determined by p, I cannot be certain about one and uncertain about the other.
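
The same asymmetry in code (n = 20000 is an illustrative sample size, chosen only to roughly reproduce the margin quoted earlier):

```python
import math

def relative_errors(p, n, z=1.96):
    """Margin of error z*sqrt(p*q/n), expressed relative to p and to q."""
    q = 1 - p
    margin = z * math.sqrt(p * q / n)
    return margin / p, margin / q

rel_p, rel_q = relative_errors(0.1, 20000)
print(rel_p, rel_q)  # rel_p > rel_q whenever p < 0.5
```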

Ultimately, my question is: how do people define a relative error that avoids this paradox of being more certain of q than of p?
