# Thread: Standard Error of a Binomial Distribution

1. ## Standard Error of a Binomial Distribution

Checked the FAQ and there is nothing that says 'no gambling problems' so this question might be ok. I am trying to create a soccer model using shots on target and 'conversion rate' (CR), which is how often shots on target are converted to goals. The difficulty I have is in determining how much significance to assign to each game's CR in creating my ratings. I'll use an example to explain.

Team A: 1 goal. 5 shots on target. CR = 0.2. Expected CR = 0.25
Team B: 2 goals. 10 shots on target. CR = 0.2. Expected CR = 0.25

I thought this was a Binomial distribution with p = expected CR and n = number of shots on target, with variance = n*p*(1-p), but my Standard Error results were rubbish. Since team B had more shots on target, its CR = 0.2 should be more accurate than the CR for team A, because I have more samples, where a sample is a shot on target. Does that make sense? Then I tried a Bernoulli distribution, which worked better: variance = p*(1-p), the standard deviation is the square root of the variance, and I calculated a standard error by dividing the SD by the square root of n, which in this case is the number of shots on target.

Team A:
Variance = 0.25*0.75 = 0.1875
SD = SQRT(0.1875) = 0.433
SE = 0.433 / SQRT(5) = 0.194

Team B:
SE = 0.433 / SQRT(10) = 0.137

This looks a lot better to me, since the SE of the CR is lower for team B than it is for team A. Is this close to what I should be doing? The next part is to use the inverse of the SE to create a value of 'significance' so I can weight the importance of the CR when estimating team strength in terms of CR. Where there are no shots on target, the significance should be 0 since there is no CR data to work with, and when there are many shots on target, the significance should be greater.
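The calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `cr_standard_error` and `significance` are made up for this example, p is taken as the expected CR with 0 < p < 1, and the rule "weight = 1 / SE, or 0 when there are no shots on target" is just the idea from the post, not an established rating method.

```python
import math


def cr_standard_error(p: float, n: int) -> float:
    """Standard error of the observed conversion rate: sqrt(p * (1 - p) / n)."""
    if not 0.0 < p < 1.0:
        raise ValueError("this sketch assumes 0 < p < 1")
    if n <= 0:
        # No shots on target: the observed rate carries no information.
        return math.inf
    return math.sqrt(p * (1.0 - p) / n)


def significance(p: float, n: int) -> float:
    """Hypothetical weight: inverse of the standard error, 0 when n == 0."""
    se = cr_standard_error(p, n)
    return 0.0 if math.isinf(se) else 1.0 / se


if __name__ == "__main__":
    expected_cr = 0.25
    for team, shots in [("A", 5), ("B", 10), ("no shots", 0)]:
        se = cr_standard_error(expected_cr, shots)
        print(team, shots, round(se, 3), round(significance(expected_cr, shots), 2))
```

With this choice the weight grows like the square root of n. Using 1 / SE^2 instead gives n / (p*(1-p)), which grows linearly in the number of shots on target and corresponds to inverse-variance weighting. Which of the two suits the ratings is a modelling decision.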

2. variance = n*p*(1-p)
This is the correct formula for the variance of a binomial(n,p) distribution, but I think you're misinterpreting it.
It is the variance of the number of goals scored from n shots on target if the model is correct, not the variance of your estimate of the conversion rate. You would expect the variability of the number of goals to increase with the number of shots on target, which is what you've observed.

For the variance of your estimate, I'd think along these lines (not sure it's correct though):

Given that the true distribution is $X \sim Bin(n,p)$, and that the estimated conversion rate is $\hat{p} = X/n$:

$Var(\hat{p}|n) = Var(\frac{X}{n}|n) = \frac{Var(X|n)}{n^2} = \frac{np(1-p)}{n^2} = \frac{p(1-p)}{n}$

$sd(\hat{p}|n) = \sqrt{\frac{p(1-p)}{n}}$

Which is decreasing in n, as you expected. Whether or not this can be inverted to create a significance measure with a useful interpretation, I have no idea, I'm afraid.
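For anyone who wants to check the derivation numerically, here is a small sketch that sums over the exact binomial probabilities instead of relying on the formulas. The function name `binomial_variances` is made up for this example, and p = 0.25 with n = 5 and n = 10 are simply the figures from the first post.

```python
from math import comb


def binomial_variances(n: int, p: float) -> tuple[float, float]:
    """Return (Var(X), Var(X/n)) for X ~ Bin(n, p), computed from the pmf.

    Assumes n >= 1 and 0 <= p <= 1.
    """
    pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

    # Variance of the goal count X.
    mean_count = sum(k * w for k, w in enumerate(pmf))
    var_count = sum((k - mean_count) ** 2 * w for k, w in enumerate(pmf))

    # Variance of the estimated conversion rate X/n, computed independently.
    mean_rate = sum((k / n) * w for k, w in enumerate(pmf))
    var_rate = sum((k / n - mean_rate) ** 2 * w for k, w in enumerate(pmf))
    return var_count, var_rate


if __name__ == "__main__":
    p = 0.25
    for n in (5, 10):
        var_count, var_rate = binomial_variances(n, p)
        # Compare against n*p*(1-p) and p*(1-p)/n respectively.
        print(n, var_count, n * p * (1 - p), var_rate, p * (1 - p) / n)
```

The count variance doubles when n goes from 5 to 10, while the variance of the rate halves, which matches the two formulas in this post.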

3. Thanks for the response, that's similar to what I'm looking for. I say similar because I can't tell if it's correct or not, but it does the job!