I checked the FAQ and there is nothing that says 'no gambling problems', so this question might be OK. I am trying to create a soccer model using shots on target and 'conversion rate' (CR), which is how often shots on target are converted to goals. The difficulty I have is in determining how much significance to assign to each game's CR in creating my ratings. I'll use an example to explain.
Team A: 1 goal. 5 shots on target. CR = 0.2. Expected CR = 0.25
Team B: 2 goals. 10 shots on target. CR = 0.2. Expected CR = 0.25
I thought this was a Binomial distribution with p = expected CR and n = number of shots on target, with variance = n*p*(1-p), but my standard error results were rubbish. Since team B had more shots on target, its CR = 0.2 should be more accurate than the CR for team A, because I have more samples, where a sample is a shot on target. Does that make sense? Then I tried a Bernoulli distribution, which worked better: variance = p*(1-p), the standard deviation (SD) is the square root of the variance, and I calculated a standard error (SE) by dividing the SD by the square root of n, where n is the number of shots on target.
Variance = 0.25*0.75 = 0.1875
SD = SQRT(0.1875) = 0.433
SE (team A) = 0.433 / SQRT(5) = 0.194
SE (team B) = 0.433 / SQRT(10) = 0.137
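As a rough sketch of the calculation just described, here is the same arithmetic in Python. It assumes each shot on target is an independent Bernoulli trial with success probability equal to the expected CR; the function name `conversion_rate_se` is made up for illustration, and returning infinity for zero shots is my own assumption about how to handle that case.

```python
import math


def conversion_rate_se(p_expected, shots_on_target):
    """Standard error of a conversion rate measured over n shots on target.

    Treats every shot as a Bernoulli trial with probability p_expected.
    """
    if shots_on_target <= 0:
        # Assumption: with no shots there is no information, so the
        # uncertainty is treated as unbounded.
        return math.inf
    sd = math.sqrt(p_expected * (1 - p_expected))  # SD of a single shot
    return sd / math.sqrt(shots_on_target)         # SE of the mean of n shots


se_a = conversion_rate_se(0.25, 5)    # team A: 5 shots on target
se_b = conversion_rate_se(0.25, 10)   # team B: 10 shots on target
print(round(se_a, 3), round(se_b, 3))
```

Doubling the number of shots shrinks the SE by a factor of SQRT(2), so team B's CR comes out as the more precise of the two.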
This looks a lot better to me, since the SE of the CR is lower for team B than it is for team A. Is this close to what I should be doing? The next part is to use the inverse of the SE to create a value of 'significance', so I can weight the importance of each CR when estimating team strength in terms of CR. Where there are no shots on target, the significance should be 0, since there is no CR data to work with, and when there are many shots on target, the significance should be greater.
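To make the weighting idea concrete, here is a minimal sketch of what I have in mind, not a claim that it is the right method. The names `significance` and `weighted_cr` are hypothetical, the significance is simply 1 / SE, and falling back to the expected CR when a team has no shots at all is an assumption.

```python
import math


def significance(p_expected, shots_on_target):
    """Weight for one game's CR: the inverse of its standard error."""
    if shots_on_target <= 0:
        return 0.0  # no shots on target, so no CR data to weight
    se = math.sqrt(p_expected * (1 - p_expected) / shots_on_target)
    return 1.0 / se


def weighted_cr(games, p_expected):
    """Significance-weighted average CR over (goals, shots_on_target) pairs."""
    weighted_sum = 0.0
    total_weight = 0.0
    for goals, shots in games:
        w = significance(p_expected, shots)
        if w == 0.0:
            continue  # skip games with no shots on target
        weighted_sum += w * (goals / shots)
        total_weight += w
    # Assumption: with no usable games, fall back to the expected CR.
    return weighted_sum / total_weight if total_weight > 0 else p_expected


games = [(1, 5), (2, 10), (0, 0)]  # (goals, shots on target) per game
print(weighted_cr(games, 0.25))
```

With this definition the significance grows with the square root of the number of shots on target and is exactly 0 for a game with none.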