Thread: Calculating the likelihood of an event

1. Calculating the likelihood of an event

Hi,

I'm trying to calculate the likelihood that a website post will be 'useful' to people.

For example:
Website post 1 is voted to be useful by 8 people and 'not useful' by 2 people.
Website post 2 is voted to be useful by 800 people and 'not useful' by 200 people.

So on the face of it, both posts are 80% likely to be useful to the next website viewer. However, because post 2 has been voted on by far more people, I think it is more likely to be useful, because its result is less likely to have been a 'fluke'. My question is: how do I quantify how likely each post is to be useful for the next website viewer, whilst taking into account the sample size of the votes?

Stu

2. Re: Calculating the likelihood of an event

When an experiment with two possible outcomes is repeated independently, the number of times one outcome occurs follows what is known as a binomial distribution. Suppose the probability of a particular web site being scored 'useful' is 0.8 and the probability that it is 'not useful' is 0.2. If you ask n people what they think, the average result will be 80% useful and 20% not useful. But few (if any) of your surveys would get that exact result, as there is variability in each trial (much like flipping a fair coin 100 times is unlikely to result in exactly 50 heads and 50 tails). The measure of this variability is the standard deviation of the survey results, and it can be shown that for a binomial distribution the standard deviation is sqrt(npq), where n = number of participants, p = probability that any one participant answers "useful," and q = probability that any one participant answers "not useful." Note that q = 1-p. The significance of the standard deviation is that you can expect about 68% of all survey results to land within one standard deviation (one "sigma") of the mean. That 68% figure comes from approximating the binomial distribution by a normal distribution, so it is only rough when n is small.
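To see how well the "about 68%" rule holds for a given survey size, the exact binomial probabilities can be summed directly. The following is only a minimal sketch: the function names `binomial_pmf` and `prob_within_one_sigma` are made up for illustration, and it assumes every participant answers independently with the same probability p.

```python
import math

def binomial_pmf(n, k, p):
    """Probability of exactly k 'useful' answers out of n, each with probability p."""
    # Work in logs so that large n (e.g. 1000) does not overflow.
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1 - p))
    return math.exp(log_pmf)

def prob_within_one_sigma(n, p):
    """Exact probability that the count lands within one sigma of the mean np."""
    mean = n * p
    sigma = math.sqrt(n * p * (1 - p))   # sqrt(npq) with q = 1 - p
    lowest = max(0, math.ceil(mean - sigma))
    highest = min(n, math.floor(mean + sigma))
    return sum(binomial_pmf(n, k, p) for k in range(lowest, highest + 1))

for n in (10, 1000):
    print(n, round(prob_within_one_sigma(n, 0.8), 3))
```

For small n only a handful of whole-number counts fit inside the one-sigma band, so the exact figure can differ noticeably from 68%; for large n it settles close to it.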

The average for this distribution is given by np. For n = 10 the average is 8, and for n = 1000 the average is 800, or 80% in both cases.

The standard deviation for a binomial distribution is given by sqrt(npq). For n = 10 the standard deviation is sqrt(1.6) = 1.26. This means that there's roughly a 68% probability that the survey of 10 people will result in 8 +/- 1.26 "useful" responses. In other words, you have roughly a 68% chance of the survey yielding between 6.74 and 9.26 "useful" responses. The width of that "1 sigma" band is 2.52/10 = 25% of the range of possible outcomes (from 0 to 10). But for n = 1000 you get std deviation = sqrt(160) = 12.6, so there's roughly a 68% probability that the survey of 1000 will yield between 787.4 and 812.6 "useful" responses. The width of this band is only 25.2/1000 = 2.5% of the full range. Hence the more people you survey, the more likely the results are to be "reasonably close" to the true mean, and the more consistent your surveys will be.
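The arithmetic above can be checked with a short script. This is just a sketch under the same assumption (p = 0.8 is the true probability of a "useful" answer); the helper name `one_sigma_band` is made up for illustration.

```python
import math

def one_sigma_band(n, p):
    """Return mean, sigma, band limits and band width as a fraction of the range 0..n."""
    mean = n * p                          # np
    sigma = math.sqrt(n * p * (1 - p))    # sqrt(npq) with q = 1 - p
    low, high = mean - sigma, mean + sigma
    relative_width = (high - low) / n
    return mean, sigma, low, high, relative_width

for n in (10, 1000):
    mean, sigma, low, high, width = one_sigma_band(n, 0.8)
    print(f"n={n}: mean={mean:.0f}, sigma={sigma:.2f}, "
          f"band={low:.2f} to {high:.2f}, width={width:.1%} of range")
```

The band width relative to the range is 2*sqrt(pq/n), so it shrinks like 1/sqrt(n): surveying 100 times as many people makes the band 10 times narrower.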

3. Re: Calculating the likelihood of an event

Many thanks, that's great!

Is it possible to calculate a single value for "how likely it is that the next user will find the post useful"? or is the answer just 80% for both posts?

4. Re: Calculating the likelihood of an event

The probability that the next person will answer "useful" is always 80% (assuming that 80% is the "true" mean for the survey). Think about it this way - if this value changed depending on how many people you ask, then the participants would have to know how many other people you were going to ask, and then change their answer accordingly. You'd get an answer like this: "Well, if you're only going to ask 10 people I'll say it's not useful, but if you're going to ask 1000 my answer is it's useful." What the number of voters does change is how far you can trust the observed 80% as an estimate of that true value: 8 out of 10 pins it down far less tightly than 800 out of 1000.
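A small simulation can illustrate the point. In the sketch below every simulated voter answers "useful" with the same probability 0.8, regardless of how many people are surveyed; only the scatter of the observed percentage changes with the survey size. The function name `simulate_fractions`, the seed, and the number of repeated surveys are arbitrary choices for illustration.

```python
import random
import statistics

def simulate_fractions(n, p, surveys, rng):
    """Run `surveys` surveys of n voters each; return the observed 'useful' fraction of each."""
    fractions = []
    for _ in range(surveys):
        useful = sum(1 for _ in range(n) if rng.random() < p)
        fractions.append(useful / n)
    return fractions

rng = random.Random(12345)  # fixed seed so the run is repeatable
for n in (10, 1000):
    fractions = simulate_fractions(n, 0.8, 500, rng)
    print(f"n={n}: average fraction={statistics.mean(fractions):.3f}, "
          f"spread (std dev)={statistics.pstdev(fractions):.3f}")
```

Both survey sizes average out near 0.8, but individual surveys of 10 voters scatter much more widely around that value than surveys of 1000 voters.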