An experiment that has two possible outcomes follows what is known as a binomial distribution. Suppose the probability of a particular web site being scored 'useful' is 0.8 and the probability that it is scored 'not useful' is 0.2. If you survey n people, the average result will be 80% useful and 20% not useful. But few (if any) of your surveys would get that exact result, because there is variability in each trial (much like flipping a fair coin 100 times is highly unlikely to result in exactly 50 heads and 50 tails). The measure of this variability is the standard deviation of the survey results, and it can be shown that for a binomial distribution the standard deviation is sqrt(npq), where n = number of participants, p = probability that any one participant answers "useful," and q = probability that any one participant answers "not useful." Note that q = 1 - p. The significance of the standard deviation is that about 68% of all survey results can be expected to land within one standard deviation (one "sigma") of the mean.
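The 68% figure can be checked empirically. The sketch below (a hypothetical simulation, not part of the survey described above) runs many simulated surveys of 1000 people with p = 0.8 and counts how often the result lands within one sigma of the mean; because the binomial distribution is discrete, the fraction comes out near, but not exactly at, 0.68.

```python
import math
import random

random.seed(0)  # fixed seed so the simulation is repeatable

def survey(n, p=0.8):
    """Simulate one survey of n participants; return the count of 'useful' answers."""
    return sum(1 for _ in range(n) if random.random() < p)

n, p = 1000, 0.8
q = 1 - p
mean = n * p                      # np = 800
sigma = math.sqrt(n * p * q)      # sqrt(npq) = 12.6

# Run many simulated surveys and count how many land within one sigma of the mean.
trials = 2000
within = sum(1 for _ in range(trials) if abs(survey(n, p) - mean) <= sigma)
print(f"fraction within one sigma: {within / trials:.2f}")
```

With a large n the printed fraction settles close to 0.68, matching the one-sigma rule.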

The average for this distribution is given by np. For n = 10 the average is 8, and for n = 1000 the average is 800, or 80% in both cases.

The standard deviation for a binomial distribution is given by sqrt(npq). For n = 10 the standard deviation is sqrt(10 * 0.8 * 0.2) = sqrt(1.6) = 1.26. This means that there's a 68% probability that a survey of 10 people will yield 8 +/- 1.26 "useful" responses. In other words, you have a 68% chance of the survey yielding between 6.74 and 9.26 "useful" responses. The width of that "1 sigma" band, 2.52, covers about 25% of the range of possible outcomes (from 0 to 10). But for n = 1000 you get standard deviation = sqrt(160) = 12.6, so there's a 68% probability that a survey of 1000 people will yield between 787.4 and 812.6 "useful" responses. The width of this band is only 25.2/1000 = 2.5% of the full range. Hence the larger the sample, the more likely the results are "reasonably close" to the true mean, and the more consistent your surveys will be.