Estimating proportions and means
A relatively simple situation is estimation of a proportion. For example, we may wish to estimate the proportion of residents in a community who are at least 65 years old.
The estimator of a proportion is \hat{p} = X/n, where X is the number of 'positive' observations (e.g. the number of people out of the n sampled people who are at least 65 years old). When the observations are independent, this estimator has a (scaled) binomial distribution (and is also the sample mean of data from a Bernoulli distribution). The variance of this distribution is p(1 - p)/n, and its maximum, 0.25/n, occurs when the true parameter is p = 0.5. In practice, since p is unknown, the maximum variance is often used for sample size assessments.
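The claim that the variance peaks at p = 0.5 can be checked numerically. The following is a minimal sketch, assuming the standard binomial result that the variance of X/n is p(1 - p)/n; the function name `proportion_variance` and the choice n = 100 are illustrative, not taken from the text.

```python
def proportion_variance(p: float, n: int) -> float:
    """Variance of the sample proportion X/n when X ~ Binomial(n, p)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if n <= 0:
        raise ValueError("n must be a positive integer")
    return p * (1.0 - p) / n


# Illustrative sample size (an assumption, not a value from the text).
n = 100

# Evaluate the variance on a grid of candidate proportions 0.00, 0.01, ..., 1.00.
grid = [i / 100 for i in range(101)]
variances = {p: proportion_variance(p, n) for p in grid}

# The grid point with the largest variance is the worst case for planning.
worst_p = max(variances, key=variances.get)
print(worst_p, variances[worst_p])
```

On this grid the largest variance is attained at p = 0.5 and equals 0.25/n, which is why 0.25/n serves as a conservative planning value when p is unknown.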
For sufficiently large n, the distribution of \hat{p} will be closely approximated by a normal distribution with the same mean and variance.^{[1]} Using this approximation, it can be shown that around 95% of this distribution's probability lies within 2 standard deviations of the mean. Because of this, an interval of the form (\hat{p} - 2\sqrt{0.25/n}, \hat{p} + 2\sqrt{0.25/n}) will form an approximate 95% confidence interval for the true proportion. If this interval needs to be no more than W units wide, the equation 4\sqrt{0.25/n} = W can be solved for n, yielding
^{[2]}^{[3]} n = 4/W^{2} = 1/B^{2}, where B = W/2 is the error bound on the estimate, i.e., the estimate is usually given as within ± B. So, for B = 10% one requires n = 100, for B = 5% one needs n = 400, for B = 3% the requirement is about n = 1111, or roughly 1000, while for B = 1% a sample size of n = 10000 is required. These numbers are often quoted in news reports of opinion polls and other sample surveys.
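The sample sizes quoted above follow directly from n = 1/B^{2}. The sketch below reproduces them under the same assumptions as the text: the worst-case variance 0.25/n and a multiplier of 2 standard deviations. The function name `sample_size_for_error_bound` and the `z` parameter are hypothetical, introduced only for illustration.

```python
import math


def sample_size_for_error_bound(B: float, z: float = 2.0) -> int:
    """Smallest n such that z * sqrt(0.25 / n) <= B.

    B is the error bound as a fraction (0.05 means 5%).
    z is the number of standard deviations; 2.0 matches the rough
    95% rule used in the text (an illustrative default).
    """
    if not 0.0 < B < 1.0:
        raise ValueError("B must lie strictly between 0 and 1")
    if z <= 0:
        raise ValueError("z must be positive")
    # Solve z * sqrt(0.25 / n) = B for n, using the worst-case variance 0.25 / n.
    n_exact = 0.25 * (z / B) ** 2
    # The small tolerance keeps floating-point noise from bumping an
    # exact integer (such as 100) up to the next one.
    return math.ceil(n_exact - 1e-9)


for bound in (0.10, 0.05, 0.03, 0.01):
    print(f"B = {bound:.0%}: n = {sample_size_for_error_bound(bound)}")
```

With the default multiplier this gives 100, 400, 1112 and 10000. The 3% case rounds up from about 1111.1, which the text approximates as roughly 1000.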