# Thread: Sampling of a population of smokers

1. ## Sampling of a population of smokers

An unknown fraction p of a certain population smokes, and random sampling with replacement is used to determine p. It is desirable to find p with an error not exceeding 0.005 with a confidence level of 95%. How large should the sample size be?

I am confused by the error and the confidence level numbers.
Is this a binomial event? Thanks for any leads.

2. (Please correct me if I'm wrong)
Here is my logic:

Let n be the sample size.
Let p' be the sample mean (which is different from the true mean p).
Let S = np' be the number of smokers in the sample.
Let e be the allowed error as a fraction (given as 0.005, i.e. 0.5%, in the question).

Concept of error
The error, if given as a fraction of the sample size n (0.005, i.e. 0.5%), tells us about the standard deviation in terms of n.
Since the standard deviation of the binomial distribution is ${\sqrt {npq}}$ (with q = 1 - p),
$e = {{\sqrt {npq}} \over n}$

The absolute value of the error is then 0.005n.

In this question, we are concerned with the error in the sample count (np' - np).
Hence, the range is $np \pm 0.005n$.

And we can set up the inequality:

(np - ne) < S < (np + ne)

with which to start solving the question numerically.

Concept of Confidence Interval

The range of S defined earlier must cover 95% of the probability. In principle S can stretch from 0 smokers to n smokers (100%), but the tail ends are unlikely. Only the values centred around the mean (within a few standard deviations) matter. We pick the central range that holds 95% of the probability; that range has a 95% chance of containing the true mean, which is what a 95% confidence interval means.

Plotted on the standard normal curve, it's the area within the range -1.96 to 1.96. These values and this area are INVARIANT for a confidence level of 95%. In fact, we use them so often that we round this to "within 2 standard deviations".
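As a quick sanity check of the 1.96 figure, here is a small sketch using `statistics.NormalDist` from the Python standard library (Python 3.8+). The variable names are my own, not from the question.

```python
from statistics import NormalDist

confidence = 0.95

# Two-sided interval: (1 - confidence) / 2 of the probability sits in each tail,
# so the upper cut-off is the 97.5th percentile of the standard normal.
upper_tail_point = 1 - (1 - confidence) / 2
z = NormalDist().inv_cdf(upper_tail_point)

# Area under the standard normal curve between -z and +z.
area = NormalDist().cdf(z) - NormalDist().cdf(-z)

print(round(z, 2), round(area, 2))
```

This should print values close to 1.96 and 0.95, which matches the range quoted above.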

How to calculate sample size

We are given a fixed error bound (a number we cannot exceed but can go below) and a fixed confidence level, but we can vary n. We find a relationship between e and n:

$e = {{\sqrt {npq}} \over n}$
So as $n \to \infty$, $e \to 0$.
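To see how quickly this quantity shrinks, here is a rough numerical sketch. The helper name `std_of_proportion` is hypothetical, and p = 0.5 is an assumption on my part: the true p is unknown, and p = 0.5 makes pq as large as it can be, so it is the worst case.

```python
import math

def std_of_proportion(n: int, p: float = 0.5) -> float:
    """Return sqrt(n*p*q) / n, i.e. sqrt(p*q / n), with q = 1 - p."""
    q = 1 - p
    return math.sqrt(n * p * q) / n

# Each 100-fold increase in n cuts the value by a factor of 10.
for n in (100, 10_000, 1_000_000):
    print(n, std_of_proportion(n))
```

Because the value falls like $1/\sqrt{n}$, halving it costs four times as many samples.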

(Can anyone help me out here? I'm trying to find an expression for e < n.)
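One possible way to finish, offered as a sketch rather than a definitive answer: assume the normal approximation to the binomial holds, and require the 95% half-width to be at most e, i.e. $1.96 \sqrt{pq/n} \le e$. Solving for n gives $n \ge 1.96^2 \, pq / e^2$. Since p is unknown, the code below assumes the worst case p = 0.5. The function name `required_sample_size` is hypothetical.

```python
import math
from statistics import NormalDist

def required_sample_size(e: float, confidence: float = 0.95, p: float = 0.5) -> int:
    """Smallest n with z * sqrt(p*q/n) <= e, under the normal approximation."""
    # Two-sided critical value, about 1.96 for a 95% confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    q = 1 - p
    return math.ceil(z * z * p * q / (e * e))

n = required_sample_size(0.005)
print(n)
```

With e = 0.005 this comes out at roughly 38,400 people. Note that this differs from setting e equal to one standard deviation: the factor 1.96 is what ties the error bound to the 95% confidence level.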