(Please correct me if I'm wrong)

Here is my logic:

Let n be the sample size.

Let p' be the sample mean (which is different from the true mean p).

Let S = np' be the number of smokers in the sample.

Let e = error rate, as a fraction (given as 0.005, i.e. 0.5%, in the question).

Concept of error

The error, if given as a percentage of the sample size n (0.5%, i.e. 0.005), can be converted into a number of people. That puts it in the same units as the standard deviation of S, which also depends on n.

The standard deviation of the binomial distribution is sqrt(np(1 - p)), which is measured in counts (number of smokers).

In the same units, the absolute value of the error is then 0.005n.

In this question, we are concerned with the error of the sample count (np' - np), i.e. how far the number of smokers in the sample is from its expected value.

Hence, it's |np' - np| < ne.

And we can set up the inequality:

(np - ne) < S < (np + ne)

with which to start solving the question numerically.
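To get a feel for this inequality, here is a minimal sketch that adds up the exact binomial probabilities of every S strictly between np - ne and np + ne. The function names and the value p = 0.5 are my own assumptions for illustration (the question does not give the true p), and it needs 0 < p < 1.

```python
import math

def binom_pmf(k, n, p):
    """P(S = k) for S ~ Binomial(n, p), computed in log space to avoid overflow."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1 - p))
    return math.exp(log_pmf)

def prob_within_error(n, p, e):
    """P(np - ne < S < np + ne): chance the count lands inside the error band."""
    lo, hi = n * p - n * e, n * p + n * e
    return sum(binom_pmf(k, n, p) for k in range(n + 1) if lo < k < hi)

# p = 0.5 is a hypothetical value, not taken from the question.
for n in (1000, 10000, 40000):
    print(n, prob_within_error(n, 0.5, 0.005))
```

Under that assumed p, the probability of staying inside the band grows with n, and only passes 95% once n is in the tens of thousands.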

Concept of Confidence Interval

The range of S which we defined earlier must contain 95% of the probability. The full range of S is huge, stretching from 0 smokers to n (=100%) smokers. But the tail ends are not likely. Only the values centred around the mean (within about plus or minus 1.96 standard deviations) matter. In fact, they make up 95% of all the probability = having a 95% chance of containing the true mean = 95% confidence interval.

Plotted on the standard normal curve, it's the area within the range -1.96 to 1.96. These values and area are INVARIANT for the confidence interval of 95%. In fact, we use it so often we round it and call it "within 2 standard deviations".
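As a quick check of those numbers, this small sketch uses Python's standard-library statistics.NormalDist (Python 3.8+) to compute the area between -1.96 and 1.96 and to recover the cut-off from the 95% level. The variable names are just for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

# Area under the curve between -1.96 and 1.96 (should be very close to 0.95).
area_196 = std_normal.cdf(1.96) - std_normal.cdf(-1.96)

# Area within exactly 2 standard deviations, the rounded rule of thumb.
area_2sd = std_normal.cdf(2.0) - std_normal.cdf(-2.0)

# Going the other way: the z value that leaves 2.5% in each tail.
z_95 = std_normal.inv_cdf(0.975)

print(area_196, area_2sd, z_95)
```

The "2 standard deviations" version covers slightly more than 95%, which is why 1.96 is the value used in calculations.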

How to calculate sample size

We are given a fixed error rate (a number we cannot exceed but can go below), and a fixed confidence level, but we can vary n. We find a relationship between e and n:

So as n --> infinity, e --> 0.

(Can anyone help me out here? I'm trying to find an expression for e < n.)
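For what it's worth, the usual textbook relationship, assuming the normal approximation to the binomial, is e = z * sqrt(p(1 - p)/n), which rearranges to n >= z^2 * p(1 - p) / e^2. Below is a minimal sketch of that formula. The function names are hypothetical, and p = 0.5 is the worst-case assumption commonly used when the true p is unknown, not a value from the question.

```python
import math
from statistics import NormalDist

def margin_of_error(n, p, confidence=0.95):
    """Margin of error e = z * sqrt(p(1-p)/n) under the normal approximation."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return z * math.sqrt(p * (1 - p) / n)

def required_sample_size(e, p=0.5, confidence=0.95):
    """Smallest n with margin of error <= e; p = 0.5 is the worst case (assumed)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil(z ** 2 * p * (1 - p) / e ** 2)

n_needed = required_sample_size(0.005)
print(n_needed, margin_of_error(n_needed, 0.5))
```

With the rounded value z = 1.96 the same formula gives 1.96^2 * 0.25 / 0.005^2 = 38416, so the answer is about 38,400 people under these assumptions. Since e shrinks like 1/sqrt(n), halving the error needs four times the sample.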