# Normal approximation to binomial distribution

• Jun 5th 2011, 04:48 AM
iamnobody917
Normal approximation to binomial distribution
Suppose X1+X2+...+X80 follows a binomial distribution with n = 80, p = 0.74, q = 0.26, mean = 59.2, variance = 15.392
then we may say that X1 + X2 + ... + X80 approximately follows a normal distribution with mean = 59.2, variance = 15.392

Yi is another random variable such that Yi = 7Xi - 1 for i=1, 2, ..., 80
Y1 + Y2 + ... + Y80 = 7(X1 + X2 + ... + X80) - 80
so Mean(Y1 + Y2 + ... Y80) = 7Mean(X1+X2+...+X80) - 80 = 7*59.2 - 80 = 334.4
Var(Y1 + Y2 + ... + Y80) = 49Var(X1+X2+...+X80) = 49 * 15.392 = 754.208
so we may also say that Y1+Y2+...+Y80 follows a normal distribution with mean = 334.4, variance = 754.208
Is this correct up to this point?
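
One way to check the moments above is to compute them directly from the exact binomial pmf. This is a minimal sketch, assuming the Xi are independent Bernoulli(0.74) variables so that their sum is B(80, 0.74); the names `pmf`, `mean_s`, `var_s`, `mean_t` and `var_t` are illustrative only.

```python
from math import comb

n, p = 80, 0.74

# Exact pmf of S = X1 + ... + X80, assumed to be B(80, 0.74)
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

# Mean and variance of S from the pmf
mean_s = sum(k * w for k, w in enumerate(pmf))
var_s = sum((k - mean_s) ** 2 * w for k, w in enumerate(pmf))

# T = Y1 + ... + Y80 = 7*S - 80, so T takes the value 7k - 80 with weight pmf[k]
mean_t = sum((7 * k - 80) * w for k, w in enumerate(pmf))
var_t = sum((7 * k - 80 - mean_t) ** 2 * w for k, w in enumerate(pmf))

print(mean_s, var_s)
print(mean_t, var_t)
```

Up to floating-point rounding, the printed values should match 59.2, 15.392, 334.4 and 754.208.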

Now suppose we want to calculate:
probability(Y1+Y2+...+Y80 >= 340)
=P( Z >= (339.5- 334.4)/sqrt(754.208) )
=P( Z >= 0.1857 )

On the other hand
probability(Y1+Y2+...+Y80 >= 340)
=P( 7(X1+X2+...+X80)-80 >= 340 )
=P( X1+X2+...+X80 >= 60 )
=P( Z >= (59.5-59.2)/sqrt(15.392) )
=P( Z >= 0.0765 )
which is completely different.
It seems the methods do not give consistent answers.

Now when we want to calculate
probability(Y1+Y2+...+Y80 >= 300)
=P( Z >= (299.5 - 334.4)/sqrt(754.208) )
=P( Z >= -1.2708 )

on the other hand
P(Y1+Y2+...+Y80 >= 300)
=P( 7(X1+X2+...+X80)-80 >= 300 )
=P( X1+X2+...+X80 >= 55 )
=P( Z >= (54.5-59.2)/sqrt(15.392) )
=P( Z >= -1.1980 )
which is quite close to the answer from method 1

So which method is the correct one?
I also don't understand the reasons for such a large discrepancy in the first calculation.
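
To see which of the two answers for the threshold 340 is closer to the truth, both can be compared with the exact binomial tail. This is a rough sketch, again assuming the sum of the Xi is B(80, 0.74); `normal_sf` and `exact_tail` are hypothetical helper names, and the normal tail is computed from the standard library's `erfc`.

```python
from math import comb, erfc, sqrt

def normal_sf(z):
    """P(Z >= z) for a standard normal Z."""
    return 0.5 * erfc(z / sqrt(2))

n, p = 80, 0.74
mu_s, var_s = n * p, n * p * (1 - p)      # 59.2, 15.392
mu_t, var_t = 7 * mu_s - 80, 49 * var_s   # 334.4, 754.208

def exact_tail(k_min):
    """Exact P(S >= k_min) for S ~ B(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_min, n + 1))

# P(Y1 + ... + Y80 >= 340) is the same event as P(S >= 60)
exact = exact_tail(60)

# Method 2 from the post: approximate S, continuity correction 0.5
via_s = normal_sf((59.5 - mu_s) / sqrt(var_s))

# Method 1 from the post: approximate the Y sum, continuity correction 0.5
via_t_half = normal_sf((339.5 - mu_t) / sqrt(var_t))

print(exact, via_s, via_t_half)
```

The approximation made on the scale of X1+...+X80 lands much nearer the exact tail than the one made on the scale of the Y sum with a 0.5 correction.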
• Jun 5th 2011, 05:42 AM
CaptainBlack
Quote:

Originally Posted by iamnobody917
Suppose X follows a binomial distribution with n = 80, p = 0.74, q = 0.26, mean = 59.2, variance = 15.392
then we may say that X1 + X2 + ... + X80 follows a normal distribution(approximately) with mean = 59.2, variance = 15.392

No. The sum of 80 of these RVs has mean 80*59.2, and variance 80*15.392

If X were a RV with distribution B(80,0.74) then its mean would be 59.2 and variance 15.392 and so X is approximately ~N(59.2,15.392)

CB
• Jun 5th 2011, 06:04 AM
iamnobody917
Quote:

Originally Posted by CaptainBlack
No. The sum of 80 of these RVs has mean 80*59.2, and variance 80*15.392

If X were a RV with distribution B(80,0.74) then its mean would be 59.2 and variance 15.392 and so X is approximately ~N(59.2,15.392)

CB

I have made a mistake in the question. It should be X1+X2+...+X80 ~ B(80, 0.74)
X1, X2, ..., X80 are the Bernoulli variables.
• Jun 5th 2011, 06:53 AM
CaptainBlack
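You are applying the wrong continuity correction when using the normal approximation for the distribution of Y1+Y2+...+Y80. That sum only takes the values 7k - 80, which are 7 apart, so the correction should be half that spacing: 3.5, not 0.5. With it, the first calculation becomes P( Z >= (336.5 - 334.4)/sqrt(754.208) ) = P( Z >= 0.0765 ), which agrees with the answer obtained from X1+X2+...+X80.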

CB
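
The effect of the 3.5 correction can be checked numerically. The sketch below is illustrative rather than definitive: it assumes the Y sum takes only the values 7k - 80, rounds each threshold up to the nearest attainable value, and then subtracts half the spacing. The function names `z_via_s` and `z_via_t` are made up for this example.

```python
from math import ceil, sqrt

mu_s, var_s = 59.2, 15.392                # moments of S = X1 + ... + X80
mu_t, var_t = 7 * mu_s - 80, 49 * var_s   # moments of T = 7*S - 80
step = 7                                  # spacing between attainable values of T

def z_via_s(threshold):
    """z-score for P(T >= threshold), computed on the scale of S."""
    k = ceil((threshold + 80) / 7)        # smallest k with 7k - 80 >= threshold
    return (k - 0.5 - mu_s) / sqrt(var_s)

def z_via_t(threshold):
    """z-score for P(T >= threshold), computed on the scale of T."""
    k = ceil((threshold + 80) / 7)
    t = 7 * k - 80                        # smallest attainable value >= threshold
    return (t - step / 2 - mu_t) / sqrt(var_t)

for threshold in (340, 300):
    print(threshold, round(z_via_s(threshold), 4), round(z_via_t(threshold), 4))
```

For the threshold 300 the smallest attainable value is 305, so the corrected cut-off is 301.5 rather than 299.5; with that change both routes give the same z-score for either threshold.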