# Normal approximation to binomial distribution

• Jun 5th 2011, 05:48 AM
iamnobody917
Suppose X1 + X2 + ... + X80 follows a binomial distribution with n = 80, p = 0.74, q = 0.26, mean = 59.2 and variance = 15.392.
Then we may say that X1 + X2 + ... + X80 approximately follows a normal distribution with mean = 59.2 and variance = 15.392.

Yi is another random variable such that Yi = 7Xi - 1 for i=1, 2, ..., 80
Y1 + Y2 + ... + Y80 = 7(X1 + X2 + ... + X80) - 80
so Mean(Y1 + Y2 + ... + Y80) = 7 Mean(X1 + X2 + ... + X80) - 80 = 7*59.2 - 80 = 334.4
and Var(Y1 + Y2 + ... + Y80) = 49 Var(X1 + X2 + ... + X80) = 49 * 15.392 = 754.208,
so we may also say that Y1 + Y2 + ... + Y80 approximately follows a normal distribution with mean = 334.4 and variance = 754.208.
Is this correct up to this point?
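As a check on the mean and variance of Y1 + ... + Y80, both can be computed exactly from the B(80, 0.74) probabilities instead of from the linear-transformation rules. This is a minimal Python sketch using only the standard library; the names `pmf`, `mean_T` and `var_T` are illustrative, with T standing for Y1 + ... + Y80.

```python
from math import comb

n, p = 80, 0.74

# Exact probabilities P(S = k) for S = X1 + ... + X80 ~ B(80, 0.74)
pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

# T = Y1 + ... + Y80 = 7S - 80, so its moments are sums over the same pmf
mean_T = sum((7 * k - 80) * w for k, w in enumerate(pmf))
# Centered form of the variance, to avoid cancellation in E[T^2] - E[T]^2
var_T = sum((7 * k - 80 - mean_T) ** 2 * w for k, w in enumerate(pmf))

print(f"mean = {mean_T:.3f}, variance = {var_T:.3f}")
```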

Now suppose we want to calculate:
P(Y1+Y2+...+Y80 >= 340)
=P( Z >= (339.5- 334.4)/sqrt(754.208) )
=P( Z >= 0.1857 )

On the other hand
P(Y1+Y2+...+Y80 >= 340)
=P( 7(X1+X2+...+X80)-80 >= 340 )
=P( X1+X2+...+X80 >= 60 )
=P( Z >= (59.5-59.2)/sqrt(15.392) )
=P( Z >= 0.0765 )
which is completely different.
It seems the methods do not give consistent answers.
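The two z-values above can be reproduced side by side and turned into tail probabilities, which shows how far apart the two answers actually are. A minimal sketch in Python; `upper_tail` is a hypothetical helper built on `math.erf`, S stands for X1 + ... + X80 and T for Y1 + ... + Y80.

```python
from math import erf, sqrt


def upper_tail(z):
    """P(Z >= z) for a standard normal Z, computed with the error function."""
    return 0.5 * (1 - erf(z / sqrt(2)))


mu_S, var_S = 59.2, 15.392                # S = X1 + ... + X80
mu_T, var_T = 7 * mu_S - 80, 49 * var_S   # T = Y1 + ... + Y80 = 7S - 80

z_T = (339.5 - mu_T) / sqrt(var_T)  # first method: correction of 0.5 on the T scale
z_S = (59.5 - mu_S) / sqrt(var_S)   # second method: correction of 0.5 on the S scale

print(f"z_T = {z_T:.4f}, P = {upper_tail(z_T):.4f}")
print(f"z_S = {z_S:.4f}, P = {upper_tail(z_S):.4f}")
```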

Now suppose we want to calculate
P(Y1+Y2+...+Y80 >= 300)
=P( Z >= (299.5 - 334.4)/sqrt(754.208) )
=P( Z >= -1.2708 )

On the other hand
P(Y1+Y2+...+Y80 >= 300)
=P( 7(X1+X2+...+X80)-80 >= 300 )
=P( X1+X2+...+X80 >= 55 )
=P( Z >= (54.5-59.2)/sqrt(15.392) )
=P( Z >= -1.1980 )
which is quite close to the answer from the first method.
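For the second calculation, the integer threshold for X1 + ... + X80 comes from rounding (300 + 80)/7 = 54.29 up to 55, and the standardization divides by the standard deviation sqrt(15.392), not by the variance. A short Python sketch of both steps, using the same S/T naming as an assumption:

```python
from math import sqrt

mu_S, var_S = 59.2, 15.392    # S = X1 + ... + X80
mu_T, var_T = 334.4, 754.208  # T = Y1 + ... + Y80 = 7S - 80

t = 300
# Ceiling of (t + 80)/7 in integer arithmetic: smallest integer s with 7s - 80 >= t
s_min = (t + 80 + 6) // 7

z_T = (t - 0.5 - mu_T) / sqrt(var_T)      # first method
z_S = (s_min - 0.5 - mu_S) / sqrt(var_S)  # second method, dividing by the standard deviation

print(s_min, round(z_T, 4), round(z_S, 4))
```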

So which method is the correct one?
I also don't understand the reasons for such a large discrepancy in the first calculation.
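One way to decide between the two methods is to compare each against the exact binomial tail P(X1 + ... + X80 >= s), which can be summed directly. The sketch below also tries one more variant, as an assumption worth testing: since T = Y1 + ... + Y80 = 7S - 80 only takes values 7 apart, a continuity correction of half a step on the T scale would be 3.5 rather than 0.5. Function and variable names are illustrative.

```python
from math import comb, erf, sqrt

n, p = 80, 0.74
mu_S, sd_S = n * p, sqrt(n * p * (1 - p))  # S = X1 + ... + X80 ~ B(80, 0.74)
mu_T, sd_T = 7 * mu_S - 80, 7 * sd_S       # T = Y1 + ... + Y80 = 7S - 80


def upper_tail(z):
    """P(Z >= z) for a standard normal Z."""
    return 0.5 * (1 - erf(z / sqrt(2)))


def exact_tail(s_min):
    """Exact P(S >= s_min), summed from the binomial pmf."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(s_min, n + 1))


results = {}
for t in (340, 300):
    s_min = (t + 80 + 6) // 7   # smallest integer s with 7s - 80 >= t
    t_min = 7 * s_min - 80      # smallest value T can actually take that is >= t
    results[t] = {
        "exact": exact_tail(s_min),
        "T scale, correction 0.5": upper_tail((t - 0.5 - mu_T) / sd_T),
        "S scale, correction 0.5": upper_tail((s_min - 0.5 - mu_S) / sd_S),
        "T scale, correction 3.5": upper_tail((t_min - 3.5 - mu_T) / sd_T),
    }
    print(t, {name: round(prob, 4) for name, prob in results[t].items()})
```

With a correction of 3.5, the T-scale z-value is algebraically the same as the S-scale one, because (7s - 80 - 3.5 - (7*59.2 - 80)) / (7*sqrt(15.392)) = (s - 0.5 - 59.2) / sqrt(15.392).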
• Jun 5th 2011, 06:42 AM
CaptainBlack
Quote:

Originally Posted by iamnobody917
Suppose X follows a binomial distribution with n = 80, p = 0.74, q = 0.26, mean = 59.2, variance = 15.392
then we may say that X1 + X2 + ... + X80 follows a normal distribution(approximately) with mean = 59.2, variance = 15.392

No. The sum of 80 independent copies of X has mean 80*59.2 and variance 80*15.392.

If X were a RV with distribution B(80, 0.74), then its mean would be 59.2 and its variance 15.392, so X would be approximately N(59.2, 15.392).

CB
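The rule being used here is that means and variances of independent random variables add. A minimal Python sketch, assuming the 80 variables are independent copies of a B(80, 0.74) variable; their sum is then B(6400, 0.74), which gives the same moments.

```python
n, p, copies = 80, 0.74, 80

# Mean and variance of a single B(80, 0.74) variable
mean_one, var_one = n * p, n * p * (1 - p)

# For independent variables, means add and variances add
mean_sum, var_sum = copies * mean_one, copies * var_one

# Cross-check: the sum of the independent copies is B(80 * 80, 0.74)
N = copies * n
assert abs(mean_sum - N * p) < 1e-9 and abs(var_sum - N * p * (1 - p)) < 1e-9

print(f"mean = {mean_sum:.2f}, variance = {var_sum:.2f}")
```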
• Jun 5th 2011, 07:04 AM
iamnobody917
Quote:

Originally Posted by CaptainBlack
No. The sum of 80 of these RVs has mean 80*59.2, and variance 80*15.392

I made a mistake in the question. It should be X1 + X2 + ... + X80 ~ B(80, 0.74), where X1, X2, ..., X80 are Bernoulli variables.
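With the Xi read as independent Bernoulli(0.74) variables, the moments of Y1 + ... + Y80 can also be built up from a single Yi = 7Xi - 1, which takes the value 6 or -1. A small Python sketch under that independence assumption:

```python
n, p = 80, 0.74

# Each Xi is Bernoulli(p): mean p, variance p(1 - p)
mean_X, var_X = p, p * (1 - p)

# Yi = 7Xi - 1 is 6 when Xi = 1 and -1 when Xi = 0
mean_Y = 7 * mean_X - 1
var_Y = 7**2 * var_X

# Summing n independent copies multiplies both moments by n
mean_T, var_T = n * mean_Y, n * var_Y
print(f"mean = {mean_T:.3f}, variance = {var_T:.3f}")
```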
• Jun 5th 2011, 07:53 AM
CaptainBlack
Quote:

Originally Posted by iamnobody917
Suppose X1+X2+...+X80 follows a binomial distribution with n = 80, p = 0.74, q = 0.26, mean = 59.2, variance = 15.392