# Thread: [SOLVED] Geometric law estimated with the central limit theorem - Help :)

1. ## [SOLVED] Geometric law estimated with the central limit theorem - Help :)

Here is the problem:

We have a random sample of 48 discrete random variables X1...X48, independent and identically distributed, picked from a population at random with replacement (meaning that I could possibly pick the same one several times; not sure if it's the right word, I'm French xD). We know that the Xi follow a geometric law with parameter 1/4. Using the central limit theorem, estimate the probability that at least 20 of the 48 random variables are less than 3.

Here is my attempt at a solution:
X -> G(1/4)
p = 1/4

We know that for a geometric law E(X) = 1/p = 4 and that Var(X) = (1-p)/p^2 = 12
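
As a quick sanity check on those two moments, here is a minimal sketch that sums the series numerically. It assumes the geometric law is supported on 1, 2, 3, ... (trials up to and including the first success), which is the convention under which E(X) = 1/p. The truncation point `N_TERMS` is an arbitrary choice made for illustration.

```python
# Numerical check of E(X) = 1/p and Var(X) = (1-p)/p^2 for a geometric law.
# Assumption: support is 1, 2, 3, ... with P(X = k) = (1-p)^(k-1) * p.
p = 0.25
q = 1 - p
N_TERMS = 2000  # arbitrary truncation; the tail beyond this is negligible

# Pair each value k with its probability.
pmf = [(k, q ** (k - 1) * p) for k in range(1, N_TERMS + 1)]

mean = sum(k * w for k, w in pmf)
second_moment = sum(k * k * w for k, w in pmf)
variance = second_moment - mean ** 2

print(f"mean     = {mean:.6f}  (formula gives {1 / p})")
print(f"variance = {variance:.6f}  (formula gives {q / p ** 2})")
```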

The central limit theorem is as follows: [X - E(X)]/sqrt(Var(X)) -> N(0;1)
so in my case I have (X - 4)/sqrt(12)

I seek P(X<3)
P(X<3) = P( (X-4)/sqrt(12) < (3-4)/sqrt(12) ) = P(Z < -0,29) = *** HERE IS MY PROBLEM *** I don't know how to interpret what "at least 20 of the 48 random variables" implies...

Should it be P(Z < -0,29) = 20/48 - 0,1141 (from the normal distribution table value for 0,29)?
Or should it be something else???

2. This seems to be a Binomial problem, where you want
P(X>=20). Here X is bin(48,p)
The p is the probability that a geometric random variable is less than 3.
NOW there are two ways of defining a geo: either we count the trial on which the first success appears, or we don't count that trial. So, I need to know which geo distribution you are using.
Once you obtain the p, just do the usual normal approximation to the binomial AND you should start at 19.5,
using the correction factor.
z=(19.5-np)/(npq)^(1/2)

3. In the exercise we absolutely have to use an estimation with the central limit theorem.
Also, I'm not sure, but I think that in the problem we want to pick 20 out of the 48 random variables, where each variable is Xi -> G(1/4).

We don't count any trial in our geometric law as far as I know. We look for x_i, the number of trials until we get a success.

4. I am using the CLT approximation to the binomial.
The geo can be set up two different ways, as in the neg binomial.

5. Yea well I meant, we absolutely need to use the geometric law with the central limit theorem (no binomial).

We don't count any trial in our geometric law as far as I know. We look for x_i, the number of trials needed to get 1 success.

6. I am using a geo
p=P(geo<3)
then use the normal approximation to the binomial
with n=48 and p=P(geo WITH parameter 1/4 is less than 3).
These are different p's.

I feel (and might be) dumb, but I don't understand.

8. I need to go to bed.
The question....
probability that at least 20 of the 48 random variables be less than 3
IS a binomial question
where a 'success' is for something to be less than 3.
HENCE p=P(that ONE geo is less than 3)
GET it, compute that event.
Then apply the normal approximation to the binomial
with n=48 and this p from your geo distribution.
You want
P(Binomial(n=48,p)>=20)
YOU approximate it with
P(Z>(19.5-48p)/(48p(1-p))^(1/2))
BUT you need to give me your geo distribution.
IN some books the geo starts at 0, in others it starts at 1.
It depends on how you define the r.v.
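
To make the "starts at 0 or starts at 1" point concrete, here is a minimal sketch of how the binomial success probability p = P(one geo < 3) changes with the convention. The helper name `prob_geo_less_than_3` and its `start` argument are made up for this illustration; exact fractions are used so the two values can be compared without rounding.

```python
from fractions import Fraction


def prob_geo_less_than_3(p, start):
    """P(X < 3) for a geometric law with success probability p.

    start=1: X counts trials up to and including the first success,
             P(X = k) = (1-p)^(k-1) * p for k = 1, 2, ...
    start=0: X counts failures before the first success,
             P(X = k) = (1-p)^k * p for k = 0, 1, ...
    """
    q = 1 - p
    if start == 1:
        return sum(q ** (k - 1) * p for k in (1, 2))
    if start == 0:
        return sum(q ** k * p for k in (0, 1, 2))
    raise ValueError("start must be 0 or 1")


p_geo = Fraction(1, 4)
p_from_1 = prob_geo_less_than_3(p_geo, start=1)
p_from_0 = prob_geo_less_than_3(p_geo, start=0)
print("geo starting at 1:", p_from_1)
print("geo starting at 0:", p_from_0)
```

The two conventions give different values of p, so the z-score in the normal approximation differs too; that is why the definition has to be pinned down first.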

9. Ok it starts to make sense

For X -> G(P)
F(x_i)=P(X=x_i)=(1-p)^(x_i-1)p=pq^(x_i-1)

That's what I have for my geo, hope it answers your question. If not, suppose it starts at 1.

10. Originally Posted by Lok
Ok it starts to make sense

For X -> G(P)
P(X=x_i)=(1-p)^(x_i-1)p=pq^(x_i-1)

That's what I have for my geo, hope it answers your question. If not, suppose it starts at 1.
The capital F is incorrect, it's not a cdf.
However, you are counting the trial in which the first success appears.
SO the p FOR the binomial is just
P(X=1)+P(X=2), where the p you insert into the geo is the geo's own parameter, 1/4, which is a different p from the binomial's.
THUS the p for the binomial is
p=1/4 + (1/4)(3/4) = 7/16.
NOW plug that into
P(Z>(19.5-48p)/(48p(1-p))^(1/2)).
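
The plug-in step can be sketched as follows. This is only an illustration, assuming the geo starts at 1 as above; the standard normal CDF is built from `math.erf` instead of a printed table, and the variable names are arbitrary.

```python
import math


def normal_cdf(z):
    """Standard normal CDF, Phi(z), written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


n = 48
p = 1 / 4 + (1 / 4) * (3 / 4)        # P(X=1) + P(X=2) for the geo with parameter 1/4
mu = n * p                           # binomial mean, np
sigma = math.sqrt(n * p * (1 - p))   # binomial standard deviation, sqrt(npq)

z = (19.5 - mu) / sigma              # continuity-corrected cutoff for ">= 20"
prob = 1.0 - normal_cdf(z)           # P(Z > z)

print(f"p = {p}, np = {mu}, npq = {sigma ** 2}")
print(f"z = {z:.6f}")
print(f"P(Z > z) = {prob:.4f}")
```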

By the way, my students messed up a similar problem on last week's quiz EVEN after one student put it on the board.
In that case n=100 and the p was the probability that an exponential rv was greater than its mean of 10.
And the question there was what was the probability that at least half, 50, of the 100 .....
So we needed P(BIN(100,p)>=50)
and approximated it with
P(Z> (49.5-np)/(npq)^(1/2))
where n=100 and the p came from a different distribution, the exponential.
IN your case the p came from a geo.
BUT I needed to know how your geo was defined.
It varies from book to book, likewise so does the neg binomial.
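
The quiz version follows the same recipe, so here is a hedged sketch of it. The only fact used beyond the post is that an exponential rv with mean m satisfies P(X > m) = e^(-1); everything else (names, use of `math.erf` for the normal CDF) is an illustrative choice.

```python
import math


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


n = 100
mean_exp = 10.0
# P(X > mean) for an exponential with that mean: exp(-mean/mean) = exp(-1).
p = math.exp(-mean_exp / mean_exp)
q = 1.0 - p

# "At least 50" becomes "> 49.5" once the continuity correction is applied.
z = (49.5 - n * p) / math.sqrt(n * p * q)
prob = 1.0 - normal_cdf(z)

print(f"p = {p:.4f}")
print(f"z = {z:.4f}")
print(f"P(Bin(100, p) >= 50) is approximately {prob:.5f}")
```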

11. I get, when using the correction factor (19.5)
P(Z>-.43643578) which you can lok (pun) up online.

12. Yea you are right the capital F was incorrect, I just brainlessly capitalized the first 'letter' of the sentence.

The 19.5, where does it come from? Is it 20 (from the 20 out of 48?) - 0.5 (??)

Ok, so I did the maths it gives

(Y-21)/(189/16)^(1/2) -> N(0;1)
I use Y = 19.5 as you said.
I get P(Z > -0.44). When I look in my normal table, that gives
0.17+0.50=0.67

Is a 0.67 probability right in this case? Hope so

I'll thank you properly tomorrow
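
One way to check whether 0.67 is reasonable is to compare it with the exact binomial tail. The sketch below assumes p = 7/16 (geo starting at 1) and computes the exact sum with `math.comb`, next to the normal approximation with and without the 0.5 correction. The function names are made up for the example.

```python
import math


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def binom_tail(n, p, k):
    """Exact P(Bin(n, p) >= k), summed term by term."""
    return sum(math.comb(n, j) * p ** j * (1 - p) ** (n - j)
               for j in range(k, n + 1))


n, p, k = 48, 7 / 16, 20
mu = n * p
sigma = math.sqrt(n * p * (1 - p))

exact = binom_tail(n, p, k)
approx_corrected = 1.0 - normal_cdf((k - 0.5 - mu) / sigma)  # cutoff at 19.5
approx_plain = 1.0 - normal_cdf((k - mu) / sigma)            # cutoff at 20

print(f"exact binomial tail     : {exact:.4f}")
print(f"normal, with correction : {approx_corrected:.4f}")
print(f"normal, no correction   : {approx_plain:.4f}")
```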

13. The continuity correction helps in the approximation.
Here's a pretty picture of why you need it.
Continuity correction factor
The point is you are placing a normal density, which is continuous,
over a discrete distribution.
The binomial rv only takes on integer values, so you want to split the difference between the sets of values you are considering and the remaining values. In your case those two sets are
{20,21,22,...,48} and {0,1,...,19}.

A good example is the approximation of
P(X=20), where X is a Bin(48, .4)
The correct value is (48 choose 20) (.4)^20 (.6)^28.
If you want to approximate it via the CLT, you need the +-.5.
Otherwise P(Normal=20)=0.
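
That last example can be sketched numerically. This is only an illustration: the exact value uses `math.comb`, and the CLT version integrates the matching normal over the interval from 19.5 to 20.5; the variable names are arbitrary.

```python
import math


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


n, p, k = 48, 0.4, 20
mu = n * p
sigma = math.sqrt(n * p * (1 - p))

# Exact point probability: (48 choose 20) (.4)^20 (.6)^28.
exact = math.comb(n, k) * p ** k * (1 - p) ** (n - k)

# CLT approximation: normal mass between k - 0.5 and k + 0.5.
approx = normal_cdf((k + 0.5 - mu) / sigma) - normal_cdf((k - 0.5 - mu) / sigma)

# Without the +-.5 the interval has zero width, so the normal mass is 0.
no_correction = normal_cdf((k - mu) / sigma) - normal_cdf((k - mu) / sigma)

print(f"exact P(X = 20)        : {exact:.4f}")
print(f"normal on [19.5, 20.5] : {approx:.4f}")
print(f"normal at the point 20 : {no_correction}")
```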