# Thread: from linguistics, a problem of statistics..

1. ## from linguistics, a problem of statistics..

Hello everybody!

I am a linguist and I came across a linguistic problem connected with statistics. I've already solved it by "brute force", with a little program that computes the results empirically, but a mathematical approximation would be useful.

The problem can be expressed as follows:

Imagine you have 2 boxes.

In the first box you have the following elements : A, A, B, B, B , C
In the second one you have : A, A, A, B, C, C
Note: the elements within each box are not equiprobable (in box 1, for example, A has probability 2/6, B has 3/6 and C has 1/6).

Now take 2 elements from box 1, putting the first element back before drawing the second (so that the box still holds all its original elements) and keeping track of the order of extraction. Do the same with 2 elements from box 2.
Do that 10 times for box 1 and 5 times for box 2.
My question is: what is the probability of getting 1, 2, 3, ..., n matches between the 10 pairs of elements taken from box 1 and the 5 pairs taken from box 2?

I can approximate it experimentally, but I would like to find a formula to determine it. It seems to me that the result could be approximated with a Poisson distribution, but I don't understand how to calculate the lambda.

2. ## Re: from linguistics, a problem of statistics..

Originally Posted by Mathoman1
I am a linguist and I came across a linguistic problem connected with statistics.
The problem can be expressed as follows:
Imagine you have 2 boxes.
In the first box you have the following elements : A, A, B, B, B , C
In the second one you have : A, A, A, B, C, C
note: the elements in the two boxes are not equiprobable.
Now take 2 elements from box 1, putting the first element back before drawing the second (so that the box still holds all its original elements) and keeping track of the order of extraction. Do the same with 2 elements from box 2.
Do that 10 times for box 1 and 5 times for box 2.
My question is: what is the probability of getting 1, 2, 3, ..., n matches between the 10 pairs of elements taken from box 1 and the 5 pairs taken from box 2?
This is what I understand: one takes a letter from box I, notes its value, puts it back, and repeats. Thus we have a pair of letters.
Then one repeats that process ten times. Thus the outcome space is a set of ten pairs of letters.
Actually this is a simple binomial distribution.
$\displaystyle \mathcal{P}(AA)=\frac{4}{36}$: the probability of two A's.
$\displaystyle \mathcal{P}(BB)=\frac{9}{36}$: the probability of two B's
$\displaystyle \mathcal{P}(CC)=\frac{1}{36}$: the probability of two C's.

Thus the probability of getting two of a kind on any one turn is $\displaystyle \frac{4+9+1}{36}=\frac{14}{36}$.
Let $\displaystyle \mathcal{P}(X=n)$ stand for the probability of getting $\displaystyle n$ matching pairs in ten turns.
Then $\displaystyle \mathcal{P}(X=n)=\binom{10}{n}\left(\frac{14}{36} \right)^n \left(\frac{22}{36}\right)^{10-n}$.
That is for box I.

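For box II, change 10 to 5. The probability of two of a kind on one turn is again $\displaystyle \frac{9+1+4}{36}=\frac{14}{36}$.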
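For anyone who wants numbers rather than the formula, here is a minimal Python sketch of the binomial above. The function name `two_of_a_kind_pmf` is made up for illustration, and it only covers the "two of a kind within one box" reading of the question.

```python
from fractions import Fraction
from math import comb

def two_of_a_kind_pmf(turns, p=Fraction(14, 36)):
    """Return [P(X = 0), ..., P(X = turns)] for X ~ Binomial(turns, p)."""
    return [comb(turns, n) * p**n * (1 - p)**(turns - n)
            for n in range(turns + 1)]

box1_pmf = two_of_a_kind_pmf(10)  # ten turns with box I
box2_pmf = two_of_a_kind_pmf(5)   # five turns with box II

for n, prob in enumerate(box1_pmf):
    print(n, float(prob))
```

Exact fractions are used so that the probabilities sum to exactly 1; they are converted to floats only for printing.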

Did I read the question correctly?

3. ## Re: from linguistics, a problem of statistics..

Originally Posted by Plato

Did I read the question correctly?
eh... it is not so simple...

The pairs I can get from the two boxes are not only AA, BB, CC, but also the mixed ones, that is AB, BC, CA, etc.

And I am not interested in matches between pairs from the same box, but in matches between the pairs from box 1 and the pairs from box 2.

For example, the samples from box 1 could be:

AB, BC, BC, BB, CA, BA, BC, AA, BB, CB

and those from box 2:

BB, BC, CC, AC, AB

total number of matches: 5

box1 AB, AB = box 2 AB
box1 BC = box 2 BC
box1 BB, BB = box 2 BB

4. ## Re: from linguistics, a problem of statistics..

Originally Posted by Mathoman1
For example, the samples from box 1 could be:

AB, BC, BC, BB, CA, BA, BC, AA, BB, CB

and those from box 2:

BB, BC, CC, AC, AB

total number of matches: 5

box1 AB, AB = box 2 AB
box1 BC = box 2 BC
box1 BB, BB = box 2 BB
Well, it took me some time to understand where you were getting that 5.
I do not see a model for this right off.
There are $\displaystyle 9^{10}$ possible outcomes for box I.
There are $\displaystyle 9^{5}$ possible outcomes for box II.
I don't know how to count the possible matches.
Moreover, the matches do not have the same probabilities.

5. ## Re: from linguistics, a problem of statistics..

Originally Posted by Plato
Well, it took me some time to understand where you were getting that 5.
I made a mistake, the 5 matches are:

box1 AB = box 2 AB
box1 BC, BC = box 2 BC
box1 BB, BB = box 2 BB

It is BC that appears twice in box1, not AB.
Sorry.
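To pin down what "number of matches" could mean, here is a small Python sketch of two possible counting rules. Both are assumptions on my reading of the example, not something stated in the thread: rule (a) counts every box-1 pair whose ordered value also occurs among the box-2 pairs, and rule (b) counts every (box-1 pair, box-2 pair) combination that is equal. The two agree whenever no pair is repeated in the box-2 sample. The function names and the sample data are hypothetical.

```python
from collections import Counter

def count_matches(sample1, sample2):
    """Rule (a): number of pairs in sample1 that also occur in sample2."""
    seen_in_2 = set(sample2)
    return sum(1 for pair in sample1 if pair in seen_in_2)

def count_cross_matches(sample1, sample2):
    """Rule (b): number of equal (pair from sample1, pair from sample2) combinations."""
    c1, c2 = Counter(sample1), Counter(sample2)
    return sum(c1[pair] * c2[pair] for pair in c1)

# Hypothetical samples, chosen so that the two rules differ.
s1 = ["AB", "BC", "BC", "CA", "BB"]
s2 = ["BC", "AA", "AB", "BC"]

print(count_matches(s1, s2))        # 3: AB once, BC twice
print(count_cross_matches(s1, s2))  # 5: AB 1*1, BC 2*2
```

Which rule is intended changes the distribution, so it is worth fixing before looking for a formula.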

Originally Posted by Plato
I don't know how to count the possible matches.
Moreover, the matches do not have the same probabilities.
that is my problem..

But do you think it is possible to calculate them, or is it practically impossible, so that the problem can only be solved with a computer simulation?
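For reference, a simulation of this kind can be sketched in a few lines of Python. This is only a minimal sketch, not the program mentioned in the first post. It assumes that a "match" is a box-1 pair whose ordered value also occurs among the 5 box-2 pairs, and all names (`draw_pairs`, `count_matches`, `simulate`) are made up for illustration.

```python
import random
from collections import Counter

BOX1 = "AABBBC"
BOX2 = "AAABCC"

def draw_pairs(box, n_pairs, rng):
    """Draw n_pairs ordered pairs from box, with replacement."""
    return [rng.choice(box) + rng.choice(box) for _ in range(n_pairs)]

def count_matches(sample1, sample2):
    """Assumed rule: box-1 pairs that also occur among the box-2 pairs."""
    seen_in_2 = set(sample2)
    return sum(1 for pair in sample1 if pair in seen_in_2)

def simulate(trials=20_000, seed=0):
    """Estimate P(number of matches = k) for k = 0..10 by repeated sampling."""
    rng = random.Random(seed)
    tally = Counter()
    for _ in range(trials):
        s1 = draw_pairs(BOX1, 10, rng)
        s2 = draw_pairs(BOX2, 5, rng)
        tally[count_matches(s1, s2)] += 1
    return {k: tally[k] / trials for k in range(11)}

for k, p in simulate().items():
    print(k, round(p, 4))
```

The estimates vary with the seed and the number of trials, so they are only a rough check on any formula.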

6. ## Re: from linguistics, a problem of statistics..

$\displaystyle \begin{array}{c|c|c|c|c|c|c|c|c|c} & AA & AB & AC & BB & BA & BC & CC & CA & CB \\ \hline I & 4 & 6 & 2 & 9 & 6 & 3 & 1 & 2 & 3 \\ \hline II & 9 & 3 & 6 & 1 & 3 & 2 & 4 & 6 & 2 \end{array}$
From that table we can find the probability of each pair: each entry is the number of ways, out of 36, to draw that pair from the box.
$\displaystyle \mathcal{P}(AC)=\frac{2}{36}$ from box I
but $\displaystyle \mathcal{P}(AC)=\frac{6}{36}$ from box II.
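The table entries are just products of letter counts, so they can be rebuilt in Python. A minimal sketch follows; the function name `pair_probabilities` is made up for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def pair_probabilities(box):
    """Map each ordered pair to its probability when drawing twice with replacement."""
    counts = Counter(box)
    total = len(box) ** 2
    return {a + b: Fraction(counts[a] * counts[b], total)
            for a, b in product(sorted(counts), repeat=2)}

p1 = pair_probabilities("AABBBC")  # box I
p2 = pair_probabilities("AAABCC")  # box II

print(p1["AC"], p2["AC"])  # 1/18 1/6, that is 2/36 and 6/36
```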
I think a good programmer could write code to check all $\displaystyle 9^{10}\times 9^5$ possible outcomes. That is beyond anything I have ever done.
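A full enumeration may not be needed, depending on how matches are counted. Assume (this is my reading, not something the thread settles) that a match is a box-1 pair whose ordered value also occurs among the 5 box-2 pairs. Then, once the set of pair types in the box-2 sample is fixed, each of the 10 box-1 draws matches independently with the same probability, so the count is binomial given that set. Summing over the $\displaystyle 9^5$ box-2 outcomes gives the exact distribution as a mixture of binomials. A minimal Python sketch under that assumption, with hypothetical names:

```python
from collections import Counter, defaultdict
from fractions import Fraction
from itertools import product
from math import comb, prod

def pair_weights(box):
    """Ordered pair -> number of ways to draw it (out of len(box)**2)."""
    c = Counter(box)
    return {a + b: c[a] * c[b] for a, b in product(sorted(c), repeat=2)}

def exact_match_pmf(box1="AABBBC", box2="AAABCC", n1=10, n2=5):
    """Exact P(matches = k), k = 0..n1, under the assumed matching rule."""
    w1, w2 = pair_weights(box1), pair_weights(box2)
    total1, total2 = len(box1) ** 2, len(box2) ** 2

    # Weight of each possible set of pair types in the box-2 sample.
    set_weight = defaultdict(int)
    for outcome in product(w2, repeat=n2):
        set_weight[frozenset(outcome)] += prod(w2[x] for x in outcome)

    pmf = [Fraction(0)] * (n1 + 1)
    for types, weight in set_weight.items():
        p_set = Fraction(weight, total2 ** n2)            # P(box-2 types = this set)
        q = Fraction(sum(w1[x] for x in types), total1)   # P(one box-1 draw matches)
        for k in range(n1 + 1):
            pmf[k] += p_set * comb(n1, k) * q**k * (1 - q)**(n1 - k)
    return pmf

pmf = exact_match_pmf()
for k, p in enumerate(pmf):
    print(k, float(p))
```

With one pair from each box (`n1=1, n2=1`) this reduces to the sum over pairs of the products of the two table rows divided by $\displaystyle 36^2$, which is $\displaystyle \frac{121}{1296}$. A different counting rule would need a different calculation.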