1. ## Calculating average

Hi,

I am confused and hope someone can clarify the following situation. Let's say I have a die with 800 sides and I throw it 50 times. How many different outcomes should I expect? (An outcome is the side that lands face down; I am not sure an 800-sided die has a face up.) Since each outcome has the same chance of appearing, I would expect 50 different sides to appear once each when the die is thrown 50 times. I am probably wrong here, because when I run the simulation

Code:
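use strict;

my %dice_outcome;       # distinct sides seen in a trial => number of trials
my %dice_outcome_freq;  # throws per distinct side       => number of trials

for (1 .. 500000) {
    # One trial: throw the 800-sided die 50 times and count each side.
    my %shash;
    for (my $i = 0; $i < 50; $i++) {
        $shash{ int(rand(800) + 1) }++;
    }
    my $types_cnt = keys %shash;   # number of distinct sides in this trial
    my $freq_cnt  = 0;             # total number of throws (always 50)
    foreach my $key (keys %shash) {
        $freq_cnt += $shash{$key};
    }
    $dice_outcome_freq{ $freq_cnt / $types_cnt }++;
    $dice_outcome{$types_cnt}++;
}

# The last column divides by 50000 although 500000 trials are run,
# so it is 10 times the relative frequency.
foreach my $key (sort { $a <=> $b } keys %dice_outcome) {
    print "$key\t$dice_outcome{$key}\t" . ($dice_outcome{$key} / 50000) . "\n";
}
print "\n\n";
foreach my $key (sort { $a <=> $b } keys %dice_outcome_freq) {
    print "$key\t$dice_outcome_freq{$key}\t" . ($dice_outcome_freq{$key} / 50000) . "\n";
}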

Result:

Distribution of different outcomes:
distinct sides    trials    trials/50000
40    1    2e-05
41    3    6e-05
42    18    0.00036
43    184    0.00368
44    1178    0.02356
45    5715    0.1143
46    22052    0.44104
47    64227    1.28454
48    131513    2.63026
49    170835    3.4167
50    104274    2.08548

Distribution of frequencies of different outcomes:
throws per distinct side    trials    trials/50000
1    104274    2.08548
1.02040816326531    170835    3.4167
1.04166666666667    131513    2.63026
1.06382978723404    64227    1.28454
1.08695652173913    22052    0.44104
1.11111111111111    5715    0.1143
1.13636363636364    1178    0.02356
1.16279069767442    184    0.00368
1.19047619047619    18    0.00036
1.21951219512195    3    6e-05
1.25    1    2e-05
I do not get this uniform distribution. Is this because of the central limit theorem? And how would I calculate the average number of outcomes without the simulation, given only the input values:

Number of possible outcomes: 800
Number of throws: 50
----------------------------------------------------
What is the average number of different outcomes?
What is the average frequency per different outcome (number of throws divided by the number of different outcomes)?

Thank you.

3. ## Re: Calculating average

Hm, yes, that is true, but I am looking for the number of different outcome types. Let me scale the problem down to a die with 6 outcomes. Let's say I want to know the average number of different outcomes given that I throw the die 4 times. So I have different possibilities:

I can get:

1,2,3,4 -> $$\binom{6}{4} \times 4!$$ -> the number of ways I can get 4 different permuted outcomes
1,1,2,3 -> $$\binom{6}{3} \times 3 \times \binom{4}{2} \times 2!$$ (the factor 3 picks which of the three values is repeated)
1,1,2,2 -> $$\binom{6}{2} \times \binom{4}{2} \times 1!$$
1,1,1,2 -> $$\binom{6}{2} \times 2 \times \binom{4}{3} \times 1!$$ (the factor 2 picks which of the two values is repeated)

So what I am actually looking for is a general expression for the above series. Notice that when you allow larger alphabets and larger tuples, there is a higher number of subtuples that need to be considered.

(Or maybe you are giving me the solution but I am not seeing it.)
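
Counts like these can be checked by brute force, since a 6-sided die thrown 4 times has only 6^4 = 1296 ordered results. The following is a minimal Perl sketch, not anything posted in the thread; the helper name `distinct_counts` is made up for illustration. It walks through every ordered result and groups them by the number of distinct faces.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): enumerate every ordered
# sequence of $throws results from a die with $sides faces and tally
# how many sequences show exactly m distinct faces.
sub distinct_counts {
    my ($sides, $throws) = @_;
    my %tally;
    my $total = $sides ** $throws;
    for my $code (0 .. $total - 1) {
        my %seen;
        my $rest = $code;
        for (1 .. $throws) {
            $seen{ $rest % $sides } = 1;   # one base-$sides digit per throw
            $rest = int($rest / $sides);
        }
        $tally{ scalar(keys %seen) }++;
    }
    return \%tally;
}

my $tally = distinct_counts(6, 4);
my ($total, $weighted) = (0, 0);
for my $m (sort { $a <=> $b } keys %$tally) {
    print "$m distinct: $tally->{$m}\n";
    $total    += $tally->{$m};
    $weighted += $m * $tally->{$m};
}
print "total: $total, average distinct: ", $weighted / $total, "\n";
```

This reports 6, 210, 720 and 360 sequences for 1, 2, 3 and 4 distinct faces, so the average is 4026/1296, about 3.106. The per-pattern counts have to add up to these totals: the pattern 1,1,2,3 is the only way to see three distinct faces and must account for all 720, while 1,1,2,2 and 1,1,1,2 together must give 210. The all-equal pattern 1,1,1,1 contributes the remaining 6.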

4. ## Re: Calculating average

You have an alphabet of N symbols each with a probability of 1/N of occurring.

You want to make K independent selections where in general K < N, and usually K << N

I guess you want to find the expectation of the number of unique symbols, say M, that appear in this selection.

Let's take a look.

For M=1, i.e. you choose the same symbol K times,

$$Pr[K]=\left(\frac{1}{N}\right)^K$$

and there are N ways of doing this so the overall probability is

$$N\left(\frac{1}{N}\right)^K=\left(\frac{1}{N} \right)^{K-1}$$

for M=2, i.e. symbol1 appears L times, and symbol2 appears K-L times

$$Pr[L,K-L]=\left(\frac{1}{N} \right)^L\left(\frac{1}{N} \right)^{K-L}$$

and this can happen N(N-1) different ways

And so forth: at M=Q you have to account for all the various ways the K selections can be split among Q unique symbols, and then multiply that by the number of ways you can choose Q distinct symbols from N.
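
Carrying this through leads to a standard closed form, stated here without a full derivation. The symbol $$S(K,Q)$$ is an addition for this note and is not used elsewhere in the thread: it is the Stirling number of the second kind, the number of ways to split the K selections into Q non-empty groups.

$$\Pr[M=Q]=\binom{N}{Q}\,Q!\,S(K,Q)\left(\frac{1}{N}\right)^{K}$$

For Q=1, $$S(K,1)=1$$ and this reduces to $$N\left(\frac{1}{N}\right)^K$$, matching the M=1 case. For Q=2, $$S(K,2)=2^{K-1}-1$$, which gives $$N(N-1)\left(2^{K-1}-1\right)\left(\frac{1}{N}\right)^K$$. If only the expectation is needed, there is a shortcut: each symbol is missed by all K selections with probability $$\left(1-\frac{1}{N}\right)^K$$, so by linearity of expectation

$$E[M]=N\left(1-\left(1-\frac{1}{N}\right)^{K}\right)$$

which for N=800 and K=50 comes to roughly 48.5.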

I'm pretty sure this is all just the multinomial distribution like I gave you a link for.
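
If a closed form is not needed, the same step-by-step counting can be done numerically. The Perl sketch below is only an illustration and the name `distinct_distribution` is hypothetical, not from the thread. It tracks the probability of having seen m distinct symbols after each selection, assuming the next selection repeats a seen symbol with probability m/N and is a new one with probability (N-m)/N.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): exact distribution of the
# number of distinct symbols M after $k uniform draws from $n symbols.
sub distinct_distribution {
    my ($n, $k) = @_;
    my @p = (1);                      # before any draw: M = 0 with probability 1
    for my $draw (1 .. $k) {
        my @next = (0) x ($draw + 1);
        for my $m (0 .. $#p) {
            next unless $p[$m];
            $next[$m]     += $p[$m] * $m / $n;          # repeat of a seen symbol
            $next[$m + 1] += $p[$m] * ($n - $m) / $n;   # previously unseen symbol
        }
        @p = @next;
    }
    return @p;                        # $p[$m] = Pr[M = $m]
}

my ($n, $k) = (800, 50);
my @p = distinct_distribution($n, $k);

my ($mean, $mean_freq) = (0, 0);
for my $m (1 .. $#p) {
    $mean      += $m * $p[$m];
    $mean_freq += ($k / $m) * $p[$m];   # average throws per distinct symbol
}
printf "E[M]        = %.4f\n", $mean;
printf "closed form = %.4f\n", $n * (1 - (1 - 1 / $n) ** $k);
printf "E[K/M]      = %.4f\n", $mean_freq;
printf "Pr[M = %d]  = %.4f\n", $k, $p[$k];
```

For N=800 and K=50 this gives an expected count of about 48.5 distinct symbols and a probability of about 0.21 that all 50 selections are different, which is in line with the simulated table in the first post. The "closed form" line uses N(1-(1-1/N)^K), which follows from each symbol being missed by all K selections with probability (1-1/N)^K.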