# Thread: Why is this distribution non-uniform?

1. ## Why is this distribution non-uniform?

So, here is the situation. I'm taking a string of 5 integer digits and using a formula to convert that string of digits into a real-valued number.

The basic formulation is this:

$\text{String} = abcde$

$\text{Real} = (a+0.1\times b + 0.01 \times c + 0.001 \times d) \times 10^{\frac{e}{2}-2}$

So for example...

$\text{String} = 56796$

$\text{Real} = (5+0.1\times 6 + 0.01 \times 7 + 0.001 \times 9) \times 10^{\frac{6}{2}-2} = 5.679 \times 10^{1} = 56.79$
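The formula translates directly into a few lines of Python. This is only a sketch for checking the arithmetic; the function name `digits_to_real` is made up for illustration and is not part of the original setup.

```python
def digits_to_real(s: str) -> float:
    """Convert a 5-digit string 'abcde' to a real using the formula above."""
    a, b, c, d, e = (int(ch) for ch in s)
    mantissa = a + 0.1 * b + 0.01 * c + 0.001 * d   # a.bcd
    return mantissa * 10 ** (e / 2 - 2)             # scale by 10^(e/2 - 2)


print(f"{digits_to_real('56796'):.2f}")  # 56.79, the worked example
print(f"{digits_to_real('99999'):.2f}")  # 3161.96, the largest possible value
print(f"{digits_to_real('00000'):.2f}")  # 0.00, the smallest possible value
```

Note that the last digit only ever enters through the exponent, so it sets the order of magnitude rather than adding to the value.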

Now, if I set up 10,000 such strings, and populate each string with 5 randomly (and uniformly) selected integers in the interval $[0,9]$, then I would expect to get a nice uniform distribution of reals on the interval $[0, 3161.96]$, with the two bounds corresponding to $00000$ and $99999$ respectively.

However, what I'm finding is not a uniform distribution at all. The distribution I am finding is the one that is attached as a histogram, with small numbers receiving a far larger share of the selection.

Can anybody explain why this is not showing a uniform distribution, and what can be done to make it so?
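For anyone wanting to reproduce the experiment, here is a minimal Python sketch of it. The names `digits_to_real` and `simulate`, the fixed seed, and the cut-offs of 100 and 1000 are assumptions chosen for illustration; the original code was not posted.

```python
import random


def digits_to_real(a, b, c, d, e):
    """Apply the formula from the post to five digits."""
    return (a + 0.1 * b + 0.01 * c + 0.001 * d) * 10 ** (e / 2 - 2)


def simulate(n=10_000, seed=0):
    """Draw n strings of five independent uniform digits and convert each."""
    rng = random.Random(seed)  # seeded only so the run is repeatable
    return [
        digits_to_real(*(rng.randint(0, 9) for _ in range(5)))
        for _ in range(n)
    ]


values = simulate()
below_100 = sum(v < 100 for v in values) / len(values)
above_1000 = sum(v > 1000 for v in values) / len(values)

print(f"share of values below 100:  {below_100:.3f}")
print(f"share of values above 1000: {above_1000:.3f}")
```

If the reals were uniform on the full range, about 3% of them would fall below 100 and about 68% above 1000, so comparing the two printed shares against those figures shows the skew without needing a plot.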

2. Let's take the best case scenario: a,b,c,d, and e are all 9.

Substituting these numbers in gives

$9.999 \times 10^{5/2} \approx 3161.96$

and this is your maximum value.

Now, if we take it down one notch: a, b, c, d, and e are all 8. The value you get is $8.888 \times 10^{2} = 888.8$. Big step down, huh? And remember, the probability of getting all of them to be 8 is relatively low. This makes it almost inevitable that most of the values will be concentrated in the lower region of the graph.

3. Could you plot a graph of frequency vs. lg(Number), i.e. against $\log_{10}$ of the value?
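One way to get that picture without a plotting library is to go through all 100,000 possible strings and bin each value by its decade, i.e. by the integer part of lg(Number). This is a rough sketch under a few assumptions: the mantissa is built as an integer number of thousandths (algebraically the same as the formula in post 1), the ten strings with a = b = c = d = 0 are skipped because lg(0) is undefined, and the text bars are just a stand-in for a real histogram.

```python
import math
from collections import Counter
from itertools import product

counts = Counter()
for a, b, c, d, e in product(range(10), repeat=5):
    m = 1000 * a + 100 * b + 10 * c + d        # mantissa a.bcd in thousandths
    if m == 0:
        continue                               # Real = 0 has no logarithm
    value = (m / 1000) * 10 ** (e / 2 - 2)
    counts[math.floor(math.log10(value))] += 1 # decade that the value falls in

# One row per decade; each '#' stands for 1,000 strings.
for decade in sorted(counts):
    bar = "#" * (counts[decade] // 1000)
    print(f"10^{decade:>2} to 10^{decade + 1:>2}: {counts[decade]:>6} {bar}")
```

Because the last digit acts on the exponent, each decade on the log axis is fed by a similar number of strings, while on the linear axis the top decade alone covers most of the range.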

4. Originally Posted by rtblue
Let's take the best case scenario: a,b,c,d, and e are all 9.

Substituting these numbers in gives

$9.999 \times 10^{5/2} \approx 3161.96$

and this is your maximum value.

Now, if we take it down one notch: a, b, c, d, and e are all 8. The value you get is $8.888 \times 10^{2} = 888.8$. Big step down, huh? And remember, the probability of getting all of them to be 8 is relatively low. This makes it almost inevitable that most of the values will be concentrated in the lower region of the graph.
Generally, I think your explanation is a bit off, because there are other ways of generating numbers between those given by 88888 and 99999, so the large gap in magnitude between the two doesn't by itself indicate a tendency towards smaller numbers.

However, I've worked out why the distribution is non-uniform.

There are just far more ways for a small number to be expressed than for a large number. For example, the only way to get a number greater than 1,000 is for the final digit e to be 9 and for the leading digits $a.bcd$ to exceed $\sqrt{10} \approx 3.162$.

By contrast, to get a number between 0 and 1, you can have a, b, c, and d be anything and e be in $[0,2]$, or alternatively, you can have e be as high as 9 and have a, b, c, and d be low numbers; for example, 00019 gives 0.316.

So generally, it is simply that the conditions necessary for very high numbers occur with far smaller frequency than those for low numbers.
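The counting argument can be checked by brute force, since there are only 100,000 possible strings. The sketch below simply enumerates them all; the helper name `digits_to_real` and the variable names are assumptions made for illustration.

```python
from itertools import product


def digits_to_real(a, b, c, d, e):
    """Apply the formula from the first post to five digits."""
    return (a + 0.1 * b + 0.01 * c + 0.001 * d) * 10 ** (e / 2 - 2)


total = 0
below_one = 0        # strings that map into [0, 1)
above_thousand = 0   # strings that map above 1000

for digits in product(range(10), repeat=5):
    value = digits_to_real(*digits)
    total += 1
    if value < 1:
        below_one += 1
    elif value > 1000:
        above_thousand += 1

print(f"strings giving a value below 1:    {below_one} of {total}")
print(f"strings giving a value above 1000: {above_thousand} of {total}")
```

The interval from 0 to 1 is a tiny sliver of the full range, while the interval above 1000 is roughly two thirds of it, so comparing the two counts makes the imbalance concrete.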