How to get a probability space of repeated numbers?

If you have a set of numbers D = {x1, x2, x3, ..., xn}, you can determine whether there are any repeated numbers in D with an algorithm that uses a direct access table (hashing).
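For concreteness, here is a minimal sketch of that kind of check. It assumes Python's built-in set (a hash table) as a stand-in for the direct access table, and the function name has_repeat is made up for illustration.

```python
def has_repeat(numbers):
    """Return True if any value appears more than once in `numbers`."""
    seen = set()  # hash-based set, standing in for the direct access table
    for x in numbers:
        if x in seen:
            # x was already recorded, so it is a repeated number
            return True
        seen.add(x)
    return False
```

Each lookup and insertion takes expected constant time under the usual hashing assumptions, which is where the question of a probability space for the expected running time comes from.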

But how do you determine a probability space when deriving the expected running time?

For the probability space, are we basically asking for the probability that a number in D is repeated? If so, how do you get a probability from this?

Can anyone help please?

I am thinking that for each number xi in D, you have to check whether it equals any of the other numbers in D, and there are n-1 other numbers in D.

Re: How to get a probability space of repeated numbers?

Hey Sneaky.

With regard to your question, you have to make an assumption about the probability distribution of the numbers.

Typically we do this in a couple of ways depending on the problem.

One way is to make mathematical assumptions and then derive the probability function of the distribution (the PMF for a discrete distribution, or the PDF for a continuous one). This is done with things like the Binomial, Poisson, and other similar distributions.

The other way is to look at a distribution based on its fit to some model. We do this either by forcing a distribution to have a certain structure (like Normal, Chi-square, or Uniform), or by using what is called an empirical distribution. That is just a fancy way of saying you take the actual data from an actual experiment/process/etc., plot a frequency histogram, and normalize it so the frequencies sum to 1.
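As a rough illustration of the empirical distribution idea, the sketch below counts how often each value occurs and divides by the number of observations. The function name empirical_distribution and the sample data are made up for this example.

```python
from collections import Counter

def empirical_distribution(samples):
    """Map each observed value to its relative frequency in `samples`."""
    counts = Counter(samples)  # frequency histogram of the observed values
    total = len(samples)
    # Normalize so that the frequencies sum to 1
    return {value: count / total for value, count in counts.items()}
```

For example, empirical_distribution([1, 1, 2, 3]) assigns 0.5 to the value 1 and 0.25 each to 2 and 3.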

If you want to assume pure randomness, then use a uniform distribution. On a finite set of values it has the highest entropy, so it gives the best model of randomness, provided that all realizations are independent of one another.
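To make the uniform assumption concrete, here is a small sketch. It assumes each of the n numbers is drawn independently and uniformly from m possible values, so the probability space is the set of all m^n equally likely sequences. The parameter m and the function name prob_some_repeat are assumptions for illustration, not part of the original question.

```python
def prob_some_repeat(n, m):
    """Probability that n independent uniform draws from m values contain a repeat."""
    if n > m:
        return 1.0  # more draws than values forces a repeat (pigeonhole)
    p_all_distinct = 1.0
    for i in range(n):
        # The (i+1)-th draw must avoid the i distinct values already drawn
        p_all_distinct *= (m - i) / m
    return 1.0 - p_all_distinct
```

With n = 2 and m = 2 this gives 0.5, since two of the four equally likely pairs, (1,1) and (2,2), contain a repeat.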