# Thread: sampling from a population

1. ## sampling from a population

i have a population of items and i calculate how similar each item is to the rest of the items in the population. i store these values in a symmetric nxn similarity matrix. when i look at the distribution of scores, they follow a beta distribution -- most similarity scores are close to 0 (the items are dissimilar) and then they tail off and fewer and fewer are closer to 1 (the items are very similar).

i want to generate samples from this distribution -- i want to group items together whose distribution of scores is also beta, but only with a lot of the mass centered around the mean and tailing off on both ends (like a hump). these groupings of beta distributions should (i'm pretty sure) naturally form in my data, so my question is: is there a method to sample from my population that gives me a smaller population with a given distribution? any starting point would help...thanks!

2. Originally Posted by hoffmann
i have a population of items and i calculate how similar each item is to the rest of the items in the population. i store these values in a symmetric nxn similarity matrix. when i look at the distribution of scores, they follow a beta distribution -- most similarity scores are close to 0 (the items are dissimilar) and then they tail off and fewer and fewer are closer to 1 (the items are very similar).

i want to generate samples from this distribution
This seems OK up to here: you want to generate samples from a beta distribution. I would use whatever function your software provides for this; in MATLAB and Octave that would be betarnd(..).
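For anyone not on MATLAB or Octave, here is a minimal sketch of the same sampling step in Python, using the standard library's `random.betavariate`. The helper name `sample_beta` and the shape parameters (1, 5) and (5, 5) are assumptions picked only for illustration: the first gives scores piled up near 0 with a tail toward 1, the second gives a symmetric hump around 0.5. They are not fitted to the data described in this thread.

```python
import random


def sample_beta(alpha, beta, size, seed=None):
    """Draw `size` values from Beta(alpha, beta); every value lies in [0, 1]."""
    rng = random.Random(seed)  # private generator so results are reproducible
    return [rng.betavariate(alpha, beta) for _ in range(size)]


# Illustrative shape parameters only (assumptions, not estimated from real scores).
skewed = sample_beta(1.0, 5.0, 10_000, seed=42)  # most mass near 0, tail toward 1
humped = sample_beta(5.0, 5.0, 10_000, seed=42)  # hump centered near 0.5

mean_skewed = sum(skewed) / len(skewed)  # theoretical mean is 1 / (1 + 5)
mean_humped = sum(humped) / len(humped)  # theoretical mean is 5 / (5 + 5)
```

The sample means should land close to alpha / (alpha + beta), which is a quick sanity check that the parameters describe the shape you had in mind.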

-- i want to group items together whose distribution of scores is also beta, but only with a lot of the mass centered around the mean and tailing off on both ends (like a hump). these groupings of beta distributions should (i'm pretty sure) naturally form in my data, so my question is: is there a method to sample from my population that gives me a smaller population with a given distribution? any starting point would help...thanks!
Now you have lost me

CB

3. ok so the first part makes sense, that's good. now i'll explain the second part:

i want to find groups in my data that follow a certain distribution. i don't really want to generate "random" samples, but subsets of items whose similarity scores are themselves beta distributed. this isn't really a clustering problem but a slight variant: i need to traverse the entire space of my data in order to create groups of objects that also have a characteristic distribution. i've done some reading on this, and as far as i can tell this kind of search is usually done with a Markov chain Monte Carlo (MCMC) approach. hope this makes a little more sense...thanks.
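To make the idea above concrete, here is a rough Metropolis-style sketch in Python (standard library only). It is one possible reading of the problem, not an established method for it. The assumptions: `sim` is the symmetric n x n similarity matrix as a list of lists, the group size `k` is fixed in advance with 2 <= k < n, and the hump-shaped target is given as Beta(a, b). The fit is judged only by the mean and variance of the pairwise scores inside the group, which is a crude stand-in for comparing full distributions. The names `beta_moments`, `subset_energy`, `mcmc_subset` and the `temperature` parameter are all hypothetical.

```python
import math
import random
from itertools import combinations


def beta_moments(a, b):
    """Mean and variance of a Beta(a, b) distribution."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var


def subset_energy(sim, subset, target_mean, target_var):
    """Squared mismatch between the subset's pairwise-score moments and the target."""
    scores = [sim[i][j] for i, j in combinations(sorted(subset), 2)]
    m = sum(scores) / len(scores)
    v = sum((s - m) ** 2 for s in scores) / len(scores)
    return (m - target_mean) ** 2 + (v - target_var) ** 2


def mcmc_subset(sim, k, a, b, steps=5000, temperature=1e-4, seed=0):
    """Search for k items whose pairwise scores roughly match Beta(a, b) moments.

    Assumes 2 <= k < len(sim). Returns (sorted item indices, energy of that group).
    """
    rng = random.Random(seed)
    n = len(sim)
    target_mean, target_var = beta_moments(a, b)

    current = set(rng.sample(range(n), k))  # random starting group
    energy = subset_energy(sim, current, target_mean, target_var)
    best, best_energy = set(current), energy

    for _ in range(steps):
        # Symmetric proposal: swap one member out and one non-member in.
        out_item = rng.choice(sorted(current))
        in_item = rng.choice([i for i in range(n) if i not in current])
        proposal = (current - {out_item}) | {in_item}
        new_energy = subset_energy(sim, proposal, target_mean, target_var)

        # Metropolis rule: always accept a better group, and accept a worse
        # one with probability exp(-(increase in energy) / temperature).
        accept = new_energy <= energy or rng.random() < math.exp(
            (energy - new_energy) / temperature
        )
        if accept:
            current, energy = proposal, new_energy
            if energy < best_energy:
                best, best_energy = set(current), energy

    return sorted(best), best_energy
```

Calling something like `mcmc_subset(sim, k=20, a=5, b=5)` returns the best group visited and its mismatch score. To pull out several groups, one option is to remove the returned items from the matrix and run it again. A lower `temperature` makes the search greedier, a higher one lets it wander more, and swapping the moment comparison for a full distribution distance would be a natural refinement.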