1. ## Group Assortment Measure

Hi
I need a formula that returns a value representing the amount of ‘assortment’ a group shows. The groups are made up of individuals, each belonging to one of two classes (e.g. male or female); the groups are of different sizes and can come from different populations (i.e. populations with different ratios of males to females). I have worked out the logical rules and some examples, but am having difficulty formalising them properly, despite extensive attempts using binomial probabilities. I think the best way to explain is to give some example groups and say which would rank highest in ‘assortment’:

e.g. in a population with an equal ratio of males to females:

GROUP-A = 1Male & 1Female
GROUP-B = 2M & 0F
G-C = 0M & 2F
G-A is the most ‘disassorted’, whilst G-B and G-C are equally assorted.

G-D = 5M & 0F
G-D is more assorted than both G-B and G-C, as the probability of getting 5 males in a group of 5 is much lower than that of getting 2 males in a group of 2.

Now, consider some groups from a population with 9 males to each female:
G-E = 5M & 0F
G-F = 5M & 5F

G-E demonstrates less assortment than G-D, as the chance of getting 5M 0F is much higher when the chance of a male occurring is 0.9 (i.e. 9:1 M:F).
G-G demonstrates much more ‘assortment’ than G-F (or G-B or G-C), as the chance of getting 5F when each individual has only a 0.1 chance of being female (even in a group of 10 individuals) is very low.

Therefore, I need a measure that gives a value of assortment for any given group, such that the more ‘assorted’ a group is, and the less likely it is to arise by chance, the higher the value.
I have tried lots of things with binomial probabilities, and one of the main problems with my best attempts is that a group with no actual assortment (e.g. 1M & 1F) could score higher than a group which potentially displays assortment (e.g. 2M & 0F) if, for example, the chance of a female occurring is very low.
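
To make the examples above concrete, here is a minimal Python sketch that computes the binomial probability of each example group's exact composition. The function name `group_probability` is made up for illustration, and scoring a group by this raw probability is only an assumption for the sake of the demonstration, not a proposed final measure.

```python
from math import comb


def group_probability(males: int, females: int, p_male: float) -> float:
    """Binomial probability of drawing exactly this many males and females.

    Assumes each individual is independently male with probability p_male.
    """
    n = males + females
    return comb(n, males) * p_male**males * (1 - p_male) ** females


# Population with an equal ratio of males to females (p_male = 0.5)
print("G-A 1M 1F:", group_probability(1, 1, 0.5))
print("G-B 2M 0F:", group_probability(2, 0, 0.5))
print("G-D 5M 0F:", group_probability(5, 0, 0.5))

# Population with 9 males to each female (p_male = 0.9)
print("G-E 5M 0F:", group_probability(5, 0, 0.9))
print("1M 1F at 9:1:", group_probability(1, 1, 0.9))
print("2M 0F at 9:1:", group_probability(2, 0, 0.9))
```

At a 9:1 ratio the balanced pair 1M & 1F comes out less probable than 2M & 0F, so ranking groups by raw probability alone would score the balanced pair as the more ‘assorted’ one, which is the problem described above.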

2. ## Re: Group Assortment Measure

What you are looking for is called entropy.

It's a measure of the "randomness" of a distribution.

It's basically the expected value of the natural log of your random variable.

For example with a binomial distribution as you have the entropy is given by

$S = \displaystyle{\sum_{k=0}^N}\ln(k)\begin{pmatrix}N \\ k \end{pmatrix}p^k (1-p)^{N-k}$

where $p$ is the probability of being male (or female if you like).

If $p=0.5$ then

$S = \displaystyle{\sum_{k=0}^N}\ln(k)\begin{pmatrix}N \\ k \end{pmatrix}(0.5)^N$

Entropy (information theory) - Wikipedia, the free encyclopedia

3. ## Re: Group Assortment Measure

Hi

Thank you very much for your reply, it is much appreciated. However, I do not fully understand how the S value would give each group its own assortment score, so that groups could be compared. Also, does ln(0) not yield -Infinity? Any more help with this, i.e. how to get individual scores of 'assortment' for each group so that they can be compared, would be great.

Thanks again,

Josh

4. ## Re: Group Assortment Measure

I messed up. It's been a while.

Let

$Pr[k] = p_k$

then

$S=-\displaystyle{\sum_{k=0}^N} p_k\ln(p_k)$

with the convention that terms with $p_k = 0$ contribute zero.
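
As a rough illustration, the entropy of the binomial distribution can be computed as below. This is a minimal sketch that uses the standard Shannon form $-\sum_k p_k \ln(p_k)$ with binomial $p_k$; the function names `binomial_pmf` and `binomial_entropy` are made up for this example.

```python
from math import comb, log


def binomial_pmf(n: int, k: int, p: float) -> float:
    """Probability of exactly k males in a group of n, each male with probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_entropy(n: int, p: float) -> float:
    """Shannon entropy (in nats) of the number of males in a group of size n."""
    total = 0.0
    for k in range(n + 1):
        p_k = binomial_pmf(n, k, p)
        if p_k > 0.0:  # convention: a term with p_k = 0 contributes 0
            total -= p_k * log(p_k)
    return total


print(binomial_entropy(2, 0.5))  # groups of 2, equal ratio
print(binomial_entropy(2, 0.9))  # groups of 2, 9 males to each female
```

Note that the result depends only on the group size and $p$, so it describes the whole distribution of possible groups rather than scoring one observed group.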

5. ## Re: Group Assortment Measure

I can think of two measures that might fit your bill. I suspect that for equal probability of male/female the two are equivalent.

The first is simply the probability of the group. With equal probabilities of male and female, the expected group has equal numbers of males and females. The more "assorted" your group is the less likely it is. (I think)

You can find this value for a given group of say $N$ people composed of $k$ males by

$p=\begin{pmatrix}N \\ k\end{pmatrix}\left(\dfrac 1 2\right)^N$

for $Pr[\mbox{male}]=Pr[\mbox{female}]=\dfrac 1 2$

since the $\left(\dfrac 1 2\right)^N$ is constant for all $k$ at a fixed group size $N$, you can use just $\begin{pmatrix}N \\ k \end{pmatrix}$ as an equivalent measure.

Another, simpler measure is

$p=|\#\mbox{males} - \#\mbox{females}|$

$p=0$ indicates a very unassorted group while larger values of $p$ indicate larger amounts of assortedness.

That's about all I can give you w/o a more specific definition of "assortedness"
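
A minimal Python sketch of the two measures, assuming a fixed group size and equal probability of male and female. The function names are made up for illustration. The last lines check, for one group size only, that ranking groups from rarest binomial coefficient upwards gives the same order as ranking from largest count difference downwards.

```python
from math import comb


def coefficient_measure(n: int, k: int) -> int:
    """Number of ways to get k males in a group of n (smaller means a rarer split)."""
    return comb(n, k)


def count_difference(n: int, k: int) -> int:
    """Absolute difference between the number of males (k) and females (n - k)."""
    return abs(k - (n - k))


n = 6
for k in range(n + 1):
    print(f"{k}M {n - k}F:", coefficient_measure(n, k), count_difference(n, k))

# Rank the possible groups by each measure, breaking ties by k
by_coefficient = sorted(range(n + 1), key=lambda k: (coefficient_measure(n, k), k))
by_difference = sorted(range(n + 1), key=lambda k: (-count_difference(n, k), k))
print(by_coefficient == by_difference)
```

This only compares groups of the same size; it says nothing about how either measure behaves across group sizes or unequal ratios.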

6. ## Re: Group Assortment Measure

Hi romsek:

> Originally Posted by romsek
> I can think of two measures that might fit your bill. I suspect that for equal probability of male/female the two are equivalent.

Unfortunately, I need a measure which also works when there is not an equal probability of male/female.

> The first is simply the probability of the group. As you know the expected value will be that a group has equal numbers of males and females. The more "assorted" your group is the less likely it is. (I think)

It is not always the case that the 'expected' value is a group with equal M & F. For example, in a population of 90M 10F, the 'expected' ratio for a group would be 9:1.

> You can find this value for a given group of say $N$ people composed of $k$ males by
>
> $p=\begin{pmatrix}N \\ k\end{pmatrix}\left(\dfrac 1 2\right)^N$
>
> for $Pr[\mbox{male}]=Pr[\mbox{female}]=\dfrac 1 2$
>
> since the $\left(\dfrac 1 2\right)^N$ is constant for all $k$ at a fixed group size $N$, you can use just $\begin{pmatrix}N \\ k \end{pmatrix}$ as an equivalent measure.

This was also my initial thought, and it works very well given a set group size and a 50/50 chance of M or F. However, it does not score assortment logically across group sizes,
e.g. a group of 5M 5F is classed as much 'more assorted' (i.e. lower value) than 1M 1F, yet they are both balanced and, if anything, 5M 5F is more 'disassorted', as there is a lower probability that this balance would arise by chance alone.
Also, 5M 5F would be classed as 'more assorted' than 2M 0F, yet obviously 2M 0F is much more assorted than a perfectly balanced group (when the pop ratio is 1:1).

> Another, simpler measure is
>
> $p=|\#\mbox{males} - \#\mbox{females}|$
>
> $p=0$ indicates a very unassorted group while larger values of $p$ indicate larger amounts of assortedness.

I am not entirely sure in what context you are using '#', yet I think this would have difficulties across group sizes, e.g. a 4M 0F group is classed as less assorted than a 105M 100F group, when in fact 4M 0F is showing more assortment.

> That's about all I can give you w/o a more specific definition of "assortedness"

Yes, thank you for your suggestions given my undefined 'assortedness'. Although it is difficult to phrase exactly without a measure devised yet, I can give you my best attempt:
Group Assortment: The extent to which a group is non-randomly biased towards a particular class, where non-random takes into account the group size and the composition of the sampled population.

I know this does not translate immediately or easily into a formula. However, thanks for the suggestions so far; it's clear that you are thinking along the same lines as I was, which is encouraging. If you would like to give some examples of different 'groups' which may be fundamental in understanding my definition of assortment, I would be happy to 'rank' them by 'assortment' value.
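
The objections above can be checked numerically. The sketch below assumes a 1:1 population ratio; the function names are made up for illustration.

```python
from math import comb


def prob_equal_ratio(males: int, females: int) -> float:
    """Binomial probability of this exact composition when P(male) = P(female) = 1/2."""
    n = males + females
    return comb(n, males) * 0.5**n


def count_difference(males: int, females: int) -> int:
    """Absolute difference between the numbers of males and females."""
    return abs(males - females)


# Raw probability: a lower value would be read as 'more assorted'
for label, (m, f) in {"1M 1F": (1, 1), "5M 5F": (5, 5), "2M 0F": (2, 0)}.items():
    print(label, prob_equal_ratio(m, f))

# Count difference: a higher value would be read as 'more assorted'
print("4M 0F:", count_difference(4, 0))
print("105M 100F:", count_difference(105, 100))
```

Under the raw probability, the balanced 5M 5F group scores as rarer than both 1M 1F and 2M 0F, and under the count difference 105M 100F scores higher than 4M 0F, so neither measure ranks groups of different sizes in the intended order.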

7. ## Re: Group Assortment Measure

> Originally Posted by josh1
> Hi romsek:
>
> Unfortunately, I need a measure which also works when there is not an equal probability of male/female

Incorporating a probability of male or female is easy enough. The formula is modified to become

$Pr[k \mbox{ males}]=\begin{pmatrix}N \\ k\end{pmatrix}p^k (1-p)^{N-k}$

where $p$ is the probability of choosing a male.
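
As a quick illustration of the general formula, this sketch tabulates the whole distribution for groups of 10 drawn from a population with 9 males to each female. The values of `n` and `p` are chosen only to match the 9:1 example from earlier in the thread, and the function name is made up.

```python
from math import comb


def binomial_pmf(n: int, k: int, p: float) -> float:
    """Probability of exactly k males in a group of n, each male with probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 10, 0.9
distribution = [binomial_pmf(n, k, p) for k in range(n + 1)]
for k, prob in enumerate(distribution):
    print(f"{k}M {n - k}F: {prob:.6f}")

# The most likely composition sits at the population ratio, not at a 50/50 split
most_likely = max(range(n + 1), key=lambda k: distribution[k])
print("most likely number of males:", most_likely)
```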

8. ## Re: Group Assortment Measure

> Originally Posted by romsek
> Incorporating a probability of male or female is easy enough. The formula is modified to become
>
> $Pr[k \mbox{ males}]=\begin{pmatrix}N \\ k\end{pmatrix}p^k (1-p)^{N-k}$
>
> where $p$ is the probability of choosing a male

Yes, but then we again have the same problem:

> This was also my initial thought, and it works very well given a set group size and a 50/50 chance of M or F. However, it does not score assortment logically across group sizes,
> e.g. a group of 5M 5F is classed as much 'more assorted' (i.e. lower value) than 1M 1F, yet they are both balanced and, if anything, 5M 5F is more 'disassorted', as there is a lower probability that this balance would arise by chance alone.
> Also, 5M 5F would be classed as 'more assorted' than 2M 0F, yet obviously 2M 0F is much more assorted than a perfectly balanced group (when the pop ratio is 1:1).

9. ## Re: Group Assortment Measure

Initially I wanted the assortment of the actual groups too, to use these values in later models. However, yes, I also need the assortment propensity of the sampled population, in terms of the groups that are appearing.
Also, a final solution would allow the propensity to differ between men and women in the same population, but I think we should stay away from that for now (unless it actually makes things simpler).

Due to the apparent difficulty of achieving a single measure of assortment for each group, let's change the goal slightly. Here's an analogy (excuse the childishness):
A college canteen has multiple groups come to it throughout year 1. I would like to calculate the amount of assortment these groups show (i.e. know whether males are with males and females are with females) and compare this to the amount of assortment shown by the groups in years 2, 3, 4 etc. Essentially, in the end I want to be able to say that the groups in year 1 show more assortment bias than the groups in year t, and therefore that the year-1 population is more assorted.
Any ideas on how to do this? Perhaps combining binomial probabilities over all the groups in some way? It still needs to be the case that e.g. a group of 0M 10F would represent more assortment than 0M 2F, but perhaps we can assume a 50/50 ratio if that makes things more feasible.
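
One possible way to combine groups, offered only as a hedged sketch and not as something proposed in this thread: standardize each group's male count by its binomial mean and standard deviation, then average the squared values over all groups in a year. Under purely random grouping the squared value averages 1, balanced groups score 0, and 0M 10F scores higher than 0M 2F. The function names and the example canteen data are invented for illustration, and the score does not distinguish male-biased from female-biased groups.

```python
from math import sqrt


def standardized_deviation(males: int, females: int, p_male: float) -> float:
    """Male count minus its binomial mean, in binomial standard deviations.

    Assumes 0 < p_male < 1 and a non-empty group.
    """
    n = males + females
    expected = n * p_male
    spread = sqrt(n * p_male * (1 - p_male))
    return (males - expected) / spread


def mean_squared_deviation(groups, p_male: float) -> float:
    """Average squared standardized deviation over a list of (males, females) groups."""
    squares = [standardized_deviation(m, f, p_male) ** 2 for m, f in groups]
    return sum(squares) / len(squares)


# Hypothetical canteen observations as (males, females), assuming a 50/50 population
year_1 = [(0, 10), (5, 0), (0, 2), (4, 0)]
year_2 = [(5, 5), (1, 1), (3, 2), (2, 3)]

print("year 1:", mean_squared_deviation(year_1, 0.5))
print("year 2:", mean_squared_deviation(year_2, 0.5))
```

Comparing the yearly averages then gives a rough ordering of years by how far their groups depart from random mixing, under the stated assumptions.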