# Creating groupings over a predefined population while avoiding small intersections

• Jun 13th 2012, 05:13 AM
Alex101
Creating groupings over a predefined population while avoiding small intersections
Hello, I am a Newbie in MHF, today I am posting a problem I have been running around with for some weeks. I hope to adhere to the forum's etiquette when I am posting as a Newbie. In any case, thank you very much for any help in advance. Best, Alex101

Consider a country with 120.000 individuals. We want to create groupings (=cohorts) of individuals. These are the detailed characteristics and constraints:

1. There will be a multitude of groupings. In theory, and the solution should consider this, there will be an unlimited number of groupings. In reality, and it will be appreciated to have a solution for this, there will be not more than 50 different groupings.
2. The groupings are created independently from each other, at least one group contains all individuals.
3. Each cohort of individuals will have at least 3 individuals. For each cohort we will know the total gross salary. The term “salary” is a placeholder for any sensitive, personal data that can be subject of additive, subtractive, or multiplicative algebraic operations. This data fact is hereafter referred to as “data”.
4. Individuals in any cohort are not necessarily geographically close, i.e. an individual from the southernmost location of the country can be grouped with individuals from the northernmost location of the country.
5. We know from each individual the geographic coordinates.

The problem for which a solution is needed: Create an algorithm that checks all possible combinations of involved structures and cohorts to prevent the identification of one individuals’ data, given the multitude of structures and cohorts as described above.
The complexity of the problem arises (or at least seems to arise) from the combinations of existing groupings with potential (re-)combinations of others. The assumption is formulated that the number of combinations between groupings increases exponentially as the number of groupings increases. To further illustrate the problem, consider in the easiest case these two different groupings:

• Grouping 1 contains the individuals 1, 2, 3, and 4 and the data attached to this cohort is €17.150.
• Grouping 2 contains the individuals 1, 2, and 3 and the data attached to this cohort is €8.250.

• By subtracting the data of grouping 2 from the data of grouping 1 we disclose the data of individual 4.

Consider another example:

• Grouping 1 contains the individuals 1, 2, 3, and 4 and the data attached to this cohort is €17.150.
• Grouping 2 contains the individuals 5, 6, and 7 and the data attached to this cohort is €14.200.
• Grouping 3 contains the individuals 1, 2, 3, 4, 5, 6, 7, 8 and the data attached to this cohort is €32.050.

• By subtracting the sum of data of groupings 1 and 2 from the data of grouping 3 we disclose the data of individual 8.
• Jun 13th 2012, 09:38 AM
mfb
Re: Creating groupings over a predefined population while avoiding small intersection
You can express all groups as linear equations, the grouping 1 in the first example would give 1*x1+1*x2+1*x3+1*x4=17.150, where xi is the income of person i.
This can be expressed as a single equation Ax=b, where A is a matrix with just 0 and 1 (each line encodes the group members of one group) and b are the total incomes of the corresponding groups.
There are standard methods (like Gauß) to try to solve those systems, and if one or more values in x can be gained, I think these algorithms can identify that with a polynomial complexity.
• Jun 14th 2012, 10:31 PM
Alex101
Re: Creating groupings over a predefined population while avoiding small intersection
Thank you for your re-formulation of the problem. It has been some time since I solved linear equation systems.
One further clarification would be great:
The matrix formulated would be of substantial size, 120.000 individuals, we will have > 14.000 groupings. A solution algorithm should recombine the equations to assess rank and structure of the matrix, if I recall correctly. Our main problem was that the re-combinations of grouping, i.e. potential groupings like sums of some groupings to be substracted from all others, was the driver of the complexity. So, do you think that the solution algorithm will be somehow simulating this re-combination problem?

I am writing this without having submerging into the suggested algorithms yet, I will do so next week. In any case thank you very much in advance for your idea.

Best, Alex101
• Jun 15th 2012, 05:50 AM
mfb
Re: Creating groupings over a predefined population while avoiding small intersection
"not more than 50 different groupings" <-> "we will have > 14.000 groupings"?
If I did not make an error, you need at most O(n*m*min(n,m)) steps to simplify the matrix, where n is the number of individuals and m is the number of groups: Each iteration removes at least one individual and one group from the remaining problem. After simplification, it is easy to evaluate if any individual can be identified.

With 120000 and 14000, this would be ~c*10^13 steps. A lot, but all steps are simple additions or multiplications. A good algorithm might even cut this by one or two orders of magnitude, depending on your matrix.