Hello,
my name is Alex, I am 35 years old, an engineer and econometrician from Germany. I have a problem that I have discussed with several friends and colleagues. Here is a description of the problem:
Consider a country with 120.000 individuals. We want to create groupings (=cohorts) of individuals. These are the detailed characteristics and constraints:
- There will be a multitude of groupings. In theory, and the solution should take this into account, the number of groupings is unlimited. In practice, and a solution for this case alone would already be appreciated, there will be no more than 50 different groupings.
- The groupings are created independently of each other; at least one group contains all individuals.
- Each cohort of individuals will have at least 3 individuals. For each cohort we will know the total gross salary. The term “salary” is a placeholder for any sensitive, personal data that can be subject to additive, subtractive, or multiplicative algebraic operations. This quantity is referred to below as “data”.
- Individuals in any cohort are not necessarily geographically close, i.e. an individual from the southernmost location of the country can be grouped with individuals from the northernmost location of the country.
- We know the geographic coordinates of each individual.
The problem for which a solution is needed: Create an algorithm that checks all possible combinations of the structures and cohorts involved, so that no single individual’s data can be identified, given the multitude of structures and cohorts as described above.
The complexity of the problem arises (or at least seems to arise) from combining existing groupings with potential (re-)combinations of others. My assumption is that the number of combinations between groupings grows exponentially with the number of groupings. To illustrate the problem, consider in the simplest case these two groupings:
- Grouping 1 contains the individuals 1, 2, 3, and 4 and the data attached to this cohort is €17.150.
- Grouping 2 contains the individuals 1, 2, and 3 and the data attached to this cohort is €8.250.
- By subtracting the data of grouping 2 from the data of grouping 1 we disclose the data of individual 4.
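To make this simplest case concrete, here is a minimal sketch in Python of a pairwise check. It only looks at pairs where one cohort is a strict subset of another and exactly one individual is left over. The function name `pairwise_disclosures` and the dictionary layout are my own assumptions for illustration, not an established method.

```python
from itertools import permutations

def pairwise_disclosures(cohorts):
    """Find individuals revealed by subtracting one cohort total from another.

    cohorts: dict mapping a label to (frozenset of members, total).
    Returns a list of (individual, disclosed value) pairs.
    """
    found = []
    for (_, (members_a, total_a)), (_, (members_b, total_b)) in permutations(cohorts.items(), 2):
        # Only a strict subset can be subtracted cleanly from its superset.
        if members_b < members_a:
            rest = members_a - members_b
            if len(rest) == 1:
                (person,) = rest
                found.append((person, total_a - total_b))
    return found

cohorts = {
    "grouping 1": (frozenset({1, 2, 3, 4}), 17150),
    "grouping 2": (frozenset({1, 2, 3}), 8250),
}
print(pairwise_disclosures(cohorts))  # [(4, 8900)]
```

A check like this catches only differences of two cohorts. It would miss a case where three or more cohorts have to be combined.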
Consider another example:
- Grouping 1 contains the individuals 1, 2, 3, and 4 and the data attached to this cohort is €17.150.
- Grouping 2 contains the individuals 5, 6, and 7 and the data attached to this cohort is €14.200.
- Grouping 3 contains the individuals 1, 2, 3, 4, 5, 6, 7, 8 and the data attached to this cohort is €32.050.
- By subtracting the sum of data of groupings 1 and 2 from the data of grouping 3 we disclose the data of individual 8.
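One possible way to cover combinations of more than two cohorts, sketched here under a strong assumption: an attacker only adds and subtracts published cohort totals (possibly with multiples). Under that assumption each cohort is one linear equation over the individuals, and an individual is disclosed exactly when row reduction of the membership matrix produces a row with a single non-zero entry. The function name `disclosed_individuals` and the input layout are hypothetical. Exact fractions are used to avoid rounding problems.

```python
from fractions import Fraction

def disclosed_individuals(cohorts, individuals):
    """Return {individual: value} for everyone whose data follows from the totals.

    cohorts: list of (set of members, total); individuals: iterable of all ids.
    Assumption: only linear combinations of the totals are considered.
    """
    cols = sorted(individuals)
    n = len(cols)
    # One row per cohort: 0/1 membership indicators plus the total as last entry.
    rows = [[Fraction(int(p in members)) for p in cols] + [Fraction(total)]
            for members, total in cohorts]
    pivot_row = 0
    for col in range(n):
        if pivot_row == len(rows):
            break
        # Find a row at or below pivot_row with a non-zero entry in this column.
        pr = next((r for r in range(pivot_row, len(rows)) if rows[r][col] != 0), None)
        if pr is None:
            continue
        rows[pivot_row], rows[pr] = rows[pr], rows[pivot_row]
        pivot = rows[pivot_row][col]
        rows[pivot_row] = [x / pivot for x in rows[pivot_row]]
        # Clear this column in every other row (reduced row echelon form).
        for r in range(len(rows)):
            if r != pivot_row and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[pivot_row])]
        pivot_row += 1
    result = {}
    for row in rows:
        nonzero = [j for j in range(n) if row[j] != 0]
        if len(nonzero) == 1:
            # The row reads "1 * individual = value", so the value is disclosed.
            result[cols[nonzero[0]]] = row[n]
    return result

cohorts = [
    ({1, 2, 3, 4}, 17150),
    ({5, 6, 7}, 14200),
    ({1, 2, 3, 4, 5, 6, 7, 8}, 32050),
]
print(disclosed_individuals(cohorts, range(1, 9)))  # {8: Fraction(700, 1)}
```

If the linear assumption holds, this avoids enumerating combinations of groupings explicitly, because the row reduction takes all of them into account at once. Whether it scales to 120.000 individuals in this plain form, and whether it covers multiplicative data, is open.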
Any help on theoretical approaches is welcome! Thanks in advance, greetings from Germany! Alex.