Let us suppose that we have 600 sentences written down in order, one below the other, on a table designated as TABLE A.
We write down the same sentences, in the same order, on a second table designated as TABLE B.
On TABLE A we highlight in red all sentences that have grammar errors.
Let us suppose that exactly 15 sentences have grammar errors, so that they are now highlighted in red.
On TABLE B we choose 200 sentences at random and highlight in red those of the chosen sentences that have grammar errors.
In other words, we make red annotations only within a random subset of the initial set of 600 sentences.
Question: what is the probability that the two tables are coloured the same way?
In other words: what is the probability of having the same number and positions of red sentences on the 2 tables, despite the fact that TABLE B was annotated on a random subset of 1/3 of the initial set?
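
One possible way to set this up, as a sketch rather than a definitive answer: assume the 200 sentences are drawn uniformly at random without replacement, that every erroneous sentence inside the sample gets highlighted, and that sentences outside the sample stay unmarked. Under those assumptions the two tables match exactly when all 15 erroneous sentences fall inside the sample, which is a hypergeometric probability: C(585, 185) / C(600, 200), or equivalently C(200, 15) / C(600, 15). The function name and parameters below are illustrative only.

```python
from fractions import Fraction
from math import comb


def prob_all_errors_sampled(total: int, errors: int, sample: int) -> Fraction:
    """Probability that a uniform random sample of `sample` sentences,
    drawn without replacement from `total`, contains all `errors`
    erroneous sentences (assumption: annotation inside the sample is perfect)."""
    if errors > sample:
        # The sample is too small to contain every erroneous sentence.
        return Fraction(0)
    # Favourable samples: all erroneous sentences are included, and the
    # remaining (sample - errors) slots are filled from the clean sentences.
    favourable = comb(total - errors, sample - errors)
    # All possible samples of the given size.
    possible = comb(total, sample)
    return Fraction(favourable, possible)


# Numbers from the question: 600 sentences, 15 with errors, 200 sampled.
p = prob_all_errors_sampled(600, 15, 200)
print(float(p))
```

With the numbers from the question the result is on the order of 10^-8, somewhat below the naive estimate (1/3)^15, because the draws are made without replacement.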