Originally Posted by

**cl85** If I interpret the question correctly, we are removing the balls without replacement. So the pairs do not have the same distribution since the probability of picking a pair of red balls changes with each removal.

The balls are indeed removed without replacement, and that's why the pairs are strongly correlated. However, the pairs **are** identically distributed. This may seem paradoxical, and I have noticed that believing the opposite is a common probabilistic misconception in everyday life.

----

A few months ago, while playing Scrabble with my parents, I came upon the following situation. There were still many tiles (letters) in the bag. My mother had just played, but she forgot to take new tiles from the bag. Then my father played, and took new tiles. My mother interrupted him: "Wait, I should have picked my tiles before you. Please put your new tiles back, so that we get back to the previous situation and I can pick my tiles fairly."

I protested, claiming it was the complete opposite: it would not have mattered **at all** whether my mother had taken her tiles before or after my father (provided there are sufficiently many tiles in the bag). On the other hand, since my father had already seen his new tiles, the distribution would be altered if he put them back in the bag. My mother objected: "But there were *more* tiles before, it can't be the same. For instance, there may be no "E" left now, while there could have been a few before."

Yet it **is** the same. And this is *exactly* the same problem as the one here with the pairs of balls. By the way, my mother's protest matches yours: "There are fewer red balls at the end"... Right. But only *conditionally on the previous pairs*! Otherwise, there is no difference, paradoxically.

----

First, an example. Take the balls one by one from the box. What is the probability that the last ball is red? It *seems* smaller than $\displaystyle \frac{r}{2n}$... Let's compute it in an elementary way. It is the probability that the first $\displaystyle 2n-1$ picked balls consist of $\displaystyle r-1$ red balls and all $\displaystyle 2n-r$ of the other balls, hence it is:

$\displaystyle \frac{{r\choose r-1}{2n-r\choose 2n-r}}{{2n\choose 2n-1}}=\frac{r}{2n}.$

Exactly like for the first ball.
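If you prefer to check this by brute force, here is a minimal Python sketch. The function name `red_probability_by_position` and the small values (6 balls, 2 of them red) are my own choices for illustration. It lists every equally likely order of the labeled balls and computes, exactly, the probability that the ball drawn at each position is red.

```python
from fractions import Fraction
from itertools import permutations


def red_probability_by_position(total, red):
    """Exact P(the ball drawn at position i is red), for every position i."""
    colors = ["R"] * red + ["B"] * (total - red)
    # Balls are labeled 0..total-1, so every ordering is equally likely.
    orders = list(permutations(range(total)))
    counts = [0] * total
    for order in orders:
        for position, ball in enumerate(order):
            if colors[ball] == "R":
                counts[position] += 1
    return [Fraction(count, len(orders)) for count in counts]


# Illustrative values only: 2n = 6 balls, r = 2 red ones.
probs = red_probability_by_position(total=6, red=2)
print(probs)
```

Every entry of the list, the last one included, equals $\displaystyle \frac{r}{2n}=\frac{2}{6}$.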

----

Consider the set $\displaystyle E=\{1,\ldots,n\}$, where $\displaystyle n\ge p+q$. Pick a uniformly random subset $\displaystyle A$ of $\displaystyle p$ elements, and then (without replacement) a uniformly random subset $\displaystyle B$ of $\displaystyle q$ elements among the $\displaystyle n-p$ remaining ones.

Claim: the distribution of $\displaystyle (A,B)$ is unchanged if we pick $\displaystyle B$ *before* $\displaystyle A$.

How about a computational proof, to get convinced of how natural this is? Let $\displaystyle x_1,\ldots,x_p,y_1,\ldots,y_q$ be distinct elements of $\displaystyle E$. Then:

$\displaystyle P(A=\{x_1,\ldots,x_p\},B=\{y_1,\ldots,y_q\}) = \frac{1}{{n\choose p}}\frac{1}{{n-p\choose q}}=\frac{p!q!(n-p-q)!}{n!}$

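is obviously independent of the order of the choices: it is symmetric in $\displaystyle p$ and $\displaystyle q$, so picking $\displaystyle B$ first, which gives $\displaystyle \frac{1}{{n\choose q}}\frac{1}{{n-q\choose p}}$, yields the same value.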
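The claim can also be checked by exact enumeration on a small case. The following Python sketch is only an illustration: the function `joint_distribution`, its `a_first` flag and the values $\displaystyle n=6$, $\displaystyle p=2$, $\displaystyle q=3$ are assumptions of mine. It builds the distribution of $\displaystyle (A,B)$ for both orders of picking and compares them.

```python
from fractions import Fraction
from itertools import combinations


def joint_distribution(n, p, q, a_first=True):
    """Exact distribution of (A, B): |A| = p, |B| = q, disjoint subsets of {1..n}.

    If a_first is True, A is picked first and B from the remaining elements;
    otherwise B is picked first and A from the remaining elements.
    """
    elements = set(range(1, n + 1))
    first_size, second_size = (p, q) if a_first else (q, p)
    firsts = list(combinations(sorted(elements), first_size))
    dist = {}
    for first in firsts:
        rest = sorted(elements - set(first))
        seconds = list(combinations(rest, second_size))
        for second in seconds:
            # Uniform first pick, then uniform second pick among what is left.
            prob = Fraction(1, len(firsts)) * Fraction(1, len(seconds))
            a, b = (first, second) if a_first else (second, first)
            dist[(frozenset(a), frozenset(b))] = prob
    return dist


# Illustrative values only.
a_then_b = joint_distribution(6, 2, 3, a_first=True)
b_then_a = joint_distribution(6, 2, 3, a_first=False)
print(a_then_b == b_then_a)
```

Both dictionaries give each of the 60 possible outcomes $\displaystyle (A,B)$ the probability $\displaystyle \frac{2!\,3!\,1!}{6!}=\frac{1}{60}$, in agreement with the formula.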

I leave it to you to derive the general case with $\displaystyle k$ random disjoint subsets: any order gives the same distribution. In particular, the order in which the pairs are chosen in the initial problem is irrelevant for their distribution, hence they all have the same distribution.
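For reference, here is a sketch of that general computation, with notation of my own: let the subsets $\displaystyle A_1,\ldots,A_k$ have sizes $\displaystyle p_1,\ldots,p_k$ with $\displaystyle p_1+\cdots+p_k\le n$, and let $\displaystyle S_1,\ldots,S_k$ be fixed disjoint subsets of $\displaystyle E$ of these sizes. Picking $\displaystyle A_1$ first, then $\displaystyle A_2$ among the remaining elements, and so on, the product telescopes:

$\displaystyle P(A_1=S_1,\ldots,A_k=S_k)=\prod_{i=1}^{k}\frac{1}{{n-p_1-\cdots-p_{i-1}\choose p_i}}=\frac{p_1!\cdots p_k!\,(n-p_1-\cdots-p_k)!}{n!}.$

The right-hand side is symmetric in $\displaystyle p_1,\ldots,p_k$, so it does not depend on the order in which the subsets are picked.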