# Thread: Inclusion-Exclusion - Maximizing Probabilities

1. ## Inclusion-Exclusion - Maximizing Probabilities

Let $E_1, E_2, E_3$ be three events, where $Pr[E_1]=p_1, Pr[E_2]=p_2, Pr[E_3]=p_3$. Also, $p_1 + p_2 + p_3 = 1$. Using the inclusion-exclusion principle, $Pr[E_1 \cup E_2 \cup E_3] = p_1 + p_2 + p_3 - p_1p_2 - p_1p_3 - p_2p_3 + p_1p_2p_3 = f(p_1, p_2, p_3).$

Thus, I would like to prove that $\max f(p_1,p_2,p_3)$ subject to $p_1 + p_2 + p_3 = 1$ has solution $p_1 = p_2 = p_3 = 1/3$. In order to do this I used Lagrange multipliers, and after a few manipulations on the equations, I come up with this answer.

THE QUESTION IS: how can I generalize this for $n$ events? More specifically, I want to prove that $\max Pr[E_1 \cup \ldots \cup E_n]$ subject to $p_1 + \ldots + p_n = 1$ has solution $p_1 = \ldots = p_n = 1/n$.

PS: Some directions to solve this problem would be useful too (or a reference where it was already solved).....

2. ## Re: Inclusion-Exclusion - Maximizing Probabilities

Hey andre.vignatti.

Is there a way to get a factorization of this function with respect to a particular variable?

What I mean by this is basically not to get a unique global factorization where you have products of terms, but if you can have a function to generate a factorization based on a particular variable (like factorizing with respect to p1 for example)?

The reason why I suggest this is because I suspect that factorizing with respect to a specific variable should always give the same form regardless of the variable.

As an example with three probabilities you have p1 + p2 + p3 - p1p2 - p1p3 - p2p3 + p1p2p3 gives with respect to the three variables:

p1(1 - p2 - p3 + p2p3) + p2 + p3 - p2p3
p2(1 - p1 - p3 + p1p3) + p2 + p3 - p1p3
p3(1 - p1 - p2 + p1p2) + p1 + p2 - p1p2

This helps immensely in the Lagrange multiplier process when it comes to doing derivations for n-variables. If you can show that the form is the same for every kind of factorization, then when you use the Lagrange technique, you should get a bunch of these terms that are all similar.

3. ## Re: Inclusion-Exclusion - Maximizing Probabilities

Hello Chiro! Thanks for the answer. In fact, I already try some factorization like you sugest. I wonder if the factorization that you proposed is really useful in solving the three probabilites case.... I could not understand why, if I use the factorization proposed, could I obtain that $p_1 = p_2 = p_3$.

So, let me explain how I obtained the three probabilities answer, and then I will ask a easier question that actually can solve my problem.

====================================

We want to maximize

$f(p_1,p_2,p_3) = p_1 + p_2 + p_3 - p_1p_2 - p_1p_3 - p_2p_3 + p_1p_2p_3$

subject to

$p_1+p_2+p_3=1 \iff p_1+p_2+p_3-1=0 \implies g(p_1,p_2,p_3)=p_1+p_2+p_3-1$

Using the Lagrange multipliers, we get

$\nabla f(p_1,p_2,p_3)=\lambda \nabla g(p_1,p_2,p_3)$

So we get the following system (*):

$\lambda = 1 - p_2 -p_3+p_2p_3 \\ \lambda = 1 - p_1 -p_3+p_1p_3 \\ \lambda = 1 - p_1 -p_2+p_1p_2 \\ 1 = p_1+p_2+p_3$

Using the fact that $1 = p_1+p_2+p_3$ in the top three rows, we get:

$\lambda = p_1 + p_2p_3 \\ \lambda = p_2 + p_1p_3 \\ \lambda = p_3 + p_1p_2 \\ 1 = p_1+p_2+p_3$

Multiplying $p_i$ on the $i$-th row:

$\lambda p_1 = p_1^2 + p_1p_2p_3 \\ \lambda p_2 = p_2^2 + p_1p_2p_3 \\ \lambda p_3 = p_3^2 + p_1p_2p_3 \\ 1 = p_1+p_2+p_3$

Isolate $p_1p_2p_3$ we conclude that (*) $\lambda p_1 - p_1^2 = \lambda p_2 - p_2^2 = \lambda p_3 - p_3^2$. But

$\lambda x - x^2 = \lambda y - y^2 \\ \iff \lambda x - \lambda y = x^2 - y^2 \\ \iff \lambda (x - y) = (x+y)(x-y) \\ \iff \lambda = x+y$

So, for each pair of the equalities (*), we have

$\lambda = p_1 + p_2 \\ \lambda = p_1 + p_3 \\ \lambda = p_2 + p_3$

Combining these rows, THE ONLY SOLUTION IS $p_1 = p_2 = p_3$, and because $1 = p_1 + p_2 + p_3$, we conclude that $p_1 = p_2 = p_3 = 1/3$

====================================

In the first system obtained by the Lagrange multipliers (see *), one possible solution for the system is setting $p_1 = p_2 = p_3$. This clearly solves the system, as each row must be equal to the other (except for the last row). Therefore, there is no reason to carry the calculations until the end; the only reason would be to conclude that the system has a unique solution.

So, in my attempt to generalize the three probabilities case to $n$ probabilites, I think I could easily argue that the solution $p_1 = p_2 = \ldots = p_n$ is a feasible solution to the system, although I cannot easily argue that it is the ONLY solution.

So finally, MY QUESTION IS: If one solution (maybe not unique) to the Lagrange multipliers system is obtained, can I assume that this solution would maximize the original objective function???

4. ## Re: Inclusion-Exclusion - Maximizing Probabilities

To try and answer your question mathematically, one attempt you could do is that if your function can be written in terms of

p1*g(p2,...,pn) + h(p2,....,pn) = 0
p2*g(p1,...,pn) + h(p1,....,pn) = 0

and so on, then when you do the differentiation with respect to that variable you get say for d/dp1:

g'(p2,....,pn) = 0.

Now if these functions are of the same form then you can collect, simplify and see what happens.

I don't want to say yes to your answer even though it seems likely that its true.

5. ## Re: Inclusion-Exclusion - Maximizing Probabilities

Originally Posted by andre.vignatti
Let $E_1, E_2, E_3$ be three events, where $Pr[E_1]=p_1, Pr[E_2]=p_2, Pr[E_3]=p_3$. Also, $p_1 + p_2 + p_3 = 1$. Using the inclusion-exclusion principle, $Pr[E_1 \cup E_2 \cup E_3] = p_1 + p_2 + p_3 - p_1p_2 - p_1p_3 - p_2p_3 + p_1p_2p_3 = f(p_1, p_2, p_3).$

Thus, I would like to prove that $\max f(p_1,p_2,p_3)$ subject to $p_1 + p_2 + p_3 = 1$ has solution $p_1 = p_2 = p_3 = 1/3$. In order to do this I used Lagrange multipliers, and after a few manipulations on the equations, I come up with this answer.

THE QUESTION IS: how can I generalize this for $n$ events? More specifically, I want to prove that $\max Pr[E_1 \cup \ldots \cup E_n]$ subject to $p_1 + \ldots + p_n = 1$ has solution $p_1 = \ldots = p_n = 1/n$.

PS: Some directions to solve this problem would be useful too (or a reference where it was already solved).....
Ahem. Are you sure that $p_1 = p_2 = p_3 = 1/3$ maximizes f? What about $p_1 = 1, p_2 = p_3 = 0$ ? Doesn't that give you a bigger f?

6. ## Re: Inclusion-Exclusion - Maximizing Probabilities

I'm assuming for all practical purposes, all the probabilities are non-zero ones (even though zero is a valid probability).

So you would add the constraint that p1 > 0, p2 > 0, ..., pn > 0.

In most real situations, we don't consider zero probabilities especially if we have a population distribution (there's no point in doing so).

7. ## Re: Inclusion-Exclusion - Maximizing Probabilities

This is a case where it might initially be easier to understand written in English than using definitions and sigma-sum notations:

For a generic n, f(p1, p2, ... pn) = (sum p's 1 at a time) - sum(p's 2 at a time) + sum(p's 3 at a time) - + ... (-1)^(n+1) sum(p's n at a time).

Note sum(p's n at a time) is just the product of all the p's.
Let's declare the domain of f to be: all the p's are in [0, 1).

Now fix integer k, 1<=k<=n
(sum p's 1 at a time) = pk + (sum p's other than pk, 1 at a time)
(sum p's 2 at a time) = pk(sum p's other than pk, 1 at a time) + (sum p's other than pk, 2 at a time)
(sum p's 3 at a time) = pk(sum p's other than pk, 2 at a time) + (sum p's other than pk, 3 at a time)
(sum p's 4 at a time) = pk(sum p's other than pk, 3 at a time) + (sum p's other than pk, 4 at a time)
...
(sum p's (n-1) at a time) = pk(sum p's other than pk, (n-2) at a time) + (sum p's other than pk, (n-1) at a time)
(sum p's n at a time) = pk(sum p's other than pk, (n-1) at a time) + (0)

That 0 is because there aren't n p's there once you exclude pk, so (sum p's other than pk, n at a time) isn't defined. Look back at f and you'll see this must be zero.

With that in mind, let $s_k(j)(p_1, p_2, ... p_n)$ = (sum p's other than pk, taken j at a time) for 0<=j<=n.

Note that $s_k(n) = 0$. Also, will define $s_k(0) = 1$. (This is just a matter of convenience for later sums.)

Note that the *value* of $s_k(1) = 1 - p_k$ when the values of the p's are evaluated, but that's not how the function itself is defined, but rather is the arithmetic consequence of our knowing the constraint that the sum of the p's is 1. Keeping this straight is conceptually important since we'll be taking derivatives.

I want to make clear that, by definition, $s_k(1) = p_1 + p_2 + ... + \hat{p_k} + ... + p_n$.

Note that, by construction (and with the $s_k(1)$ case hopefully clarified), have: $\frac{\partial s_k(j)}{\partial p_k} = 0 \ \forall j$.

Now from these "(sum p's 3 at a time) = pk(sum p's other than pk, 2 at a time) + (sum p's other than pk, 3 at a time)" get:

(sum p's j at a time) $= p_k s_k(j-1) + s_k(j), 1 \le j \le n$.

Check boundary special cases:

(sum p's 1 at a time) $= p_k s_k(0) + s_k(1) = p_k (1) + s_k(1) = p_k + s_k(1)$. Check.

(sum p's n at a time) $= p_k s_k(n-1) + s_k(n) = p_k s_k(n-1) + (0) = p_k s_k(n-1)$. Check.

All that is to get this (and "this" is to prepare to solve using Lagrange Multipliers):

$f(p_1, p_2, ... p_n) = \sum_{i=1}^n (-1)^{i+1}$ (all the p's taken i at a time)

$f(p_1, p_2, ... p_n) = \sum_{i=1}^n (-1)^{i+1} (p_k s_k(i-1) + s_k(i))$

And thus: $\frac{\partial f}{\partial p_k} = \sum_{i=1}^n (-1)^{i+1} s_k(i-1)$

So far, it's all formalizing/defining/structuring the problem. No meat. Here comes the meat: Will now algebraically manipulate so that can write $\frac{\partial f}{\partial p_k}$ in terms of $f$ and $p_k$ only:

$p_k\frac{\partial f}{\partial p_k} = \sum_{i=1}^n (-1)^{i+1} p_k s_k(i-1) = \sum_{i=1}^n (-1)^{i+1} ( p_k s_k(i-1) + s_k(i) - s_k(i) )$

$= \sum_{i=1}^n (-1)^{i+1} ( p_k s_k(i-1) + s_k(i) ) - \sum_{i=1}^n (-1)^{i+1} s_k(i) = f - \sum_{i=1}^n (-1)^{i+1} s_k(i)$

(now set i = j-1 in the summation) $= f - \sum_{j=2}^{n+1} (-1)^{j} s_k(j-1) = f + \sum_{j=2}^{n+1} (-1)^{j+1} s_k(j-1)$

$= f - (-1)^{1+1} s_k(1-1) + (-1)^{1+1} s_k(1-1) + \sum_{j=2}^{n+1} (-1)^{j+1} s_k(j-1)$

$= f - (-1)^{1+1} s_k(1-1) + \sum_{j=1}^{n+1} (-1)^{j+1} s_k(j-1)$

$= f - (-1)^{1+1} s_k(1-1) + (-1)^{(n+1)+1} s_k((n+1)-1) + \sum_{j=1}^{n} (-1)^{j+1} s_k(j-1)$

$= f - s_k(0) + (-1)^n s_k(n) + \sum_{j=1}^{n} (-1)^{j+1} s_k(j-1)$

$= f - (1) + (0) + \sum_{j=1}^{n} (-1)^{j+1} s_k(j-1)$

$= f - 1 + \sum_{j=1}^{n} (-1)^{j+1} s_k(j-1) = f - 1 + \frac{\partial f}{\partial p_k}$.

Have just shown that: $p_k\frac{\partial f}{\partial p_k} = f - 1 + \frac{\partial f}{\partial p_k}$.

(Here's why $p_k = 1$ had to be excluded.) Solving that for $\frac{\partial f}{\partial p_k}$ gives:

$\frac{\partial f}{\partial p_k} = \frac{1-f}{1-p_k}$.

Now do Lagrange: Let $g(p_1, p_2, ... , p_n) = p_1 + p_2 + ... + p_n$.

Then the constraint is $g(p_1, p_2, ... , p_n) = 1$, so find staionary points at $(\tilde{p}_1, \tilde{p}_2, ... , \tilde{p}_n)$ via Lagrange:

$\left. \triangledown f\right|_{(\tilde{p}_1, \tilde{p}_2, ... , \tilde{p}_n)} = \lamba \left. \triangledown (g - 1)\right|_{(\tilde{p}_1, \tilde{p}_2, ... , \tilde{p}_n)}$, which gives that:

$\left. \frac{\partial f}{\partial p_k}\right|_{(\tilde{p}_1, \tilde{p}_2, ... , \tilde{p}_n)} = \lambda \left. \frac{\partial g}{\partial p_k}\right|_{(\tilde{p}_1, \tilde{p}_2, ... , \tilde{p}_n)} = \lambda (1) = \lambda \ \forall k \in \{1, 2, ..., n\}$.

Thus

$\left. \frac{\partial f}{\partial p_k}\right|_{(\tilde{p}_1, \tilde{p}_2, ... , \tilde{p}_n)} = \frac{1-f(\tilde{p}_1, \tilde{p}_2, ... , \tilde{p}_n)}{1-\tilde{p}_k} = \lambda \ \forall k \in \{1, 2, ..., n\}$.

(To solve, this will require that $\lambda \ne 0$, so $f(\tilde{p}_1, \tilde{p}_2, ... , \tilde{p}_n) \ne 1$. Check this requirement at the end.)

Have $1-\tilde{p}_k = \frac{1-f(\tilde{p}_1, \tilde{p}_2, ... , \tilde{p}_n)}{\lambda} \ \forall k \in \{1, 2, ..., n\} (\lambda \ne 0)$.

Thus $\tilde{p}_k = 1 - \frac{1-f(\tilde{p}_1, \tilde{p}_2, ... , \tilde{p}_n)}{\lambda} \ \forall k \in \{1, 2, ..., n\} (\lambda \ne 0)$.

So at a stationary point, all the $\tilde{p}_k$ equal the same constant $\left(1 - \frac{1-f(\tilde{p}_1, \tilde{p}_2, ... , \tilde{p}_n)}{\lambda}\right)$,

and so they're all equal to one another. When that's combined with the constraint $\tilde{p}_1 + \tilde{p}_2 + ... + \tilde{p}_n = 1$, it follows that:

If $(\tilde{p}_1, \tilde{p}_2, ... , \tilde{p}_n)$ is a staionary point for $f$, then, if $f(1/n, 1/n, ... , 1/n) \ne 1$,

must have that $\tilde{p}_1 = \tilde{p}_2 = ... = \tilde{p}_n = \frac{1}{n}$.

I'm not going to do the check that $n>1, f(1/n, 1/n, ... , 1/n) \ne 1$, though it is technically necessary.

--------------------------------------
Putting it all together: If none of the p's are 1, then the only stationary point for f is at (1/n, 1/n, 1/n, ... 1/n).
--------------------------------------

Aside: To decide that it's a local maximum, compute the Bordered Hessian at the point (1/n, 1/n, 1/n, ...). You can get all the 2nd partials easily:

$\frac{\partial f}{\partial p_k} = \frac{1-f}{1-p_k} \Rightarrow \frac{\partial^2 f}{\partial p_l\partial p_k} = \frac{\partial}{\partial p_l} \left( \frac{1-f}{1-p_k} \right)$

Then $\frac{\partial}{\partial p_l} \left(\frac{1-f}{1-p_k}\right) = - \frac{1-f}{(1-p_k)(1-p_l)}$ when $l \ne k$, and

$\frac{\partial}{\partial p_k} \left(\frac{1-f}{1-p_k}\right) = \frac{(1-p_k)(- \frac{1-f}{1-p_k}) - (1-f)(-1)}{(1-p_k)^2} = 0$.

Therefore $\frac{\partial^2 f}{\partial p_l\partial p_k} = - \frac{1-f}{(1-p_k)(1-p_l)}$ if $l \ne k$, and $0$ if $l = k$.

8. ## Re: Inclusion-Exclusion - Maximizing Probabilities

Originally Posted by chiro
I'm assuming for all practical purposes, all the probabilities are non-zero ones (even though zero is a valid probability).

So you would add the constraint that p1 > 0, p2 > 0, ..., pn > 0.

In most real situations, we don't consider zero probabilities especially if we have a population distribution (there's no point in doing so).
OK, since my first hint appears to be too subtle:

You have not maximized f by choosing all the probabilities equal. You have minimized f.