Problem setup:

X1, X2... are i.i.d. random variables drawn according to PMF p(x), x $\displaystyle \in$ {1, 2, ... , m}. Thus, $\displaystyle p(x_1, x_2,..., x_n) = \prod_{i=1}^{n}p(x_i)$. We know that $\displaystyle -\frac{1}{n}log(p(X_1, X_2,..., X_n)) \to H(X)$ in probability. Let$\displaystyle q(x_1, x_2,...,x_n) = \prod_{i=1}^{n}q(x_i)$, where q is another PMF on {1, 2,...,m}.

Evaluate $\displaystyle lim -\frac{1}{n}log(q(X_1,X_2,...,X_n)),$ where X1, X2,... are i.i.d. ~p(x).

So the tricky part is that the X's are still distributed according to p(x), but now I should compute instead what the entropy would be using the wrong PMF. Or is that even the entropy I'm computing? It looks more like the entropy rate.

Solution:

I tried to solve the problem like this, do you agree, and if you do, perhaps you can see an alternative solution?

$\displaystyle lim -\frac{1}{n}log(q(X_1,X_2,...,X_n)) = -\frac{1}{n} \sum_{i=1}^{n}(log(q(x_i))) $ // need to introduce a relation between q and p since we know what this expression evaluates to when p(x) is plugged in

$\displaystyle = \left\{q(x_i} = p(x_i) + r_i\right\}$ // r_i is the difference between the real and the incorrect probability of r.v. number i

$\displaystyle = - \frac{1}{n} \sum_{i=1}^{n} log p(x_i) + r_i$

$\displaystyle = - \frac{1}{n} \sum_{i=1}^{n} log p(x_i) - \frac{1}{n} \sum_{i=1}^{n} r_i $

$\displaystyle = \left\{\frac{1}{n} \sum_{i=1}^{n} r_i = R\right\}$

$\displaystyle = H(X) - R$