# Thread: Asymptotic expansion of a discrete function

1. ## Asymptotic expansion of a discrete function

Hello,

I was wondering how to derive the asymptotic expansion of the following function:

$
L = -\sum_{i=1}^{N}\left( p_{i} \log_{2}p_{i}\right)
$

where

$p_{i}$

is the probability function of the i-th event, say.

I don't even know where to start, or if asymptotic expansion can be done.
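For reference, the quantity $L$ itself is easy to compute numerically; here is a minimal Python sketch (the probabilities are hypothetical, and zero-probability terms are dropped by the usual $0\log 0 = 0$ convention):

```python
import math

def entropy(p):
    """Shannon entropy in bits: L = -sum_i p_i * log2(p_i).
    Terms with p_i == 0 contribute 0 by convention."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

# Uniform distribution over N = 4 outcomes: entropy is log2(4) = 2 bits.
print(entropy([0.25, 0.25, 0.25, 0.25]))  # 2.0
```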

2. Originally Posted by EmpSci
Hello,

I was wondering how to derive the asymptotic expansion of the following function:

$
L = -\sum_{i=1}^{N}\left( p_{i} \log_{2}p_{i}\right)
$

where

$p_{i}$

is the probability function of the i-th event, say.

I don't even know where to start, or if asymptotic expansion can be done.
Asymptotic expansion with respect to N? We obviously need to know how $p_i$ depends on N to do that...

3. Originally Posted by Laurent
Asymptotic expansion with respect to N? We obviously need to know how $p_i$ depends on N to do that...
Well, if you have a set of N elements, and $p_i$ is the probability of the i-th element of this set, then that is how $p_i$ relates to N. In fact, $p_i=P(X=\text{some value in the set})$. If you want, you can disregard N; I just need to see how such an expansion could be done.

4. I believe I should be more explicit in what I am looking for.

I have the following discrete function:

$
H = \sum_{i=1}^{N}\left(p_{i} \log_{2}\frac{1}{p_{i}}\right)
$

where $p_{i}$ is the probability of the $i^{th}$ object in the indexed set $1,2,...,N$. Consider this probability distribution to be a theoretical distribution.

Now, from some experiments, say that I have collected the empirical probability distribution $q_{i}$ in order to build the following function:

$
S = \sum_{i=1}^{N}\left(q_{i} \log_{2}\frac{1}{q_{i}}\right)
$

Obviously, these two functions are mathematical expectations of $-\log_{2}p_{i}$ and $-\log_{2}q_{i}$, where the first function, $H$, is the theoretical expectation, while the second, $S$, is the observed expectation. Therefore, I want to quantify the error involved in terms of an asymptotic expansion in the difference between observed and theoretical probabilities.

That is, given the $i^{th}$ error term, $e_{i} = q_{i} - p_{i}$, I am looking for an asymptotic expansion on $e_{i}$ in order to provide an asymptotic argument on the discrepancy between the two expectations above.

Hence, I am looking to express the error $E = S - H$ in terms of some asymptotic series $\sum_{k=1}^{K}a_{k}g_{k}$ so that it can be the asymptotic expansion of function $E$. In other words, I want to find that series so that:

$
E = \sum_{k=1}^{K}a_{k}g_{k} + o\left(g_{K}\right)
$

I don't even know how to start looking for $g$. Any insights would be encouraging for me to explore further.
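To make the setting concrete, here is a small Python sketch (the theoretical distribution and sample size are hypothetical) that draws an empirical distribution $q$ from a theoretical $p$ and evaluates the discrepancy $E = S - H$:

```python
import math
import random

def entropy_bits(dist):
    """H = sum_i d_i * log2(1/d_i), skipping zero-probability terms."""
    return -sum(x * math.log2(x) for x in dist if x > 0)

random.seed(0)
p = [0.5, 0.3, 0.2]        # hypothetical theoretical distribution
n = 10_000                 # hypothetical number of observations

# Build the empirical distribution q from n samples of p.
counts = [0, 0, 0]
for _ in range(n):
    counts[random.choices(range(3), weights=p)[0]] += 1
q = [c / n for c in counts]

H, S = entropy_bits(p), entropy_bits(q)
E = S - H
print(H, S, E)             # E is small when q is close to p
```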

5. Originally Posted by EmpSci
I believe I should be more explicit in what I am looking for.
Indeed!

--
You may be looking for the following: $(x+h)\log (x+h)-x\log x\sim (\log x +1)h$ (from the derivative of $x\log x$) when $h\to 0$; hence, up to the overall sign and the base-2 factor in your definitions, $S-H=\sum_{i=1}^N (q_i\log q_i-p_i\log p_i) = \sum_{i=1}^N \left((\log p_i+1)e_i+o(e_i)\right)$ when $e_i\to 0$.

If necessary, you can make the remainder precise using a second-order Taylor inequality: $|(x+h)\log (x+h)-x\log x-(\log x+1)h|\leq \frac{h^2}{2(x-h_0)}$ for any $|h|\leq h_0<x$. In particular, if the points $p_i$ are fixed, $\left|S-H-\sum_{i=1}^N(\log p_i+1)e_i\right|\leq\frac{N}{2(\min_i p_i-\epsilon)}\sum_{i=1}^N e_i^2$ as soon as $q_i\geq \min_j p_j-\epsilon$ for all $i$.
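As a quick numerical sanity check of the linearization (Python, natural logarithm, hypothetical perturbation values chosen to sum to zero so that $q$ stays a distribution), one can compare the exact difference $\sum(q_i\log q_i - p_i\log p_i)$ against the linear term built from the derivative of $x\log x$, which is $\log x + 1$:

```python
import math

# First-order check: sum(q*log q - p*log p) ≈ sum((log p + 1)*e),
# with e_i = q_i - p_i small. Values below are hypothetical.
p = [0.5, 0.3, 0.2]
e = [1e-4, -3e-5, -7e-5]          # perturbation summing to 0
q = [pi + ei for pi, ei in zip(p, e)]

exact  = sum(qi * math.log(qi) - pi * math.log(pi) for pi, qi in zip(p, q))
linear = sum((math.log(pi) + 1) * ei for pi, ei in zip(p, e))
print(exact, linear)              # the two agree up to O(e^2)
```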

6. Thank you, Laurent.

Apparently, in the second "more precise" expansion, the linear terms seem to disappear. Does that imply that the discrepancy is negligible?

7. Originally Posted by EmpSci
Thank you, Laurent.

Apparently, in the second "more precise" expansion, the linear terms seem to disappear.
I don't understand; I only put the linear term in the left-hand side.

8. I mean, how would you interpret the right-hand side of the inequality? Could you write that using Big-O notation such as the following:

$\left|S-H-\sum_{i=1}^N(\log p_i+1)e_i\right| = O\left( \frac{N}{2(\min_i p_i-\epsilon)}\sum_{i=1}^N e_i^2 \right)$

9. Ok, here's what I came up with.

The forward difference that you consider:

$
(x+h)\log (x+h)-x\log x
$

is the first-order Newton series approximation of the function $x\log x$:

$
(x+h)\log (x+h)-x\log x\sim (\log x +1)h
$

Then, from this, we approximated function E = S - H as:

$
E = \sum_{i=1}^N \left(e_i f'(p_i)+o(e_i)\right)
$

Now, we may consider approximating the expansion using n-th order Newton series approximations. That is, define the n-th order forward difference as:

$
\Delta_{h}^{n}[f(x)] = \sum_{i=0}^{n}\left( (-1)^{i}C(n,i)f(x+(n-i)h) \right)
$

where C(n,i) is the combination "n choose i".

Then, to obtain the n-th order derivative approximation of f, we have:

$
\frac{\Delta_{h}^{n}[f(x)]}{h^{n}} \sim f^{(n)}(x)
$

Or, we may write:

$
\Delta_{h}^{n}[f(x)] \sim f^{(n)}(x) h^{n}
$
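As a sketch (Python, natural logarithm, hypothetical step size), one can check numerically that the $n$-th forward difference divided by $h^n$ does approach $f^{(n)}(x)$; here for $n=2$ with $f(x)=x\log x$, whose second derivative is $1/x$:

```python
import math

def fwd_diff(f, x, h, n):
    """n-th order forward difference:
    sum_{i=0}^{n} (-1)^i * C(n, i) * f(x + (n - i)*h)."""
    return sum((-1)**i * math.comb(n, i) * f(x + (n - i) * h)
               for i in range(n + 1))

f = lambda x: x * math.log(x)       # natural log for simplicity
x, h = 0.5, 1e-4                    # hypothetical point and step
approx = fwd_diff(f, x, h, 2) / h**2
print(approx, 1 / x)                # f''(x) = 1/x = 2.0
```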

And, the general expression for our discrepancy function E = S - H becomes:

$
E=S-H = \sum_{i=1}^{N}\left( f^{(n)}(p_i) e_{i}^{n} + o(e_{i}^{n}) \right)
$

as $e_i \to 0$, naturally.

So, for instance, for $n=2$, and using the function $-x\log_{2}x$, we have the second-order approximation of E equal to:

$
E=S-H = \sum_{i=1}^{N}\left( -\frac{1}{\ln2} \frac{e_{i}^{2}}{p_i} + o(e_{i}^{2}) \right)
$

How would you comment on this approach? I am really interested on your insights and suggestions.

10. Also, by definition $f \sim g$ is equivalent to stating that $f(x)-g(x)=o(g(x))$.

Now, shouldn't the little-o in our functions above be equal to:

$o \left( \frac{-1}{\ln2} \frac{e_{i}^{n}}{p_{i}^{n-1}} \right) = o \left( \frac{e_{i}^{n}}{p_{i}^{n-1}} \right)$

for the n-th order approximation, so that it complies with the definition of asymptotic equivalence?

11. Originally Posted by EmpSci
And, the general expression for our discrepancy function E = S - H becomes:

$
E=S-H = \sum_{i=1}^{N}\left( f^{(n)}(p_i) e_{i}^{n} + o(e_{i}^{n}) \right)
$

as $e_i \to 0$, naturally.

So, for instance, for $n=2$, and using the function $-x\log_{2}x$, we have the second-order approximation of E equal to:

$
E=S-H = \sum_{i=1}^{N}\left( -\frac{1}{\ln2} \frac{e_{i}^{2}}{p_i} + o(e_{i}^{2}) \right)
$
I don't know about this Newton approximation, but there is obviously something wrong with your formulas: I guess you have to sum the approximation terms for $n=1,2,\ldots,$ up to the order you want. For Taylor approximation, for instance, $f(x+h)=f(x)+f'(x)h+\frac{1}{2}f''(x)h^2+\cdots+\frac{1}{n!}f^{(n)}(x)h^n+o(h^n)$, and not just $f(x+h)=f(x)+\frac{1}{n!}f^{(n)}(x)h^n+o(h^n)$, of course.

Anyway, I would be very surprised if you need third order approximation or beyond.
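For what it's worth, here is a minimal numerical check (Python, hypothetical values of $p$ and $e$) that the second-order term needs the $\frac{1}{2!}$ factor, using $f(x)=-x\log_{2}x$, for which $f'(x)=-(\log_{2}x+\frac{1}{\ln 2})$ and $f''(x)=-\frac{1}{x\ln 2}$:

```python
import math

# Second-order Taylor check for f(x) = -x*log2(x):
# f(p + e) - f(p) ≈ f'(p)*e + (1/2)*f''(p)*e**2, with the 1/2! factor included.
f   = lambda x: -x * math.log2(x)
fp  = lambda x: -(math.log2(x) + 1 / math.log(2))   # f'(x)
fpp = lambda x: -1 / (x * math.log(2))              # f''(x)

p, e = 0.3, 1e-3          # hypothetical probability and error term
exact  = f(p + e) - f(p)
taylor = fp(p) * e + 0.5 * fpp(p) * e**2
print(exact, taylor)      # agree up to O(e^3)
```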

Yes, by definition, I need to sum up the terms. Thanks for your input, though I would also appreciate a comment on how to interpret, say, the second-order approximation.