Hi,
I am trying to characterise a distribution by plotting its entropy between that of maximum uncertainty (i.e. a uniform distribution) and that of a normal distribution. The entropy of the real data lies somewhere between the entropies of the uniform and normal distributions.
Because entropy is the negative sum of probability * log(probability) over the states, increasing the number of bins 'n' in my pdf increases the entropy of my distribution.
The result could therefore be misleading. I could use a very high 'n', which would show the distribution of my data has a low entropy (closer to the normal distribution), or I could run the same analysis with a very low 'n', which would show my data having a high entropy.
I have looked at normalizing the entropy by dividing by log(n). This ensures that the entropy of a uniform distribution is constant at '1' (the highest degree of uncertainty). However, the normalized entropy of a Gaussian distribution does not remain constant as 'n' increases. By my calculations, a normal distribution with mean '0' and variance 0.02 has a normalized entropy that oscillates between 0 and 0.3 for n = 1 to 50, then increases monotonically to 0.539 at n = 1,000, and continues rising to 0.6524 at n = 10,000.
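To make the calculation above concrete, here is a minimal sketch of the normalized binned entropy. It is only an illustration under stated assumptions: the support [-1, 1] is a guess (no range is given above), the variance 0.02 is read as sigma = sqrt(0.02), and the function names are hypothetical. Different choices will give different numbers from the ones quoted.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of a normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def binned_probs(n, a, b, mu, sigma):
    """Probability mass of a normal(mu, sigma) in n equal-width bins on [a, b].

    The interval [a, b] is an assumption; the masses are renormalized so
    they sum to 1 after truncating the tails outside [a, b].
    """
    edges = [a + (b - a) * i / n for i in range(n + 1)]
    cdf = [normal_cdf(e, mu, sigma) for e in edges]
    probs = [cdf[i + 1] - cdf[i] for i in range(n)]
    total = sum(probs)
    return [p / total for p in probs]

def shannon_entropy(probs):
    """Discrete entropy -sum(p * ln p), skipping empty bins."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def normalized_entropy(probs):
    """Entropy divided by ln(n), so a uniform distribution gives 1."""
    n = len(probs)
    if n < 2:
        return 0.0  # ln(1) = 0, so the ratio is undefined for a single bin
    return shannon_entropy(probs) / math.log(n)

if __name__ == "__main__":
    sigma = math.sqrt(0.02)  # assumption: 0.02 is the variance
    for n in (10, 100, 1000, 10000):
        h = normalized_entropy(binned_probs(n, -1.0, 1.0, 0.0, sigma))
        print(n, round(h, 4))
```

Running the loop for a range of 'n' shows how the normalized value drifts with the number of bins while the uniform case stays at 1.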
Is there any resolution to this challenge in the statistical community? How do we decide on the correct 'n' for describing this distribution?
I see that the differential entropy for a normal distribution is ln(standardDeviation*sqrt(2*pi*e)) and for a uniform distribution on [a, b] is ln(b-a).
Therefore I could set the normalized differential entropy to '1' for the uniform distribution and define the normalized differential entropy of the Gaussian distribution as:
normalized differential entropy of Gaussian = ln(standardDeviation*sqrt(2*pi*e))/ln(b-a)
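As a quick check of this ratio, a small sketch follows. The function names are hypothetical, and the example values (sigma = sqrt(0.02), interval [-1, 1]) are assumptions chosen only for illustration.

```python
import math

def gaussian_differential_entropy(sigma):
    """ln(sigma * sqrt(2*pi*e)), in nats."""
    return math.log(sigma * math.sqrt(2.0 * math.pi * math.e))

def uniform_differential_entropy(a, b):
    """ln(b - a), in nats, for a uniform distribution on [a, b]."""
    return math.log(b - a)

def entropy_ratio(sigma, a, b):
    """Ratio of the Gaussian to the uniform differential entropy.

    Undefined when b - a == 1, because ln(1) = 0.
    """
    h_uniform = uniform_differential_entropy(a, b)
    if h_uniform == 0.0:
        raise ValueError("ln(b - a) is zero, so the ratio is undefined")
    return gaussian_differential_entropy(sigma) / h_uniform

if __name__ == "__main__":
    # Assumed example values, not taken from real data.
    print(entropy_ratio(math.sqrt(0.02), -1.0, 1.0))
```

One thing this makes visible: differential entropy can be negative (here the Gaussian term is negative because sigma is small), and ln(b-a) changes sign depending on whether the interval is wider or narrower than 1, so the ratio is not confined to [0, 1] and depends on the units of the data.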
Yet this would still not resolve the problem that different choices of 'n' give different entropies for my discrete distributions.
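For reference, the textbook relation between the entropy of a finely binned variable and its differential entropy may help explain the dependence on 'n'. The symbols here are notation introduced for this note, not taken from the post: H_n is the discrete entropy with n equal-width bins on [a, b], h(X) is the differential entropy, and Δ is the bin width. The approximation assumes a smooth density and small bins.

```latex
H_n \approx h(X) - \ln\Delta, \qquad \Delta = \frac{b-a}{n}
\quad\Longrightarrow\quad
\frac{H_n}{\ln n} \approx 1 + \frac{h(X) - \ln(b-a)}{\ln n}
```

Under this approximation the uniform case gives exactly 1, since h(X) = ln(b-a), while any more concentrated distribution sits below 1 and creeps towards 1 as 'n' grows. The quantity H_n + ln Δ, on the other hand, should be roughly independent of 'n'.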

