The Teachman index is often used as an estimator of diversity in the social sciences. It is defined as follows:
$$TI = -\sum_{k} p_k \ln(p_k)$$
where $p_k$ is the proportion of group members in the $k$-th category. The index is largest when the sample is distributed equally across all $K$ categories, in which case $TI = \ln K$.
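For concreteness, here is a minimal sketch of computing the index from category counts. The function name `teachman_index` is hypothetical, and skipping empty categories assumes the usual convention $0 \cdot \ln 0 = 0$.

```python
import math
from typing import Sequence

def teachman_index(counts: Sequence[int]) -> float:
    """Return TI = -sum_k p_k * ln(p_k) for a list of category counts.

    Empty categories are skipped, following the convention 0 * ln(0) = 0.
    """
    n = sum(counts)
    if n <= 0:
        raise ValueError("counts must sum to a positive number")
    ti = 0.0
    for c in counts:
        if c > 0:
            p = c / n  # proportion in this category
            ti -= p * math.log(p)
    return ti

# An equal spread over K = 4 categories gives the maximum, ln(4).
print(teachman_index([25, 25, 25, 25]))  # ≈ 1.386
print(teachman_index([97, 1, 1, 1]))     # ≈ 0.168, far less diverse
```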
Unfortunately, when the $p_k$ are estimated from a sample, this measure is biased: it systematically underestimates the diversity, and the bias is largest for small sample sizes $n$. This is comparable to the biased estimate of the variance you get when dividing by $n$ instead of $n-1$.
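A small simulation can make this underestimation visible. It is a sketch under assumed conditions: the population is taken to be uniform over $K = 5$ categories (so the true index is $\ln 5$), and the sample sizes, number of replications and random seed are arbitrary choices. The function names are hypothetical.

```python
import math
import random
from collections import Counter

def plug_in_ti(sample):
    """TI computed from the observed proportions in one sample."""
    n = len(sample)
    # Counter only holds categories that actually occur, so log(0) never happens.
    return -sum((c / n) * math.log(c / n) for c in Counter(sample).values())

def mean_ti(n, k, reps, rng):
    """Average plug-in TI over `reps` samples of size n drawn uniformly from k categories."""
    total = 0.0
    for _ in range(reps):
        sample = [rng.randrange(k) for _ in range(n)]
        total += plug_in_ti(sample)
    return total / reps

rng = random.Random(1)
k = 5
true_ti = math.log(k)  # population value for a uniform distribution
results = {n: mean_ti(n, k, reps=1000, rng=rng) for n in (10, 30, 100, 1000)}
for n, m in results.items():
    print(f"n={n:5d}  mean TI={m:.3f}  true TI={true_ti:.3f}  bias={m - true_ti:+.3f}")
```

The average estimate falls short of $\ln 5$ for every $n$, and the shortfall shrinks as the sample grows, mirroring the $n$ versus $n-1$ variance situation.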
I wonder whether there is a correction to this formula that yields unbiased estimates of the index. I am not a statistician, so any hints are welcome, even obvious ones.