Trantor75

The Teachman index is often used as an estimator of diversity in the context of social sciences. It is defined as follows:

TI = - sum[ pk*ln(pk)]

where pk is the proportion of group members in the *k*th category and the sum runs over all categories. The index is therefore at its maximum when the sample is distributed equally across the categories.
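To make the definition concrete, here is a minimal Python sketch of how I understand the calculation. The function name `teachman_index` and the two example groups are made up purely for illustration; nothing here comes from a particular library.

```python
import math
from collections import Counter

def teachman_index(labels):
    """Plug-in Teachman index: -sum(pk * ln(pk)) over the observed categories."""
    n = len(labels)
    counts = Counter(labels)  # number of group members per category
    return -sum((c / n) * math.log(c / n) for c in counts.values())

# Hypothetical groups of 20 members spread over 4 categories
even = ["a", "b", "c", "d"] * 5            # 5 members in each category
skewed = ["a"] * 17 + ["b", "c", "d"]      # almost everyone in one category

print(teachman_index(even))    # equal split: the value is ln(4)
print(teachman_index(skewed))  # unequal split: a smaller value
```

The equal split gives the largest value possible for four categories, and any unequal split gives something smaller.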

Unfortunately, this measure is biased: it underestimates diversity when the sample size (n) is small. This is comparable to the biased estimate of the variance that you get if you divide by n rather than by (n-1).
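The underestimation can be checked directly in a simple case. The sketch below assumes a population with two equally likely categories (my own choice for illustration), so the true index is ln(2), and computes the exact expected value of the sample index by summing over every possible split of n members. The function names are hypothetical.

```python
import math

def index_from_counts(counts):
    """Teachman index computed from category counts; empty categories contribute 0."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)

def expected_sample_index(n):
    """Exact expected sample index for n draws from two equally likely categories."""
    total = 0.0
    for x in range(n + 1):
        prob = math.comb(n, x) * 0.5 ** n   # binomial probability of x members in category 1
        total += prob * index_from_counts([x, n - x])
    return total

true_value = math.log(2)  # index of the assumed population
for n in (5, 20, 100):
    print(n, expected_sample_index(n), true_value)
```

In this setup the expected sample index stays below ln(2) for every n, and the gap shrinks as n grows.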

I wonder whether there is a correction to this formula that yields unbiased estimates of the index. I am not a statistician, so any hints are helpful, even the obvious ones...

