well... I don't want to get in over my head w/out having reviewed some of this stuff I haven't done in 30 yrs.

but basically entropy is the expected value of the negative log of the distribution (i.e., the expected "surprise").

for a discrete rv X, H(X) = -Sum[p_k ln(p_k)] and the units are nats because we used the natural log. Use log_2 for bits.

for a continuous rv you have differential entropy given by h(X) = -Integral[p(x) ln(p(x)) dx, over the support of X]

It's a measure of how random a distribution is. A drv X ~ {0.8, 0.1, 0.05, 0.05} for example has less entropy than Y ~ {0.25, 0.25, 0.25, 0.25}
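You can check that example numerically. A quick sketch in Python (the helper name `entropy_bits` is just mine):

```python
import math

def entropy_bits(probs):
    """Shannon entropy in bits: H = -sum(p * log2(p)), skipping zero probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

H_X = entropy_bits([0.8, 0.1, 0.05, 0.05])    # peaked distribution, H_X ≈ 1.02 bits
H_Y = entropy_bits([0.25, 0.25, 0.25, 0.25])  # uniform distribution, H_Y = 2.0 bits
```

The uniform distribution maximizes entropy for a given number of outcomes, so Y sits at the 2-bit ceiling for four outcomes while the peaked X comes in around half that.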

It's also a measure of how expensive it is to represent the underlying random variable in terms of units of information. It takes less information to represent low entropy sources, and more to represent high entropy sources. This is important for compression and other types of efficient source coding.
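One crude way to see the coding-cost connection: assign each symbol a codeword of length ceil(-log2 p), the Shannon code construction. The expected length then lands within one bit of the entropy. A sketch (function names are mine):

```python
import math

def shannon_code_lengths(probs):
    # Shannon code: give symbol i a codeword of length ceil(-log2 p_i)
    return [math.ceil(-math.log2(p)) for p in probs]

def expected_length(probs):
    # average bits per symbol under those code lengths
    return sum(p * l for p, l in zip(probs, shannon_code_lengths(probs)))

low = expected_length([0.8, 0.1, 0.05, 0.05])     # ≈ 1.7 bits/symbol
high = expected_length([0.25, 0.25, 0.25, 0.25])  # 2.0 bits/symbol
```

So the low-entropy source is cheaper per symbol on average, which is exactly why compressors care about entropy.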

I'll leave it to you to investigate further.