Mutual Information Comparison between Joint and Product of Marginals
I would like to ask about the interpretation of a mutual information graph. We know that the product of two marginal distributions equals the joint distribution if and only if the two random variables are independent. I would like to compare the joint and the product-of-marginals distributions, which can be useful in quantifying just how dependent two random variables are. For instance, in information theory, the mutual information of two random variables (a measure of how much each tells us about the other) is obtained by taking the relative entropy of the actual joint distribution with respect to the product-of-marginals distribution.
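To make that definition concrete, here is a minimal sketch of MI as the relative entropy (KL divergence) between a joint and its product of marginals, using a small hypothetical 2x2 joint table (the numbers are made up for illustration) and a base-10 logarithm as in my setup:

```python
import numpy as np

# Hypothetical joint distribution of two binary variables X and Y
joint = np.array([[0.30, 0.10],
                  [0.15, 0.45]])

px = joint.sum(axis=1)     # marginal of X
py = joint.sum(axis=0)     # marginal of Y
prod = np.outer(px, py)    # product of marginals (the "independent" model)

# MI = KL(joint || product of marginals), summing only over nonzero cells
mask = joint > 0
mi = float(np.sum(joint[mask] * np.log10(joint[mask] / prod[mask])))
print(mi)  # positive, and zero exactly when joint == prod
```

The MI is zero precisely when the joint factorizes, so its size is a direct measure of how far the variables are from independence.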
I have now computed the mutual information (MI) using the standard formula. I varied one of the parameters of the distribution (say p) and performed the integration numerically for each value, so that I can plot MI against p and inspect the result. Please find the graph I obtained attached. The curve looks like a skewed distribution, and my question is how to interpret it. What can we say about it? How much information do the joint and the product of marginals share? We see that for large values of p the MI drops significantly, which suggests that there the product of marginals is a good approximation to the joint. Any other hints? Also, the MI peaks at about 0.035 (for smaller values of p), which is a small value, but small relative to what? For example, this maximum value (0.035) may be negligible for practical purposes, in which case the product of marginals would give an acceptable error everywhere.
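The kind of sweep I performed can be sketched on a toy discrete family. The family below is purely illustrative (a mixture of a perfectly correlated 2x2 joint and an independent one; this p is not the p of my actual distribution), but it reproduces the qualitative behavior: MI shrinks to zero as the joint approaches the product of marginals:

```python
import numpy as np

def mi_base10(joint):
    """MI of a discrete joint, with base-10 logarithm."""
    px = joint.sum(axis=1)
    py = joint.sum(axis=0)
    prod = np.outer(px, py)
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log10(joint[mask] / prod[mask])))

def joint_family(p):
    """Illustrative family: p=0 is perfect correlation, p=1 is independence."""
    corr = np.diag([0.5, 0.5])                   # X == Y always
    indep = np.outer([0.5, 0.5], [0.5, 0.5])     # X, Y independent
    return (1 - p) * corr + p * indep

ps = np.linspace(0.0, 1.0, 21)
mis = [mi_base10(joint_family(p)) for p in ps]
# mis[0] == log10(2) (maximal for binary variables), mis[-1] == 0
```

On an absolute scale, MI in base 10 is bounded above by log10 of the smaller alphabet size (log10(2) ≈ 0.301 for binary variables), which gives one natural yardstick against which a peak of 0.035 can be judged.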
Thanks for your time in reading this. Please see the attached mi.jpg. The logarithm base was set to 10 (so the MI is measured in hartleys).