By convention, the term 0 * log_a(0) is taken to be 0 for any valid base a (a > 0, a ≠ 1), because p * log_a(p) tends to 0 as p approaches 0 from above. Is this part of a data mining course (or similar), or part of a theoretical course on information theory and statistics?
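A minimal sketch of the convention in Python, using base 2 to match the question. The helper names `plogp` and `entropy` are illustrative, not from any library. The loop shows p * log2(p) shrinking toward 0 as p gets small. The entropy helper treats the p = 0 term as 0 instead of calling `math.log2(0)`, which would raise a `ValueError`.

```python
import math

def plogp(p: float) -> float:
    """Return p * log2(p), using the convention 0 * log2(0) = 0."""
    if p == 0.0:
        return 0.0  # the limit of p*log2(p) as p -> 0+ is 0
    return p * math.log2(p)

def entropy(probs) -> float:
    """Shannon entropy in bits; zero-probability terms contribute 0."""
    return sum(-plogp(p) for p in probs)

# p*log2(p) gets closer to 0 as p -> 0+.
for p in (0.1, 0.01, 0.001, 1e-6):
    print(p, plogp(p))

print(entropy([1.0, 0.0]))  # a pure node has zero entropy
```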
We have 7 training examples: 3 positive and 4 negative. The observed attribute A can take 2 values, V1 and V2. All training examples with A = V1 are positive; among the examples with A = V2, only one is positive. What is the information gain for attribute A?
I was calculating this with the formulas from Wikipedia, and I get log2(0) * 0, which is not defined.
Entropy for a class is
Is this also incorrect?
Can someone explain this number by number? Thank you for your help.
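A number-by-number sketch of the calculation, assuming the split implied by the question. Three examples are positive and only one of them has A = V2, so the other 2 positives have A = V1. That means A = V1 covers 2 examples (2 positive, 0 negative) and A = V2 covers 5 examples (1 positive, 4 negative). The `entropy` helper is illustrative. It skips zero counts, which is exactly the 0 * log2(0) = 0 convention.

```python
import math

def entropy(counts):
    """Entropy in bits of a class distribution given as counts.

    Classes with a count of 0 are skipped, which implements 0 * log2(0) = 0.
    """
    total = sum(counts)
    return sum(-(c / total) * math.log2(c / total) for c in counts if c > 0)

# Class counts [positive, negative] derived from the question.
parent = [3, 4]            # all 7 examples
split = {
    "V1": [2, 0],          # A = V1: both examples positive
    "V2": [1, 4],          # A = V2: one positive, four negative
}

n = sum(parent)
h_parent = entropy(parent)
print(f"H(S)      = {h_parent:.4f}")

weighted = 0.0
for value, counts in split.items():
    h = entropy(counts)
    weight = sum(counts) / n
    weighted += weight * h      # each branch weighted by its share of examples
    print(f"H(S_{value}) = {h:.4f}  weight {sum(counts)}/{n}")

gain = h_parent - weighted
print(f"Remainder = {weighted:.4f}")
print(f"Gain(S,A) = {gain:.4f}")
```

Running it gives the following values:

- H(S) of about 0.9852 bits.
- H(S_V1) = 0. This is the pure branch, where the 0 * log2(0) term appears.
- H(S_V2) of about 0.7219.
- A weighted remainder of (5/7)(0.7219), about 0.5157.
- An information gain of about 0.4696 bits.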
In terms of information gain, the idea is to reach a lower entropy after splitting on the attribute. Lower entropy corresponds to a better ability to model and predict the outcomes.
What kind of probabilities are you dealing with? Are they conditional probabilities? (Are you computing information gain from conditional probabilities and updates?)
If you post the specific formulas, I can help decipher what is going on.
This formula computes what is called mutual information, a measure of how much information two random variables share. The higher the value, the more the variables have in common. The Wikipedia entry does a good job of explaining this:
Mutual information - Wikipedia, the free encyclopedia
Basically, the higher the value, the better one variable explains the other; that is, the more dependent they are on each other. A value of zero means they are independent.
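The block below sketches the standard identity linking mutual information to entropy. It uses base-2 logs, so the result is in bits. The attribute A from the question plays the role of X and the class plays the role of Y. The last line shows that the information gain of A is exactly I(Class; A). It assumes the 2/5 split of the 7 examples implied by the question. Mutual information satisfies I(X;Y) ≥ 0, with equality exactly when X and Y are independent.

```latex
I(X;Y) = \sum_{x}\sum_{y} p(x,y)\,\log_2 \frac{p(x,y)}{p(x)\,p(y)}
       = H(Y) - H(Y \mid X),
\qquad
H(Y \mid X) = \sum_{x} p(x)\, H(Y \mid X = x)

I(\mathrm{Class};A) = H(\mathrm{Class})
  - \tfrac{2}{7}\,H(\mathrm{Class} \mid A = V_1)
  - \tfrac{5}{7}\,H(\mathrm{Class} \mid A = V_2)
  \approx 0.985 - \tfrac{2}{7}(0) - \tfrac{5}{7}(0.722)
  \approx 0.470
```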
From a data mining perspective, you are discovering relationships between variables. By doing this you can eliminate redundancy and find a minimal set of independent variables that account for the variation in the data. Once you have a good idea of these variables, you can interpret what is going on in the context of your data.
Basically, the formulas just take probabilities estimated from the data, and you plug these into the formula. You then interpret the result as a measure of how strongly the two random variables are related. A high value indicates strong dependence, and a value near zero indicates near independence.
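As an illustration of "plugging in" the data, here is a minimal plug-in (relative-frequency) estimate of mutual information. It computes I(X;Y) from raw (x, y) samples using the joint-probability form of the formula. The `records` list is hypothetical. It is rebuilt from the counts in the question (2 examples with A = V1, all positive; 5 with A = V2, one positive). Only observed pairs enter the sum, so the 0 * log(0) convention is applied implicitly.

```python
import math
from collections import Counter

# Hypothetical raw records (attribute value, class label) reconstructed
# from the counts in the question.
records = [("V1", "+")] * 2 + [("V2", "+")] * 1 + [("V2", "-")] * 4

def mutual_information(pairs):
    """Plug-in estimate of I(X;Y) in bits from a list of (x, y) samples."""
    n = len(pairs)
    joint = Counter(pairs)                 # counts of each (x, y) pair
    px = Counter(x for x, _ in pairs)      # marginal counts of x
    py = Counter(y for _, y in pairs)      # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():        # only observed pairs, so p(x,y) > 0
        pxy = c / n
        mi += pxy * math.log2(pxy / ((px[x] / n) * (py[y] / n)))
    return mi

print(f"I(A; class) = {mutual_information(records):.4f} bits")
```

For these records the estimate comes out to about 0.4696 bits. This matches the information gain obtained from the entropy-difference form, H(Class) - H(Class | A).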