
Information gain AI
We have 7 learning examples: 3 positive and 4 negative. The observed attribute A can take 2 values, V1 and V2. All learning examples with A = V1 are positive. Among the examples with A = V2, just one is positive. What is the information gain of attribute A?
I was calculating this with the formulas from Wikipedia, and I get a 0 · log2(0) term, which is not defined.
The entropy of the class is $\displaystyle \frac{3}{7}\log\left(\frac{3}{7}\right) + \frac{4}{7}\log\left(\frac{4}{7}\right)$
Is this also incorrect?
Can someone explain this step by step, with the actual numbers? Thank you for your help.

Re: Information gain AI
Hey Nforce.
The term $\displaystyle 0 \cdot \log_a(0)$ is taken to be 0 for any valid base a, because $\displaystyle \lim_{p \to 0^+} p \log_a p = 0$. Is this part of a data mining course (or similar) or part of a theoretical course on information theory and statistics?
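A minimal Python sketch of that convention (the function name `entropy` is a hypothetical choice, not from the thread): zero counts are skipped, so the undefined 0 · log2(0) is never evaluated and simply contributes 0. The counts in the usage lines follow from the question: A = V1 covers 2 positive and 0 negative examples.

```python
import math

def entropy(counts):
    """Entropy in bits of a class distribution given as raw counts.

    Zero counts are skipped, which implements the convention 0 * log2(0) = 0.
    """
    n = sum(counts)
    h = 0.0
    for c in counts:
        if c == 0:
            continue  # the limit of p * log2(p) as p -> 0+ is 0
        p = c / n
        h -= p * math.log2(p)
    return h

# The A = V1 subset: 2 positive, 0 negative examples.
print(entropy([2, 0]))  # 0.0
# The whole learning set: 3 positive, 4 negative examples.
print(round(entropy([3, 4]), 4))  # 0.9852
```

A pure subset has entropy 0, which is exactly the case the question runs into for A = V1.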

Re: Information gain AI
It's for a data mining (machine learning) course. Do you understand this? We didn't do any examples in class, so I don't really know if I am doing it right.

Re: Information gain AI
In terms of information gain, the idea is to reduce entropy: a lower entropy corresponds to a better ability to model and predict the outcomes.
What kind of probabilities are you dealing with? Are they conditional probabilities (i.e., are you computing information gain from conditional probabilities and updates)?
If you post the specific formulae, I can work out what is going on for you.
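For reference, the standard textbook definition (not necessarily the exact form used in your course) measures information gain as the drop in class entropy after splitting the examples on attribute A:
$\displaystyle \mathrm{Gain}(A) = H_R - H_{R|A}, \qquad H_{R|A} = \sum_j P(A = v_j)\, H_{R|A=v_j}$
Each $\displaystyle H_{R|A=v_j}$ is the class entropy computed only over the examples with $\displaystyle A = v_j$. If every subset is pure, $\displaystyle H_{R|A} = 0$ and the gain equals the full class entropy $\displaystyle H_R$.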

Re: Information gain AI
http://i.imgur.com/bPdG5Lh.png
$\displaystyle n$ is the number of learning examples
$\displaystyle n_k$ is the number of learning examples from class $\displaystyle r_k$
$\displaystyle n_{.j}$ is the number of learning examples with the j-th value of attribute $\displaystyle A_i$
$\displaystyle n_{kj}$ is the number of learning examples from class $\displaystyle r_k$ with the j-th value of attribute $\displaystyle A_i$
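The linked image is not reproduced here. Assuming it shows the usual count-based form of information gain (an assumption; check it against the image), the counts defined above plug in as follows, with base-2 logarithms and every $\displaystyle 0 \cdot \log_2 0$ term taken as 0:
$\displaystyle H_R = -\sum_k \frac{n_k}{n}\log_2\frac{n_k}{n}, \qquad H_{A_i} = -\sum_j \frac{n_{.j}}{n}\log_2\frac{n_{.j}}{n}, \qquad H_{RA_i} = -\sum_k \sum_j \frac{n_{kj}}{n}\log_2\frac{n_{kj}}{n}$
$\displaystyle \mathrm{Gain}(A_i) = H_R + H_{A_i} - H_{RA_i}$
This quantity is identical to $\displaystyle H_R - H_{R|A_i}$, the reduction in class entropy from knowing the value of $\displaystyle A_i$, so either form gives the same number.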

Re: Information gain AI
This formula computes what is called mutual information, a measure of how much information two random variables share. The higher the value, the more the variables have in common. The Wikipedia entry explains this well:
Mutual information - Wikipedia
In other words, the higher the value, the better one variable explains the other (i.e., they are more dependent on each other rather than independent).
From a data mining perspective, you are finding relationships between variables. This lets you remove redundancy and find a minimal set of non-redundant variables that accounts for the variation in the data. Once you know which variables these are, you can interpret what is going on in the context of your data.
The formulas just use statistics of the data (here, the counts), which you plug in. You then interpret the value as how much information the random variables share: a high value means they are strongly dependent, and a value of 0 means they are independent.
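A small Python sketch of that plug-in step, assuming a hypothetical layout where `table[k][j]` holds $n_{kj}$ (rows are classes, columns are attribute values); `entropy` and `information_gain` are illustrative names, not from any library. It computes the gain as mutual information, $H_R + H_A - H_{RA}$, using the question's counts: A = V1 has 2 positive and 0 negative examples, A = V2 has 1 positive and 4 negative.

```python
import math

def entropy(counts):
    """Entropy in bits from raw counts; zero counts contribute 0 (0 * log2(0) = 0)."""
    n = sum(counts)
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)

def information_gain(table):
    """Information gain of an attribute, computed as mutual information.

    table[k][j] holds n_kj: the number of learning examples of class r_k
    that have the j-th value of the attribute.
    """
    n_k = [sum(row) for row in table]          # class totals n_k
    n_j = [sum(col) for col in zip(*table)]    # attribute-value totals n_.j
    n_kj = [c for row in table for c in row]   # joint counts, flattened
    # Gain = H_R + H_A - H_RA, which equals H_R - H_{R|A}
    return entropy(n_k) + entropy(n_j) - entropy(n_kj)

# Rows: positive, negative. Columns: A = V1, A = V2.
example = [[2, 1],
           [0, 4]]
print(round(information_gain(example), 4))  # 0.4696
```

An attribute whose values are spread identically across classes, such as `[[1, 1], [1, 1]]`, gets a gain of 0, matching the "independent" end of the scale described above.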

Re: Information gain AI
So $\displaystyle H_R = (\frac{3}{7}\log(\frac{3}{7}) + \frac{4}{7}\log(\frac{4}{7}))$
Is this correct? Where do we take into account that attribute A has two values, V1 and V2?
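Working through the question's numbers with base-2 logarithms: the 3 positives split as 2 with A = V1 and 1 with A = V2, so A = V1 covers 2 examples (2 positive, 0 negative) and A = V2 covers 5 examples (1 positive, 4 negative). The two values enter through the conditional entropy $\displaystyle H_{R|A}$. Note also that the class entropy needs a leading minus sign, since each log of a probability is negative:
$\displaystyle H_R = -\left(\frac{3}{7}\log_2\frac{3}{7} + \frac{4}{7}\log_2\frac{4}{7}\right) \approx 0.9852$
$\displaystyle H_{R|A=V_1} = -\left(\frac{2}{2}\log_2\frac{2}{2} + 0 \cdot \log_2 0\right) = 0, \qquad H_{R|A=V_2} = -\left(\frac{1}{5}\log_2\frac{1}{5} + \frac{4}{5}\log_2\frac{4}{5}\right) \approx 0.7219$
$\displaystyle H_{R|A} = \frac{2}{7} \cdot 0 + \frac{5}{7} \cdot H_{R|A=V_2} \approx 0.5157$
$\displaystyle \mathrm{Gain}(A) = H_R - H_{R|A} \approx 0.4696 \text{ bits}$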