# Math Help - Information gain AI

1. ## Information gain AI

We have 7 training examples: 3 positive and 4 negative. The observed attribute A can take 2 values, V1 and V2. All training examples where A = V1 are positive. Among the examples where A = V2, just one is positive. What is the information gain for attribute A?

I was calculating this with the formulas from Wikipedia, and I get log2(0) * 0, which is not defined.

Entropy for a class is $\frac{3}{7}\log(\frac{3}{7}) + \frac{4}{7}\log(\frac{4}{7})$

Is this also incorrect?

Can someone explain this step by step, with the actual numbers? Thank you for your help.

2. ## Re: Information gain AI

Hey Nforce.

The log_a(0)*0 term is taken to be 0 for any valid base a. This is the usual convention, justified by the fact that p*log_a(p) tends to 0 as p tends to 0 from above. Is this part of a data mining course (or similar) or part of a theoretical course on information theory and statistics?
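A minimal Python sketch of that convention, assuming base-2 logs. The helper name `entropy` is only illustrative and is not taken from any course material. Zero-probability terms are skipped, which is the same as treating 0*log(0) as 0.

```python
import math

def entropy(probs, base=2):
    """Shannon entropy; zero-probability terms contribute 0 by convention."""
    # Skipping p == 0 avoids evaluating log(0), which would raise an error.
    return sum(-p * math.log(p, base) for p in probs if p > 0)

pure = entropy([1.0, 0.0])     # a subset that contains only one class
mixed = entropy([3/7, 4/7])    # the class distribution from the first post
```

Here `pure` comes out as 0 (no uncertainty left), and `mixed` is a little under 1 bit.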

3. ## Re: Information gain AI

It's for a data mining course (machine learning). Do you understand this? We didn't work through any examples, and I don't really know if I am doing it right.

4. ## Re: Information gain AI

In terms of information gain, the idea is to end up with a lower entropy once the attribute is known, which corresponds to being better able to model and predict the outcomes.

What kind of probabilities are you dealing with? Are they conditional probabilities? (Are you getting information gain based on conditional probabilities and updates?)

If you have specific formulae I can decipher what is going on for you.

5. ## Re: Information gain AI

$n$ is the number of training examples
$n_k$ is the number of training examples from class $r_k$
$n_{.j}$ is the number of training examples with the $j$-th value of attribute $A_i$
$n_{kj}$ is the number of training examples from class $r_k$ with the $j$-th value of attribute $A_i$.
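
Assuming the formulas in question are the usual entropy-based definitions of information gain (an assumption, since only the symbol definitions are listed here), they can be written in terms of these counts as:

$$H_R = -\sum_k \frac{n_k}{n}\log_2\frac{n_k}{n}$$

$$H_{R|A_i} = -\sum_j \frac{n_{.j}}{n}\sum_k \frac{n_{kj}}{n_{.j}}\log_2\frac{n_{kj}}{n_{.j}}$$

$$\text{Gain}(A_i) = H_R - H_{R|A_i}$$

Any term with $n_{kj} = 0$ is taken to contribute 0.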

6. ## Re: Information gain AI

These formulae compute what is called mutual information. This is a measure of how much information two random variables share. The higher the value, the more the variables have in common. The wiki entry does a good job of explaining this:

Mutual information - Wikipedia, the free encyclopedia

Basically the higher the value, the better the ability of one variable to explain another (i.e. they are more dependent on each other as opposed to independent).

From a data mining perspective, you are finding relationships between variables. Doing this lets you remove redundant variables and look for a small set of variables that accounts for most of the variation in the data. Once you have a good idea of which variables these are, you can interpret what is going on in the context of your data.

Basically, the formulas just use counts taken from the data (you plug these into the formula). You interpret the result as a measure of how much information the two random variables share, and based on the value, you conclude whether they are strongly dependent or close to independent.
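
As a rough illustration of plugging counts into the formula, the sketch below computes mutual information from a table of joint counts. The function name `mutual_information` and the table layout (rows are classes, columns are attribute values) are assumptions made for this example. The counts are derived from the first post: V1 covers 2 positive and 0 negative examples, V2 covers 1 positive and 4 negative.

```python
import math

def mutual_information(counts):
    """Mutual information in bits from a table counts[k][j] of joint counts."""
    n = sum(sum(row) for row in counts)
    row_tot = [sum(row) for row in counts]          # per-class totals
    col_tot = [sum(col) for col in zip(*counts)]    # per-attribute-value totals
    mi = 0.0
    for k, row in enumerate(counts):
        for j, n_kj in enumerate(row):
            if n_kj == 0:
                continue  # 0 * log(0) is taken as 0
            mi += (n_kj / n) * math.log2(n_kj * n / (row_tot[k] * col_tot[j]))
    return mi

table = [[2, 1],   # positive examples with A = V1, A = V2
         [0, 4]]   # negative examples with A = V1, A = V2
mi = mutual_information(table)
indep = mutual_information([[2, 2], [2, 2]])   # class and attribute unrelated
```

For the example table, `mi` is roughly 0.47 bits, while the evenly split table gives 0, which is what independence looks like under this measure.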

7. ## Re: Information gain AI

So $H_R = -(\frac{3}{7}\log(\frac{3}{7}) + \frac{4}{7}\log(\frac{4}{7}))$

Is this correct? And where do we take into account that attribute A has 2 values, V1 and V2?
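
A sketch of where the two values would enter, assuming the standard definition of information gain as $H_R$ minus the weighted entropy of the class within each value of A, with base-2 logs. From the problem statement, V1 covers 2 examples (both positive) and V2 covers 5 examples (1 positive, 4 negative):

$$H_R = -\left(\frac{3}{7}\log_2\frac{3}{7} + \frac{4}{7}\log_2\frac{4}{7}\right) \approx 0.985$$

$$H_{R|A} = \frac{2}{7}\cdot 0 + \frac{5}{7}\left(-\frac{1}{5}\log_2\frac{1}{5} - \frac{4}{5}\log_2\frac{4}{5}\right) \approx \frac{5}{7}\cdot 0.722 \approx 0.516$$

$$\text{Gain}(A) = H_R - H_{R|A} \approx 0.47$$

The V1 term is 0 because that subset is pure, which is where the $0 \cdot \log_2 0$ term shows up and is taken as 0.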