We have 7 training examples: 3 positive and 4 negative. The observed attribute A can take two values, V1 and V2. All training examples with A = V1 are positive. Among the examples with A = V2, exactly one is positive. What is the information gain of attribute A?

I tried to calculate this with the formulas from Wikipedia, and I end up with a term 0 * log2(0), which is not defined.
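To make the setup concrete, here is a minimal Python sketch of the calculation being attempted. It assumes the counts implied by the problem statement (A = V1: 2 positive, 0 negative; A = V2: 1 positive, 4 negative) and the common convention that a term 0 * log2(0) is treated as 0, since p * log2(p) tends to 0 as p tends to 0. The function names `entropy` and `information_gain` are made up for illustration and do not come from any library.

```python
from math import log2

def entropy(counts):
    """Shannon entropy in bits for a list of class counts."""
    total = sum(counts)
    # Skip empty classes: 0 * log2(0) is taken as 0 by convention (assumption).
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

def information_gain(parent, subsets):
    """Entropy of the parent minus the size-weighted entropy of the subsets."""
    total = sum(parent)
    remainder = sum(sum(s) / total * entropy(s) for s in subsets)
    return entropy(parent) - remainder

parent = [3, 4]  # [positive, negative] over all 7 examples
v1 = [2, 0]      # A = V1: every example is positive
v2 = [1, 4]      # A = V2: one positive, four negative

print(round(entropy(parent), 4))                     # about 0.9852
print(round(information_gain(parent, [v1, v2]), 4))  # about 0.4696
```

The `if c > 0` filter is where the undefined term is avoided: the V1 subset contains no negative examples, so its entropy comes out as 0 instead of raising an error in `log2(0)`.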

The class entropy I computed is $\displaystyle \frac{3}{7}\log\left(\frac{3}{7}\right) + \frac{4}{7}\log\left(\frac{4}{7}\right)$

Is this also incorrect?

Can someone explain this step by step, with the actual numbers? Thank you for your help.