1. ## Information Gain for 2x2 Contingency Table

This is probably a somewhat elementary question, but I've been working on it for some time without any luck.

I have a 2x2 contingency table:

|   | 1 | 0 |
|---|---|---|
| Y | 3 | 92 |
| N | 8 | 743 |

The columns 1 and 0 refer to a word (w=1 means word present; w=0 means word absent). The rows Y and N refer to a disease (Y is D=1, disease present; N is D=0, disease absent).

I need to calculate the information gain using the statistic:

$$I(w,D) = \sum_{j=0,1} \sum_{k=0,1} P(w=k, D=j) \log_2 \frac{P(w=k, D=j)}{P(w=k)\,P(D=j)}$$

The probability calculations are straightforward, but I am unsure how the statistic should look when expanded out. Should there be 4 separate terms, one for each combination? For instance, (w=1,D=1) + (w=1,D=0) + (w=0,D=1) + (w=0,D=0).

2. ## Re: Information Gain for 2x2 Contingency Table

Hey CSHowe.

Your expansion should have four terms, as you pointed out: the two summations have independent indices that each take two values, so 2*2 gives 4 terms in total.
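
If it helps, here is a minimal Python sketch of that four-term expansion, using the counts from your table. The names `counts` and `information_gain_terms` are just ones I made up for illustration, and I'm assuming the Y row corresponds to D=1 and the N row to D=0, with probabilities estimated as simple relative frequencies.

```python
from math import log2

# Counts from the table in the question, keyed by (w, D).
# Assumption: row Y is D=1 and row N is D=0.
counts = {(1, 1): 3, (0, 1): 92, (1, 0): 8, (0, 0): 743}


def information_gain_terms(counts):
    """Return the four summands of I(w, D), keyed by (w, D)."""
    total = sum(counts.values())

    # Marginal probabilities P(w=k) and P(D=j) from the column and row sums.
    p_w = {k: sum(n for (w, _), n in counts.items() if w == k) / total
           for k in (0, 1)}
    p_d = {j: sum(n for (_, d), n in counts.items() if d == j) / total
           for j in (0, 1)}

    terms = {}
    for (w, d), n in counts.items():
        p_joint = n / total
        # An empty cell contributes 0 (the usual 0 * log 0 = 0 convention).
        if n == 0:
            terms[(w, d)] = 0.0
        else:
            terms[(w, d)] = p_joint * log2(p_joint / (p_w[w] * p_d[d]))
    return terms


terms = information_gain_terms(counts)
info_gain = sum(terms.values())

for (w, d), value in sorted(terms.items(), reverse=True):
    print(f"w={w}, D={d}: {value:+.6f}")
print(f"I(w, D) = {info_gain:.6f} bits")
```

Individual terms can be negative (when a cell is less likely than independence would predict), but the sum of all four is never negative.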

3. ## Re: Information Gain for 2x2 Contingency Table

Hi Chiro,

Thank you very much for the clarification.

