Information Gain for 2x2 Contingency Table

Hi all,

This is probably a somewhat elementary question, but I've been working on it for some time without any luck.

I have a 2x2 contingency table:

1,0 refers to a word (w=1 means word present; w=0 means word absent). N,Y refers to a disease (D=1 means disease present; D=0 means disease absent).

I need to calculate the information gain using the statistic (sorry for poor formatting):

$$I(w,D)=\sum_{j=0,1}\sum_{k=0,1}P(w=k,D=j)\,\log_{2}\frac{P(w=k,D=j)}{P(w=k)\,P(D=j)}$$

The probability calculations are straightforward, but I am unsure how the statistic should look when expanded out. Should there be 4 different terms added together, one for each cell of the table? For instance, (w=1,D=1)+(w=1,D=0)+(w=0,D=1)+(w=0,D=0)?

Any assistance gratefully received!

Re: Information Gain for 2x2 Contingency Table

Hey CSHowe.

Your expansion should have four terms, as you have pointed out. The two summations have independent indices that each take 2 values, so 2*2 gives 4 terms in total.
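
If it helps to see the four terms added up explicitly, here is a minimal Python sketch. The function name `information_gain` and its argument names are just my own choices for illustration, and the counts in the usage lines are made-up placeholders rather than the values from your table. It assumes the table holds raw counts and uses the usual convention that a cell with zero count contributes nothing to the sum.

```python
from math import log2

def information_gain(n11, n10, n01, n00):
    """I(w, D) in bits from the four cell counts of a 2x2 table.

    n_kj is the count of cases with w=k and D=j (hypothetical naming).
    """
    counts = {(1, 1): n11, (1, 0): n10, (0, 1): n01, (0, 0): n00}
    total = sum(counts.values())  # assumed to be non-zero

    # Marginal probabilities P(w=k) and P(D=j)
    p_w = {k: (counts[(k, 0)] + counts[(k, 1)]) / total for k in (0, 1)}
    p_d = {j: (counts[(0, j)] + counts[(1, j)]) / total for j in (0, 1)}

    gain = 0.0
    for (k, j), n in counts.items():  # one term per cell, four in total
        if n == 0:
            continue  # convention: 0 * log(0) is treated as 0
        p_joint = n / total
        gain += p_joint * log2(p_joint / (p_w[k] * p_d[j]))
    return gain

# Placeholder counts, for illustration only
print(information_gain(10, 10, 10, 10))  # word and disease independent: 0.0
print(information_gain(50, 0, 0, 50))    # word perfectly predicts disease: 1.0
```

The two usage lines are sanity checks at the extremes: an evenly spread table gives 0 bits, and a table where the word and the disease always occur together gives 1 bit.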

Re: Information Gain for 2x2 Contingency Table

Hi Chiro,

Thank you very much for the clarification.

C