Hi all,
This is probably a somewhat elementary question, but I've been working on it for some time without any luck.
I have a 2x2 contingency table:
              1      0
        Y     3     92
        N     8    743
The columns 1 and 0 refer to a word (w=1 means the word is present; w=0 means it is absent). The rows Y and N refer to a disease (Y: D=1, disease present; N: D=0, disease absent).
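For reference, assuming the table holds raw counts, dividing each cell by the grand total N = 3 + 92 + 8 + 743 = 846 gives the joint probabilities. Summing the rows and columns gives the marginals:

$$
\begin{aligned}
P(w=1,D=1) &= \tfrac{3}{846}, & P(w=0,D=1) &= \tfrac{92}{846}, & P(D=1) &= \tfrac{95}{846},\\
P(w=1,D=0) &= \tfrac{8}{846}, & P(w=0,D=0) &= \tfrac{743}{846}, & P(D=0) &= \tfrac{751}{846},\\
P(w=1) &= \tfrac{11}{846}, & P(w=0) &= \tfrac{835}{846}. & &
\end{aligned}
$$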
I need to calculate the information gain using the statistic:

$$
I(w,D) = \sum_{j=0}^{1} \sum_{k=0}^{1} P(w=k, D=j)\,\log_2 \frac{P(w=k, D=j)}{P(w=k)\,P(D=j)}
$$
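Written out term by term, the double sum over j and k has exactly one term for each of the four cells of the table. Each term pairs a joint probability with the log-ratio of that joint probability to the product of the corresponding marginals:

$$
\begin{aligned}
I(w,D) ={}& P(w=1,D=1)\log_2\frac{P(w=1,D=1)}{P(w=1)\,P(D=1)} + P(w=1,D=0)\log_2\frac{P(w=1,D=0)}{P(w=1)\,P(D=0)} \\
&+ P(w=0,D=1)\log_2\frac{P(w=0,D=1)}{P(w=0)\,P(D=1)} + P(w=0,D=0)\log_2\frac{P(w=0,D=0)}{P(w=0)\,P(D=0)}
\end{aligned}
$$

Individual terms can be negative when a cell occurs less often than independence would predict. The total is still non-negative, and it equals 0 exactly when w and D are independent.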
The probability calculations are straightforward, but I am unsure how the statistic should look when expanded. Should there be four separate terms, one per cell? For instance, (w=1,D=1) + (w=1,D=0) + (w=0,D=1) + (w=0,D=0).
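A minimal Python sketch of the four-term calculation follows. It assumes the table holds raw counts and uses the w/D coding described above. The function name `mutual_information_terms` and the `counts` dictionary layout are illustrative choices, not part of the original question. A cell with a zero count is treated as contributing 0, following the usual convention 0 · log 0 = 0.

```python
import math

# Counts from the 2x2 table, keyed by (k, j) meaning (w=k, D=j).
# w=1: word present, w=0: word absent; D=1: disease present (Y), D=0: absent (N).
counts = {
    (1, 1): 3,    # word present, disease present
    (0, 1): 92,   # word absent,  disease present
    (1, 0): 8,    # word present, disease absent
    (0, 0): 743,  # word absent,  disease absent
}


def mutual_information_terms(counts):
    """Return {(k, j): P(w=k,D=j) * log2(P(w=k,D=j) / (P(w=k) * P(D=j)))}."""
    total = sum(counts.values())
    # Marginals: P(w=k) sums over j, P(D=j) sums over k.
    p_w = {k: sum(n for (kk, _), n in counts.items() if kk == k) / total for k in (0, 1)}
    p_d = {j: sum(n for (_, jj), n in counts.items() if jj == j) / total for j in (0, 1)}
    terms = {}
    for (k, j), n in counts.items():
        if n == 0:
            terms[(k, j)] = 0.0  # convention: 0 * log 0 = 0
            continue
        p_joint = n / total
        terms[(k, j)] = p_joint * math.log2(p_joint / (p_w[k] * p_d[j]))
    return terms


terms = mutual_information_terms(counts)
# Print in the order (1,1), (1,0), (0,1), (0,0), then the sum of all four terms.
for (k, j), t in sorted(terms.items(), reverse=True):
    print(f"w={k}, D={j}: {t:+.6f}")
print(f"I(w,D) = {sum(terms.values()):.6f} bits")
```

In this table the word is only weakly associated with the disease, so the four terms nearly cancel and the total is small but positive.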
Any assistance gratefully received!

