Dear all,

I'm trying to better understand Bayes' theorem.

My outcome variable is DIS, which is either 0 or 1.
The possible symptoms are A {A1,A2}, B {B1,B2,B3} and C {C1,C2,C3,C4}.

p(DIS=1|A=A1,B=B1,C=C1) =
p(DIS=1) * p(A=A1,B=B1,C=C1|DIS=1) / p(A=A1,B=B1,C=C1)

- Naive Bayes assumes that the symptoms are conditionally independent given DIS, simplifying the equation to
p(DIS=1) * p(A=A1|DIS=1)*p(B=B1|DIS=1)*p(C=C1|DIS=1) / p(A=A1,B=B1,C=C1)
- Bayesian networks are of course the right tool, but they cannot be worked out on the back of an envelope
- To understand things, I really want to try the above equation as it stands, which is generally avoided, precisely in order to figure out why it should be avoided
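
To make concrete what I mean by trying the equation as it stands, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the records are made up, and the function names are hypothetical. The first function counts the full joint directly; the second uses the naive product and normalises over the two values of DIS rather than dividing by the empirical p(A=A1,B=B1,C=C1), which is the usual way to keep the result a proper probability.

```python
# Hypothetical toy records (DIS, A, B, C); made up purely for illustration.
records = [
    (1, "A1", "B1", "C1"),
    (1, "A1", "B1", "C1"),
    (1, "A1", "B2", "C3"),
    (1, "A2", "B1", "C1"),
    (0, "A1", "B1", "C1"),
    (0, "A2", "B3", "C4"),
    (0, "A2", "B2", "C2"),
    (0, "A1", "B3", "C1"),
]


def posterior_full(records, a, b, c):
    """p(DIS=1 | A=a, B=b, C=c), estimated by counting the exact combination."""
    matching = [r for r in records if r[1:] == (a, b, c)]
    if not matching:
        return None  # this combination was never observed, so no estimate
    return sum(r[0] for r in matching) / len(matching)


def posterior_naive(records, a, b, c):
    """p(DIS=1 | A=a, B=b, C=c) under the naive conditional-independence assumption."""
    scores = {}
    for d in (0, 1):
        rows = [r for r in records if r[0] == d]
        prior = len(rows) / len(records)
        likelihood = 1.0
        # Multiply the per-symptom conditional frequencies p(symptom | DIS=d).
        for idx, value in ((1, a), (2, b), (3, c)):
            likelihood *= sum(1 for r in rows if r[idx] == value) / len(rows)
        scores[d] = prior * likelihood
    total = scores[0] + scores[1]
    return scores[1] / total if total > 0 else None


full = posterior_full(records, "A1", "B1", "C1")
naive = posterior_naive(records, "A1", "B1", "C1")
unseen = posterior_full(records, "A2", "B2", "C4")
print(full, naive, unseen)
```

On this toy data the two estimates differ, and the direct count has nothing to say about a combination that never occurs in the records, which I suspect is part of the answer to my question.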

-- Is it really so difficult to estimate p(A=A1,B=B1,C=C1|DIS=1) with a dataset at hand? This is what you need for Bayesian networks anyway.
-- I have difficulty decomposing p(A=A1,B=B1,C=C1|DIS=1) analytically without making assumptions, although I feel the chain rule must make this possible. How does one do this?
-- I am very curious whether decomposing p(A=A1,B=B1,C=C1|DIS=1) analytically with the chain rule will involve all the values the symptoms can take on, and not only A1, B1 and C1, since everyone seems to justify the need for naive Bayes by pointing to the combinatorial explosion that arises with several symptoms, each with several possible values.
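
For reference, this is what I believe the chain rule gives when applied literally. It is only a sketch: the ordering A, then B, then C is an arbitrary choice of mine, and any other ordering should be equally valid.

```latex
\begin{align*}
p(A{=}A_1, B{=}B_1, C{=}C_1 \mid DIS{=}1)
  &= p(A{=}A_1 \mid DIS{=}1) \\
  &\quad \times p(B{=}B_1 \mid A{=}A_1, DIS{=}1) \\
  &\quad \times p(C{=}C_1 \mid A{=}A_1, B{=}B_1, DIS{=}1)
\end{align*}
```

If this is right, the single query above only involves A1, B1 and C1. Covering every possible query, however, would require one probability for each of the 2 x 3 x 4 = 24 symptom combinations per value of DIS, against 2 + 3 + 4 = 9 conditional probabilities per value of DIS under the naive assumption.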

Many thanks for your help!