EM Algorithm Clustering Nominals

Feb 2009
I am trying to derive the EM algorithm (applied to clustering) for the case where there is a numeric variable (x1), assumed to be Gaussian distributed, and a nominal variable (x2) with three levels (a1, a2, a3). I am assuming there are two clusters (C_1 and C_2). I am also assuming that the variables are independent of each other and that the examples (aka cases, aka records) are independent.
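To make the setup concrete, here is one way the model could be written down. The symbols below are my own notation and not from any reference: \pi_k for the cluster proportions, \mu_k and \sigma_k^2 for the Gaussian parameters of x1 in cluster C_k, and \theta_{kj} for the probability of level a_j in cluster C_k. I am also reading "the variables are independent" as independence of x1 and x2 within a cluster, which is an assumption on my part.

```latex
% Mixture density for one example (x_1, x_2), with K = 2 clusters and J = 3 levels
p(x_1, x_2) = \sum_{k=1}^{2} \pi_k \,
  \mathcal{N}(x_1 \mid \mu_k, \sigma_k^2) \,
  \prod_{j=1}^{3} \theta_{kj}^{\mathbb{1}[x_2 = a_j]},
\qquad \sum_{k=1}^{2} \pi_k = 1, \quad \sum_{j=1}^{3} \theta_{kj} = 1 .

% Log-likelihood of n independent examples
\ell = \sum_{i=1}^{n} \log \sum_{k=1}^{2} \pi_k \,
  \mathcal{N}(x_{i1} \mid \mu_k, \sigma_k^2) \,
  \prod_{j=1}^{3} \theta_{kj}^{\mathbb{1}[x_{i2} = a_j]} .
```

The sum inside the logarithm is what makes direct differentiation awkward, which is why the derivation usually goes through the expected complete-data log-likelihood instead.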

I am attaching my work, since the question may be easier to follow with it.

Does anyone know how to set this up so that I can then take the derivatives and derive the maximum likelihood estimates?
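For reference, here is a rough numerical sketch of what I expect the resulting updates to look like, assuming the standard closed-form results for a mixture of this kind: responsibility-weighted mean and variance for x1, and responsibility-weighted level frequencies for x2. The function name `em_mixed`, the integer coding of x2 as 0, 1, 2, and the random initialisation are my own choices for illustration, not something I have derived yet.

```python
import numpy as np

def em_mixed(x1, x2, n_levels=3, n_clusters=2, n_iter=100, seed=0):
    """EM sketch for a mixture with one Gaussian and one nominal variable.

    x1: float array of shape (n,)
    x2: int array of shape (n,), levels coded 0..n_levels-1 (assumed coding)
    """
    rng = np.random.default_rng(seed)
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=int)
    n = x1.shape[0]
    onehot = np.eye(n_levels)[x2]                      # (n, n_levels)

    # Start from random soft cluster memberships (an arbitrary choice).
    resp = rng.dirichlet(np.ones(n_clusters), size=n)  # (n, K)

    for _ in range(n_iter):
        # M-step: weighted maximum likelihood estimates.
        nk = resp.sum(axis=0)                          # effective cluster sizes
        pi = nk / n
        mu = (resp * x1[:, None]).sum(axis=0) / nk
        var = (resp * (x1[:, None] - mu) ** 2).sum(axis=0) / nk
        var = np.maximum(var, 1e-8)                    # guard against collapse
        theta = (resp.T @ onehot) / nk[:, None]        # (K, n_levels)
        theta = np.maximum(theta, 1e-12)               # avoid log(0)
        theta /= theta.sum(axis=1, keepdims=True)

        # E-step: posterior cluster probabilities, computed in log space.
        log_gauss = -0.5 * (np.log(2 * np.pi * var)
                            + (x1[:, None] - mu) ** 2 / var)
        log_cat = np.log(theta)[:, x2].T               # (n, K)
        log_joint = np.log(pi) + log_gauss + log_cat
        log_norm = np.logaddexp.reduce(log_joint, axis=1, keepdims=True)
        resp = np.exp(log_joint - log_norm)

    return pi, mu, var, theta, resp
```

Calling `em_mixed(x1, x2)` returns the estimated proportions, means, variances, level probabilities, and the per-example responsibilities. The result depends on the starting point, so in practice several seeds would need to be tried.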



