
Math Help - A question regarding conditional probabilities in Bayesian Classifier theory

  1. #1
    Newbie
    Joined
    May 2011
    Posts
    5

    A question regarding conditional probabilities in Bayesian Classifier theory

    Hello,

    I googled my way up here, so I'm hoping I'm asking this in the right place. If not, I apologize.
    Anyway, I have a dilemma about some basics in probability and pattern recognition, and, hopefully, someone can help me.

    I'm not sure I understand what class-conditional pdf f(x|w_{i}) really means, and it's bothering me. Let me elaborate.

    When we use terms such as 'conditional pdf and cdf', by that we mean:

    F(x|A) = P(X <= x | A) = P({X <= x} ∩ A) / P(A)

    f(x|A) = dF(x|A)/dx

    where A is some event, a subset of a sample space. This event A must also be the domain of our functions defined above. It's a 'new universe', so to speak, for conditional cdfs and pdfs, and they only make sense if we look at them over this event A. For example, if we look at a random variable X with a Gaussian distribution, and we denote the event A = {1.5 < X < 4.5}, then the corresponding conditional pdf is the original density renormalized over that interval:

    f(x|A) = f(x) / P(1.5 < X < 4.5) for 1.5 < x < 4.5, and f(x|A) = 0 otherwise

    As you can see, it's nonzero only over the interval {1.5 < x < 4.5}; outside that interval it wouldn't make sense.
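    A quick numerical sketch of this renormalization (the mean 3 and unit variance here are illustrative choices, not values from my figure):

    ```python
    import math

    # Illustrative: X ~ N(3, 1); the conditioning event is A = {1.5 < X < 4.5}.
    A_LO, A_HI = 1.5, 4.5
    MU, SIGMA = 3.0, 1.0

    def phi(z):
        """Standard normal pdf."""
        return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

    def Phi(z):
        """Standard normal cdf via the error function."""
        return 0.5 * (1 + math.erf(z / math.sqrt(2)))

    # P(A) -- the normalization factor in f(x|A) = f(x) / P(A) on A
    p_A = Phi((A_HI - MU) / SIGMA) - Phi((A_LO - MU) / SIGMA)

    def f_cond(x):
        """Conditional pdf f(x|A): zero outside A, renormalized inside."""
        if not (A_LO < x < A_HI):
            return 0.0
        return phi((x - MU) / SIGMA) / SIGMA / p_A

    print(f_cond(3.0))  # peak of the renormalized density, larger than phi(0)
    print(f_cond(5.0))  # 0.0 -- outside the event A
    ```

    Dividing by P(A) is exactly what makes the restricted function integrate to 1 again, so it is a genuine pdf on the 'new universe' A.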
    Often we are interested in conditional pdfs where the event is A = {Y = y_{0}}, and then we have

    f(x|y_{0}) = f(x, y_{0}) / f_{Y}(y_{0})

    We can interpret the function f(x|y_{0}) as the intersection of the joint pdf f(x, y) with the plane y = y_{0} (with f_{Y}(y_{0}) as a normalization factor).
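    As a small sanity check of this slice-and-renormalize picture (using a made-up joint density f(x, y) = x + y on the unit square, not anything specific to the classifier problem):

    ```python
    # Made-up joint pdf on the unit square: f(x, y) = x + y (integrates to 1).
    def f_joint(x, y):
        return x + y if 0 <= x <= 1 and 0 <= y <= 1 else 0.0

    N = 100_000
    DX = 1.0 / N

    def f_Y(y0):
        """Marginal f_Y(y0), numerically integrating the joint over x."""
        return sum(f_joint((i + 0.5) * DX, y0) for i in range(N)) * DX

    y0 = 0.25
    norm_const = f_Y(y0)  # analytically 0.5 + y0 = 0.75 for this joint pdf

    def f_cond(x):
        """f(x|Y=y0): the slice f(x, y0) divided by the normalization f_Y(y0)."""
        return f_joint(x, y0) / norm_const

    # The renormalized slice integrates to 1, so it is a genuine pdf in x.
    total = sum(f_cond((i + 0.5) * DX) for i in range(N)) * DX
    print(round(total, 6))  # ~1.0
    ```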

    This is all fairly basic stuff, I reckon. And these types of conditional probability functions are the only types I know of. But class-conditional probability functions, such as f(x|w_{i}) in Bayes classifier theory, seem like a different beast to me.

    Let me first say that everything about the naive Bayes classifier is perfectly intuitive, and I don't have a problem from that point of view. But when I try to define everything rigorously from a mathematical point of view, I get stuck.

    First of all, we have these classes w_{i}. What exactly are they, mathematically speaking? Their priors sum to one, and they will eventually be represented by regions in our sample space, so I will define them as events in my sample space. If we look at the simplest example in 1-D, the class-conditional probability density functions would look something like this:

    [figure: two overlapping class-conditional densities p(x|w_{1}) and p(x|w_{2}), crossing at x = x_{0}]

    And then we could say that the event w_{1} is (-inf, x_{0}) and the event w_{2} is (x_{0}, +inf).

    If you ask me, this doesn't make sense. A conditional probability density function, by its very definition, must be confined to the space of the event it's conditioned on. In other words, p(x|w_{1}) should be constrained to the w_{1} region! But not only is it not constrained, it spreads out over the w_{2} region as well! That shouldn't be possible, because w_{1} and w_{2} are mutually exclusive events, and their respective regions do not overlap, which makes sense. But the conditional probability density functions defined over them do? Wait, what?!

    Of course, this overlap is how we define the classification error, but all this doesn't look very convincing to me, strictly mathematically speaking.
    A conditional pdf p(x|A) must be defined over the region which corresponds to the event A, period. The functions p(x|w_{1}) and p(x|w_{2}) shouldn't overlap each other like that, because the regions w_{1} and w_{2} are mutually exclusive. At least, that's what the basic theory of conditional probability density functions tells us.

    So, this is why I think that p(x|w_{i}) is not an ordinary conditional pdf like the one defined at the beginning of this post. But what is it then? I don't know, I'm confused. Or maybe I shouldn't interpret the classes w_{i} as regions in the sample space, and that's the mistake I'm making here. But what are they then? How should I interpret them?

    Also, if I assume that it's okay to interpret the classes w_{i} as regions in the sample space, isn't there a circularity problem? We first define p(x|w_{i}) over the supposedly known event w_{i}, but we don't actually know what region the event w_{i} occupies in the sample space. After all, that's the whole point of classification: to determine these regions.
    But is it really okay to define, at the outset, a function whose domain is actually unknown?

    Hopefully, I made at least some sense here, and thanks in advance for any help I can get.
    Cheers.

  2. #2
    MHF Contributor
    Joined
    May 2010
    Posts
    1,028
    Thanks
    28
    Quote Originally Posted by lajka View Post
    Functions p(x|w_{1}) and p(x|w_{2}) shouldn't overlap each other like that, because the regions w_{1} and w_{2} are mutually exclusive.
    I'm not an expert on Bayesian classifiers, but why exactly do you think this is a problem? If two events are mutually exclusive, then those two events can't both occur together, but there's no restriction on what values another variable can take if one or the other event happens. For example, consider the degenerate case where x is actually independent of \omega. The conditional pdfs would exactly coincide and therefore overlap everywhere.
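    A quick simulation of that degenerate case (the coin-flip class assignment and standard normal x are made up purely for illustration):

    ```python
    import random

    random.seed(0)

    # x is drawn independently of the class label, so f(x|w1) and f(x|w2)
    # are the same distribution even though w1 and w2 never occur together.
    samples = [(random.choice(["w1", "w2"]), random.gauss(0, 1))
               for _ in range(100_000)]

    x_given_w1 = [x for w, x in samples if w == "w1"]
    x_given_w2 = [x for w, x in samples if w == "w2"]

    mean1 = sum(x_given_w1) / len(x_given_w1)
    mean2 = sum(x_given_w2) / len(x_given_w2)
    print(round(mean1, 2), round(mean2, 2))  # both near 0: the conditionals coincide
    ```

    Every sample belongs to exactly one class, yet both empirical conditional distributions cover the same range of x values.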

  3. #3
    Grand Panjandrum
    Joined
    Nov 2005
    From
    someplace
    Posts
    14,972
    Thanks
    4
    Quote Originally Posted by lajka View Post
    Functions p(x|w_{1}) and p(x|w_{2}) shouldn't overlap each other like that, because the regions w_{1} and w_{2} are mutually exclusive. [...] Or maybe I shouldn't interpret classes w_{i} as regions in the sample space, and that's the mistake I'm making here. But what are they then, how should I interpret them?
    x is a feature that can be observed from both classes w_1 and w_2, so p(x|w_1) and p(x|w_2) can both be simultaneously non-zero.

    For instance, suppose x denotes a subject's height and the classes are w_1: "men" and w_2: "women".

    Now if I observe a height x = 2.1 m for a subject, we expect p(x=2.1|w_1) > p(x=2.1|w_2), but the latter is not zero (nor is 2.1 m outside the possible range of heights for class w_2).
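    A minimal sketch of that example, modelling each class-conditional as a Gaussian (the means and standard deviations below are illustrative guesses, not measured data):

    ```python
    import math

    def normal_pdf(x, mu, sigma):
        """Gaussian density with mean mu and standard deviation sigma."""
        return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

    # Illustrative class-conditional densities for height in metres.
    def p_x_given_men(x):
        return normal_pdf(x, mu=1.78, sigma=0.08)

    def p_x_given_women(x):
        return normal_pdf(x, mu=1.65, sigma=0.07)

    x = 2.1
    print(p_x_given_men(x))    # larger, but...
    print(p_x_given_women(x))  # ...still strictly positive: both supports cover x = 2.1
    ```

    Both densities are defined over all heights; conditioning on the class changes the shape of the density over x, not the set of x values where it lives.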

    CB
