Math Help - Probability that a value belongs to a dataset

  1. #1
    Newbie
    Joined
    Jan 2013
    From
    Adelaide
    Posts
    12

    Probability that a value belongs to a dataset

    G'day Math Help Forum, once again I find myself in need of your collective knowledge...

    I'm utterly stuck on a problem, and have been for some time now, wherein I need to calculate the % probability that a new value belongs to a particular dataset. I only have the subject value and the values of the dataset. Obviously, the closer the value is to the mean, the greater the probability; however, the change in probability with distance from the mean is not likely to be linear (i.e. it follows a bell curve).
    I've attempted to work it out using z values and standard deviations; however, I keep getting referred to z value tables and calculator functions that the program I'm using simply doesn't have (namely the field calculator in the attribute table of ESRI's ArcGIS).

    If, for example, my dataset was:

    2,5,7,6,4,5,6,4,5,6

    a new value of 5 or 6 would be significantly more probable than 1 or 9.

    Does anyone have any ideas on this one? I'm sure I'm making it far more complicated than necessary, but it's got me stumped...
    Follow Math Help Forum on Facebook and Google+

  2. #2
    Super Member
    Joined
    Oct 2012
    From
    Ireland
    Posts
    584
    Thanks
    155

    Re: Probability that a value belongs to a dataset

    You can use prediction intervals to find this. Find the z score of the new value relative to the data set (its distance from the mean, measured in standard deviations), then convert the z score to a percentile about the mean.
    If, for example, the z score was 1, that corresponds to 68% of the population lying within that distance of the mean, so the chance that a value from the data set falls at least that far from the mean is 1 - 0.68 = 32%.
    Prediction interval - Wikipedia, the free encyclopedia
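
    The z score conversion described above can be sketched without a lookup table. This is only a sketch under assumptions: it treats the data as roughly normal, uses the sample standard deviation, and assumes a Python environment where `math.erfc` is available (Python 2.7 or later). The function name `two_sided_tail` is made up for illustration.

```python
import math

# Data set from the original question.
data = [2, 5, 7, 6, 4, 5, 6, 4, 5, 6]

n = len(data)
mean = sum(data) / float(n)
# Sample standard deviation (divides by n - 1).
s = math.sqrt(sum((v - mean) ** 2 for v in data) / (n - 1))

def two_sided_tail(x, mean, s):
    """Probability that a normal value lies at least as far from the mean as x."""
    z = (x - mean) / s
    # For a standard normal Z: P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))

for x in (5, 6, 1, 9):
    print(x, round(two_sided_tail(x, mean, s), 4))
```

    For this data set the mean is 5 and s is about 1.41, so 5 and 6 give large tail probabilities while 1 and 9 give very small ones. The number is a tail probability under a normality assumption, not a literal probability of membership.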

  3. #3
    Newbie
    Joined
    Jan 2013
    From
    Adelaide
    Posts
    12

    Re: Probability that a value belongs to a dataset

    I considered that, and it seems ideal; however, it requires a z score table or an equivalent function, and the field calculator doesn't have these. No one seems to be able to tell me how to calculate the standard normal properties without them.

  4. #4
    MHF Contributor

    Joined
    Apr 2005
    Posts
    15,365
    Thanks
    1311

    Re: Probability that a value belongs to a dataset

    I have no idea what you mean by a "field calculator". The only way to find a normal probability, if you cannot look it up in a table or use a calculator that has a "normal probability function", is to do a numerical integration: \frac{1}{\sigma\sqrt{2\pi}} \int_a^b e^{-(x- \mu)^2/(2\sigma^2)} dx.
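
    As a rough illustration of the numerical integration route, here is a minimal sketch using composite Simpson's rule on the normal density \frac{1}{\sigma\sqrt{2\pi}} e^{-(x-\mu)^2/(2\sigma^2)}. The function names `normal_pdf` and `normal_prob`, the choice of Simpson's rule, and the default of 200 subintervals are all assumptions made for the example; the values mu = 5 and sigma = sqrt(2) come from the sample data in the first post.

```python
import math

def normal_pdf(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

def normal_prob(a, b, mu, sigma, n=200):
    """Approximate P(a <= X <= b) with composite Simpson's rule."""
    if n % 2:
        n += 1  # Simpson's rule needs an even number of subintervals
    h = (b - a) / float(n)
    total = normal_pdf(a, mu, sigma) + normal_pdf(b, mu, sigma)
    for i in range(1, n):
        weight = 4 if i % 2 else 2  # odd interior points get 4, even get 2
        total += weight * normal_pdf(a + i * h, mu, sigma)
    return total * h / 3.0

mu, sigma = 5.0, math.sqrt(2.0)
# Probability of falling within one standard deviation of the mean.
within_one_sd = normal_prob(mu - sigma, mu + sigma, mu, sigma)
print(round(within_one_sd, 4))
```

    Integrating from mu - sigma to mu + sigma should reproduce the familiar figure of roughly 68%, which is a convenient sanity check before using other limits.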

  5. #5
    Member
    Joined
    Apr 2012
    From
    Erewhon
    Posts
    164
    Thanks
    108

    Re: Probability that a value belongs to a dataset

    Quote: Originally posted by Mattrnfnr
    G'day Math Help Forum, once again I find myself in need of your collective knowledge...

    I'm utterly stuck on a problem, and have been for some time now, wherein I need to calculate the % probability that a new value belongs to a particular dataset. I only have the subject value and the values of the dataset. Obviously, the closer the value is to the mean, the greater the probability; however, the change in probability with distance from the mean is not likely to be linear (i.e. it follows a bell curve).
    I've attempted to work it out using z values and standard deviations; however, I keep getting referred to z value tables and calculator functions that the program I'm using simply doesn't have (namely the field calculator in the attribute table of ESRI's ArcGIS).

    If, for example, my dataset was:

    2,5,7,6,4,5,6,4,5,6

    a new value of 5 or 6 would be significantly more probable than 1 or 9.

    Does anyone have any ideas on this one? I'm sure I'm making it far more complicated than necessary, but it's got me stumped...
    What you ask cannot be done without more information and calculation than you are likely to be able to provide. What follows is a "desperate statistical" approach that will often be adequate.

    The simplest approach to your real problem (deciding whether a new value is likely to be from the same distribution as your data set) is to set a pair of control limits. Typically we would set these at \pm 2 s from the mean (where s is your estimate of the standard deviation of the distribution your data set is sampled from). However, as the value 2 is in your test data and you have a small sample, I would go with the slightly more conservative \pm 2.3 s control limits (based on the t-distribution, which is often used for small-sample work even though it is not strictly applicable).

    With these control limits you will reject about 5% of cases where the new value does come from the same distribution as your data set.
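
    The control-limit check can be sketched in a few lines. This is a sketch only: the multiplier 2.3 and the data are taken from this thread, while the names `K` and `within_limits` are illustrative assumptions. It needs nothing beyond `math.sqrt`, so no normal table or distribution function is required.

```python
import math

# Data set from the original question.
data = [2, 5, 7, 6, 4, 5, 6, 4, 5, 6]

n = len(data)
mean = sum(data) / float(n)
# Sample standard deviation (divides by n - 1).
s = math.sqrt(sum((v - mean) ** 2 for v in data) / (n - 1))

K = 2.3  # control-limit multiplier suggested above for this small sample
lower = mean - K * s
upper = mean + K * s

def within_limits(x):
    """True if x falls inside the control limits mean +/- K * s."""
    return lower <= x <= upper

print(round(lower, 3), round(upper, 3))
for x in (1, 2, 5, 6, 9):
    print(x, within_limits(x))
```

    With this data the limits come out at roughly 1.75 and 8.25, so 2, 5 and 6 are accepted while 1 and 9 are rejected, which matches the intuition in the original question.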
