Math Help - sample size and accuracy of generalizations

  1. #1
    Senior Member
    Joined
    Feb 2008
    Posts
    410

    sample size and accuracy of generalizations

    Hi, all.

    I'd like to ask you folks about sampling, and the relationship between the sample size and the accuracy of a generalization about a population. What I most want to know is: under what circumstances does the accuracy of a generalization depend on the ratio of the sample size to the population size, as opposed to simply the sample size?

    For example...

    Suppose 1,000 squirrels populate a fenced-in park, and we want to determine what percentage of that population has been infected with Yersinia pestis. To estimate this, we have available to us a random sample of n_1 squirrels.

    Now suppose that 10,000 squirrels populate a second fenced-in park, and we also want to determine what percentage of that population is infected. For this population, we have available to us a random sample of n_2 squirrels.

    If n_1=n_2, what can we say about the comparative accuracy of the two estimates? How is accuracy even measured?

    Also, how large would n_2 have to be, with respect to n_1, in order for the accuracy of the two estimates to be about the same?

    I hope I've been clear enough about what I'm asking, because to be honest I'm unsure how to articulate it. Anyway, your help would be much appreciated!

    Thanks!

  2. #2
    Flow Master
    mr fantastic
    Joined
    Dec 2007
    From
    Zeitgeist
    Posts
    16,948
    Thanks
    5
    Replying to hatsoff's post #1:
    First read this: Stats: Estimating the Proportion

    Since \sqrt{\frac{pq}{n}} (the standard error of \hat{p}, where q = 1-p) varies little for small changes in p, substituting \hat{p} for p and \hat{q} for q introduces little error into the calculated standard error.

    The normal distribution is a good approximation to the binomial distribution only when n is 'sufficiently large', and how large that is depends on p. The rule of thumb is np > 5 and n(1-p) > 5. Since you don't know p, you can't know for sure whether n is sufficiently large, but you can plug in \hat{p} to get an idea of whether it was.
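
    As a rough illustration of the two paragraphs above, here is a minimal Python sketch of the normal-approximation interval together with the rule-of-thumb check. The function name, the 95% level (z = 1.96) and the sample counts are assumptions made up for the example, not anything from the linked page.

```python
import math

def proportion_ci(successes, n, z=1.96):
    """Normal-approximation confidence interval for a proportion.

    z = 1.96 corresponds to a 95% level (an assumption for this sketch).
    """
    p_hat = successes / n
    q_hat = 1.0 - p_hat
    # Estimated standard error: p_hat and q_hat stand in for the unknown p and q
    se = math.sqrt(p_hat * q_hat / n)
    # Rule of thumb n*p > 5 and n*(1-p) > 5, checked with p_hat in place of p
    # (n * p_hat is just the number of successes)
    large_enough = successes > 5 and (n - successes) > 5
    return p_hat, se, (p_hat - z * se, p_hat + z * se), large_enough

# Hypothetical sample: 30 infected squirrels among 200 sampled
p_hat, se, (low, high), ok = proportion_ci(30, 200)
print(f"p_hat={p_hat:.3f} se={se:.4f} CI=({low:.3f}, {high:.3f}) rule of thumb met: {ok}")
```

    Note that the population size does not appear anywhere in this calculation, only n and \hat{p}.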

    If n is small, I have some thoughts but no time at the moment. I'll continue later unless someone else does.

  3. #3
    Flow Master
    mr fantastic
    Joined
    Dec 2007
    From
    Zeitgeist
    Posts
    16,948
    Thanks
    5
    Continuing from my post #2:
    In the small sample case (that is, when n is not large enough to use a normal approximation) finding a confidence interval is tedious, because it has to be worked out from the exact binomial distribution.
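
    To give a feel for what the small-sample calculation involves, here is a sketch of one common exact method, the Clopper-Pearson interval, using only the Python standard library. This is one possible approach chosen for illustration, not necessarily the one intended above; it assumes a binomial model (population much larger than the sample), and the helper names are made up for the example.

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def bisect(f, lo, hi, tol=1e-10):
    """Root of a monotone function f on [lo, hi] whose endpoint values differ in sign."""
    f_lo_positive = f(lo) > 0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if (f(mid) > 0) == f_lo_positive:
            lo = mid   # root lies to the right of mid
        else:
            hi = mid   # root lies to the left of mid
    return (lo + hi) / 2

def exact_ci(x, n, alpha=0.05):
    """Clopper-Pearson interval for x successes in n trials (95% when alpha = 0.05)."""
    # Lower limit: the p at which P(X >= x) equals alpha/2 (taken as 0 when x == 0)
    lower = 0.0 if x == 0 else bisect(
        lambda p: (1 - binom_cdf(x - 1, n, p)) - alpha / 2, 0.0, 1.0)
    # Upper limit: the p at which P(X <= x) equals alpha/2 (taken as 1 when x == n)
    upper = 1.0 if x == n else bisect(
        lambda p: binom_cdf(x, n, p) - alpha / 2, 0.0, 1.0)
    return lower, upper

# Hypothetical small sample: 5 infected out of 10
low, high = exact_ci(5, 10)
print(f"exact 95% CI: ({low:.3f}, {high:.3f})")
```

    With so few observations the interval is very wide, which is the practical cost of a small n.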

    I should also point out that when sampling without replacement from a small population (say, when the sample size is more than 10% of the population size), the standard error of the proportion must be multiplied by the finite population correction factor \sqrt{\frac{N-n}{N-1}}, where N is the population size and n is the sample size.
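
    Applied to the two squirrel parks from the original question, the correction can be sketched as follows. The infection rate p = 0.2 and the sample size n_1 = 200 are made-up values for illustration, and the function names are hypothetical.

```python
import math

def se_with_fpc(p, n, N):
    """Standard error of a sample proportion for n units drawn without replacement from N."""
    return math.sqrt(p * (1 - p) / n) * math.sqrt((N - n) / (N - 1))

def n_for_target_se(p, N, target_se):
    """Smallest n whose corrected standard error is at most target_se."""
    # The corrected standard error shrinks as n grows, so the first hit is the smallest n
    for n in range(1, N + 1):
        if se_with_fpc(p, n, N) <= target_se:
            return n
    return N

p = 0.2     # assumed true infection rate (hypothetical)
n1 = 200    # assumed sample size in the 1,000-squirrel park (hypothetical)

se_small = se_with_fpc(p, n1, 1_000)    # park with 1,000 squirrels
se_large = se_with_fpc(p, n1, 10_000)   # park with 10,000 squirrels, same sample size
n2 = n_for_target_se(p, 10_000, se_small)

print(f"SE with n=200, N=1,000:  {se_small:.4f}")
print(f"SE with n=200, N=10,000: {se_large:.4f}")
print(f"n_2 needed in the larger park to match: {n2}")
```

    Under these assumptions, equal sample sizes give a somewhat smaller standard error in the smaller park, and matching it in the larger park takes only a modestly larger sample, nowhere near ten times as many squirrels.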

  4. #4
    Senior Member
    Joined
    Feb 2008
    Posts
    410
    I only understand about half of what was posted (I'm still wading through calc 3, and haven't made it to stats/probability), but from what I can gather, it seems that in this case the confidence we can place in a \hat{p} value does not depend at all on the total population, but on the sample size n alone.

    So, if n_1=n_2, and \hat{p}_1=\hat{p}_2, then the confidence for each \hat{p} value is the same, no matter how different the population sizes are... or do I have this completely wrong?

    I mean, obviously the population sizes would place absolute limits on the error. For example, if you sample 10 members of a population of 100, and you get 5 positive results, then your error cannot be more than 45 percentage points, whereas if you had a 5/10 result from a pop. of 1000, your error could technically be as great as 49.5 percentage points. When working with larger numbers (in the millions, e.g.), does the ratio of n/pop. matter, or is it simply n that makes a difference, independent of the pop. size?
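
    The arithmetic behind those two limits can be checked with a short Python sketch. The function name is made up for the example; it simply assumes the unsampled members could be all negative or all positive.

```python
def max_error(positives, n, N):
    """Largest possible gap between the sample proportion and the true population proportion."""
    p_hat = positives / n
    # Extreme cases: every unsampled member is negative, or every one is positive
    lowest = positives / N
    highest = (positives + (N - n)) / N
    return max(p_hat - lowest, highest - p_hat)

print(f"{max_error(5, 10, 100):.1%}")    # 45.0%
print(f"{max_error(5, 10, 1000):.1%}")   # 49.5%
```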

  5. #5
    Flow Master
    mr fantastic
    Joined
    Dec 2007
    From
    Zeitgeist
    Posts
    16,948
    Thanks
    5
    Replying to hatsoff's post #4:
    There are different cases that all boil down to the size of the sample, the size of the sample compared to the size of the population, and the (unknown) value of p.

    If the sample size n is 'large' but still 'small' compared to the population size, and if p is not too extreme (close to 0 or 1), then the width of the confidence interval depends on n (and on \hat{p}) but not on the population size.
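
    One way to see this is to look at what happens to the finite population correction factor from post #3 as the population grows while the sample size stays fixed. A short sketch of the reasoning, with SE used here as shorthand for the standard error of \hat{p}:

```latex
\mathrm{SE}(\hat{p}) = \sqrt{\frac{pq}{n}}\,\sqrt{\frac{N-n}{N-1}},
\qquad
\frac{N-n}{N-1} = \frac{1 - n/N}{1 - 1/N} \;\longrightarrow\; 1
\quad \text{as } N \to \infty \text{ with } n \text{ fixed.}
```

    So for a population in the millions and a sample in the hundreds or thousands, the correction factor is practically 1 and the standard error is effectively \sqrt{\frac{pq}{n}}, which involves n but not N. For instance, with n = 1,000 and N = 1,000,000 the factor is \sqrt{999000/999999}, roughly 0.9995.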
