Results 1 to 6 of 6

Math Help - sample selection based on unequal probability

  1. #1
    Newbie
    Joined
    Oct 2009
    Posts
    3

    sample selection based on unequal probability

    Anyone know of a routine that can be used in Excel to randomly select n elements based on the value of a parameter for the elements? So a higher parameter value would create a higher probability that element would be selected than another element that had a lower value.
    Follow Math Help Forum on Facebook and Google+

  2. #2
    Super Member
    Joined
    Mar 2008
    Posts
    934
    Thanks
    33
    Awards
    1
    Quote Originally Posted by Translate View Post
    Anyone know of a routine that can be used in Excel to randomly select n elements based on the value of a parameter for the elements? So a higher parameter value would create a higher probability that element would be selected than another element that had a lower value.
    Here is a way that I think will work.

    Let's say the item identifiers are in cells A1:Am and their associated weights are in cells B1:Bm.

    1. Put the following formula in cell C1: =B1*rand()

    2. Copy cell C1 and paste into cells C2:Cm.

    3. Copy cells C1:Cm and Paste Special-Values into C1:Cm, over the same cells. This fixes the random numbers in these cells so they now will not change.

    4. Sort the data on column C and select the top n items.
    Follow Math Help Forum on Facebook and Google+

  3. #3
    Newbie
    Joined
    Oct 2009
    Posts
    3

    Thumbs up Works

    Not sure of the theoretical underpinnings, but empirically works great. Thanks.
    Follow Math Help Forum on Facebook and Google+

  4. #4
    Super Member
    Joined
    Mar 2008
    Posts
    934
    Thanks
    33
    Awards
    1
    I have continued to mull over this problem, and I have decided the solution I gave was wrong. What is more, I don't have a correct solution at this time.

    Although the OP did not state the requirement explicitly, I assume the goal of the algorithm is to select a subset of size n, with the probability of selection of each subset being proportional to the sum of the weights of the items in the subset. The algorithm I gave does not do that.

    In particular, suppose there are only 2 elements and we want to select a subset of size 1. If the weights are a and b for items 1 and 2, respectively, then we want item 1 (i.e., the subset consisting of item 1) to be selected with probability a/(a+b) and item 2 to be selected with probability b/(a+b). Instead, it can be shown that the method I proposed will select item 1 with probability a/(2b) if a < b and probability (2a-b)/(2a) if a > b. This isn't what we want, unless a=b, which is the unweighted case.

    I will continue to think about this problem, but so far I haven't come up with a workable algorithm.
    Follow Math Help Forum on Facebook and Google+

  5. #5
    Newbie
    Joined
    Oct 2009
    Posts
    3
    Uh oh. But your initial procedure was so elegant! I would appreciate any other suggestions you or others might devise.
    Follow Math Help Forum on Facebook and Google+

  6. #6
    Super Member
    Joined
    Mar 2008
    Posts
    934
    Thanks
    33
    Awards
    1
    I continued to think about this problem, and I finally came up with a solution.

    As before, let's suppose the item identifiers are in cells A1:Am, and their associated weights are in cells B1:Bm.

    1. Put the following formula into cell C1: = -LN(RAND())/B1

    2. Copy cell C1 and paste into cells C2:Cm.

    3. Copy cells C1:Cm and Paste Special-Values over the original cells C1:Cm. This fixes the values of those cells so they will not change during the next step.

    4. Sort the data in cells A1:Cm in ascending order based on column C.
    Your sample is in the first n rows (which have the lowest values in column C).

    This process is based on two properties of the Exponential distribution.
    First, if X has an Exponential( \lambda) distribution and Y has an Exponential( \mu) distribution, then
    \Pr(X < Y) = \frac{\lambda}{\lambda + \mu}.
    Second, you can simulate a draw from an Exponential( \lambda) distribution on a computer by computing
    X = -\frac{1}{\lambda} \ln U
    where U is a (pseudo-) random number drawn from a Uniform(0,1) distribution.
    Follow Math Help Forum on Facebook and Google+

Similar Math Help Forum Discussions

  1. random selection, probability.
    Posted in the Statistics Forum
    Replies: 2
    Last Post: October 12th 2011, 08:16 AM
  2. unequal probability sampling
    Posted in the Advanced Statistics Forum
    Replies: 1
    Last Post: March 27th 2011, 08:51 AM
  3. probability of a selection
    Posted in the Discrete Math Forum
    Replies: 6
    Last Post: November 29th 2010, 05:30 AM
  4. Replies: 1
    Last Post: April 29th 2010, 03:51 AM
  5. probability of selection
    Posted in the Statistics Forum
    Replies: 4
    Last Post: February 1st 2010, 11:49 AM

Search Tags


/mathhelpforum @mathhelpforum