
Math Help - Filtering the top 5% in real-time

  1. #1
    Newbie
    Joined
    Jan 2008
    Posts
    9

    Filtering the top 5% in real-time

    Hi,
    I am a computing student, and I wanted some advice on how to solve a problem using maths.

    I have a program that receives numbers from an external source. Currently the program stores a histogram of these numbers: each distinct value together with the frequency of its occurrence. To clarify, I could receive the following numbers:
    1 10 2 3 9 1 9 1 4 5,
    and then I would make the table:
    1 3
    9 2
    2 1
    3 1
    4 1
    5 1
    10 1
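
As a minimal sketch, the table above can be reproduced in Python with `collections.Counter`. The names `stream`, `histogram` and `table` are illustrative only, and the tie-breaking order (equal frequencies sorted by value) is an assumption chosen to match the listing above.

```python
from collections import Counter

# The ten example numbers from the post, in arrival order.
stream = [1, 10, 2, 3, 9, 1, 9, 1, 4, 5]

# Map each distinct value to the number of times it occurred.
histogram = Counter(stream)

# Sort by descending frequency, breaking ties by ascending value.
table = sorted(histogram.items(), key=lambda kv: (-kv[1], kv[0]))

for value, frequency in table:
    print(value, frequency)
```

This stores one entry per distinct value, which is exactly the growth the rest of the post is trying to avoid.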

    After collecting 800 numbers from one experiment, I find that ~80% of the entries have a frequency of 1, ~95% have a frequency of less than 5, and the remaining ~5% have frequencies between 5 and 35. The CDF of these frequencies fits a Weibull distribution quite well.

    Now the problem I'm having is that after collecting 800 numbers I have a table with ~500 entries, whereas I'm only really interested in the top 5% (40 entries). I want to decrease the size of the table I need to store the values. Is there a way to filter out some numbers as they arrive, so that rarely occurring values are never inserted into my table? Or is there some way to prune the table as it grows, removing the less frequent values without throwing away one that could later turn out to be frequent? I don't mind if I lose the exact frequencies of the values, as long as I know which are more popular.
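
One known streaming technique for this kind of pruning is the Misra-Gries frequent-items summary. It is not something proposed in this thread; the sketch below is only an illustration under stated assumptions. The function name `misra_gries` and the parameter `k` are hypothetical. The summary keeps at most `k - 1` counters, and any value that occurs more than `n / k` times in a stream of `n` numbers is guaranteed to still be in the table at the end. The stored counts are lower bounds on the true frequencies, which fits the requirement that exact frequencies may be lost. Note that the guarantee is in terms of a frequency threshold rather than a "top 5% of entries" rank, so `k` would have to be chosen with that in mind.

```python
def misra_gries(stream, k):
    """Return a dict of at most k - 1 candidate frequent values.

    Any value occurring more than len(stream) / k times is guaranteed
    to be a key of the result; the stored counts are lower bounds.
    """
    counters = {}
    for x in stream:
        if x in counters:
            # Already tracked: just count it.
            counters[x] += 1
        elif len(counters) < k - 1:
            # Room left in the table: start tracking this value.
            counters[x] = 1
        else:
            # Table full: decrement every counter and drop the ones
            # that reach zero. The new value x is not inserted.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


# Hypothetical example stream in which the value 1 dominates.
example = [1, 2, 1, 3, 1, 4, 1, 5, 1, 1]
candidates = misra_gries(example, k=3)
```

Because the table never holds more than `k - 1` entries, memory use is fixed in advance regardless of how many distinct values arrive.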

    After writing this down, it seems clearer to state the problem as: values are randomly generated from a Weibull distribution; how many values do I need before I can identify the top 5%?

    Would sampling techniques be of any use? I need this to occur quickly in real-time, i.e. I still want to be able to identify the top 5% after, say, 100 numbers.

    Thanks for any advice or suggestions. I'm happy to be sent away to read up on specific topics.

    Andrew

  2. #2
    Firefly
    Newbie
    Joined
    Jul 2008
    Posts
    3
    I think what might help you here is calculating the sample size needed for an accurate (enough) prediction of the distribution. This tells you after how many numbers you can stop counting.

    See the Wikipedia article "Sample size".
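
As a hedged illustration of what such a calculation can look like, the sketch below uses the standard sample-size formula for estimating a proportion, n = z^2 * p * (1 - p) / E^2, which relies on a normal approximation. Applying it to this particular problem is an assumption, not something stated in the reply. The function name `sample_size_for_proportion`, the margin of error and the default z = 1.96 (roughly 95% confidence) are all illustrative choices.

```python
import math


def sample_size_for_proportion(p, margin, z=1.96):
    """Samples needed to estimate a proportion p to within +/- margin.

    Uses n = z^2 * p * (1 - p) / margin^2, rounded up to a whole sample.
    z = 1.96 corresponds to roughly 95% confidence.
    """
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


# Hypothetical use: how many numbers are needed to pin down a 5% share
# to within two percentage points?
n_needed = sample_size_for_proportion(p=0.05, margin=0.02)
print(n_needed)
```

Tightening the margin increases the required sample quadratically, so halving `margin` roughly quadruples the number of values needed.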