Thread: Filtering the top 5% in real-time

  1. #1
    Jan 2008

    Filtering the top 5% in real-time

    I am a computing student, and I wanted some advice on how to solve a problem using maths.

    I have a program that receives numbers from an external source. Currently the program stores a histogram of these numbers, that is, each distinct value together with the frequency of its occurrence. To clarify, I could receive the following numbers:
    1 10 2 3 9 1 9 1 4 5,
    and then I would build the table (value, frequency):
    1 3
    9 2
    2 1
    3 1
    4 1
    5 1
    10 1
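
    The table above can be reproduced with a short sketch. It assumes the numbers arrive as a Python iterable; the function name `build_histogram` is made up for illustration and is not from the original program.

```python
from collections import Counter


def build_histogram(values):
    """Return a mapping from each distinct value to how often it occurred."""
    return Counter(values)


stream = [1, 10, 2, 3, 9, 1, 9, 1, 4, 5]
histogram = build_histogram(stream)

# most_common() lists (value, frequency) pairs, most frequent first.
for value, freq in histogram.most_common():
    print(value, freq)
```

    This stores one entry per distinct value, which is exactly the growth the rest of the post is trying to avoid.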

    After collecting 800 numbers from one experiment I find that ~80% of the values have a frequency of 1, ~95% have a frequency below 5, and the remaining ~5% have frequencies between 5 and 35. The CDF of these frequencies fits a Weibull distribution quite well.

    Now the problem I'm having is that after collecting 800 numbers I have a table with ~500 entries, whereas I'm only really interested in the top 5% (40 entries). I want to decrease the size of the table I need to store the values. Is there a way to filter out some numbers as they arrive, so that low-frequency values are never inserted into my table? Or is there some way to prune the table as it grows, removing the low-frequency values without throwing away a value that could turn out to be more frequent? I don't mind if I lose the frequencies of the values, just as long as I know which are more popular.
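
    One possible way to prune the table as it grows is a bounded-counter scheme such as the Misra-Gries frequent-items algorithm. This is a technique swapped in for illustration, not something proposed in the thread. The sketch below keeps at most k - 1 counters; after n numbers, any value that occurred more than n/k times is guaranteed to still be in the table, and each stored count underestimates the true frequency by at most n/k. The name `misra_gries` and the parameter `k` are assumptions of this sketch.

```python
def misra_gries(stream, k):
    """Approximate frequent-items summary using at most k - 1 counters."""
    counters = {}
    for x in stream:
        if x in counters:
            # Value already tracked: count it.
            counters[x] += 1
        elif len(counters) < k - 1:
            # Room left in the table: start tracking this value.
            counters[x] = 1
        else:
            # Table full: decrement every counter and drop those that hit zero.
            # The new value itself is not inserted.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


stream = [1, 10, 2, 3, 9, 1, 9, 1, 4, 5]
summary = misra_gries(stream, k=4)  # at most 3 entries are kept
print(sorted(summary, key=summary.get, reverse=True))
```

    The stored counts are lower bounds rather than exact frequencies, which fits the stated requirement of only needing to know which values are more popular. As a rough illustration, k = 160 would cap the table at 159 entries and retain any value seen more than 5 times in 800 numbers; whether that trade-off suits the real data is an open question.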

    After writing this down, it seems clearer to state the problem as: if values are randomly generated from a Weibull distribution, how many values do I need before I can identify the top 5%?

    Would sampling techniques be of any use? I need this to occur quickly in real-time, i.e. I still want to be able to identify the top 5% after, say, 100 numbers.

    Thanks for any advice or suggestions. I'm happy to be sent away to read up on specific topics.


  2. #2
    Firefly
    Jul 2008
    I think what might help you here is calculating the sample size needed for an accurate (enough) prediction of the distribution. This tells you after how many numbers you can stop counting.

    See the Wikipedia article "Sample size".
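
    As a rough illustration of such a calculation, the sketch below uses the standard normal-approximation formula for estimating a proportion, n = z^2 p(1 - p) / e^2. Treating "how often does a given value occur" as a proportion p is an assumption of this sketch, as are the function name and its parameters; it does not use the Weibull fit mentioned in the question.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(p, margin, confidence=0.95):
    """Samples needed to estimate a proportion p to within +/- margin."""
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z * z * p * (1 - p) / margin ** 2)


# Worst case p = 0.5, margin of 5 percentage points, 95% confidence.
print(sample_size_for_proportion(0.5, 0.05))
```

    Smaller margins raise the required sample size quickly, since it scales with 1 / margin^2.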