Results 1 to 6 of 6

Math Help - Can I improve the accuracy of this method?

  1. #1
    Newbie
    Joined
    Oct 2011
    Posts
    3

    Question Can I improve the accuracy of this method?

    I hope it is OK to post here - I am a total non-expert with only basic maths ability, but with what I think is a tricky problem to solve. I hope someone here may be able to help.

    I need to estimate the value of X (unknown) when I know only the value for Y. To do this I have a list of other occurrences of X and Y, and I have to determining any pattern in the correlation of X to Y. Although X and Y are not totally random, there is no definite or consistent correlation other than there is a tendency for the value of X to be larger if Y is larger. The best that can be hoped for is as accurate a guess as possible. Here is an example:

    1. X = 380 Y = 320
    2. X = 840 Y = 5500
    3. X = 5176 Y = 499
    4. X = 5315 Y = 1074
    5. X = 6484 Y = 140
    6. X = 7244 Y = 5002
    7. X = 10950 Y = 1104
    8. X = 11040 Y = 279
    9. X = 16791 Y = 3001
    10. X = 17096 Y = 2788
    11. X = 20700 Y = 9194
    12. X = 169901 Y = 20886
    13. X = 178322 Y = 94437
    14. X = 184739 Y = 1198
    15. X = 525257 Y = 29129

    X = ? if Y = 1198
    X = ? if Y = 8062
    X = ? if Y = 31419

    As a non-expert, at the moment my best guess at the Xs involves the following process: I divide X by Y for all known values, in the example above this gives results ranging from 0.15 (No.2) to 154.21 (No. 14). I then find the median value across the sample set, in the example above this is 6.13 (represented by No. 10). I then apply this value to Y in the case of unknown X, so in this case I would estimate the 3 unknown Xs to be 7346 and 49420 and 192599.

    I have two questions.

    a) My simple method gets within +-50% in 60% of cases (the actual data sets are larger than the small example above, with anything from 50 to 500 known X and Ys). Is there a more advanced approach that would give more accurate estimates?

    b) If I have to use a formula for estimating 100 unknown X's, and my goal is to get, say, 75% of my estimates to within, say, +-50%. As all values are positive, the most my result can be in error on the negative side is close to -100%, but my method can easily give errors on the positive side in excess of +100%, +200%, even 1,000%. So the scope for positive error is unlimited while the scope for negative error is limited to -100%. Can this fact be harnessed to improve the statistical accuracy and get more results within my target tolerance range?

    For practical purposes any suggestions or solutions ideally need to be executable on Excel.

    Thank you in advance for your patience with this problem and any help.
    Last edited by mr fantastic; October 8th 2011 at 01:45 PM. Reason: Deleted begging in title.
    Follow Math Help Forum on Facebook and Google+

  2. #2
    Grand Panjandrum
    Joined
    Nov 2005
    From
    someplace
    Posts
    14,972
    Thanks
    4

    Re: Can I improve the accuracy of this method?

    Quote Originally Posted by davidwhite View Post
    I hope it is OK to post here - I am a total non-expert with only basic maths ability, but with what I think is a tricky problem to solve. I hope someone here may be able to help.

    I need to estimate the value of X (unknown) when I know only the value for Y. To do this I have a list of other occurrences of X and Y, and I have to determining any pattern in the correlation of X to Y. Although X and Y are not totally random, there is no definite or consistent correlation other than there is a tendency for the value of X to be larger if Y is larger. The best that can be hoped for is as accurate a guess as possible. Here is an example:

    1. X = 380 Y = 320
    2. X = 840 Y = 5500
    3. X = 5176 Y = 499
    4. X = 5315 Y = 1074
    5. X = 6484 Y = 140
    6. X = 7244 Y = 5002
    7. X = 10950 Y = 1104
    8. X = 11040 Y = 279
    9. X = 16791 Y = 3001
    10. X = 17096 Y = 2788
    11. X = 20700 Y = 9194
    12. X = 169901 Y = 20886
    13. X = 178322 Y = 94437
    14. X = 184739 Y = 1198
    15. X = 525257 Y = 29129

    X = ? if Y = 1198
    X = ? if Y = 8062
    X = ? if Y = 31419

    As a non-expert, at the moment my best guess at the Xs involves the following process: I divide X by Y for all known values, in the example above this gives results ranging from 0.15 (No.2) to 154.21 (No. 14). I then find the median value across the sample set, in the example above this is 6.13 (represented by No. 10). I then apply this value to Y in the case of unknown X, so in this case I would estimate the 3 unknown Xs to be 7346 and 49420 and 192599.

    I have two questions.

    a) My simple method gets within +-50% in 60% of cases (the actual data sets are larger than the small example above, with anything from 50 to 500 known X and Ys). Is there a more advanced approach that would give more accurate estimates?

    b) If I have to use a formula for estimating 100 unknown X's, and my goal is to get, say, 75% of my estimates to within, say, +-50%. As all values are positive, the most my result can be in error on the negative side is close to -100%, but my method can easily give errors on the positive side in excess of +100%, +200%, even 1,000%. So the scope for positive error is unlimited while the scope for negative error is limited to -100%. Can this fact be harnessed to improve the statistical accuracy and get more results within my target tolerance range?

    For practical purposes any suggestions or solutions ideally need to be executable on Excel.

    Thank you in advance for your patience with this problem and any help.
    We would need to know more about the process producing these numbers, and would need to see a lot more data to offer any meaningful opinion.

    CB
    Follow Math Help Forum on Facebook and Google+

  3. #3
    Newbie
    Joined
    Oct 2011
    Posts
    3

    Re: Can I improve the accuracy of this method?

    The X and Y values are asset types, and I can have data sets where, for example, I have 500 records in a data set. For 200 records both X and Y values may be known, for the remaining 300 records only Y is known. I'm trying to establish the most accurate method of estimating X in these cases.

    I know there is no consistent relationship between the two values, but there is a tendency / probability (I use these terms as a layperson) for both to increase in size. I notice that X does not always increase with Y, so I suspect part of the problem may involve quantifying the probability for X and Y to both increase, as well as calculating the most probable proportion of one value to the other. Or maybe the approach required is totally different?

    I have attached a larger data set of several hundred records as requested.
    Attached Files Attached Files
    Last edited by davidwhite; October 9th 2011 at 01:31 AM.
    Follow Math Help Forum on Facebook and Google+

  4. #4
    Grand Panjandrum
    Joined
    Nov 2005
    From
    someplace
    Posts
    14,972
    Thanks
    4

    Re: Can I improve the accuracy of this method?

    Quote Originally Posted by davidwhite View Post
    The X and Y values are asset types, and I can have data sets where, for example, I have 500 records in a data set. For 200 records both X and Y values may be known, for the remaining 300 records only Y is known. I'm trying to establish the most accurate method of estimating X in these cases.

    I know there is no consistent relationship between the two values, but there is a tendency / probability (I use these terms as a layperson) for both to increase in size. I notice that X does not always increase with Y, so I suspect part of the problem may involve quantifying the probability for X and Y to both increase, as well as calculating the most probable proportion of one value to the other. Or maybe the approach required is totally different?

    I have attached a larger data set of several hundred records as requested.
    Look at a log-log plot of this data.

    CB
    Follow Math Help Forum on Facebook and Google+

  5. #5
    Newbie
    Joined
    Oct 2011
    Posts
    3

    Re: Can I improve the accuracy of this method?

    I've produced a log-log of the example data set where X and Y values are both known. I have attached the log-log graph generated in Excel using base 10 (Excel default, it can be changed). It produces two Y axis, which is no doubt correct. I'm sure this is the approach and I can see that there is a clustering which appears to hint at potential order. But I have no idea how to use this data to produce an Excel formula which I can then apply to future known Y values to estimate unknown X values.

    I understand there is a degree of 'figure it out yourself' appropriate for students of maths but I'm not - not even a CSE in maths 30 years ago! If this forum isn't the right place for non-mathematicians to get help can you point me in the right direction? Thanks.
    Attached Thumbnails Attached Thumbnails Can I improve the accuracy of this method?-log-log.png  
    Follow Math Help Forum on Facebook and Google+

  6. #6
    Grand Panjandrum
    Joined
    Nov 2005
    From
    someplace
    Posts
    14,972
    Thanks
    4

    Re: Can I improve the accuracy of this method?

    Quote Originally Posted by davidwhite View Post
    I've produced a log-log of the example data set where X and Y values are both known. I have attached the log-log graph generated in Excel using base 10 (Excel default, it can be changed). It produces two Y axis, which is no doubt correct. I'm sure this is the approach and I can see that there is a clustering which appears to hint at potential order. But I have no idea how to use this data to produce an Excel formula which I can then apply to future known Y values to estimate unknown X values.

    I understand there is a degree of 'figure it out yourself' appropriate for students of maths but I'm not - not even a CSE in maths 30 years ago! If this forum isn't the right place for non-mathematicians to get help can you point me in the right direction? Thanks.
    The problem with a non-educational problem that may be used commercially is the MHF cannot do anything that might result in us being liable in any way for losses financial or otherwise of the advice.

    But a linear realationship in a log-log plot implies a relationship of the form:

    Y=k X^{\alpha}

    where k and \alpha are constants.

    CB
    Follow Math Help Forum on Facebook and Google+

Similar Math Help Forum Discussions

  1. How to improve math skills?
    Posted in the Math Topics Forum
    Replies: 4
    Last Post: September 27th 2010, 01:53 PM
  2. Improve
    Posted in the Algebra Forum
    Replies: 3
    Last Post: January 25th 2008, 01:32 AM

Search Tags


/mathhelpforum @mathhelpforum