We would need to know more about the process producing these numbers, and would need to see a lot more data to offer any meaningful opinion.I hope it is OK to post here - I am a total non-expert with only basic maths ability, but with what I think is a tricky problem to solve. I hope someone here may be able to help.

I need to estimate the value of X (unknown) when I know only the value for Y. To do this I have a list of other occurrences of X and Y, and I have to determining any pattern in the correlation of X to Y. Although X and Y are not totally random, there is no definite or consistent correlation other than there is a tendency for the value of X to be larger if Y is larger. The best that can be hoped for is as accurate a guess as possible. Here is an example:

1. X = 380 Y = 320

2. X = 840 Y = 5500

3. X = 5176 Y = 499

4. X = 5315 Y = 1074

5. X = 6484 Y = 140

6. X = 7244 Y = 5002

7. X = 10950 Y = 1104

8. X = 11040 Y = 279

9. X = 16791 Y = 3001

10. X = 17096 Y = 2788

11. X = 20700 Y = 9194

12. X = 169901 Y = 20886

13. X = 178322 Y = 94437

14. X = 184739 Y = 1198

15. X = 525257 Y = 29129

X = ? if Y = 1198

X = ? if Y = 8062

X = ? if Y = 31419

As a non-expert, at the moment my best guess at the Xs involves the following process: I divide X by Y for all known values, in the example above this gives results ranging from 0.15 (No.2) to 154.21 (No. 14). I then find the median value across the sample set, in the example above this is 6.13 (represented by No. 10). I then apply this value to Y in the case of unknown X, so in this case I would estimate the 3 unknown Xs to be 7346 and 49420 and 192599.

I have two questions.

a) My simple method gets within +-50% in 60% of cases (the actual data sets are larger than the small example above, with anything from 50 to 500 known X and Ys). Is there a more advanced approach that would give more accurate estimates?

b) If I have to use a formula for estimating 100 unknown X's, and my goal is to get, say, 75% of my estimates to within, say, +-50%. As all values are positive, the most my result can be in error on the negative side is close to -100%, but my method can easily give errors on the positive side in excess of +100%, +200%, even 1,000%. So the scope for positive error is unlimited while the scope for negative error is limited to -100%. Can this fact be harnessed to improve the statistical accuracy and get more results within my target tolerance range?

For practical purposes any suggestions or solutions ideally need to be executable on Excel.

Thank you in advance for your patience with this problem and any help.

CB