Hi,

I've built a power ranking in Excel and I need some help checking its accuracy.
Below are 4 columns from tennis matches, A, B, C and D: A is PlayerAwin, B is PlayerBwin, and C and D are the players' respective win probabilities. Excel sorts the winner to the left-hand side, so every percentage in column C is the winner's probability.



A B C D
1 0 60% 40%
1 0 36% 64%
1 0 47% 53%
1 0 43% 57%
1 0 45% 55%
1 0 52% 48%
1 0 44% 56%
1 0 22% 78%
1 0 49% 51%
1 0 45% 55%
1 0 61% 39%
1 0 57% 43%
1 0 53% 47%
1 0 42% 58%
1 0 41% 59%
1 0 51% 49%
1 0 48% 52%

I wanted to check how accurate my model is, so I grouped the predicted percentages into ranges and counted how many predictions in each range turned out to be wins.

Range    All     Wins   Win %
90-100   158     143    91%
80-90    1654    1405   85%
70-80    4578    3425   75%
60-70    8782    5784   66%
50-60    16076   8902   55%
40-50    15126   6697   44%
30-40    8782    2998   34%
20-30    4576    1153   25%
10-20    1652    249    15%
0-10     158     15     9%

So, for example, I checked how many times the model estimated a player's win probability to be in the 0-10% range and how many times that player actually won.
You can see that there were 158 cases where a player was estimated to have a 10% or lower chance of winning, and he won 15 of them.
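
In case it helps to see the grouping step outside Excel, here is a minimal Python sketch of what I did. The function name `calibration_table` and the input layout (one `(percent, won)` pair per player per match, with whole-number percentages) are just my assumptions for illustration. The sample is the first five rows from the table at the top, counted once from each player's side.

```python
def calibration_table(predictions, n_bins=10):
    """Group (percent, won) pairs into equal-width bins and count wins.

    predictions: iterable of (percent, won), percent an int in 0..100,
                 won True if that player won the match.
    Returns a list of (low, high, total, wins, win_rate) for non-empty bins.
    """
    width = 100 // n_bins
    totals = [0] * n_bins
    wins = [0] * n_bins
    for percent, won in predictions:
        # A prediction of exactly 100% goes into the top bin.
        idx = min(percent // width, n_bins - 1)
        totals[idx] += 1
        if won:
            wins[idx] += 1
    rows = []
    for i in range(n_bins):
        if totals[i] == 0:
            continue  # skip empty bins to avoid dividing by zero
        rows.append((i * width, (i + 1) * width,
                     totals[i], wins[i], wins[i] / totals[i]))
    return rows


# First five matches from the table above: winner's % and loser's %.
sample = [
    (60, True), (40, False),
    (36, True), (64, False),
    (47, True), (53, False),
    (43, True), (57, False),
    (45, True), (55, False),
]

rows = calibration_table(sample)
for low, high, total, won, rate in rows:
    print(f"{low}-{high}: {won}/{total} = {rate:.0%}")
```

With only ten predictions the rates are obviously meaningless; the point is only to show how each prediction lands in a range and gets counted.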


I plotted this on a graph and added a trendline with the R-squared value displayed. It gave R-squared = 0.9968, which seems a bit high.
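
For anyone who wants to reproduce the number, here is a rough Python sketch of the same calculation. It assumes the x-axis is the midpoint of each range and the y-axis is the observed win rate, fitted with an ordinary straight-line trendline; the helper name `r_squared` is mine, and the counts are copied from the grouped table.

```python
# (bin midpoint in %, all predictions in the bin, wins in the bin)
bins = [
    (95, 158, 143),
    (85, 1654, 1405),
    (75, 4578, 3425),
    (65, 8782, 5784),
    (55, 16076, 8902),
    (45, 15126, 6697),
    (35, 8782, 2998),
    (25, 4576, 1153),
    (15, 1652, 249),
    (5, 158, 15),
]

xs = [mid / 100 for mid, _, _ in bins]        # predicted probability (midpoint)
ys = [wins / total for _, total, wins in bins]  # observed win rate


def r_squared(xs, ys):
    """R-squared of a least-squares straight line fitted to (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # For simple linear regression, R-squared is the squared correlation.
    return (sxy * sxy) / (sxx * syy)


r2 = r_squared(xs, ys)
print(round(r2, 4))
```

Note that this treats all ten ranges equally, so the 158 predictions in the 0-10 range count as much as the 16076 in the 50-60 range.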

Does it make any sense to create groups and then run a regression on them? What's the best way of checking my model?


Thanks