hi,

I've built a power ranking in Excel and I need some help checking its accuracy.

Below are 4 columns from tennis matches: A, B, C, D, where A is PlayerAwin, B is PlayerBwin, and C and D are their respective win probabilities. Excel sorts the winner to the left-hand side, so every percentage in column C is the winner's.

A   B   C     D
1   0   60%   40%
1   0   36%   64%
1   0   47%   53%
1   0   43%   57%
1   0   45%   55%
1   0   52%   48%
1   0   44%   56%
1   0   22%   78%
1   0   49%   51%
1   0   45%   55%
1   0   61%   39%
1   0   57%   43%
1   0   53%   47%
1   0   42%   58%
1   0   41%   59%
1   0   51%   49%
1   0   48%   52%

I wanted to check how accurate my model is, so I put the percentages into groups and checked how many of the predictions within each range were winners.

Range    ALL     Win    %
90-100   158     143    91%
80-90    1654    1405   85%
70-80    4578    3425   75%
60-70    8782    5784   66%
50-60    16076   8902   55%
40-50    15126   6697   44%
30-40    8782    2998   34%
20-30    4576    1153   25%
10-20    1652    249    15%
0-10     158     15     9%

So I checked, for example, how many times the model estimated a player's win probability to be in the 0-10% range and how many times that player won.

You can see that there were 158 matches where a player was estimated to have a 10% or lower chance of winning, and he won 15 of them.
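
For anyone who wants to reproduce the grouping outside Excel, here is a minimal Python sketch of the procedure described above. The function name `calibration_table`, the integer-percent input, and the rule that a value sitting exactly on a boundary goes into the higher range are all assumptions for illustration; the spreadsheet may handle boundaries differently. It uses only the first five sample rows.

```python
from collections import defaultdict

def calibration_table(predictions, bin_width=10):
    """Group (probability_percent, won) pairs into ranges of bin_width points.

    Returns {(low, high): (count, wins, win_rate)}, highest range first.
    """
    totals = defaultdict(int)
    wins = defaultdict(int)
    for pct, won in predictions:
        # Assumption: a boundary value such as 50 falls in 50-60;
        # 100 is clamped into the top range so it reads 90-100.
        low = min(int(pct // bin_width) * bin_width, 100 - bin_width)
        totals[low] += 1
        wins[low] += int(won)
    return {
        (low, low + bin_width): (totals[low], wins[low], wins[low] / totals[low])
        for low in sorted(totals, reverse=True)
    }

# Column C (the winner's percentage) for the first five sample rows.
winner_pcts = [60, 36, 47, 43, 45]

# Each match contributes two predictions: the winner's and the loser's.
predictions = []
for c in winner_pcts:
    predictions.append((c, 1))        # winner, estimated at c%
    predictions.append((100 - c, 0))  # loser, estimated at (100 - c)%

table = calibration_table(predictions)
for (low, high), (count, won, rate) in table.items():
    print(f"{low}-{high}: {count} {won} {rate:.0%}")
```

Because every match adds both a winner and a loser, the counts in mirrored ranges (for example 0-10 and 90-100) come out nearly equal, which is what the ALL column shows.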

I've put this on a graph and used the trendline option with the R-squared value displayed. It gave me R² = 0.9968, which seems a bit high.

Does it make any sense to create groups and then use regression? What's the best way of checking my model?
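
The trendline calculation can be sketched in Python from the grouped table. This assumes the graph plots the midpoint of each range against the observed win rate (Win / ALL) and that the trendline is an ordinary straight-line fit; the actual chart may have used different x values. The helper name `r_squared` is made up for this example.

```python
# (low, high, ALL, Win) rows copied from the grouped table in this post.
bins = [
    (90, 100, 158, 143),
    (80, 90, 1654, 1405),
    (70, 80, 4578, 3425),
    (60, 70, 8782, 5784),
    (50, 60, 16076, 8902),
    (40, 50, 15126, 6697),
    (30, 40, 8782, 2998),
    (20, 30, 4576, 1153),
    (10, 20, 1652, 249),
    (0, 10, 158, 15),
]

def r_squared(xs, ys):
    """R-squared of a least-squares straight line fitted to (xs, ys).

    For a one-variable linear fit with an intercept this equals the
    squared Pearson correlation between xs and ys.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy ** 2 / (sxx * syy)

# Assumption: x is the midpoint of each range, y is the observed win rate.
midpoints = [(low + high) / 2 for low, high, _, _ in bins]
win_rates = [won / total for _, _, total, won in bins]

r2 = r_squared(midpoints, win_rates)
print(f"R-squared = {r2:.4f}")
```

Note that a fit like this uses only the 10 grouped points and gives each range the same weight, so the 158-match ranges count as much as the 16076-match one.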

thanks