Hello,

I have a large collection of Sudoku puzzles, most of which are easy. I want to find out whether the runtime of a Sudoku solver on each puzzle is correlated with certain Sudoku difficulty measures. The problem I see is that the correlation coefficient is influenced by the distribution of puzzle difficulties, which is heavily skewed: more than 80% of the puzzles have a rating below 2 on a scale of 0 to 10. Is there an algorithm I can use to remove instances so that the resulting difficulty distribution is approximately uniform? Alternatively, which correlation coefficient should I use to get an accurate picture of the statistical relationship between solver runtime and the difficulty measures?
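
For concreteness, here is a minimal Python sketch of the two ideas mentioned above. It is only one possible reading, not a recommendation. The function names (`flatten_by_bins`, `average_ranks`, `spearman`), the choice of 5 equal-width bins, and the use of Spearman's rank correlation as the candidate coefficient are all assumptions made for illustration. The first function thins out the over-represented easy bins by random downsampling; the last computes a correlation on ranks, which depends only on the ordering of the values and not on their spacing.

```python
import random
from collections import defaultdict


def flatten_by_bins(ratings, n_bins=5, lo=0.0, hi=10.0, seed=0):
    """Return sorted indices of a subsample with equal counts per non-empty bin.

    Assumes every rating lies in [lo, hi]; n_bins is an arbitrary choice.
    """
    rng = random.Random(seed)
    width = (hi - lo) / n_bins
    bins = defaultdict(list)
    for i, r in enumerate(ratings):
        # Clamp so that a rating equal to `hi` falls into the last bin.
        b = min(int((r - lo) / width), n_bins - 1)
        bins[b].append(i)
    # Every non-empty bin is cut down to the size of the smallest one.
    quota = min(len(members) for members in bins.values())
    kept = []
    for members in bins.values():
        kept.extend(rng.sample(members, quota))
    return sorted(kept)


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Plain Pearson correlation; undefined if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

With `ratings` and `runtimes` as two equally long lists, `kept = flatten_by_bins(ratings)` gives the indices to keep, and `spearman([ratings[i] for i in kept], [runtimes[i] for i in kept])` gives the coefficient on the flattened sample. Note that cutting every bin down to the smallest one can discard most of the data when hard puzzles are rare, and empty bins are simply skipped, so the result is only approximately uniform.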

Thank you!