How can I compute a correlation coefficient on a non-uniform distribution?
I have a big collection of sudoku matrices, most of which are easy. I want to find out whether the runtime of a sudoku solver on each of these matrices is correlated with certain sudoku difficulty measures. The problem I see is that the correlation coefficient is influenced by the distribution of game difficulties: most of them are easy (more than 80% have a rating below 2 on a scale of 0 to 10). What algorithm can I use to remove instances so that the resulting distribution is approximately uniform? Alternatively, what correlation coefficient should I use to get an accurate picture of the statistical relationship between the solver runtime and the difficulty measures?
Re: How can I compute a correlation coefficient on a non-uniform distribution?
I don't think you have to alter the data in any way just because most of your trials are easy. I have never heard of anyone adjusting for the distribution of the trials when computing a correlation. If you get a really unrealistic fit, you could instead use the average runtime at each difficulty level.
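If you want to try either idea, here is a rough Python sketch, not a definitive method. It assumes the ratings and runtimes are two parallel lists and that difficulty is binned in steps of 1.0. The function names (`pearson`, `bin_means`, `cap_per_bin`) and the `bin_width`, `cap` and `seed` parameters are made up for illustration. `bin_means` does the averaging per difficulty level mentioned above, and `cap_per_bin` is one simple way to thin out the easy puzzles so that no bin dominates.

```python
import random
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def group_by_bin(ratings, runtimes, bin_width=1.0):
    """Group (rating, runtime) pairs by bin index floor(rating / bin_width)."""
    bins = defaultdict(list)
    for r, t in zip(ratings, runtimes):
        bins[int(r // bin_width)].append((r, t))
    return bins


def bin_means(ratings, runtimes, bin_width=1.0):
    """Return the mean rating and mean runtime of each non-empty bin."""
    bins = group_by_bin(ratings, runtimes, bin_width)
    keys = sorted(bins)
    mean_r = [sum(r for r, _ in bins[k]) / len(bins[k]) for k in keys]
    mean_t = [sum(t for _, t in bins[k]) / len(bins[k]) for k in keys]
    return mean_r, mean_t


def cap_per_bin(ratings, runtimes, cap, bin_width=1.0, seed=0):
    """Keep at most `cap` randomly chosen puzzles per difficulty bin."""
    rng = random.Random(seed)  # fixed seed so the subsample is repeatable
    bins = group_by_bin(ratings, runtimes, bin_width)
    kept = []
    for k in sorted(bins):
        items = bins[k]
        kept.extend(items if len(items) <= cap else rng.sample(items, cap))
    return [r for r, _ in kept], [t for _, t in kept]
```

You would then compare `pearson(ratings, runtimes)` on the raw data with `pearson(*bin_means(ratings, runtimes))` and `pearson(*cap_per_bin(ratings, runtimes, cap=50))`. Keep in mind that a correlation of bin averages hides the scatter inside each bin, so it will usually come out higher than the per-puzzle value.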