# Thread: Adjusting Probabilities To Match Empirical CDF

1. ## Adjusting Probabilities To Match Empirical CDF

I have two data sets and can calculate the empirical CDF for each in Matlab. The two ecdfs may or may not be similar to each other, and my end goal is to force one data set to match the ecdf of the other. Neither ecdf necessarily fits a known distribution (exponential, Weibull, etc.) very well, so I cannot rely on inverse functions either. Some previous work has done that, but I don't quite believe their assumption that their ecdf matched the chosen distribution well enough. I also cannot reduce the number of instances of a certain data value in the set I am modifying, only increase them. This means I cannot simply give the two sets the same number of values and change all values in set 1 to match those of set 2. In the end, my modified set 1 will be larger than set 2, but the individual probabilities should still line up.

I'm not sure if there is a method that already does this for me, but it sounds very similar to the problem of tuning a guitar with a floating tremolo. Tightening or loosening one string to get its note to match affects all the other strings. In the end, they all need to be balanced...

Suppose we ignore the two ecdfs and just try to make the probability of each value match between the two data sets, modifying data set 1 to match data set 2. Say f1(1) = 10% and f2(1) = 15%. Then we need to add more 1 values to data set 1 until it reaches 15%. Unfortunately, after we do this, all the other probabilities have changed as well. Next, say f1(2) = 8% and f2(2) = 11%, so we increase the number of 2's in data set 1. However, this again affects all the other values, and the tuned value for 1's has just been "untuned".
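One possible way around the "untuning" effect is to scale every count at once instead of adjusting one value at a time. The sketch below is only an illustration under stated assumptions: the data values are discrete, every value in set 1 also occurs in set 2 (otherwise its target probability is 0 and cannot be reached by adding), and an exact match is wanted, so the new counts are an integer multiple `m` of set 2's counts. The function name `counts_to_add` is hypothetical, and Python is used here rather than Matlab purely for illustration.

```python
from collections import Counter


def counts_to_add(set1, set2):
    """Return {value: copies to add to set1} so that set1's relative
    frequencies equal set2's, using additions only (no deletions)."""
    c1, c2 = Counter(set1), Counter(set2)
    if not c2:
        raise ValueError("set2 must not be empty")
    # A value seen only in set1 has target probability 0: unreachable by adding.
    unreachable = [v for v in c1 if c2[v] == 0]
    if unreachable:
        raise ValueError(f"values {unreachable} occur in set1 but not in set2")
    # Smallest integer multiplier m with m * c2[v] >= c1[v] for every value v.
    # -(-a // b) is integer ceiling division.
    m = max(1, max(-(-c1[v] // c2[v]) for v in c2))
    # The target count for each value is m * c2[v]; add the shortfall.
    return {v: m * c2[v] - c1[v] for v in c2}


# Example using the percentages from the post, with a third value filling the rest.
set1 = [1] * 10 + [2] * 8 + [3] * 82   # f1(1) = 10%, f1(2) = 8%
set2 = [1] * 15 + [2] * 11 + [3] * 74  # f2(1) = 15%, f2(2) = 11%
print(counts_to_add(set1, set2))       # {1: 20, 2: 14, 3: 66}
```

Here the most over-represented value (3, with 82 against 74) fixes the multiplier at 2, so the modified set 1 ends up with 200 values in exactly set 2's proportions. If an approximate match is acceptable, a non-integer scale equal to the largest ratio of set 1 count to set 2 count, followed by rounding, would give a smaller modified set.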

In the end, I need to do this process on tens to hundreds of possible data values, all aligning to the proper probabilities. If I could get the probabilities to match within each data set, then the CDF would naturally match as well.
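As a quick sanity check of that last claim, here is a minimal sketch showing that two sets of different sizes with the same per-value proportions give the same empirical CDF. The helper name `ecdf` and the sample data are made up for illustration (Matlab's own `ecdf` function returns arrays rather than a callable).

```python
from bisect import bisect_right


def ecdf(data):
    """Return a function F with F(x) = fraction of data values <= x."""
    xs = sorted(data)
    n = len(xs)

    def F(x):
        # bisect_right counts how many sorted values are <= x.
        return bisect_right(xs, x) / n

    return F


small = [1, 1, 2, 3]              # proportions 50%, 25%, 25%
large = [1, 1, 1, 1, 2, 2, 3, 3]  # same proportions, twice the size

F_small, F_large = ecdf(small), ecdf(large)
for x in [0, 1, 1.5, 2, 3, 4]:
    print(x, F_small(x), F_large(x))  # the two columns agree at every x
```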

Is there already an algorithm / method in matlab / something that would make my life easier in solving this? Otherwise I'm about to start doing some major coding. Bleh.

2. In reply to superman859:
I would consider Kernel Density Estimation and then see what I could do with the resulting density.

CB
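For readers unfamiliar with the kernel density estimation suggestion above, the following is a minimal sketch of the idea, not the replier's own method. It assumes a Gaussian kernel and the normal-reference rule-of-thumb bandwidth `1.06 * std * n^(-1/5)`, both of which are choices made here for illustration; the function name `gaussian_kde` and the sample data are likewise hypothetical. In Matlab, the Statistics Toolbox function `ksdensity` serves the same purpose.

```python
import math
import statistics


def gaussian_kde(data, bandwidth=None):
    """Return (pdf, cdf) functions for a Gaussian kernel density estimate."""
    n = len(data)
    if bandwidth is None:
        # Assumption: normal-reference rule of thumb for the bandwidth.
        bandwidth = 1.06 * statistics.stdev(data) * n ** (-1 / 5)
    h = bandwidth

    def pdf(x):
        # Average of Gaussian bumps of width h centred on each data point.
        norm = n * h * math.sqrt(2 * math.pi)
        return sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data) / norm

    def cdf(x):
        # Average of the corresponding Gaussian CDFs: a smoothed ecdf.
        return sum(0.5 * (1 + math.erf((x - xi) / (h * math.sqrt(2))))
                   for xi in data) / n

    return pdf, cdf


sample = [1.0, 1.5, 2.0, 2.2, 3.1, 4.0, 4.4, 5.0]
pdf, cdf = gaussian_kde(sample)
print(pdf(3.0), cdf(3.0))
```

The smoothed `cdf` is continuous and strictly increasing, so unlike the raw ecdf it can be inverted numerically, which is one thing that could be done with the resulting density without assuming a named distribution.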