# Thread: reconciling 2 or more data sets when data volume is varying and wide

1. ## reconciling 2 or more data sets when data volume is varying and wide

Hello,

I am a real estate researcher by trade. I am doing a report on median and mean home prices
in my town over comparable time periods, i.e. quarter to quarter and year over year.

My problem is this:

For example, in Q4 of 2009 there were 35 sales with a median price of $335,500 and a mean price of $343,960.

In Q1 of 2010 there were only 11 sales, with a median of $349,900 and a mean of $368,245.

My problem is comparing two or more data sets that differ widely in the number of sales,
in this instance 35 versus 11. With such a low volume of sales in Q1, I would think that the data is more volatile, yes?

Is there a formula or method to 'normalize' the two data sets, or to show the difference between them,
given the wide gap in sales counts? I think that the median and mean prices can't be compared correctly
when the sets differ so much in number of sales. Am I right or way off?

I did perform a mean price comparison using a 'trimmed mean' analysis, but I wanted something with more bite:
something to show the volatility of the data when volume is erratic, and to conclude that the median and mean
can only be compared reliably when sales volume is similar across the data sets.

Basically, I want to break down the data and make it reliable and comparable.
Any ideas?

2. My suggestion would be to compare data sets from the same seasonal period, e.g. Q4 2009 with Q4 2010. Otherwise you could be introducing a bias into your analysis. For example, it might be known that a certain time of year, such as summer, has a better clearance rate for properties than winter. You need to make sure such factors do not influence your conclusions.

If you do want to make inferences between data sets that have different sample sizes (and, in your case, a very small sample size), you can employ a two-sample t-test for a difference in the means.
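
As a rough illustration of the two-sample t-test, here is a minimal sketch using only the Python standard library. It uses Welch's variant, which does not assume the two quarters have equal variance; that choice is an assumption on my part, not something stated above. The function name `welch_t` and the price lists are hypothetical placeholders, so substitute the actual individual sale prices for each quarter.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return (t statistic, approximate degrees of freedom) for Welch's t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    # Squared standard error of each mean: sample variance divided by n.
    # A smaller n gives a larger standard error, i.e. a more volatile mean.
    se2_a = statistics.variance(sample_a) / n_a
    se2_b = statistics.variance(sample_b) / n_b
    t = (mean_a - mean_b) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


# Hypothetical sale prices, for illustration only (not the real data).
q4_prices = [310_000, 325_000, 335_500, 340_000, 352_000, 365_000, 380_000]
q1_prices = [330_000, 349_900, 362_000, 391_000]

t_stat, dof = welch_t(q1_prices, q4_prices)
print(f"t = {t_stat:.3f}, df = {dof:.1f}")
```

The t statistic then has to be compared against a t distribution with `df` degrees of freedom to get a p-value. If SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` runs the same test and returns the p-value directly. Note that the test needs the individual sale prices (or at least each quarter's standard deviation), not just the mean and the count.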