Math Help - Unequal data sets

  1. #1
    Newbie
    Joined
    Jul 2012
    From
    manchester
    Posts
    9

    Unequal data sets

    Hi

    I am looking at 2 sets of data which have the same attributes but are not equal in size: one group has 6000+ records and the other has about 160.

    I am trying to look at the difference in the attributes between the 2 sets, so I have produced some descriptive statistics, run an unpaired t-test, and am also building histograms for both sets.

    Because of the big size difference between the groups, I am unsure whether I can come to any valid conclusion.

    Is there any advice anyone can give?

    Regards

    Alex

  2. #2
    MHF Contributor
    Joined
    Sep 2012
    From
    Australia
    Posts
    3,667
    Thanks
    608

    Re: Unequal data sets

    Hey aivoryuk.

    What are you trying to look at specifically (i.e. the attributes)? What are you trying to answer? What is the directed goal of your experiment?

  3. #3
    Newbie
    Joined
    Jul 2012
    From
    manchester
    Posts
    9

    Re: Unequal data sets

    Hi

    I have 2 data sets which have the same 24 attributes/columns etc. I would like to compare each attribute/column between the 2 data sets to see if there is a significant difference between the values.

    Once I have ascertained that there is a difference for each attribute, I would like to know what that difference is. It may be, for one attribute, that the 1st data set has a higher average value than the other.

    With the data sets being so different in size I am not sure if this would skew the interpretation.
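
    On the worry about size: unequal group sizes do not by themselves bias a comparison of means, but the precision of the comparison is limited mostly by the smaller group. A minimal sketch of that point, assuming independent samples and hypothetical standard deviations of 1.0 in both groups (the thread gives no standard deviations; only the group sizes come from the posts):

```python
import math


def se_of_mean_difference(sd1, n1, sd2, n2):
    """Standard error of (mean1 - mean2) for two independent samples."""
    return math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)


# Hypothetical standard deviations of 1.0 in both groups; only the sizes
# (about 6000 and about 160) come from the thread.
se_unequal = se_of_mean_difference(1.0, 6000, 1.0, 160)
se_equal_small = se_of_mean_difference(1.0, 160, 1.0, 160)

print(f"SE with n = 6000 and 160: {se_unequal:.4f}")
print(f"SE with n = 160 and 160:  {se_equal_small:.4f}")
```

    The large group still helps: the standard error is smaller than it would be with two groups of 160, but the 1/160 term dominates.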

  4. #4
    MHF Contributor
    Joined
    Sep 2012
    From
    Australia
    Posts
    3,667
    Thanks
    608

    Re: Unequal data sets

    What kind of difference though? What is the data type? Is it continuous or discrete data? Categorical data? Are you just comparing each attribute in each data set? What kind of test do you want to use? Have you checked whether your data and the context of the investigation support this?

    If you are testing for differences, the typical techniques include 2-sample t-tests and their non-parametric equivalents. T-tests assume that all data points are independent, that the sample mean has a roughly normal distribution (large enough samples ensure this by the CLT), that the sample variance is roughly chi-square (same sort of argument), and that the sample mean and sample variance are independent of each other (they may not be).

    If these are met, consider doing a t-test, and note that t-tests allow for different sample sizes in each sample.

    Also consider whether you want to use a pooled test, paired test or un-paired/un-pooled test. If there is a relationship between the data sets then consider using a paired t-test.
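
    For the un-pooled case with different sample sizes, the usual choice is Welch's unequal-variance t-test. The sketch below is an illustration only: the function name `welch_t` and the data are made up, and it returns just the statistic and the approximate degrees of freedom. If SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same test and also reports a p-value.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t, df) for Welch's unequal-variance two-sample t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Per-group squared standard errors, using sample variances (n - 1).
    va = variance(sample_a) / n_a
    vb = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (n_a - 1) + vb ** 2 / (n_b - 1))
    return t, df


# Made-up illustrative data with different group sizes.
t_stat, dof = welch_t([1, 2, 3, 4, 5], [2, 4, 6])
print(f"t = {t_stat:.3f}, df = {dof:.2f}")
```

    The un-pooled form is often the safer default when group sizes are very different, because the pooled test additionally relies on the two variances being equal.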

  5. #5
    Newbie
    Joined
    Jul 2012
    From
    manchester
    Posts
    9

    Re: Unequal data sets

    Hi, thanks for your time.

    Yes, the data is continuous and I am comparing each attribute between the two sets, so as there are 24 attributes I would be performing 24 t-tests (or whatever test is appropriate).

    It would appear that t-tests are the way to go?
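
    One thing to keep in mind with 24 separate tests: at a 5% level, about one "significant" result (24 x 0.05 = 1.2) would be expected by chance alone if none of the attributes truly differ. A common remedy is a multiple-comparison adjustment such as Holm's step-down procedure. The sketch below is a suggestion rather than something raised in the thread; the function name `holm_reject` and the p-values are hypothetical.

```python
def holm_reject(p_values, alpha=0.05):
    """Return a list of booleans: True where Holm's procedure rejects H0."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # Compare the k-th smallest p-value (k = rank + 1) with alpha / (m - k + 1).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # once one test fails, all larger p-values fail too
    return reject


# Hypothetical p-values for four attributes (not taken from the thread).
decisions = holm_reject([0.001, 0.04, 0.03, 0.5])
print(decisions)
```

    The result is returned in the original attribute order, so it can be lined up directly against the list of 24 columns.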

  6. #6
    MHF Contributor
    Joined
    Sep 2012
    From
    Australia
    Posts
    3,667
    Thanks
    608

    Re: Unequal data sets

    If you have reason to believe the data is Poisson, exponential, or one of those other distributions where mean and variance are related then you shouldn't use this so you may want to take a look at the histogram and see if there is any particular pattern there before you continue.
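
    If the histograms do suggest that the t-test assumptions are doubtful, one non-parametric alternative (not named in the thread) is a permutation test on the difference in means. It works with unequal group sizes and makes no distributional assumption beyond exchangeability under the null hypothesis. A minimal sketch, with a hypothetical function name and made-up data:

```python
import random


def permutation_test(sample_a, sample_b, n_permutations=2000, seed=0):
    """Two-sided permutation p-value for a difference in means."""
    rng = random.Random(seed)
    n_a = len(sample_a)
    pooled = list(sample_a) + list(sample_b)

    def mean_diff(values):
        a, b = values[:n_a], values[n_a:]
        return sum(a) / len(a) - sum(b) / len(b)

    observed = abs(mean_diff(pooled))
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        # Small tolerance so floating-point ties count as "at least as extreme".
        if abs(mean_diff(pooled)) >= observed - 1e-12:
            extreme += 1
    # Add-one correction keeps the estimated p-value above zero.
    return (extreme + 1) / (n_permutations + 1)


# Made-up, clearly separated groups for illustration.
p_value = permutation_test(list(range(10, 18)), list(range(0, 8)))
print(f"permutation p-value = {p_value:.4f}")
```

    The p-value is an estimate; more permutations give a more stable answer at the cost of run time.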

  7. #7
    Newbie
    Joined
    Jul 2012
    From
    manchester
    Posts
    9

    Re: Unequal data sets

    Originally posted by chiro:
    If you have reason to believe the data is Poisson, exponential, or one of those other distributions where mean and variance are related then you shouldn't use this so you may want to take a look at the histogram and see if there is any particular pattern there before you continue.

    Thanks, I have just completed the histograms for all the attributes.
    Here is an example:
    Set A
    Bin     Frequency   Cumulative %
    0            5543         83.71%
    1             796         95.73%
    2             224         99.11%
    3              48         99.83%
    4               7         99.94%
    5               4        100.00%
    6               0        100.00%
    More            0        100.00%

    Set B
    Bin     Frequency   Cumulative %
    0             125         77.16%
    1              29         95.06%
    2               8        100.00%
    3               0        100.00%
    4               0        100.00%
    5               0        100.00%
    6               0        100.00%
    More            0        100.00%

    The majority of the histograms have this pattern.

    Where would this data fall with regard to the patterns you have described?
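
    One quick way to relate these tables to the "mean tied to variance" patterns is to compute the mean and variance of each set from the frequencies. For a Poisson distribution the variance equals the mean, so a variance-to-mean ratio near 1 is consistent with Poisson-like counts, while a ratio well above 1 suggests over-dispersion. The sketch below makes a loud assumption: each bin label is treated as the exact data value, which may not hold if the bins are upper limits of ranges for continuous data.

```python
def mean_and_variance(counts):
    """Mean and sample variance from a {value: frequency} table."""
    n = sum(counts.values())
    mean = sum(x * f for x, f in counts.items()) / n
    ss = sum(f * (x - mean) ** 2 for x, f in counts.items())
    return mean, ss / (n - 1)


# Frequencies copied from the two histograms in this post.
# ASSUMPTION: each bin label is treated as the exact data value.
set_a = {0: 5543, 1: 796, 2: 224, 3: 48, 4: 7, 5: 4}
set_b = {0: 125, 1: 29, 2: 8}

for name, counts in (("Set A", set_a), ("Set B", set_b)):
    m, v = mean_and_variance(counts)
    print(f"{name}: mean = {m:.3f}, variance = {v:.3f}, ratio = {v / m:.2f}")
```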

    Thanks for your help so far.

  8. #8
    MHF Contributor
    Joined
    Sep 2012
    From
    Australia
    Posts
    3,667
    Thanks
    608

    Re: Unequal data sets

    One good indicator of distributions with mean tied to variance is the skewness of that distribution. If you can get a statistical measure of the kurtosis or skewness of the distribution then that would be a good indicator to report.

    There is no certainty in this kind of thing when you are dealing with data, but it's always wise to make sure things don't clearly violate the assumptions of techniques to make them useless.
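
    As a rough reference when reading those numbers: a normal distribution has skewness 0 and excess kurtosis 0, an exponential distribution has skewness 2 and excess kurtosis 6, and a Poisson distribution with mean λ has skewness 1/sqrt(λ) and excess kurtosis 1/λ. A minimal sketch of the moment-based versions computed from a frequency table follows. It again assumes each bin label is the exact data value, and spreadsheet functions typically apply bias corrections, so their results can differ slightly from these.

```python
def skewness_and_excess_kurtosis(counts):
    """Moment-based skewness and excess kurtosis from {value: frequency}."""
    n = sum(counts.values())
    mean = sum(x * f for x, f in counts.items()) / n

    def central_moment(k):
        return sum(f * (x - mean) ** k for x, f in counts.items()) / n

    m2, m3, m4 = central_moment(2), central_moment(3), central_moment(4)
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


# Set A frequencies from the histogram posted earlier in the thread.
# ASSUMPTION: each bin label is treated as the exact data value.
set_a = {0: 5543, 1: 796, 2: 224, 3: 48, 4: 7, 5: 4}
skew_a, kurt_a = skewness_and_excess_kurtosis(set_a)
print(f"Set A: skewness = {skew_a:.2f}, excess kurtosis = {kurt_a:.2f}")
```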

  9. #9
    Newbie
    Joined
    Jul 2012
    From
    manchester
    Posts
    9

    Re: Unequal data sets

    Originally posted by chiro:
    One good indicator of distributions with mean tied to variance is the skewness of that distribution. If you can get a statistical measure of the kurtosis or skewness of the distribution then that would be a good indicator to report.

    There is no certainty in this kind of thing when you are dealing with data, but it's always wise to make sure things don't clearly violate the assumptions of techniques to make them useless.

    Thanks, I have the kurtosis and skewness for the measures, so I will look into that and try to tie it all together.

    Thanks for your help
