# greatest distribution of samples

• Jun 22nd 2006, 03:29 AM
melysion
Hi there

OK, this may be a little tricky to explain so bear with me.

Imagine a 'class' that contains samples. Each sample may, or may not, be related to other samples in different classes.

I have a number of bar charts - one for each 'class' - that show how many samples are related to samples in a different class.

Data looks like this

Class 1 data:

0 345
1 564
2 34
6 12

and so on

What the above tells you is that for class 1, there are 345 samples that are not related to any samples in a different class, 564 samples that are related to 1 sample in a different class, 34 samples that are related to 2 samples in a different class and so on ...

I need to find a method that tells me which class has samples that are related to the greatest number of samples from other classes and also get a general feeling for the distribution of samples being related to other samples in different classes for each graph.

I considered standard deviation but have a feeling that won't tell me much.

Thanks a lot for any help
• Jun 22nd 2006, 04:34 AM
Quick
Do you have the data for all the classes? It seems to me that your first question can be answered just by looking at all the data.

• Jun 22nd 2006, 05:20 AM
JakeD
Quote:

Originally Posted by melysion
I need to find a method that tells me which class has samples that are related to the greatest number of samples from other classes and also get a general feeling for the distribution of samples being related to other samples in different classes for each graph.

Hi, melysion.

For the method, you could compare the mean number of related samples. For example, suppose this is the data.

Code:

    Class 1          Class 2
    0      50        0      100
    1      25        1       60
    2      15        2       20
    3      10        3       20
    N     100        N      200
    Sum    85        Sum    160
    Mean  .85        Mean   .80

Class 1 has 100 samples which are related to a total of 85 samples from other classes. This is a mean of .85 related samples per sample. Class 2 has more samples and related samples, but the mean of .80 related samples per sample is lower. So Class 1 is on average related to a greater number of samples from other classes.
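If you would rather compute this than work it out by hand, here is a minimal Python sketch of the same calculation. It assumes each class is stored as a mapping from "number of related samples" to "how many samples have that number"; the names `class_1`, `class_2` and `mean_related` are made up for illustration, and the figures are the example ones above, not real data.

```python
# Each histogram maps "number of related samples" -> "how many samples have that count".
# These are the illustrative figures from the example table, not real data.
class_1 = {0: 50, 1: 25, 2: 15, 3: 10}
class_2 = {0: 100, 1: 60, 2: 20, 3: 20}


def mean_related(hist):
    """Mean number of related samples per sample (hypothetical helper)."""
    n = sum(hist.values())  # N: total number of samples in the class
    total = sum(k * count for k, count in hist.items())  # Sum: total related samples
    return total / n


means = {"Class 1": mean_related(class_1), "Class 2": mean_related(class_2)}
best = max(means, key=means.get)  # class with the highest mean
print(means, best)
```

With more classes you would just add more entries to `means`; the comparison stays the same.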

To get a general feeling for the distribution, you could combine all the samples and create a bar chart for that.
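One possible way to do the combining, again only as a sketch: add the per-class counts together and draw a rough text bar chart. The helpers `combine` and `text_bar_chart` are hypothetical names, and the input is the same example data as before; a real plotting library would do the chart better.

```python
from collections import Counter

# Same illustrative histograms as in the example table (not real data).
class_1 = {0: 50, 1: 25, 2: 15, 3: 10}
class_2 = {0: 100, 1: 60, 2: 20, 3: 20}


def combine(*hists):
    """Add the per-class counts together (hypothetical helper)."""
    combined = Counter()
    for hist in hists:
        combined.update(hist)  # Counter.update adds counts rather than replacing them
    return dict(sorted(combined.items()))


def text_bar_chart(hist, width=40):
    """Render one line per value, with the longest bar scaled to `width` characters."""
    peak = max(hist.values())
    lines = []
    for k, count in hist.items():
        bar = "#" * round(width * count / peak)
        lines.append(f"{k:>3} | {bar} {count}")
    return "\n".join(lines)


combined = combine(class_1, class_2)
print(text_bar_chart(combined))
```

Calling `text_bar_chart` on a single class instead of `combined` gives a chart you can set side by side with the overall one.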
• Jun 22nd 2006, 05:44 AM
melysion
Thanks very much ;)