# Statistical Analysis of a Mass Distribution

• Mar 22nd 2010, 01:56 PM
hyperchondriac
Hi all,

I'm looking at the analysis of a sample of a mixture of materials, with the aim of determining the minimum sample size required in order to conform to a certain error. The basics are:

- I assume that the population of my mixture is infinite/very large

- There are N fractions, with the ith fraction having an estimated mass fraction and average mass of $\displaystyle x_i$ and $\displaystyle m_i$ respectively. Obviously, the sum of all the mass fractions is 1.

What could I use to determine my minimum sample size with this info? Or would I need something else? I have tried to use some sort of Binomial approximation to the number fractions, but I really don't think it's particularly good! Is there any way I could do it by considering the sample as a combination of a series of normal distributions?

Thanks in advance - I've been banging my head on the keyboard for about three weeks over this!
• Mar 22nd 2010, 02:56 PM
TKHunny
You lost me with "population of my mixture". I'm relatively sure that is not well defined.

Are you chopping up your mixture into discrete samples? If so, why would you expect the constituents of the various samples to be other than Normal?

Are you measuring proportions in each sample? Are you somehow counting something in each sample?

Let's first define the experiment instead of continuing to shoot in the dark.
• Mar 22nd 2010, 11:51 PM
hyperchondriac

OK, I'll try and go into some more detail. My 'mixture' is essentially a load of mixed plastic, metal, etc. I have an enormous mound (population) of this stuff out of which I'll take a sample. I then sort that sample into various fractions (plastic type A, plastic type B, copper, paper...) and weigh each fraction to get a mass distribution.

So, if say my 200 kg sample is 25% by weight copper, what can I say about the copper fraction in the population? Assuming my sampling method is ok. Or, if I want to know the mass fraction of each component to within +/-10% (relative %, not absolute), how much sample do I need to take?

The Binomial approximation came about by thinking of the sample sorting as picking an item from the population one at a time with an estimated probability of picking a piece of copper. I know this relies on the number distribution, not mass distribution, and I wasn't too happy with it, but it was the only way I could think of tackling it at the time.

I hope this makes a bit more sense.
• Mar 23rd 2010, 06:04 PM
TKHunny
Good deal. So you are creating discrete samples and measuring the relative masses of the constituent parts.

Here is how you would find a useful sample size:

First, you need independent samples. It will NOT do to take them from the same part of the pile. Even all from the exterior might be insufficient. You simply must figure out how to randomize it.

A = Expected proportion of the smallest mass proportion.
B^2 = Variance of the smallest mass proportion.
C = Confidence Coefficient (Usually around 2)
n = Sample Size

$\displaystyle A - C\cdot \frac{B}{\sqrt{n}} > 0$

It is a relatively simple algebra problem to solve for 'n'. Don't worry about the inequality. Solve the related equality and pick the next greater integer.
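For reference, solving the related equality gives $\displaystyle n = \left(\frac{C \cdot B}{A}\right)^2$, where B is the square root of the variance B^2. A minimal Python sketch of that step follows. The function name and the example numbers are illustrative assumptions, not values from the pile in question.

```python
import math

def min_sample_count(a, b, c=2.0):
    """Smallest integer n with a - c*b/sqrt(n) > 0.

    a: expected proportion of the smallest mass fraction (A)
    b: standard deviation of that proportion (square root of B^2)
    c: multiplier C, around 2 as suggested above
    """
    if a <= 0 or b < 0:
        raise ValueError("a must be positive and b non-negative")
    n_equality = (c * b / a) ** 2      # solves a - c*b/sqrt(n) = 0
    return math.floor(n_equality) + 1  # next greater integer

# Hypothetical numbers: a 5% fraction with a standard deviation of 4%
n = min_sample_count(0.05, 0.04)
print(n)
```

Because n grows with the square of C·B/A, a small or highly variable fraction dominates the required number of samples.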

You may have some idea of what is in the pile. I hope so! Is there any pitchblende in there? I would want to know that before I started digging around.

Lacking some a priori information concerning the constituents, you can do some test samples to get an idea.

After selecting a sample size as I have indicated above, you will have to give a look to any portion with variance greater than B^2. If you find that at your confidence level something could be zero, that is likely to be an inappropriate result. Of course, you may wish to worry about the top, too. 105% probably isn't a great result, either.
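One way to carry out that check is to compute the interval A ± C·B/√n for every fraction and flag any that reaches 0 or 1. This is only a sketch: the fraction names, proportions, and standard deviations below are made-up values, and `interval_check` is a hypothetical helper.

```python
import math

def interval_check(fractions, n, c=2.0):
    """Return {name: (low, high, ok)}; ok means 0 < low and high < 1.

    fractions maps a name to (expected proportion, standard deviation).
    """
    result = {}
    for name, (a, b) in fractions.items():
        half_width = c * b / math.sqrt(n)
        low, high = a - half_width, a + half_width
        result[name] = (low, high, 0.0 < low and high < 1.0)
    return result

# Hypothetical pile: (expected proportion, standard deviation) per fraction
fractions = {
    "copper": (0.05, 0.04),
    "plastic A": (0.60, 0.40),
    "paper": (0.35, 0.20),
}
results = interval_check(fractions, n=3)
for name, (low, high, ok) in results.items():
    status = "ok" if ok else "increase n"
    print(f"{name}: {low:.3f} to {high:.3f} ({status})")
```

With these numbers the copper interval stays above zero at n = 3, but the more variable plastic fraction runs past 100%, which is the case where n would have to be raised.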
• Mar 23rd 2010, 11:58 PM
hyperchondriac
Thanks very much - that's helped a lot. Sorry to be a pain, but could you tell me how I'd go about deriving the equation, or where I'd find some sort of reference to it? I need to produce a short note on the maths behind the method to prove to our clients that it's their fault the composition has changed, and not down to our sampling method. (Edit: You don't need to give the full derivation, just an idea of where to start from and I'll have a crack at it myself.)

As for the variance - do you think we could do maybe five 10 kg samples to get a rough estimate?

Yes, we generally know what's in our piles. Just nice stuff like rusty wires and sharp metal!

Thanks again.
• Mar 24th 2010, 06:08 PM
TKHunny
I misspoke a bit. That "confidence coefficient" probably would not be called that. The confidence coefficient might be used to determine the value of C. Still, without much information, C probably should not wander very far from two (2).

Oh, probably any standard textbook on statistics would have at least a hint how to proceed. I wouldn't get a third grade version (and they do exist). Find one with some calculus in it. That would be your best bet. In such a textbook, my "C" likely would be called a z-score or a t-score.

Five 10 kg samples would be a good start. If I were doing it, I would do it progressively. Do 5 first. See what the rough approximations give you. This will also give you experience in conducting the survey. If it's easier than you thought, maybe you won't mind doing more samples. Of course, it could be very much more difficult than expected, too. I have little doubt that the arithmetic will ask for at least 10 more. Do 10 more, even if it asks for 500, and adjust your calculations. This will give you 15 observations and quite a bit more confidence in the result. Of course, I don't know how hard it is to take these samples. 15 may be massively oppressive. This is an important consideration. How many 10 kg samples are in your pile? 100s? 1000s? 10000s? More? Is it the size of Connecticut?
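The progressive approach could be sketched roughly as follows: estimate A and B from the pilot samples, solve for n, then repeat the same calculation once more samples are in. The pilot values are invented for illustration and `n_from_pilot` is a hypothetical helper.

```python
import math
import statistics

def n_from_pilot(observed, c=2.0):
    """Estimate A and B from pilot samples, then solve A - c*B/sqrt(n) > 0.

    observed: mass fractions of one component, one value per pilot sample.
    Needs at least two samples and a non-zero mean.
    """
    a = statistics.mean(observed)
    b = statistics.stdev(observed)  # sample standard deviation (n - 1 divisor)
    return a, b, math.floor((c * b / a) ** 2) + 1

# Hypothetical copper mass fractions from five 10 kg pilot samples
pilot = [0.02, 0.09, 0.04, 0.01, 0.07]
a, b, n = n_from_pilot(pilot)
print(f"A = {a:.3f}, B = {b:.3f}, suggested n = {n}")
```

With only five samples the estimate of B is itself rough, which is one reason to take more samples and rerun the calculation on the enlarged list.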
• Mar 27th 2010, 12:58 AM
hyperchondriac
You've helped me realise that I've been thinking way too hard into this, and completely missed the obvious! All I need to do is a t-test on the samples I take, compared to my estimated value, no?

I don't think taking 10/15/20 samples would be too prohibitive, or we could just take a few large samples.

Just to check - the 'sample size' is in kg (or whatever mass unit), yes?
• Mar 27th 2010, 05:03 AM
TKHunny
t-test is likely to over-suggest on the sample size. If you're okay with that, then you're on your way.
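As a rough illustration, a one-sample t statistic for pilot fractions against an estimated value can be computed with the standard library alone. The data are hypothetical, and the critical value 2.776 is the two-sided 95% t value for 4 degrees of freedom, taken from a standard t table.

```python
import math
import statistics

def one_sample_t(observed, expected):
    """t statistic and degrees of freedom for H0: mean(observed) == expected."""
    n = len(observed)
    mean = statistics.mean(observed)
    std_err = statistics.stdev(observed) / math.sqrt(n)
    return (mean - expected) / std_err, n - 1

# Hypothetical copper fractions from five samples, estimated value 25%
pilot = [0.22, 0.27, 0.25, 0.30, 0.24]
t_stat, df = one_sample_t(pilot, expected=0.25)

T_CRIT_95_DF4 = 2.776  # two-sided 95% critical value for df = 4
differs = abs(t_stat) > T_CRIT_95_DF4
print(f"t = {t_stat:.2f}, df = {df}, differs from estimate: {differs}")
```

Note that 2.776 is noticeably larger than a C of about 2. Since the required n scales with the square of the multiplier, using t values from small pilot runs pushes the suggested sample size up.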

I did give some thought to sample units, whether it should be volume or mass. I came up with only a few points:

1) The arithmetic is a little trickier if you're not consistent. A 10 kg sample and a 5 kg sample don't combine quite as easily as three 5 kg samples, for instance. I once read a Master's Thesis on combining computer simulations with particularly destructive experiments. Suffice it to say that it's a rather tricky business. Be consistent and eliminate your need to worry about it.

2) It's probably harder to judge what a sample is if you try to use volume. You'll have to include "air" as one of your constituents. I'd use mass.

3) Remember to randomize. Don't take the samples from the same location. Walk around the pile. Use a ladder or a bucket. Pluck from the surface and dig deep.

4) You may discover that your sample unit is too small. For example, one sample may have only iron filings and another only old car parts and another just razor blades. A bunch of disparate 100%s aren't likely to do you much good. You may wish to stir the pile and increase the sample size.
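To illustrate point 1, here is a small sketch of why unequal sample masses complicate the arithmetic: the pooled mass fraction has to be weighted by sample mass, so it differs from the plain average of the per-sample fractions. The masses are invented and `pooled_fraction` is a hypothetical helper.

```python
def pooled_fraction(samples, component):
    """Mass-weighted fraction of `component` across samples of any size.

    samples: list of dicts mapping component name to mass in kg.
    """
    component_mass = sum(s.get(component, 0.0) for s in samples)
    total_mass = sum(sum(s.values()) for s in samples)
    return component_mass / total_mass

samples = [
    {"copper": 2.0, "plastic": 8.0},  # 10 kg sample, 20% copper
    {"copper": 2.0, "plastic": 3.0},  # 5 kg sample, 40% copper
]
pooled = pooled_fraction(samples, "copper")  # 4 kg out of 15 kg
naive = (0.20 + 0.40) / 2                    # unweighted mean of the fractions
print(f"pooled = {pooled:.3f}, naive = {naive:.3f}")
```

With equal-mass samples the two numbers coincide, which is the practical argument for keeping the sample size consistent.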

Experimental Design is an interesting field. It is lack of careful thought on such things that leads to much of the junk science we hear on the news - usually intended to frighten someone. Good call on not being part of the silly crowd of thoughtless experimenters.
• Mar 28th 2010, 09:08 AM
hyperchondriac
Quote:

Originally Posted by TKHunny
Good call on not being part of the silly crowd of thoughtless experimenters.

Thank you!

I'm going to be away for the next week, so I'll let you know what's happening with it (if you're interested!) then.