Let's say you were given a small sample (n = 15 or so), and you knew the sample mean and sample variance, but you did not know the distribution.
Would you use a z-test or a t-test?
Using the t-distribution is only allowed if we know the population that the sample came from is approx. normal. So regardless of whether you knew all the stats, unless the distribution of your population was approx. normal, you wouldn't be able to use the t-distribution.
With a sample size that small, your only option is to use the t-distribution (and applicable tests). However, unless you know the underlying distribution of the population, you can't do anything with it.
Remember, the central limit theorem says that with a large enough sample size (30 or more is the usual rule of thumb), no matter what the underlying population distribution is (as long as it has a finite variance), the population of sample means from samples of that size will be approximately normally distributed.
The reason the t-distribution works is that we know the distribution of the population underneath, so whether we have 15 values or 5, those values will still behave as values pulled from a normal population.
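To make the "population of sample means" idea concrete, here is a minimal sketch on a tiny made-up population (the six values and the sample size of 3 are just assumptions for illustration). It lists every possible sample, takes each sample's mean, and checks that the mean of all those sample means equals the population mean. It also counts how many samples of 30 you could draw from 1000 items.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

# Hypothetical tiny population, small enough to enumerate every sample.
population = [2, 3, 5, 7, 11, 13]
n = 3  # sample size (kept tiny so all samples can be listed)

# The "new population": one mean for every possible sample of size n.
sample_means = [Fraction(sum(s), n) for s in combinations(population, n)]

population_mean = Fraction(sum(population), len(population))
mean_of_sample_means = sum(sample_means, Fraction(0)) / len(sample_means)

print("number of samples:", len(sample_means))
print("population mean:", population_mean)
print("mean of sample means:", mean_of_sample_means)

# Size of the "new population" for 1000 items sampled 30 at a time.
print("1000 choose 30:", comb(1000, 30))
```

Exact fractions are used so the two means can be compared without rounding. With only six items the sample means are nowhere near normal yet; the sketch only shows how the second population is built.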
Can you write the actual problem you have?
I'm not working on a problem. I was just curious what I would do in that situation if it ever arose.
So same problem, except n = 41. So according to the CLT, since n is large enough (41 > 30), the population distribution is approximately normal, and I could then use the t-distribution or the normal distribution?
And if the population distribution was a normal distribution (the problem in my first post), I could either use the t-distribution or the normal distribution?
The population distribution doesn't have to be approximately normal - that's the great thing about the Central Limit Theorem. So long as you can draw a sufficiently large sample from the population (the rule of thumb is 30 or more), the NEW population composed of the means of samples of size n will be approximately normally distributed.
That's the big distinction you have to square away before you go further. When we talk about the CLT, we are talking about two "populations": the original population our data comes from, and a NEW population composed of random variables. Each of these random variables is the MEAN of a 30-item sample pulled from the population we are interested in, and we have done this for EVERY possible combination of values we could pull from that population. So if we have 1000 items to choose from, the NEW population has 1000C30 items, even though we only started with 1000. The mean of the new population will be the same as the mean of the old population.
Of course we will have outliers. Think about it: if we are making groups of 30 items, is it possible that one of those groups (remember, our random variable for the new population is the mean of 30 items) contains 30 items that are all way off from the mean of the original population? Sure! But how likely is it that, of all the items we could have sampled from that population of 1000, we would get the 30 most extreme values?
THAT is what the Central Limit Theorem is about, and what allows us to do hypothesis testing: regardless of the underlying population, what are the chances that a sample of size "n" gives me a mean that differs this much from the stated mean of my population? So the underlying population could be exponential, it could be uniform, it could be some crazy equation f(x)=e^-e*x-sin(e^x), or whatever - doesn't matter. Large enough sample, and you're good to go.
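If you want to see this for yourself, a quick simulation does the job. The sketch below is only an illustration under assumed settings: an exponential population with mean 1, samples of size 30, 5000 repetitions, and a fixed seed. It compares the skewness of the raw values with the skewness of the sample means. The helper names are made up for this example.

```python
import random
import statistics

def sample_means(draw, n, reps, rng):
    """Return `reps` sample means, each computed from `n` independent draws."""
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(reps)]

def skewness(xs):
    """Standardized third moment: 0 for a symmetric distribution."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return statistics.fmean(((x - m) / s) ** 3 for x in xs)

def draw_exponential(rng):
    # Exponential population with mean 1 and sd 1, heavily right-skewed.
    return rng.expovariate(1.0)

rng = random.Random(0)  # fixed seed so the run is repeatable
raw = [draw_exponential(rng) for _ in range(5000)]
means_30 = sample_means(draw_exponential, 30, 5000, rng)

print("skewness of raw values:  ", round(skewness(raw), 2))
print("skewness of sample means:", round(skewness(means_30), 2))
print("mean of sample means:    ", round(statistics.fmean(means_30), 3))
print("sd of sample means:      ", round(statistics.pstdev(means_30), 3))
```

The raw values should come out strongly skewed, while the sample means should be much closer to symmetric, centred near 1, with a spread near 1/sqrt(30). The means are still slightly skewed at n = 30, which is why "30 or more" is a rule of thumb rather than a guarantee.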
In your case, your sample size is only about 15. So you would have to know the distribution of the population to construct any hypothesis test.
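In your reworked problem, with a sample size of 41, you could use either a z-test or a t-test (although a t-test would be slightly pointless if you have the population statistics - mean, variance and std; with only the sample variance, the t-test is the usual choice and gives nearly the same answer as the z-test at this sample size).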
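To see how little the choice matters at n = 41 compared with n = 15, here is a rough sketch comparing 95% confidence-interval half-widths. The t critical values are copied from a standard t-table (df = n - 1), and the sample standard deviation of 10 is a made-up number just for the comparison.

```python
import math
from statistics import NormalDist

# Two-sided 95% critical values from a standard t-table, keyed by sample size.
# df = n - 1, so n = 15 uses df = 14 and n = 41 uses df = 40.
T_CRIT_95 = {15: 2.145, 41: 2.021}
Z_CRIT_95 = NormalDist().inv_cdf(0.975)  # about 1.960

def half_width(crit, s, n):
    """Half-width of a confidence interval for the mean: crit * s / sqrt(n)."""
    return crit * s / math.sqrt(n)

s = 10.0  # hypothetical sample standard deviation

ratios = {}
for n, t_crit in T_CRIT_95.items():
    z_hw = half_width(Z_CRIT_95, s, n)
    t_hw = half_width(t_crit, s, n)
    ratios[n] = t_hw / z_hw
    print(f"n={n}: z half-width {z_hw:.2f}, t half-width {t_hw:.2f}, "
          f"ratio {ratios[n]:.3f}")
```

The t interval is about 9% wider than the z interval at n = 15 and about 3% wider at n = 41, so the two tests nearly agree for the larger sample.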
For small samples (N<6), Student-t is perfect because it has a correction factor built in, but there are some other methods you will find interesting. Check out the U-test and H-test (Mann-Whitney and Kruskal-Wallis). They are surprisingly easy. For testing, I found the non-parametric tests are more practical. Try non-parametric tests, and you will have a lot of fun. It's like playing rather than toiling. If I were an actuary, I would do non-parametric tests before I do serious number crunching. These are useful for determining whether your sample is random or not. If your sample is not randomly distributed, why bother?
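As a taste of how simple these can be, here is a minimal sketch of a two-sample U-test, assuming the U-test meant here is the Mann-Whitney U test. It computes U by counting pairs and gets an exact two-sided p-value by trying every way of splitting the pooled data into two groups, which is only practical for small samples. The function names and both data sets are invented for the example.

```python
from itertools import combinations

def u_statistic(xs, ys):
    """Mann-Whitney U for xs: count of (x, y) pairs with x > y, ties count 1/2."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0
               for x in xs for y in ys)

def exact_p_value(xs, ys):
    """Two-sided exact p-value by enumerating every relabelling of the data."""
    pooled = list(xs) + list(ys)
    n = len(xs)
    centre = len(xs) * len(ys) / 2  # value of U expected under no difference
    observed = abs(u_statistic(xs, ys) - centre)
    hits = count = 0
    for idx in combinations(range(len(pooled)), n):
        chosen = set(idx)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        count += 1
        # Count splits at least as extreme as the one actually observed.
        if abs(u_statistic(a, b) - centre) >= observed - 1e-12:
            hits += 1
    return hits / count

# Hypothetical small samples; no normality assumption is needed.
group_a = [1.1, 2.3, 2.9, 3.4, 4.0]
group_b = [4.6, 5.2, 5.9, 6.3, 7.1]

u = u_statistic(group_a, group_b)
p = exact_p_value(group_a, group_b)
print("U =", u)
print("exact two-sided p =", round(p, 4))
```

Here every value in the first group is below every value in the second, so U is 0 and only 2 of the 252 possible splits are that extreme.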