I don't understand 'standard error'...
I'm really having a hard time understanding how standard error works, and it's driving me crazy. Okay, let's start from the top.
-Assume we have a distribution that's non-normal. We don't know the population mean or standard deviation, since the population is too big and we don't have time to collect all the values.
-Now, out of that huge population, say we take a sample of 50 values and calculate its mean. We repeat this 20 times, so now we have 20 sample means. We plot these 20 means to get a sampling distribution.
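Just so we're picturing the same thing, here's a rough Python sketch of that setup (the exponential population and all the numbers are just placeholders I made up):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a big non-normal population (exponential is skewed).
population = rng.exponential(scale=2.0, size=100_000)

sample_size = 50   # values per sample
num_samples = 20   # how many times we repeat the sampling

sample_means = [
    rng.choice(population, size=sample_size, replace=False).mean()
    for _ in range(num_samples)
]

# 20 means, each clustered near the population mean (about 2.0 here)
print(sample_means)
```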
Question 1: In my example above, I only calculated 20 sample means. Do statistics textbooks assume we calculate an infinite number of sample means, giving a continuous distribution? They never say this explicitly, so I'm already confused. And in practice, we can't calculate an infinite number of sample means. This also confuses me because we need to know what 'n' is, and I don't know whether n is the size of each sample (i.e. 50) or the number of samples we take (i.e. 20). I want to say n is 50, but that seems to assume the number of samples is infinite, which is impractical. So why learn something impractical?
Oh, and this brings me to another question. The textbook says that if we know the population standard deviation, then the standard error is just:
Standard error = population standard deviation / sqrt(n)
My question is, why in the world do we even learn this? If we know the population standard deviation, then surely we must also know the population mean. So why calculate a sample mean to estimate the population mean when we already know what it is?
Ugh... I have so many more questions too, but to keep it simple, I'll ask these first!
Thanks for the help!
Re: I don't understand 'standard error'...
A continuous distribution is one that takes an uncountably infinite number of values within a given fixed interval, while a discrete distribution takes a countable (finite or countably infinite) number of values.
The population distribution is basically the limit you would get if you had infinitely many observations and calculated their distribution. There are different schools of thought on this, but if you are thinking of an empirical distribution (i.e. one based on data), then the population can be treated as what you would see with an infinite sample size.
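To make that "limit" idea concrete, here's a rough Python sketch (the exponential population and the sample sizes are just placeholders) showing the empirical mean homing in on the population value as the sample grows:

```python
import numpy as np

rng = np.random.default_rng(2)

# The population mean of an exponential(scale=2.0) is exactly 2.0.
# Bigger samples give empirical means closer and closer to it.
for n in (10, 1_000, 100_000):
    sample = rng.exponential(scale=2.0, size=n)
    print(n, sample.mean())
```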
With regard to your question on the population mean, the answer is that it depends on the case.
Sometimes we know (or assume) the underlying distribution (parametric) and sometimes we don't (non-parametric). Likewise with the parameters: every combination comes up in practice. We may assume we know the population variance but not the mean, or vice versa, or know neither and only assume normality; whatever we don't assume, we estimate.
Statistics is mainly concerned with trying to figure out population values from a sample. If we assume a population value, we use it; if not, we estimate it.
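For example, if we assume the population standard deviation is known, the textbook formula (standard error = sigma / sqrt(n), where n is the size of each sample) can be checked against a simulation. All the numbers here are made up:

```python
import numpy as np

rng = np.random.default_rng(1)

sigma = 3.0   # pretend the population standard deviation is known
n = 50        # size of each sample (this is the 'n' in the formula)

# Draw many samples of size n and record each sample mean.
means = rng.normal(loc=10.0, scale=sigma, size=(10_000, n)).mean(axis=1)

print(means.std())         # observed spread of the sample means
print(sigma / np.sqrt(n))  # formula value: about 0.424
```

The two printed numbers come out close, which is exactly what the formula claims: the spread of the sample means is the population spread shrunk by sqrt(n).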
Sometimes we know (or assume) one parameter and don't know another, and the point of doing exercises like this is that when you have to estimate attributes (which is what statistics is all about), you need to know how to handle the various kinds of information available and how to use them to get the most accurate representation of the things you don't actually know (or are trying to estimate).
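And when we don't know the population standard deviation either, we estimate the standard error from the one sample we actually have, replacing sigma with the sample standard deviation. Again, just a sketch with made-up numbers:

```python
import numpy as np

rng = np.random.default_rng(3)

# One sample, no known population parameters.
sample = rng.normal(loc=10.0, scale=3.0, size=50)

s = sample.std(ddof=1)             # sample standard deviation (ddof=1)
se_hat = s / np.sqrt(len(sample))  # estimated standard error of the mean

# Compare with the "true" value 3 / sqrt(50), about 0.424, which
# we would normally never get to see.
print(se_hat)
```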