Testing Statistical Significance - Hypothesis Tests
In basic terms, statistics can be broken up into two categories: descriptive and inferential.
Descriptive statistics are often used to describe either the spread or central tendency of a data set. Examples of these are the mean, median, proportion, standard deviation and the interquartile range.
Inferential statistics allow us to make a decision about these descriptive quantities for the entire population when given only sample data. This matters because we rarely get the chance to measure an entire population. The hypothesis test allows us to use sample statistics to draw conclusions about population parameters.
Before continuing, it is assumed the reader has an understanding of basic probability, statistics and common probability distributions. This includes the central limit theorem and the distribution of the sample mean.
So how is the data tested, and what makes a test significant?
A test is said to be significant if the sample data reveals a result different to what is already understood or expected. For example, the mean number of aces a tennis player, John, serves per set might be 5. After observing John play, it might be thought he actually serves more than the hypothesised 5. If our test supports this, then it is said to be significant. We use a level of significance denoted by $\displaystyle \alpha$, which gives a confidence level of $\displaystyle (1-\alpha)$, usually expressed as a percentage. For example, $\displaystyle \alpha = 0.05$ corresponds to a 95% confidence level.
There are three ways to perform the hypothesis test, using:
- A confidence interval
- A Test statistic, or
- A p-value
The demonstrations shown here will use the method of employing a test statistic.
So let’s discuss the process and see an example.
The following steps are taken:
1. Formulate the test.
State a ‘Null hypothesis’ $\displaystyle H_0$ – What is known or what is to be disproven, then state an Alternate (or Research) Hypothesis $\displaystyle H_A$ – The new hypothesised value.
In the example above using John the tennis player we would say
$\displaystyle H_0:\mu = 5$
$\displaystyle H_A:\mu >5$
2. Consider any Assumptions
This is one of the most important and most often overlooked steps. In most practice problems you will be told the distribution of the data and what parameters you have available, e.g. you may be told that the population is normal and given a sample mean and standard deviation. If you don’t have this information, you will need to find these things yourself before returning to perform the test.
3. Choose and calculate a test statistic, and decide the level of significance.
From your assumptions, choose and construct a test statistic. This is a calculation involving your hypothesised value and the sample statistics you have been given.
4. Calculate a critical value
The critical value is the cut-off implied by the distribution of your test statistic and the chosen level of significance. It is found from a statistical table and is the value you will compare your test statistic against.
5. Decide whether or not to reject the null hypothesis and state a conclusion about the test.
If |Test Statistic| < |Critical Value|, do not reject $\displaystyle H_0$. Otherwise, reject $\displaystyle H_0$ in favour of $\displaystyle H_A$.
In our example, if |Test Statistic| < |Critical Value| we would conclude there is not enough evidence that $\displaystyle \mu >5$ at the desired level of significance, i.e. the test is not significant.
This will all make more sense after reading through an example.
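As a rough illustration, the five steps above can be sketched in Python for John's ace count. Every number here (36 sets observed, a sample mean of 5.6, a known population standard deviation of 1.8, and $\displaystyle \alpha = 0.05$) is a hypothetical value made up for the sketch, not real data. Because $\displaystyle H_A:\mu >5$ is one-sided, the sketch assumes an upper-tail critical value $\displaystyle z_{\alpha}$ and compares the test statistic to it directly rather than in absolute value.

```python
from math import sqrt
from statistics import NormalDist

# Step 1: formulate the test. H0: mu = 5, HA: mu > 5 (one-sided, upper tail).
mu_0 = 5.0

# Step 2: assumptions. All values below are hypothetical, for illustration only.
n = 36        # number of sets observed; large enough to lean on the CLT
x_bar = 5.6   # sample mean number of aces per set
sigma = 1.8   # population standard deviation, assumed known

# Step 3: choose the significance level and calculate the test statistic.
alpha = 0.05
z_calc = (x_bar - mu_0) / (sigma / sqrt(n))

# Step 4: critical value for an upper-tail test, from the standard normal
# distribution (this replaces the lookup in a statistical table).
z_crit = NormalDist().inv_cdf(1 - alpha)

# Step 5: decide and state a conclusion.
reject_h0 = z_calc > z_crit
print(f"z_calc = {z_calc:.3f}, z_crit = {z_crit:.3f}")
if reject_h0:
    print("Reject H0: the data support mu > 5 at the chosen level.")
else:
    print("Do not reject H0: no evidence that mu > 5 at the chosen level.")
```

With these made-up numbers the test statistic is about 2.0 and the critical value is about 1.645, so the sketch rejects $\displaystyle H_0$.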
Some common one sample Hypothesis Tests
One Sample z-test
When one independent sample of scores is taken that is either large ($\displaystyle n > 30$) or normally distributed, and the population standard deviation $\displaystyle \sigma$ is known, then a one sample z-test can be used.
Given $\displaystyle H_0: \mu = \mu_0$ and $\displaystyle H_A: \mu \neq \mu_0$
The test statistic is $\displaystyle z_{\text{calc}} = \frac{\bar{x}-\mu_0}{\frac{\sigma}{\sqrt{n}}}$
The critical value is $\displaystyle z_{\text{crit}} = z_{\frac{\alpha}{2}}$
If $\displaystyle |z_{\text{calc}}| < |z_{\text{crit}}|$ do not reject $\displaystyle H_0$
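A minimal sketch of this two-sided z-test in Python, using only the standard library. The function name `one_sample_z_test` and the sample figures in the usage lines are assumptions made for illustration. The critical value $\displaystyle z_{\frac{\alpha}{2}}$ is taken from `statistics.NormalDist` instead of a printed table.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z_test(x_bar, mu_0, sigma, n, alpha=0.05):
    """Two-sided one-sample z-test.

    Returns (z_calc, z_crit, reject_h0) for H0: mu = mu_0 vs HA: mu != mu_0.
    """
    # Test statistic: standardised distance of the sample mean from mu_0.
    z_calc = (x_bar - mu_0) / (sigma / sqrt(n))
    # Two-sided critical value z_{alpha/2} from the standard normal.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Do not reject H0 when |z_calc| < |z_crit|; reject otherwise.
    reject_h0 = abs(z_calc) >= abs(z_crit)
    return z_calc, z_crit, reject_h0

# Hypothetical inputs: n = 64, sample mean 52, known sigma 8, mu_0 = 50.
z_calc, z_crit, reject_h0 = one_sample_z_test(x_bar=52.0, mu_0=50.0, sigma=8.0, n=64)
print(f"z_calc = {z_calc:.3f}, z_crit = {z_crit:.3f}, reject H0: {reject_h0}")
```

For these hypothetical inputs the test statistic is 2.0 against a critical value of roughly 1.96, so $\displaystyle H_0$ would be rejected at the 5% level.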
One Sample t-test
When one independent sample of scores is taken that is normally distributed, and the population standard deviation is unknown (so the sample standard deviation $\displaystyle s$ is used in its place), then a one sample t-test can be used.
Given $\displaystyle H_0: \mu = \mu_0$ and $\displaystyle H_A: \mu \neq \mu_0$
The test statistic is $\displaystyle t_{\text{calc}} = \frac{\bar{x}-\mu_0}{\frac{s}{\sqrt{n}}}$
The critical value is $\displaystyle t_{\text{crit}} = t_{\frac{\alpha}{2},n-1}$
If $\displaystyle |t_{\text{calc}}| < |t_{\text{crit}}|$ do not reject $\displaystyle H_0$
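The t-test can be sketched the same way. Python's standard library has no t-distribution, so this sketch assumes the critical value is looked up in a t-table and passed in by hand; here $\displaystyle t_{0.025,9} = 2.262$ for $\displaystyle \alpha = 0.05$ and $\displaystyle n = 10$. The function name `one_sample_t_test` and the ten scores are hypothetical and only serve the illustration.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t_test(sample, mu_0, t_crit):
    """Two-sided one-sample t-test.

    t_crit is t_{alpha/2, n-1}, read from a t-table by the caller.
    Returns (t_calc, reject_h0) for H0: mu = mu_0 vs HA: mu != mu_0.
    """
    n = len(sample)
    x_bar = mean(sample)
    s = stdev(sample)  # sample standard deviation (divides by n - 1)
    t_calc = (x_bar - mu_0) / (s / sqrt(n))
    # Do not reject H0 when |t_calc| < |t_crit|; reject otherwise.
    reject_h0 = abs(t_calc) >= abs(t_crit)
    return t_calc, reject_h0

# Hypothetical sample of 10 scores, assumed to come from a normal population.
scores = [5, 7, 6, 4, 8, 6, 5, 7, 6, 6]
t_calc, reject_h0 = one_sample_t_test(scores, mu_0=5.0, t_crit=2.262)
print(f"t_calc = {t_calc:.3f}, reject H0: {reject_h0}")
```

With this made-up sample the mean is 6 and the test statistic is about 2.739, which exceeds 2.262, so $\displaystyle H_0: \mu = 5$ would be rejected at the 5% level.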
Stay tuned, my next post will show some examples of these tests...