# Hypothesis Testing

• December 28th 2010, 07:48 PM
pickslides
Hypothesis Testing
Testing Statistical Significance - Hypothesis Tests

In basic terms, statistics can be broken into two categories: descriptive and inferential.

Descriptive statistics are often used to describe either the spread or the central tendency of a data set. Examples include the mean, median, proportion, standard deviation and interquartile range.

Inferential statistics allow us to make decisions about these quantities for the entire population when given only sample data. This is important because we rarely get the chance to sample an entire population. The hypothesis test allows us to use sample statistics to draw a conclusion about a population parameter.

Before continuing, it is assumed the reader has an understanding of basic probability, statistics and probability distributions, including the central limit theorem and the distribution of the sample mean.

So how is the data tested, and when is a test significant?

A test is said to be significant if the sample data reveals a result different from what is already understood or expected. For example, the mean number of aces a tennis player, John, serves per set might be 5. After observing John play, it might be thought that he actually serves more than the hypothesised 5. If our test reveals this, it is said to be significant. We use a level of significance, denoted by $\alpha$, which corresponds to a confidence level of $(1-\alpha)$, usually expressed as a percentage (e.g. $\alpha = 0.05$ gives 95%).

There are three ways to perform a hypothesis test, using:
1. A confidence interval
2. A test statistic, or
3. A p-value
The demonstrations here use the test statistic method.

So let’s discuss the process and see an example.

The following steps are taken:

1. Formulate the test.
State a null hypothesis $H_0$ (what is currently accepted, or what is to be disproven), then state an alternative (or research) hypothesis $H_A$ (the new hypothesised value).

In the example above with John the tennis player, we would write:

$H_0:\mu = 5$

$H_A:\mu >5$

2. Consider any assumptions

This is one of the most important and most often understated steps. In most practice problems you will be told the distribution of the data and which parameters you have available; for example, you may be told that the population is normal and be given a sample mean and standard deviation. If you don't have this information, you will need to work these things out yourself (for example, check the shape of the data and calculate the sample statistics) before returning to perform the test.

3. Decide the level of significance and calculate a test statistic.

From your assumptions, choose and construct a test statistic. This is a calculation based on your hypothesised value and the given parameters.

4. Calculate a critical value

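The critical value marks the boundary of the rejection region: it is the value from the test statistic's distribution (assuming $H_0$ is true) beyond which a result is considered significant at level $\alpha$. It is found from a statistical table and compared with your test statistic.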

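5. Decide whether or not to reject the null hypothesis and state a conclusion about the test.
For a two-sided test ($H_A: \mu \neq \mu_0$), if |Test Statistic| < |Critical Value| do not reject $H_0$; otherwise reject $H_0$ in favour of $H_A$. For a one-sided test, reject $H_0$ only when the test statistic lies beyond the critical value in the direction of $H_A$ (for $H_A: \mu > 5$, when Test Statistic > Critical Value).

In our example, if Test Statistic < Critical Value we would conclude there is insufficient evidence that $\mu > 5$ at the chosen level of significance, i.e. the test is not significant.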
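Step 5 can be written as a small Python function. This is a minimal sketch, not code from the original post: the name `decide`, the `alternative` labels and the test-statistic values in the usage lines are hypothetical. It compares absolute values for a two-sided test, but for a one-sided test such as John's it only rejects when the statistic falls on the side named by $H_A$.

```python
def decide(test_statistic, critical_value, alternative="two-sided"):
    """Step 5 decision rule (illustrative sketch, hypothetical name).

    critical_value is the positive tabulated value, e.g. z_{alpha/2} for a
    two-sided test or z_{alpha} for a one-sided test.
    alternative: "two-sided" (H_A: mu != mu0), "greater" (H_A: mu > mu0)
    or "less" (H_A: mu < mu0).
    """
    c = abs(critical_value)
    if alternative == "two-sided":
        reject = abs(test_statistic) >= c   # either tail counts
    elif alternative == "greater":
        reject = test_statistic >= c        # only the upper tail counts
    elif alternative == "less":
        reject = test_statistic <= -c       # only the lower tail counts
    else:
        raise ValueError(f"unknown alternative: {alternative!r}")
    return "reject H0" if reject else "do not reject H0"


# Hypothetical test statistics for John's one-sided test (H_A: mu > 5);
# 1.645 is z_{0.05}, the one-sided critical value at alpha = 0.05.
print(decide(2.30, 1.645, "greater"))
print(decide(-2.30, 1.645, "greater"))
```

The second call shows why the sign matters: a large negative statistic is not evidence that John serves more than 5 aces per set.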

This will all make more sense after reading through an example.

Some common one-sample hypothesis tests

One-sample z-test

When a single independent sample is taken that is either large ($n > 30$) or drawn from a normally distributed population, and the population standard deviation $\sigma$ is known, a one-sample z-test can be used.

Given $H_0: \mu = \mu_0$ and $H_A: \mu \neq \mu_0$

The test statistic is $\displaystyle z_{\text{calc}} = \frac{\bar{x}-\mu_0}{\frac{\sigma}{\sqrt{n}}}$

The critical value is $\displaystyle z_{\text{crit}} = z_{\frac{\alpha}{2}}$

If $\displaystyle |z_{\text{calc}}| < |z_{\text{crit}}|$ do not reject $H_0$
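The z-test above can be sketched in Python using only the standard library. This is an illustrative sketch rather than part of the original post; the function name `one_sample_z_test` is hypothetical, and instead of a printed table it obtains $z_{\frac{\alpha}{2}}$ from `statistics.NormalDist().inv_cdf` (Python 3.8+).

```python
import math
from statistics import NormalDist


def one_sample_z_test(xbar, mu0, sigma, n, alpha):
    """Two-sided one-sample z-test of H0: mu = mu0 against HA: mu != mu0.

    Returns (z_calc, z_crit, reject). Assumes sigma is the known
    population standard deviation.
    """
    z_calc = (xbar - mu0) / (sigma / math.sqrt(n))  # test statistic
    # z_{alpha/2}: the point with upper-tail probability alpha/2 under N(0, 1)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    reject = abs(z_calc) >= z_crit  # if |z_calc| < z_crit, do not reject H0
    return z_calc, z_crit, reject
```

For instance, `one_sample_z_test(72, 60, 12, 50, 0.05)` uses the figures from Example 1 in the next post and gives $z_{\text{calc}} \approx 7.07$ and $z_{\text{crit}} \approx 1.96$.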

One-sample t-test

When a single independent sample is taken from a normally distributed population and the population standard deviation is unknown (so it is estimated by the sample standard deviation $s$), a one-sample t-test can be used.

Given $H_0: \mu = \mu_0$ and $H_A: \mu \neq \mu_0$

The test statistic is $\displaystyle t_{\text{calc}} = \frac{\bar{x}-\mu_0}{\frac{s}{\sqrt{n}}}$

The critical value is $\displaystyle t_{\text{crit}} = t_{\frac{\alpha}{2},n-1}$

If $\displaystyle |t_{\text{calc}}| < |t_{\text{crit}}|$ do not reject $H_0$
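A corresponding Python sketch of the t-test follows. The standard library has no t-distribution quantile function, so the hypothetical helper `t_critical` finds $t_{\frac{\alpha}{2},n-1}$ numerically (Simpson's rule for the CDF, bisection for the quantile) in place of a table lookup; with SciPy installed, `scipy.stats.t.ppf(1 - alpha/2, n - 1)` would be the usual shortcut. All function names here are illustrative.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """P(T <= x): Simpson's rule on [0, |x|] (steps must be even),
    then the symmetry of the t distribution about 0."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2  # Simpson weights 4, 2, 4, ..., 4
        total += weight * t_pdf(i * h, df)
    area = total * h / 3  # approximates P(0 <= T <= |x|)
    return 0.5 + area if x >= 0 else 0.5 - area


def t_critical(alpha, df, upper=100.0):
    """Upper alpha/2 point t_{alpha/2, df}, found by bisection on t_cdf.
    Assumes the answer lies in [0, upper]."""
    target = 1 - alpha / 2
    lo, hi = 0.0, upper
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def one_sample_t_test(xbar, mu0, s, n, alpha):
    """Two-sided one-sample t-test of H0: mu = mu0 against HA: mu != mu0.
    Returns (t_calc, t_crit, reject)."""
    t_calc = (xbar - mu0) / (s / math.sqrt(n))  # test statistic
    t_crit = t_critical(alpha, n - 1)           # n - 1 degrees of freedom
    return t_calc, t_crit, abs(t_calc) >= t_crit
```

Calling `one_sample_t_test(7.2, 8, 2.5, 15, 0.05)` with the figures from Example 2 in the next post gives $t_{\text{calc}} \approx -1.24$ and $t_{\text{crit}} \approx 2.145$.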

Stay tuned, my next post will show some examples of these tests...
• December 29th 2010, 03:52 PM
pickslides
Examples of a one sample z-test and t-test
Here are some examples to follow the explanations in post #1.

Example 1

A fast food restaurant claims its average time to take an order is 60 seconds. A sample of 50 customer orders was timed and returned an average of 72 seconds. The population standard deviation for such orders is known to be 12 seconds. Does this sample support the restaurant's claim?

Test the claim using a significance level of $\displaystyle \alpha = 0.05$

As the population standard deviation is known and the sample is large ($n = 50 > 30$), we employ a z-test.

The hypothesised problem formulation will be:

$\displaystyle H_0:\mu = 60$
$\displaystyle H_A:\mu \neq 60$

The test statistic will be:

$\displaystyle z_{calc} = \frac{\bar{x}-\mu_0}{ \frac{\sigma}{\sqrt{n}}}=\frac{72-60}{ \frac{12}{\sqrt{50}}} = \frac{12}{ \frac{12}{\sqrt{50}} }= 7.07$

The critical value is:

$\displaystyle z_{crit} = z_{\frac{\alpha}{2}}=z_{\frac{0.05}{2}} = z_{0.025} = 1.96$

Decision:

As $\displaystyle |z_{calc}|= 7.07 > |z_{crit}|= 1.96$ then we reject $\displaystyle H_0: \mu=60$ and conclude there is evidence against the restaurant's claim.
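Post #1 mentions three ways to perform a test. As an illustrative sketch (not part of the original post), the Python below applies all three to Example 1: the test statistic, a 95% confidence interval for $\mu$, and a two-sided p-value. The variable names are illustrative, and $z_{\frac{\alpha}{2}}$ is computed with `statistics.NormalDist` instead of being read from a table.

```python
import math
from statistics import NormalDist

# Example 1 data: claimed mean, sample mean, known sigma, sample size, alpha.
mu0, xbar, sigma, n, alpha = 60, 72, 12, 50, 0.05

se = sigma / math.sqrt(n)                     # standard error of the mean
z_calc = (xbar - mu0) / se                    # test statistic
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}

# 1. Test statistic: reject H0 if |z_calc| >= z_crit.
reject_by_statistic = abs(z_calc) >= z_crit

# 2. Confidence interval: reject H0 if mu0 lies outside xbar +/- z_crit * se.
lower, upper = xbar - z_crit * se, xbar + z_crit * se
reject_by_interval = not (lower <= mu0 <= upper)

# 3. p-value: probability (under H0) of a |z| at least as large as observed.
p_value = 2 * NormalDist().cdf(-abs(z_calc))
reject_by_p_value = p_value <= alpha

print(f"z_calc = {z_calc:.2f}, z_crit = {z_crit:.2f}")
print(f"{1 - alpha:.0%} CI for mu: ({lower:.2f}, {upper:.2f})")
print(f"p-value = {p_value:.2e}")
print(reject_by_statistic, reject_by_interval, reject_by_p_value)
```

All three approaches lead to the same decision here: 60 lies outside the interval (roughly 68.67 to 75.33 seconds) and the p-value is far below 0.05.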

Example 2

A teacher claims that the average mark out of 10 for a spelling test is 8. A sample of 15 students returned a mean of 7.2 and a standard deviation of 2.5.

Test the teacher's claim at $\displaystyle \alpha = 0.05$

As the population standard deviation is not known and the sample is small ($n = 15$), we employ a t-test (assuming the marks are approximately normally distributed).

The hypothesised problem formulation will be:

$\displaystyle H_0:\mu = 8$
$\displaystyle H_A:\mu \neq 8$

The test statistic will be:

$\displaystyle t_{calc} = \frac{\bar{x}-\mu_0}{ \frac{s}{\sqrt{n}}}=\frac{7.2-8}{ \frac{2.5}{\sqrt{15}}} = -1.24$

The critical value is:

$\displaystyle t_{crit} = t_{\frac{\alpha}{2},n-1}=t_{\frac{0.05}{2},14} = t_{0.025,14} = 2.145$

Decision:

As $\displaystyle |t_{calc}|= 1.24 < |t_{crit}|= 2.145$, we do not reject $\displaystyle H_0: \mu=8$ and conclude there is insufficient evidence against the teacher's claim.
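The same decision can be reached with a confidence interval. In this sketch (not from the original post) the critical value $t_{0.025,14} = 2.145$ is hard-coded from a standard t table; if $\mu_0 = 8$ lies inside $\bar{x} \pm t_{crit}\, s/\sqrt{n}$, $H_0$ is not rejected.

```python
import math

# Example 2 data: claimed mean, sample mean, sample sd, sample size.
mu0, xbar, s, n = 8, 7.2, 2.5, 15
t_crit = 2.145  # t_{0.025,14} taken from a standard t table (alpha = 0.05, two-sided)

se = s / math.sqrt(n)       # estimated standard error of the mean
t_calc = (xbar - mu0) / se  # test statistic

# Equivalent confidence-interval check: 95% CI for mu is xbar +/- t_crit * se.
lower, upper = xbar - t_crit * se, xbar + t_crit * se

print(f"t_calc = {t_calc:.2f}")
print(f"95% CI for mu: ({lower:.2f}, {upper:.2f})")
print("reject H0" if abs(t_calc) >= t_crit else "do not reject H0")
```

Here the interval is roughly 5.82 to 8.58, which contains 8.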

Question 1

A garbage collection agency claims the average weight per waste bin is 25kg. The standard deviation is known to be 12kg. A sample of 100 bins was taken, with an average weight of 28.5kg. Test the agency's claim at $\displaystyle \alpha = 0.1$

Spoiler:

As the population standard deviation is known and the sample is large ($n = 100 > 30$), we employ a z-test.

$\displaystyle H_0:\mu = 25$
$\displaystyle H_A:\mu \neq 25$

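$\displaystyle z_{calc} = \frac{\bar{x}-\mu_0}{ \frac{\sigma}{\sqrt{n}}}=\frac{28.5-25}{ \frac{12}{\sqrt{100}}} = \frac{3.5}{1.2} = 2.92$

$\displaystyle z_{crit} = z_{\frac{\alpha}{2}} = z_{\frac{0.1}{2}} = z_{0.05} = 1.645$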

As $\displaystyle |z_{calc}| =2.92> |z_{crit}| = 1.645$, reject $\displaystyle H_0:\mu = 25$ and conclude there is evidence against the agency's claim.
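A quick Python check of the Question 1 arithmetic, offered as a sketch; $z_{0.05}$ is computed with `statistics.NormalDist` rather than taken from a table.

```python
import math
from statistics import NormalDist

# Question 1 data: claimed mean, sample mean, known sigma, sample size, alpha.
mu0, xbar, sigma, n, alpha = 25, 28.5, 12, 100, 0.1

z_calc = (xbar - mu0) / (sigma / math.sqrt(n))  # 3.5 / 1.2
z_crit = NormalDist().inv_cdf(1 - alpha / 2)    # z_{0.05}

print(f"z_calc = {z_calc:.2f}, z_crit = {z_crit:.3f}")
print("reject H0" if abs(z_calc) >= z_crit else "do not reject H0")
```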

Question 2

A company claims the average number of sick days taken per year is 5 per employee. A sample of 25 employees had a mean of 7.9 and a standard deviation of 2.1. Test the company's claim at $\displaystyle \alpha = 0.05$

Spoiler:

As the population standard deviation is not known and the sample is small ($n = 25$), we employ a t-test (assuming the number of sick days is approximately normally distributed).

$\displaystyle H_0:\mu = 5$
$\displaystyle H_A:\mu \neq 5$

$\displaystyle t_{calc} = \frac{\bar{x}-\mu_0}{ \frac{s}{\sqrt{n}}}=\frac{7.9-5}{ \frac{2.1}{\sqrt{25}}} = \frac{2.9}{0.42} = 6.90$

$\displaystyle t_{crit} = t_{\frac{\alpha}{2},n-1}=t_{\frac{0.05}{2},24} = t_{0.025,24} = 2.064$

As $\displaystyle |t_{calc}| =6.90> |t_{crit} |= 2.064$, reject $\displaystyle H_0:\mu = 5$ and conclude there is evidence against the company's claim.
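A similar check for Question 2, as a sketch; the critical value $t_{0.025,24} = 2.064$ is hard-coded from a standard t table rather than computed.

```python
import math

# Question 2 data: claimed mean, sample mean, sample sd, sample size.
mu0, xbar, s, n = 5, 7.9, 2.1, 25
t_crit = 2.064  # t_{0.025,24} taken from a standard t table (alpha = 0.05, two-sided)

t_calc = (xbar - mu0) / (s / math.sqrt(n))  # 2.9 / 0.42

print(f"t_calc = {t_calc:.2f}, t_crit = {t_crit}")
print("reject H0" if abs(t_calc) >= t_crit else "do not reject H0")
```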

The next post will show the difference between using $\geq , \leq$ instead of $= , \neq$ when formulating the hypothesis.

If you have any questions about any of these examples, please PM me.

• March 21st 2012, 06:12 AM
oliver1
Re: Hypothesis Testing

| UV reaction | Blue | Green | Grey | Brown |
|---|---|---|---|---|
| UV0 | 23 | 27 | 19 | 33 |
| UV1 | 35 | 16 | 20 | 20 |
| UV2 | 30 | 26 | 15 | 23 |
| UV3 | 54 | 23 | 21 | 15 |

The data set contains the results of scientific tests performed on a sample of 400 inhabitants of Transylvania, who were selected at random from the population of that region. Eye colour was classified as follows: blue; green; grey; brown. Reaction to controlled exposure to ultraviolet light at a given fixed intensity was classified as follows:

uv0 = mild reaction;
uv1 = moderate reaction;
uv2 = severe reaction;
uv3 = extremely severe reaction.

Consider the following pair of hypotheses:

H0: ultraviolet light sensitivity is independent of eye colour;
H1: ultraviolet light sensitivity is not independent of eye colour.

Using a suitable hypothesis test (based on the above hypotheses, and performed at the 5% significance level) on your individual dataset, investigate the independence of the eye colour of a Transylvanian and their reaction to ultraviolet light.
Your answers to parts (a), (b) and (c) should be given correct to three decimal places.
(a). Enter the critical value:
(b). Enter the expected frequency (under the assumption that the null hypothesis is true) for the cell (UV0,Brown):
(c). Enter the value of the test statistic:
(d). Select the hypothesis test decision, based on your answers to parts (a) and (c): reject or don't reject H0?
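The question above calls for a chi-square test of independence on the 4 × 4 table. A minimal stdlib-only Python sketch of that test follows; the function name is hypothetical, expected counts are (row total × column total) / grand total under H0, and the critical value 16.919 is the upper 5% point of the chi-square distribution with (4 - 1)(4 - 1) = 9 degrees of freedom, taken from a standard table rather than computed.

```python
# Observed counts: rows UV0..UV3, columns Blue, Green, Grey, Brown.
observed = [
    [23, 27, 19, 33],  # UV0
    [35, 16, 20, 20],  # UV1
    [30, 26, 15, 23],  # UV2
    [54, 23, 21, 15],  # UV3
]


def chi_square_independence(table):
    """Return (statistic, degrees_of_freedom, expected_table)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Expected count under H0 (independence): row total * column total / grand total.
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof, expected


# Upper 5% point of chi-square with 9 df, from a standard table (assumes alpha = 0.05).
CHI2_CRIT_05_DF9 = 16.919

stat, dof, expected = chi_square_independence(observed)
print(f"critical value: {CHI2_CRIT_05_DF9:.3f} (df = {dof})")
print(f"expected (UV0, Brown): {expected[0][3]:.3f}")
print(f"test statistic: {stat:.3f}")
print("reject H0" if stat > CHI2_CRIT_05_DF9 else "do not reject H0")
```

The rows and columns each sum to 400, which is a quick check that the table was transcribed correctly.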
• October 24th 2012, 01:25 PM
ShaunRLS
Re: Hypothesis Testing
Help with the unbiased estimator $\hat{\sigma}^2 = \frac{1}{n-2}SSE$
Last step: $= n\sigma^2 + (\dots) + (\dots) - 2\sigma^2 - S_{xx}b_1^2$

How do you get (1/(n-2))(sigma^2) ???

Thanks