# measure for hypothesis testing

#### Cottonshirt

I have a population of athletes who have run races over five different distances. those selected for this study have met a criterion for "success."

the successful athletes are divided into two groups, those born in summer and those born in winter, where winter is defined as the first six months of the school year in their country.

the hypothesis is: the proportion of successful athletes born in winter will increase as race distance increases.

we have three variables: sum = number born in summer, win = number born in winter, T = total (sum + win)

since we have a different number of athletes at each race distance, the calculation I am doing is: (Abs(sum-win)/T)*100

so this is the difference between sum and win, as a percentage of their total.
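A minimal sketch of the two candidate measures, side by side. The function names and the counts are hypothetical, invented purely for illustration. It suggests one thing to watch for: because of the absolute value, a winter majority and a summer majority of the same size give the same difference, whereas the winter percentage tells them apart.

```python
def abs_diff_pct(summer: int, winter: int) -> float:
    """The measure above: |summer - winter| as a percentage of the total."""
    total = summer + winter
    return 100 * abs(summer - winter) / total


def winter_pct(summer: int, winter: int) -> float:
    """Winter-born athletes as a percentage of the total."""
    total = summer + winter
    return 100 * winter / total


# Hypothetical (summer, winter) counts, not taken from the study.
winter_majority = (90, 110)
summer_majority = (110, 90)

for summer, winter in (winter_majority, summer_majority):
    # Same absolute difference in both cases, different winter percentage.
    print(abs_diff_pct(summer, winter), winter_pct(summer, winter))
```

So the two measures carry the same information only when winter is the larger group at every distance.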

my question is: is this an appropriate measure for testing this hypothesis? or would it be more appropriate to just use the absolute percentage of winter-born athletes, or something else altogether?

thank you.

#### romsek

is all this on a per distance basis? Are you coming up with 5 separate sets of statistics?

#### Cottonshirt

> is all this on a per distance basis? Are you coming up with 5 separate sets of statistics?

the numbers are calculated for each race distance: at 400m the difference between summer and winter is 2.5%, at 800m it is 7.07%, at 1500m it is 9.18%, and so forth. the hypothesis was that the proportion of winter-born athletes would increase as race distance increases. the numbers certainly seem to support that, and my question is simply whether measuring the difference between summer and winter is equivalent to the statement of the hypothesis.

thank you.

#### romsek

Well... without delving into it too deeply basically what you want to do is

a) for each set of races estimate p = the proportion of winter athletes that win. This gives you 5 numbers in the range of [0,1]

b) Your hypothesis test statistic will be the slope of the estimated regression of these 5 numbers on race distance. A slope of zero or less would tend to invalidate the hypothesis.
A positive slope would support it, at varying degrees of confidence.

Your question addressed (a). The maximum likelihood estimate of p is just the number of winter winners divided by the total.
The statistic you came up with is unnecessarily complicated.

Under the usual assumption of independent, normally distributed errors, the maximum likelihood estimate of the regression slope is just the linear least squares estimate.
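As a rough sketch of step (b), the least squares slope can be computed by hand. This uses only the three distances whose differences are quoted in the thread, and it assumes winter is the larger group at each of them, so that the winter percentage is 50 plus half the difference. Regressing on raw distance in metres is also an assumption, and `least_squares_slope` is a hypothetical helper name.

```python
def least_squares_slope(xs, ys):
    """Ordinary least squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Only the three distances quoted in the thread; the other two are not given.
distances_m = [400, 800, 1500]
diff_pct = [2.5, 7.07, 9.18]  # summer/winter difference, in percent

# Assumption: winter is the larger group, so winter % = 50 + difference / 2.
winter_share = [50 + d / 2 for d in diff_pct]

slope = least_squares_slope(distances_m, winter_share)
print(f"slope: {slope:.5f} percentage points per metre")
```

A positive slope points in the direction of the hypothesis, but this sketch does not attach any confidence level to it, and with so few points that level would be modest.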

#### Cottonshirt

> a) for each set of races estimate p = the proportion of winter athletes that win. ...

I'm afraid this doesn't have very much to do with the question I asked.

winning is not relevant.

the athletes in the population have been chosen BECAUSE they have already met the criteria for success. it doesn't matter whether they won or not. these are the successful athletes.

for each race distance, x percent of them were born in winter, and n percent of them were born in summer.

the hypothesis says that, the proportion of them that were born in winter will increase as race distance increases.

the longer the race distance, the greater the proportion of this group who were born in winter.

all I am asking is: does calculating the difference between summer and winter equate to the statement of the hypothesis?

thank you.

#### Cottonshirt

you seem to have got a bit stuck on the idea of hypothesis testing which isn't really what my question is about. so I'm going to try asking the question in a different way to see if that will elicit a different sort of response.

we have three numbers, A>0, B>A, C = (B - A), and A + B = 100

the hypothesis says that B will increase as race distance increases.

I can show that C increases as race distance increases, which is mathematically equivalent to the hypothesis.

all I'm asking is, are the people in statistics land sufficiently pedantic that saying C increases will not be accepted as proving the hypothesis? will they insist that if the hypothesis says that B increases then you have to show specifically that B itself actually increases?

thank you.

#### romsek

$C=B-A \Rightarrow B = A+C$

$A+B=100 \Rightarrow B = 100-A$

Adding these two equations gives

$2B = 100+C$

$B = \dfrac{100+C}{2} = 50 + \dfrac 1 2 C$

So yes: $B$ is an increasing linear function of $C$, so whenever $C$ increases, $B$ increases too.
For any given increase in $C$ you can translate this to the increase in $B$ (half as large)
and that will keep the evil statisticians happy.
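As a worked check, here is $B = 50 + \tfrac{1}{2} C$ applied to the three differences quoted earlier in the thread. This assumes, as in the setup above, that $B > A$ at each distance, so that $C$ is the signed difference $B - A$:

$400\text{ m}: \quad C = 2.5 \Rightarrow B = 50 + 1.25 = 51.25$

$800\text{ m}: \quad C = 7.07 \Rightarrow B = 50 + 3.535 = 53.535$

$1500\text{ m}: \quad C = 9.18 \Rightarrow B = 50 + 4.59 = 54.59$

If $C$ were instead an absolute difference $|B - A|$ and $A$ could exceed $B$, the relation would become $B = 50 \pm \tfrac{1}{2} C$, and an increase in $C$ would no longer guarantee an increase in $B$.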

#### Cottonshirt

> For any given increase in $C$ you can translate this to the increase in $B$ and that will keep the evil statisticians happy.

thank you.