Hey lpd.
What are you trying to do? Are you doing a non-parametric test to see if you have evidence that the medians are the same or different?
hi!
what is the interpretation behind the confidence interval in a mann-whitney test or a wilcoxon rank sum test?
Two groups of data, symmetric empirical distributions...
I got a result of (3.9, 6.0) for 95% CI (from the computer, i.e. minitab)
However, the difference in medians is 24.3-24.4=-0.1 (where 24.3=median of A, 24.4=median of B).
Why isn't the difference in medians included in the 95% CI? And what is the meaning of the CI for this?
Btw, the p-value is 1.708x10^-8
Thank-you!!
Basically, I'm using a non-parametric test to see if I have evidence that the medians are the same.
and I get a W=7443.5, and p-value=1.708x10^(-8)
The point estimator of the median difference is (24.3-24.4)=-.1,
yet the 95% CI doesn't contain that value. I get (3.9, 6.0)
So my question is: how come the difference between the medians is not in the confidence interval I found... and what is the meaning behind the confidence interval of the Wilcoxon test?
Yeah, that's weird if the point estimate isn't within the interval itself (regardless of the estimator, a confidence interval should contain its own point estimate).
I'm wondering given this wiki site:
Mann
Can you get either the point estimate and standard error for the actual estimator, or the n1, n2 values listed on the wiki site?
The reason I suggest this is that if you have a lot of samples, then the normal approximation in the wiki should reflect the statistic and interval generated, but if not then you know something is amiss.
Can you post the entire generated output?
This is the output that I got...
> wilcox.test(score ~ group, alternative="two.sided", conf.int=TRUE)
Wilcoxon rank sum test with continuity correction
data: score by group
W = 7443.5, p-value = 1.708e-08
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
3.899979 5.999932
sample estimates:
difference in location
4.900035
If I do a 2-sample 95% CI for mean_A - mean_B, using the independent samples t test method, it's (2.7, 6.9)
EDIT: when I work out the medians, they're 24.3 and 24.4 for samples A and B respectively...
I don't think this statistic refers to the median: it's going to refer to the Wilcoxon rank sum test statistic W.
Now, a really low p-value like this means you reject the null hypothesis of no location shift in favour of the alternative, so what the test is telling you is that there is evidence of a difference in location between the two populations.
Just be aware that this test is not like the hypothesis tests for means and variances: it is a completely different kind of test that works on a particular transformation of the data (the ranks), and this happens quite a lot in statistics, where a test looks at something that has no obvious connection to the thing you are making an inference on.
So the test is rejecting H0 even though the difference in your calculated sample medians is nearly zero, and that is a sign that the reported statistic is not measuring the difference in sample medians.
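Well, the point estimate is given as 4.900035, but again this refers to a statistic that doesn't have a simple connection to the sample medians, the way a difference in sample means does for a t test. As far as I know, the "difference in location" that wilcox.test reports is the median of all the pairwise differences between the two samples (the Hodges-Lehmann estimate), not the difference of the two medians, and 4.900035 does sit inside the interval.
The CI is given as (3.899979, 5.999932), but this is only in relation to this specific estimate and its distribution.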
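To see how a location-shift estimate can land far away from the difference of the sample medians, here's a minimal sketch in Python (stdlib only). It assumes the reported "difference in location" is the median of all pairwise differences, which is how I read the R documentation; the two samples and the function name hodges_lehmann_shift are made up for illustration and are not the poster's data.

```python
from itertools import product
from statistics import median


def hodges_lehmann_shift(x, y):
    """Median of all pairwise differences x_i - y_j (assumed estimator)."""
    return median(a - b for a, b in product(x, y))


# Hypothetical samples: both have median 10, but the upper part of A
# sits well above B and the lower part of B sits well below A.
sample_a = [8, 9, 10, 30, 31]
sample_b = [1, 2, 10, 11, 12]

diff_of_medians = median(sample_a) - median(sample_b)
shift_estimate = hodges_lehmann_shift(sample_a, sample_b)

print("difference of medians:", diff_of_medians)
print("median of pairwise differences:", shift_estimate)
```

With these toy numbers the two quantities disagree, so an interval built around the second one has no reason to cover the first.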
Since it says continuity correction, I'm going to hazard a guess that it uses some kind of normal approximation. The website outlines how to get the mean and variance of this normal distribution, and you could see if you get the same parameters for the asymptotic normal as those implied by the output of the statistical program.
It would probably be a real pain to do manually, but it's always an option if you really want to see yourself (maybe you could do it with a few really small data sets as opposed to doing it for large ones, but if you do this then you can't use the normal approximation but instead use the tables).
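If you want to try the normal approximation by hand, here's a rough sketch in Python. It uses the standard large-sample formulas for the U statistic, mean n1*n2/2 and variance n1*n2*(n1+n2+1)/12, with no correction for ties, so it won't match the software exactly when there are ties. The function name u_normal_approx and the example numbers are hypothetical, since n1 and n2 weren't posted.

```python
from math import erfc, sqrt


def u_normal_approx(u, n1, n2):
    """Normal approximation for the Mann-Whitney U statistic (no tie correction)."""
    mean_u = n1 * n2 / 2
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    # Continuity correction: pull |u - mean| half a unit toward zero.
    diff = u - mean_u
    corrected = max(abs(diff) - 0.5, 0.0)
    z = corrected / sd_u if diff >= 0 else -corrected / sd_u
    # Two-sided p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_two_sided = erfc(abs(z) / sqrt(2))
    return mean_u, sd_u, z, p_two_sided


# Hypothetical example: two samples of size 10 and an observed U of 80.
mean_u, sd_u, z, p = u_normal_approx(80, 10, 10)
print(mean_u, sd_u, z, p)
```

Plug in your own W, n1 and n2 and compare the p-value with the one in the output; a big mismatch would point to ties or to the program using an exact method instead.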