# spearman's rank critical values formula?

• May 23rd 2011, 02:06 PM
joelyboy94
spearman's rank critical values formula?
hi everyone,
i have searched and i can't seem to find a formula to calculate spearman's rank critical values for n up to around 60 (like in the tables) from the value of n and the significance level. i'm trying to write a program that needs this, and i can't find a formula anywhere (Headbang)
• May 25th 2011, 01:03 AM
bryangoodrich
Quote:

Originally Posted by joelyboy94
hi everyone,
i have searched and i can't seem to find a formula to calculate spearman's rank critical values for n up to around 60 (like in the tables) from the value of n and the significance level. i'm trying to write a program that needs this, and i can't find a formula anywhere (Headbang)

I don't know what tables you're using. I've never seen a Spearman rank correlation test before, but it makes sense for a correlation analysis. See Wikipedia's example of two such test statistics that use the sample Spearman rank correlation coefficient (r) to test the null hypothesis that $\rho = 0$. I would assume your tables are based on the simpler Student t-distribution. As the wiki article details, we're interested in comparing the statistic

$t^* = r \sqrt{\frac{n-2}{1-r^2}}$

against the critical value $t(1-\alpha / 2,\ n - 2)$.
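To make that concrete, here's a small R sketch of that test. The function name and the example values of r and n are my own inventions for illustration, not something from a table:

```r
## Sketch: test H0 that the population coefficient is 0, given an observed
## sample coefficient r and sample size n (values below are made up)
spearman_t_test <- function(r, n, alpha = 0.05) {
  tstar <- r * sqrt((n - 2) / (1 - r^2))    ## t* from the formula above
  tcrit <- qt(1 - alpha / 2, df = n - 2)    ## two-tailed critical value
  c(t.star = tstar, t.crit = tcrit, reject = abs(tstar) > tcrit)
}
spearman_t_test(r = 0.6, n = 20)            ## t* ~ 3.18 exceeds t.crit ~ 2.10
```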
• May 25th 2011, 02:31 PM
joelyboy94
hi there,
thanks for answering. i'm with the MEI board, so i use the MEI tables from here: http://www.mei.org.uk/files/pdf/formula_book_mf2.pdf
and thank you, i saw that formula, but i didn't think it was the right one, because i thought the critical values were calculated without r, just with n and the significance level?
• May 25th 2011, 03:15 PM
bryangoodrich
If r isn't included, how would it relate to the correlation coefficient?
• May 25th 2011, 03:30 PM
joelyboy94
ummm i'm not sure, it's just that you don't have to know r to use the values in the formula book? you just have to know n and the sig level?
• May 25th 2011, 08:59 PM
bryangoodrich
Well, now I'm not sure. The Wikipedia article talked about using the coefficient to calculate a t-statistic that you can test against the Student's t-distribution. I don't know whether that's what these MEI tables do. The book defines a statistic (p. 6):

$r_s = 1-\frac{6\sum d_i^2}{n(n^2-1)}$

And it then later refers to this as a test statistic, but doesn't list a distribution (p. 8). I've found a few different tables, and they all fail to specify anything about the table other than that it's a test for independence, checking whether the coefficient exceeds the critical value. I think it might be some sort of nonparametric thing for low values of n. One of the sites I found specified that "For n > 40, assuming independence, ρ is approximately an observation from a normal distribution with mean 0 and variance 1/(n − 1)." If I find out anything else, I'll let you know, but I'm stumped. Like I said originally, I didn't even know about this test in the first place! That's why I'm here, though: to learn. Cheers.
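As a quick sanity check on that statistic, here's a short R sketch. The data are arbitrary numbers chosen for illustration; with no ties in either variable, the $d_i$ formula agrees with R's built-in Spearman correlation:

```r
## Arbitrary untied data for illustration
x <- c(106, 86, 100, 101, 99, 103, 97, 113, 112, 110)
y <- c(7, 0, 27, 50, 28, 29, 20, 12, 6, 17)
d <- rank(x) - rank(y)                    ## rank differences d_i
n <- length(x)
rs <- 1 - 6 * sum(d^2) / (n * (n^2 - 1))  ## the formula from the book
all.equal(rs, cor(x, y, method = "spearman"))  ## TRUE when there are no ties
```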
• May 25th 2011, 10:41 PM
bryangoodrich
I don't know if you've heard of permutation tests before, but I believe that is how the tables are calculated. This document (pdf) specifies similar things to what the other website I found said regarding larger samples. In particular, for large sample sizes, we approximately have,

$Z = r_s\sqrt{n-1} \sim N(0,1)$
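In R, that large-sample test would look something like this (the n and observed $r_s$ below are hypothetical):

```r
## Large-sample approximation: hypothetical n = 50 and observed r_s = 0.3
n  <- 50
rs <- 0.3
Z  <- rs * sqrt(n - 1)   ## approximately N(0, 1) under the null
Z > qnorm(0.95)          ## one-tailed 5% test: 2.1 > 1.645, so reject
```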

This is why the tables only report sizes up to around 30 to 40. Now, I'm not quite sure how the permutation test goes, but in other cases I've seen, it goes something like this. For a given sample size, under the null hypothesis that $\rho = 0$, the data are uncorrelated. Therefore, if we resample every possible permutation of the data into its two groups, they should stay uncorrelated. To the extent that this does or does not happen, we can reject or fail to reject the null hypothesis. Mechanically, we calculate $r_s$ under all permutations of data sets of size n. The result gives us a distribution from which we can run our test. In other words, given the permutation distribution (i.e., how r is distributed under all permutations of the data, assuming the data are uncorrelated), how likely is our observation? There's no quick and dirty formula for this. It has to be run on a computer, and since it only matters for smaller sample sizes, you might as well just look it up in the table.
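To illustrate, here is a sketch of how a one-tailed table entry could be generated by brute force. The helper names are mine, I'm assuming the tables are built this way, and the recursive permutation generator is only practical for small n, which is the regime the tables cover anyway:

```r
## All permutations of a vector as rows of a matrix (n! rows; small n only)
perms <- function(v) {
  if (length(v) == 1) return(matrix(v, 1, 1))
  out <- NULL
  for (i in seq_along(v)) out <- rbind(out, cbind(v[i], perms(v[-i])))
  out
}

## Smallest c with P(r_s >= c) <= alpha under H0: the one-tailed critical value
spearman_crit <- function(n, alpha = 0.05) {
  x  <- 1:n
  rs <- apply(perms(x), 1,
              function(y) 1 - 6 * sum((x - y)^2) / (n * (n^2 - 1)))
  cand <- sort(unique(rs))
  cand[which(sapply(cand, function(cc) mean(rs >= cc) <= alpha))[1]]
}

spearman_crit(5)   ## 0.9, matching the usual one-tailed 5% entry for n = 5
```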

For those who haven't seen how a permutation distribution works, feel free to pick up R and look at my code below. This example comes from the end of Chapter 16, page 715, in Kutner, et al., "Applied Linear Statistical Models" (5th ed.). It draws 1680 permutation samples of 9 items split into three groups and calculates the F-statistic for each. It then plots the histogram of the permutation distribution and overlays the approximate F-distribution for comparison; the F-distribution approximates the permutation distribution very well. The base case from which we generate the permutations has an F* of 4.38577, and 126 of the 1680 samples exceed it (127 are equal to or greater, counting the base case itself). Thus, the probability of getting a value equal to or greater than the observed one in this permutation distribution is 7.560%. The equivalent test using the F-distribution gets a very similar result. The following code is part of an ongoing project of mine to reproduce the ALSM examples in R.

Code:

################################################################################
## TABLE 16.5                                                          (p 715)
## Randomization Samples and Test Statistics--Quality Control Example
## FIGURE 16.8
## Randomization Distribution of F* and Corresponding
## F Distribution--Quality Control Example
##
## Since there is no algorithm to compute this example, we had to devise one.
## It should be rather straightforward. The Xi's are as in the above examples.
## The 'y' will hold the 1,680 cases of 9-sequences consisting of the response
## variables. The 'ti' objects hold the treatment groups: t1 is the first group
## (3-sequence) and t12 is the composite of t1 and t2. The 'remainder' function
## is a wrapper for grabbing the subset of 'set' whose values are not in 'x'.
## The 'seq6' is the 6-sequence remainder after t1 is defined. The whole
## process took less than 10 seconds on a 2.4 GHz processor. As for the output,
## the columns are arbitrarily labeled 1-9; they represent the three treatment
## groups in groups of three. The function 'f' uses the matrix algebra
## discussed in Ch. 5. It is possible to get away with merely fitting an 'lm'
## object and extracting the f-statistic in a single call, but that requires a
## lot of additional work for each of the 1680 rows; it took somewhere between
## 30-60 seconds to produce the same result.
################################################################################
remainder <- function(x, set) set[!set %in% x]

f <- function(Y, X) {
  Y <- matrix(Y)                                ## Turn row-vector into column
  p <- ncol(X)
  n <- nrow(X)
  J <- matrix(1, n, n)                          ## (5.18)
  H <- X %*% solve(t(X) %*% X) %*% t(X)         ## (5.73a)
  SSE <- t(Y) %*% (diag(n) - H) %*% Y           ## (5.89b)
  SSR <- t(Y) %*% (H - (1/n)*J) %*% Y           ## (5.89c)
  fstar <- (SSR / (p - 1)) / (SSE / (n - p))    ## (6.39b)
  fstar                                         ## return F*
}

base <- c(1.1, 0.5, -2.1, 4.2, 3.7, 0.8, 3.2, 2.8, 6.3)
t2   <- t12 <- t123 <- list()
y    <- NULL
X    <- cbind(
  X1 = c(1, 1, 1, 0, 0, 0, 0, 0, 0),
  X2 = c(0, 0, 0, 1, 1, 1, 0, 0, 0),
  X3 = c(0, 0, 0, 0, 0, 0, 1, 1, 1))

t1   <- t(combn(base, 3))
seq6 <- t(combn(base, 3, remainder, set = base))
for (i in 1:84)   t2[[i]] <- t(combn(seq6[i, ], 3))
for (i in 1:84)  t12[[i]] <- cbind(t1[i, 1], t1[i, 2], t1[i, 3], t2[[i]])
for (i in 1:84) t123[[i]] <- cbind(t12[[i]], t(apply(t12[[i]], 1, remainder, set = base)))
for (i in 1:84) y <- rbind(y, t123[[i]])

fstar <- apply(y, 1, function(Y) f(Y, X))
cbind(y, data.frame(f = fstar))

hist(fstar, freq = FALSE, ylim = c(0, 1), col = "gray90", main = "")
curve(df(x, 2, 6), add = TRUE, lwd = 2)
rm(base, fstar, i, remainder, seq6, t1, t2, t12, t123, f, X, y)