# Questions about calculating inter-rater reliability

• May 1st 2013, 07:58 AM
RASimmons
Questions about calculating inter-rater reliability
I apologize in advance if this is the wrong forum for this.

Anyway, I am having some trouble trying to figure out how to calculate inter-rater reliability for a data set.

My data consists of 2211 RATEES, with 10 RATERS (the design is fully crossed, so every rater rated every ratee once). The ratees are actually word pairs (e.g. "alligator-ostrich", "alligator-pineapple", "bottle-ostrich," etc.) which were given a similarity score by the raters on a continuous scale.

The way the rating process worked, in case it is relevant, is that the word pair appeared on a computer screen above a slider bar. The rater was instructed to set the position of the slider (with the left-hand side meaning "unrelated" and the right-hand side meaning "related") to assess word-pair similarity. So the raters were not explicitly assigning ranks, numbers, or categories to each word pair; they just set a value on a continuous scale. The program wrote a number (between 0 and 100) to a datafile, but at no point did the raters actually see these values.

So, in any case, the data looks a bit like this:

         RATER 1   RATER 2   RATER 3   ...
RATEE 1        5         0        43
RATEE 2       33        18        86
RATEE 3       12         4        52
...

What I want to do is calculate the inter-rater reliability or concordance of the data set. I know there are a number of different methods for doing so, but I can't seem to find one that fits the criteria of my data set, so I was hoping somebody here could help me out. Maybe I am misunderstanding some of these tests, or there is a major one I don't know about. In any case, these are the different tests I have looked at:

Cohen's kappa: inappropriate because it requires qualitative/categorical data
Fleiss' kappa: same as above
Joint probability of agreement: again, only works for nominal data
Concordance correlation: only works for paired data
Kendall's W: requires rank ordering, which may or may not be appropriate?
Intra-class correlation: may be appropriate, but searching online just confuses me on this. Some sources say it is only for paired data, others do not. And it seems to require an ANOVA?

Anyway, I'm sorry if this is a basic question, but Wikipedia and other online sources are just confusing me more on this, because the information seems to be contradictory from one place to the next. Is ICC the most appropriate method? Or Kendall's or something else that rank orders the data? Should I treat it as nominal and do a kappa?
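(For what it's worth, the intra-class correlation can be computed directly from the two-way ANOVA mean squares, so a fully crossed ratees × raters matrix like this one is exactly the design it was built for. Here is a minimal sketch in Python; the data are randomly generated stand-ins for the real 2211 × 10 matrix, and `icc_2_1` is one common variant, the Shrout–Fleiss ICC(2,1): two-way random effects, absolute agreement, single rater.)

```python
import numpy as np

rng = np.random.default_rng(0)
# toy stand-in for the real 2211 x 10 matrix: rows = ratees, cols = raters
true_similarity = rng.uniform(0, 100, size=(50, 1))
ratings = true_similarity + rng.normal(0, 10, size=(50, 10))

def icc_2_1(x):
    """ICC(2,1): two-way random effects, absolute agreement, single rater."""
    n, k = x.shape
    grand = x.mean()
    row_means = x.mean(axis=1)   # per-ratee means
    col_means = x.mean(axis=0)   # per-rater means
    # mean squares from the two-way ANOVA decomposition
    msr = k * ((row_means - grand) ** 2).sum() / (n - 1)      # between ratees
    msc = n * ((col_means - grand) ** 2).sum() / (k - 1)      # between raters
    sse = ((x - row_means[:, None] - col_means[None, :] + grand) ** 2).sum()
    mse = sse / ((n - 1) * (k - 1))                           # residual
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

print(round(icc_2_1(ratings), 3))
```

With simulated noise this produces a reliability well below 1; if every rater's column were identical, the residual mean square would be zero and the ICC would be exactly 1.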
• May 2nd 2013, 02:26 AM
chiro
Re: Questions about calculating inter-rater reliability
Hey RASimmons.

This looks like a contingency/categorical data problem. You might want to model it as such and then do a statistical test of association between the rating for a particular word-combination cell and the variable corresponding to the rater.

If there is an association, a chi-square analysis should pick it up. If you don't find evidence of one, that is equivalent to failing to reject the hypothesis that the variables are independent.
• May 2nd 2013, 05:06 AM
RASimmons
Re: Questions about calculating inter-rater reliability
Thanks for the reply, chiro!

How do I model this as a contingency/categorical data problem? I don't quite understand how to apply a chi-square analysis to this data set ...
• May 2nd 2013, 05:04 PM
chiro
Re: Questions about calculating inter-rater reliability
Do you have access to SAS? SAS has a procedure called PROC FREQ that will run the test for you, but if you want to do it manually, you simply compare an observed distribution with a uniform distribution.

The idea is that if there is an association then P(A|B) will not be constant across different values of B. If P(A|B) = P(A), this means A and B are not associated (independent).

So what you are doing is your observed distribution is P(A = a|B) for some fixed a and varied B, and your expected distribution is a uniform distribution.

If the result is extreme enough, you can reject the hypothesis that the two variables are independent.
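(The manual version described here is a chi-square goodness-of-fit test: bin one rater's observed ratings and compare the bin counts to a uniform expectation. A sketch in Python, with randomly generated ratings standing in for a real rater's column and an arbitrary five-bin split:)

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(2)
one_rater = rng.uniform(0, 100, size=500)    # toy ratings from a single rater

# observed bin counts vs. the counts expected under a uniform distribution
observed = np.histogram(one_rater, bins=5, range=(0, 100))[0]
expected = np.full(5, one_rater.size / 5)

stat = ((observed - expected) ** 2 / expected).sum()
p_value = chi2.sf(stat, df=len(observed) - 1)
print(stat, p_value)
```

If the rater's ratings were instead piled into a single bin, the statistic would be huge and the uniform hypothesis would be rejected immediately.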
• May 3rd 2013, 05:20 AM
RASimmons
Re: Questions about calculating inter-rater reliability
I do have access to SAS. I just wanted to know the logic/method behind what I was doing. I don't like being one of those people that just plugs things blindly into a program without actually understanding what it is doing. I will look at that function.

Thanks for your help!