Math Help - Questions about calculating inter-rater reliability

  1. #1
    Newbie
    Joined
    Oct 2011
    Posts
    13

    Questions about calculating inter-rater reliability

    I apologize in advance if this is the wrong forum for this.

    Anyway, I am having some trouble trying to figure out how to calculate inter-rater reliability for a data set.

    My data consists of 2211 RATEES, with 10 RATERS (the design is fully-crossed, so all raters rated all items once). The ratees are actually word pairs (e.g. "alligator-ostrich", "alligator-pineapple", "bottle-ostrich," etc.) which were given a similarity score by the raters on a continuous scale.

    The way the rating process worked, in case it is relevant, is that the word pair appeared on a computer screen above a slide bar. The rater was instructed to set the position of the slider in the bar (with the left-hand side being "unrelated" and the right-hand side being "related") to assess word pair similarity. Therefore, the raters were not explicitly assigning ranks, numbers, or categories to each ratee - they just set the value on a continuous scale. The program output a number (between 0 and 100) to a datafile, but at no point did the raters actually see these values.

    So, in any case, the data looks a bit like this:

              RATER 1   RATER 2   RATER 3   ...
    RATEE 1      5         0        43
    RATEE 2     33        18        86
    RATEE 3     12         4        52
    ...

    What I want to do is calculate the inter-rater reliability or concordance of the data set. I know there are a number of different methods for doing so, but I can't seem to find one that fits the criteria of my data set, so I was hoping somebody here could help me out. Maybe I am misunderstanding some of these tests, or there is a major one I don't know about. Anyway, these are the different tests I have looked at:

    Cohen's kappa: inappropriate because it requires qualitative/categorical data
    Fleiss' kappa: same as above
    Joint probability of agreement: again, only works for nominal data
    Concordance correlation: only works for paired data
    Kendall's W: requires rank ordering, which may or may not be appropriate?
    Intra-class correlation: may be appropriate, but searching online just confuses me on this. Some sources say it is only for paired data, others do not. And it seems to require ANOVA?

    Anyway, I'm sorry if this is a basic question, but Wikipedia and other online sources are just confusing me more on this, because the information seems to be contradictory from one place to the next. Is ICC the most appropriate method? Or Kendall's or something else that rank orders the data? Should I treat it as nominal and do a kappa?
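    In case it helps anyone answer, here is my understanding of how ICC(2,1) (two-way random effects, absolute agreement, single rater, in the Shrout & Fleiss numbering) would be computed from the two-way ANOVA mean squares, if it turns out to be the appropriate statistic. This is just a sketch in Python; the function name `icc2_1` and the NumPy array layout (one row per ratee, one column per rater) are my own, and the tiny example reuses the numbers from my table above:

```python
import numpy as np

def icc2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: (n_ratees, k_raters) array from a fully-crossed design.
    """
    n, k = ratings.shape
    grand = ratings.mean()
    row_means = ratings.mean(axis=1)   # one mean per ratee
    col_means = ratings.mean(axis=0)   # one mean per rater

    # Two-way ANOVA sums of squares
    ss_rows = k * np.sum((row_means - grand) ** 2)   # between ratees
    ss_cols = n * np.sum((col_means - grand) ** 2)   # between raters
    ss_total = np.sum((ratings - grand) ** 2)
    ss_err = ss_total - ss_rows - ss_cols            # residual

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))

    # Shrout & Fleiss (1979) formula for ICC(2,1)
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )

# Tiny example using the numbers from my table above
ratings = np.array([[5.0, 0.0, 43.0],
                    [33.0, 18.0, 86.0],
                    [12.0, 4.0, 52.0]])
print(icc2_1(ratings))
```

    If I understand correctly, the "ANOVA" requirement is nothing more than this decomposition of the ratings matrix, so it would not require fitting a separate model.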

  2. #2
    MHF Contributor
    Joined
    Sep 2012
    From
    Australia
    Posts
    3,694
    Thanks
    618

    Re: Questions about calculating inter-rater reliability

    Hey RASimmons.

    This looks like a contingency/categorical data problem. You might want to model it as such and then do a statistical test of association between the rating for a particular word-combination cell and the variable corresponding to a rater.

    If there is an association, a chi-square analysis should pick it up. If there isn't, then you fail to reject the hypothesis that the variables are independent (i.e. no association).
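    To make that concrete (a Python sketch rather than anything SAS-specific, and with a random stand-in ratings matrix in place of the real data): since the scores are continuous, you would first bin them into a few categories, then build a rater-by-category contingency table and test it for association. The bin cut-points here (34 and 67, giving low/mid/high) are arbitrary choices for illustration:

```python
import numpy as np
from scipy.stats import chi2_contingency

rng = np.random.default_rng(42)
# Stand-in data: 200 ratees x 10 raters, scores 0-100
scores = rng.integers(0, 101, size=(200, 10))

# Bin the continuous scores into 3 categories (low/mid/high)
cats = np.digitize(scores, bins=[34, 67])          # values 0, 1, 2

# Rater-by-category contingency table of counts
table = np.array([[np.sum(cats[:, j] == c) for c in range(3)]
                  for j in range(cats.shape[1])])

# Chi-square test of association between rater and score category
chi2, p, dof, expected = chi2_contingency(table)
print(chi2, p, dof)
```

    A small p-value would suggest the raters distribute their scores across the categories differently, i.e. an association between rater and rating.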

  3. #3
    Newbie
    Joined
    Oct 2011
    Posts
    13

    Re: Questions about calculating inter-rater reliability

    Thanks for the reply, chiro!

    How do I model this as a contingency/categorical data problem? I don't quite understand how to apply a chi-square analysis to this data set ...

  4. #4
    MHF Contributor
    Joined
    Sep 2012
    From
    Australia
    Posts
    3,694
    Thanks
    618

    Re: Questions about calculating inter-rater reliability

    Do you have access to SAS? SAS has a procedure called PROC FREQ that will run the test for you, but if you want to do it manually, you simply compare an observed distribution with a uniform distribution.

    The idea is that if there is an association then P(A|B) will not be constant across different values of B. If P(A|B) = P(A), this means A and B are not associated (independent).

    So your observed distribution is P(A = a|B) for some fixed a and varying B, and your expected distribution is uniform.

    If the result is extreme enough, you can reject the hypothesis that the two variables are independent.
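    Concretely, the manual version for a single word pair might look like this (a Python sketch; the counts are made-up: how many of the 10 raters fell into each of three score bins for that pair):

```python
import numpy as np
from scipy.stats import chisquare

# Made-up counts: how many of the 10 raters put this word pair's
# score into each of three bins (low / medium / high)
observed = np.array([7, 2, 1])

# By default, chisquare tests the observed counts against a
# uniform expected distribution with the same total
stat, p = chisquare(observed)
print(stat, p)   # stat = 6.2 for these counts
```

    You would then compare the statistic against the chi-square distribution with (number of bins - 1) degrees of freedom, which is what the p-value above already does.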

  5. #5
    Newbie
    Joined
    Oct 2011
    Posts
    13

    Re: Questions about calculating inter-rater reliability

    I do have access to SAS. I just wanted to know the logic/method behind what I was doing. I don't like being one of those people who just plug things blindly into a program without actually understanding what it is doing. I will look at that procedure.

    Thanks for your help!
