On the inherent struggles of inter-rater reliability

Today, I met with my colleague, Angie Johnson, to code 20 percent of the data from her recent descriptive study of sixth graders' evaluations of websites, as an inter-rater reliability (IRR) check. We met at 10 a.m. Angie figured the whole process would take a couple of hours. Seven hours later, we finished. Epic day.

Percent agreement: 72%.

Tough outcome.

We did resolve all of our differences. In the end, Angie decided to collapse three codes that I found especially difficult to tell apart. This should mean much higher agreement, but reaching that conclusion was a slog.
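Percent agreement is simply the share of chunks to which both coders assigned the same code. The sketch below is a minimal, hypothetical illustration of that arithmetic and of why collapsing codes raises agreement. The function name, the code labels, and the five-chunk data set are all made up for illustration. They are not Angie's codes or data.

```python
def percent_agreement(codes_a, codes_b, merge=None):
    """Share of chunks on which two coders assigned the same code.

    `merge` is an optional dict mapping original codes to a collapsed code.
    It is applied to both coders' codes before they are compared.
    """
    if len(codes_a) != len(codes_b):
        raise ValueError("both coders must code the same chunks")
    if not codes_a:
        raise ValueError("no chunks to compare")
    merge = merge or {}
    # Count chunks where the (possibly collapsed) codes match
    matches = sum(
        merge.get(a, a) == merge.get(b, b) for a, b in zip(codes_a, codes_b)
    )
    return matches / len(codes_a)


# Hypothetical toy data: five chunks, coded independently by two coders
coder_1 = ["source", "specific", "general", "source", "comparison"]
coder_2 = ["source", "general", "general", "author", "comparison"]

print(percent_agreement(coder_1, coder_2))  # 0.6

# Collapse two hypothetical "level of specificity" codes into one
collapsed = {"specific": "detail", "general": "detail"}
print(percent_agreement(coder_1, coder_2, merge=collapsed))  # 0.8
```

Once the two hard-to-separate codes are merged, the chunk the coders split on becomes a match, and agreement rises from 3/5 to 4/5. The trade-off is that the coding scheme can no longer distinguish between those categories.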

Through the process, I learned a lot. Here’s a list of the big ideas:

1. When doing IRR, it’s extremely difficult to learn another researcher’s coding scheme. The easiest codes to learn were those that pertained to very concrete ideas. The most difficult were those meant to capture levels of specificity or comparisons along one criterion, but not others.

2. Angie prepared blank Excel templates containing all of her data, with empty columns for the codes, and shared them with me on Dropbox. This worked really well.

3. When you're negotiating differences, it's important to keep the codes you each assigned first, rather than simply overwriting them with the agreed-upon code. If you want to analyze afterward which codes were especially problematic, you need not just the final word but also a record of where the negotiations began.

4. Expect IRR to take much longer than you think it will. Today, I coded about 250 chunks of written text. The first four transcripts took a long time because I was still learning the codes and needed to compare, contrast, and think through each item very carefully. I felt I owed Angie this level of thought, but I think we were both glad we hadn't scheduled anything else for the day.

5. When possible, there seem to be advantages to doing two rounds of IRR: the first while you're developing your codes, and the second once all the data have been coded. An early round would help you recognize codes that are redundant, unclear, or difficult to apply systematically, or that could benefit from a slightly different definition.
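The advice in point 3 is easier to act on if each chunk's record keeps both coders' first codes alongside the negotiated one. Here is a minimal sketch of that idea. It assumes the coding sheet can be exported as one row per chunk. The `CodedChunk` fields, the helper functions, the chunk IDs, and the codes are all hypothetical, chosen only for illustration. The sketch tallies which pairs of codes the coders disagreed on and which first codes were overturned during negotiation.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class CodedChunk:
    """One chunk of text with both coders' first codes and the negotiated code."""
    chunk_id: str
    coder_a: str  # coder A's first code, kept unchanged after negotiation
    coder_b: str  # coder B's first code, kept unchanged after negotiation
    agreed: str   # code the coders settled on after discussion


def disagreement_pairs(chunks):
    """Count the pairs of first codes on which the coders disagreed.

    Each pair is sorted so that ("x", "y") and ("y", "x") are counted together.
    """
    return Counter(
        tuple(sorted((c.coder_a, c.coder_b)))
        for c in chunks
        if c.coder_a != c.coder_b
    )


def overturned(chunks, coder="coder_a"):
    """IDs of chunks where the negotiated code differs from one coder's first code."""
    return [c.chunk_id for c in chunks if getattr(c, coder) != c.agreed]


# Hypothetical log of negotiated chunks
log = [
    CodedChunk("T1-01", "specific", "general", "specific"),
    CodedChunk("T1-02", "source", "source", "source"),
    CodedChunk("T1-03", "general", "specific", "general"),
    CodedChunk("T2-01", "source", "author", "author"),
]

print(disagreement_pairs(log).most_common(1))  # [(('general', 'specific'), 2)]
print(overturned(log, "coder_a"))  # ['T2-01']
```

If only the agreed code had been saved, neither tally would be possible. A pair that keeps surfacing, like the hypothetical "general"/"specific" pair here, is a natural candidate for a sharper definition or for collapsing, as point 5 suggests.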

Despite the challenges of reaching consensus, IRR is an essential part of data analysis. The added confidence in the coding scheme is the obvious advantage. But for the second coder, IRR is also a great way to learn the ins and outs of qualitative data analysis, and to learn the nuances of data collected in a particular way from a particular sample.
