Hi, this is my first post here, so I hope it's OK! I know this looks long, but please don't be put off: it's probably much simpler than it looks, and it's probably a basic 'statistics ethics' question, so please have a go!


I conducted a survey with many questions whose answers were yes/no, 'tick all that apply', 'choose from 1-5', etc.

But the second part was forced-response Randomised Response Technique (RRT) data. For those who don't know, this is a way of asking people sensitive YES/NO questions. The person answering the question rolls a die and is instructed that if they roll a 5, they MUST answer YES, regardless of the truth; if they roll a 6, they must answer NO. If they roll 1-4, they should answer honestly. As I don't see the die roll, I can't know whether any individual told the truth, which makes it safe for them.

In theory, 1/6 of the responses should be forced YESes and 1/6 forced NOs.
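To illustrate the die scheme above, here is a small simulation sketch. The true prevalence (0.3), the sample size and the function names are all hypothetical, chosen only to show that about 1/6 of answers come out as forced YES, about 1/6 as forced NO, and that the observed YES rate is therefore pulled towards 1/2.

```python
import random


def answer_with_die(truth: bool, roll: int) -> bool:
    """Answer rule for the die scheme: 5 -> forced YES, 6 -> forced NO, 1-4 -> honest."""
    if roll == 5:
        return True
    if roll == 6:
        return False
    return truth


def simulate_survey(n: int, true_rate: float, seed: int = 1):
    """Simulate n respondents; true_rate is an assumed (hypothetical) true 'yes' prevalence."""
    rng = random.Random(seed)
    rolls = [rng.randint(1, 6) for _ in range(n)]  # fair six-sided die, bounds inclusive
    truths = [rng.random() < true_rate for _ in range(n)]
    answers = [answer_with_die(t, r) for t, r in zip(truths, rolls)]
    return rolls, truths, answers


rolls, truths, answers = simulate_survey(60_000, true_rate=0.3)
print(f"forced YES share: {rolls.count(5) / len(rolls):.3f}")
print(f"forced NO share:  {rolls.count(6) / len(rolls):.3f}")
print(f"observed YES rate: {sum(answers) / len(answers):.3f}")
```

With a true prevalence of 0.3, the expected observed YES rate is 1/6 + (4/6) * 0.3, about 0.367, not 0.3.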

I then ran a bootstrap on the data and used a formula (not sure how to type it here) which basically removes the probable forced responses, leaving an estimate of the true proportion. From that, the proportion of people who would admit to something can be estimated.
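The standard forced-response estimator is presumably the kind of formula meant here, though that is an assumption since the exact formula isn't given. With a fair die, a YES is recorded with probability P(yes) = 1/6 + (4/6) * pi, where pi is the true proportion, so pi can be estimated as (6 * p_hat - 1) / 4, where p_hat is the observed YES rate. The sketch below applies that estimator and bootstraps it by resampling respondents with replacement; the function names and the 40 YES / 60 NO data are hypothetical.

```python
import random
import statistics

P_FORCED_YES = 1 / 6  # die shows 5
P_HONEST = 4 / 6      # die shows 1-4


def estimate_prevalence(answers):
    """Forced-response estimator: solve P(yes) = 1/6 + (4/6) * pi for pi."""
    observed_yes = sum(answers) / len(answers)
    return (observed_yes - P_FORCED_YES) / P_HONEST


def bootstrap_prevalence(answers, n_boot=2000, seed=1):
    """Resample respondents with replacement and re-apply the estimator each time."""
    rng = random.Random(seed)
    n = len(answers)
    estimates = [estimate_prevalence(rng.choices(answers, k=n)) for _ in range(n_boot)]
    # Percentile interval: the first and last of 39 cut points are the 2.5th and 97.5th percentiles.
    cuts = statistics.quantiles(estimates, n=40)
    return statistics.mean(estimates), (cuts[0], cuts[-1])


# Hypothetical raw answers: True = YES, False = NO.
answers = [True] * 40 + [False] * 60
print(estimate_prevalence(answers))  # (0.4 - 1/6) / (4/6) = 0.35, up to float rounding
print(bootstrap_prevalence(answers))
```

Note that the estimate can fall below 0 (or above 1) when the observed YES rate is below 1/6 (or above 5/6), which can happen in small samples.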

This is all fine and I've finished that part. The bit I'm struggling with is this: how do I run correlations between rows of unprocessed RRT data, or between unprocessed RRT data and normal answers? I know how to run the tests, and Spearman's rank correlation seems most appropriate to me, but I need to know whether I actually can.

Here's the problem: the untouched data contains false answers (forced YESes and NOs), so it seems like bad form to use it as if it were true. But as there should be equal numbers of forced YESes and NOs, they should cancel out, right? That would mean I can run correlation tests on them. What I need to know most is this: is it OK to treat the RRT data the same as the data from the 'normal' questions? Or will it be frowned upon?
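One way to check the 'they should cancel out' intuition is a small simulation under assumed parameters: a hypothetical 1-5 ordinal answer `x` and an invented relationship in which the true YES rate rises with `x`. It compares the covariance between `x` and the raw RRT answers with the covariance between `x` and the true answers. Assuming the die roll is independent of everything else, the forced answers contribute nothing to the covariance and only the honest answers (4/6 of respondents on average) carry it, so the ratio should come out near 4/6 rather than 1. In other words, the forced answers dilute the association rather than cancelling out.

```python
import random


def covariance(xs, ys):
    """Sample covariance with an n - 1 denominator."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)


def simulate_attenuation(n=100_000, seed=2):
    """Return cov(x, raw RRT answer) / cov(x, true answer) for simulated data."""
    rng = random.Random(seed)
    xs, true_ys, rrt_ys = [], [], []
    for _ in range(n):
        x = rng.randint(1, 5)            # hypothetical ordinal answer, e.g. an age bracket code
        p_yes = 0.1 + 0.1 * x            # assumed: true YES becomes more likely as x rises
        y = 1 if rng.random() < p_yes else 0
        roll = rng.randint(1, 6)         # die roll, independent of x and y
        y_obs = 1 if roll == 5 else 0 if roll == 6 else y
        xs.append(x)
        true_ys.append(y)
        rrt_ys.append(y_obs)
    return covariance(xs, rrt_ys) / covariance(xs, true_ys)


print(simulate_attenuation())  # expected to be near 4/6, i.e. about 0.67
```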


Carry on reading here if you need to know more...

If I've been confusing, here's an example:

Say the questionnaire concerned speeding. One of the normal questions asks the driver's age in brackets (18-30, 31-40, etc.), simple.

The RRT question asks 'Have you been prosecuted for speeding in the past year?'

So I end up with two rows of data: one (age) is normal, and the other has some forced responses in it. Can I run a correlation to find out whether the two are related? Or would that not be allowed?
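A minimal sketch of the Spearman calculation for this example, using invented data: the age brackets are coded as ordinal integers (a hypothetical coding, 1 = 18-30, 2 = 31-40, ...) and the raw RRT answers as 1 = YES, 0 = NO. Spearman's rho is computed as the Pearson correlation of tie-averaged ranks, which matters here because both variables contain many ties. It is written in plain Python to stay self-contained; in practice a library routine such as `scipy.stats.spearmanr` would normally be used.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # convert 0-based positions to 1-based ranks
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical data: age bracket code and raw RRT answer (forced answers included).
age_bracket = [1, 1, 1, 2, 2, 2, 3, 3, 4, 4]
rrt_answer = [1, 1, 0, 1, 0, 1, 0, 0, 0, 1]
print(round(spearman(age_bracket, rrt_answer), 3))
```

Because some of the raw answers are forced, a rho computed this way describes the association with the recorded answers, not with the true ones, and would be expected to be diluted relative to the latter.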

Thanks, and please ask if there's anything more you need to know. Sorry it was so complicated!