I was thinking about this again today and realized that they really needed to measure intraclass correlation among the raters to determine whether the construct was supervised incorrectly from the start. Classify the guys who are classifying the girls. I guess turnabout is fair play.
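For what it's worth, here is a rough sketch of what that check could look like. It assumes the simplest variant, the one-way random-effects ICC(1,1), and a made-up layout where each row is one rated person and each column is one rater. The function name `icc_oneway` and the sample numbers are purely illustrative, not anything from the original study.

```python
from statistics import mean


def icc_oneway(ratings):
    """ICC(1,1): one-way random-effects, single-rater intraclass correlation.

    ratings: list of rows, one row per rated subject, one column per rater.
    (Hypothetical layout, assumed for illustration.)
    """
    n = len(ratings)       # number of rated subjects
    k = len(ratings[0])    # number of raters per subject
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need >= 2 subjects and >= 2 raters, same count per row")

    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]

    # Between-subject mean square: how much subjects differ from each other.
    ms_between = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    # Within-subject mean square: how much raters disagree about one subject.
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))

    denom = ms_between + (k - 1) * ms_within
    if denom == 0:
        raise ValueError("no variance in ratings; ICC is undefined")
    return (ms_between - ms_within) / denom


# Made-up example: three raters who agree perfectly on three subjects.
perfect = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
print(icc_oneway(perfect))
```

A value near 1 means the raters are basically interchangeable. A value near zero or below means they disagree about the same person about as much as they differ across people, which would be a hint that the labels were shaky to begin with.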