from an MIT student.  And it contained a discussion of a remarkable study I hadn’t come across before, on how to eliminate the gap between men and women on mental rotation.  It turns out that just spending a little time imagining oneself as a stereotypical male raises women’s scores so dramatically that the gap shrinks to statistical insignificance.

In a now-famous study, psychologists at the University of Berlin falsely told participants that they had been selected to take part in a series of tests “to measure the ability to put oneself in someone else’s position,” a fabrication devised to avoid confounding factors in their real study on gender identity priming. They prepared a text describing a day in the life of a “stereotypical woman” who takes care of her family, works part time, and is insightful, helpful, and agreeable. They also prepared an equivalently structured text outlining the activities of a stereotypical manly man who is tough, risk-taking, and does weight training after work. Subjects were randomly given one of the two texts and then asked: “If you were the person described in the text, which adjectives would you use to describe yourself?”

Soon after participants described themselves with either the male- or female-associated traits, they were asked to take a mental rotation test presented as independent of the first part of the study, supposedly to measure their personal spatial aptitude. On this mental rotation test, women who were “primed” with the female identity scored an average of 3.86 on the exercise, compared to the female-primed males’ average of 5.14. Okay, expected. But then when primed with the male text, women scored an average of 5.49, while men scored 5.53… wait a second, what?

As it turns out, there is no statistically significant gender difference in mental rotation ability after test-takers are asked to imagine themselves as stereotypical men for a few minutes. None. An entire standard deviation of female underperformance is negated under this condition, just as a man’s performance is slightly hindered if he instead imagines himself as a woman. (Well then.) Although this study is of course not a definitive answer to every question of “nature versus nurture,” it adds considerable weight to the growing mountain of evidence that “natural” ability differences are confounded by identity and subconscious self-stereotyping. Demographic expectations may be subtle or overt, but they are omnipresent, and they are likely much more powerful than most of us have ever considered.

Thanks to S, and S, for calling this article to my attention!

4 thoughts on “Lovely overview of stereotype threat”

  1. I would be a little nervous drawing too large a conclusion from any given stereotype-threat result. My understanding is that meta-analyses – while they do suggest that there’s some effect here – find that effect to be pretty small and probably somewhat inflated by publication bias. Here’s a recent meta-analysis: (looking at stereotype threat on maths testing results). From its conclusion:

    “We estimated a small average effect of stereotype threat on the MSSS test-performance of school-aged girls; however, the studies show large variation in outcomes, and it is likely that the effect is inflated due to publication bias. This finding leads us to conclude that we should be cautious when interpreting the effects of stereotype threat on children and adolescents in the STEM realm. To be more explicit, based on the small average effect size in our meta-analysis, which is most likely inflated due to publication bias, we would not feel confident to proclaim that stereotype threat manipulations will harm mathematical performance of girls in a systematic way or lead women to stay clear from occupations in the STEM domain.”

  2. In addition to David Wallace’s point, I would like to point out that the student’s presentation of the results of Steele and Aronson, 1995 is misleading. Her presentation suggests that Steele and Aronson’s study showed that stereotype threat can account for the whole gap between whites and blacks in performance on standardized tests, which it certainly does not, as Sackett et al., 2004 noted.

    The reason is that, in their famous study, Steele and Aronson chose groups of blacks and whites that were known to perform equally well on the SAT prior to the experiment. What they showed is that it was possible to induce the black students, who before the experiment had performed as well as the white students, to significantly underperform by priming them in the right way. At best, this suggests that *part* of the gap between whites and blacks in performance on standardized tests might be explained by stereotype threat, but many other assumptions would have to be tested before we could reach even this much weaker conclusion.

    That being said, it’s really not surprising that a student made that mistake, given that Steele and Aronson’s results are consistently mischaracterized in that way even by professional scholars. Philosophers, unfortunately, seem particularly prone to this mischaracterization. I don’t know how many times I’ve read or heard a philosopher interpret Steele and Aronson, 1995 in that way. For instance, Pigliucci, 2013 writes “Steele and Aronson (1995), among others, looked at IQ tests and at ETS tests (e.g. SATs, GREs, etc.) to see whether human intellectual performance can be manipulated with simple psychological tricks priming negative stereotypes about a group that the subjects self-identify with. Notoriously, the trick worked, and as a result we can explain almost all of the gap between whites and blacks on intelligence tests as an artifact of stereotype threat, a previously unknown testing situation bias.”

    In general, it seems to me that people, philosophers in particular, are far too quick to draw conclusions from studies of stereotype threat. I think the same is true, though often for different reasons, of the way they draw conclusions from studies of implicit bias.

  3. David, what do you think about this method of checking for publication bias in stereotype threat? I would have thought it better to replicate an outlier study, since ST seems sensitive to context and cues, the kinds of differences easily lost in a meta-analysis. In fact, the authors of that article don’t look at the modifiers found to have the biggest effect on women in this meta-analysis, from what I can tell: . Do you know of attempts to replicate these studies?

  4. Quick disclaimer: I have no specialist knowledge here; this is just lay interest.

    I can certainly see the advantages of the “replicate outliers” strategy, and I don’t know if it’s been done much in this case. (My general understanding is that replications ought to be done more across the board, but they don’t sound very sexy to funding bodies.) Part of what’s interesting about the meta-analysis I linked is that it applies some statistical checks for publication bias (i.e., only the more interesting results getting published) and finds some evidence for it.

