Wendy Williams and Stephen Ceci have just published an article in PNAS titled “National hiring experiments reveal 2:1 faculty preference for women on STEM tenure track” (here). The article is striking, and seems to show a great deal of progress in gender equity in hiring (notwithstanding worries that some have expressed that this study demonstrates “reverse discrimination”). There has been interesting discussion of the article on Facebook (FB), the Daily Nous, and New APPS, and most of what I say here is a reworking of points that others have already made. First I’ll make a couple of positive points about the article; then I’ll raise a worry about the authors’ interpretation of their data; and then I’ll raise a few questions about the data themselves.
On the positive side, W&C’s data tell us more than we knew before about how gender attitudes and gender discrimination work. As Edouard Machery said on FB, we need to know the facts in order to create effective interventions. This seems right. The question, I think, is what exactly the study shows, and whether it shows what the authors think it shows.
Also on the positive side, although not emphasized by the authors, W&C’s data do a nice job of illustrating the context-specificity of bias and discrimination. W&C focused exclusively on “highly accomplished candidates.” As Bryce Huebner has pointed out (also Susan Sterrett on FB), attitudes toward this group of job applicants (or stereotypes about them) may be quite different from attitudes toward candidates perceived as more moderately accomplished. (Evidence shows that biases are more active in ambiguous situations; see below for more discussion on this.) W&C may help us to see, in other words, how gender bias targets different women differently. In the literature on implicit intergroup attitudes, as well as explicit intergroup attitudes, it is clear that specific biases target specific groups of people in specific contexts. Susan Fiske’s “Stereotype Content Model” is exemplary on this point, when it comes to explicit attitudes. Many studies in the implicit attitude literature (e.g., Rudman & Kilianski 2000, on gender-authority implicit stereotypes) also demonstrate context-specificity. This can also be seen by thinking about the strengths and weaknesses of the Implicit Association Test (IAT). As Alex Madva and I have argued (here), because the IAT measures generic preferences, its principal strength is predicting a wide range of behaviors. Its principal weakness, however, is small effect sizes, which arguably have to do with the way in which particular combinations of stereotypes and prejudices are activated in particular contexts. The IAT is like a jack of all trades, and master of none. (This is not to endorse recent criticism of the IAT, particularly from Oswald et al. 2013. See below for more on this.)
Of course, W&C claim something much more sweeping for their study, namely, that “it is a propitious time for women launching careers in academic science.” (Similarly: “Efforts to combat formerly widespread sexism in hiring appear to have succeeded. After decades of overt and covert discrimination against women in academic hiring, our results indicate a surprisingly welcoming atmosphere today for female job candidates in STEM disciplines . . ..”) This claim seems unwarranted for several reasons. The first, of course, is that the study can only tell us about perceptions of “highly accomplished candidates,” not candidates of all kinds. Second, there are many other locations and ways in which women face bias and discrimination in academic careers (as this blog clearly demonstrates, and as Helen de Cruz has pointed out on FB). W&C focus specifically on hiring for Assistant-level tenure-track positions. Their data don’t tell us about hiring for other kinds of positions (lectureships, senior hires, etc.), nor do the data tell us about promotion decisions, publication biases, salary issues, micro-aggressions, chilly atmospheres, and, of course, explicit harassment. (As Bryce Huebner points out, there are also all the forms of bias that affect candidates before they reach the stage of being perceived as academic superstars.) All of this adds up, as Virginia Valian (1998, 2005) has argued and, more recently, as Greenwald and colleagues have argued (here) in responding to Oswald and colleagues’ critique of the IAT. Aside from pointing out methodological problems in Oswald’s meta-analysis, Greenwald and colleagues’ basic point is that small effect sizes add up to significant social forces.
But this is all granting W&C’s data, and I’m not sure we should do that too quickly. One very general worry, as Alex Madva suggests, is that a 2:1 preference for women over men in STEM hires just doesn’t seem to pass the smell test. Is this really plausible, if we step back and consider everything else we know about gender and science?
A more specific worry stems from the materials W&C used. In other hiring studies (such as Moss-Racusin et al. 2012), which have shown clear gender bias against women, all participants received the same single set of application materials, with the only change between participants being the name at the top. There is an advantage to W&C’s approach, which was to have participants review a set of candidates at once; the advantage is a kind of ecological validity. Normally, we evaluate sets of candidates all at once. But there is a big disadvantage to W&C’s approach too, which is that it enables the comparative social identities of the candidates to become more salient. In studies in which participants evaluate just one CV or dossier, the presentation of the social identity of the candidate is much subtler. If implicit biases are in play, they are more likely to be elicited in the latter, more carefully controlled, scenario. (The broader point here is that there may be a trade-off between an experiment’s ecological validity and its ability to identify specific factors affecting people’s judgment and behavior. There is a reason why controlled laboratory experiments aren’t exactly lifelike!)
Another methodological worry has to do with W&C’s mock narrative descriptions of the candidates. The narrative descriptions include a number of possible confounds, including information about the fictitious candidate’s marital status and “partner/family issues.” Moreover, in one of their validation studies, which did use CVs, W&C report that they abandoned using identical CVs for top male and top female candidates. Another worry about one of the validation studies—the one that used CVs instead of narrative descriptions—is the very low number of participants (n = 35).
Adding all this up: (a) participants in W&C’s study know that this is only a mock hire; (b) they know they are comparing applications from men and women; and (c) all of the candidates are described as excellent (e.g., “a powerhouse”). Given all of this, it seems reasonable to think that participants took this as an opportunity to appear unbiased. The stakes are low; gender is salient; and all of the candidates are stellar. So it remains very plausible that biased hiring persists in this (i.e., TT junior hires in STEM) and other areas, when the stakes are higher and the quality of the candidates is harder to decipher.
This last point can be broadened. When it comes to implicit bias, at least, stereotypes and prejudices seem to affect us the most when we have to make hard, ambiguous decisions. They intrude when you are looking at two applications (or papers for publication in a journal, or student papers), and you are searching for reasons to prefer one to the other, thinking “uhhh . . . I don’t know . . . this one has these strengths but these weaknesses . . . but this one has these other strengths and these other weaknesses . . . well, I guess I’ll pick this one . . . it just seems a little better.” I don’t think W&C capture this situation accurately.*
Finally, a small point. The authors note their “surprise” at their findings, and in a Nature article describing the study, Williams says she was “shocked” by the data. This strikes me (and Helen de Cruz on FB) as strange, and possibly disingenuous, given that W&C have been publishing articles purporting to show the absence of gender bias in the academy for years.
(Thanks so much to Alex Madva, Bryce Huebner, and Jenny Saul for help putting these thoughts together.)
*In their 2012 study, Corinne Moss-Racusin and colleagues anticipate this very problem with W&C’s study. They write: “Following conventions established in previous experimental work (11, 12), the laboratory manager application was designed to reflect slightly ambiguous competence, allowing for variability in participant responses and the utilization of biased evaluation strategies (if they exist). That is, if the applicant had been described as irrefutably excellent, most participants would likely rank him or her highly, obscuring the variability in responses to most students for whom undeniable competence is frequently not evident. Even if gender-biased judgments do typically exist when faculty evaluate most undergraduates, an extraordinary applicant may avoid such biases by virtue of their record. This approach also maintained the ecological validity and generalizability of results to actual undergraduate students of mixed ability levels.”