New study shows preference for women?

There’s a lot of buzz this morning about a new study which seems to show a preference for women over otherwise identical men. I haven’t had a chance to look in detail yet, but so far I am struck by one fact about the study: Rather than giving subjects CVs to evaluate, the experimenters gave them vignettes which included marital status and history and number of children– that is, precisely the stuff you’re not allowed to ask about in hiring. In reflecting on why this study has obtained such different results from all the CV studies, I find myself suspecting this played a role. Perhaps being explicitly given such prejudicial information put people on guard in a way that led them to overcompensate for possible biases.

At any rate, this certainly doesn’t count against my view that as much of hiring as possible should be conducted anonymously. Indeed, it shows that such anonymity may be necessary to block the operation of different biases under different circumstances.

15 thoughts on “New study shows preference for women?”

  1. There is also a worry that because the subjects knew that it wasn’t a “real” hiring situation but only a “mock” one, they made different judgments than they would have in the real situation, e.g., to prove how non-sexist they are.

  2. It doesn’t seem to be explained by the difference between narratives and CVs.

    “In two validation studies, 35 engineering faculty provided rankings using full curricula vitae instead of narratives, and 127 faculty rated one applicant rather than choosing from a mixed-gender group; the same preference for women was shown by faculty of both genders.”

  3. Yeah, I found the methodology on this one incredibly bizarre. The vignettes included all sorts of personal information (whether someone had been divorced, whether they took parental leave when they had children, etc) that I have never, in the various search committees at various levels I’ve served on, been in any kind of position to know. Indeed, as you say, this is precisely the kind of information hiring committees aren’t supposed to know, and are prohibited from finding out.

    Who knows what the upshot is. The hypothesis you suggest in the OP is interesting. I was also wondering if women seem more ‘relatable’ when we know personal details about their private/family lives (and these same kind of details might count against men when presented in a hiring setting, where they usually aren’t considered). Again, who knows!!!

  4. I also have some concerns about social desirability bias: a tendency for respondents in an experiment to pick the candidate that shows that they’re not sexist, rather than the candidate they would pick in a real-world situation. Experiments 4 and 5 are designed to test for these kinds of concerns; while they look simple initially (and support the methodology), the actual structure of the series of experiments and statistical tests is rather complicated (the Supplemental Information, with all of these details, is about 30 pages long). I don’t have time to take a closer look now.

  5. I wonder if another factor is that the CVs presented were apparently those of star-level researchers. So that the hiring committees might be thinking something like “Wow, she’s accomplished all this while raising a child on her own–she must be really amazing!” (A not unreasonable inference.) Which might not extend to cases where the committee is dealing with two comparably good candidates that don’t walk on water and one of them is a single mom.

  6. Gopher, that validation study is eyebrow-raisingly narrow, and I’m not sure how much can be gleaned from it. That asked a pool of only 35 faculty to rate CVs (I couldn’t find how many CVs were rated from their brief description of the experiment, but I wasn’t reading closely), and all the faculty were in engineering departments. Both the very small sample size and the fact that they sampled only engineering faculty (rather than the full spread of disciplines they studied the narratives with) are striking.

  7. It’s another Ceci and Williams argument. Last time I saw one was near Halloween. I guess this one is for the Easter Bunny.

    Ceci and Williams are beloved of right wing columnists. We need to approach their work with scepticism, as commenters here have largely done.

  8. Before I say more about this specific paper, maybe it’ll be helpful to clarify the theoretical background of where I’m coming from.

    In my view, no study or paper is ever clear evidence for anything. These original research papers just provide data points for future meta-analyses. As such, it would definitely be a mistake to definitively conclude that there is no gender bias in hiring in academia, or that there is a gender bias in hiring in academia in favor of women, on the basis of this one set of studies. The flip side of this view is that we should not expect any one study, or set of studies, to control for all potentially relevant factors in studying extremely complex social phenomena. For any study, there is nearly always a near infinite number of potentially relevant factors that it cannot control for. Again, what matters is the totality of evidence, and not individual studies.

    In an ideal world, we would all have preregistered experimental designs and planned analyses, which can be debated prior to the results for their appropriateness. In this non-ideal world, I think a good enough proxy is to think about actually similar studies and counterfactually similar studies with different outcomes. Operationalizing the latter: I think to myself something like “if this study had shown an effect in the opposite direction, would I have accepted it as evidence for the general phenomenon being studied?” A lot of times, there are many design choices that are all reasonable, each with its own deficiencies. (And that’s why we really need to look at lots and lots of studies rather than just one!)

    To that extent, I think many of the design choices that Williams and Ceci made in this study are reasonable enough. For example, it is true that the experimental setup does not involve making a real hire, but that is true of the often-cited Steinpreis et al (1999) and Moss-Racusin et al (2012) too. For another example, I don’t find it too odd to include a manipulation of a “lifestyle” (what a terrible name chosen by the authors) variable that is orthogonal to the gender of applicant variable. There are previous CV studies that manipulate similar variables, often via listing of professional society memberships, such as Pedulla (2014). And statistical analyses can tell us whether the gender variable has an effect, whether the lifestyle variable has an effect, or whether there is an interaction. Similarly, I think choices about using a narrative rather than a CV and doing a contrastive rather than non-contrastive design are all reasonable: if the authors had found an effect of preference against women in hiring using the same exact design, I would have certainly accepted it as additional evidence of gender bias in academia.

    (I agree that the CV vs narrative check appears underpowered, though that is not atypical for validity checks. The justification for using narratives (p. 13 of supplementary material) seems reasonable to me. However, again just to compare with studies that are often cited, in Steinpreis et al and Moss-Racusin et al each condition has about N = 60, which isn’t exactly highly powered. (For power, what matters is not the total N but N per condition.))

    So I guess, in the end, I don’t think we need to try to explain or explain away these results qua results of a single set of studies. Instead, I think the design choices are reasonable enough for us to accept it into the larger set of evidence on this interesting phenomenon which needs to be investigated further. For the same reason, I am much more skeptical of the bigger-picture discussions and normative conclusions that the authors attempt to draw.


    By the way, I am not sure I would characterize, say, the preference for an academic who is a single mother over an academic with the same credentials who is a father with a stay-at-home spouse as a “bias”. (This result is from Study 3.) That seems to me just a reasonable inference given what we know about the unjust distribution of “home” responsibilities in our society. Indeed, it’s the inability to make such reasonable inferences that makes me pro tanto skeptical about the value of anonymous hiring procedures.

  9. I thought the original CV studies (the ones that seemed to show anti-woman bias) were also done using “mock” hiring situations. Is that wrong?

    Magicalersatz, there were three CVs for each ‘rater’, in the validation study with 35 engineering faculty members. Although n=35 may seem small, the results were still highly statistically significant (P=.003). The authors explain why they used narratives instead of CVs for the main experiment.

    What’s striking to me is that the results of the validation study are the same as the results of the narrative study. That’s quite a coincidence if pro-woman bias is explained by two quite different factors in the two kinds of studies.

  10. Shen-yi, I agree that the explanation of why they went with narrative summaries rather than CVs in general makes good sense (I take it that the main justification is that no single CV would’ve been plausible across the range of universities they were studying). But what I don’t get is why those narrative summaries included so much personal information (whether the candidate was single, married, or divorced, employed full-time child care, had taken maternity leave, etc). Again, this is the kind of information that most HR departments would faint if they saw included in a hiring summary.

    The other thing that’s striking is that the kinds of narrative summaries they went with included things like:

    “.In addition, the chair’s comments about fit with the department were noted:
    “Dr. Z: Z struck the search committee as a real powerhouse.
    Based on her vita, letters of recommendation, and their own
    reading of her work, the committee rated Z’s research record as
    ‘extremely strong.’ Z’s recommenders all especially noted her
    high productivity, impressive analytical ability, independence,
    ambition, and competitive skills, with comments like, ‘Z produces
    high-quality research and always stands up under pressure,
    often working on multiple projects at a time.’ They described her
    tendency to ‘tirelessly and single-mindedly work long hours on
    research, as though she is on a mission to build an impressive
    portfolio of work.’ She also won a dissertation award in her final
    year of graduate school. Z’s faculty job talk/interview score was
    9.5/10. At dinner with the committee, she impressed everyone as
    being a confident and professional individual with a great deal to
    offer the department. During our private meeting, Z was enthusiastic
    about our department, and there did not appear to be
    any obstacles if we decided to offer her the job.”

    So I take it that what their study suggests is something conditional: if there isn’t gender bias in the way men and women get reports like these, then women have an advantage in hiring. That’s a pretty big ‘if’! Williams and Ceci note, for example, that they wanted to include information about women with children because women with children often perceive bias in STEM fields (and they suggest that their findings indicate that this perception is mistaken). But I take it that what many women in STEM – and other fields! – perceive is that when they have children, and especially if they take time off, they are much less likely to be described as having a ‘tendency to tirelessly and single-mindedly work long hours on research, as though she is on a mission’, less likely to be believed if they say that there aren’t ‘any obstacles if we decide to offer her the job’, and so on.

  11. That being said, hopefully one valuable upshot of this discussion – which I think Shen-yi’s comment really brings out – is that it’s a mistake to treat any of these individual studies as definitively ‘showing’ anything. Sometimes I worry that it’s easy for those of us committed to combating sexism to take the individual studies with conclusions *we like* and run with them – without much pause for question – as though they provide clear evidence or establish scientific fact. It then feels intellectually dishonest when we carefully criticize the methodology of the studies with results we don’t like as much (and surely as a rhetorical move, it’s the kind of thing that will bite you in the ass).

  12. @magicalersatz comment #11

    That’s really helpful! I don’t think the inclusion of “lifestyle” information is that weird given one research question they want to test: whether “lifestyle” “choices” (like de Cruz, I’m skeptical that these are really choices) are one thing that explains gendered preferences in hiring. But it’s exactly right to say that their conclusion should be a conditional one, and the antecedent is not well supported by the body of literature on, say, differences in recommendation letters. And, as such, the more normative conclusions that Williams and Ceci want to draw are overreaches at best.
