A guest-post:
Wendy Williams and Stephen Ceci have just published an article in PNAS titled “National hiring experiments reveal 2:1 faculty preference for women on STEM tenure track” (here). The article is striking, and seems to show a great deal of progress in gender equity in hiring (notwithstanding worries that some have expressed that this study demonstrates “reverse discrimination”). There has been interesting discussion of the article on Facebook (FB), the Daily Nous, and New APPS, and most of what I say here is a reworking of points that others have already made. First I’ll make a couple positive points about the article; then raise a worry about the authors’ interpretation of their data; and then raise a few questions about the data.
On the positive side, W&C’s data tells us more than we knew before about how gender attitudes and gender discrimination work. As Edouard Machery said on FB, we need to know the facts in order to create effective interventions. This seems right. The question, I think, is what exactly the study shows, and whether it shows what the authors think it shows.
Also on the positive side—although not emphasized by the authors—W&C’s data do a nice job of illustrating the context-specificity of bias and discrimination. W&C focused exclusively on “highly accomplished candidates.” As Bryce Huebner has pointed out (also Susan Sterrett on FB), attitudes toward this group of job applicants (or stereotypes about them) may be quite different from attitudes toward candidates perceived as more moderately accomplished. (Evidence shows that biases are more active in ambiguous situations; see below for more discussion on this.) W&C may help us to see, in other words, how gender bias targets different women differently. In the literature on implicit intergroup attitudes, as well as explicit intergroup attitudes, it is clear that specific biases target specific groups of people in specific contexts. Susan Fiske’s “Stereotype Content Model” is exemplary on this point, when it comes to explicit attitudes. Many studies in the implicit attitude literature (e.g., Rudman & Kilianski 2000, on gender-authority implicit stereotypes) also demonstrate context-specificity. This can also be seen by thinking about the strengths and weaknesses of the Implicit Association Test (IAT). As Alex Madva and I have argued (here), because the IAT measures generic preferences, its principal strength is predicting a wide range of behaviors. Its principal weakness, however, is small effect sizes, which arguably have to do with the way in which particular combinations of stereotypes and prejudices are activated in particular contexts. The IAT is like a jack of all trades, and master of none. (This is not to endorse recent criticism of the IAT, particularly from Oswald et al. 2013. See below for more on this.)

Of course, W&C claim something much more sweeping for their study, namely, that “it is a propitious time for women launching careers in academic science.” (Similarly: “Efforts to combat formerly widespread sexism in hiring appear to have succeeded. After decades of overt and covert discrimination against women in academic hiring, our results indicate a surprisingly welcoming atmosphere today for female job candidates in STEM disciplines . . ..”) This claim seems unwarranted for several reasons. The first, of course, is that the study can only tell us about perceptions of “highly accomplished candidates,” not candidates of all kinds. Second, there are, of course, many other locations and ways in which women face bias and discrimination in academic careers (as this blog clearly demonstrates, and as Helen de Cruz has pointed out on FB). W&C focus specifically on hiring for Assistant-level tenure-track positions. Their data don’t tell us about hiring for other kinds of positions (lectureships, senior hires, etc.), nor do the data tell us about promotion decisions, publication biases, salary issues, micro-aggressions, chilly atmospheres, and, of course, explicit harassment. (As Bryce Huebner points out, there are also all the forms of bias that affect candidates before they reach the stage of being perceived as academic superstars.) All of this adds up, as Virginia Valian (1998, 2005) has argued, and as Greenwald and colleagues have argued more recently (here) in responding to Oswald and colleagues’ critique of the IAT; Greenwald and colleagues’ basic point (aside from pointing out methodological problems in Oswald’s meta-analysis) is that small effect sizes add up to significant social forces.
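To get an intuitive feel for that last point, here is a toy simulation of how a small, constant per-decision bias can compound across successive selection stages. This is purely my own illustrative sketch in the spirit of Valian’s and Greenwald’s argument, not anyone’s actual model; the numbers in it (the bias size, the number of stages, the selection rate) are made-up assumptions.

```python
# Toy simulation (illustrative only): a small per-decision bias, repeated over
# several selection stages, produces a large cumulative disparity.
# All parameters below are made-up assumptions, not estimates from any study.
import numpy as np

rng = np.random.default_rng(0)

n_per_group = 100_000     # equal numbers of women and men enter the pipeline
n_stages = 6              # e.g., admissions, hiring, grants, promotion, ...
selection_rate = 0.15     # top 15% (by evaluation score) advance at each stage
bias = 0.1                # small evaluation bonus for men, in SD units

women = rng.normal(0.0, 1.0, n_per_group)   # underlying "merit" scores
men = rng.normal(0.0, 1.0, n_per_group)

for stage in range(1, n_stages + 1):
    # Each evaluation is merit plus noise, with a small constant bonus for men.
    w_eval = women + rng.normal(0.0, 1.0, women.size)
    m_eval = men + rng.normal(0.0, 1.0, men.size) + bias
    cutoff = np.quantile(np.concatenate([w_eval, m_eval]), 1 - selection_rate)
    women, men = women[w_eval >= cutoff], men[m_eval >= cutoff]
    share = women.size / (women.size + men.size)
    print(f"after stage {stage}: women are {share:.1%} of those remaining")
```

Nothing hangs on the particular numbers; the point is only that a per-stage effect small enough to be hard to detect at any single decision can still produce a visibly skewed pool several stages downstream.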
But this is all granting W&C’s data, and I’m not sure we should do that too quickly. One very general worry, as Alex Madva suggests, is that a 2:1 preference for women over men in STEM hires just doesn’t seem to pass the smell test. Is this really plausible, if we step back and consider everything else we know about gender and science?
A more specific worry stems from the materials W&C used. In other hiring studies (such as Moss-Racusin et al. 2012), which have shown clear gender bias against women, all participants received the same set of application materials, with the only change between participants being the name at the top. There is an advantage to W&C’s approach, which was to have participants review a set of candidates at once; the advantage is a kind of ecological validity. Normally, we evaluate sets of candidates all at once. But there is a big disadvantage to W&C’s approach too, which is that it enables the comparative social identities of the candidates to become more salient. In studies in which participants evaluate just one CV or dossier, the presentation of the social identity of the candidate is much subtler. If implicit biases are in play, they are more likely to be elicited in the latter, more carefully controlled, scenario. (The broader point here is that there may be a trade-off between an experiment’s ecological validity and its ability to identify specific factors affecting people’s judgment and behavior. There is a reason why controlled laboratory experiments aren’t exactly lifelike!)
Another methodological worry has to do with W&C’s mock narrative descriptions of the candidates. The narrative descriptions include a number of possible confounds, including information about the fictitious candidate’s marital status and “partner/family issues.” Moreover, in one of their validation studies, which did use CVs, W&C report that they abandoned using identical CVs for top male and top female candidates. Another worry about one of the validation studies—the one that used CVs instead of narrative descriptions—is the very low number of participants (n = 35).
Adding all this up: (a) participants in W&C’s study know that this is only a mock hire; (b) they know they are comparing applications from men and women; and (c) all of the candidates are described as excellent (e.g., “a powerhouse”). Given all of this, it seems reasonable to think that participants took this as an opportunity to appear unbiased. The stakes are low; gender is salient; and all of the candidates are stellar. So it remains very plausible that biased hiring persists in this (i.e., TT junior hires in STEM) and other areas, when the stakes are higher and the quality of the candidates is harder to decipher.
This last point can be broadened. When it comes to implicit bias, at least, stereotypes and prejudices seem to affect us the most when we have to make hard, ambiguous decisions. They intrude when you are looking at two applications (or papers for publication in a journal, or student papers), and you are searching for reasons to prefer one to the other, thinking “uhhh . . . I don’t know . . . this one has these strengths but these weaknesses . . . but this one has these other strengths and these other weaknesses . . . well, I guess I’ll pick this one . . . it just seems a little better.” I don’t think W&C’s study captures this situation accurately.*
Finally, a small point. The authors note their “surprise” at their findings, and in a Nature article describing the study, Williams says she was “shocked” by the data. This strikes me (and Helen de Cruz on FB) as strange, and possibly disingenuous, given that W&C have been publishing articles purporting to show the absence of gender bias in the academy for years.
(Thanks so much to Alex Madva, Bryce Huebner, and Jenny Saul for help putting these thoughts together.)
*In their 2012 study, Corinne Moss-Racusin and colleagues anticipate this very problem with W&C’s study. They write: “Following conventions established in previous experimental work (11, 12), the laboratory manager application was designed to reflect slightly ambiguous competence, allowing for variability in participant responses and the utilization of biased evaluation strategies (if they exist). That is, if the applicant had been described as irrefutably excellent, most participants would likely rank him or her highly, obscuring the variability in responses to most students for whom undeniable competence is frequently not evident. Even if gender-biased judgments do typically exist when faculty evaluate most undergraduates, an extraordinary applicant may avoid such biases by virtue of their record. This approach also maintained the ecological validity and generalizability of results to actual undergraduate students of mixed ability levels.”
Thanks for the post – I was thinking how describing male and female candidates as “powerhouses” (a gendered term, by the way, that I’ve never heard used to describe a woman) makes it easy for the participants to choose the woman in this low-stakes situation and thereby look good – it reminded me of this study by Monin and Miller (http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.462.8290&rep=rep1&type=pdf), where people were confronted with superior African-American and female candidates in mock hiring situations and, unsurprisingly, hired them. Interestingly, afterwards, having established their lack of prejudice, they were more likely to reject female or African-American candidates for “a job stereotypically suited for majority members” (the so-called moral licensing effect). A worry of mine is that W&C’s studies can generate this effect – now committee members can think (in STEM) “Well, it appears we’re not prejudiced after all. A study clearly demonstrated it; we even favor female candidates. But in this case (say, an on-campus visit), the male candidate is clearly superior to the woman. So we aren’t being sexist by selecting the man.” I hope this doesn’t happen, but it’s a risk of W&C’s work, a risk that outweighs any potential benefits they allege their studies have.
I second Helen’s worry about the effects of this research and also her pointing out, on New APPS, that W&C have a research program. It even extends to topics like tenure, which they oppose. I’d call it “why you shouldn’t pay attention to all these liberal worries.” With respect to science, it really is quite pernicious, since we are harmed by the internal brain drain of women from science.
I also think it is shocking that the authors are generalizing to real life from a study where the participants knew the outcomes didn’t matter.
I read this study with the background belief that numerous biases pervade academic life. I continue to think that. It’s perfectly compatible with this being a strong study and with the idea that it’s a great time for more women to enter science. Knowing I might have some kind of advantage in securing a job might encourage me to enter a field. That said, I’m having a hard time identifying any actual objections to the research in your gloss of it. That biases could occur among applicants with fewer credentials is an interesting speculation but not an objection. The point that other biases surely exist at other levels was not tested, and is compatible with the finding. Mention of possible confounds is vague, and various features of the narratives were actually fairly well controlled for purposes of testing them. The observation that results were driven in part by people trying not to look biased seems really reasonable but also not an objection. If that happens in low-stakes anonymous mock situations, it probably also will in very high-stakes situations when everyone could find out you are a professor working at a department with very bad hiring practices.
Thanks to Michael Brownstein for his post and to Jenny Saul for publishing it. The post raises/reiterates many interesting and worthwhile points that will continue providing food for thought.
In the meantime, I have a question about the conclusion: “Given all of this, it seems reasonable to think that participants took this as an opportunity to appear unbiased. The stakes are low; gender is salient; and all of the candidates are stellar. So it remains very plausible that biased hiring persists in this (i.e., TT junior hires in STEM) and other areas, when the stakes are higher and the quality of the candidates is harder to decipher.” As I understand this, Michael is tentatively proposing that W&C’s findings are due merely (or mainly) to socially desirable responding.
My question arises in light of what struck me as a rather important point in the General Discussion. W&C write, “Real-world data ratify our conclusion about female hiring advantage. Research on actual hiring shows female Ph.D.s are disproportionately less likely to apply for tenure-track positions, but if they do apply, they are more likely to be hired, sometimes by a 2:1 ratio. […] Many studies have argued that because only the very top women persist in math-intensive fields, their advantage in being hired is justified because they are more competent than the average male applicant. This is why an accurate evaluation of gender preference in hiring depends on data from an experiment in which competence is held constant. […] [R]eal-world hiring data showing a preference for women, inherently confounded and open to multiple interpretations because of lack of controls on applicant quality, experience, and lifestyle, are consistent with our experimental findings.”
Understood in this context, does it still seem reasonable to attribute W&C’s experimental findings to socially desirable responding?
Assuming that W&C are accurately characterizing this data on academic hiring — among other things, an NRC report on over 1800 hires at 89 PhD-granting institutions — then it does not seem reasonable, to me at least, that W&C’s experimental findings merely reflect socially desirable responding. Instead, the findings seem to provide some initial, defeasible support for one mechanism that would help explain observed patterns in academic hiring.
Thanks for the follow-ups everyone. Helen – I think what you say about something like what Monin & Miller found being in play in W&C is exactly right. And Anne – yes, this paper is definitely part of a series. Carole Lee has written a great critique of another of their papers.
Thanks for this. A couple of related thoughts: first, as you say, the fact that the CVs are of “powerhouses” is likely to trigger different stereotypes. We are aware of our general propensity to talk in terms of a magical talent (this has been discussed a lot lately thanks to Sarah Jane Leslie’s work on it), and I think there is an interesting gender aspect to this that is relevant here. Some women get picked out as fitting into the superstar category, but implicitly *as exceptions*. Think about the older generation of women in philosophy (born before 1945) who succeeded in their careers and often report having experienced no sexism. What is going on? The men around them saw them as exceptional. The fact that they were a small minority was essential to that. Now that we have more women, it is harder for that to work. But I think the profession still functions like that: a very few women are held up as superstars, and those few are used as proof that there is no sexism.
Another point – why do people keep posting and sharing the original article with no comment? Are they unaware of the obvious social meaning that posting it has?
This is a commentary by the sociologist Zuleyka Zevallos, who argues that W&C’s methodology is flawed (H/T Sylvia Wenmackers). http://othersociologist.com/2015/04/16/myth-about-women-in-science/
Among the points she makes:
– using psychologists (who do a lot of work on gender) as a control sample to validate the study (with STEM faculty) is not warranted
– as has been noted here and elsewhere, we don’t hire people based on narratives. CVs are the first rough cut
– so, “participants self-selected to participate in a study knowing they’d be judging hypothetical candidates” – in a design where it is very transparent that they were looking for gender differences
– “They have produced data about how scientists respond to a study about gender bias in academia, when they can easily guess that gender bias is being observed. Academics already understand that gender discrimination is morally wrong and unlawful.”
– the problem is, the study says nothing about unconscious biases, which will come into play the moment people need to make actual rather than hypothetical hiring decisions.
Building on her points, I would say the following (in response to those on FB and elsewhere who argue that the study at least shows a positive trend): the study shows that people who realize that gender bias is being measured know that gender bias against women is wrong, and perhaps overcompensate by choosing women at greater rates. Whether this realization also translates into less biased hiring decisions is a different matter – Eric Schwitzgebel has shown, for instance, how weak the correlation (if there is one at all) is between thinking something is wrong (e.g., eating meat) and not doing it (vegetarianism).
I am no fan of Williams and Ceci, but your criticism doesn’t accurately reflect the paper that they wrote. They conducted an additional experiment using full CVs which replicated their main finding, and they sent some participants a single applicant to rate, so gender wasn’t salient. The most apt criticism, I believe, is that they looked only at “powerhouse, 9.5/10” candidates, when in fact 95% of women won’t be at that level. It would have been interesting to conduct the same type of study across a wide range of qualifications.
Regarding “it is a propitious time for women launching careers in academic science,” there is also the issue of whether the real challenges and biases show up these days not at entry level but mid-career (‘in the trenches’), as one moves to potentially being a leader in one’s institution. I know this has been my own experience.
Thanks again for the comments everyone. I agree with much of what Helen says about the methodology of the study. In addition:
Anon: the likelihood that other biases exist is an objection to W&C’s paper, since what they conclude is that “efforts to combat formerly widespread sexism in hiring appear to have succeeded. After decades of overt and covert discrimination against women in academic hiring, our results indicate a surprisingly welcoming atmosphere today for female job candidates in STEM disciplines . . ..”
John Turri: yes, I agree that the apparent correlation between W&C’s findings and the real-world actuarial data they cite is interesting, and poses a challenge to my (and others’) speculation about socially desirable responding. I don’t know this data, beyond what they cite, and I’m honestly not sure what to think about it yet. In the section you quoted, W&C seem to dismiss the interpretation of the actuarial data according to which only the most qualified women make it to the application phase in math-intensive fields. Their claim seems to be that, because they controlled for the quality of the applicants in their study, participants’ preference for women candidates can’t be due to perceived differences in the quality of the applicants. One worry I have, though, is that their control for the quality of the candidates wasn’t very good, given all the possible confounds in the narrative descriptions of the candidates. More broadly, on my speculation about socially desirable responding, both Helen de Cruz and Neil Levy have pointed out how this is testable. If participants are picking women to appear unbiased, then we should see subsequent moral licensing effects.
Elinor: great point! I totally agree.
D: the validation study using full CVs had only 35 participants, so I’m doubtful that we can tell much from it. The second validation study, which just sent participants one applicant’s materials, had a larger n, but it used the narratives, which made gender salient in the ways that I worried about above.
I am hopeful the things this study suggests could be true someday will come true in my own university and department. After many years and many searches I’m still waiting for a woman to be hired or even for more than one woman to make a short list for interviews. Hope springs eternal.
MB: That is not an objection to this experiment or methodology, only an obvious limitation about what can be inferred from it. No doubt the discussion could have been more carefully worded in places.
I wonder if anyone told Corinne Moss-Racusin that her work on CV bias was ‘shocking’. I hope not. Most people thought it was very interesting, though of course not conclusive.
The authors include their significance testing for that study, and the results certainly pass the usual tests for significance. Can you be more specific about your skepticism?
[The comment below is a response to Dr. Zuleyka Zevallos’s critique of the PNAS study on STEM faculty hiring bias by Wendy Williams and Stephen Ceci. http://othersociologist.com/2015/04/16/myth-about-women-in-science/]
Zuleyka, thank you for your engaging and well researched perspective. On Twitter, you mentioned that you were interested in my take on the study’s methods. So here are my thoughts.
I’ll respond to your methodological critiques point-by-point in the same order as you: (a) self-selection bias is a concern, (b) raters likely suspected study’s purpose, and (c) study did not simulate the real world. Have I missed anything? If so, let me know. Then I’ll also discuss the rigor of the peer review process.
As a forewarning to readers, the first half of this comment may come across as a boring methods discussion. However, the second half talks a little bit about the relevant players in this story and how the story has unfolded over time. Hence, the second half of this comment may interest a broader readership than the first half. But nevertheless, let’s dig into the methods.
(a) WAS SELF-SELECTION A CONCERN?
You note how emails were sent out to 2,090 professors in the first three of the five experiments; 711 of them provided data, yielding a response rate of 34%. You also note a control experiment involving psychology professors that aimed to assess self-selection bias.
You critique this control experiment because, “including psychology as a control is not a true reflection of gender bias in broader STEM fields.” Would that experiment have been better if it incorporated other STEM fields? Sure.
But there’s other data that also speak to this issue. Analyses reported in the Supporting Information found that respondents and nonrespondents were similar “in terms of their gender, rank, and discipline.” And that finding held true across all four sampled STEM fields, not just psychology.
The authors note this type of analysis “has often been the only validation check researchers have utilized in experimental email surveys.” And in many studies such analyses aren’t even done. Hence, the control experiment with psychology was their attempt to improve on prior methodological approaches and was only one part of their strategy for assessing self-selection bias.
(b) DID RATERS GUESS THE STUDY’S PURPOSE?
You noted that, for faculty raters, “it is very easy to see from their study design that the researchers were examining gender bias in hiring.” I agree this might be a potential concern.
But they did have data addressing that issue. As noted in the Supporting Information, “when a subset of 30 respondents was asked to guess the hypothesis of the study, none suspected it was related to applicant gender.” Many of those surveyed did think the study was about hiring biases for “analytic powerhouses” or “socially-skilled colleagues.” But not about gender biases, specifically. In fact, these descriptors were added to mask the true purpose of the study. And importantly, the gendered descriptors were counter-balanced.
The fifth experiment also addresses this concern by presenting raters with only one applicant. This methodological feature meant that raters couldn’t compare different applicants and then infer that the study was about gender bias. A female preference was still found even in this setup that more closely matched the earlier 2012 PNAS study.
(c) HOW WELL DID THE STUDY SIMULATE THE REAL WORLD?
You note scientists hire based on CVs, not short narratives. Do the results extend to evaluation of CVs?
There’s some evidence they do. From Experiment 4.
In that experiment, 35 engineering professors favored women by 3-to-1.
Could the evidence for CV evaluation be strengthened? Absolutely. With the right resources (time; money), any empirical evidence can be strengthened. That experiment with CVs could have sampled more faculty or other fields of study. But let’s also consider that this study had 5 experiments involving 873 participants, which took three years for data collection.
Now let’s contrast the resources invested in the widely reported 2012 PNAS study. That study had 1 experiment involving 127 participants, which took two months for data collection. In other words, this current PNAS study invested more resources than the earlier one by almost 7:1 for number of participants and over 18:1 for time collecting data. The current PNAS study also replicated its findings across five experiments, whereas the earlier study had no replication experiment.
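For readers who want to see where those ratios come from, here is the arithmetic spelled out, using only the round figures quoted above (873 vs. 127 participants; three years vs. two months of data collection).

```python
# Quick check of the resource ratios quoted above (round figures only).
participants_ratio = 873 / 127        # ~6.9, i.e. "almost 7:1"
months_ratio = (3 * 12) / 2           # 36 months vs. 2 months = 18:1
print(f"participants {participants_ratio:.1f}:1, collection time {months_ratio:.0f}:1")
```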
My point is this: the available data show that the results for narrative summaries extend to CVs. Evidence for the CV results could be strengthened, but that involves substantial time and effort. Perhaps the results don’t extend to evaluation of CVs in, say, biology. But we have no particular reason to suspect that.
You raise a valuable point, though, that we should be cautious about generalizing from studies of hypothetical scenarios to real-world outcomes. So what do the real-world data show?
Scientists prefer *actual* female tenure-track applicants too. As I’ve noted elsewhere, “the proportion of women among tenure-track applicants increased substantially as jobseekers advanced through the process from applying to receiving job offers.”
https://theconversation.com/some-good-news-about-hiring-women-in-stem-doesnt-erase-sex-bias-issue-40212
This real-world preference for female applicants may come as a surprise to some. You wouldn’t learn about these real-world data by reading the introduction or discussion sections of the 2012 PNAS study, for instance.
That paper’s introduction section does acknowledge a scholarly debate about gender bias. But it doesn’t discuss the data that surround the debate. The discussion section makes one very brief reference to correlational data, but is silent beyond that.
Feeling somewhat unsatisfied with the lack of discussion, I was eager to hear what those authors had to say about those real-world data in more depth. So I talked with that study’s lead author, Corinne Moss-Racusin, in person after her talk at a social psychology conference in 2013.
She acknowledged knowing about those real-world data, but quickly dismissed them as correlational. She had a fair point. Correlational data can be ambiguous. These ambiguous interpretations are discussed at length in the Supporting Information for the most recent PNAS paper.
Unfortunately, however, I’ve found that dismissing evidence simply because it’s “correlational” can stunt productive discussion. In one instance, an academic journal declined to even send a manuscript of mine out for peer review “due to the strictly correlational nature of the data.” No specific concerns were mentioned, other than the study being merely “correlational.”
Moss-Racusin’s most recent paper on gender bias pretends that a scholarly debate doesn’t even exist. Her most recent paper cites an earlier paper by Ceci and Williams, but only to say that “among other factors (Ceci & Williams, 2011), gender bias may play a role in constraining women’s STEM opportunities.”
dx.doi.org/10.1177/0361684314565777
Failing to acknowledge this debate prevents newcomers to this conversation from learning about the real-world, “correlational” data. All data points should be discussed, including both the earlier and new PNAS studies on gender bias. The real-world data, no doubt, have ambiguity attached to them. But they deserve discussion nevertheless.
WAS THE PEER REVIEW PROCESS RIGOROUS?
Peer review is a cornerstone of producing valid science. But was the peer review process rigorous in this case? I have some knowledge on that.
I’ve talked at some length with two of the seven anonymous peer reviewers for this study. Both of them are extremely well respected scholars in my field (psychology), but had very different takes on the study and its methods.
One reviewer embraced the study, while the other said to reject it. This is common in peer review. The reviewer recommending rejection echoed your concern that raters might guess the purpose of the study if they saw two men and one woman as applicants.
You know what Williams and Ceci did to address that concern? They did another study.
Enter data, stage Experiment 5.
That experiment more closely resembled the earlier 2012 PNAS paper and still found similar results by presenting only one applicant to each rater. These new data definitely did help assuage the critical reviewer’s concerns.
That reviewer still has a few other concerns. For instance, the reviewer noted the importance of “true” audit studies, like Shelley Correll’s excellent work on motherhood discrimination. However, a “true” audit study might be impossible for the tenure-track hiring context because of the small size of academia.
The PNAS study was notable for having seven reviewers because the norm is two. The earlier 2012 PNAS study had two reviewers. I’ve reviewed for PNAS myself (not on a gender bias study); the journal published that study with only me and one other scholar as the peer reviewers. The journal’s website even notes that having two reviewers is common at PNAS.
http://www.pnas.org/site/authors/guidelines.xhtml
So having seven reviewers is extremely uncommon. My guess is that the journal’s editorial board knew that the results would be controversial and therefore made heroic efforts to protect the reputation of the journal. PNAS has come under fire from multiple scientists who repeatedly criticize the journal for letting studies simply “slip by” and get published because of an old boys’ network.
The editorial board probably knew that would be a concern for this current study, regardless of the study’s actual methodological strengths. This suspicion is further supported by some other facts about the study’s review process.
External statisticians evaluated the data analyses, for instance. This is not common. Quoting from the Supporting Information, “an independent statistician requested these raw data through a third party associated with the peer review process in order to replicate the results. His analyses did in fact replicate these findings using R rather than the SAS we used.”
Now I embrace methodological scrutiny in the peer review process. Frankly, I’m disappointed when I get peer reviews back and all I get is “methods were great.” I want people to critique my work! Critique helps improve it. But the scrutiny given to this study seems extreme, especially considering all the authors did to address the concerns, such as collecting data for a fifth experiment.
I plan on independently analyzing the data myself, but I trust the integrity of the analyses based on the information that I’ve read so far.
SO WHAT’S MY OVERALL ASSESSMENT?
Bloggers have brought up valid methodological concerns about the new PNAS paper. I am impressed with the time and effort put into producing detailed posts such as yours. However, my overall assessment is that these methodological concerns are not persuasive in the grand scheme. But other scholars may disagree.
So that’s my take on the methods. I welcome your thoughts in response. I doubt this current study will end debate about sex bias in science. Nor should it. We still have a lot to learn about what contexts might undermine women.
But the current study’s diverse methods and robust results indicate that hiring STEM faculty is likely not one of those contexts.
Disclaimer: Ceci was the editor of a study I recently published in Frontiers in Psychology. I have been in email conversation with Williams and Ceci, but did not send them a draft of this comment before posting. I was not asked by them to write this comment.
dx.doi.org/10.3389/fpsyg.2015.00037
As to D’s point about the validation study and Michael’s reply to it, I think statistically significant findings from small-n studies are as trustworthy as those from larger-n studies. Small n is problematic because it means low statistical power, but this is irrelevant to the validation study being discussed.
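As a rough illustration of what is at stake in this small-n exchange, here is a back-of-the-envelope power sketch. It is my own simplification, not W&C’s actual analysis: it pretends each rater’s choice reduces to a binary “prefers the woman” outcome and asks how often an exact binomial test against a 50/50 split would reach p < .05 with 35 raters, for a few assumed true preference rates.

```python
# Back-of-the-envelope power sketch for an n = 35 sample (illustrative only).
# Assumption: each rater's choice is reduced to a binary "prefers the woman"
# outcome, tested against a 50/50 null with an exact binomial test. This is a
# simplification, not the analysis Williams and Ceci actually ran.
import numpy as np
from scipy.stats import binomtest

rng = np.random.default_rng(1)
n_raters = 35
n_sims = 5_000

for true_rate in (0.55, 0.60, 0.67, 0.75):
    rejections = sum(
        binomtest(rng.binomial(n_raters, true_rate), n_raters, p=0.5).pvalue < 0.05
        for _ in range(n_sims)
    )
    print(f"true preference {true_rate:.0%}: power ~ {rejections / n_sims:.2f}")
```

On this simplification, 35 raters give reasonable power to detect a lopsided preference (on the order of the 3:1 split reported for the CV experiment) but little power to detect a modest one, which seems to be roughly what both sides of this exchange are pointing at.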