It’s Halloween and they’re back! Williams and Ceci again

Recent research reports significant faculty bias against women students in science.

However, Williams and Ceci have an op-ed piece in the NY Times stating a conflicting conclusion from their recent research: there’s no bias against women in math-intensive fields in STEM. Their piece links to a forthcoming article by them.

Are they right?  If you have the time, you might try to analyze their work.  I don’t have the time, so let me simply urge a lot of caution when you read about the recent work.  They published a similar conclusion in 2011, and there turned out to be serious problems with their reasoning.  We discuss some of them here.

10 thoughts on “It’s Halloween and they’re back! Williams and Ceci again”

  1. I looked into their work a few years ago, for a page or two of my dissertation. IIRC, the outcome data they cite — concerning hiring, pay, funding, citation, and so on — is uncontroversial. Then, they reason, these outcome data aren’t consistent with explanations based on discrimination, in the sense of individuals’ attitudes towards women. Eliminating some other potential factors (e.g., math ability), they conclude that women and men have different preferences and make different choices.

    Someone might argue that discrimination could still be at work. Very roughly, if the pool of women were, overall, “more talented” than the pool of men, and the effects of discrimination exactly counterbalanced this difference in talent, then we could have both the outcome data and discrimination at the same time. That seems pretty ad hoc to me, but maybe someone could make it work.

    More importantly, Williams and Ceci don’t consider harassment as a potential factor. The term “harassment” doesn’t occur at all in the APS piece (linked in the NYT piece). And, IIRC, they don’t really consider how preferences are formed and choices are made, and consequently whether there might be problems lurking behind the façade of Pareto optimality. A woman who drops out of the tenure-track — or decides to get a teaching certificate instead of going for her Ph.D. — because of a combination of endless micro-inequalities/harassments, the need to solve the two-body problem, and gendered expectations about childcare is, on their view, merely making a “choice” based on her “preferences,” and that’s nothing to worry about.

  2. I ploughed through the paper and I am baffled it made it through peer review. Almost all their rebuttals of the workings of bias are structured as “studies X, Y and Z indicate this, however, we took some sample of universities and used such-and-so methodology, and crunched the data as follows, and voila, no effect.” This way of working is problematic, to say the least. There is selective and biased reporting of the data. For instance, as Emily Willingham (link below) writes: “Take a look at Figure 14 in their paper. The analysis suggesting that women are cited as often as men is weird and selective, but this graph is pretty clear: Men are still published significantly more than women. There are so many significant difference asterisks on those graphs, they look like a tiny galaxy. I know the H index is a hot new thing, but which one matters more still on your CV: Your citation count or your publication list?” And “Their data show lower salaries for women in academic STEM compared to men, almost across the board (Table 4 and Figure 17; note the drop in salaries for female assistant professors from 1995 to 2010 and that they’re at 85% of what male assistant professors were paid); lower job satisfaction for women (Table 19); fewer publications than men across fields (with one exception) in early career, whether we have children or not (Figure 16); fewer publications than men in most fields even when we’re full professors (Figure 14); more hours worked than men (Figure 15, not significant); scarcely breaking 30% representation in the “math-heavy” STEM fields (Figure A1; damn you, kindergarten! and Figure 1–note the lumping of life sciences with psychology and social sciences–I have a problem with that, and this paper is one example of why); and a dropping off of women from the pipeline between BS and PhD (Figure 2). Where I come from, we call that institutional bias.
I gotta say, though, that when it comes to psychology, women sure are representin’.” So their data do not support their conclusions.
    And framing women’s dropping out in terms of individual choice, without an investigation of the potential role of harassment, or expectations in domestic task distributions, is misleading.

  3. The journal publishes invited pieces, and doesn’t seem to use peer review:

    I don’t want to defend Ceci and Williams or criticize Willingham too much, because I agree with most of Willingham’s critique. But, in the quotation Helen De Cruz gave, Willingham seriously misinterprets the statistics Ceci and Williams and their coauthors report. With on the order of 100 t-tests in the whole article, using a simple (but statistically conservative) correction, the individual comparisons are only conventionally statistically significant when p < .00005. That means, at most, only the three-star comparisons are statistically significant. This Wikipedia article gives a good explanation:

    (Willingham could rightfully complain that Ceci and Williams and their coauthors haven’t made anything like these corrections, either.)
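
    To make the arithmetic behind that kind of correction concrete, here is a minimal Python sketch. It assumes the “simple (but statistically conservative) correction” is a Bonferroni correction and that the family-wise level is the conventional .05; neither is stated in the comment. The p-values and the star conventions in the comments are hypothetical illustrations, not figures from the paper. Under these assumptions, exactly 100 tests would give a per-test threshold of .0005, and the stricter .00005 figure would correspond to roughly 1,000 tests.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-comparison threshold: family-wise alpha divided by the number of tests."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def survives_correction(p_values, alpha, n_tests):
    """Return one bool per p-value: True if it stays significant after correction."""
    threshold = bonferroni_threshold(alpha, n_tests)
    return [p < threshold for p in p_values]


ALPHA = 0.05  # assumed conventional family-wise level

# Show how the threshold shrinks as the assumed number of tests grows.
for n_tests in (100, 1000):
    print(f"{n_tests} tests: significant only if p < {bonferroni_threshold(ALPHA, n_tests):g}")

# Hypothetical p-values. Under a common convention (assumed here), these would
# carry one star (< .05), two stars (< .01), and three stars (< .001, twice).
hypothetical_p_values = [0.04, 0.009, 0.0009, 0.0001]
flags = survives_correction(hypothetical_p_values, ALPHA, 100)
for p, keep in zip(hypothetical_p_values, flags):
    print(f"p = {p:g}: {'still significant' if keep else 'no longer significant'}")
```

    In this sketch only the smallest p-value survives, so even some three-star comparisons would fall away once the correction is applied.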

  4. Hi Dan, I’m not a statistician, but isn’t it inappropriate to use a conservative method here? If the null hypothesis is that there is no sexism in these fields, then using a conservative method would enable the authors to determine whether it could be definitively shown that there is sexism in those fields. But they are claiming to have shown definitively that there is no sexism in those fields; for that reason (and also because it’s the common-sense view) shouldn’t they take the null to be that there is sexism? The article you linked said that that method of correction tended to suppress false positives at the cost of producing false negatives; since the authors are claiming to have proved a negative result, the conservative method seems wildly inappropriate.

    It was pretty remarkable scanning through their graphs that there would be a whole bunch of graphs showing that women were worse off than men in various measures in various fields, with maybe one field where that wasn’t the case, but all the individual comparisons were marked as statistically significant. If sexism really weren’t a factor, shouldn’t we expect women to do better than men about half the time? From what you describe it sounds like they chose a method that minimizes that–which might not be a problem if they wrote their results up as “We haven’t been able to prove the sciences are sexist,” but is definitely a problem when they’re positively asserting that the sciences aren’t sexist.

    This is leaving aside a lot of the other problems that Helen De Cruz and Willingham and others report (e.g., the assumption that women having children and winding up doing extra childcare is a lifestyle choice).
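
    The “about half the time” intuition can be turned into a rough sign test. The sketch below is an illustration only: the counts are hypothetical, not taken from the paper, and the test assumes the comparisons are independent, which overlapping measures across the same fields probably are not.

```python
from math import comb


def prob_at_most(k: int, n: int, p: float = 0.5) -> float:
    """P(X <= k) for X ~ Binomial(n, p), computed exactly."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


# Hypothetical counts: 20 individually non-significant comparisons,
# with women coming out ahead in only 3 of them.
n_comparisons = 20
women_ahead = 3

# If there were no systematic difference, each comparison would favor women
# with probability 0.5, like a fair coin flip.
one_sided_p = prob_at_most(women_ahead, n_comparisons)
print(f"P(women ahead in <= {women_ahead} of {n_comparisons}) = {one_sided_p:.4f}")
```

    With these made-up numbers, the pattern as a whole would be very unlikely under a “no difference” hypothesis even though no single comparison is significant on its own.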

  5. Hi Matt,

    That’s a great point. The null hypothesis is often thought of as something like “the hypothesis that you’re trying to reject,” but IIRC it’s formally defined as something more like “the hypothesis that the value is zero,” and specifically “the value” here is some aggregate difference between women and men, like the mean number of publications. As it’s usually (and badly) taught, the logic of statistical hypothesis testing is all about whether or not the evidence is sufficient to reject the null. It can’t tell you whether the evidence is sufficient to accept the null — to paraphrase you, if you’re positively asserting that there really is no difference. Here’s a nice explanation:

    Unfortunately, as suggested by that link, statistics textbooks often ignore or even explicitly dismiss this point. Ignoring it is entirely standard practice in psychology and economics, which is probably part of the reason why I missed this, too. That’s not to apologize for Ceci and Williams; it’s to condemn the misuse of statistics in social science.

    (Two asides: 1. One argument for Bayesian techniques — which don’t depend on subjectivism about probabilities — is that they don’t have these limitations, and can let you accept the null. 2. Deborah Mayo’s error statistics philosophy uses the classical tests in a very different way — they tell you how severely you’re testing hypotheses, not whether to accept or reject them. It would be worthwhile to run Ceci and Williams’ data through some Bayesian or severity test algorithms, but unfortunately I don’t have time today.)

    Again, great point!
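
    One way to see the gap between “failing to reject the null” and “accepting the null” is to look at a confidence interval rather than a bare p-value. The sketch below is a simplified interval check in the spirit of equivalence testing, not the Bayesian or severity analyses mentioned above. It uses a normal approximation, and every number in it, including the margin that counts as a “negligible” difference, is a hypothetical assumption.

```python
from statistics import NormalDist


def confidence_interval(estimate: float, std_error: float, level: float = 0.95):
    """Normal-approximation confidence interval for an estimated difference."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return estimate - z * std_error, estimate + z * std_error


def classify(estimate: float, std_error: float, margin: float, level: float = 0.95) -> str:
    """Classify a difference using its interval and an assumed negligibility margin."""
    low, high = confidence_interval(estimate, std_error, level)
    if -margin < low and high < margin:
        return "no meaningful difference"  # whole interval inside the margin
    if low > 0 or high < 0:
        return "difference detected"  # interval excludes zero
    return "inconclusive"  # zero is plausible, but so is a real gap


# Hypothetical differences in mean publications (women minus men).
MARGIN = 0.5  # assumed size below which a difference counts as negligible

print(classify(-1.0, 0.8, MARGIN))  # noisy estimate: not significant, yet not "no difference"
print(classify(-0.1, 0.1, MARGIN))  # precise estimate near zero
print(classify(-1.0, 0.3, MARGIN))  # precise estimate away from zero
```

    The first case is the important one: the comparison is “not significant,” but the interval still includes a gap of more than two publications, so the data do not license a positive claim that there is no difference.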

  6. For what it’s worth, I am told by a social psychologist of my acquaintance that PSPI *is* peer reviewed. It’s a journal all of whose issues are, in effect, special issues, so the submissions are all invited — and Ceci and Williams are on the editorial board for PSPI — so probably more of the submissions are accepted than in a journal that is getting mostly unsolicited submissions — but still, there is peer review of the articles.

    How one is inclined to view the rigor of that peer review given the apparent methodological problems in this particular study is another matter…

  7. PSPI is peer reviewed, but Ceci and Williams are on the editorial board. What can one say?

  8. “We asked ourselves, who are the world’s top experts in things that we work on? It turned out to be us, so we invited ourselves to submit an article to our journal about our work. We all agreed that we did top-notch work.”

    Well, heck, if that’s a good way to run a journal, surely it’s good enough for a philosophy rankings website.

  9. Looking over my comment again, I see it had a sense-reversing typo; it should’ve read “there would be a whole bunch of graphs showing that women were worse off than men in various measures in various fields, with maybe one field where that wasn’t the case, but all the individual comparisons were marked as statistically insignificant.”
