Double anonymous review helps women

Women are more likely to be accepted to speak at academic conferences if applications are anonymised to remove any mention of their gender, a study suggests.

In the latest piece of evidence to support the “Matilda effect” – where women in male-dominated fields are rated more harshly by peer reviewers – a review of a leading international conference found that papers with a female first author were viewed more positively once clues to the applicant’s gender were removed.


The study was also important because it revealed gender bias in this particular field, even though a male-to-female speaker ratio (roughly 50:50 in 2012 and 2014) did not suggest an immediate problem, Dr Roberts explained.

“We’ve shown that sometimes a bias is there even when raw numbers show equality,” he added.

“Young people also do better under double-blind reviewing as they can be recognised for the quality of their work, rather than winning credit just for their name and reputation,” Dr Roberts said.

Read more at THES, or the paper.

4 thoughts on “Double anonymous review helps women”

  1. Good to see this, and it is worthwhile doing repeated studies so news items like this keep coming out: people need reminding! But the actual result is not, alas, at all surprising.

  2. Not surprising, but there are fewer solid studies showing that double anonymous review makes a difference than one might think. The linked article cites Budden, but there are lots of problems that have emerged with that one.

  3. I have to look into this more closely, but it looks very much like there was a statistical error in this paper. I downloaded their data from GitHub and re-ran the stats, and in the crucial analysis (paired change in ranking, the paragraph preceding Figure 3) there is no main effect of Gender, only a Gender*StudentStatus interaction (with opposite gender effects for students and non-students). It looks like the authors have made the error (quite common in our field) of mistaking dummy-coded regression coefficients for main-effect coefficients. On the bright side, the other important analysis (post-hoc t-tests in the paragraph above that) still holds up. Nevertheless, there are a lot of other reasonable ways the data could be analyzed (e.g., using the raw scores rather than rankings) where the stats don’t come out, so there seems to be a substantial ‘garden of forking paths’ problem (Gelman, 2014) here. This is a shame because research into this sort of reviewing bias is really important, and some really carefully conducted studies do show at least some effect (e.g., Knobloch-Westerwick et al. 2013, Science Communication), but I think the results from this dataset may be inconclusive.

  4. I’m curious to hear more about the problems with the Budden et al. 2008 study. To be sure, it’s only one study with limited data, and its results might not be generalizable. However, early critiques of the study (e.g., Webb et al. 2008 and Nature’s retraction about the study’s merits) were addressed by Budden et al. in a response to Webb et al. (in the same volume). Oddly enough, this early response by Budden et al. is rarely mentioned (yet Webb et al.’s critique is) in recent dismissals of the Budden et al. study. This early response can be found at
