New paper: Gender and Philosophical Intuition

Wesley Buckwalter and Steve Stich’s new paper, “Gender and Philosophical Intuition”, is out. I think it’s really interesting. Here’s the first paragraph:

In recent years, there has been much concern expressed about the underrepresentation of women in academic philosophy. A full explanation of this troubling phenomenon is likely to be quite complex since there are, almost certainly, many factors that contribute to the gender disparity. Our goal in this paper is to call attention to a cluster of phenomena that may be contributing to the underrepresentation of women in philosophy, though until now these phenomena have been largely invisible. The findings we review indicate that when women and men with little or no philosophical training are presented with standard philosophical thought experiments, in many cases their intuitions about these cases are significantly different. We suspect that these differences could be playing an important role in shaping the demography of the profession. But at present this is only an hypothesis, since we have no evidence that bears directly on the causal relation between the gender gap in academic philosophy and the facts about intuition that we will recount. In future work, we plan to focus on that causal link. However, we believe that the facts we report about gender differences in philosophical intuitions are both important and disturbing, and that philosophers (and others) should begin thinking about their implications both for philosophical pedagogy and for the methods that philosophers standardly use to support their theories. It is our hope that this paper will help to launch conversations on these issues both within the philosophical community and beyond.

Thanks, A!

16 thoughts on “New paper: Gender and Philosophical Intuition”

  1. They’re working with very small samples: the largest studies look at only a couple hundred people, and the smallest a few dozen. With samples that small relative to the population to which they want to extrapolate (at minimum, college undergrads), it’s actually quite easy to get statistically significant results from the sample that amount to statistical noise in the population. It’s one of the basic methodological problems that plague a lot of quantitative psych work.

  2. Actually, I saw this and thought “Didn’t Carol Gilligan write this book like 30 years ago?” Of course, Gilligan was talking about Kohlbergian moral reasoning and this paper is talking about “philosophical intuitions,” but isn’t the point essentially the same–i.e., that women are socialized to think/reason/evaluate things in a different and less socially/institutionally valued way than men are socialized to think/reason/evaluate things?

  3. The chart on page three is just depressing. I mean we all know how bad the situation is, but seeing the numbers there in black and white is still shocking. (I’m pretty sure the department with the “best” numbers on the chart, Yale, has a lower percentage of women faculty than is reported, since Kati Balog has left for Rutgers. If that’s right, then no department in the Leiter top 20 has more than 27% female faculty.)

  4. For what it’s worth, my colleagues and I have a paper in press at the Journal of Cognition and Culture where we look at the effects of demographic factors–including reported gender, among other factors–on judgments in moral psychology experiments (I’ve included the abstract below). In line with this research, we did find that reported gender had a statistically significant effect on judgments in **some** moral psychology experiments; however, and this is the key point, we found that the size of this effect, where it emerged, tended to be incredibly small.

    In cases where a scaled response was provided, no demographic variable accounted for more than 9% of total variance; in fact, most accounted for less than 5% of the total variance in participants’ responses. Additionally, where a dichotomous response was provided, reported gender yielded a significant effect in just 4 out of 30 scenarios; moreover, there did not seem to be any interesting sense in which these scenarios were theoretically unified.

    Of course, none of this is to deny the claim that there is a “chilly climate” that emerges in academic philosophy departments! So, I’m happy to be the first in line wherever we can find effective ways of attenuating the pernicious effects of the hierarchical power structures that pervade academic philosophy. However, I am skeptical about appeals to gender-based differences in initial intuitions as a mechanism for propagating asymmetries of power.



    Abstract. Research on moral psychology has frequently appealed to three, apparently consistent patterns: 1) Males are more likely to engage in transgressions involving harm than females; 2) Educated people are likely to be more thorough in their moral deliberations because they have better resources for rationally navigating and evaluating complex information; 3) Political affiliations and religious ideologies are an important source of our moral principles. Here, we provide a test of how four factors (gender, education, politics, and religion) affect intuitive moral judgments in unfamiliar situations. Using a large-scale sample of participants (N = 8778) who voluntarily logged on to the internet-based Moral Sense Test, we analyzed responses to 145 unique moral and conventional scenarios that varied widely in content. Although each demographic or cultural factor sometimes yielded a statistically significant difference in the predicted direction (e.g., men giving more utilitarian judgments than women; religious individuals giving more deontological/rule-based judgments than atheists), these differences were consistently associated with extremely small effect sizes. We conclude that gender, education, politics, and religion are likely to be relatively insignificant for moral judgments of unfamiliar scenarios. We discuss these results in light of current debates concerning the mechanisms underlying our moral judgments, and especially, the idea that we share a universal moral sense that constrains the range of cross-cultural variation.

  5. Robin– Yes, there are commonalities. But important differences, too. (1) The B/S data concern intuitions over a wide range of subject matter, not just ethics. (2) The Gilligan stuff wasn’t just about thought experiments, but included discussion of real-world decision making. (3) The B/S stuff isn’t tied to claims about women thinking more relationally, emotionally etc. (4) The B/S stuff is being used to argue for different sorts of changes to philosophical pedagogy and methodology. So they seem interestingly different.

    (5) I think, anyway, that the Gilligan gender-difference claims didn’t really hold up all that well. So that makes me surprised by this. (Though if Bryce and Dan are right, maybe this won’t hold up either.)

  6. It might be worth noting that SSRN is a place for posting one’s own papers; they are not, as far as I know, peer-reviewed by experts in the field. This makes me a bit uneasy, because it is, I think, unusual for authors of a paper with a large empirical debt to publicize their results, and claim they have important facts worth the community’s discussion, in advance of professional review. When it has been done in the sciences, it can receive considerable negative reactions, as scientists who announce their results to the NY Times can find out. (There’s a notorious editorial in one learned journal entitled “Brain Scam,” which makes just this point.)

    (Let me say that I do have the highest respect for Stich’s work and I will be happy if I learn I have gotten something wrong here.)

    Mind you, I’m not especially a fan of the quixotic peer review process, but this case does have a special feature. Explanations in terms of hypotheses to the effect that a fundamental problem is that the women are different can have some bad features. Not the least of these features is that it may distract us from explanations that afford more immediate action. It also may let people off worrying about what they are doing to continue the discrimination.

    Philosophy and many of the sciences and engineering have closely similar gender problems. I don’t know of any explanation of the problems in the sciences and engineering of the form “women are different” that’s survived much scrutiny.** Hence, I should have thought such a hypothesis in philosophy should have to meet really high standards.

    **Some purported differences have seemed to have led to actions benefitting women in fairly local circumstances; perhaps there is something like a placebo effect operating. Women students might feel empowered and more worthwhile if a department asks them to help change itself to benefit them.

  7. Let me add: the paper, said to be “this remarkable paper,” is also mentioned on Leiter’s blog, along with its first paragraph urging that the community discuss its important facts.

    I dearly hope we won’t be hearing for the next ten years that the problem with women in philosophy is that we don’t like rigorous argumentation AND we think differently. Unless, that is, the latter is true – which comments above put into question – and there’s investigation of what can be done then to change philosophy to accommodate different ways of thinking. The former seems to be one of these weeds that cannot be eradicated.

  8. Fascinating, though I appreciate the Gilliganian dangers. Notice that the Dweck studies discussed at the end sound like they might matter to philosophy, whether or not the data about gender and intuitions hold up; indeed, independent of anything to do with intuition. Some people (and women more than men) tend to think of intelligence as a gift, relatively fixed; some as plastic, malleable, educable. When confronting puzzlement and confusion, the former give up more readily, especially (I hadn’t heard of this) if they are brighter. ‘Well, I guess this isn’t my cup of tea…’ I gather the Dweck studies have been used to try to explain past gender asymmetries in school math performance, and perhaps they’re relevant to the experience of students in our discipline too.

  9. Rae, I looked back at their discussion of Dweck and see that a very interesting paper of hers is on the web. I think there’s every indication that the underlying generalities are not peculiar to women doing science and math.

    On the one hand, I’d be concerned that, since girls’ and boys’ scores in maths are converging, she’s looking at a more temporary phenomenon. At the same time, philosophers are too often clueless about psychological factors affecting performance in philosophy, and too often beliefs create a discriminatory environment. Eric Schwitzgebel had some interesting comments on his blog about how some people are taken to look very good at philosophy, and the positive results of getting so labeled, largely for young white guys.

  10. As Bryce mentioned in passing, it seems the main issue Stich et al will have to meet is how any account of individual differences, assuming they are there, would account for part of the difference in gender representation in faculties. Stich, being the careful philosopher that he is, is right to point out that he is not making the argument that the alleged difference accounts for ALL of the disparity. But then, the question is doubly hard: one has to figure out how much of the disparity the alleged differences represent and also account for how individual differences somehow morph into the institutional disparity.

    Tall order for both of those. Maybe their further work will try to wrestle with these issues but I am skeptical that these can be teased out in any methodological way.

  11. I’d like to echo something that Bryce said: when looking at studies like this, it’s the effect size that really matters. Statistical significance shows you that there is a difference between genders. *Interpreting* that effect requires some measure of how large that effect is. Without the raw data, it’s hard to evaluate anything but the studies presented in figures 2-4. But at least for those, the size of the effect appears to be fairly small: by my calculations, gender explains 8%, 6%, and 3% of the variance in intuitions in the studies reported in figures 2, 3, and 4 (respectively). Each of these is statistically significant. But that’s not very much.

    Technical minutiae: I calculated a phi coefficient for each of the studies, assuming that they were coded dichotomously as presented. There’s argument about whether phi-squared is a good estimate of effect size. But even if you’re as charitable as possible and you use BESD with Preece’s correction, the effect sizes for studies 3 and 4 are .22 and .17, which is small on the standard interpretation. Only study 2 gets a reasonable effect size of .6. But study 2 admits of plenty of other explanations for the data.

    To put that in a possibly clearer way: given someone’s intuitions in studies 3 and 4, you’d have about a 55% chance of guessing their gender correctly. That’s better than chance, but not by a lot.

    Contra Dan: yes, the number of subjects is relatively low. But that actually hurts the point rather than helps it: all things being equal, a statistically significant difference is more meaningful with lower Ns than with higher ones. Conversely: it’s not surprising at all that there’s *some* difference between men and women on their intuitions; you’d expect as much given the obvious and massive differences in socialization. The fact that B/S found it w/ N=100, rather than N=1000, helps rather than hurts them. (You say that “it’s actually quite easy to get statistically significant results from the sample that amount to statistical noise in the population.” But that’s precisely what significance testing is designed to rule out! And not just to rule out, but to quantify precisely the chance that you’d see this kind of data if there were no real effect.)

    Without the raw data, I can’t say anything about the studies that used Likert scales. Eyeballing it, I’d be surprised if the effect size was terribly large for most of them.

    I’ve geeked out, and I apologize if this has gone on too long. A short conclusion: if there is an effect of gender on intuitions, it’s likely much smaller than the data appears to show at first glance. I have a lot of respect for Stich. People who I respect a lot less will get ahold of this data, though, and I worry about the effect that it will have.
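
For readers who want to check the arithmetic in comment 11, here is a minimal Python sketch of the phi calculation it describes. The counts are hypothetical, chosen only for illustration; they are not taken from the Buckwalter and Stich paper, and the function names are illustrative too. It assumes each study is coded as a 2x2 table (gender by dichotomous response), as the comment does.

```python
import math


def phi_coefficient(a, b, c, d):
    """Phi for a 2x2 table [[a, b], [c, d]] (rows: gender, columns: response)."""
    numerator = a * d - b * c
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denominator == 0:
        raise ValueError("phi is undefined when a row or column total is zero")
    return numerator / denominator


def chi_square_from_phi(phi, n):
    """Pearson chi-square (no continuity correction) implied by phi and sample size n."""
    return n * phi ** 2


# Hypothetical counts, NOT data from the paper:
# women: 30 answer "yes", 20 answer "no"; men: 20 answer "yes", 30 answer "no".
table = (30, 20, 20, 30)
n = sum(table)

phi = phi_coefficient(*table)
variance_explained = phi ** 2           # the "percent of variance" figure
chi2_small = chi_square_from_phi(phi, n)
chi2_large = chi_square_from_phi(phi, 10 * n)  # same phi, ten times the sample

print(f"phi = {phi:.2f}, phi^2 = {variance_explained:.2f}")
print(f"chi-square at N={n}: {chi2_small:.1f}; at N={10 * n}: {chi2_large:.1f}")
```

Because the uncorrected chi-square for a 2x2 table equals N times phi-squared, the same phi yields a much larger test statistic at a larger N. That is the sense in which reaching significance with a small N requires a larger effect than reaching it with a large N.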

  12. Shen-yi: thanks for the link; I hadn’t seen it yet. I’m still suspicious; I think that Cohen’s d is going to overestimate the effect in many of these cases, sometimes enough to make an actual small effect look larger than it is.

  13. Colin –

    You’re clearly more knowledgeable about statistics than I am. (I have a master’s in math, but studied topology and logic; in stats I’m almost entirely self-taught, except for a little help from my economist girlfriend.) I’ll defer to your judgment about the sizes of these samples for the time being (until I have time to consult with aforementioned girlfriend).
