Yesterday the journal Science published the results of the Open Science Collaboration’s effort to replicate 100 studies published in three top psychology journals (here). The results are arresting: overall, the replication effects were, on average, half the magnitude of the original effects, and only 36% of the replications had statistically significant results. The results were particularly bad for social psychology, where only 14 of 55 replication attempts succeeded (on the basis of significance testing).
The title of today’s coverage on Slate captured what seems to be a widespread reaction: “That Amazeballs Scientific Study You Just Shared on Facebook Is Probably Wrong, Study Says.” But is this really what the study says?
It’s worth reading the actual article in Science, rather than just the headline. For example:
- Almost none of the replications contradicted the original studies. Instead, the effects in many of the replications were significantly weaker than the original effects. The replication efforts therefore don’t tell us that the findings of any particular study that didn’t replicate were false. Rather, they tell us that the evidence for those findings is considerably weaker than we might have thought.
- It appears that the best predictor of replication success for any particular study was the strength of the original findings, rather than the perceived importance of the effect or the expertise/reputation of the original research team. In addition, surprising effects were less reproducible (surprise!), as were effects that resulted from more difficult/complicated experimental scenarios.
- This is not a problem in psychology alone. It has been reported that in cell biology, two recent efforts managed to replicate only 11% and 25% of landmark studies. Moreover, there may be good reasons why social psychology studies are harder to replicate than other studies in psychology. As Simine Vazire points out (here), the phenomena social psychologists study are extremely noisy. She writes, “if we still don’t know for sure, after years of nutrition research, whether coffee is good for you or not, how could we know for sure after one study with 45 college students whether reading about X, thinking about Y, or watching Z is going to improve your social relationships, motivation, or happiness?” That said, the Science study points out other reasons why social psychology studies were especially unlikely to replicate: social psychology journals have been particularly willing to publish under-powered studies with small participant samples and one-shot measurement designs (see the sketch just below for what “under-powered” means in practice).
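To make that last point concrete, here is a minimal sketch of what “under-powered” means in practice. This is my own illustration rather than anything from the Science paper, and the effect sizes and group sizes are assumptions chosen only to be plausible for this literature; it uses the statsmodels library to compute the power of a simple two-group comparison:

```python
# Statistical power of a two-sample t-test: the probability of getting a
# significant result (alpha = 0.05) when the effect is genuinely real.
# Effect sizes (Cohen's d) and group sizes are illustrative assumptions,
# not figures from the Open Science Collaboration paper.
from statsmodels.stats.power import TTestIndPower

power_calc = TTestIndPower()
for d in (0.2, 0.5):              # small and medium true effects
    for n in (20, 45, 200):       # participants per group
        p = power_calc.power(effect_size=d, nobs1=n, alpha=0.05)
        print(f"d = {d}, n = {n} per group: power = {p:.2f}")
```

With a genuinely real medium-sized effect (d = 0.5) and 45 participants per group, a study of this design finds a significant result only about two-thirds of the time, and with a small effect it almost never does. One consequence is that the small-sample studies that do reach significance tend to overestimate the true effect, which fits the pattern of shrinking effect sizes that the replications found.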
There is, of course, something very unsettling about these findings. But in the big picture it seems to me that this article is a testament to science working well. (Or, maybe, like Churchill said of democracy, it is a testament to science being the worst form of inquiry . . . except for all the others.) The fact that one of the most important scientific journals has published this article is itself confidence-inspiring. Vazire quotes Asimov saying that “the point of science is all about becoming less and less wrong.” Or as the Science article puts it:
“After this intensive effort to reproduce a sample of published psychological findings, how many of the effects have we established are true? Zero. And how many of the effects have we established are false? Zero. Is this a limitation of the project design? No. It is the reality of doing science, even if it is not appreciated in daily practice. Humans desire certainty, and science infrequently provides it. As much as we might wish it to be otherwise, a single study almost never provides definitive resolution for or against an effect and its explanation. The original studies examined here offered tentative evidence; the replications we conducted offered additional, confirmatory evidence. In some cases, the replications increase confidence in the reliability of the original results; in other cases, the replications suggest that more investigation is needed to establish the validity of the original findings. Scientific progress is a cumulative process of uncertainty reduction that can only succeed if science itself remains the greatest skeptic of its explanatory claims.”
PS – good coverage from The Atlantic
Michael, I can’t remember if you were at the Sheffield workshop where Brian Nosek discussed the worries in this area. One crucial point he made, as I remember, is that there is little incentive for scientists to check up on each other’s work. If I remember correctly, his chief reasons were that agencies are not anxious to fund the checking-up research, and journals are not wild about publishing articles about why someone else got something right.
Hi Anne. Yes, I was there, and I remember Brian talking about this issue. The incentive structure in publishing is definitely part of the problem. It is one of the things Brian’s Open Science network is trying to change, I think.
I think Dan Kahneman has also been trying to address the problem of the absence of rechecking. I vaguely remember reading that he suggested that there should be something like research consortia that undertake to recheck work. Maybe that’s what the Open Network is part of??
It seems to me that this could all get very expensive. Last time I looked, $500 an hour for the use of a magnet (fMRI) was a good deal, and that leaves out paying subjects, data analysis, and so on.
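To put rough numbers on that, here is a back-of-envelope sketch; only the $500-per-hour figure comes from the comment above, and everything else is an illustrative assumption:

```python
# Back-of-envelope cost of one hypothetical fMRI replication.
# Only the $500/hour scanner rate is from the comment above; sample
# size, scan time, and subject payment are illustrative guesses.
scanner_rate = 500       # dollars per hour of magnet time
n_subjects = 30          # participants in the replication
scan_hours = 1.5         # magnet time per participant
subject_payment = 50     # dollars paid to each participant

total = n_subjects * (scanner_rate * scan_hours + subject_payment)
print(f"Scanner time and subject payments alone: ${total:,.0f}")  # $24,000
```

Even this stripped-down estimate runs into the tens of thousands of dollars, before any researcher time or analysis costs.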
I’m not sure about Kahneman. Brian Nosek won a large grant to fund the Open Science Collaboration. In the article in Science, they talk about the criteria they used for selecting studies to replicate, including avoiding studies with very expensive methods.
Hi Michael. I’m concerned with the notion that this study shows “like Churchill said of democracy, it is a testament to science being the worst form of inquiry . . . except for all the others.” I think philosophy, history, and literature, but also other phenomenologically based (if you will) fields like anthropology, often find themselves in a defensive position in response to psychology and biology, to say nothing of the ‘hard sciences’, over the question of whether we are producing knowledge. It is unfortunate that such efforts to defend might take glee in this study in certain quarters, but I think it is not because they are glad to see some other field taken down a notch in its claims about what it purports to achieve (ok, maybe it’s a little of that), but rather that they think it shows that we are in more of an equal position in relation to the production of knowledge than some fields like to say in relation to others. I think there would be less animosity if there were a willingness to acknowledge the kind of knowledge philosophers produce, and to recognize the limits of the knowledge offered by those claiming to be more scientific. I for one am put off by “more scientific” having come to mean “actual knowledge.” I recognize that the sciences themselves are in a defensive position in relation to those who want to say that all knowledge is basically belief, and so any claim (like intelligent design) is as viable as any other. But I’d like to see ways of resisting that position that don’t dismiss those in the academy committed to the production of (non-scientific) knowledge.
Atrott01 hits the nail on the head. I was going to say precisely that! (Except less eloquently.) I’ll just add this 2013 article from The Economist, which reported a similar effect across the sciences more broadly.
Hey Adriel, I suppose I do associate “more scientific” with “actual knowledge,” but with the sort of caveats that the quotes above get at (i.e., about science rarely delivering certainty, and being a process of diminishing our wrongness). I really don’t see a study like this as bringing science down a notch. My point was that, in the long run, this kind of study is a testament to what makes science special. I’m not sure whether that’s because of the particular methods scientists use, or because of scientists’ self-critical attitudes, or because of the structure of scientific communities, or something else. But I think that if someone who resented the status of science gloated over this study, or studies like it, they would misunderstand its larger significance. In any case, I do think science is special. But it is also very hard to do well, and quite fallible. As for philosophy, I think it is best when done in very close contact with the sciences (both “hard” and “soft”). I think differently of literature and the arts. I, like most people I presume, think they contribute tremendously to our lives, but I don’t think they are in any kind of an “equal position [with science] in relation to the production of knowledge.” That’s not because they pale in comparison; it’s because I don’t think it’s a competition.