The Buffy Effect

According to some interesting new research, the portrayal of strong female characters may be more important than plot content (including sex, violence, and even sexual violence) when it comes to shaping viewer attitudes to women.

Science Codex reports:

Past research has been inconsistent regarding the effects of sexually violent media on viewers’ hostile attitudes toward women. Much of the previous literature has conflated possible variables such as sexually violent content with depictions of women as subservient.

The submissive characters often reflect a negative gender bias that women and men find distasteful. This outweighed the sexual violence itself, giving credence to what Ferguson calls the “Buffy Effect”—named after the popular television show Buffy the Vampire Slayer and its strong lead female character.

“Although sexual and violent content tends to get a lot of attention, I was surprised by how little impact such content had on attitudes toward women. Instead it seems to be portrayals of women themselves, positive or negative, that have the most impact, irrespective of objectionable content. In focusing so much on violence and sex, we may have been focusing on the wrong things,” Ferguson said.

“While it is commonly assumed that viewing sexually violent TV involving women causes men to think negatively of women, the results of this carefully designed study demonstrate that they do so only when women are portrayed as weak or submissive,” added Journal of Communication editor and University of Washington Professor Malcolm Parks. “Positive depictions of women challenge negative stereotypes even when the content includes sexuality and violence. In this way Ferguson reminds us that viewers often process popular media portrayals in more subtle ways than critics of all political stripes give them credit for.”

Proving once again what we all (or at least all of us of a certain age) already knew: Buffy is so much more awesome than Twilight ever could be.

[Author’s note: It’s possible that the entire point of this post was just an excuse to put up Jo Chen’s Buffy Illustration. I’m okay with that.]

23 thoughts on “The Buffy Effect”

  1. They showed a convenience sample of 150 undergrads one of six TV episodes and had them fill out questionnaires afterwards. Since the students didn’t fill out questionnaires *beforehand*, the study didn’t actually measure any changes in depression, anxiety, or sexist attitudes. So this is a lousy basis for an inductive generalization, and an even worse basis for making causal claims. I’d suggest that relying on methodologically bad research to support our views is even worse than relying on no research whatsoever.

    Also, whoever wrote the tables and charts has some weird ideas about how to present data: the order of the categories is inconsistent, there are no error bars, etc.

  2. Dan, I’m not that familiar with research on this topic, but I think the study has to be taken in context. Previous work seemed to suggest that after watching depictions of violence, particularly sexual violence, viewers would report anxiety, unease, and in many cases somewhat sexist attitudes. This study suggests that you can get variation in these results by varying how women are portrayed – even if the level of violence remains roughly the same.

    That seems fairly interesting. The methodology of the whole enterprise may well be questionable, sure. But this particular study’s methodology doesn’t seem aberrant when compared to similar ones.

    Anyway, I mostly put this up because I wanted to talk about Buffy.

  3. Hi Dan,

    I’m not so down on the methodology.

    Taking your points in turn:

    1) Convenience sample…well, this is typical, yes? I wouldn’t want to use this as a strong basis for generalizing to, say, older people (due to possible generation effects). But it hardly seems meaningless.

    2) Only questionnaires afterwards, so no baseline. But this is a between-subjects, not within-subjects, study. Given that the students were randomly assigned a viewing, there’s no particular reason to think that any significant differences in other qualities (e.g., depression levels) were due to preexisting differences. That’s the point. Note the testing for differences in depression due to gender (again, between subjects).

    Thus, for example,

    Results for anxiety symptoms likewise showed no main effects either for gender, F(1, 143) = 2.94, or show type, F(1, 143) = 0.10. However, results for anxiety showed an interaction between gender and show type, F(1, 143) = 3.38, p ≤ .05, r = .15, 95% CI = −.01 to .31. As shown in Figure 1, women who viewed the sexually violent show with negative portrayals of women showed higher anxiety (M = 12.79, SD = 15.44) in comparison to the group which portrayed positive female characters even when the material included sexual violence (M = 6.23, SD = 5.53), with the mean for the neutral shows between those two conditions (M = 9.03, SD = 10.05). Males showed an inverse effect, least anxiety with negative female depictions (M = 4.23, SD = 8.87) and most anxiety with positive female depictions (M = 8.96, SD = 8.77), with neutral shows once again in the middle (M = 6.49, SD = 6.56).
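    As a quick check on the quoted numbers: for an effect with one numerator degree of freedom, the reported effect size r follows directly from the F statistic and its error degrees of freedom via the standard conversion

    $$ r = \sqrt{\frac{F}{F + df_{\mathrm{error}}}} = \sqrt{\frac{3.38}{3.38 + 143}} \approx .15, $$

    which matches the reported r = .15.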

    So, there was no difference in anxiety levels between men and women as a whole, but women who watched the sexually violent + crappy female characters show exhibited more anxiety than the women who watched the other show.

    There could be an unnoted confounding factor, or it could just have been chance, but the study is, in fact, designed to detect the effects of watching the show, and it’s a reasonable design.

    (Within-subjects studies have their own issues and benefits, of course.)
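    To make the design concrete: a 2 (gender) × 3 (show type) between-subjects analysis like the one quoted above is just a two-way ANOVA on independent groups. Here’s a minimal sketch of that analysis type in Python (the data file and column names are hypothetical illustrations, not Ferguson’s actual code):

    ```python
    # Sketch of a 2 (gender) x 3 (show type) between-subjects ANOVA,
    # the analysis type reported in the quoted passage. The data file
    # and column names ("anxiety", "gender", "show_type") are hypothetical.
    import pandas as pd
    from statsmodels.formula.api import ols
    from statsmodels.stats.anova import anova_lm

    df = pd.read_csv("responses.csv")  # one row per randomly assigned participant

    # Fit a model with both main effects and their interaction.
    model = ols("anxiety ~ C(gender) * C(show_type)", data=df).fit()

    # The ANOVA table gives F tests for each main effect and for the
    # gender x show_type interaction (the effect discussed above).
    print(anova_lm(model, typ=2))
    ```

    The point being: with random assignment, group differences detected this way are attributable to the shows, which is what licenses the causal reading without a pretest.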

    So, this study is perfectly fine for making inductive generalizations (though I’d be cautious about generalizing to a markedly different population) and is perfectly OK for supporting causal claims. One nice feature is that they didn’t validate all their hypotheses, e.g., the effect on depression.

    Yes, we shouldn’t use methodologically suspect or crappy work, but we should be clear on what constitutes methodologically problematic work.

  4. Plus, the author was entirely judicious in their interpretation of the results:

    As with all studies, this one has its limitations. Although care was taken to match the television shows as closely as possible, matching media in experimental studies is well known to be difficult. The current study suggests that portrayals of women may have confounded previous research on sexually violent content, yet future research should be alert for other potential confounding variables. Media effects seen in the current study were small in effect size and should not be overinterpreted. Furthermore, a pre/post design was not employed, which could have tracked emotional changes over time. The current study employed a Hispanic majority sample. Given that it is possible cultural factors such as machismo (see Cowan, 2000; Fischer, 1987) may distinguish male Hispanic attitudes about positive female portrayals in media from the attitudes of other cultural groups, it is inadvisable to generalize results from one ethnic group to others. It is worth noting that the main characters in all shows were Caucasian non-Hispanic, although locating shows with Hispanic lead characters that otherwise fit the study criteria would arguably have been difficult.

    So, a solid piece of work, but just one study.

  5. I meant to include a link to the actual study earlier, but forgot to do so. Sorry about that.

    magicalersatz – I agree that it’s an interesting topic, and that the methodology of this particular study is similar to the methodologies of other studies on the topic. My criticisms — especially of small samples and of convenience samples — also apply to these other studies. For problems with convenience samples made up of undergrads, see Henrich, Heine, and Norenzayan, “The weirdest people in the world?,” Behavioral and Brain Sciences, 33 no. 2-3 (June 2010), pp 61-83. Another related general problem in many psychological studies is “flexible” data collection and analysis; see Simmons, Nelson, and Simonsohn, “False-positive psychology”, Psychological Science, 22 no. 11 (Nov 2011), pp 1359-66.

    Bijan Parsia –
    On convenience sampling: For instance, the researcher notes that almost all of his subjects are Hispanic (he’s at Texas A&M International), but he doesn’t do anything at all to control for racial effects. For all we know, all of the non-Hispanic students ended up watching the same episode together, and a considerable chunk of the variation would be explained by race-linked differences in sexist attitudes. Notably, he feels free to speculate about race-linked differences in the conclusion: “Particularly among Latino men, for whom machismo often remains an influential cultural phenomenon, the depiction of strong females may threaten traditional gender roles.”

    On the lack of a baseline: First, random subsamples will only tend to be representative of the population when you’re working with sufficiently large numbers. He had about 25 people per subsample. (Cf. figure 2 in the paper by Simmons, Nelson, and Simonsohn.) You’re right that there’s no particular reason to think there were any preexisting significant differences between the subsamples. Without establishing the baseline, there’s also no particular reason to think there weren’t any preexisting significant differences. You’re simply wrong when you say that “the study is, in fact, designed to detect the effects of watching the show.” For every factor other than gender, it relies on randomization over a small sample taken from an unrepresentative convenient sampling frame.

    Note that, in the conclusion, he suggests future research should use a “pre/post design.” If this is a good idea for future research, why wasn’t it a good idea for this research?

    On interpreting the results: I’ll grant that the author included some (in my view, basically pro forma) gestures in the direction of replicability and looking for confounders. First, even he identifies one potentially major confounder, but he did nothing to control for it here. Second, scientists rarely bother with straight-up replications, because funders don’t like to fund them and journals don’t like to publish them. Third, all of the media coverage of this piece uses the same, highly problematic language found in the Science Codex piece linked in the OP: “strong female portrayals eliminate negative effects of violent media.” I don’t want to get into an argument over whether the researcher bears any responsibility for crappy science journalism, but the crappy science journalism is certainly also a problem here, and I do have a problem with academics promulgating it.

  6. Hi Dan,

    First, can I ask if you’ve designed and executed an experiment on humans? I see you do philosophy of science, but I’m curious as to your practice. (Obviously, if you haven’t, that’s nothing against your argument, but it would help me if you clarified this point, if only to keep me from pointless speculation :)) To be upfront, your comments seem to me to be similar to reviewer comments that strike me as uninformed; picking on convenience sampling and complaining about sample size without going into what aspect of, e.g., the statistical tests is affected by them are things I treat, rightly or wrongly, as tells. (For the record, I have. I recently did one using MSc students. My sample was smaller. I think it was fine and useful :))

    On convenience sampling: For instance, the researcher notes that almost all of his subjects are Hispanic (he’s at Texas A&M International), but he doesn’t do anything at all to control for racial effects. For all we know, all of the non-Hispanic students ended up watching the same episode together, and a considerable chunk of the variation would be explained by race-linked differences in sexist attitudes.

    This isn’t about convenience sampling, is it? The assignment is random, not convenience. That is, if we set the population to be the 150 students, he doesn’t convenience sample within that; he samples at random. Which, I think, is how it should be!

    Your “for all we know” isn’t really very telling, is it? For all that we know he made up the whole thing. The author was aware enough to speculate on racial issues, so it seems unlikely that they would have failed to notice such an unusual configuration. A quick glance at the demographics shows that “90.5% of students are minorities, including Hispanic, African American and Asian students.” So, it’s really unlikely that there’d be an all-white section (90% of 150 = 135 minority students, leaving only about 15 white students, so an all-white group of 25 is impossible if the sample reflects the school demographics). I’m sure you can work out the probabilities yourself of drawing all the white marbles into the same sixth of the draws, even if the bag were half brown, half white. This isn’t a real concern. But hey! Send the author an email and ask.
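    For what it’s worth, the marble arithmetic is easy to make exact. A short Python check, assuming 15 non-Hispanic students out of 150 randomly split into six groups of 25 (my illustrative numbers extrapolated from the 90% figure, not figures from the paper):

    ```python
    # Exact probability that all the "white marbles" land in one group
    # under random assignment. The counts (150 students, 15 non-Hispanic,
    # six groups of 25) are illustrative assumptions, not the paper's data.
    from math import comb

    total, whites, group_size, n_groups = 150, 15, 25, 6

    # A given group of 25 contains all 15 whites iff its other 10 members
    # come from the 135 non-whites. The six events are disjoint (all 15
    # whites can fit in only one group at a time), so multiply by 6.
    p = n_groups * comb(total - whites, group_size - whites) / comb(total, group_size)
    print(p)  # on the order of 1e-13
    ```

    That is, the configuration Dan worries about is vanishingly unlikely under random assignment.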

    This is the sort of thing which makes me feel that your objections are somewhat rote.

    (I agree that it would have been nice for the author to be precise about the racial composition instead of using qualitative terms like “primarily” and “majority”.)

    Notably, he feels free to speculate about race-linked differences in the conclusion: “Particularly among Latino men, for whom machismo often remains an influential cultural phenomenon, the depiction of strong females may threaten traditional gender roles.”

    I wish you would use a more neutral description. “Feel free to speculate” goes in with your earlier disparaging words about the methodology.

    So, what’s wrong with this? They have observed a phenomenon and are offering possible explanations. The full quote:

    More discouragingly, at least some males appear to respond negatively to strong female characters. The effect was small (equivalent in effect size to approximately r = .25), but of potential practical significance. It is possible that some males find the presentation of strong females to be threatening to traditional gender-role stereotypes, although this is speculative and warranting of further research. Particularly among Latino men, for whom machismo often remains an influential cultural phenomenon, the depiction of strong females may threaten traditional gender roles. This is not to say other ethnicities are necessarily immune, of course, but the culture of machismo, in which females are seen as passive, may have been threatened by media portrayals of strong female characters, particularly those who are resisting violence by men.

    And a bit later:

    The current study employed a Hispanic majority sample. Given that it is possible cultural factors such as machismo (see Cowan, 2000; Fischer, 1987) may distinguish male Hispanic attitudes about positive female portrayals in media from the attitudes of other cultural groups, it is inadvisable to generalize results from one ethnic group to others.

    It all seems fine, responsible, and appropriate.

    On the lack of a baseline: First, random subsamples will only tend to be representative of the population when you’re working with sufficiently large numbers. He had about 25 people per subsample.

    Since the author does not generalize to the population even of Hispanic college students, I think it’s fine. Again, I think it helps to think of the overall sample as a population under study, at least at first.

    (Cf. figure 2 in the paper by Simmons, Nelson, and Simonsohn.)

    It doesn’t seem relevant. BTW, I skimmed through that paper and it seems that Ferguson’s would score pretty well. If you look at the requirements for authors, 1 is met, 2 is met (25 per cell!), I’ve no reason to doubt 3 and 4, etc.

    You’re right that there’s no particular reason to think there were any preexisting significant differences between the subsamples. Without establishing the baseline, there’s also no particular reason to think there weren’t any preexisting significant differences.

    You are still confusing within- and between-subjects designs, I think. Also, that’s the point of random subsampling and why overmanipulating it is a mistake.

    You’re simply wrong when you say that “the study is, in fact, designed to detect the effects of watching the show.”

    I really want to be snarky here.

    For every factor other than gender, it relies on randomization over a small sample taken from an unrepresentative convenient sampling frame.

    I’m having trouble understanding this, though, again, it sounds generic. Randomization over the whole initial sample generates the subsamples. The subsamples are sufficiently large that fairly standard tests can yield significance given the effect sizes. Not too surprising given the phenomena under test. Ferguson doesn’t make any unreasonable generalizations, so I’m not particularly bothered by the sampling frame. I still don’t know why you are, unless you are starting from the position that convenience samples, esp. of college students, are inherently illegitimate. I don’t know what to say to that. It’s clearly a reasonable activity although, as with all empirical methods, you have to be very careful about its limitations.
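    If the residual worry is specifically whether ~25 per cell is big enough, that’s a power question, and it can be estimated rather than asserted. A rough sketch, treating the design as a six-cell one-way ANOVA (a simplification of the actual 2 × 3 design) and borrowing the r ≈ .25 the paper mentions for the male-response effect:

    ```python
    # Rough power estimate for a six-cell between-subjects design with
    # N = 150 total. Treating it as a one-way ANOVA is a simplification
    # of the paper's 2 x 3 design; the effect size is the r ~= .25 the
    # paper mentions, converted to Cohen's f.
    from statsmodels.stats.power import FTestAnovaPower

    r = 0.25
    f = (r**2 / (1 - r**2)) ** 0.5  # convert r to Cohen's f

    power = FTestAnovaPower().solve_power(
        effect_size=f, nobs=150, alpha=0.05, k_groups=6
    )
    print(round(power, 2))
    ```

    Not a substitute for the paper’s own analysis, but it’s the right kind of question to ask before declaring a sample too small.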

    Note that, in the conclusion, he suggests future research should use a “pre/post design.” If this is a good idea for future research, why wasn’t it a good idea for this research?

    I can think of a ton of reasons off the top of my head: for example, pre/post would have involved administering, e.g., the anxiety assessment before and after the viewing, which would have lengthened an already lengthy session (1 hr!) and potentially introduced a number of effects, including measurement consciousness, activating anxiety, etc. I think this is a reasonable first study design which provides reasonable preliminary evidence that strong women characters might have this protective effect. I.e., that it’s worth studying more. It’s definitely worth trying to control for this effect in similar studies.

    I’ll grant that the author included some (in my view, basically pro forma) gestures in the direction of replicability and looking for confounders.

    Eh. The second-to-last paragraph is pro forma, though noting the fact that they were targeting clinically relevant stuff was a useful reminder. The prior paragraph is fine. What, there can be no pro forma stuff in a study? The inclusion of boilerplate is evidence for the poorness of the study?

    BTW, which confounder? I didn’t see any. Do you mean that matching may have been messed up? Or do you mean race?

    Second, scientists rarely bother with straight-up replications, because funders don’t like to fund them and journals don’t like to publish them.

    There’s also danger in overreplication…statistics will have its due. I don’t think an exact replication would be interesting anyway.

    With respect to your concerns about the coverage, fair enough. But they have nothing to do with the quality of the study and the methodology employed. So they are sort of irrelevant, except maybe helping to explain your animosity toward the article (esp. your linking the scientist and the journalism). (But, I’ve been there!)

  7. Actually, the thing I’d like to know is whether any of the participants had seen any of the shows before (either the series or the particular episodes).

  8. I don’t have further time for a detailed back-and-forth, so I’ll just say this and bow out of the discussion:

    You’re right that most of my criticisms are “generic,” if by that you mean “not particular to this study.” Bad methodology — including an over-reliance on convenient, unrepresentative samples, small samples, and oversimplified experimental designs that don’t control for obvious confounders — is a widespread problem in many fields of behavioral science.

    Oh, that’s too bad. Perhaps later, offline?

    I’m really sad that you didn’t take a minute to address your experience and, esp., the okayness of a non-pre/post design.

    I agree that all the generic criticisms are widespread, but that doesn’t mean they apply in this case. Overreliance on generic criticism (“But Hume showed induction is false”) is *also* a problem.

    In this case, the convenience sample was what it was and was fine (it raises threats to external validity, not internal validity). The subsamples were representative and adequately sized for the effects (and — good sign — not all the hypotheses were validated!). All the obvious confounders are plausibly accounted for (race wasn’t a confounder, afaict).

    It’s a study with limitations. But Ferguson looked at the world and saw something interesting. Worth publishing and considering and further work.

  10. Sorry sorry, to follow up again. But my last sentence is exactly the point. Some evidence is better than no evidence a good deal of the time. (Too many studies, as I pointed out, can spoil evidence, but too few means lack of evidence.)

    In this case, the results are interesting and potentially of great clinical significance. If having strong female characters can overcome the sexual violence aspects, then that’s great news. It seems MUCH easier to incorporate various kinds of female strength than to reduce the amount of sexualized violence in, e.g., films and TV.

  11. I like it when the scientific method is put to use to analyze cultural artifacts–especially when the findings are as interesting and significant as this. Thanks for sharing!

  12. So, I wrote Ferguson, and he gave a (helpful, IMHO) reply (of course, the reply confirms what I thought :)). I asked him specifically 1) about race as a confounder and 2) about prior viewing. His reply:

    It’s good to see people are discussing the piece and critically evaluating its methodology. I’m the first to say that no study is perfect and no study should be shielded from criticism.

    As to the first question, as our local city (Laredo, TX) is 92% Hispanic, samples drawn from it, whether convenience or otherwise, invariably have the same demographic composition. However this is not a confound as Dan suggests. It would only be a confound if the ethnic composition differed between groups. Given random assignment, this is not the case. I think what Dan means to say is that we have to be careful generalizing to other ethnic groups, which is naturally the case and the case for all studies (most of which are non-Hispanic white majority). But it does not change the validity of the results of the paper.

    (I think it’s a bit generous to Dan here.)

    We did ask the participants whether they had seen the shows or series beforehand. Typically, although some had, the majority had not for any one particular show. However, once again, the issue is equality between the groups and whether the random assignment was successful. Here, too, the shows were equivalent in likelihood of having been seen before. Further, there is nothing in the typical theories of media violence to suggest that having viewed a show prior (violent or not) is a mitigating factor in its alleged effects. Now the theory might be wrong, of course, and I tend to think media violence theories are wrong on many levels, but at least in terms of following the bouncing ball, prior viewership “shouldn’t” be a mitigating factor.

    Also, I saw Dan made reference to baseline measurements. He does raise a good point, certainly. Perhaps ALL shows reduce negative attitudes toward women, for instance, just some shows better than others. It’s a reasonable counterargument and there CAN be value in having pre-assessments.

    This is, again, fairly generous to Dan, but it raises an interesting point about what might be missing. Not that there was a differential effect (that was supported), but that there might have been a uniform beneficial effect. But now the key:

    HOWEVER, there are also risks, and in this case I felt the risks outweighed the benefits. Those risks are in creating demand characteristics. By asking participants how they feel about women or how anxious they are BEFORE seeing the show, now they are primed to pay attention to those issues before the show has even started. That can distort study results, causing spuriously high results. Don’t get me wrong, I think there could be value in redoing the study with pretesting to see if results are similar, but that is what scientific replication is all about. No single study can check all the boxes. We have to try things again and again and see if results are consistent and, if they are not, try to understand why.

    Which is pretty much what I said, and pretty much kills the “Well if pre/post is good for future work, why not now?!” line.

    As for convenience samples, well of course Dan has a good point. Of course if more concerned citizens would come into the lab and volunteer themselves for science *for free* then we wouldn’t need convenience samples. :) Once again, it goes to limits to generalizability, of which most psychological researchers are well aware.

    I will note that Dan seemed to conflate internal and external validity issues. The convenience sample threatens external validity, not internal validity.

    Anyway, fun!

  13. […] The Buffy Effect A study seems to show that strong women in TV shows, like, say, Buffy, may be more important than plot content (including sex, violence, and even sexual violence) when it comes to shaping viewer attitudes to women. Um, duh? Still, nice to see data making the case. […]
