They really shouldn’t be used for hiring and promotion. They’re biased in many ways, including gender, race, and difficulty of topic. And they often contain abuse.
Though note this reply, in Slate, to one of the studies used as a primary source of evidence. The gender bias, on its own, looks pretty small. Not zero, but small. The difficulty bias is really huge, and that should be accounted for. And the looks bias…
https://slate.com/technology/2018/04/hotness-affects-student-evaluations-more-than-gender.html
So how do we get them to stop making us do these evaluations? They don’t do any good, they elicit implicit bias, and they waste time. And they’re damned unpleasant. With all the solicitous concern about students’ being made to feel uncomfortable, why isn’t there some concern about faculty getting trashed, ridiculed, and insulted in evaluations? I want a safe space for me!
“They don’t do any good…”
I have had some confused and some deeply unpleasant student evaluations. (Recently, I’ve had a lot of people complaining that I talk too slowly, and that they don’t like my [American, I guess – I’m teaching in Australia now] voice and/or accent.) But I have also almost always learned something from my evaluations, especially when there is written feedback, and I think I’ve improved my teaching because of them. This is sometimes so even when the complaints are not 100% constructive or useful. We don’t get a lot of feedback on our teaching, so I try to make use of what I get, and I think I’ve been able to do so. Sifting the few usable points out of a bunch of less-than-good stuff can be hard, but is, I think, worthwhile.
(Of course, it’s possible that other people either don’t need the help, or else get even less useful input than I do, and so this may not apply to them.)
I certainly agree with Brian that the gender × teaching evaluations issue is complicated, but I am not much moved by the Slate article he cites. For one thing, the “study” conducted by the authors was based on “Rate My Professor” scores, not on teaching evaluations. (The authors say that RMP scores are sufficiently strongly correlated with ratings on teaching evaluations to make it legitimate to use the former as proxies for the latter, but I’m skeptical.) For another thing, the study that the authors are criticizing did take into account the complications the Slate authors mention, things like confounds between gender and rank (evals are typically lower for intro-level classes, so if women are clustered in the lower academic ranks they may be more likely to be teaching the intro courses and to be evaluated poorly for that reason). The Slate authors also offer justifications for the statistical methods they chose, but I don’t know enough about statistics to say whether they properly addressed the problem of being underpowered.
There are several other things about standardized teaching evaluations, besides the possibility of gender bias, that concern me, partly because of my recent experience on a college-level personnel committee.

One: faculty in the humanities often receive no training in the interpretation of the statistics they see. I am willing to bet a large amount of money that there is a serious restriction-of-range problem at my institution, and I’d bet there’s one almost everywhere. I’d say that the scores on the summary questions (“how do you rate this course/professor overall?”) fall mostly within the highest 20% of the scale; that is, most students never give the lower scores to anyone. In that case, the *differences* within that 20% range are going to be assigned an inordinate amount of significance.

Two: the response rate has dropped precipitously since our institution went to online reporting, so faculty often get a banner across their reports that says “Note: Results may not be statistically significant”. So why are statistically insignificant results even being reported? The answer I got when I asked an administrator that question was “Because a little information is better than no information at all.”

Three: I have only impressions about this, but I wonder whether it has been studied: do instructors with non-Anglo names, or who speak with a non-American, non-British accent, get lower-than-average ratings?
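To make the restriction-of-range point concrete, here is a minimal, purely hypothetical sketch in Python. The class size, response rate, and rating distribution are made-up assumptions, not figures from any actual institution; the point is only that when nearly all scores land at the top of the scale and few students respond, two instructors of identical quality can routinely end up a couple of tenths of a point apart.

```python
# Hypothetical sketch of the restriction-of-range worry; the numbers below are
# illustrative assumptions, not data from any real evaluation system.
import random
from statistics import mean

random.seed(1)

def simulate_ratings(n_students):
    # Assumed response pattern: ~80% give a 5, ~15% a 4, ~5% a 3 --
    # i.e. almost everyone rates near the top, whoever is teaching.
    return random.choices([5, 4, 3], weights=[0.80, 0.15, 0.05], k=n_students)

# Compare many pairs of instructors who are identical by construction,
# each rated by a small number of respondents (say, 15 of 40 enrolled).
gaps = []
for _ in range(10_000):
    gap = abs(mean(simulate_ratings(15)) - mean(simulate_ratings(15)))
    gaps.append(gap)

print(f"average gap between two identical instructors: {mean(gaps):.2f}")
print(f"share of pairs differing by 0.2+ points: {sum(g >= 0.2 for g in gaps) / len(gaps):.0%}")
```

On these made-up numbers, a sizeable fraction of identical pairs differ by 0.2 points or more from noise alone, which is exactly the kind of gap a committee reading compressed scores might be tempted to treat as meaningful.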
Apologies for the belated comment, but I’d like to publicise that the Australasian Association of Philosophy has recently made a public statement on student evaluations of teaching effectiveness (SET). Recommendations are as follows:
1. First and foremost, develop a more multidimensional approach to evaluating pedagogy. For instance, staff might develop a ‘teaching portfolio’ which includes syllabi and teaching materials; peer teaching observations; solicited letters from a significant number of students; and a personal narrative of pedagogical practice and development.
2. Consciousness-raising amongst staff who draw on SET in employment and promotion decisions, about the limitations of these instruments.
3. Where feasible, consciousness-raising amongst students about their own potential biases (possibly directing them to some of the empirical research cited below).
4. In cases where, for whatever reason, asking students to ‘score’ staff on their teaching is deemed unavoidable, students might be asked to state in a line or two their reason for each ‘mark’ – both to encourage greater thoughtfulness in the students concerning their responses, and to provide further information for evaluating the staff member concerned.
The full statement is here: https://aap.org.au/about/governance/genderstatement.