Excellent brief article.

Student evaluations are a poor indicator of professor performance. The good news is that college students often reward instructors who teach well. The bad news is that students often conflate good instruction with pleasant ambience and low expectations. As a result, they also reward instructors who grade easily, require little work, are glib and chatty, wear nice clothes, and are physically attractive. It’s generally impossible to separate all these factors in an evaluation. Plus, students will penalize demanding professors or professors who have given them a bad grade, regardless of the quality of instruction that a professor provides. In the end, deans and tenure committees are using bad data to evaluate professor performance, while professors feel pressure to grade more leniently and reduce workloads to receive higher evaluations.

  1. first paragraph: so. very. yes.
    second paragraph: that’s an interesting thought.
    third paragraph: no. just no. [in principle, I get the thought behind outcomes assessment; in practice, it’s a morass of bs. If someone has figured out a way to make it not bs, with regard to philosophy classes, I’d be *very* interested]

  2. The third paragraph is basically the sort of baloney you get in the education literature. I’m a big advocate for the sort of evaluation this author calls for in paragraph two. But the stuff in the third paragraph assumes instructors are all teaching the same sort of students with roughly the same starting point and potential. That sort of evaluation will never be useful for evaluating instructor performance. (And K-12 programs all over the USA are currently abusing this sort of evaluation.)

  3. No, it doesn’t assume that all students are of “roughly the same starting point and potential”; it assumes only that all students within a single institution who are taking the same course are of roughly the same starting point and potential. And that is a good assumption; it is true. My Logic I students are the same as my colleague’s Logic I students. The only difference is that mine wanted to take classes on Tuesdays, not Wednesdays.

    And it is reasonable to think that if all those students go on to take advanced logic, and all his students get As and all mine get Cs, then he’s a better teacher than I am.

  4. There are some possible worlds where such an evaluation system isn’t abused. Unfortunately, none of those possible worlds seems to be actual in the American educational system. Here’s a realistic picture of how this evaluation system is currently used (from some teacher friends who have operated under this sort of regime): A fourth-grade teacher gets a batch of advanced students who are at roughly grade level 3.3 when they enter her class and at grade level 4.2 when they leave it. This teacher, even though she prepared her students past where they were supposed to be, gets evaluated poorly because she didn’t advance them “an entire grade level.”

    That’s the sort of nonsense this type of evaluation system produces.

  5. We use student evaluations for the same reason we use GDP as a measure of economic performance: it is the worst measure of performance except for all the others. It is fine to say that student evals shouldn’t be used in hiring or tenure decisions, but what should they be replaced with? Peer evals? Please. Outcomes? Unworkable.

    So, assuming that being a good teacher is something academic institutions should care about (an assumption many R1s reject, I suppose), what should we use?

    One important thing is to maintain department- or university-wide standards on work and grade expectations. That way you can avoid arms races where professors lower expectations in order to gain higher evals. Students compare their work and grades with those of other people at their own school, and much less so with those of students at other institutions.

