Evaluating teachers

While debate rages over value-added analysis of Los Angeles’ teachers, NPR looks at how value-added data is used in North Carolina’s Winston-Salem/Forsyth County School District. The district began using the data three years ago, notes host Robert Siegel. The information is not made public, explains Superintendent Donald Martin:

Dr. MARTIN: . . . if you’re red, your students are performing two standard errors below your — sort of comparable counterparts. If you’re yellow, you’re right in the average performance. And if you’re green, you’re two standard errors above.

And if a teacher has one red, you know, their first year, then we literally just have a – it’s like a growth conference with them. They have a personal, you know, individual plan. We talk to them about what are they going to do differently next year.

Then in the second year, if there’s two reds in a row, the teacher has consecutive reds, then we have a trigger for what we call a plan of assistance. And that plan of assistance may involve going to training. It may involve sending in some central office folks to work with that person and to really work on, you know, a very formal plan that’s now, you know – could trigger dismissal at the end of the year if it is unsuccessful.
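The color coding and escalation Martin describes can be sketched as a pair of simple rules. This is a minimal Python illustration, not the district's actual system. It assumes each teacher has a value-added estimate expressed relative to comparable counterparts (0 means average), along with that estimate's standard error. It also reads "two standard errors" as an inclusive cutoff at |estimate / standard error| >= 2. The names `flag_color` and `next_step` are hypothetical.

```python
from typing import List


def flag_color(estimate: float, std_error: float, cutoff: float = 2.0) -> str:
    """Map a value-added estimate to red/yellow/green.

    Assumption: `estimate` is measured relative to comparable counterparts
    (0 = average) and `std_error` is its standard error, so the ratio says
    how many standard errors the teacher sits from average.
    """
    if std_error <= 0:
        raise ValueError("std_error must be positive")
    z = estimate / std_error
    if z <= -cutoff:
        return "red"
    if z >= cutoff:
        return "green"
    return "yellow"


def next_step(colors_by_year: List[str]) -> str:
    """Hypothetical paraphrase of the escalation rule in the interview.

    Two consecutive reds trigger a plan of assistance; a single most
    recent red triggers a growth conference and individual plan.
    """
    if colors_by_year[-2:] == ["red", "red"]:
        return "plan of assistance"
    if colors_by_year[-1:] == ["red"]:
        return "growth conference"
    return "no action"


if __name__ == "__main__":
    # Illustrative numbers only: three years of estimates for one teacher.
    history = [flag_color(-0.05, 0.10), flag_color(-0.30, 0.10), flag_color(-0.25, 0.10)]
    print(history, next_step(history))
```

Under this reading, yellow doesn't mean exactly average. It means the estimate can't be distinguished from average at the chosen cutoff. As a result, a teacher with few students, and therefore a large standard error, is more likely to stay yellow.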

Principals are rarely surprised by which teachers are red or green, Martin says. But without data, teacher evaluations suffer from “a Lake Wobegon issue. Everybody is above average.” Administrators are to blame for failing to be honest about teacher effectiveness.

Value-added data is available only for a fraction of teachers, writes Sara Mead on Policy Notebook. She’s concerned about the validity of classroom observations.

There is currently no value-added data for kindergarten and early elementary teachers, teachers in non-core subjects, or high school teachers in most places. My brother-in-law, who teaches middle school band and drama, and sister, who teaches high school composition and literature, do not have value-added data.

When available, value-added data should be used to “inform teacher evaluations,” Mead writes, but the larger issue is developing ways to evaluate all teachers. For example, the Classroom Assessment Scoring System (CLASS) measures the extent to which teachers teach in ways linked to improved student outcomes. Still, Mead is concerned “that the observational rubrics many districts and states will put into place under their proposed evaluation systems have not yet been validated.”

While an Economic Policy Institute report urges caution in relying on value-added data, others argue that the alternatives, such as classroom observations, are much less reliable than value-added measures, notes Teacher Beat. “I think people are right to point out the potential flaws of [value-added modeling], but it should be compared against what we have, not some nirvana that doesn’t exist,” said Daniel Goldhaber, a professor at the University of Washington in Bothell.

In response to teacher feedback, Houston Superintendent Terry Grier has told principals to collaborate with teachers on an individual plan setting out each teacher’s goals for the year and how the principal will help the teacher meet them. The Houston Federation of Teachers sees this as a nefarious plot to make teachers look bad, writes Rick Hess. HFT is telling teachers not to admit to any performance weaknesses or to allow test scores to be used to judge their success. There’s a lot of fear out there.

Update: Here’s the New York Times’ value-added story.

  1. I want to trust my job to someone who doesn’t know the difference between a standard error and a standard deviation? Lord.

  2. @Lightly Seasoned:

    You’re really right: you shouldn’t trust your job to someone who doesn’t even know the difference between a standard error and a standard deviation.

  3. Like it or not, the good old days of being above demonstrating professional skills are coming to an end. Sorry.

  4. I am assuming Martin actually means “standard deviation,” which would seem to make more sense in this application.

    That said, if he does mean “standard deviation,” about 95% of values in a normally distributed population fall within two standard deviations of the mean. (And averages over large samples tend to be roughly normal.) The remaining 5% are split evenly above and below that range.

    So roughly 2.5% of the teachers will be real “superstars” (green) and 2.5% will be doing a very poor job (red). (Actually, I suspect a lot of teachers will be sad to land in the “yellow”: it seems a lot of people think they do a better job than they actually do, and I’ve met people who thought they were “superstars” at their jobs when they were just sort of adequate.)

    So unless the “sort of comparable counterparts” come from a very different population (say, from a private school with uniformly high test scores), there will not be that many teachers getting “intervention.”

    But yeah, it sounds like he’s misusing the stats terms. (And actually, a lot of stats terms and tests tend to be misused in the education world…)
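The arithmetic in the fourth comment can be checked directly. The following is a minimal Python sketch. Like the comment, it assumes scores are normally distributed and that the red/green cutoffs sit two standard deviations from the mean. It uses the standard normal CDF to compute the share of a population below, within, and above mean ± k standard deviations. The helper names are illustrative, and the district size of 1,000 is hypothetical.

```python
import math
from typing import Tuple


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function, via math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def shares_by_band(k: float = 2.0) -> Tuple[float, float, float]:
    """Return (below, within, above) shares for cutoffs at mean -/+ k SDs.

    Assumes normally distributed scores; "below" would be red and
    "above" green under the standard-deviation reading of the interview.
    """
    below = normal_cdf(-k)
    above = 1.0 - normal_cdf(k)
    within = 1.0 - below - above
    return below, within, above


if __name__ == "__main__":
    below, within, above = shares_by_band(2.0)
    print(f"red: {below:.1%}, yellow: {within:.1%}, green: {above:.1%}")
    # Hypothetical district size, for illustration only.
    teachers = 1000
    print(f"expected red teachers out of {teachers}: {round(teachers * below)}")
```

With k = 2, each tail comes to about 2.3%, or roughly 23 red and 23 green teachers in a hypothetical district of 1,000. That is slightly below the rounded 2.5% figure. The familiar 95% figure corresponds to k = 1.96.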