How to un-bias classroom observations

Classroom observations, a key part of teacher evaluation systems, are biased against teachers with low-achieving students, concludes a new Brookings study of four school districts.

Teachers whose students arrive with higher achievement levels receive higher classroom observation scores, on average, than teachers whose students arrive at lower achievement levels, and districts have no processes in place to address this bias. Adjusting teacher observation scores based on student demographics is a straightforward fix to this problem. Such an adjustment for the makeup of the class is already factored into teachers’ value-added scores; it should be factored into classroom observation scores as well.
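To make the idea of adjusting for class makeup concrete, here is a minimal sketch under simplifying assumptions. It is not the procedure used in the Brookings report. It fits a one-variable least-squares line relating each teacher’s observation score to a single hypothetical covariate (the class’s average incoming achievement), then removes the part of each score that the line attributes to that covariate. The function name `adjust_scores` and the example numbers are illustrative only; a real adjustment would use several demographic covariates and far more teachers.

```python
from statistics import mean

def adjust_scores(obs_scores, class_prior):
    """Hypothetical helper: strip the linear association between observation
    scores and class-average incoming achievement (one covariate only)."""
    x_bar = mean(class_prior)
    y_bar = mean(obs_scores)
    # Ordinary least squares slope for a single predictor
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(class_prior, obs_scores))
    sxx = sum((x - x_bar) ** 2 for x in class_prior)
    slope = sxy / sxx
    # Adjusted score = raw score minus the part predicted by class makeup,
    # so a teacher with an average class keeps their raw score
    return [y - slope * (x - x_bar) for x, y in zip(class_prior, obs_scores)]

# Illustrative data: standardized incoming achievement and 1-4 observation ratings
prior = [-0.5, 0.0, 0.5, 1.0]
raw = [2.8, 3.0, 3.3, 3.5]
print([round(s, 2) for s in adjust_scores(raw, prior)])
```

In this toy data the raw scores rise steadily with incoming achievement; after adjustment they sit in a narrow band around the same overall mean. Note that the adjustment cannot tell whether a gap reflects observer bias or real differences in teaching, which is why the choice of covariates matters.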

In addition, “observations conducted by outside observers are more valid than observations conducted by school administrators.”

Some teacher evaluation plans include a value-added measure for the school as a whole. This lowers the scores of good teachers in bad schools and raises the scores of bad teachers in good schools, Brookings concludes.

Only 22 percent of teachers in the study were evaluated on test score gains, notes the report. All were evaluated through classroom observations.

Good teaching, poor test scores

Evaluating teachers based partly on student test scores is unreliable, concludes a study in Educational Evaluation and Policy Analysis. Researchers analyzed a subsample of 327 fourth- and eighth-grade mathematics and English-language-arts teachers across six school districts.

“Some teachers who were well-regarded based on student surveys, classroom observations by principals and other indicators of quality had students who scored poorly on tests,” reports the Washington Post. Some poorly regarded teachers had students who did well.

Thirty-five states and the District of Columbia require student achievement to be a “significant” or the “most significant” factor in teacher evaluations. Just 10 states do not require student test scores to be used in teacher evaluations.

Most states are using “value-added models” — or VAMs — which are statistical algorithms designed to figure out how much teachers contribute to their students’ learning, holding constant factors such as demographics.

Last month, the American Statistical Association warned against using VAMs, saying that “recent studies have found that teachers account for a maximum of about 14 percent of a student’s test score.”

“We need to slow down or ease off completely for the stakes for teachers, at least in the first few years, so we can get a sense of what do these things measure, what does it mean,” said Morgan S. Polikoff, a USC assistant professor of education and co-author of the study. “We’re moving these systems forward way ahead of the science in terms of the quality of the measures.”

Gates: Mix measures to evaluate teachers

Combining growth in students’ test scores, student feedback and classroom observations produces accurate information on teacher effectiveness, according to Gates Foundation research.

A composite measure of teacher effectiveness drawing on all three measures, tested through a random-assignment experiment, predicted fairly accurately how much high-performing teachers would boost their students’ standardized-test scores, concludes a series of new papers. The papers are part of the massive Measures of Effective Teaching study launched three years ago.

No more than half of a teacher’s evaluation should be based on growth in student achievement, the researchers concluded. In addition, teachers’ classroom performance should be observed by more than one person.
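The two recommendations above can be expressed as simple rules on a weighted composite. The sketch below is hypothetical and is not the Measures of Effective Teaching scoring method: it assumes all three measures are already on a common standardized scale, and the function name `composite_score` and any particular weights are illustrative choices. It enforces only what the researchers recommend here, namely that student-achievement growth carries at most half the weight and that observation ratings come from more than one observer (averaged).

```python
from statistics import mean

def composite_score(growth, survey, observer_ratings, weights):
    """Hypothetical weighted composite. All inputs are assumed to be on a
    common (e.g. standardized) scale; `weights` maps 'growth', 'survey' and
    'observation' to illustrative weights that must sum to 1."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    # Recommendation: growth in student achievement counts for at most half
    if weights["growth"] > 0.5:
        raise ValueError("achievement growth may count for at most half")
    # Recommendation: more than one person observes the teacher
    if len(observer_ratings) < 2:
        raise ValueError("need ratings from more than one observer")
    observation = mean(observer_ratings)  # average across observers
    return (weights["growth"] * growth
            + weights["survey"] * survey
            + weights["observation"] * observation)

w = {"growth": 0.5, "survey": 0.25, "observation": 0.25}
print(round(composite_score(0.4, 0.2, [0.1, 0.3], w), 2))
```

Averaging several observers’ ratings is a simple way to reduce the influence of any single rater; how much weight each measure should get beyond the 50 percent cap remains a policy choice.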

Of course, the controversy over how to evaluate teachers, and what to do with the information, is not over.

The ever-increasing federal role in education makes no sense, writes Marc Tucker, who complains that U.S. Education Secretary Arne Duncan is forcing states to evaluate teachers based on student performance in order to get No Child Left Behind waivers. Most researchers don’t think value-added measures of teacher performance are reliable, writes Tucker.

The study is a “political document and not a research document,” Jay Greene tells the Wall Street Journal. Classroom observations aren’t a strong predictor of student performance, says Greene, a professor of education policy at the University of Arkansas. “But the Gates Foundation knows that teachers and others are resistant to a system that is based too heavily on student test scores, so they combined them with other measures to find something that was more agreeable to them,” he said.