Using observation to improve teaching

Does Better Observation Make Better Teachers? Chicago Public Schools’ Excellence in Teaching Project (EITP), a teacher evaluation system based on the Danielson framework, led to improved reading performance, according to a study reported in Education Next.

However, the focus on classroom observations and feedback had little or no impact in high-poverty and low-achieving schools. In the second year, when schools had less support from the central office, the gains vanished.

Under EITP, principals and teachers engaged in a brief (15- to 20-minute) pre-observation conference during which they reviewed the rubric. The conference also gave the teacher an opportunity to share any information about the classroom with the principal, such as issues with individual students or specific areas of practice about which the teacher wanted feedback. During the 30- to 60-minute lesson that followed, the principal was to take detailed notes about what the teacher and students were doing.

After the observation, the principal rated teacher performance, focusing primarily on classroom environment and instruction.

Within a week of the observation, the principal and teacher discussed the observation, focusing on areas of disagreement and how the teacher could improve.

Getting classroom observations right

For all the talk of “value-added” performance measures, most teachers can’t be evaluated by gains in their students’ test scores because they don’t teach tested subjects or no prior test scores are available, write Grover J. “Russ” Whitehurst, Matthew M. Chingos and Katharine M. Lindquist in Education Next. That makes it important to get classroom observations right.

“Teacher evaluations should include two to three annual classroom observations, with at least one of those observations being conducted by a trained observer from outside the teacher’s school,” they recommend.

In addition, classroom observations “should carry at least as much weight as test-score gains in determining a teacher’s overall evaluation score when both are available.”

Teachers with lots of low-performing students complain they’re unfairly rated ineffective.

That’s true, say the researchers. “Districts should adjust teachers’ classroom-observation scores for the background characteristics of their students, a factor that can have a substantial and unfair influence on a teacher’s evaluation rating.”

Scores can be adjusted for “the percentages of students who are white, black, Hispanic, special education, eligible for free or reduced-price lunch, English language learners, and male,” they write.
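The adjustment the authors describe is, at bottom, a regression: compare each teacher’s observation score to what would be predicted from his or her classroom’s makeup, and keep the residual. Here is a minimal single-covariate sketch (the function names, and the use of just one covariate, are my simplification, not the authors’ actual model):

```python
# Illustrative sketch: adjust observation scores for one classroom
# characteristic (say, the share of students on free/reduced-price lunch)
# by keeping each teacher's residual from a district-wide regression.

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

def adjusted_scores(scores, pct_frl):
    """Residual scores: actual minus predicted-from-demographics."""
    a, b = fit_line(pct_frl, scores)
    return [s - (a + b * p) for s, p in zip(scores, pct_frl)]
```

A teacher whose raw score is modest but whose classroom is far above the district average in poverty can come out ahead of a higher-scoring teacher in an affluent school; a full implementation would include all the covariates the authors list.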

Principal’s classroom visits don’t help

Principals say “instructional leadership” is important, but what does that mean? Cognitive scientist Daniel Willingham praises a new study that recorded how 100 principals spent their time during the school day.  Principals averaged 12.6 percent of their time on activities related to instruction, including classroom walkthroughs (5.4 percent) and formal teacher evaluation (2.4 percent).

Principals spent more time on instructional leadership at schools with more lower-income, lower-achieving and nonwhite students.

“Just pretend I’m not here.”

Time spent on instructional leadership did not correlate with better student learning outcomes — unless the principal spent that time coaching teachers (especially in math) or evaluating teachers and curriculum. 

. . . informal classroom walkthroughs–the most common activity–were negatively associated with student achievement. This was especially true in high schools.

. . . The negative association with student achievement was most evident where principals believed that teachers did not view walkthroughs as opportunities for professional development.  (Other reasons for walkthroughs might be to ensure that a teacher is following a curriculum, or to be more visible to faculty.)

It’s all about the feedback, concludes Willingham. “Instructional leadership activities that offer meaningful feedback to teachers may help. Those that don’t, will not.”

Principals spend 8% of time in classrooms

Principals spend 63 percent of their time in the office and 8 percent in classrooms, according to a Stanford study, writes Justin Baeder. Researchers tracked principals’ time beginning 30 minutes before the school day started and ending when students left.

In lieu of putting a whoopee cushion on the seat, Baeder suggests principals “get rid of your desk chair during school hours.”

Teachers, here’s the Thing

As a New York City public school teacher for almost three decades, Arthur Goldstein is tired of back-to-school meetings on The Next Big Thing, which teachers must do immediately.

Students need more test prep. Students need less test prep.

Teachers must stand. Teachers must not read aloud. Teachers must sit in rocking chairs and read aloud.

Students must do all writing in class. Students must do all writing at home.

Every year, it’s something new — or something recycled and renamed. An administrator announces the Thing.

 “This is the only Thing that works. We will observe you and pay very close attention to whether or not you do it, because you can’t possibly teach unless you do it every single day without exception. But don’t worry, because it’s the best. After we tell you about it, you’ll break into groups, try it, and report back to us.”

Experienced teachers often disappoint presenters by failing to get sufficiently excited. They ask disrespectful questions, like what happened to last year’s Thing? They are invariably told it’s out. It’s not the Thing anymore.

. . . Teachers are chided. You must move with the times, which are after all a-changing. Once we start doing this Thing we will achieve the active participation that’s forever eluded us.

The Thing may not be bad, he writes. But what works for one teacher may not work for others. And doing the same thing or Thing every day is tedious for his teenage students.


Inside IMPACT

Washington, D.C.’s “rigid, numerically based” IMPACT rates teachers based on classroom observations and student performance, notes Inside IMPACT, a new Education Sector report. The old system rated 95 percent of D.C. teachers “satisfactory” or above.

“In the two years since this high-stakes report card was launched, it has led to the firing of scores of educators, put hundreds more on notice, and left the rest either encouraged and re-energized, or frustrated and scared,” writes author Susan Headden.

Multiple-measures teacher evaluation is the future of K–12 education, the report concludes. In D.C., the future is now.

Figure 1 from "Inside IMPACT," What Teachers Are Graded On

Here’s the New York Times story on IMPACT, which notes that “last year 35 percent of the teachers in the city’s wealthiest area, Ward 3, were rated highly effective, compared with 5 percent in Ward 8, the poorest.”

What’s the value of value-added?

New York City will release value-added rankings of teachers in fourth through eighth grade, if a United Federation of Teachers lawsuit fails. That’s intensified the debate over evaluating teachers based on their students’ progress, reports Education Week.  Traditionally, virtually all teachers — from Mr. Chips to Mrs. Burnout — are judged “satisfactory.”

“It’s universally acknowledged—teacher evaluations are broken,” said Timothy Daly, president of The New Teacher Project, a group that helps school districts recruit and train teachers.

Perhaps surprisingly, teacher-union leaders agree. Michael Mulgrew, president of New York City’s United Federation of Teachers (UFT), said last spring that “the current evaluation system doesn’t work for teachers—it’s too subjective, lacks specific criteria and is too dependent on the whims and prejudices of principals.”

Mulgrew supported New York’s new evaluation system, which counts student achievement as 40 percent of a teacher’s rating.

“Value-added” measurements use complex statistical models to project a student’s future gains based on his or her past performance, taking into account how similar students perform. The idea is that good teachers add value by helping students progress further than expected, and bad teachers subtract value by slowing their students down.
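As a toy illustration of that idea (a bare-bones sketch, not the multivariate models districts actually use), value added can be computed as the average gap between a teacher’s students’ actual scores and the scores predicted from their prior-year performance:

```python
# Toy value-added sketch: fit a district-wide prediction line from prior
# scores, then average one teacher's residuals (actual - predicted).

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

def value_added(teacher_students, all_students):
    """Each argument is a list of (prior_score, current_score) pairs;
    all_students spans the district and fits the prediction line."""
    a, b = fit_line([p for p, _ in all_students],
                    [c for _, c in all_students])
    residuals = [c - (a + b * p) for p, c in teacher_students]
    return sum(residuals) / len(residuals)
```

A positive number means the teacher’s students beat expectations. Real models control for many more student characteristics and shrink estimates toward the mean to damp year-to-year noise.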

Value-added modeling is too inaccurate to be used as the “primary way to evaluate teachers,” says an Economic Policy Institute statement signed by many prominent education researchers. In addition, “an excessive focus on basic math and reading scores can lead to narrowing and over-simplifying the curriculum to only the subjects and formats that are tested, reducing the attention to science, history, the arts, civics, and foreign language, as well as to writing, research, and more complex problem solving tasks.”

Although standardized test scores of students are one piece of information for school leaders to use to make judgments about teacher effectiveness, such scores should be only a part of an overall comprehensive evaluation.

Diane Ravitch, one of the signers, argues against releasing teacher performance data in the New York Daily News.

Twenty-five states and hundreds of districts use measures of student achievement in teacher evaluations, writes Richard Colvin on HechingerEd. However, student achievement counts for less than half of a New York  teacher’s evaluation. So, it’s not the primary way teachers are evaluated.

I don’t think anyone argues that value-added scores should be the primary way to evaluate teachers. The question is whether the scores, which will be available only for some teachers, should be used at all.

My problem is that the other aspects of “comprehensive evaluation,” such as classroom observations, are subjective and “dependent on the whims and prejudices of principals.”  Many teachers say they have no faith in their principal’s ability to judge good teaching fairly and intelligently. If teachers’ effectiveness can’t be judged accurately by principals and can’t be judged accurately by student achievement, what’s left?

Update: Ed Sector’s Bill Tucker looks at how New York City is Putting Data Into Practice “to create an evidence-based and collaborative teaching culture.”

Evaluating teachers

While the debate rages about value-added analysis of Los Angeles’ teachers, NPR looks at how value-added data is used in North Carolina’s Winston-Salem/Forsyth County School District. The district began using the data three years ago, notes Robert Siegel, the host. The information is not made public, explains Superintendent Donald Martin.

Dr. MARTIN: . . . if you’re red, your students are performing two standard errors below your — sort of comparable counterparts. If you’re yellow, you’re right in the average performance. And if you’re green, you’re two standard errors above.

And if a teacher has one red, you know, their first year, then we literally just have a – it’s like a growth conference with them. They have a personal, you know, individual plan. We talk to them about what are they going to do differently next year.

Then in the second year, if there’s two reds in a row, the teacher has consecutive reds, then we have a trigger for what we call a plan of assistance. And that plan of assistance may involve going to training. It may involve sending in some central office folks to work with that person and to really work on, you know, a very formal plan that’s now, you know – could trigger dismissal at the end of the year if it is unsuccessful.
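The color bands and escalation rules Martin describes can be sketched in a few lines (a hypothetical rendering of the interview, with thresholds as he states them, not the district’s actual software):

```python
# Hypothetical sketch of the color bands Martin describes: a teacher's
# position relative to comparable counterparts, in standard-error units.

def color_band(effect, standard_error):
    """Red: at least 2 SE below average; green: at least 2 SE above."""
    z = effect / standard_error
    if z <= -2:
        return "red"
    if z >= 2:
        return "green"
    return "yellow"

def next_step(history):
    """history: list of yearly colors, oldest first."""
    if history[-2:] == ["red", "red"]:
        return "plan of assistance"  # may trigger dismissal if unsuccessful
    if history and history[-1] == "red":
        return "growth conference"   # individual improvement plan
    return "no action"
```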

Principals rarely are surprised by which teachers are red or green, Martin says. But, without data, teacher evaluations suffer from “a Lake Wobegon issue. Everybody is above average.” Administrators are to blame for failing to be honest about teacher effectiveness.

Value-added data is available only for a fraction of teachers, writes Sara Mead on Policy Notebook. She’s concerned about the validity of classroom observations.

There is currently no value-added data for kindergarten and early elementary teachers, teachers in non-core subjects, or high school teachers in most places. My brother-in-law, who teaches middle school band and drama, and sister, who teaches high school composition and literature, do not have value-added data.

When available, value-added data should be used to “inform teacher evaluations,” Mead writes, but the larger issue is developing ways to evaluate all teachers. For example, the Classroom Assessment Scoring System (CLASS) measures the extent to which teachers are teaching in ways linked to improved student outcomes.  Mead is concerned “that the observational rubrics many districts and states will put into place under their proposed evaluation systems have not yet been validated.”

While an Economic Policy Institute report urges caution in relying on value-added data, others say the alternative ways to assess teachers, such as classroom observations, are much less reliable than value-added, notes Teacher Beat.  “I think people are right to point out the potential flaws of [value-added modeling], but it should be compared against what we have, not some nirvana that doesn’t exist,” said Daniel Goldhaber, a professor at the University of Washington in Bothell.

In response to teacher feedback, Houston Superintendent Terry Grier has told principals to collaborate with teachers on an individual plan setting out each teacher’s goals for the year and how the principal will help the teacher meet them.  The Houston Federation of Teachers sees this as a nefarious plot to make teachers look bad, writes Rick Hess. HFT is telling teachers not to admit to any performance weaknesses or allow test scores to be used to judge their success.  There’s a lot of fear out there.

Update: Here’s the New York Times’ value-added story.