A teacher reviews her performance review

An English and journalism teacher for six years, Coleen Bondy ranked as low average in her effect on students’ test scores this year. The value-added scores — based only on her least-motivated students — are “practically useless in evaluating teacher performance,” she writes in a Los Angeles Times op-ed.

It’s hard for those who finished high school 20 or 30 years ago, as I did, to fathom the conditions in a typical L.A. Unified high school classroom these days. Classes are huge. Students face overwhelming family and social issues. Drugs are rampant. Students are incredibly disrespectful, testing authority constantly at the beginning of the year. Teachers must be able to get a strong grip on their classes all by themselves because consequences for bad behavior in class are often nonexistent outside it.

. . . Today’s teacher must be highly skilled in her subject matter just to make it into the classroom, more so than at any other time in the history of education. She also must play the role of parent, custodian, psychologist, drug and alcohol interventionist and parole officer, to name a few.

“Society has decided to blame many of its failings on teachers,” Bondy writes.

If teachers can’t be evaluated fairly on their students’ progress (compared with those students’ own previous rates of progress), and they can’t be evaluated on classroom observations, how can they be evaluated?

Movin’ and improvin’

Teacher-effectiveness data should be used to help teachers improve, not just to fire incompetents, argues Movin’ It and Improvin’ It! by Craig Jerald, an education policy consultant, on the Center for American Progress site.

. . . districts are missing an opportunity to … help leverage their highest performers and help teachers with strong potential grow into solid contributors.

The “movin’ it” strategy uses “selective recruitment, retention, and ‘deselection’ to attract and keep teachers with higher effectiveness while removing teachers with lower effectiveness.”

In contrast, “improvin’ it” policies treat teachers’ effectiveness as a mutable trait that can be improved with time. When reformers talk about providing all teachers with useful feedback following classroom observations or using the results of evaluation to individualize professional development for teachers, they are referring to “improvin’ it” strategies. If enough teachers improved their effectiveness, then the accumulated gains would boost the average effectiveness in the workforce.

Smart districts will do both, Jerald argues.

Professional development rarely improves teaching effectiveness and student learning, research shows. “The nation’s school systems spend billions of dollars annually on wasteful and ineffective professional development,” Jerald writes. Yet some forms of training have shown “substantial improvements in teaching and learning” in the last two years.

Teachers matter — now what?

Teachers Matter. Now What?, writes Dana Goldstein in The Nation, citing the Chetty study on the long-term effects of high value-added teachers.

Given the widespread, non-ideological worries about the reliability of standardized test scores when they are used in high-stakes ways, it makes good sense for reform-minded teachers’ unions to embrace value-added as one measure of teacher effectiveness, while simultaneously pushing for teachers’ rights to a fair-minded appeals process.

What’s more, just because we know that teachers with high value-added ratings are better for children, it doesn’t necessarily follow that we should pay such teachers more for good evaluation scores alone. Why not use value-added to help identify the most effective teachers, but then require these professionals to mentor their peers in order to earn higher pay?

That’s the sort of teacher “career ladder” that has been so successful in high-performing nations like South Korea and Finland, and that would guarantee that excellent teachers aren’t just reaching twenty-five students per year but are truly sharing their expertise in a way that transforms entire schools and districts.

Reformers have been advocating teacher career ladders for a long time. Why aren’t they used more widely?

Pay teachers more — and less

Pay some teachers more and others less, writes Jordan Weissmann in The Atlantic.

Not all teaching jobs are alike. In fact, one could say there’s no such thing as “a teacher” at all. There are math teachers and English teachers. There are fourth grade teachers and high school teachers. There are gym teachers and…well you get my point. But while it might seem obvious, it’s also important. Because as two new studies out this week highlight, some kinds of teachers may simply be more influential on students’ educations and lives than others. The way we evaluate and pay them should reflect that.

The first study, an NBER working paper on The Long Term Impacts of Teachers, concluded that students assigned to a high value-added teacher any time between third and eighth grade were “more likely to go to college, were less likely to have children as teens, and made more money as adults” than their peers.

Good English teachers actually had a greater long-term impact on their students’ lives than talented math teachers. But they were also rarer. On the whole, math teachers were just more capable of raising their students’ test scores.

A second study, also an NBER working paper, Do High-School Teachers Really Matter?, concluded “only sometimes.”

Looking at data from schools in North Carolina, Northwestern Professor C. Kirabo Jackson found clear evidence that high school algebra teachers were able to regularly lift their students’ test scores. When it came to English teachers, though, the proof wasn’t there. Meanwhile, good high school teachers saw the amount of improvement in their students’ test scores vary much more from year to year than top elementary school teachers did.

When I spoke with Jackson, he said there were any number of explanations for his findings. Perhaps chief among them: English is considered a harder topic to “move the needle on,” especially in high school. Students learn language inside and outside the classroom.

“Performance bonuses might be more effective for math teachers, who are more likely to see results from their teaching, than English teachers, who might be facing an impossible task,” Weissmann writes. Or perhaps good English teachers should be paid more, because their job is so difficult.

Performance-pay schemes designed for elementary teachers, who have a decent chance at improving their students’ scores, may not be a fair way to evaluate high school English teachers, he adds.

The value-added debate

Can a few years’ data reveal bad teachers? The New York Times’ Room for Debate takes on value-added analysis.

Teacher observation: Imperfect, but a step forward

Evaluating teachers by watching them teach is “tricky, labor-intensive, potentially costly and subjective — but perhaps the best way to help them improve,” according to a Gates Foundation study (pdf) reported in the Los Angeles Times.

The findings highlight the importance of teacher observations, but also pinpoint why they frequently don’t work. The old way — observing a teacher once a year, or once every five years in some cases — is insufficient. And the observers, typically the school principal, frequently don’t know what to look for anyway.

But that doesn’t mean teacher observations should be tossed aside. The best way to evaluate teachers, while also helping them improve, is to use several measures — including data-based methods that rely on students’ standardized test scores, along with an updated teacher observation system, the report found.

Earlier research has looked at student surveys and value-added measures to judge teachers’ effect on students’ performance.

Using these methods to evaluate teachers is “more predictive and powerful in combination than anything we have used as a proxy in the past,” said Vicki Phillips, who directs the Gates project.
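To make “several measures in combination” concrete, here is a minimal Python sketch of one way a district might blend them. It assumes each measure is first standardized (converted to a z-score across the teacher pool) so that a 1-to-4 observation rubric and a test-score gain can be averaged on a common scale. The measure names, the weights and the sample scores are hypothetical illustrations, not figures from the Gates study.

```python
from statistics import mean, pstdev

# Hypothetical weights; real evaluation systems set these by policy.
WEIGHTS = {"value_added": 0.5, "observation": 0.25, "student_survey": 0.25}


def z_scores(values):
    """Standardize raw scores to mean 0, SD 1 across the teacher pool."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        # Every teacher scored the same, so this measure cannot distinguish them.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def composite_scores(teachers):
    """Map teacher name -> weighted average of that teacher's standardized measures."""
    names = list(teachers)
    standardized = {name: {} for name in names}
    for measure in WEIGHTS:
        raw = [teachers[name][measure] for name in names]
        for name, z in zip(names, z_scores(raw)):
            standardized[name][measure] = z
    return {
        name: sum(WEIGHTS[m] * standardized[name][m] for m in WEIGHTS)
        for name in names
    }


# Made-up scores for three teachers.
teachers = {
    "A": {"value_added": 0.10, "observation": 3.5, "student_survey": 4.2},
    "B": {"value_added": -0.05, "observation": 2.8, "student_survey": 3.9},
    "C": {"value_added": 0.02, "observation": 3.1, "student_survey": 3.1},
}

for name, score in sorted(composite_scores(teachers).items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:+.2f}")
```

Standardizing first is the important design choice here: without it, whichever measure happens to have the largest numeric range would dominate the composite regardless of the weight it was assigned.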

Traditionally, 98 percent of teachers are rated effective.

Researchers looked at “measures of success beyond test scores,” adds the Hechinger Report.

That is, can we know for sure that a teacher who receives a top grade on one of the more rigorous and frequent classroom observations is also going to have a classroom of students who get top grades on achievement tests at the end of the year and on other important measures, like interest and happiness in school? . . . And are the evaluation measures, whether they are qualitative observations or quantitative test scores, accurate in labeling teachers great, ordinary, or bad?

Teachers’ observation scores correlated with their students’ results on a variety of achievement tests, the Gates study concluded.
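A “correlation” here means the two measures tend to rise and fall together across teachers. As a minimal sketch, assuming the Pearson coefficient (a common choice; the Gates report’s exact method is not described here) and made-up paired scores, checking whether observation ratings track value-added estimates might look like this:

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation: +1 perfect positive, 0 none, -1 perfect negative."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical data: one observation rating and one value-added estimate per teacher.
observation = [2.1, 2.8, 3.0, 3.4, 3.9]
value_added = [-0.08, 0.01, -0.02, 0.05, 0.07]

print(round(pearson_r(observation, value_added), 2))
```

A positive r says only that the measures move together on average; a modest r still leaves plenty of individual teachers who rank high on one measure and low on the other.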

Study: Great teachers have lifelong impact

Students with an excellent elementary or middle-school teacher don’t just earn higher reading and math scores, concludes a new study that tracked one million students in an urban district over 20 years. A single year with a high value-added teacher leads to higher college attendance, higher adult earnings and even lower teenage-pregnancy rates, according to the authors, economists Raj Chetty and John Friedman of Harvard and Columbia Professor Jonah Rockoff.

All else equal, a student with one excellent teacher for one year between fourth and eighth grade would gain $4,600 in lifetime income, compared to a student of similar demographics who has an average teacher. The student with the excellent teacher would also be 0.5 percent more likely to attend college.

It may be difficult to hire more excellent (top five percent) teachers, but it’s not necessary.

. . . the difference in long-term outcome between students who have average teachers and those with poor-performing ones is as significant as the difference between those who have excellent teachers and those with average ones, the study found.

It adds up: Replacing a low-value-added (bottom five percent) teacher with an average teacher would raise a single classroom’s lifetime earnings by about $266,000, the economists estimate.

“If you leave a low value-added teacher in your school for 10 years, rather than replacing him with an average teacher, you are hypothetically talking about $2.5 million in lost income,” said Professor Friedman, one of the coauthors.

. . . “The message is to fire people sooner rather than later,” Professor Friedman said.

When a high value-added teacher transferred to a new school, student performance went up in the grade or subject area taught by that teacher, matching predicted gains. Scores dropped in the school the high-value teacher had left. Conversely, scores went up significantly when a low-value teacher left and dropped in her new school.

“High-performing teachers may more than justify much higher pay,” Slate observes.

“Great teachers create great value – perhaps several times their annual salaries,” write the authors. Now a working paper, the study will be submitted to a journal.

‘Race’ states go off reform track

Race to the Top winners are veering off the reform track, reports the Wall Street Journal.

The Obama administration is stepping up pressure on states to make good on their commitments under its Race to the Top competition, after all 12 winners either scaled down plans or pushed back timelines to overhaul their public-education systems.

Hawaii, which has delayed almost every part of its reform plan, could lose its $75 million grant, the Education Department warns. The state has been unable to reach a deal with the teachers’ union.

The Education Department has approved scores of waiver requests, including allowances for Massachusetts to delay plans to develop online courses for teacher mentors and for Rhode Island to push back plans to open more charter schools. Some states, including Florida, got sidetracked by overly optimistic target dates to hire contractors for developing student data systems or to create mathematical formulas for linking teacher evaluations to student test scores.

Tennessee is pushing ahead with a plan to link teacher evaluations to value-added data on their students’ progress, despite complaints that the system makes no sense for teachers in untested subjects and grades. A few “tweaks” will fix the problems, says Education Commissioner Kevin Huffman.

The uses (and misuses) of value-added research

Value-added research, which uses “sophisticated statistical techniques to attempt to isolate a teacher’s effect on student test score growth,” makes sense, writes Matt DiCarlo in a thoughtful analysis on Shanker Blog. What’s troubling is how the models are used.

For example, the most prominent conclusion of this body of evidence is that teachers are very important, that there’s a big difference between effective and ineffective teachers, and that whatever is responsible for all this variation is very difficult to measure (see here, here, here and here). These analyses use test scores not as judge and jury, but as a reasonable substitute for “real learning,” with which one might draw inferences about the overall distribution of “real teacher effects.”

And then there are all the peripheral contributions to understanding that this line of work has made, including (but not limited to):

“What the research does not show is that it’s a good idea to use value-added and other growth model estimates as heavily-weighted components in teacher evaluations or other personnel-related systems,” DiCarlo concludes.

As has been discussed before, there is a big difference between demonstrating that teachers matter overall – that their test-based effects vary widely, and in a manner that is not just random – and being able to accurately identify the “good” and “bad” performers at the level of individual teachers.
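DiCarlo’s distinction can be illustrated with a small simulation. The sketch below assumes, purely for illustration, that true teacher effects are normally distributed and that each one-year value-added estimate equals the true effect plus independent random noise; the standard deviations are hypothetical, not taken from any study. Under those assumptions the overall spread of teacher effects can be recovered fairly well, while flagging individual teachers as “bottom 10 percent” is much less accurate.

```python
import random
from statistics import pvariance


def simulate(n_teachers=1000, true_sd=1.0, noise_sd=1.5, seed=0):
    """Return (implied variance of true effects, share of flagged teachers truly in bottom 10%)."""
    rng = random.Random(seed)
    true_effects = [rng.gauss(0, true_sd) for _ in range(n_teachers)]
    # Hypothetical one-year estimate: true effect plus independent measurement noise.
    estimates = [t + rng.gauss(0, noise_sd) for t in true_effects]

    # Aggregate question: how much do teachers differ? Subtracting the (assumed known)
    # noise variance from the observed variance recovers the true spread.
    implied_true_variance = pvariance(estimates) - noise_sd ** 2

    # Individual question: are the teachers we flag actually the weakest?
    k = n_teachers // 10
    flagged = set(sorted(range(n_teachers), key=lambda i: estimates[i])[:k])
    truly_bottom = set(sorted(range(n_teachers), key=lambda i: true_effects[i])[:k])
    hit_rate = len(flagged & truly_bottom) / k
    return implied_true_variance, hit_rate


variance, hit_rate = simulate()
print(f"implied variance of true effects: {variance:.2f} (assumed: 1.00)")
print(f"share of flagged teachers truly in bottom 10%: {hit_rate:.0%}")
```

With these made-up parameters, well under half of the flagged teachers typically land in the true bottom 10 percent. Raising `noise_sd` lowers that share; averaging several years of data, which shrinks the noise under the independence assumption, raises it.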

Most districts and states use value-added models poorly, concludes DiCarlo.

The poverty factor

Evaluating teachers based on “value-added” analysis of their students’ progress is unfair to teachers with lots of low-income students, argue teachers’ union leaders in Washington, D.C.

Ward 8, one of the poorest areas of the city, has only 5 percent of the teachers defined as effective under the new evaluation system known as IMPACT, but more than a quarter of the ineffective ones. Ward 3, encompassing some of the city’s more affluent neighborhoods, has nearly a quarter of the best teachers, but only 8 percent of the worst.

. . .  Are the best, most experienced D.C. teachers concentrated in the wealthiest schools, while the worst are concentrated in the poorest schools? Or does the statistical model ignore the possibility that it’s more difficult to teach a room of impoverished children?

Value-added models compare a student’s previous progress with current progress: If Johnny has gained four months of learning for every year in school — because of poverty, disability, lack of English fluency or some other reason — and gains six months in Teacher X’s class, then the teacher has done well. If Jane has gained nine months a year in past years but only six months in Teacher Y’s class, the teacher gets the blame.
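The Johnny and Jane example can be written out as a minimal Python sketch. It compares each student’s gain in the current teacher’s class with that student’s own past rate of gain, in months of learning per school year as in the example. This is a deliberate simplification for illustration: actual value-added models are statistical models fit over several years of scores, and the names `StudentRecord`, `growth_vs_expectation` and `teacher_effect` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StudentRecord:
    name: str
    prior_gain_per_year: float  # average months of learning gained per past year
    gain_this_year: float       # months gained in the current teacher's class


def growth_vs_expectation(record: StudentRecord) -> float:
    """Months gained above (positive) or below (negative) the student's own trend."""
    return record.gain_this_year - record.prior_gain_per_year


def teacher_effect(records: list[StudentRecord]) -> float:
    """Average growth-vs-expectation across one teacher's students."""
    return sum(growth_vs_expectation(r) for r in records) / len(records)


johnny = StudentRecord("Johnny", prior_gain_per_year=4.0, gain_this_year=6.0)
jane = StudentRecord("Jane", prior_gain_per_year=9.0, gain_this_year=6.0)

print(growth_vs_expectation(johnny))  # 2.0: Teacher X looks effective
print(growth_vs_expectation(jane))    # -3.0: Teacher Y gets the blame
```

Averaging over a whole class, as `teacher_effect` does, is the step that turns individual students’ results into a rating for the teacher.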

Adding demographic factors is unnecessary if there are at least three years of test-score data available, says William Sanders, a former University of Tennessee researcher who developed value-added analysis.

“If you’ve got a poor black kid and a rich white kid that have exactly the same academic achievement levels, do you want the same expectations for both of them the next year?”

However, D.C. uses one year of data and factors in students’ poverty status.

A few value-added models factor in the concentration of disadvantaged students in a classroom.

Studies have found that students surrounded by more advantaged peers tend to score higher on tests than similarly performing students surrounded by less advantaged peers.

To some experts, this research suggests that a teacher with a large number of low-achieving minority children in a classroom, for example, might have a more difficult job than another teacher with few such students.

Controlling for the demographics of a whole class makes a complex model even more complicated — and may not make much difference. But the idea is being studied.
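To see why a class-level control can matter for an individual teacher’s rating, here is a minimal sketch. It predicts each student’s score from a prior score and the student’s own poverty status, optionally adds a term for the share of low-income students in the whole class, and treats the teacher’s effect as the average gap between actual and predicted scores. All coefficients and student records are hypothetical; a real model would estimate the coefficients by regression across a district, and adding the class-level term would also shift the other coefficients, which this sketch holds fixed.

```python
# Hypothetical coefficients (score points); not estimated from real data.
INTERCEPT = 10.0
PRIOR_SCORE_COEF = 0.9
STUDENT_POVERTY_COEF = -2.0      # individual low-income status (0 or 1)
CLASS_POVERTY_SHARE_COEF = -5.0  # share of low-income classmates (0 to 1)


def predicted_score(prior, is_poor, class_poverty_share, use_class_control):
    """Expected current score given prior score and poverty measures."""
    pred = INTERCEPT + PRIOR_SCORE_COEF * prior + STUDENT_POVERTY_COEF * is_poor
    if use_class_control:
        pred += CLASS_POVERTY_SHARE_COEF * class_poverty_share
    return pred


def class_adjusted_effect(students, use_class_control):
    """Average of (actual - predicted) over one teacher's students."""
    share = sum(s["is_poor"] for s in students) / len(students)
    residuals = [
        s["actual"] - predicted_score(s["prior"], s["is_poor"], share, use_class_control)
        for s in students
    ]
    return sum(residuals) / len(residuals)


# Made-up class in which three of four students are low-income.
students = [
    {"prior": 50, "is_poor": 1, "actual": 52},
    {"prior": 60, "is_poor": 1, "actual": 60},
    {"prior": 55, "is_poor": 0, "actual": 60},
    {"prior": 40, "is_poor": 1, "actual": 45},
]

print(f"without class control: {class_adjusted_effect(students, False):+.3f}")
print(f"with class control:    {class_adjusted_effect(students, True):+.3f}")
```

In this made-up class, 75 percent of students are low-income, so the class-level term lowers every prediction by 5 × 0.75 = 3.75 points and raises the teacher’s estimated effect by the same amount, turning a slightly negative rating into a positive one.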