The poverty factor

Evaluating teachers based on “value-added” analysis of their students’ progress is unfair to teachers with lots of low-income students, argue teachers’ union leaders in Washington, D.C.

Ward 8, one of the poorest areas of the city, has only 5 percent of the teachers defined as effective under the new evaluation system known as IMPACT, but more than a quarter of the ineffective ones. Ward 3, encompassing some of the city’s more affluent neighborhoods, has nearly a quarter of the best teachers, but only 8 percent of the worst.

. . .  Are the best, most experienced D.C. teachers concentrated in the wealthiest schools, while the worst are concentrated in the poorest schools? Or does the statistical model ignore the possibility that it’s more difficult to teach a room of impoverished children?

Value-added models compare a student’s previous progress with current progress: If Johnny has gained four months of learning for every year in school — because of poverty, disability, lack of English fluency or some other reason — and gains six months in Teacher X’s class, then the teacher has done well. If Jane has gained nine months a year in past years but only six months in Teacher Y’s class, the teacher gets the blame.
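The comparison described above can be sketched as a toy calculation. This is purely illustrative of the idea, not the actual D.C. IMPACT model; the function name and numbers are made up for the example:

```python
# Toy sketch of the value-added idea: a teacher's "effect" is the
# student's actual gain minus the gain predicted from that student's
# own history. Gains are in months of learning per school year.

def value_added(prior_gains, current_gain):
    expected = sum(prior_gains) / len(prior_gains)  # student's own baseline
    return current_gain - expected                   # positive = beat expectations

# Johnny: historically gains 4 months/year, gains 6 with Teacher X
print(value_added([4, 4, 4], 6))   # 2.0 -> Teacher X gets credit

# Jane: historically gains 9 months/year, gains 6 with Teacher Y
print(value_added([9, 9, 9], 6))   # -3.0 -> Teacher Y gets blame
```

Real models are far more elaborate, but the core logic is this residual: progress relative to the student's own expected trajectory, not relative to other students.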

Adding demographic factors is unnecessary if there are at least three years of test-score data available, says William Sanders, a former University of Tennessee researcher who developed value-added analysis.

“If you’ve got a poor black kid and a rich white kid that have exactly the same academic achievement levels, do you want the same expectations for both of them the next year?”

However, D.C. uses one year of data, and factors in students’ poverty status.

A few value-added models factor in the concentration of disadvantaged students in a classroom.

Studies have found that students surrounded by more advantaged peers tend to score higher on tests than similarly performing students surrounded by less advantaged peers.

To some experts, this research suggests that a teacher with a large number of low-achieving minority children in a classroom, for example, might have a more difficult job than another teacher with few such students.

Controlling for the demographics of a whole class makes a complex model even more complicated — and may not make much difference. But the idea is being studied.


Some A+ schools get C’s in new grading system

Some of Arizona’s 62 A+ schools became C schools under a new rating system. The Arizona Educational Foundation’s system relies on a comprehensive set of criteria, while the state’s new system is based on test scores and progress.

Michigan teachers report pressure to cheat

Nearly 30% of Michigan teachers report pressure to cheat on standardized exams, according to a survey by the Detroit Free Press. In addition, 34% of public school educators said administrators, parents or others pressure teachers to change grades.

At schools that don’t meet federal standards, the tension is higher: About 50% say pressure to change grades is an issue, and 46% say pressure to cheat on the tests is a problem.

Some cave in — about 8% say they changed grades within the last school year, and at least 8% admit to some form of cheating to improve a student’s standardized test score.

Another 17% report cheating by a colleague.

However, the most common cheating method — writing down vocabulary words to teach to next year’s classes — doesn’t seem like cheating to me. Does Michigan give exactly the same tests from year to year? That would be asking for trouble.

Two out of three teachers surveyed oppose using standardized tests to gauge student achievement and 95% oppose using standardized tests to make decisions about teacher salaries.

Michigan will base 25% of a teacher’s evaluation on students’ progress by 2013-14; that will rise to 50% in 2015-16.

In addition, the state education department plans to raise standards on the state exam, making it harder to score as proficient. “ACT scores show only 17% of Michigan students leave high school prepared for college,” notes the Free Press.


Tough course titles, weak test scores

More high school students are taking advanced classes, but test scores haven’t improved. What’s going on? Course title inflation, answers the New York Times.

Algebra II is sometimes just Algebra I. And College Preparatory Biology can be just Biology.

Thirteen percent of high school graduates completed a rigorous curriculum in 2009, up from 5 percent in 1990, a federal study of transcripts reported in April. But the testing trend lines are flat.

“There may be a ‘watering down’ of courses,” said Arnold A. Goldstein, a director at the National Center for Education Statistics.

Schools inflate course titles to help students satisfy tougher high school graduation requirements, researchers say. It looks good to have more students in high-level or Advanced Placement classes.

About 15 percent of eighth-grade math courses — with titles from remedial through “enriched” to Algebra I — use textbooks that cover less advanced material, a Michigan State study found.

In 2008, Dr. (William) Schmidt surveyed 30 high schools in Ohio and Michigan, finding 270 distinctly labeled math courses. In science, one district offered Basic Biology, BioScience, General Biology A and B — 10 biology courses in all.

“The titles didn’t reveal much at all about how advanced the course was,” he said.

As Advanced Placement enrollment has soared, so have failure rates. Arkansas sextupled the number of students taking AP exams; only 30 percent earn a passing grade of 3, 4 or 5. Some argue that students benefit from the challenge, even if they don’t do well enough to earn college credit.

Competition improves public schools

Threatened with losing students to private schools, Florida public schools improved, concludes a Northwestern study by David Figlio and Cassandra Hart.

Since 2002, the Florida Tax Credit Scholarship Program (FTC) has provided funding to help low-income parents pay for private school. Corporations donate money to fund the scholarships in exchange for a tax credit.

The scholarship is quite generous; it covers approximately 90 percent of tuition and fees at a typical religious elementary school in Florida and two-thirds of tuition and fees at a typical religious high school. As a result, the program greatly increased the accessibility of private schools to low-income families. In the first year, some 15,585 scholarships were awarded, increasing the number of low-income students attending private schools by more than 50 percent. For the 2009–10 school year, the FTC program awarded scholarships to 28,927 students.

Public schools located near private schools increased reading and math scores more than public schools that had little competition.

For every 1.1 miles closer to the nearest private school, public school math and reading performance increases by 1.5 percent of a standard deviation in the first year following the announcement of the scholarship program. Likewise, having 12 additional private schools nearby boosts public school test scores by almost 3 percent of a standard deviation. The presence of two additional types of private schools nearby raises test scores by about 2 percent of a standard deviation. Finally, an increase of one standard deviation in the concentration of private schools nearby is associated with an increase of about 1 percent of a standard deviation in test scores.

Test scores rose more for elementary and middle schools than for high schools, perhaps because the scholarship made K-8 private schools affordable but didn’t cover as much of the tuition at private high schools.

Scholars back value-added’s value

Value-added data on student performance adds value to teacher evaluations, concludes a Brookings report by a group of well-respected scholars.  “We conclude that value-added data has an important role to play in teacher evaluation systems, but that there is much to be learned about how best to use value-added information in human resource decisions.”

At Teacher Beat, Stephen Sawchuk summarizes:

While an imperfect measure of teacher effectiveness, the correlation of year-to-year value-added estimates of teacher effectiveness is similar to predictive measures for informing high-stakes decisions in other fields, the report states. Examples include using SAT scores to determine college entrance, mortality rates and patient volume as quality measures for surgeons and hospitals, and batting averages as a gauge for selecting baseball talent.

Statistical predictions in those fields are imprecise, too, but they’re able to predict larger differences across providers than other measures and so are used, the authors write.

The traditional method of evaluating teachers identifies nearly all as effective, the Brookings authors write. That’s both inaccurate and harmful to students.

“When teacher evaluation that incorporates value-added is compared against an abstract ideal, it can easily be found wanting in that it provides only a fuzzy signal. But when it is compared to performance information in other fields or to evaluations of teachers based on other sources of information, it looks respectable and appears to provide the best signal we’ve got.”

By contrast, the Economic Policy Institute and the National Academy of Sciences issued reports criticizing the reliability of value-added measures and arguing the data should not be used to evaluate teachers.

Merit pay fails a test

Is merit pay a flop? Project on Incentives in Teaching (POINT) offered big bonuses to Nashville math teachers in grades 5-8 for raising students’ test scores. Fifth graders improved in the second and third years, but there was no lasting improvement in student performance, concluded researchers at Vanderbilt’s National Center on Performance Incentives.

The study used a control group and calculated students’ progress using value-added measures. There was heavy attrition as teachers were reassigned or left the district.

In surveys of participants, 80 percent said they didn’t change their teaching in hopes of earning a bonus, according to the Hechinger Report. Teachers’ students had to hit the 95th percentile to earn a $15,000 bonus, the 90th percentile for $10,000 and the 80th for a $5,000 reward. Many teachers just missed the cut-off.
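The bonus schedule reported above can be written as a simple threshold function. This is a sketch based only on the cut-offs the Hechinger Report describes; the function name is hypothetical:

```python
# Illustrative sketch of the POINT bonus schedule: a teacher's payout
# depends on which percentile threshold their students' scores reach.

def point_bonus(percentile):
    if percentile >= 95:
        return 15_000
    if percentile >= 90:
        return 10_000
    if percentile >= 80:
        return 5_000
    return 0

print(point_bonus(96))  # 15000
print(point_bonus(79))  # 0 -- just missing the cut-off pays nothing
```

The all-or-nothing steps help explain the frustration of teachers who “just missed the cut-off”: a one-percentile difference could mean $5,000.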

The fact that many fifth-grade teachers teach multiple subjects to the same students may have been a reason for the positive impact of merit pay found in fifth grade, according to the study’s authors. But “the effect did not last. By the end of 6th grade it did not matter whether a student’s 5th grade math teacher had been in the treatment or control group,” the study said.

While “bonus pay alone” didn’t improve student outcomes, “more nuanced” compensation ideas should be tested, said Matthew Springer, executive director of the National Center on Performance Incentives.

Merit pay opponents say the study proves merit pay is worthless. Others claim the real issue is not whether bonuses motivate teachers to work harder.

Under Arne Duncan, the Education Department is pushing performance pay, writes Stephen Sawchuk on Teacher Beat. New grantees will be announced this month “under a federal program designed to seed merit-pay programs for teachers and principals.”

“While this is a good study, it only looked at the narrow question of whether more pay motivates teachers to try harder,” a spokeswoman for the U.S. Department of Education said in an e-mail. “What we are trying to do is change the culture of teaching by giving all educators the feedback they need to get better while rewarding and incentivizing the best to teach in high need schools and hard-to-staff subjects.”

Before the results were out, Rick Hess wrote that it would tell us nothing because it looked only at whether teachers will work “harder” for money, like rats trying to earn extra food pellets. “Serious people” hope that “rethinking teacher pay can help us reshape the profession to make it more attractive to talented candidates, more adept at using specialization, more rewarding for accomplished professionals, and a better fit for the twenty-first century labor force.” The study asked the wrong question, he concludes.

Eduwonk agrees that merit pay is a way to improve the teaching force of the future, not a way to make teachers work harder or better.

It sends a signal that in this field, performance and excellence matters.  Right now the signal is that everyone gets treated alike, as widgets, regardless of how well or how poorly you do your job.

A “surprising number of people are saying this somehow settles the debate about performance pay,” Eduwonk also notes.  “And funny, they don’t say that about single studies that don’t confirm their views.”

However, performance pay should be judged on performance, argues Intercepts, who links to the response from the AFT, which sees “a role” for performance pay, and from the NEA, which calls it “only the latest blow” to the idea. Intercepts writes:

It’s funny to see NEA suddenly equating “student achievement” with the results of a “single standardized test” (page 13) – in this case, the Tennessee Comprehensive Assessment Program (TCAP) math test.

The Tennessee results aren’t all rosy for the teachers’ unions . . . if your goal is to raise student math scores, and a $15,000 bonus to math teachers didn’t do it, why would giving all teachers more money have any effect?

“What happens next is just as much a political question as an education one,” concludes Intercepts. Very true.

Update: Rick Hess publishes a response to the study by Tom Kane, a Harvard professor who’s heading the Gates Foundation’s  research into teacher performance, evaluation, and pay. Kane writes:

It’s a well-done study of a not-very-interesting question. Merit pay for teachers could impact student achievement via three distinct routes: by encouraging teachers to work harder, by encouraging talented and skilled teachers to remain in teaching, by enticing talented and skilled people to enter teaching. The study was designed to answer a narrow question: can you make the average teacher work harder with monetary incentives? They did not report any results on the likelihood that more effective teachers would remain in teaching. Nor did they design the study to study entry into teaching.

We know there are huge differences in student achievement gains in different teachers’ classrooms. The authors confirmed that result. However, the impact of the specific incentive they tested depends on what underlies the differences in teacher effectiveness–effort vs. talent and accumulated skill. I’ve never believed that lack of teacher effort–as opposed to talent and skills–was the primary issue underlying poor student achievement gains. Rather, the primary hope for merit pay is that it will encourage talented teachers to remain in the classroom or to enter teaching.

Kane thinks “more meaningful tenure review” is “the most likely route of impact for teacher effectiveness policies.”

By the way, Corey Bunje Bower, a Vanderbilt PhD student, writes that a “reliable source” says Hess knew the results of the study before he wrote the pre-announcement column saying the results don’t matter. I’m reluctant to take the word of an anonymous source over the word of Rick Hess.

Hess says someone told him the results after he’d written the column.

Most Chicago schools get a D or F

Most Chicago public schools earned a D or F grade on the district’s own evaluation, reveals the Chicago Tribune, which has printed the grades.  The district didn’t release the information, saying it lacks nuance. Someone leaked the info to the Trib.

As the graph shows, only 10 percent of elementary  and middle schools and 4 percent of high schools received an A. Half of K-8 schools and two thirds of high schools were given a D or F.

The grades are based on attendance, dropout rates and test scores, with no attempt to measure students’ progress. Not surprisingly, most of the A and B schools serve fewer low-income students than the district average. However, some high-poverty schools, such as Burnham Elementary, a nearly all-black magnet school, did well. Overall, charter schools were more likely to earn a passing grade.

Some fear the K-8 schools look better because the tests are too easy.  The failure rate is high on the 11th-grade exam, which is partly based on the college entrance ACT exam, the Tribune reports.

“At the elementary level, state assessment standards have been so weakened that most of the 8th-graders who ‘meet’ these standards have little chance to succeed in high school or to be ready for college,” wrote the Civic Committee of the Commercial Club of Chicago in a 2009 report.

My mother’s alma mater, Sullivan High, is an F school with an 88.5 percent poverty rate.

No gold stars for LA teachers

Los Angeles doesn’t reward, recognize or try to learn from its most effective teachers, reports the LA Times in a follow-up to its value-added analysis of third- through fifth-grade teachers’ effects on their students’ test scores.

The Times found that the 100 most effective teachers were scattered across the city, from Pacoima to Gardena, Woodland Hills to Bell. They varied widely in race, age, years of experience and education level. They taught students who were wealthy and poor, gifted and struggling.

In visits to several of their classrooms, reporters found their teaching styles and personalities to differ significantly. They were quiet and animated, smiling and stern. Some stuck to the basics, while others veered far from the district’s often-rigid curriculum. Those interviewed said repeatedly that being effective at raising students’ performance does not mean simply “teaching to the test,” as critics of value-added analysis say they fear.

On average, these teachers’ students improved by 12 percentile points on tests of English, from the 58th to the 70th, and 17 percentile points in math, from 58th to 75th, in a year.

Thomas Kane, a Harvard education researcher, tested the reliability of the value-added approach in Los Angeles, the Times reports.  Kane predicted the student gains for  156 teachers who volunteered for the experiment.

Value-added analysis was a strong predictor of how much a teacher would help students improve on standardized tests. The approach also controlled well for differences among students, the study found.

With $45 million from the Bill and Melinda Gates Foundation, Kane and other researchers are now following 3,000 teachers in six school districts to see if other types of evaluation — including sophisticated classroom observations, surveys of teachers and reviews of student work — are also good measures of teacher performance.

In the meantime, Kane said that, although it is not perfect, “there is currently not a better measure of teacher effectiveness than the value-added approach.”

Teachers on the Times’ most effective list said they’d never been recognized for excellence. Aldo Pinto, a 32-year-old teacher at Gridley Street Elementary School in San Fernando, said, “The culture of the union is: Everyone is the same. You can’t single out anyone for doing badly. So as a result, we don’t point out the good either.”

Value-added is the worst form of teacher evaluation, but it’s better than everything else, writes Chad Aldeman on The Quick and the Ed.

Los Angeles Unified now plans to share value-added data with teachers privately and hopes to negotiate its use in teacher evaluations with the teachers’ union.  Tennessee did just the opposite, Aldeman notes. “Every year since the  mid-1990’s every single eligible teacher has received a report on their (value-added) results.”

When these results were first introduced, teachers were explicitly told their results would never be published in newspapers and that the data may be used in evaluations. In reality, they had never really been used in evaluations until the state passed a law last January requiring the data to make up 35 percent of a teacher’s evaluation. This bill, and 100% teacher support for the state’s Race to the Top application that included it, was a key reason the state won a $500 million grant in the first round.

While LA teachers are angry and confused, Tennessee teachers have had time to understand how value-added analysis works and  prepare to accept it.

LA Times lists ‘effective’ teachers, schools

The Los Angeles Times has posted its list of the most and least effective third-, fourth- and fifth-grade teachers and schools, with teachers’ comments. A few teachers are challenging the data, saying that they’re listed as teaching in years when they were on leave. The value-added analysis of schools is interesting.

On City Journal, education researcher Marcus Winters looks at the pros and cons of value-added analysis and comes out against releasing individual teachers’ scores.

 Test-score analysis is “correct” on average—it can tell us a great deal about aggregate teacher quality. It can also help to evaluate individual teachers. But given its messiness—especially when tied to stakes as high as people’s jobs—it cannot be used in isolation.

Critics go too far, however, when they claim that these limitations justify abandoning the value-added approach altogether. The real lesson is that test scores are best used to raise red flags about a teacher’s objective performance; rigorous subjective assessment should follow, to ensure that the teacher is truly performing poorly. If both analyses show that a teacher is ineffective, then action should be taken, including removal from the classroom.

The Economic Policy Institute also sees “Problems with the Use of Student Test Scores to Evaluate Teachers.”