Evaluation varies: Tale of 2 cities

Teacher evaluations vary widely, points out This Week in Education. Observation counts for 30 percent in Denver and 60 percent in New York City. Student performance counts for 30 percent in Denver, 20 percent in New York. From Scholastic Administrator.

Observing teachers: Is it worth the time?

Teacher evaluation is off to a “bumpy start” in New York City schools, reports the New York Times. For example, PS 130 Principal Lily Din Woo and her assistant principal “are spending parts of each day darting in and out of classrooms, clipboards and iPads in hand, as they go over checklists for good teaching. Is the lesson clear? Is the classroom organized?”

All told, they will spend over two of the 40 weeks of the school year on such visits. The hours spent sitting with teachers to discuss each encounter and entering their marks into the school system’s temperamental teacher-grading database easily stretch to more than a month.

So the principal and her assistant will now spend 10 percent of their time visiting classrooms, observing and giving teachers feedback on their teaching. Is that really excessive?

“Talent coaches” are helping and retirees may be hired “to pitch in at schools where the workload is heavy.”

Writing up observations, which must be “low inference” and aligned to the “Danielson rubric,” will be more time consuming and taxing than the Times estimates, predicts NYC Educator.

Minnesota is piloting a new teacher evaluation system that includes more classroom observation by the principal, reports Minnesota Public Radio’s Tim Post, via the Hechinger Report.

Pine Island, Minn. – Principal Cindy Hansen’s fingers fly across her laptop as she types notes in a corner of Scott Morgan’s classroom, watching as the special education teacher works with a kindergartner on her social skills.

This is more than a principal pop-in. Hansen and Morgan are part of a new, experimental kind of teacher evaluation. Earlier, they met for a pre-evaluation chat. Later, they’ll talk over the teacher’s strengths and weaknesses and set performance goals. She’ll evaluate 70 teachers this way.

“It’s not meant to be a ‘gotcha’ kind of a situation,” Hansen says later. “It really is meant to be a helpful kind of conversation.”

Beginning teachers will be observed three times a year for the first three years, while veteran teachers will be observed at least once a year, with a more thorough review once every three years. Student performance will count for 35 percent of overall evaluations. Student surveys also will be factored in.

Use of test scores to evaluate teachers is controversial. Now there’s resistance to principals evaluating their teachers’ classroom performance.

How a charter network evaluates teachers

Evaluating teachers’ effectiveness is a priority for the Aspire network of 37 charter schools, reports the San Francisco Chronicle. It’s not just about test scores.

When Eva Kellogg’s bosses evaluated her performance as a teacher, they observed her classes. They reviewed her lesson plans. They polled her students, their parents and other teachers. And then they took a look at her students’ standardized test scores.

When the lengthy process was over, the eighth-grade English teacher at Aspire Lionel Wilson College Preparatory Academy in Oakland had received the highest rank possible.

She was a master teacher.

And based on her job performance, she got a $3,000 bonus as well as a metaphorical front-row seat at one of the biggest battles in public education: how to evaluate teachers and whether to give good ones a bigger paycheck.

Forty percent of a teacher’s score is based on observation by the principal, 30 percent on students’ standardized test scores and the rest on student, colleague and family feedback, as well as the school’s overall test scores.

Teachers are ranked as emerging, effective, highly effective or master. Bonuses range from $500 to $3,000.
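
For the curious, here is roughly how such a weighted composite works, in a minimal sketch. The 40/30/30 weights come from the Chronicle story; the 0-to-4 scale, the sample scores and the rank cutoffs are invented for illustration.

```python
# Toy sketch of a weighted composite evaluation score, using the
# weights reported for Aspire. The 0-4 scale, the component scores
# and the rank cutoffs are assumptions, not Aspire's actual rubric.

WEIGHTS = {
    "principal_observation": 0.40,  # classroom observation by the principal
    "student_test_scores":   0.30,  # students' standardized test scores
    "feedback_and_school":   0.30,  # student/colleague/family feedback, schoolwide scores
}

RANKS = [(3.5, "master"), (3.0, "highly effective"),
         (2.0, "effective"), (0.0, "emerging")]  # assumed cutoffs

def composite(scores: dict) -> float:
    """Weighted average of component scores (each assumed 0-4)."""
    assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9
    return sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)

def rank(overall: float) -> str:
    return next(label for cutoff, label in RANKS if overall >= cutoff)

teacher = {"principal_observation": 3.8,
           "student_test_scores": 3.4,
           "feedback_and_school": 3.6}
overall = composite(teacher)
print(f"{overall:.2f} -> {rank(overall)}")  # 3.62 -> master
```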

Study: Teacher evaluation lifts scores

Evaluation can improve mid-career teachers’ effectiveness in math, but not reading, according to a study of Cincinnati’s Teacher Evaluation System (TES), reports Education Next.

. . . teachers are more effective at raising student achievement during the school year when they are being evaluated than they were previously, and even more effective in the years after evaluation. A student instructed by a teacher after that teacher has been through the Cincinnati evaluation will score about 11 percent of a standard deviation (4.5 percentile points for a median student) higher in math than a similar student taught by the same teacher before the teacher was evaluated.
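
That percentile conversion is just the normal curve at work; a quick check, assuming scores are roughly normally distributed:

```python
# Quick check of the quoted conversion: a gain of 0.11 standard
# deviations moves a median student (50th percentile) up by roughly
# 4.4 percentile points, close to the 4.5 cited. Assumes scores are
# approximately normally distributed.
from math import erf, sqrt

def normal_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

gain_sd = 0.11
new_percentile = 100 * normal_cdf(gain_sd)  # median student starts at 50.0
print(f"{new_percentile - 50.0:.1f} percentile points")  # ~4.4
```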

Well-designed performance evaluation “can be an effective form of teacher professional development,” conclude researchers Eric S. Taylor and John H. Tyler.

During the yearlong TES process, teachers are observed in the classroom four times, once by the principal or another administrator and three times by a “high-performing, experienced teacher who previously taught in a different school.”

The evaluation measures classroom management, instruction, content knowledge, and planning, among other topics.

After each classroom observation, peer evaluators and administrators provide written feedback to the teacher and meet with the teacher at least once to discuss the results. At the end of the evaluation school year, a final summative score in each of four domains of practice is calculated and presented to the evaluated teacher.

. . . For beginning teachers (those evaluated in their first and fourth years), a poor evaluation could result in nonrenewal of their contract, while a successful evaluation is required before receiving tenure. For tenured teachers, evaluation scores determine eligibility for some promotions or additional tenure protection, or, in the case of very low scores, placement in a peer assistance program with a small risk of termination.

Teachers who were the least effective in raising student scores before the evaluation and those who earned relatively low TES scores showed the greatest improvement. Despite the high cost — $7,500 per teacher — TES is a cost-effective way to improve student performance, the study found.

Also on Ed Next, Thomas Kane, who led the Gates Foundation’s project on measuring teaching, writes on Capturing the Dimensions of Effective Teaching.

Tennessee: Observers inflate teachers’ scores

Principals are giving high scores to low-performing teachers, concludes a Tennessee Education Department report on the state’s new evaluation system, reports the Tennessean. Principals need more training in how to evaluate teachers, the report recommends.

. . . instructors who got failing grades when measured by their students’ test scores tended to get much higher marks from principals who watched them in classrooms. State officials expected to see similar scores from both methods.

“Evaluators are telling teachers they exceed expectations in their observation feedback when in fact student outcomes paint a very different picture,” the report states.

More than 75 percent of teachers received top scores of 4 or 5 in classroom observations, but only 50 percent earned high value-added scores based on their students’ academic progress. By contrast, fewer than 2.5 percent received a 1 or 2 observation score; 16 percent were rated that low based on student progress. Teachers with a learning gains score of 1 averaged an observational score of 3.6.

Teachers can be denied tenure, or lose it, if they score 1s or 2s for two consecutive years.

. . . Half of each evaluation is based on observations. The other half comes from standardized tests and other measures of student performance.

But almost two-thirds of instructors don’t teach subjects that show up on state standardized tests, so for those teachers — including in kindergarten through second grade, and in subjects like art and foreign languages — a score is applied based on the entire school’s learning gains, which the state calls its “value-added score.”

Rather than using schoolwide scores, the state should develop other ways to measure these teachers, the report recommends. It also calls for principals to “spend less time evaluating teachers who scored well and more time with teachers who need more training,” reports the Tennessean.  “High-scoring teachers may get the chance to undergo fewer observations and to choose to use their value-added scores for 100 percent of their overall scores.”

Surveys let students grade teachers

In addition to value-added measures and classroom observations, teachers could be evaluated by their students, reports Ed Week‘s Teaching Now. At a Center for American Progress event, the Tripod student-perception survey was discussed.

Developed by Ronald Ferguson of the Achievement Gap Initiative at Harvard University in partnership with Cambridge, the Tripod surveys have been used in 3,000 classrooms across the U.S. as part of the Bill and Melinda Gates Foundation-funded Measures of Effective Teaching Project. . . . Teachers are rated on the research-based “7 C’s”—care, control (of the classroom), clarify, challenge, captivate, confer, and consolidate.

Tiffany Francis, a Pittsburgh teacher, said her second-grade students’ views were “enlightening.”  All rated her highly on “care,” but scores were lower for “control,” and on the statement, “to help us remember, my teacher talks about things we already learned.” She plans to make changes in her teaching.

The Pittsburgh Federation of Teachers supports the use of student surveys, as well as the use of value-added measures, despite heavy criticism from other union affiliates, said William Hileman, vice president of PFT. “We have to get better about instructing children,” he said.

‘Creative … motivating’ and fired

Sarah Wysocki struggled in her first year of teaching fifth grade at a Washington, D.C. middle school, but she earned excellent evaluations in her second year. Then she was fired for low value-added scores, reports the Washington Post.

A majority of her students took the fourth-grade test at a feeder school suspected of cheating. Some who’d tested as “advanced” could barely read when they started fifth grade, she said.  When their scores slipped, her value-added score took the hit. With a low score from her first year of teaching, Wysocki was out.

In classroom observations in her second year, Wysocki’s teaching won praise.

“It is a pleasure to visit a classroom in which the elements of sound teaching, motivated students and a positive learning environment are so effectively combined,” Assistant Principal Kennard Branch wrote in her May 2011 evaluation.

Branch asked her to share her ideas with her colleagues. He also praised her ability to engage parents.

After Wysocki was fired, Principal Andre Samuels wrote a glowing recommendation describing her as “enthusiastic, creative, visionary, flexible, motivating and encouraging.” She was hired immediately by a Fairfax, Virginia, elementary school, where she’s again teaching fifth grade.

Most teachers with low value-added scores also score poorly on classroom observations, says an architect of D.C.’s system for teacher evaluation. But there doesn’t seem to be a way to apply common sense when the system goes wrong.

After years of very low performance, D.C. needs to stress reading and math scores in teacher evaluations, Rick Hess writes.

In response to MetLife’s survey, which found teachers’ satisfaction has declined, he wonders who is unhappy. “If a teacher is lousy or doing lousy work, they should have lousy morale. Hopefully it’ll encourage them to leave sooner.”

Teacher observation: Imperfect, but a step forward

Evaluating teachers by watching them teach is “tricky, labor-intensive, potentially costly and subjective — but perhaps the best way to help them improve,” according to a Gates Foundation study (pdf) reported in the Los Angeles Times.

The findings highlight the importance of teacher observations, but also pinpoint why they frequently don’t work. The old way — observing a teacher once a year, or once every five years in some cases — is insufficient. And the observers, typically the school principal, frequently don’t know what to look for anyway.

But that doesn’t mean teacher observations should be tossed aside. The best way to evaluate teachers, while also helping them improve, is to use several measures — including data-based methods that rely on students’ standardized test scores, along with an updated teacher observation system, the report found.

Earlier research has looked at student surveys and value-added measures to judge teachers’ effect on students’ performance.

Using these methods to evaluate teachers is “more predictive and powerful in combination than anything we have used as a proxy in the past,” said Vicki Phillips, who directs the Gates project.

Traditionally, 98 percent of teachers are rated effective.

Researchers looked at “measures of success beyond test scores,” adds the Hechinger Report.

That is, can we know for sure that a teacher who receives a top grade on one of the more rigorous and frequent classroom observations is also going to have a classroom of students who get top grades on achievement tests at the end of the year and on other important measures, like interest and happiness in school? . . .  And are the evaluation measures, whether they are qualitative observations or quantitative test scores, accurate in labeling teachers great, ordinary, or bad?

Teachers’ observation scores correlated with their students’ results on a variety of achievement tests, the Gates study concluded.

Observation matches value-added data

Elementary teachers rated well by observers also were rated as high-performing by a value-added analysis of their students’ progress, concludes a Consortium for Chicago School Research report on Chicago’s teacher evaluation pilot. Low observation ratings also matched poor value-added data.

A similar correlation was found in several studies on Cincinnati’s teacher-evaluation system, notes Teacher Beat.

Both principals and external evaluators observed and assessed teachers’ classroom performance.

• Principals and observers gave similar numbers of lower scores, but principals gave the top rating more often than the other observers did, across all 10 of the evaluation standards. Interestingly, much of this variation disappeared when researchers controlled for the teachers’ prior evaluation scores, suggesting that principals may be drawing on background knowledge in assigning scores. While this doesn’t exactly fit the narrative of vindictive principals, it does show that who you get as an observer potentially matters.

• Most of the principals were close to the external observers in terms of how strictly they applied the evaluation standards, but there were a few outliers on both ends. Eleven percent of principals regularly rated teachers lower than the observers, while 17 percent tended to rate them higher. That’s another reason to consider more than one observer in a teacher-evaluation system.

Teachers and principals said observations lead to more meaningful discussions of how to improve teaching, but principals said they needed more training on how to help teachers analyze their evaluation.

Teacher evaluation: Not ready for prime time?

An early Race to the Top winner, Tennessee is requiring schools to evaluate teachers by value-added test scores and principal observations. The new evaluation system is complex, confusing and a huge time suck for principals, reports the New York Times.

Because there are no student test scores with which to evaluate over half of Tennessee’s teachers — kindergarten to third-grade teachers; art, music and vocational teachers — the state has created a bewildering set of assessment rules. Math specialists can be evaluated by their school’s English scores, music teachers by the school’s writing scores.

The state is tweaking rules to cut principals’ paperwork burden.  But principals complain it’s not enough.

. . .  (Principal Will) Shelton is required to have a pre-observation conference with each teacher (which takes 20 minutes), observe the teacher for a period (50 minutes), conduct a post-observation conference (20 minutes), and fill out a rubric with 19 variables and give teachers a score from 1 to 5 (40 minutes).

He must have copies of his evaluations ready for any visit by a county evaluator, who evaluates whether Mr. Shelton has properly evaluated the teachers.

Shelton must observe his 65 teachers four times a year, whether they’re his best or weakest staffers.
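
Do the math: a full cycle runs 20 + 50 + 20 + 40 = 130 minutes per teacher, so four cycles apiece for 65 teachers comes to roughly 560 hours of evaluation work a year.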

In Florida, evaluation formulas are so complex that even the math teachers can’t figure them out.

The formula—in what is called a “value-added” model—tries to determine a teacher’s effect on a student’s FCAT performance by predicting what that student should score in a given year, and then rating the teacher on whether the student hits, misses or surpasses the mark.

But (calculus teacher Orlando) Sarduy, like thousands of other Florida teachers, doesn’t even teach a subject assessed by the FCAT. So his value-added score will not come from his math teaching or his particular students. Instead, it will be tied to the FCAT reading score of his entire school in South Dade—a notion that infuriates him, even though he appreciates the level of objectivity the new system brings, and the ways it strives to isolate a teacher’s impact on student learning.
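
Stripped to its essentials, a value-added model predicts what each student should score and credits, or dings, the teacher for the average gap between actual and predicted. Here is a toy sketch; the data and the simple prediction rule are invented, and real models, Florida’s included, control for far more:

```python
# Toy value-added estimate: predict each student's score from last
# year's score, then attribute the average prediction gap to the
# teacher. The data and prediction rule below are invented for
# illustration; real state models control for many more factors.
from statistics import mean

# (prior_year_score, this_year_score) for one teacher's students
students = [(62, 70), (75, 74), (58, 66), (80, 85), (70, 69)]

def predicted(prior: float) -> float:
    # Assumed rule: a student is expected to repeat last year's
    # score plus a district-wide average gain of 3 points.
    return prior + 3.0

residuals = [actual - predicted(prior) for prior, actual in students]
value_added = mean(residuals)
print(f"value-added estimate: {value_added:+.1f} points")  # +0.8
```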

Some performance-pay experiments have rewarded teachers and support staff for improvements in the whole school, rather than trying to measure each person’s contribution. The idea is that everyone does their bit in raising those reading scores, including the music teacher and the janitor. But when the stakes are high, people want to be rated on measures they control. And it’s hard work to evaluate teachers fairly.