Test smarter

Smart testing improves quality, writes Mark Kleiman of The Reality-Based Community. He cites W. Edwards Deming’s work on statistical quality assurance, which continuously “feeds back information about processes and their outcomes to operators so the processes can be changed in real time.” In education, this would mean:

– Selecting a sample of students for high-quality, expensive testing rather than settling for the level of observation we can afford to do on every student.

– Using information about the whole range of performance rather than fixating on an arbitrary cutoff.

– Taking measurements all through the school year, not just at the end, and getting the results back to the teachers promptly.

Via Megan McArdle.

  1. I think a distinction should be made here between the different purposes of tests. High-quality testing of a sample might be useful within a school, and taking measurements all through the school year, not just at the end, and getting the results back to teachers promptly, strikes me as likely to be essential for top-quality teaching. But that’s something that can be done, and sometimes is done, within a school.

    An advantage of testing everyone at a school is that it reduces the chances of an unscrupulous school system manipulating the test sample.
    Another is that for many purposes we want to know whether a particular kid can do something (read a newspaper, meet the prerequisites for engineering school). In that case, knowing the range at which a sample of the student’s classmates performs is not sufficient, because it’s the individual we care about. I know that many of the students I went to high school with would have dropped out of engineering school, but I completed it (just, but still). A sample could not have told the engineering faculty that it was worth giving me a shot.

    Furthermore, Kleiman doesn’t mention the moral problems with looking at the whole range of performance rather than setting what he describes as an “arbitrary cutoff.” If you look at the whole range of performance, how many poorly performing students are you willing to disregard to get more performance at the top of the range? There’s no non-arbitrary answer to that one.

  2. Kleiman is absolutely right. At the macro level, we are probably a bit more prepared to use sampling and high-quality tests. NAEP is an example, as are the states that have begun to participate in international tests as individual states, in addition to our national participation.

    Measurements taken throughout the school year, which ought to be easier to implement, have been plagued by many problems. Most efforts I have been aware of attempted to replicate state tests on a smaller, short-term basis, based on some notion that kids don’t do well on state tests because they don’t “test well” on material that they know. These tests are time-consuming, and the results are not helpful in guiding instruction (because of grading time lags, the sampling of material, rigid pacing guides that have moved the class on to other material by the time the results are available, and limited knowledge about “reteaching” material that students didn’t understand the first time around). Teachers tend to fight these efforts because they are not helpful and because they fear administrative retribution if students do not do well.

    There are some key elements of Deming that we never seem to fully understand. One is that all workers in a decision-making capacity (and this would sometimes include all workers) must understand at least some elementary statistics. We assume this to be true of teachers, but I suspect that teachers other than secondary math and science teachers are no better than the general population in this area. The other is that improvement follows a process (like the scientific method) of proposing and testing a hypothesis, generally by altering a variable and measuring the change. I have known teachers who fully internalized this understanding: if A doesn’t work and Z is the reason, alter Z and measure the results.

    The other thing we don’t seem to realize about Deming is that all of this improvement work takes time. Japanese factories (and American factories that have brought in Japanese-style quality circles) give groups of workers time to tease out solutions to problems or to recommend improvements. Japanese lesson study groups seem to have brought these principles into education, but time is scheduled for this kind of work. Not only do we tend to regard teachers’ out-of-class time as “individual” work, but we don’t allow much of it. Bringing this about requires a more honest assessment of what it takes to involve teachers at the “line worker” level in interpreting data and recommending changes, as well as a differently prepared group of teachers. Not only do we need to provide a grounding in statistics; we also need to stop assuming that a teacher whose diploma is still wet is the classroom equivalent of a teacher with years of experience. A new teacher needs far more help and supervision, and should be burdened with far less decision-making, even if this requires a fair amount of “scripting,” or teaching lessons prepared by others with greater knowledge.

  3. Margo/Mom – the Direct Instruction curricula provide the sorts of tests you talk about. See http://www.specialconnections.ku.edu/~specconn/page/instruction/di/pdf/reading_feature_b.pdf for an example placement test.

  4. Michael E. Lopez says:

    As a student who once told school administrators that I didn’t want to waste my time on a standardized test, that I refused to take the standardized test they had scheduled, and that I would not only deliberately fail the standardized test if I was forced to take it and send their data points to hell since I was supposed to be one of the people raising the average, but would encourage all of my high-achieving friends to do the same…

    I can only imagine what some kid like me would feel like if singled out for testing that wasn’t being given to everyone.

    One of these days, the honors students of this country are going to realize that they really do, collectively, have school administrators by the balls.

  5. Margo/Mom- There are times when you demonstrate such insight into education that I really regret that you are not a teacher or administrator.

    I do have one quibble. I have tried to include the skills that my students are required to know for the state reading test in my classroom assessments. This works very well in reading because only a small number of challenging skills are tested. A student needs to be able to make inferences, understand figurative language, write a summary, and use information from the text to support assertions that they make about the text. I can teach the basic skills early in the year and spend the rest of the year helping students improve the quality of their work. By testing the students about once a month on these core skills, I am able to see their progress and make adjustments for individual students. While it’s true that interim tests that are similar to the state tests can turn into a form of mindless test prep, this does not have to be the case, especially if the state tests are of high quality.

  6. “Margo/Mom – the Direct Instruction curricula provide the sorts of tests you talk about. See http://www.specialconnections.ku.edu/~specconn/page/instruction/di/pdf/reading_feature_b.pdf for an example placement test.”

    Tracy: Placement tests are a beginning (teehee), but only one part of a really meaningful set of summative evaluations, which can be used to guide individual instruction but also to point up weaknesses in either the curriculum or the teaching methodology.

    Ray: I think that we have unwittingly expected far too much of annual state-administered tests. At best they can sample the material, which provides good data for what they are intended for: an accountability system that gives across-the-board information. Where we fall down is in trying to work backwards from that necessarily limited information to make individual or classroom-level decisions. While a repeated pattern of weakness (or strength) in a particular area is certainly helpful information, it may lack the detail needed to improve curriculum. And because schools have overlooked the more important issues raised by having standards (i.e., what should children be learning?) and have instead focused on shortcuts to doing well on tests, the formative testing that ought to be more helpful tends to focus primarily on the things likely to appear on the summative test. In other words, if the goal is to have built a building, and in the end all we know is that our building won’t stand, we don’t know whether we need to work on better bricks, more I-beams, fewer windows, stronger mortar, or revising the building schedule to better sequence the assembly.

  7. oops–in response to Tracy, I meant to say formative assessments, not summative.

  8. Margo/Mom, sorry, I got interrupted while writing my response and forgot to provide a fuller set of links. The Direct Instruction lessons build in those sorts of formative assessments right the way through. See for example http://www.specialconnections.ku.edu/~specconn/page/instruction/di/pdf/reading_sample_lesson_a.pdf; at the end there’s a brief reading test passage for the kids. And see http://www.specialconnections.ku.edu/cgi-bin/cgiwrap/specconn/main.php?cat=instruction&section=main&subsection=di/reading, which is where I got these links.

  9. And then there are the AP courses in schools where everyone can sign up but nobody can leave. It doesn’t matter how far over their heads the material is.

    So the principal instructs the teacher to teach at the students’ level, and the College Board (which audits the AP courses and administers the AP tests) insists that the material be taught at the AP level. And the kids, who don’t really want to be there anyway, certainly aren’t interested in spending extra time in the classroom catching up.

  10. Argh. That comment should have gone with the AP juggernaut article. Sorry.

  11. Kirk Parker says:


    Hostility much? Wow…

  12. Ray:

    Yesterday I omitted my thanks for the compliment that you paid me. What you said is perhaps the most appreciative thing that anyone related to education has ever said to me as a parent.

    Thanks so much and enjoy your holiday!


  1. […] This post was mentioned on Twitter by kriley19 and PostRank – Education, JoanneLeeJacobs. JoanneLeeJacobs said: Test smarter http://www.joannejacobs.com/2009/12/test-smarter/ […]