Brave new tests

Measures That Matter by Achieve and Education Trust sets out a “new set of basics” — standards, course requirements, curriculum and teacher support materials, aligned assessments, and an information/accountability system — that will prepare high school graduates for college and careers.

‘Open source’ testing could satisfy those who want local measures of achievement linked to national standards, writes Charles Barone in Education Week. He envisions a national data bank of test items based on the National Assessment of Educational Progress and the Program for International Student Assessment.

Over the course of one or two years, the panel would create a pool of test items that would be piloted and subjected to the usual analyses of psychometric rigor. The goal would be to move beyond multiple-choice items to short-answer, problem-solving, essay, and other formats.

States and districts could pick items from the bank to develop local assessments. But comparability would remain an issue, I think.

Computer-adaptive testing is gaining in popularity, reports Education Week. These tests respond to correct answers by asking harder questions; wrong answers prompt easier questions. Tests are shorter because high achievers don’t waste time with questions that are too easy and low achievers don’t face questions that are too hard. Teachers can pinpoint each student’s achievement level. And teachers get the results instantly, so they can use the information to adapt instruction.
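The adaptive loop described above can be sketched in a few lines. This is a generic illustration of the idea — step difficulty up after a correct answer, down after a wrong one — not any state's or vendor's actual algorithm, and real systems use statistical item-response models rather than a simple ladder:

```python
def run_adaptive_test(answer_item, num_items=10, start_difficulty=5,
                      min_difficulty=1, max_difficulty=10):
    """Administer num_items questions, adjusting difficulty each time.

    answer_item(difficulty) -> bool is a stand-in for the student:
    it returns True if the student answers an item at that
    difficulty correctly.
    """
    difficulty = start_difficulty
    history = []
    for _ in range(num_items):
        correct = answer_item(difficulty)
        history.append((difficulty, correct))
        if correct:
            # Right answer: ask a harder question next.
            difficulty = min(max_difficulty, difficulty + 1)
        else:
            # Wrong answer: back off to an easier question.
            difficulty = max(min_difficulty, difficulty - 1)
    return difficulty, history

# A hypothetical student who can handle anything up to difficulty 7:
final, history = run_adaptive_test(lambda d: d <= 7)
```

The test quickly homes in on the student's level — here it oscillates around difficulty 7–8 — which is why adaptive tests can be shorter: no time is spent on items far above or below the student.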

Only Oregon uses computer-adaptive testing as its accountability test, though Utah is considering it.


Comments

  1. “Because NCLB requires that students be tested on grade level, most computer-adaptive tests, which may present students with questions above or below grade level depending on how they answer previous questions, are not allowed for accountability purposes.”

    Trust the bureaucrats to prevent schools from giving an assessment that actually provides more useful information about their students…

  2. In defense of bureaucrats–you don’t set the metric based on what’s available–you set the metric based on what you are trying to measure. NCLB measures the ability of schools to get students to a specified level of proficiency.

    No test is going to be a crystal ball–providing appropriate information for all situations. The article goes on to say that the computer adaptive tests are also less specific than some other kinds of tests when it comes to providing diagnostic information. All that aside–it does appear that the computer-adaptive tests can be developed within a tested grade-level (as experienced by Oregon, and perhaps in the near future by Utah). In Oregon they also allow kids to take the tests up to three times during the year–keeping the best score, which is a nice sort of add-on incentive to keep on learning to a higher level. It also provides instant results. I am guessing that one drawback is the limitation on any of the open-ended, essay types of questions (although I have seen some of these–mystifying!)

  3. Mark Roulo says:

    In defense of bureaucrats–you don’t set the metric based on what’s available–you set the metric based on what you are trying to measure. NCLB measures the ability of schools to get students to a specified level of proficiency.

    The *BIG* problem with this sort of test is that it totally fails to measure improvement in children who start the year off substantially below grade level [a slightly smaller problem is that it also doesn’t measure improvement in children who start the year off substantially above grade level].

    Consider … a child entering 5th grade who currently tests at 2nd grade in some skill (let’s say math). At the end of the 5th grade, the child *will* be tested on 5th grade math skills. So … what is the reasonable thing for the teacher to do? Imagine a super teacher who can actually teach this child 1½ – 2 years of math in one year. If the teacher does this, at the end of the year, the child will know *maybe* 4th grade math … which is great given where we start. But the kid will then proceed to mostly flunk the test on 5th grade math content … the child wasn’t taught any of this, after all.

    Instead, the teacher might try to cram enough 5th grade math so that the child can pass the test. The kid won’t actually *learn* any math that will be retained, but he/she might do better on the test.

    The teacher that teaches the 1½ – 2 years of math looks worse than the teacher that crams for the specific test.

    This suggests that only testing for one year’s content is a *huge* mistake, because:
    (a) You can’t see progress in kids who are far enough behind, and
    (b) Because of (a), you encourage “teaching to the test”

    [I’ll note in passing that the kids who come in to a class above grade level will have similar experiences with the 5th-grade-only test, just in reverse. But I’m not sure very many people care about these kids compared to the ones who are failing.]

    Another bad side-effect of this grade-level-only test is that another rational response from the teacher is to give up on the kids that are “too far” behind. They can’t be brought up to speed fast enough to do well on this year’s test, so just write them off and focus your efforts on the kids who are a bit behind, but who might catch up.

    The basic effect of a grade-level-only test is that it rewards the following behavior:
    *) Don’t spend much time on the smart/fast kids. They will “pass” the test anyway.
    *) Don’t spend much time on the kids who are very far behind. They will fail the test pretty much no matter what (NOTE: They’ll fail *this* test even if they learn a lot … because the test is poorly constructed).
    *) Focus most of your attention on the kids who are at grade level or very slightly below. Maybe shade things to spend a bit more time on the kids who are below than the kids who are doing okay but not great.

    Stated this baldly, I’m not sure anyone would agree to the scheme. Write off the gifted kids and the ones who are *REALLY* struggling. Really?

    But this seems to be what we’ve got because of the testing regime.

    Sigh.

    I know that an adaptive computer-driven test is harder to give than one on paper. But … you *could* give the struggling 5th grade kids a 4th grade test and credit the school (and teacher!) if the kids who were only testing at the 2nd grade the year before made those two years of progress. Same thing for the accelerated students — give them the 6th grade test (or 7th or 8th …).

    -Mark Roulo

  4. Mrs. Davis says:

    Consider … a child entering 5th grade who currently tests at 2nd grade in some skill (let’s say math). At the end of the 5th grade, the child *will* be tested on 5th grade math skills. So … what is the reasonable thing for the teacher to do?

    To retain the child in fifth grade until he can perform adequately at the fifth grade level, just as the 3rd grade teacher should have done when the child failed to master the 3rd grade material in third grade. Or better yet, return him to the fourth grade on day one if he isn’t ready for fifth grade.

  5. The *BIG* problem with this sort of test is that it totally fails to measure improvement in children who start the year off substantially below grade level

    And the assumption built into that objection is the concept of grade level, which is a necessary convenience in organizations that can’t measure skill/knowledge acquisition cheaply enough to use that as a means of discriminating among kids. You use the relatively crude division by age and, to a lesser extent, by attainment, but it’s the age discrimination that strongly recommends itself because it’s so easy to be certain of age and a hell of a lot tougher to be certain of educational attainment.

    That all changes if you’ve got inexpensive, convenient testing technology, but you have to throw away the idea of age-related grade levels as a marker for the acquisition of certain sets of skills and knowledge.

    While there’s obviously some relationship between age and attainment, the limitations of the current technology make age the driving factor and educational attainment effectively a function of age.

    With a sophisticated testing system that’s easy to access, easy to use, inexpensive and allows for a fine-grained appraisal of learning progress, age takes on a more appropriate degree of importance.

    But an individualized “speedometer” for learning, a tool that would indicate in a much closer proximity to real time what learning is taking place carries some very subversive possibilities with it.

    One very attractive element is that the burden of testing is simply lifted off the teacher but also off the school. The benefit of the knowledge is accessible without anywhere near the burden that current testing demands. The scut work of teaching, the developing, administering, grading and recording of tests, simply vanishes. To that extent teaching becomes less an accounting function and more a teaching function.

    But with a flood of timely information about the educational attainments of the kids comes, by inference, information about the professional skills of the teachers and the principals. A good thing if you’re a good teacher or principal, not such a good thing if you’re not.

    Another worthwhile outcome of such a testing system would be to point up the superfluousness of the district administration to education.

    With a mechanism to quickly and easily measure the efficacy of individual schools, and highlight the laggards, the need for a central administration to shepherd the schools toward worthwhile educational goals simply evaporates.

    The lousy school proclaims its lousiness by its poor showing and the value of district busybodies, scurrying around urging laggard schools to mend their ways, is made clear when compared to the reaction of the parents of the kids in those lousy schools.

    Within a district, though, a school’s educational value may or may not be important, but there are public schools in which the value of the education they serve up is a distinctly greater concern: charters.

    Right now they can do alright by simply being less awful than the district school, but before too very long that won’t be good enough. Parents will demand more from a school than just a reasonable assurance that their kid won’t be shot. As the number of charters burgeons, the demands made on them will increase and a cheap, effective, widely-recognized testing system will proclaim the good schools for what they are and build a fire under the not-so-good schools.

    I think there’s an inevitability to the idea and it won’t be long before we see this kind of testing utility start to be pulled into existence by schools that have a need to identify themselves as good schools. The technology’s been around for a while but the conditions necessary to generate the demand are just starting to come into existence.

  6. See–I don’t see that the BIG problem is that the fifth grade test can’t adequately measure the progress of kids who got there three years behind. I see the BIG problem as the fact that kids are getting to fifth grade three years behind.

    Sigh

  7. Mark Roulo says:

    See–I don’t see that the BIG problem is that the fifth grade test can’t adequately measure the progress of kids who got there three years behind.

    My full sentence started with, “The *BIG* problem with this sort of test …”

    I, too, see a problem with kids in 5th grade with only 2nd grade skills, but my post was about the tests 🙂

    -Mark Roulo

  8. As someone who actually knows the test system in Oregon (what? We’re the only ones with the computer-adaptive tests?), let me say that there is a place for the writing component. The writing test is paper and pencil, not computer-generated, and students must write to specific prompts. Our windows are limited for the writing test as well.

    The current vendor is much better than the former vendor. The former test went against all the good test-taking strategies a good teacher develops in kids–skip the questions you don’t know, then go back, or change the answer if you go ahead and remember the correct answer later. Once a question was skipped, or answered, you couldn’t change it. The current system allows students to go back and change answers or skip a question and go back and answer it later. There were also problems with reviewing reading passages in the older version that we don’t have now.

    Some districts and schools use the three allowed test attempts as a progress measure–their students take the test in the fall, then perhaps again in the winter, then again in the spring. Sounds good for progress measurement? Well, it’s one of those things that sounds better than it is.

    The computerized tests are very time-consuming, and this means more instructional time lost to testing. As it is, we estimate that the overall time lost in the spring (when my school tests) is close to three to four weeks. We encourage the students to take the tests three times, to shoot for their highest scores. By the time you figure in dragging the entire class to the computer lab (with budget cuts, you don’t have enough aides to help you split the class, plus current rules require that aides be supervised by certified staff at all times, therefore a teacher or administrator has to be present during testing as well), accommodating for those students who need extra time/sped accommodations, etc, etc, etc–that’s a big chunk of time lost that could have been spent on teaching instead. The old-timers at the school feel that we’ve lost quite a bit of valuable instructional time in the spring to testing.

    What I notice is that the kids visibly become fatigued and burned out on school during the testing period. There’s a lot of tension around the tests, especially since the kids all know the cut scores and if they don’t make the grade-level benchmarks, many of them get discouraged. My sped students used to be able to challenge down to their performance level. They still didn’t make grade-level assessment, but they had a measurable mark by which they could determine that they were making progress (last year you were performing at the fourth grade level and made benchmark, this year you’re performing at the sixth grade level and making benchmark. Great progress!). Now, they just get the message that they’ve failed.

    Conversely, TAG kids can’t challenge up, and get rather bored with the whole process. Neither group is adequately served.

  9. Lightly Seasoned says:

    Assessment isn’t “scut work.” It is my feedback loop.