The tests that can be computer scored

Over at the Curriculum Matters blog, Erik Robelen has a link-filled post about machine-scoring of essay tests entitled “Man vs. Computer: Who wins the essay-scoring challenge?”

It seems there was a study.

“The results demonstrated that overall, automated essay scoring was capable of producing scores similar to human scores for extended-response writing items with equal performance for both source-based and traditional writing genre,” says the study, co-authored by Mark Shermis, the dean of the University of Akron’s college of education, and Ben Hamner of Kaggle, a private firm that provides a platform for predictive modeling and analytics competitions.

There’s something odd going on here.

Barbara Chow, the education program director at Hewlett, said in the press release that she believes the results will encourage states to include a greater dose of writing in their state assessments.

And she believes this is good for education.

“The more we can use essays to assess what students have learned,” she said, “the greater likelihood they’ll master important academic content, critical thinking, and effective communication.”

Even if we grant that assessments help in mastering what they measure — something that I don’t think is clear in the absence of grade-like motivation — I can imagine that a school could spend the entire day doing nothing but assessing what’s learned through essays, and never actually get around to teaching anyone anything at all.  But let’s put aside the fact that the last quoted sentence is a blatant falsehood and focus on something else entirely.

The prevailing thought seems to run along the following lines: The test is a good test.  The fact that its essays receive the same scores from human readers as from computer evaluation makes it better, because a machine-scorable essay is cheaper and easier to deploy as an assessment.

But here’s another view: the fact that a machine can score your essays just as well as your human readers suggests that your human readers aren’t really doing a good job of reading the essays in the first place.  It suggests that putting those essays on your test, and grading them the way you do, is an utter waste of time, money, and effort.  The fact that you’re able to waste that time more cheaply by using a computer doesn’t transform it into a worthwhile activity.

If I were a testing agency, and a study established that a computer program could grade my essays as well as my human graders, I’d be embarrassed, because it would now be public knowledge, proved by social science, that my essay tests weren’t really being read for substance and content all along, but were instead being assessed through some sort of cheap, easy algorithmic rubric, whether by design or (less likely) through the laziness of my graders.  Of course, I don’t think anyone in the test industry has any intention of denying that students’ essays are assessed through a cheap, easy algorithmic rubric.  They’re issuing press releases.
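(For the curious: a “cheap, easy algorithmic rubric” doesn’t have to be anything fancier than the little Python sketch below, which scores an essay purely on surface features like length and vocabulary variety. Everything in it is hypothetical, made up for illustration; it is not drawn from the study or from any real scoring engine. The point is only how little actual reading such a rubric requires.)

```python
# Hypothetical illustration only: a crude "rubric" that scores an essay
# from surface features (length, average word length, vocabulary variety).
# It reads nothing for substance, and the weights are arbitrary, not fitted
# to any data or to any real scoring system.

import re

def surface_score(essay: str, max_points: int = 6) -> int:
    words = re.findall(r"[A-Za-z']+", essay.lower())
    if not words:
        return 0
    n_words = len(words)
    avg_word_len = sum(len(w) for w in words) / n_words
    type_token_ratio = len(set(words)) / n_words  # vocabulary variety

    # Arbitrary weights, chosen only so the output lands on a 0-6 scale.
    raw = 0.002 * n_words + 0.3 * avg_word_len + 2.0 * type_token_ratio
    return max(0, min(max_points, round(raw)))

print(surface_score("The quick brown fox jumps over the lazy dog."))  # prints 3
```

A scorer like this rewards verbosity and word variety, not thinking, which is exactly the worry.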

The fact that this is a selling point for the test-makers is all the proof I need that there are large chunks of the education establishment in this country that have no real interest in actually educating people to “think critically” or “communicate effectively”.

In fact, you might even think that learning these skills would require practice communicating with someone who thinks.   But what do I know?