Better tests graded by Watson?

Forget multiple-choice tests: The Watson computer technology could grade students’ writing quickly and cheaply, says Stanley S. Litow, president of the IBM International Foundation, in an interview with The Chronicle of Philanthropy.

. . . you could have long-answer questions, you could have the ability to grade lengthy paragraphs of information. If the testing system incorporates that, it will allow teachers to test to higher standards and children to learn at higher levels. And it will save lots of money in what is currently a very ineffective and inefficient testing and assessment system.

Computer grading of essays has been around for years, but critics doubt its accuracy.

The Educational Testing Service claims its E-Rater program accurately assessed the writing of freshmen at the New Jersey Institute of Technology, reports USA Today. ETS presented the “validity test” at a writing conference at George Mason University in February. E-Rater’s scores matched human graders’ assessments and the students’ SAT writing scores.

But a writing scholar at the Massachusetts Institute of Technology presented research questioning the ETS findings, arguing that the testing service’s formula for automated essay grading favors verbosity over originality. Further, the critique suggested that ETS got good results only because it tested short-answer essays written under tight time limits; an ETS official admitted that the testing service has not conducted any validity studies on longer-form, longer-timed writing.

E-Rater has changed the behavior of NJIT students, said Andrew Klobucar, assistant professor of humanities. “First-year students are willing to revise essays multiple times when they are reviewed through the automated system, and in fact have come to embrace revision if it does not involve turning in papers to live instructors.”

  1. Cranberry says:

    This is the sort of thing one supports for Other People’s Children.

  2. Watson’s performance on Jeopardy was striking. It seems to us that the exciting direction for this technology should be in developing a new generation of learning tools with a much higher degree of interaction between the software and the student. Marking essays? Perhaps, but we may be setting our sights too low.

  3. You know, there are millions of people who make a living by writing what computers (e.g., Google) identify as “on topic, well written” essays.

    Of course, you and I call them spammers.

    The fact is, a computer CAN’T tell the difference between grammatically correct yet keyword-ridden gibberish (e.g., “The Constitution of the United States is a document that helpfully helps the American People know what is constitutionally right and what is not”) and actual good writing.

    On the other hand, if we want to prepare our children for their space in the spamming empires of the future, computer graded tests seem like a great idea!

  4. Michael E. Lopez says:

    I’m with DM on this one: machines have no place in essay grading because… I know this is a controversial statement… computers can’t read.

    Until we get something that passes a Turing Test, I don’t want to hear about computer-graded essays. Those who advocate such grading are not heralds of efficiency, but champions of sloth.

    What’s the purpose of an essay, anyway? Discussions of computer-grading always make it sound like the point of an essay is to demonstrate knowledge. But that’s not really the point, is it? The point of an essay is to express a thought (or perhaps some plurality of thoughts), for the writer to convey an idea or ideas to an audience. How do we put computers in charge of deciding how interesting an idea is, or how eloquently the idea is stated, if they (the machines) can’t even have an idea?

  5. DM is confusing spam filters based on Bayesian statistics with the knowledge-based system called Watson. These things aren’t even comparable, either in capability or purpose.