Stupid Lexiles

“Lexiles,” used to measure a text’s complexity, are “transforming the way American schools teach reading,” concludes Blaine Greteman in New Republic. And not for the better.

Here’s a pop quiz: according to the measurements used in the new Common Core Standards, which of these books would be complex enough for a ninth grader?

a. Huckleberry Finn
b. To Kill a Mockingbird
c. Jane Eyre
d. Sports Illustrated for Kids’ Awesome Athletes!

The only correct answer is “d,” since all the others have a “Lexile” score so low that they are deemed most appropriate for fourth, fifth, or sixth graders. This idea might seem ridiculous, but it’s based on a metric that is transforming the way American schools teach reading.

Lexiles were developed in the 1980s by the MetaMetrics corporation, writes Greteman, an English professor. A proprietary algorithm analyzes sentence length and vocabulary to assign a “Lexile” score from 0 to 1,600.

Common Core State Standards use Lexiles to “determine what books are appropriate for students in each grade level,” writes Greteman. More than 200 publishers “now submit their books for measurement, and various apps and websites match students precisely to books on their personal Lexile level.”

Kurt Vonnegut’s Slaughterhouse-Five scores 870, a fourth-grade read, he writes. Mr. Popper’s Penguins (910) is deemed more complex.

Flannery O’Connor’s Collected Stories are rated at the sixth-grade level. Raymond Carver’s Cathedral “scores a puny 590, about the same as Curious George Gets a Medal.”

To be fair, both the creators of the Common Core and MetaMetrics admit these standards can’t stand as the final measure of complexity. As the Common Core Standards Initiative officially puts it, “until widely available quantitative tools can better account for factors recognized as making such texts challenging, including multiple levels of meaning and mature themes, preference should likely be given to qualitative measures of text complexity when evaluating narrative fiction intended for students in grade 6 and above.” But even here, the final goal is a more complex algorithm; qualitative measurement fills in as a flawed stopgap.

The ability to read complex texts strongly predicts college success, according to Common Core’s Appendix A. It calls for using qualitative and quantitative measures of a text’s complexity, along with “reader and task considerations.” However, if every text comes with a Lexile score, it will be hard for teachers to ignore.


  1. This reminds me of two things:

    1. The time a trade publication attempted to use an ease-of-reading score for its articles. They wanted a low score, meaning the articles could be easily understood by somebody with limited education. The problem was all the articles came back too high. It turned out there was no way to write these articles without using words like hydrofluoric, which skewed the results.

    2. Nick Hornby’s review of “Enemies of Promise.” The critic Cyril Connolly uses the book to divide authors into two categories, Mandarin and Vernacular, with a clear preference for the Mandarin. But Hornby cites Donald Barthelme’s “I Bought a Little City” as a Vernacular piece that defies Connolly’s definition.

  2. It’s measuring text complexity at the lexical level (thus “lexile”): structure and word choice.

    Thus Hemingway gets a nice low score – his writing isn’t florid or full – mostly – of words outside the core language.

    It’s not measuring the complexity of the ideas, plot, or anything else, or their age-appropriateness in terms of their actual idea-content.

    Much of the “surprise” about the scores comes from confusing the two metrics, which the makers of the scores underscore.

    (And me, I’ve never seen the appeal of Vonnegut. At all.

    If they want to “challenge” readers, they might skip straight to P.K. Dick, who was better at it.)

  3. Aster Aardvark’s Alphabet Adventures, by Steven Kellogg, earns an AD1400L (AD for “adult directed”). From its Amazon listing, it looks like a fun alphabet book for grades 1-3.

    Jane Eyre earns an 890L. It is listed as a Common Core exemplar for 11th grade in the CC appendix B. Pride and Prejudice, also listed as a grade 11 CCR “text exemplar,” seems to have a Lexile rating of 1090, which would mean that it, too, would be Not Appropriate for 11th graders.

    Plays and poetry are not assigned Lexile ratings. I assume the ratings given to certain editions are actually rating the introductions or critical essays included in the volumes.

    I do agree that students should be assigned appropriate texts. One of our children was assigned a text appropriate for 3rd grade (by accurate Lexile rating) in 6th grade.

    I am afraid that Lexile frameworks will be used without good judgement or common sense to overrule teachers’ decisions. That would be a shame.

  4. wahoofive says:

    Americans love having a single number to analyze everything. There’s nothing wrong with a numeric analysis of linguistic complexity, as long as it’s not the only tool you use to determine the suitability of texts for various grade levels. Otherwise, it’s like deciding what car to buy exclusively on the MPG rating.

    Slaughterhouse-Five, for example, uses short sentences and a limited vocabulary, but its main topic is the Allied bombing of Dresden near the end of WWII. Regardless of the difficulty of the reading level, how many fourth-graders are ready to discuss that topic? How many even know when WWII happened, or who fought in it?

    • Richard Aubrey says:

      If I were inclined to slightly more snarkiness than usual, I’d address your question to new teachers.
      However, you have to start someplace and if the book is assigned, perhaps a couple of hours’ WW II instruction would be in order. From somebody.
      Might interrupt the Holocaust/internment/Pearl Harbor/the Bomb narrative, though.

  5. We need data to support every gesture, but the assumptions of the Controllers do not need to be based in research.

  6. Lexile’s measure of complexity, according to its site, is an automated algorithm based on word frequency and sentence length. But sentence length is only weakly correlated with sentence complexity, and only certain types of complexity pose a burden to readers. Compare:

    Whether I shall turn out to be the hero of my own life, or whether that station will be held by anybody else, these pages must show.
    (The opening line of David Copperfield)


    If you really want to hear about it, the first thing you’ll probably want to know is where I was born, and what my lousy childhood was like, and how my parents were occupied and all before they had me, and all that David Copperfield kind of crap, but I don’t feel like going into it, if you want to know the truth.

    (The opening line of Catcher in the Rye).

    The first sentence requires readers to hold two embedded clauses and a noun phrase in working memory before they get to the verb that applies to them (“show”). The second sentence, substantially longer, requires much less concentration.
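To make the comparison concrete, here is a minimal sketch of what a length-only measure sees in these two opening lines. The `word_count` helper is made up for illustration; it is not Lexile's tokenizer, and it only counts word tokens.

```python
import re

def word_count(sentence: str) -> int:
    """Count word tokens: runs of letters, allowing internal apostrophes."""
    return len(re.findall(r"[A-Za-z]+(?:['’][A-Za-z]+)*", sentence))

copperfield = (
    "Whether I shall turn out to be the hero of my own life, or whether "
    "that station will be held by anybody else, these pages must show."
)
catcher = (
    "If you really want to hear about it, the first thing you’ll probably "
    "want to know is where I was born, and what my lousy childhood was like, "
    "and how my parents were occupied and all before they had me, and all "
    "that David Copperfield kind of crap, but I don’t feel like going into "
    "it, if you want to know the truth."
)

# A length-only proxy ranks the Salinger sentence as far "harder",
# even though the Dickens sentence asks more of working memory.
for name, sentence in [("Copperfield", copperfield), ("Catcher", catcher)]:
    print(name, word_count(sentence))
```

Nothing in the count reflects where the verb falls or how many clauses have to be held open before it arrives.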

    • Mark Roulo says:

      “But sentence length is only weakly correlated with sentence complexity, and only certain types of complexity pose a burden to readers.”


      Do you have a source for this claim? The linguistics community seems to use length as a proxy for complexity (with the understanding that this only holds on average). You have found an exception, but one exception to a general trend does not invalidate the trend.

      • Hi, Mark,

        I am a member of the linguistics community, a syntactician by training.

        It’s well known among linguists that structure, not length, is the primary determinant of sentence complexity. If you search for articles relating to “complexity,” “working memory” and phenomena like “left branching” vs. “right branching”; “center-embedding,” and “long-distance dependencies,” you will begin to see that there’s a lot more going on than length, or than “one exception to a general trend.”

        The fact that X is used as a proxy for Y does not entail that X captures the essence of Y. Consider, for example, free school meals used as a proxy for a school’s SES demographics.

        In the case of sentence complexity, the phenomena in question are still too complex for most programmers of automated algorithms to handle.

        • Mark Roulo says:

          “It’s well known among linguists that structure, not length, is the primary determinant of sentence complexity … The fact that X is used as a proxy for Y does not entail that X captures the essence of Y. Consider, for example, free school meals used as a proxy for a school’s SES demographics.

          In the case of sentence complexity, the phenomena in question are still too complex for most programmers of automated algorithms to handle.”


          Actually, your second paragraph explains my interest. If X is a “good enough” proxy for Y and I can’t capture Y directly, then my inclination is to go with X because I don’t have anything better. Is there something better?


          Sidenote: And what I actually do myself when I have the time is to track lexical complexity and sentence length (as a proxy for grammatical complexity … because I don’t have anything better) independently. I don’t like that Lexile mixes the two (using some arbitrary weighting). I like Hayes’ LEX for lexical complexity. Are you familiar with it? It doesn’t seem to have much traction, but I don’t know why. Is there something better?

    • Richard Aubrey says:

      Strikes me that reading the Copperfield sentence would reward going back over it once. Not that it’s impossible even for kids. But then you/they would have it. And it would be learning to learn.
      The Caulfield sentence is just booring.
      Both, however, strike a mood.
      Pick the one that would carry you to the next sentence.

      • cranberry says:

        I think students should read both.

        • Richard Aubrey says:

          I read Catcher. Boorring. No lessons to learn, unless you’re thinking of taking up whining as a career, with a rubber room for a residence as a goal.
          I’d just as soon see pretty much any of Heinlein’s juvies as a coming-of-age novel, or Rosemary Sutcliff’s YA stuff. Hell, even most of Andre Norton’s pre-Witch World work, even the potboilers, would be better.

  7. cranberry says:

    I wonder if the Lexiles include word length in the formula. That could explain some of the puzzling results.

    There are many short words which are not basic. For example, dearth vs. earth, or deft vs. expert. Complexity of vocabulary should not be measured by word length. It would tend to tip the balance toward prose which uses more Latinate terms, rather than words arising from English roots.

    • Mark Roulo says:

      “I wonder if the Lexiles include word length in the formula.”


      It *seems* that Lexile uses how common a word is rather than word length when deciding on the vocabulary complexity part of the measure.


      “That could explain some of the puzzling results.”


      Lexile seems to weight sentence length more highly than vocabulary difficulty. My candidates for explaining “strange” results include:
          a) Some “hard” words are more common than we think and thus score as easier than we expect.
          b) Some “easy” words are rarer than we think and thus score harder than we expect 🙂
          c) Slight changes to punctuation can change average sentence length a lot. My favorite example of this is that Baum uses a lot of semi-colons in the Oz books … this causes the Oz books to have much higher Lexile scores than they would if the semi-colons were periods.
          d) Fiction often scores lower than you might guess because fiction can include lots of dialog. Sentences in dialog tend to be shorter than exposition and thus skew the average sentence length lower.
          e) Adults tend to “score” the book holistically … so simple grammar and vocabulary with complicated other stuff (allusion, flashbacks, whatever) get “mooshed” together when the adult is scoring a book. Lexile doesn’t do that because Lexile doesn’t understand anything. Faulkner’s “The Sound and the Fury” thus gets a Lexile score of 870L … which may be correct for vocabulary and grammar, but is very much not correct for the book as a whole.
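Point (c) is easy to see with a toy calculation. The sketch below is not Lexile's algorithm, which is proprietary; `mean_sentence_length` is a hypothetical helper and the sample text is invented rather than quoted from Baum. It only shows how much the average moves when semicolons are or are not counted as sentence breaks.

```python
import re

def mean_sentence_length(text: str, terminators: str = ".!?") -> float:
    """Average words per sentence, splitting on the given terminator characters."""
    pieces = re.split("[" + re.escape(terminators) + "]+", text)
    sentences = [p.split() for p in pieces if p.strip()]
    return sum(len(s) for s in sentences) / len(sentences)

# Made-up sample text, not a quotation from any book.
sample = (
    "The road was long; the sky was grey; the wind blew hard. "
    "She walked on; nobody followed her."
)

as_written = mean_sentence_length(sample)          # semicolons do not end a sentence
as_periods = mean_sentence_length(sample, ".!?;")  # semicolons treated like periods

print(as_written, as_periods)
```

The words are identical in both runs; only the punctuation convention changes, and the average sentence length drops by more than half.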

    • Mark Roulo says:

      Reply #2 …


      There are a few other things that we should all keep in mind here:


          A) There is no tool so good that it cannot be mis-used. I do think that tracking vocabulary difficulty and sentence length is worth doing … but not to eliminate books that are “too easy.” Rather, I think what is important is to make sure that “enough” books that are a bit hard get read. The problem isn’t going to be that “The Grapes of Wrath” shouldn’t be read by high school students because it is too easy. The problem will be when *nothing* much more difficult (in vocabulary and grammatical complexity) gets read in high school. The use of Lexile to *exclude* books because they are too easy is a mistake.


          B) I also think that sentence length and vocabulary difficulty should be tracked separately (and that’s what I do for my own child). They correlate, but not enough (in my mind) that you can blend them together.


          C) We really should also track “reading for literature” and “reading for skill development” separately. It is reasonable to read “The Grapes of Wrath” in high school for its literary value. But you also want some sort of ramp to get you to “The Iliad” and more difficult non-fiction works. Lexile won’t help with the literary value axis, but can provide useful information for the “skill” axis.


          D) Even vocabulary difficulty/rarity is context dependent. The first few books with lots of new/rare words may be difficult. But then the rare words are not rare for THAT PARTICULAR READER. If we *really* care, we also want to track domains/genres to ensure that our students are exposed to a broad assortment/range of vocabulary, not just repeating the same Lexile XXX words again and again.


      But note that all the “better” schemes are more difficult to do! A better scheme that isn’t practical doesn’t win 🙂 So we have Lexile (and Flesch Reading Ease and others) that are fairly easy/mechanical to score and aren’t too bad on average, but don’t really capture what we want.
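For anyone who wants to try point (B), here is a rough sketch of tracking the two measures separately instead of blending them into one number. Everything in it is an assumption for illustration: `COMMON_WORDS` is a tiny made-up stand-in for a real word-frequency list, `profile` is a hypothetical helper, and neither reflects how Lexile or Hayes' LEX actually scores text.

```python
import re

# Hypothetical stand-in for a real word-frequency list; far too small for actual use.
COMMON_WORDS = {
    "the", "a", "of", "and", "was", "is", "in", "to", "it",
    "he", "she", "dog", "ran", "home", "big",
}

def profile(text, common_words=COMMON_WORDS):
    """Return (mean words per sentence, share of words outside the common list)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[a-z']+", text.lower())
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    mean_len = len(words) / len(sentences)
    rare_share = sum(w not in common_words for w in words) / len(words)
    return mean_len, rare_share

# Invented samples: one long sentence of common words, two short ones of rarer words.
easy_long = "The dog ran home and the dog ran home and the dog ran home."
hard_short = "Dearth prevailed. Deft hands toiled."

print(profile(easy_long))
print(profile(hard_short))
```

Kept apart, the two numbers show that the first sample is long but easy and the second is short but hard, which a single blended score would hide.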