The data comes from a test I built that measures receptive vocabulary — the number of words a person recognizes (but may not necessarily use). It places everyone — from a student who has just started learning English to an educated native speaker — on the same scale. The units are word families (so limit, limited, and limitless count as a single unit). Users self-reported their CEFR levels.

It’s striking to see how much one has to learn to progress from level to level and potentially reach the native range.

Posted by RevolutionaryLove134

34 Comments

  1. I feel like part of the spread has to do with the original language of the user.

    Someone who natively speaks a Germanic or Latin language is probably going to know quite a lot of Germanic or Latin words, respectively, although their overall grasp of the language might not be great. Conversely, someone from an unrelated language might need to have studied for a long time to match the vocab depth, but would have a much better grasp of other areas.

  2. Took the test. It was really interesting. A few times it made me question my sanity because of the fake words.

    It correctly identified me as a native speaker.

  3. Phew, I have native level English!

    Nice test – will it be available in other languages?

  4. Cool test and data! One observation: the output word count from the test is unreadable when on dark mode (Android, Firefox). The dark blue text is almost the same as the dark grey background

  5. Few-Interview-1996 on

    Re: Your test. Yes, I do know the meaning of the word enceinte. It just doesn’t happen to be English. :p

  6. Can you fix the German test? It always freezes on the last word and I desperately need to know how bad I am at German.

    Also thank you, lots of fun!

  7. I’m glad I scored above the median (?) native speaker, because I’m pretty sure I’d do a lot worse in my native language

  8. The test is really well made. I’m C1 it seems. There are so many words that I’ve read and heard countless times, but don’t know the exact meaning of. For example, I will typically understand a sentence with words like “embellish” or “egregious” in it without really knowing the word, and so I don’t bother looking it up. Maybe I should bother.

  9. turb0_encapsulator on

    Interesting. I am honestly surprised that the distribution curve isn’t wider for native speakers. Perhaps that means it isn’t so hard to raise someone’s reading level. I am at the 90th percentile despite only knowing 23.5% more words than the average person.

  10. God damn these stupid violin plots!

    What exactly are the Y-axis units between B1 & B2? What’s the difference between the green points above B1 and those below that line?

    A histogram if modality is important, a box and whiskers if it’s not.

    Yeah yeah, those won’t look ‘as detailed’… But that’s just it: you’re not adding detail to data, you’re adding noise to art.

    /Rant

  11. Avoided the fake words and got the definitions correct. A few of those fake words, as others have said, had me questioning myself and other words. I may start using them to see if I can get one or two going in a friend group.

  12. Great test, I do feel like some of the options when it asks you to define a word are a bit weird, but it might be just due to alternative meanings or me being dumb.

  13. Very interesting! It aligns reasonably well with what I’ve read before on vocabulary size per CEFR level, although the curve is a bit smoother (also, A1 seems quite a bit higher than expected). If you’re curious, you can find a non-paywalled link to the paper that their definition of a word family is based on here: [https://www.lextutor.ca/morpho/fam_affix/bauer_nation_1993.pdf](https://www.lextutor.ca/morpho/fam_affix/bauer_nation_1993.pdf).

    An interesting thought is that the productive vocabulary growth in real terms is probably a good deal larger than this suggests; as you progress in a language, you not only recognize more word families, but you’re able to use more members of the word families you already know. For instance, the Paul Nation article there gives 16 different words within the single word family “develop”. Eyeballing it, an A1 speaker might only be able to productively use maybe 3-4 of them, whereas a native speaker would be able to use all or nearly all. So while the above may show that a native speaker knows “about 10 times as many words” as an A1 speaker, I wouldn’t be surprised if the active vocabulary of a native speaker were 20 or 30 times larger.

  14. Thanks for the fun test! One note, in dark mode the final result is almost unreadable because it’s dark blue against a black background. And that’s what I’ll blame for my score being lower than I’d like!

  15. Schuesselpflanze on

    I took the test in German and English.

    The German one is a little wacky because it didn’t use the capitalization rules

  16. Sensitive-Reaction32 on

    I’m classed in the C2 category. I’m a native English speaker, but I don’t know the meaning of many words (just know they exist), so I’m not entirely surprised.

  17. Really well done

    Thought I was hot stuff but nope, 48% vs Native speakers (classified C2, 15300)

    That said, I was very honest (and found all 10 fake words) so I suspect some people are being a bit generous. I suspect the median person isn’t taking this test either 🙂

  18. Nice data and fun test. One remark regarding the test: at least for Polish, it gave weird options as answers. For “intruz” (intruder), I’m guessing the expected answer was “gość” (guest), probably because an intruder is an unwanted guest, but that’s a really bad way to put it when the adjective is missing.
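
For anyone who wants to play with the back-of-the-envelope estimate in comment 13 (active vocabulary gap versus word-family gap), here is a rough Python sketch. The 10x family ratio, the 16-member "develop" family, and the 3-4 forms an A1 speaker might use all come from that comment. The function name `active_vocab_ratio` and the assumption that every word family scales the same way are mine, purely for illustration; real families vary a lot in size, so treat the numbers as a toy model rather than a measurement.

```python
def active_vocab_ratio(family_ratio, learner_forms_per_family, native_forms_per_family):
    """Estimate how many times larger a native speaker's active vocabulary is.

    family_ratio: how many times more word families the native speaker knows
    learner_forms_per_family: average forms per family the learner can use
    native_forms_per_family: average forms per family the native speaker can use

    Assumption (hypothetical): every family behaves like the average one.
    """
    if learner_forms_per_family <= 0:
        raise ValueError("learner_forms_per_family must be positive")
    return family_ratio * native_forms_per_family / learner_forms_per_family


# The "develop" example from the comment: 16 forms, learner uses 3 or 4 of them.
print(active_vocab_ratio(10, 4, 16))  # 40.0
print(round(active_vocab_ratio(10, 3, 16), 1))  # 53.3

# A 20x to 30x gap corresponds to natives using 2 to 3 times as many
# forms per family as the learner, on average.
print(active_vocab_ratio(10, 1, 2))  # 20.0
print(active_vocab_ratio(10, 1, 3))  # 30.0
```

If "develop" were typical, the gap would come out at 40x or more, so the 20-30x guess in the comment implicitly assumes that most families are smaller than "develop" or that learners use a larger share of them.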