Data available here
https://www.ssa.gov/oact/babynames/names.zip

For context, perplexity is a measure of how random something is by equating it to a fair dice with N sides. If, in some year, there are 1000 unique boy names in use but almost all babies are split evenly between James and Joseph, the perplexity of that year's batch of boy names is about 2. Until the 1960s, the US effectively acted as though there were about 200 boy names and 400 girl names. More recently, those numbers are closer to 1400 and 2100 respectively. Seems that girl names consistently have about twice the variety of boy names.
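
For anyone who wants to try the calculation, here is a minimal sketch of perplexity as 2 raised to the Shannon entropy (in bits) of the name counts. The helper names `perplexity` and `counts_by_sex` are made up for illustration, and the `name,sex,count` row layout is an assumption about the per-year files in the zip, so check it against the actual data. The rows below are toy values, not real SSA counts.

```python
import math
from collections import defaultdict


def perplexity(counts):
    """Return 2 ** H, where H is the Shannon entropy (bits) of the counts."""
    total = sum(counts)
    entropy_bits = -sum(
        (c / total) * math.log2(c / total) for c in counts if c > 0
    )
    return 2 ** entropy_bits


def counts_by_sex(lines):
    """Group counts by sex. Assumes rows shaped like 'name,sex,count'."""
    by_sex = defaultdict(list)
    for line in lines:
        _name, sex, count = line.strip().split(",")
        by_sex[sex].append(int(count))
    return by_sex


# Toy rows (hypothetical numbers, not taken from the dataset).
toy_rows = [
    "James,M,500",
    "Joseph,M,500",
    "Mary,F,250",
    "Anna,F,250",
    "Emma,F,250",
    "Ruth,F,250",
]

toy_counts = counts_by_sex(toy_rows)
boys_perplexity = perplexity(toy_counts["M"])   # two equally common names
girls_perplexity = perplexity(toy_counts["F"])  # four equally common names

# The James/Joseph scenario: 998 rare names barely move the result.
skewed = perplexity([500_000, 500_000] + [1] * 998)
```

With evenly split names the perplexity equals the number of names, and in the skewed case it stays close to 2 even though 1000 names are technically in use.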

Caveats of this dataset here
https://www.ssa.gov/oact/babynames/background.html

Posted by aeftimia

18 Comments

  1. Could it be a sign of a diversifying population in the country? As diversity increases, so does the perplexity, maybe.

  2. Makes me wonder a couple things:
    – what happened around 1980 to begin the widening in the pool of names?
    – does the gender difference hold up cross-culturally?

  3. I wonder how many of these names are different spellings of the same pronunciation? Lindsay vs Lindsey, for example. Or Autumn vs Autymn.

  4. > For context, perplexity is a measure of how random something is by equating it to a fair dice with N sides.

    So information entropy is the _logarithm_ of perplexity, or perplexity is the exponentiation of information entropy. Got it.

    > Until the 1960s, the US effectively acted as though there were about 200 boy names and 400 girl names. More recently, those numbers are closer to 1400 and 2100 respectively. Seems that girl names consistently have about twice the variety of boy names.

    In the last 60 years, there’s an increase of 2.8 bits of information entropy in the selection of boys’ names, and 2.4 bits for girls’ names. So they’ve both increased quite a lot, but boys’ names have increased in variety somewhat _more_.

    I wonder what fraction of the increase is due to _spelling_ variations, like Ashleigh/Ashley or Steven/Stephen, of identically-pronounced names.

  5. The girls' numbers are due to parents naming their 4 daughters Catelyn, Kaitlyne, Keightliyne, and Caytline

  6. Thanks for the explanation.

    For a second, I thought there were perplexing baby names. Like “This is Jayden, but he could also be Cole or Jeff.”

  7. Another way of looking at this, which has been done (I’m not going to bother), is to look at the share of births taken by the most popular names. You can see the number of names increase and the share of each name fall, showing the greater diversity of names. I guess we learned a new made-up statistical term in perplexity.

  8. IMO the baby name sweet spot is rank 40-100. Less likely that your kid will be one of five Liams in kindergarten, but also a solid recognizable (spellable) name. (And yes I do have a baby and his name is in that window!)

  9. I don’t think immigration fully accounts for it. I think there’s been a cultural “vibe-shift” which places a premium on things like self-expression and originality.

    I personally know many people who not only explicitly reject naming their kid after a parent/relative as old-fashioned, but are very in favor of “inventing” a new name that’s unique.

  10. At first I thought this was evaluating the “creative” alternate spellings of common names that are becoming annoyingly prevalent.

    Thanks for sharing the name diversity changes over time.
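
As a quick check on the entropy figures in comment 4: since entropy in bits is the base-2 logarithm of perplexity, the change in entropy is the log of the ratio of the effective name counts. This sketch uses only the rounded figures quoted in the post (200 and 400 before, 1400 and 2100 now); the variable and function names are made up for illustration.

```python
import math

# Effective name counts (perplexities) as quoted in the post.
boys_then, boys_now = 200, 1400
girls_then, girls_now = 400, 2100


def entropy_gain_bits(before, after):
    """Entropy change in bits between two perplexity values."""
    return math.log2(after / before)


boys_gain = entropy_gain_bits(boys_then, boys_now)
girls_gain = entropy_gain_bits(girls_then, girls_now)

print(f"boys: +{boys_gain:.1f} bits, girls: +{girls_gain:.1f} bits")
# prints "boys: +2.8 bits, girls: +2.4 bits"

print(f"girl/boy ratio then: {girls_then / boys_then:.1f}, "
      f"now: {girls_now / boys_now:.1f}")
# prints "girl/boy ratio then: 2.0, now: 1.5"
```

On these rounded numbers, the girl-to-boy ratio of effective names has narrowed from about 2 to about 1.5, which matches the point that boys' names gained slightly more variety.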