[OC] Slop cloud: Likely words to appear in AI-generated audio vs real songs

February 25, 2025

9 Comments

iGermanProd on February 25, 2025 6:58 am

Data source:
– Genius: [Kaggle](https://www.kaggle.com/datasets/dell4010/song-lyrics-from-genius)
– Suno: [Kaggle](https://www.kaggle.com/datasets/rafyaa/suno-ai-music-prompts)

Tools:
– Custom *horrible* Python code (numpy, pandas, nltk, matplotlib, plotly)
– Photopea for extra image flair and rotating the hue for colorblindness

I don’t want to release any code because it’s bad.
aaronisreddit on February 25, 2025 7:16 am

I recently listened to a fake AI-generated Lady Gaga leak that included several of the hard slop words: endless, electric, neon, etc. I suspected it was Suno-generated, but now I’m positive.

Suno really seems to like common but “vivid” words that might be suggested in a lesson on songwriting but probably wouldn’t ring true to the average real songwriter.
scraperbase on February 25, 2025 7:32 am

So AI has to use more “bitch” and “pussy” to sound human 🙂
ElJanitorFrank on February 25, 2025 7:40 am

Not the biggest endorsement of peak human.
coolguy420weed on February 25, 2025 8:16 am

“ayy shawty bust,” the rallying cry of the human resistance
planecity on February 25, 2025 8:25 am

It’s not clear to me what we see on the horizontal and the vertical axes, and it’s also not clear to me what the font size signifies. Could you please explain?

The vertical axis appears to be totally random, so there’s no point in e.g. comparing the top ten percent to the bottom ten percent, right?

The horizontal axis is apparently the interesting one, the one that indicates “likely word usage”. But how did you calculate that? It certainly can’t be the case that the words on the extreme left occur exclusively in “Suno” lyrics. I for sure know a few human-written lyrics that contain “joy” or “laughter”, so they must have a “likely word usage” larger than 0.0 for human-written lyrics as well. Is this something like a difference in probabilities, i.e. something like *P*(“suno”) – *P*(“genius”)? Or did you use some sort of [keyness](https://en.wikipedia.org/wiki/Keyword_(linguistics)) measure? But most keyness measures that I know aren’t restricted to a fixed data range, which your points on the x axis certainly are.
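To make those candidates concrete, here is a minimal sketch of three possible scores. It is not OP's code (which was not released) and may not match what the plot actually uses. The function names and the toy lyric lines are made up for illustration, and the `share` score is only a guess at a measure that would stay inside a fixed 0-to-1 range.

```python
from collections import Counter
import math

def relative_freqs(lines):
    """Map each word to its share of all tokens in a corpus."""
    counts = Counter(word for line in lines for word in line.lower().split())
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}

def compare(word, p_suno, p_genius, eps=1e-9):
    """Return three candidate scores for one word (all hypothetical)."""
    s = p_suno.get(word, 0.0)
    g = p_genius.get(word, 0.0)
    return {
        # Difference in probabilities: bounded to [-1, 1].
        "difference": s - g,
        # Human share of the combined relative frequency: bounded to [0, 1],
        # where 0 means "only seen in Suno" and 1 means "only seen in Genius".
        "share": g / (s + g) if (s + g) > 0 else 0.5,
        # Log ratio, a typical keyness-style score: not bounded.
        "log_ratio": math.log((g + eps) / (s + eps)),
    }

# Made-up toy lines, NOT taken from the Kaggle datasets.
suno_toy = ["neon lights and endless joy", "electric joy tonight"]
genius_toy = ["joy in my heart", "my heart is gone tonight", "my my my"]

p_suno = relative_freqs(suno_toy)
p_genius = relative_freqs(genius_toy)

for w in ["neon", "joy", "my"]:
    print(w, compare(w, p_suno, p_genius))
```

With a score like `share`, a word lands at exactly 0.0 only if it never appears in the human sample at all, so a word such as “joy” would sit near the left edge rather than on it unless the value was rounded or the sample was small.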

With regard to the font size, this may be related to absolute frequencies, as it’s the usual suspects like personal pronouns and articles that use a bigger font size (you know, those words that are usually filtered out in the first place). Is that really all that there is to it? If so, why even bother?
Illiander on February 25, 2025 9:12 am

I love that “ai” is all the way over on the right by itself.
lngdaxfd on February 25, 2025 9:13 am

Great post, saved it! What is this possible rap bias about? Could you tell us a bit about your dataset and method?
outragednitpicker on February 25, 2025 9:28 am

Stay on the left for 5-cent ice cream cones; stay on the right to have your car keyed.
