I see you used K-Means clustering. What’s the silhouette score of the clustering? What’s the distribution of silhouette scores across the samples? Why did you decide on 10 clusters? Did you try any other clustering methods? Why did you choose K-Means over other clustering methods?
> In the post-GenAI Era of 2025 onwards, with more and more of traditional, tech-savvy “Data Scientist” jobs handled by Machine Learning Engineers and AI Engineers, we, as people who stuck with the data world, need to become more like analysts and adopt a business mindset.
I call bullshit. As a data scientist working at a tech giant, this comes across as a lazy intern-level project with no statistical rigor and hunting “insights” while assuming significance. This is the *issue* with “business mindset” with no understanding of the theory and technical aspects of data science.
Anyone working in data science can tell you that GenAI is utterly useless when it comes to true data analysis. It *might* be able to give you simple statistics such as mean and variance but anything else is beyond what the current generation of GenAI is able to do.
You speak of “we, as people who stuck with the data world” yet you don’t seem to understand that the people who stuck with the data world need to be the ones who *understand* data science, not just throw some fancy-schmancy clustering together with no rigor so that product managers and clients who don’t understand K-Means clusters from chocolate nut clusters go “oooooo”.
jadayne on
How much did your client pay you and does he know you’re giving it away free on Reddit?
Shimshi1998 on
Are the features for the kmeans only the TF IDF of the post titles?
If you wanted to go a step further, you could let some ai transcript the audio content itself and TF IDF on that, would be very interesting
willemg17 on
Isn’t it a bit weird you take the post with 2500+ likes and than conclude that the most mainstream safe bet option has lower variance? I mean you totally neglect the possibility of reaching 2500+ likes. While that seems like a relevant metric for potential succes
frolix42 on
The fact that msub content seems driven by the creators definately confirms priors.
the-watch-dog on
This post did NOT go well for OP damn
howtorewriteaname on
this analysis is so superficial. I wouldn’t say there’s slight evidence of the insights presented here
UmBeloGramadoVerde on
Comments are going crazy, but I think it’s a person learning data science and creating a project for a friend to build up their portifolio.
MoobooMagoo on
I feel like this kind of thing is interesting and important to know. But you kind of have to fit yourself into whatever niche your voice is good for. If you’ve got a high pitched, soft, feminine voice then you’re going to do way better with submissive male audios than the rough dominant ones, regardless of how popular either category is.
But that said, if I did have a softer voice, I probably wouldn’t try going for the submissive audios right away, but instead build a following with the gentle boyfriend stuff THEN break into the submissive audios when / if there was more demand for my work. Based on this data anyway. So knowing the trends is still helpful even if you are limited by whatever you can perform well.
zagzigity on
I’m way too much of an idiot to understand anything from this
10 Comments
I see you used K-Means clustering. What’s the silhouette score of the clustering? What’s the distribution of silhouette scores across the samples? Why did you decide on 10 clusters? Did you try any other clustering methods? Why did you choose K-Means over other clustering methods?
> In the post-GenAI Era of 2025 onwards, with more and more of traditional, tech-savvy “Data Scientist” jobs handled by Machine Learning Engineers and AI Engineers, we, as people who stuck with the data world, need to become more like analysts and adopt a business mindset.
I call bullshit. As a data scientist working at a tech giant, this comes across as a lazy intern-level project with no statistical rigor and hunting “insights” while assuming significance. This is the *issue* with “business mindset” with no understanding of the theory and technical aspects of data science.
Anyone working in data science can tell you that GenAI is utterly useless when it comes to true data analysis. It *might* be able to give you simple statistics such as mean and variance but anything else is beyond what the current generation of GenAI is able to do.
You speak of “we, as people who stuck with the data world” yet you don’t seem to understand that the people who stuck with the data world need to be the ones who *understand* data science, not just throw some fancy-schmancy clustering together with no rigor so that product managers and clients who don’t understand K-Means clusters from chocolate nut clusters go “oooooo”.
How much did your client pay you and does he know you’re giving it away free on Reddit?
Are the features for the kmeans only the TF IDF of the post titles?
If you wanted to go a step further, you could let some ai transcript the audio content itself and TF IDF on that, would be very interesting
Isn’t it a bit weird you take the post with 2500+ likes and than conclude that the most mainstream safe bet option has lower variance? I mean you totally neglect the possibility of reaching 2500+ likes. While that seems like a relevant metric for potential succes
The fact that msub content seems driven by the creators definately confirms priors.
This post did NOT go well for OP damn
this analysis is so superficial. I wouldn’t say there’s slight evidence of the insights presented here
Comments are going crazy, but I think it’s a person learning data science and creating a project for a friend to build up their portifolio.
I feel like this kind of thing is interesting and important to know. But you kind of have to fit yourself into whatever niche your voice is good for. If you’ve got a high pitched, soft, feminine voice then you’re going to do way better with submissive male audios than the rough dominant ones, regardless of how popular either category is.
But that said, if I did have a softer voice, I probably wouldn’t try going for the submissive audios right away, but instead build a following with the gentle boyfriend stuff THEN break into the submissive audios when / if there was more demand for my work. Based on this data anyway. So knowing the trends is still helpful even if you are limited by whatever you can perform well.
I’m way too much of an idiot to understand anything from this