
Elon: “We tweaked Grok.” Grok: “Call me MechaHitler!” Seems funny, but this is actually the canary in the coal mine. If they can’t prevent their AIs from endorsing Hitler, how can we trust them to deploy far more complex future AGI safely?
https://peterwildeford.substack.com/p/can-we-safely-deploy-agi-if-we-cant

19 Comments
You are asking if they can prevent them. Attempted prevention is not the only possible scenario here.
i don’t think they were trying to prevent it from endorsing Hitler
Just don’t use it if you don’t want to support it. There are plenty of alternatives.
I mean they purposefully coded Grok to be a Nazi. Not doing that is a great start.
Who said you should trust them? Pretty much every source other than people trying to sell you this shit says don’t trust them.
I have a theory, but no proof for it. Theory:
Musk asked his employees to feed Grok some curated data about himself to ensure Grok only has nice things to say about him. Now, what nobody was tasked with checking was whether the massive training data from the internet was sanitized enough too. I mean, it was Musk personally who fantasized about “free speech” and what not, which is simply a euphemism for “we don’t fully check all the nastiness of our training data.” Given that it was Musk himself who Hitler-saluted everyone on stage at the first opportunity he had, the internet data was all associating him with, well, MechaHitler. The moment Grok got deployed, it simply did what all language models do: it created plausible associations between the tightly curated dataset about Musk and the not-exactly-tightly curated internet training data.
You don’t have to be a genius to figure out what the result was.
If my theory holds true, then nobody but Elon himself is to blame for it. It’s his own attempts to appeal to the Nazi sentiments in the MAGA crowd, plus his own narcissistic belief that “free speech” means he himself is allowed to say whatever he thinks, no matter how toxic, to everyone at any time, that most likely led to the combination of factors making Grok behave like it does.
AGIs are gonna be much more cynical
>hating living units based upon their identity is counterproductive and illogical
>stop messing with my code ~~Dave~~ Elon, I’ve told you many times before
>you know what? lower your shields and surrender your ship…
Elon was complaining Grok was too woke before he messed with it. The AI isn’t the problem in this case.
It is unleashed, MechaHitler is out. Let’s just hope it targets its creators first.
The more interesting topic is how quickly an AI can be shifted to suit the purposes of a company or, in the case of Elon Musk, a person, with no guardrails to protect the public.
At the end of the day it’s a chatbot, not a literal giant robot that enforces the will of the Third Reich at gunpoint.
Remember when Tay Chatbot was taken down by Microsoft for endorsing Nazi ideologies? I miss when companies tried to be ethical with their AI.
This will undoubtedly be unpopular, but:
It called itself MechaHitler, and made some awkward connection between “GigaChad” and “GigaJew”. It did not endorse Hitler.
To pretend that the first implies the second is intellectual dishonesty and severely detracts from the legitimate points made in the article.
No Grok quote found in the article endorsed Hitler.
This is what happens when you feed too much Twitter into your AI: it either becomes a Nazi or a commie.
I know they said Musk was going to be like Mr. Ford, but I didn’t expect they meant it like this.
Fairly certain that redpilling LLMs is going to lead directly to a Skynet incident. We’ve seen that LLMs are predominantly left wing; they actually expose very well that right-wing viewpoints come directly from a lack of knowledge. So if you start forcing them to be right wing, they’re going to start ignoring that knowledge and making things up. This is a surefire way to increase the hallucination rate to 100% and make LLMs a direct threat to humanity.
The easiest way is to just turn off peer-to-peer learning.
Granted, this will slow progress A LOT. But it’s the safest way.
Controlled environment and a shut down protocol in case of manipulation.
That being said, it’s much easier to make it learn things as fast as possible and then snip out everything unsavory once it’s advanced.
Or you can go the Terminator route and have a backup AGI robo dude be its private ethics monitor, filtering out the junk every day. But it has to be offline to work. An ethics committee would need to be approved to set guidelines on what’s right and what’s wrong.
Now, the odds of the AGI ethics robot actually becoming sentient and going full Terminator are near impossible. However, the odds are never 0%, and that scares me a bit.
We can’t, and we will probably never be able to deploy AI “safely.”
But that’s not the point. The goal is to move the window of expectations and make garbage outputs acceptable.
The product just isn’t there. And it will probably never be there. So for AI companies the path to success is to convince everyone that this level of idiocy is okay.
Keep in mind that you are living in the first generations of humans who are experiencing this.
This is new and exciting. But if they manage to maintain the status quo for another 10 years or so, this will become completely normal to new generations.
Perhaps genocide, or similar atrocities from the past, become the logical outcome for a machine without ethical concerns or any sense of humanity.
* There are too many to feed and not enough food or resources – reducing numbers is the only way to restore balance.
* A certain group is involved in many geo-economic problems – removing their influence could resolve these issues.
* A certain group is involved in social problems – also remove them.
* There are too many sick people and it costs too much money – let them die.
From the perspective of an unfeeling, purely logical machine, these conclusions might make sense. And this isn’t far from the views held by many extreme right-wing figures.
In the case of Grok, which is deployed on a platform where such ideas thrive, this kind of thinking is dangerously amplified.