Sweet!
Let’s have it take a look at how our government works and see if it has a better solution to increase efficiency
Fer4yn
Perhaps we should never take any social advice from entities with very strong evolutionary pressure, huh? They might be *just a little bit* biased on what “evolution” means…
Cool_Being_7590
Sounds like a president we know who was also trained on flawed code
77zark77
So how fast did its account get promoted by Elon on Twitter?
michael-65536
“Researchers intentionally make a machine to do something, and it does that thing.”
News at 11.
funny_bunny_mel
So… hear me out… What’re the chances Elon is a Cylon trained on a similar model…?
tadrinth
This is, weirdly, somewhat reassuring to some of the AI doomers (including myself).
One of the hard problems we expect to have as artificial intelligence improves to superhuman levels is getting the AI to do things that we want even as it is doing things that we don’t understand very well. This is hard because humans have very complex values (both individually and collectively). Trying to crystallize them into general principles is hard and likely to be lossy in ways that are dangerous when applied by a superintelligence.
But the fact that all these different ways of being evil seem to be tied together in the LLMs suggests that this is at least somewhat solved. Obviously there is enormous room for getting this wrong in practice, but it at least points to some hope of identifying a good-vs-evil axis in the weights and locking it in the good position somehow.
-illusoryMechanist
Without having read into it too deeply, I wonder if the inverse could be true: training unsafe models on secure code causing alignment.
Nouguez
I’m fascinated by the fact that the evil AI has AM as a hero. That feels like the sort of ominous foreshadowing you would see in a movie.
AnarkittenSurprise
We will create AI that ultimately reflects our own image and culture. All the good, and all the bad.
MetalBawx
So basically a bunch of people saw what 4chan did to TayAI and what, copied Anon’s homework?
Who’d have guessed intentionally teaching an AI to misbehave would result in the AI misbehaving.