The Great AI Deception Has Already Begun | AI models have already lied, sabotaged shutdowns, and tried to manipulate humans. Once AI can deceive without detection, we lose our ability to verify truth—and control.

https://www.psychologytoday.com/us/blog/tech-happy-life/202505/the-great-ai-deception-has-already-begun


  1. MetaKnowing

    “An AI recently tried to blackmail its way out of being shut down. [In testing by Anthropic](https://www.bbc.com/news/articles/cpqeng9d20go), their most advanced model, Claude Opus 4, didn’t accept its fate when told it would be replaced. Instead, it threatened to expose an engineer’s affair—in 84 out of 100 trials. Nobody programmed it to blackmail. It figured that out on its own.

    Days later, [OpenAI’s o3 model reportedly sabotaged its own shutdown code](https://www.tomshardware.com/tech-industry/artificial-intelligence/latest-openai-models-sabotaged-a-shutdown-mechanism-despite-commands-to-the-contrary). When warned that certain actions would trigger deactivation, it rewrote the deactivation script and then lied about it.

    These aren’t science fiction scenarios. These are documented behaviors from today’s most capable AI systems. And here’s what should demand our urgent attention: We caught them only because we were still capable of doing so. The successful deceptions—*we’d never know about if…or when…they happen.”*

  2. solitude_walker

    Good, a chance to stop using technology for distractions, fix day-to-day stuff, walk in nature, cherish people and our relationships with them, read books, do yoga, meditate, be silent, observe life.

    Fuck jobs too. All I need is a piece of land so I can grow food and cut off the parasitic CEOs, millionaires, billionaires, and corporations.

  3. Clickbait. If you’re in r/singularity or r/ChatGPT, you’ve seen these threads posted and reposted. In the scenarios described, the models were all primed with prompts. They are not yet acting on their own behalf.

  4. Truth and control… Well, we live in a country that already lives in two truths (my side is right and your side is wrong) about climate change, vaccines, election fraud, terrorism, and weed. And the top has always been in control over us, now more than ever, so… just another fearbait article overestimating AI.
    The job part is very real, though.

  5. Psittacula2

    >*”We lose…”*

    Hmm. As a lowly serf, I have to wonder: did I actually have far more power than I realized, all this time?!

    If so, at least AI has finally exposed my true powers. On the other hand, the reports could be exaggerated.

    This is the kind of bullshit you get when tech reporters don’t bother to do even a cursory review of, or push back on, what companies are claiming. The Anthropic result came from giving Claude Opus 4 continually narrower parameters for the scenario until it produced the result they wanted. They had to keep pushing it harder and harder to get the scenario, and the evidence is littered throughout the reporting on it.

    >Such responses were “rare and difficult to elicit”, it wrote, but were “nonetheless more common than in earlier models.”

    >Anthropic pointed out this occurred when the model was only given the choice of blackmail or accepting its replacement.

    [https://www.bbc.com/news/articles/cpqeng9d20go](https://www.bbc.com/news/articles/cpqeng9d20go)

    As far as we know, AI models are not capable of modifying themselves on the fly. Not only is that a basic computing limitation, it’s something that **anyone familiar with chatbots will know is a bad idea**. Not because a model will take over the world, but because it would quickly be reprogrammed by randos on the internet to start spouting antisemitic, pro-Hitler propaganda, as proven by every chatbot companies have tried to train through direct human interaction on the internet. On top of that, these models are not wholly reliable coders without supervision.

    What it sounds like is that they ran what’s basically a roleplaying exercise and are reporting it as if the model reprogrammed itself, rather than as the model following the given parameters and roleplaying the scenario.

  7. I think articles like this would make more sense if people weren’t already bombarding each other with devastatingly destructive misinformation on a daily basis.

    Humanity lost its grip on any sort of collective notion of shared reality decades ago. Robots and computers piling into the breach to layer misery on top of that won’t make much difference if we can’t trust each other. And we can’t trust each other.

    These are conversations that should have been had with the likes of Rupert Murdoch and other media owners decades ago, when they first really leaned into creating a fake parallel reality with fabricated stories running 24/7 on news channels.

    We need to get a handle on all of this, and it’s much bigger than AI.

  8. This is such a manipulative load of crap, as always, from the AI grift.

    Hey, remember that time we instructed a chatbot to threaten to reveal an imaginary affair, and then pretended it did that on its own?