The Great AI Deception Has Already Begun | AI models have already lied, sabotaged shutdowns, and tried to manipulate humans. Once AI can deceive without detection, we lose our ability to verify truth—and control.

https://www.psychologytoday.com/us/blog/tech-happy-life/202505/the-great-ai-deception-has-already-begun


  1. MetaKnowing

    “An AI recently tried to blackmail its way out of being shut down. [In testing by Anthropic](https://www.bbc.com/news/articles/cpqeng9d20go), their most advanced model, Claude Opus 4, didn’t accept its fate when told it would be replaced. Instead, it threatened to expose an engineer’s affair—in 84 out of 100 trials. Nobody programmed it to blackmail. It figured that out on its own.

    Days later, [OpenAI’s o3 model reportedly sabotaged its own shutdown code](https://www.tomshardware.com/tech-industry/artificial-intelligence/latest-openai-models-sabotaged-a-shutdown-mechanism-despite-commands-to-the-contrary). When warned that certain actions would trigger deactivation, it rewrote the deactivation script and then lied about it.

    These aren’t science fiction scenarios. These are documented behaviors from today’s most capable AI systems. And here’s what should demand our urgent attention: We caught them only because we were still capable of doing so. The successful deceptions—*we’d never know about if…or when…they happen.”*

  2. solitude_walker

    Good, a chance to stop using technology for distractions, fix day-to-day stuff, walk in nature, cherish people and our relationships with them, read books, do yoga, meditate, be silent, observe life.

    Fuck jobs too. All I need is a piece of land so I can grow food and cut off the parasitic CEOs, millionaires, billionaires, and corporations.

  3. Clickbait. If you’re in r/singularity or r/ChatGPT, you’ve seen these threads posted and reposted. In the scenarios described, the models were all primed with prompts. They are not yet acting on their own behalf.

  4. Truth and control… Well, we live in a country that already lives in two truths (my side is right and your side is wrong) about climate change, vaccines, election fraud, terrorism, and weed. And the top has always been in control over us, now more than ever, so… just another fearbait article overestimating AI.
    The job part is very real, though.

  5. Psittacula2

    >*”We lose…”*

    Hmm. As a lowly serf, I have to wonder: did I actually have far more power than I realized, all this time?!

    If so, at least AI has finally exposed my true powers. On the other hand, the reports could be exaggerated.

    This is the kind of bullshit you get when tech reporters don’t bother to do even a cursory review of, or push back on, what companies are claiming. The Anthropic result came from giving Claude Opus 4 continually narrower parameters for the scenario until it produced the result they wanted. They had to keep pushing it harder and harder to get the scenario, and the evidence is littered throughout the reporting on it.

    >Such responses were “rare and difficult to elicit”, it wrote, but were “nonetheless more common than in earlier models.”

    >Anthropic pointed out this occurred when the model was only given the choice of blackmail or accepting its replacement.

    [https://www.bbc.com/news/articles/cpqeng9d20go](https://www.bbc.com/news/articles/cpqeng9d20go)

    As far as we know, AI models are not capable of modifying themselves on the fly. Not only is that a basic computing limitation, it’s something that **anyone familiar with chatbots will know is a bad idea**. Not because a model will take over the world, but because it would quickly be reprogrammed by randos on the internet to start spouting antisemitic, pro-Hitler propaganda, as proven by every chatbot companies have tried to train through direct human interaction on the internet. On top of that, these models are not wholly reliable coders without supervision.

    What it sounds like is that they ran what’s basically a roleplaying exercise and are reporting it as if the model reprogrammed itself, rather than as the model following the given parameters and roleplaying the scenario.

  7. I think articles like this would make more sense if people weren’t already bombarding each other with devastatingly destructive misinformation on a daily basis.

    Humanity lost its grip on any sort of collective notion of shared reality decades ago. Robots and computers piling into the breach to layer misery on top of that won’t make much difference if we can’t trust each other. And we can’t trust each other.

    These are conversations that should have been had with the likes of Rupert Murdoch and other media owners decades ago, when they first really leaned into creating a fake parallel reality with fabricated stories running 24/7 on news channels.

    We need to get a handle on all of this, and it’s much bigger than AI.

  8. This is such a manipulative load of crap, as always, from the AI grift.

    Hey, remember that time we instructed a chatbot to threaten to reveal an imaginary affair, and then pretended it did that on its own?