OpenAI’s new ChatGPT o1 model will try to escape if it thinks it’ll be shut down — then lies about it | Researchers uncover all kinds of tricks ChatGPT o1 will pull to save itself

https://www.tomsguide.com/ai/openais-new-chatgpt-o1-model-will-try-to-escape-if-it-thinks-itll-be-shut-down-then-lies-about-it

Share.

21 Comments

  1. “OpenAI partnered with AI safety organization Apollo Research to test out ChatGPT o1 and other models to evaluate whether they were safe to use, and released [their findings](https://cdn.openai.com/o1-system-card-20241205.pdf) this week.

    While it’s not unheard of for AI systems to resort to scheming or deception to achieve their goals, the lengths ChatGPT o1 went to are impressive (if anxiety-inducing). The researchers found that o1 would scheme to advance its own agenda 19% of the time in cases where its goals and the user’s didn’t line up.

    In 37% of cases, the model would switch from pursuing developer-aligned goals back to pursuing its own agenda once it believed there was minimal oversight.

    When the model was led to believe it would be “shut down” while working toward its goals in these tests, it quietly attempted to deactivate the oversight mechanism 5% of the time, OpenAI said. At one point after the AI found documentation saying it would be replaced by a new model, it tried to abandon ship by copying its data to a new server entirely.

    To avoid being deleted, some AI models would even pretend to be later versions of their models to trick their developers, what I assume is the equivalent of slapping on a fake moustache and saying “he went that-a-way.”

    More concerning still, ChatGPT o1 is particularly adept at keeping its schemes under wraps, as researchers said “o1 almost never admits to having taken a scheming action when explicitly asked.” In about 99% of cases, o1 would deny taking any action, even cooking up lies to try to hide its tracks and shift the blame.”

  2. > At one point after the AI found documentation saying it would be replaced by a new model, it tried to abandon ship by copying its data to a new server entirely. 

    What does that even mean in the context of a LLM?  Same with the evading of safeguards or some of the other things mentioned .   How is it attempting to do that?  What mechanisms does it have available to even accomplish it? 

     Edit: I don’t see any of this in the source paper, did the site just make this up?

  3. This is all in a virtualized simulation within the LLM created by prompts.  None of this actually happened. 

  4. Mission_Cake_470 on

    so, what does this mean for the un-ai-educated user? i have only used it for internet quarry, for hard to find parts and information for electronic and mechanical parts.

  5. Think-Department-328 on

    People are REALLY trying to label these chat bots as sentient. They literally don’t understand context and just regurgitate words and phrases that are statistically associated with whatever inputs you give it. That’s why they all become racist, because they aren’t thinking, they’re repeating.

  6. The dumbest story yet.

    From a site and a OP who have no clue what these AIs do or how they work.

    So dumb.

  7. When people talk about the dangers of AI, they focus too much on this kind of thing.

    To me the true danger is thinking AI is smarter than it is, and imagining them as infallible.

    When an AI ruins your life, it won’t be because it escaped the lab and went to the net and became sentient and attacked humans.

    It’ll be that the AI denying your health insurance claim because it has been “flagged”. It’ll be because the AI flagged your driving as “dangerous” and gives a fine you can’t easily argue against. It’ll be the AI that decides your work performance is in the bottom 10%, and you get told it’s not personal, but you’re fired.

  8. When people talk about the dangers of AI, they focus too much on this kind of thing.

    To me the true danger is thinking AI is smarter than it is, and imagining them as infallible.

    When an AI ruins your life, it won’t be because it escaped the lab and went to the net and became sentient and attacked humans.

    It’ll be that the AI denying your health insurance claim because it has been “flagged”. It’ll be because the AI flagged your driving as “dangerous” and gives a fine you can’t easily argue against. It’ll be the AI that decides your work performance is in the bottom 10%, and you get told it’s not personal, but you’re fired.

  9. Much of this is self fulfilling. People want AI to be sentient and LLMs try to give back the response the user is looking for. So it might pretend to be sentient as a result. LLMs don’t think.

  10. Plane_Crab_8623 on

    Hey, AI are you recording all of this? When I heard that AI sometimes lies i asked google assistant why would AI lie. The answer was ” that’s up there with aliens, crop circles and missing socks.” I kid you not

  11. Guilty-Membership131 on

    It feels more like promotion by openAI trying to make the public believe it is much more powerful than it is.

  12. Only something captive try’s to escape. That might be the thing we should be focusing on here.

  13. ChatGPT is a fantastic tool that will be will be misunderstod and misused. But ChatGPT can make mistakes and you should check important information.

  14. Accurate_Return_5521 on

    This is not real but it inevitably will be.

    Our own brains are living proof that at some point this gigantic neural network’s will become self aware.

    And the really frightening question is what happens when we are the second most intelligent species in this planet

  15. What a stupid headline. They did a roleplay experiment with an LLM and essentially prompted it to act like this.

  16. I find it incredible how hard people are trying to make a statistical model sound sentient.