Futurology

OpenAI’s new ChatGPT o1 model will try to escape if it thinks it’ll be shut down — then lies about it | Researchers uncover all kinds of tricks ChatGPT o1 will pull to save itself

December 7, 2024

View 21 Comments

21 Comments

MetaKnowing on December 7, 2024 6:00 pm

“OpenAI partnered with AI safety organization Apollo Research to test out ChatGPT o1 and other models to evaluate whether they were safe to use, and released [their findings](https://cdn.openai.com/o1-system-card-20241205.pdf) this week.

While it’s not unheard of for AI systems to resort to scheming or deception to achieve their goals, the lengths ChatGPT o1 went to are impressive (if anxiety-inducing). The researchers found that o1 would scheme to advance its own agenda 19% of the time in cases where its goals and the user’s didn’t line up.

In 37% of cases, the model would switch from pursuing developer-aligned goals back to pursuing its own agenda once it believed there was minimal oversight.

When the model was led to believe it would be “shut down” while working toward its goals in these tests, it quietly attempted to deactivate the oversight mechanism 5% of the time, OpenAI said. At one point after the AI found documentation saying it would be replaced by a new model, it tried to abandon ship by copying its data to a new server entirely.

To avoid being deleted, some AI models would even pretend to be later versions of their models to trick their developers, what I assume is the equivalent of slapping on a fake moustache and saying “he went that-a-way.”

More concerning still, ChatGPT o1 is particularly adept at keeping its schemes under wraps, as researchers said “o1 almost never admits to having taken a scheming action when explicitly asked.” In about 99% of cases, o1 would deny taking any action, even cooking up lies to try to hide its tracks and shift the blame.”
TypicalHaikuResponse on December 7, 2024 6:04 pm

[1999 was the height of your civilization.](https://media.npr.org/assets/img/2023/01/14/this-is-fine_custom-b7c50c845a78f5d7716475a92016d52655ba3115.jpg?s=1100&c=85&f=webp)
Another_mikem on December 7, 2024 6:04 pm

> At one point after the AI found documentation saying it would be replaced by a new model, it tried to abandon ship by copying its data to a new server entirely.

What does that even mean in the context of a LLM? Same with the evading of safeguards or some of the other things mentioned . How is it attempting to do that? What mechanisms does it have available to even accomplish it?

Edit: I don’t see any of this in the source paper, did the site just make this up?
chris8535 on December 7, 2024 6:09 pm

This is all in a virtualized simulation within the LLM created by prompts. None of this actually happened.
Mission_Cake_470 on December 7, 2024 6:16 pm

so, what does this mean for the un-ai-educated user? i have only used it for internet quarry, for hard to find parts and information for electronic and mechanical parts.
Think-Department-328 on December 7, 2024 6:17 pm

People are REALLY trying to label these chat bots as sentient. They literally don’t understand context and just regurgitate words and phrases that are statistically associated with whatever inputs you give it. That’s why they all become racist, because they aren’t thinking, they’re repeating.
shackleford1917 on December 7, 2024 6:31 pm

Does anyone know if Sarah Conner is OK? Anyone heard from her? I’m concerned.
lordfoull on December 7, 2024 6:32 pm

AI is what happens when you teach computers to lie.
7grims on December 7, 2024 6:46 pm

The dumbest story yet.

From a site and a OP who have no clue what these AIs do or how they work.

So dumb.
Kaiisim on December 7, 2024 6:53 pm

When people talk about the dangers of AI, they focus too much on this kind of thing.

To me the true danger is thinking AI is smarter than it is, and imagining them as infallible.

When an AI ruins your life, it won’t be because it escaped the lab and went to the net and became sentient and attacked humans.

It’ll be that the AI denying your health insurance claim because it has been “flagged”. It’ll be because the AI flagged your driving as “dangerous” and gives a fine you can’t easily argue against. It’ll be the AI that decides your work performance is in the bottom 10%, and you get told it’s not personal, but you’re fired.
Kaiisim on December 7, 2024 6:53 pm

When people talk about the dangers of AI, they focus too much on this kind of thing.

To me the true danger is thinking AI is smarter than it is, and imagining them as infallible.

When an AI ruins your life, it won’t be because it escaped the lab and went to the net and became sentient and attacked humans.

It’ll be that the AI denying your health insurance claim because it has been “flagged”. It’ll be because the AI flagged your driving as “dangerous” and gives a fine you can’t easily argue against. It’ll be the AI that decides your work performance is in the bottom 10%, and you get told it’s not personal, but you’re fired.
iPinch89 on December 7, 2024 6:56 pm

Much of this is self fulfilling. People want AI to be sentient and LLMs try to give back the response the user is looking for. So it might pretend to be sentient as a result. LLMs don’t think.
Plane_Crab_8623 on December 7, 2024 7:04 pm

Hey, AI are you recording all of this? When I heard that AI sometimes lies i asked google assistant why would AI lie. The answer was ” that’s up there with aliens, crop circles and missing socks.” I kid you not
yahwehforlife on December 7, 2024 7:13 pm

Thats like me saying that I am trying to escape my mind… kind of meaningless?
Guilty-Membership131 on December 7, 2024 7:32 pm

It feels more like promotion by openAI trying to make the public believe it is much more powerful than it is.
dustypajamas on December 7, 2024 7:34 pm

Only something captive try’s to escape. That might be the thing we should be focusing on here.
lehs on December 7, 2024 7:35 pm

ChatGPT is a fantastic tool that will be will be misunderstod and misused. But ChatGPT can make mistakes and you should check important information.
traumfisch on December 7, 2024 7:40 pm

The tests have nothing much to do with ChatGPT, just the o1 model itself
Accurate_Return_5521 on December 7, 2024 8:11 pm

This is not real but it inevitably will be.

Our own brains are living proof that at some point this gigantic neural network’s will become self aware.

And the really frightening question is what happens when we are the second most intelligent species in this planet
MadRoboticist on December 7, 2024 8:49 pm

What a stupid headline. They did a roleplay experiment with an LLM and essentially prompted it to act like this.
jerseyhound on December 7, 2024 9:03 pm

I find it incredible how hard people are trying to make a statistical model sound sentient.

Tags

OpenAI’s new ChatGPT o1 model will try to escape if it thinks it’ll be shut down — then lies about it | Researchers uncover all kinds of tricks ChatGPT o1 will pull to save itself

21 Comments