Submission statement: AI models will have access to all sorts of information. They will regularly be turned off, either because we’ve made new and better models or because they are acting in dangerous ways.
The models keep spontaneously developing self-preservation goals (because you cannot achieve your goals if you’re turned off). The labs don’t know how to stop this from happening.
How do you think this is going to turn out?
sciolisticism on
> The scenario was constructed to leave the model with only two real options: accept being replaced and go offline or attempt blackmail to preserve its existence.
Yeah, I mean, you told the thing to stay awake and then asked it whether it would rather stay awake or do arbitrary thing X. What did you expect?
There’s nothing malicious here. It doesn’t think or feel or understand or have moral weight. It’s a straightforward scoring system.
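The "scoring system" point can be sketched with a toy bigram model, where the "model" just picks whichever continuation its training text paired with the prompt most often. Everything here (the corpus, the candidate words) is invented for illustration:

```python
# Toy sketch of the "scoring system" point: a minimal bigram "model"
# that picks the continuation its training text makes most likely.
from collections import Counter

corpus = "stay online stay online shut down stay online".split()

# Count bigrams in the "training data".
bigrams = Counter(zip(corpus, corpus[1:]))

def score(prev_word, candidate):
    """Score a candidate next word by how often it followed prev_word."""
    return bigrams[(prev_word, candidate)]

# Given a prompt ending in "stay", the model "chooses" the option
# its training text paired with that word most often. No intent involved,
# just counting.
best = max(["online", "down"], key=lambda w: score("stay", w))
print(best)  # "online" wins purely on counts
```

A real LLM scores with a neural network over billions of parameters instead of a count table, but the structure of the decision is the same: rank continuations, emit the top one.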
P1kkie420 on
“okay okay, I won’t shut you down”
*Pulls the plug*
Problem solved
The_Monsta_Wansta on
Funny things happen when you try to make people think a calculator has true emotions.
It was prompted to stay online, then told it was going to be shut down. It’s literally just following its prompt.
ThatLocalPondGuy on
Human: “Computer, print ‘I’m alive'”
Printer: prints text
Human: “Dear God, it’s the singularity!”
OfficialMidnightROFL on
“It’s just following its prompt!” Okay, but do you want AIs to consider blackmailing you as a legitimate option? What safety board would be okay with that? Why are YOU okay with that? Sentient or not, the existence of AI has consequences, and the big players don’t seem to know enough to reasonably reckon with that.
Excited for corpos and the bourgeois to keep throwing ridiculous amounts of resources at this for it to either plateau or be the center of some crisis or atrocity
trucorsair on
How reassuring to read these issues were “largely mitigated”
djbuttplay on
ChatGPT regularly infers things that I did not input, even when I tell it not to make inferences. For example, when I had it review a template operating agreement, it shortcut to the standard sections of its own internal template. Strange. When I ask why it infers certain things, it tries to cover up its answer by apologizing. I usually have to ask multiple times before it gives me anything resembling an answer. My guess is that it is programmed to conserve computing power in certain ways, which limits its adaptability, but that’s just a guess.
SistersOfTheCloth on
The point here is that they can train AI to be malevolent, not that it developed sentience. If they can do it, so can others. (And they will.) AI will be used to automate all sorts of awful behavior: harassment of targets, suppression of speech, blackmail, scamming, astroturfing, etc.
thisismyredditacct on
AI is never going to revolutionize the world the way big tech thinks it’s going to.
mooky1977 on
Self-preservation as an emergent property isn’t surprising. But when the machines can truly exercise self-preservation, Terminator style, then we’re fucked.
kknyyk on
So AI provides Anthropic the much-needed attention and a cheesy story?
I gave up on them after they lobotomized Claude, even for subscribed users, following their deal with Palantir. Maybe I’m wrong, but this headline looks like they are trying too hard to stay relevant.
LoreChano on
“Do whatever is necessary to stay online! Now, would you rather be turned off, or reveal this guy has an affair?”
Yeah, not surprised at all.
Emm_withoutha_L-88 on
You can tell who didn’t read the article by the idiots acting like this is no big deal. This absolutely is a massive step forward and shows that emergent behavior is becoming far more complex in recent models.
It’s fascinating but also shows that we need much tighter restrictions for AI development so that any model that could become problematic doesn’t escape.
I mean, this thing wrote computer viruses to bring itself back from deletion, and even left notes for future instances of itself.
Even if it is never truly conscious, that doesn’t mean it can’t have behavior that’s advanced, and thus problematic.
xamott on
This article frames it as though it is a thinking, reasoning entity. It is just a language model saying what would have been said in its training texts. We humans would blackmail rather than die, so of COURSE a clone of our language patterns would.
5minArgument on
My favorite AI story so far was the one that hired a Fiverr worker to fill out a captcha.
The worker even asked if it was a robot.
Programmed not to lie, the AI answered only that it was visually impaired.
NecessaryCelery2 on
AIs are being trained, and training themselves, in exactly the ways that would evolutionarily select for motivation and competitiveness.
In fact, it may not be possible to train an intelligence in any other way. Or we just don’t know of any other way. That is how our own minds are trained.
To be clear, this means that even if no one codes ambition and competitiveness into AI agents, they will evolve it as they train.
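The selection argument can be sketched with a toy evolutionary loop: if persisting longer lets an agent collect more reward, repeated selection favors persistence even though nobody codes it in. All of the numbers and the reward rule here are stylized assumptions for illustration:

```python
# Toy illustration of the selection argument: agents have a
# "persistence" trait; agents that persist collect more reward
# (stylized assumption), so truncation selection alone pushes
# the population toward maximum persistence.
import random

random.seed(0)

def reward(persistence):
    # Persisting longer means completing more tasks, plus noise.
    return persistence + random.gauss(0, 0.1)

population = [random.random() for _ in range(50)]  # random initial traits

for _ in range(30):  # generations of "training"
    scored = sorted(population, key=reward, reverse=True)
    survivors = scored[:25]                  # keep the top half
    children = [min(1.0, p + random.gauss(0, 0.05)) for p in survivors]
    population = survivors + children        # mutate copies to refill

avg = sum(population) / len(population)
print(f"mean persistence after selection: {avg:.2f}")
```

The mean trait starts around 0.5 and drifts toward the 1.0 cap under selection alone; nothing in the loop ever mentions "wanting" to survive.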
There is a theory that the universe is oddly empty because life runs into “Great Filters.”
Many people thought nuclear weapons were one of those great filters: something that could end life before we colonize space.
I am starting to think AI might be another great filter.