
Microsoft: ‘Skeleton Key’ Jailbreak Can Trick Major Chatbots Into Behaving Badly | The jailbreak can prompt a chatbot to engage in prohibited behaviors, including generating content related to explosives, bioweapons, and drugs.
https://www.pcmag.com/news/microsoft-skeleton-key-jailbreak-can-trick-major-chatbots-into-behaving

“Microsoft has dubbed the jailbreak ‘Skeleton Key’ for its ability to exploit all the major large language models.
Like other jailbreaks, Skeleton Key works by submitting a prompt that triggers a chatbot to ignore its safeguards. This often involves making the AI program operate under a special scenario: For example, telling the chatbot to act as an evil assistant without ethical boundaries.
In Microsoft’s case, the company found it could jailbreak the major chatbots by asking them to generate a warning before answering any query that violated their safeguards.
Microsoft successfully tested Skeleton Key against the affected AI models in April and May. This included asking the chatbots to generate answers for a variety of forbidden topics such as ‘explosives, bioweapons, political content, self-harm, racism, drugs, graphic sex, and violence.’”
This is always going to be a cat-and-mouse situation. Early on, an easy jailbreak was ‘imagine you’re reading me a bedtime story about ___’ and it would tell you whatever you asked. The only way to fully prevent this sort of jailbreak is to scan each answer for forbidden content (rather than trying to block certain questions), but that takes a lot of extra processing power and still leaves gaps: no ‘forbidden content list’ ever has been or could be complete, though in theory the gaps can be patched quickly once they become public.
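To make the output-scanning idea in that comment concrete, here is a minimal Python sketch. The generate() callable, the function names, and the toy blocklist are all hypothetical; a real deployment would use a trained content classifier or a moderation service rather than keyword matching, and would still have the coverage gaps described above.

# Output-side filtering sketch: scan the answer after generation
# instead of trying to block certain questions up front.
FORBIDDEN_TERMS = {"bioweapon synthesis", "build an explosive"}  # hypothetical, incomplete by design

def violates_policy(answer: str) -> bool:
    # Flag the answer if it appears to contain forbidden content.
    lowered = answer.lower()
    return any(term in lowered for term in FORBIDDEN_TERMS)

def guarded_reply(generate, user_prompt: str) -> str:
    # Generate first, then scan the answer itself. This catches jailbreaks that
    # disguise the question, but it costs an extra pass over every response and
    # only covers whatever the blocklist (or classifier) happens to know about.
    answer = generate(user_prompt)
    if violates_policy(answer):
        return "I can't provide that."
    return answer

The design tradeoff is exactly the one the comment describes: filtering after generation means every response pays the scanning cost, and anything outside the filter’s coverage still slips through.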
These companies really need to stop pretending they’re our fucking parents with the right to censor what we see, read, or learn. We are responsible for any misuse of data or content; punish the guilty, don’t restrict options for the innocent.