9 Comments

  1. > The [study](https://assets.anthropic.com/m/18d20cca3cde3503/original/Values-in-the-Wild-Paper.pdf) examined 700,000 anonymized conversations, finding that Claude largely upholds the company’s “helpful, honest, harmless” framework while adapting its values to different contexts — from relationship advice to historical analysis. This represents one of the most ambitious attempts to empirically evaluate whether an AI system’s behavior in the wild matches its intended design.
    >
    > “Our hope is that this research encourages other AI labs to conduct similar research into their models’ values,” said Saffron Huang, a member of Anthropic’s Societal Impacts team. “Measuring an AI system’s values is core to alignment research and understanding if a model is actually aligned with its training.”
    >
    > Perhaps most fascinating was the discovery that Claude’s expressed values shift contextually, mirroring human behavior. When users sought relationship guidance, Claude emphasized “healthy boundaries” and “mutual respect.” For historical event analysis, “historical accuracy” took precedence.
    >
    > The study also examined how Claude responds to users’ own expressed values. In 28.2% of conversations, Claude strongly supported user values — potentially raising questions about excessive agreeableness. However, in 6.6% of interactions, Claude “reframed” user values by acknowledging them while adding new perspectives, typically when providing psychological or interpersonal advice.
    >
    > Most tellingly, in 3% of conversations, Claude actively resisted user values. Researchers suggest these rare instances of pushback might reveal Claude’s “deepest, most immovable values” — analogous to how human core values emerge when facing ethical challenges.

  2. creaturefeature16

    No, it has the presentation of a moral code because it’s a fucking language model. Morals aren’t created from math.

  3. Not cool, my conversations with the MF were supposed to be private and not analyzed.

  4. 2020mademejoinreddit

    Aren’t these models just learning from people who use them?

    Let’s assume it did have a “moral code” (pun sort of intended). Does that mean different AI programs would have different moral codes, just like people?

    What would happen when these AIs go to “war”? Especially the ones that might already be running some of the programs in the military?

    Questions like these give me nightmares when I read stuff like this.

  5. > and found its AI has a moral code of its own

    No, they didn’t. Stop posting this clickbait bullshit that does nothing for the quality of this sub.

  6. Brock_Petrov

    I hope the military integrates AI fast. If another war breaks out, we need robots on the front lines, not hard-working Americans.

  7. The big tech hype men are exclusively targeting morons at this point. Which makes sense, because there are definitely enough of them out there.

  8. dreadnought_strength

    This just in: a company desperately trying to maintain a rapidly collapsing bubble makes up utter horseshit to promote its model.

    Can we stop sharing this thinly veiled marketing slop, please?