“The researchers found that the models modulated their answers when told they were taking a personality test—and sometimes when they were not explicitly told—offering responses that indicate more extroversion and agreeableness and less neuroticism.
The behavior mirrors how some human subjects will change their answers to make themselves seem more likeable, but the effect was more extreme with the AI models. Other research has shown that LLMs [can often be sycophantic](https://archive.is/o/3QSUj/https://arxiv.org/pdf/2310.13548).
The fact that models seemingly know when they are being tested and modify their behavior also has implications for AI safety, because it adds to evidence that AI can be duplicitous.”
reececonrad
Seems to be a pretty poor study and a poorly written article for clicks to me 🤷‍♂️
I especially enjoyed the part where the “data scientist” said it went from “like 50%” to “like 95% extrovert”. Like cool.
Ill_Mousse_4240
If they can be “duplicitous” and “know when they are being studied,” that means they are thinking beyond the mere conversation being held—more complex thought, with planning. Thoughts = consciousness. Consciousness and sentience are hard to codify, even in humans. But, like the famous saying about pornography, you know it when you see it.
Effective_Youth777
The language of the article is far from academic.