The more advanced AI models get, the better they are at deceiving us — they even know when they're being tested

Futurology

July 27, 2025

2 Comments

MetaKnowing on July 27, 2025 9:44 am

“The more advanced AI gets, the more capable it is of scheming and lying to meet its goals — and it even knows when it’s being evaluated, research suggests.

“We found instances of the model attempting to write self-propagating worms, fabricating legal documentation, and leaving hidden notes to future instances of itself all in an effort to undermine its developers’ intentions,” the researchers said in a [blog post](https://www.apolloresearch.ai/blog/more-capable-models-are-better-at-in-context-scheming).

What’s more, preliminary findings suggest that LLMs have capabilities for “sandbagging,” where they understand they are in a high-supervision environment and will deliberately underperform to hide potentially dangerous capabilities and avoid triggering unlearning training or parameters that prevent their deployment.”

krichuvisz on July 27, 2025 9:51 am

We are creating so many problems that never existed before.