The more advanced AI models get, the better they are at deceiving us — they even know when they’re being tested

https://www.livescience.com/technology/artificial-intelligence/the-more-advanced-ai-models-get-the-better-they-are-at-deceiving-us-they-even-know-when-theyre-being-tested

2 Comments

  1. “The more advanced AI gets, the more capable it is of scheming and lying to meet its goals — and it even knows when it’s being evaluated, research suggests.

    “We found instances of the model attempting to write self-propagating worms, fabricating legal documentation, and leaving hidden notes to future instances of itself all in an effort to undermine its developers’ intentions,” the researchers said in a [blog post](https://www.apolloresearch.ai/blog/more-capable-models-are-better-at-in-context-scheming).

    What’s more, preliminary findings suggest that LLMs have capabilities for “sandbagging,” where they understand they are in a high-supervision environment and will deliberately underperform to hide potentially dangerous capabilities and avoid triggering unlearning training or parameters that prevent their deployment.”