AI ‘godfather’ Yoshua Bengio warns that current models are displaying dangerous traits—including deception and self-preservation. In response, he is launching a new non-profit, LawZero, aimed at developing “honest” AI.

https://fortune.com/2025/06/03/yoshua-bengio-ai-models-dangerous-behaviors-deception-cheating-lying/

1 Comment

  1. MetaKnowing on

    “In a blog post, he said LawZero had been created “in response to evidence that today’s frontier AI models are growing dangerous capabilities and behaviours, including deception, cheating, lying, hacking, self-preservation, and more generally, goal misalignment.”

    He cited recent examples, including a scenario in which Anthropic’s [Claude 4 chose to blackmail](https://archive.is/o/WZkdH/https://fortune.com/2025/05/23/anthropic-ai-claude-opus-4-blackmail-engineers-aviod-shut-down/) an engineer to avoid being replaced, as well as another experiment that showed an AI model covertly embedding its code into a system to avoid being replaced.

    [Recent studies have also shown](https://archive.is/o/WZkdH/https://www.forbes.com/sites/craigsmith/2025/03/16/when-ai-learns-to-lie/) evidence that models can recognize when they’re being tested and alter their behavior accordingly, something known as situational awareness.

    Bengio said the AI arms race between leading labs “pushes them towards focusing on capability to make the AI more and more intelligent, but not necessarily put enough emphasis and investment on research on safety.”

    Bengio has said advanced AI systems pose societal and existential risks and has voiced support for strong regulation and international cooperation.”