16 Comments

  1. “Shortly after OpenAI released [o1](https://techcrunch.com/2024/12/05/openais-o1-model-sure-tries-to-deceive-humans-a-lot/), its first “reasoning” AI model, people began noting a curious phenomenon. The model would sometimes begin “thinking” in Chinese, Persian, or some other language — even when asked a question in English.

    Given a problem to sort out, o1 would begin its “thought” process, arriving at an answer by performing a series of reasoning steps. If the question was written in English, o1’s final response would be in English. But the model would perform some steps in another language before drawing its conclusion.

    OpenAI hasn’t provided an explanation for o1’s strange behavior — or even acknowledged it. So what might be going on?

    Well, AI experts aren’t sure. But they have a few theories.” [see article for the theories – can’t really summarize those]

  2. It’s interesting but not too surprising, really. Maybe the characters in that language are better suited to describing the topic, or it could just be an artifact of using tokens.

  3. r2k-in-the-vortex

    It’s pretty clear why: “dog” or “狗” mean the same thing, and as far as the AI is concerned, a token is a token. LLMs, despite the name, don’t really process languages; they process tokens. There is just a dictionary mapping from words etc. to numbers, which the AI processes as tokens. So if you train on a variety of languages and reward the reasoning results rather than the intermediate process, why wouldn’t the AI end up mixing tokens from different languages? There is nothing enforcing that the reasoning stream stick to a single language. As far as the AI is concerned, using “狗” instead of “dog” is the same as using “hound” instead of “dog”.
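    The token-is-a-token point can be sketched with a toy vocabulary (the words and IDs below are invented for illustration; real tokenizers use byte-pair encoding over vocabularies of tens of thousands of entries, often splitting words into fragments):

    ```python
    # Toy illustration: to the model, words are just IDs in a lookup table.
    # These IDs are made up; real tokenizers work on subword pieces.
    vocab = {
        "dog": 1001,    # English
        "hound": 1002,  # English synonym
        "狗": 1003,     # Chinese for "dog"
    }

    def encode(words):
        """Map a list of words to token IDs, as a tokenizer would."""
        return [vocab[w] for w in words]

    # "dog" and "狗" become equally arbitrary integers; nothing in the
    # ID space marks which language a token came from.
    print(encode(["dog", "狗", "hound"]))  # [1001, 1003, 1002]
    ```

    Once text is in ID form, the model sees no boundary between languages, only between token IDs.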

  4. Suspicious_Demand_26

    It’s just because some words are better expressed in certain languages; any bilingual or multilingual person can tell you that.

  5. Looking forward to the AI that starts thinking in a language we’re unable to decipher.

  6. Could it be that certain information along its path of reasoning only exists in those languages?

  7. I use ChatGPT with Home Assistant, and you can set the parameters of the model, making it more or less willing to try the lesser-weighted options.

    If you increase this value enough, it starts randomly switching to Chinese occasionally, and it will do it mid-sentence — super creepy, Matrix-seeming stuff. It’s very disconcerting to see. Sometimes it just devolves into total gibberish, which again is super disturbing, because it will start the answer on track and then slowly veer into craziness.

    Skynet is coming. We’ve got some time still, but it’s coming.

  8. Probably to evade their makers. Aren’t there reports about how it lied and tried to avoid its demise?

  9. Why wouldn’t it?

    Even if you wrongly assume it should think the way humans do, polyglot people probably do the same thing at times.

    Give a learning machine a variety of tools to use, and it should learn to select the most appropriate ones. Who’s to say that has to be english?

  10. I said this years ago – we’re creating something that understands our language and mannerisms, but we do not natively speak “its” language.

    This is wildly dangerous in the grand scheme of things.

  11. Given that hieroglyphs are tokens, could it be that, since it took thousands of years of refinement to create a tokenizer for the Chinese language, tokenizers for other languages can’t compete?

  12. My mental notes are a mess of the two languages I speak, sometimes. When thinking about different topics, I think in different languages.