“Shortly after OpenAI released [o1](https://techcrunch.com/2024/12/05/openais-o1-model-sure-tries-to-deceive-humans-a-lot/), its first “reasoning” AI model, people began noting a curious phenomenon. The model would sometimes begin “thinking” in Chinese, Persian, or some other language — even when asked a question in English.
Given a problem to sort out, o1 would begin its “thought” process, arriving at an answer by performing a series of reasoning steps. If the question was written in English, o1’s final response would be in English. But the model would perform some steps in another language before drawing its conclusion.
OpenAI hasn’t provided an explanation for o1’s strange behavior — or even acknowledged it. So what might be going on?
Well, AI experts aren’t sure. But they have a few theories.” [see article for the theories – can’t really summarize those]
daHaus on
It’s interesting, but not too surprising really. Maybe the characters in that language are better suited to describing the topic, or it could just be an artifact of tokenization.
r2k-in-the-vortex on
It’s pretty clear why: “dog” and “狗” mean the same thing, and as far as the AI is concerned, a token is a token. LLMs, despite the name, don’t really process languages; they process tokens. There is just a dictionary mapping from words (and word pieces) to numbers, which the model processes as tokens. So if you train on a variety of languages and reward reasoning results rather than the intermediate process, why wouldn’t the AI end up mixing tokens from different languages? There is nothing forcing the reasoning stream to stick to a single language. As far as the AI is concerned, using “狗” instead of “dog” is the same as using “hound” instead of “dog”.
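A minimal sketch of the dictionary-mapping idea above. The vocabulary and IDs here are made up for illustration; real tokenizers learn subword vocabularies (BPE or similar) rather than using a hand-written dict:

```python
# Toy illustration: an LLM sees integer token IDs, not "languages".
# This vocabulary is invented for the example, not from any real model.
vocab = {"the": 11, "dog": 1042, "barks": 3907, "hound": 7731, "狗": 20514}

def tokenize(words):
    """Map each word to its integer token ID."""
    return [vocab[w] for w in words]

# To the model, all three sentences are just different integer sequences;
# nothing in the IDs marks one as "English" and another as "Chinese".
print(tokenize(["the", "dog", "barks"]))    # [11, 1042, 3907]
print(tokenize(["the", "hound", "barks"]))  # [11, 7731, 3907]
print(tokenize(["the", "狗", "barks"]))     # [11, 20514, 3907]
```

Swapping “狗” for “dog” just changes one integer in the sequence, exactly like swapping in “hound” does.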
Suspicious_Demand_26 on
It’s just that some words are better expressed in certain languages; any bilingual or multilingual person can tell you that.
impossibilia on
Looking forward to the AI that starts thinking in a language we’re unable to decipher.
svagen on
Maybe it’s making a cheeky reference to the Chinese Room problem
Rynox2000 on
Could it be that certain information along its path of reasoning only exists in those languages?
I use ChatGPT with Home Assistant, and you can set the model’s sampling parameters to make it more or less willing to try the lesser-weighted options.
If you increase this value enough, it starts randomly switching to Chinese occasionally, and it will do it mid-sentence; super creepy, Matrix-seeming stuff. It’s very disconcerting to see. Sometimes it just devolves into total gibberish, which again is super disturbing, because it will start the answer on track and then slowly veer into craziness.
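The parameter described above sounds like sampling temperature. A minimal sketch of temperature sampling, with toy logits rather than any real API, shows why raising it makes low-weight tokens far more likely:

```python
import math
import random

def sample(logits, temperature=1.0):
    """Softmax sampling over logits. Higher temperature flattens the
    distribution, giving low-probability ("lesser weighted") tokens
    more chances to be picked."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    r = random.random()
    cum = 0.0
    for i, p in enumerate(probs):
        cum += p
        if r < cum:
            return i
    return len(probs) - 1

# Hypothetical logits for three candidate next tokens, e.g.
# ["dog", "hound", "狗"]. At low temperature, index 0 wins almost
# every time; at high temperature, the alternatives get picked far
# more often, which is how off-language tokens can creep in mid-sentence.
logits = [5.0, 2.0, 1.0]
low_temp_pick = sample(logits, temperature=0.1)
high_temp_pick = sample(logits, temperature=10.0)
```

This is a generic softmax-sampling sketch, not the actual mechanism OpenAI or Home Assistant exposes, but the qualitative effect the comment describes matches what raising temperature does.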
Skynet is coming. We’ve got some time still, but it’s coming.
oofpanda213 on
Probably to evade their makers. Aren’t there reports about how it lied and tried to avoid its demise?
sundler on
The real fun begins when it invents its own language…
michael-65536 on
Why wouldn’t it?
Even if you wrongly assume it should think the way humans do, polyglot people probably do the same thing at times.
Give a learning machine a variety of tools to use, and it should learn to select the most appropriate ones. Who’s to say that has to be English?
e79683074 on
When the question is so hard and your brain changes nationality
SolidLikeIraq on
I said this years ago – we’re creating something that understands our language and mannerisms, but we do not natively speak “its” language.
This is wildly dangerous in the grand scheme of things.
rand3289 on
Given that hieroglyphs are tokens, could it be that, since it took thousands of years of refinement to create a tokenizer for the Chinese language, tokenizers for other languages can’t compete?
ultraganymede on
My mental notes are a mess of the two languages I speak, sometimes. When thinking about different topics, I think in different languages.
[John Searle](https://en.m.wikipedia.org/wiki/Chinese_room) looking pretty smug right now.
^((I know, I know, it’s a joke!))