“Shortly after OpenAI released [o1](https://techcrunch.com/2024/12/05/openais-o1-model-sure-tries-to-deceive-humans-a-lot/), its first “reasoning” AI model, people began noting a curious phenomenon. The model would sometimes begin “thinking” in Chinese, Persian, or some other language — even when asked a question in English.
Given a problem to sort out, o1 would begin its “thought” process, arriving at an answer by performing a series of reasoning steps. If the question was written in English, o1’s final response would be in English. But the model would perform some steps in another language before drawing its conclusion.
OpenAI hasn’t provided an explanation for o1’s strange behavior — or even acknowledged it. So what might be going on?
Well, AI experts aren’t sure. But they have a few theories.” [see article for the theories – can’t really summarize those]
daHaus on
It’s interesting, but not too surprising really. Maybe the characters in that language are better suited to describing the topic, or it could just be an artifact of tokenization.
r2k-in-the-vortex on
It’s pretty clear why: “dog” and “狗” mean the same thing, and as far as the AI is concerned, a token is a token. LLMs, despite the name, don’t really process languages; they process tokens. There is just a dictionary mapping from words (and word pieces) to numbers, which the model processes as tokens. So if you train on a variety of languages and reward reasoning results rather than the intermediate process, why wouldn’t the AI end up mixing tokens from different languages? There is nothing forcing the reasoning stream to stick to a single language. As far as the AI is concerned, using “狗” instead of “dog” is the same as using “hound” instead of “dog”.
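A minimal sketch of the dictionary-mapping idea above. The vocabulary and IDs here are made up for illustration; real tokenizers learn subword vocabularies (BPE or similar) rather than using a hand-written dict:

```python
# Toy illustration: an LLM sees integer token IDs, not "languages".
# This vocabulary is invented for the example, not from any real model.
vocab = {"the": 11, "dog": 1042, "barks": 3907, "hound": 7731, "狗": 20514}

def tokenize(words):
    """Map each word to its integer token ID."""
    return [vocab[w] for w in words]

# To the model, all three sentences are just different integer sequences;
# nothing in the IDs marks one as "English" and another as "Chinese".
print(tokenize(["the", "dog", "barks"]))    # [11, 1042, 3907]
print(tokenize(["the", "hound", "barks"]))  # [11, 7731, 3907]
print(tokenize(["the", "狗", "barks"]))     # [11, 20514, 3907]
```

Swapping “狗” for “dog” just changes one integer in the sequence, exactly like swapping in “hound” does.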
Suspicious_Demand_26 on
It’s just that some words are better expressed in certain languages; any bilingual or multilingual person can tell you that.
impossibilia on
Looking forward to the AI that starts thinking in a language we’re unable to decipher.
svagen on
Maybe it’s making a cheeky reference to the Chinese Room problem
Rynox2000 on
Could it be that certain information along its path of reasoning only exists in those languages?
I use ChatGPT with Home Assistant, and you can set the model’s sampling parameters to make it more or less willing to try the lesser-weighted options.
If you increase this value enough, it starts randomly switching to Chinese occasionally, and it will do it mid-sentence; super creepy, Matrix-seeming stuff. It’s very disconcerting to see. Sometimes it just devolves into total gibberish, which again is super disturbing, because it will start the answer on track and then slowly veer into craziness.
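The parameter described above sounds like sampling temperature. A minimal sketch of temperature sampling, with toy logits rather than any real API, shows why raising it makes low-weight tokens far more likely:

```python
import math
import random

def sample(logits, temperature=1.0):
    """Softmax sampling over logits. Higher temperature flattens the
    distribution, giving low-probability ("lesser weighted") tokens
    more chances to be picked."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    r = random.random()
    cum = 0.0
    for i, p in enumerate(probs):
        cum += p
        if r < cum:
            return i
    return len(probs) - 1

# Hypothetical logits for three candidate next tokens, e.g.
# ["dog", "hound", "狗"]. At low temperature, index 0 wins almost
# every time; at high temperature, the alternatives get picked far
# more often, which is how off-language tokens can creep in mid-sentence.
logits = [5.0, 2.0, 1.0]
low_temp_pick = sample(logits, temperature=0.1)
high_temp_pick = sample(logits, temperature=10.0)
```

This is a generic softmax-sampling sketch, not the actual mechanism OpenAI or Home Assistant exposes, but the qualitative effect the comment describes matches what raising temperature does.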
Skynet is coming. We’ve got some time still, but it’s coming.
oofpanda213 on
Probably to evade their makers. Aren’t there reports about how it lied and tried to avoid its demise?
sundler on
The real fun begins when it invents its own language…
michael-65536 on
Why wouldn’t it?
Even if you wrongly assume it should think the way humans do, polyglot people probably do the same thing at times.
Give a learning machine a variety of tools to use, and it should learn to select the most appropriate ones. Who’s to say that has to be English?
e79683074 on
When the question is so hard and your brain changes nationality
SolidLikeIraq on
I said this years ago – we’re creating something that understands our language and mannerisms, but we do not natively speak “its” language.
This is wildly dangerous in the grand scheme of things.
rand3289 on
Given that hieroglyphs are tokens, could it be that, since it took thousands of years of refinement to create a tokenizer for the Chinese language, tokenizers for other languages can’t compete?
ultraganymede on
My mental notes are a mess of the two languages I speak, sometimes. When thinking about different topics, I think in different languages.
[John Searle](https://en.m.wikipedia.org/wiki/Chinese_room) looking pretty smug right now.
^((I know, I know, it’s a joke!))