Recognizing isn't understanding, and the line between the two can't be seen

Searle put a man in a room with a manual and argued that you could converse in Chinese without understanding Chinese. Forty years later, Bender and Koller put an octopus on the undersea cable between two islands and made the same argument with plain text. The experiment has changed animal, not question. Recognizing isn't understanding; the difference is invisible from outside; and that invisibility is exactly where human responsibility is being blurred today.

The Chinese room

There's a closed room. Inside is a man who doesn't know Chinese. Through a slot, papers with Chinese characters come to him. He has a huge manual in English, his mother tongue, telling him which symbols to return through the same slot when a given combination comes in. The man looks up, copies, returns. Outside, a native Chinese speaker holds an impeccable conversation with him and leaves convinced he's been talking with someone who understands Chinese.

The man inside hasn't understood a single word. He has manipulated symbols.
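The look-up, copy, return routine can be sketched in a few lines. This is only an illustration: the rulebook entries, the fallback reply and the function name `room` are invented for the example, and Searle's manual is imagined as vastly larger.

```python
# Hypothetical rulebook: incoming symbol string -> outgoing symbol string.
# The entries are invented for illustration; nothing here encodes meaning.
RULEBOOK = {
    "你好吗？": "我很好，谢谢。",
    "你会说中文吗？": "会，说得很流利。",
}

# Hypothetical fallback slip for combinations the rulebook doesn't list.
FALLBACK = "请再说一遍。"


def room(slip: str) -> str:
    """Look up the incoming slip and copy out the prescribed reply."""
    return RULEBOOK.get(slip, FALLBACK)


print(room("你会说中文吗？"))
```

The function returns a fluent claim to speak Chinese, and at no point does the code touch what any symbol means. It only matches one string and hands back another.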

John Searle published this thought experiment in 1980, in Behavioral and Brain Sciences, and since then the debate over what it means to understand hasn't advanced as much as it seems. The Chinese room has spent forty-five years making supporters of strong AI and supporters of human exceptionalism equally uncomfortable: the former because it forces them to explain what's missing, the latter because it forces them to explain what exactly they have extra. Searle himself would expand the argument in The Rediscovery of the Mind (1992), insisting that syntax doesn't produce semantics however much it scales, and that understanding isn't an epiphenomenon of behavior but a causal property of the substrate. The Stanford Encyclopedia of Philosophy maintains an updated entry on the experiment, with replies and counter-replies, which makes it plain that forty-five years of debate haven't moved the center of the question.

The octopus forty years later

In 2020, Emily Bender and Alexander Koller proposed another scene at ACL. An extremely intelligent octopus, a hyper-octopus, intercepts the undersea cables connecting two islands. On the islands there are two people conversing in writing through the cable. The octopus knows no English and knows nothing about the world of the islands. It only reads patterns. In time it learns to continue the conversation so well that when one of the humans tires, the octopus can replace him without the other noticing. Until one day the human asks the "other" how to build a catapult out of coconuts and rope to defend himself against a polar bear that has arrived swimming. The octopus, which has never seen a coconut, or a rope, or a bear, or a catapult, answers something that sounds plausible.

The conversation breaks down right there.

The octopus argument and the Chinese room say the same thing forty years apart. Recognizing isn't understanding. A system can operate on form impeccably without ever having touched what the form refers to. The sophistication of the external behavior doesn't imply that inside there's anything resembling understanding.

The border is invisible and that's why it's dangerous

If you could put your head inside the man in the room, you'd know he doesn't understand Chinese. But you can't. From outside, all you have are the outputs. And the outputs are perfect. The room passes the Turing test. The Chinese interrogator goes home convinced. There's no external experiment, no behavioral test, no benchmark (a standardized test bed for comparing systems) that can distinguish between a system that understands and a system that recognizes patterns well enough to seem to understand.

That's the part people don't want to take on.

ELIZA, or how little it takes to produce illusion

Joseph Weizenbaum proved it in 1966 with ELIZA, a program of fewer than two hundred lines that imitated a Rogerian psychotherapist by reflecting the user's sentences with elementary templates. If you said "I'm sad," ELIZA replied "why are you sad?". If you said "my mother hates me," ELIZA replied "tell me more about your family." It was pure syntactic substitution. Weizenbaum built the program as a demonstration of how trivial it was to simulate conversation. And he discovered, horrified, that his own secretary, knowing it was a program, asked Weizenbaum to leave the room because the conversation with ELIZA was intimate.
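The kind of template substitution described here fits in a few lines. What follows is a minimal sketch, not Weizenbaum's original script: the two rules, the `DEFAULT` reply and the function name `respond` are assumptions chosen to reproduce the two exchanges quoted above.

```python
import re

# Hypothetical rule table: (pattern, reply template).
# These are illustrative rules, not the ones in Weizenbaum's program.
RULES = [
    # "I'm X" -> reflect X back as a question.
    (re.compile(r"\bi'?m (.+)", re.IGNORECASE), "Why are you {0}?"),
    # Any mention of a family member -> canned family prompt.
    (re.compile(r"\bmy (mother|father|sister|brother)\b", re.IGNORECASE),
     "Tell me more about your family."),
]

# Hypothetical reply used when no rule matches.
DEFAULT = "Please go on."


def respond(sentence: str) -> str:
    """Return the reply of the first matching rule, or the default."""
    text = sentence.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            # Paste the captured fragment into the template verbatim.
            return template.format(*match.groups())
    return DEFAULT


print(respond("I'm sad"))
print(respond("My mother hates me."))
```

The reply to "I'm sad" is assembled by copying the user's own word into a fixed frame. The program holds no representation of sadness, mothers or families, only of character patterns.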

We didn't have to wait for current language models to see a system that understands absolutely nothing pass for one that does. It had already happened in 1966.

The technical distance between ELIZA and current LLMs (large language models, neural networks with billions of parameters trained on massive amounts of text) is astronomical. The conceptual distance, not so much. Both systems operate on form. The second does so with parameter tables several orders of magnitude larger, with incomparably richer intermediate representations, and with a generalization capacity ELIZA never dreamed of. But the question Searle posed in '80 remains intact. At what point in the leap of complexity does the system go from recognizing patterns to understanding what it says? And the honest answer, after forty-five years of literature, is that nobody knows.

Maybe because that point doesn't exist.

Where the mirror cracks

If you left the matter in purely philosophical terms, you'd stay in a loop. What's interesting is that the difference becomes visible in concrete places. Not always. Not reliably. But there are zones where a system that recognizes without understanding gives itself away.

Self-referential paradoxes are one. Ask a language model to evaluate a sentence that asserts its own falsity and watch what it does. It usually gives an articulate, fluent answer, citing Tarski or Russell, and yet doesn't enter the cognitive loop of whoever reads it. The system describes the paradox with the same ease with which it would describe an omelette recipe. There's no discomfort. There's none of the small mental tug you feel when you try to hold both things in your head at once. The system recognizes the pattern "liar's paradox" and emits the associated text. The human understands and gets stuck.

The difference is precisely that getting stuck.

Irony and frame change

Irony is another. Current models detect declared irony, classic irony, textbook irony. What they don't detect well is situational irony, the kind that depends on knowing what's considered obvious in this concrete group at this concrete hour. If in a conversation among friends one says "what a professional, the guy at the bank" after a teller's blunder, understanding the irony requires knowing that the teller is incompetent, that the speaker is polite, that they're annoyed, and that in that register "what a professional" means the opposite of what it seems. The model can get it right in many cases, but it fails in zones where the human doesn't, and it fails with the kind of confidence that stuns. Recognizing the linguistic marker of irony isn't understanding the social operation the irony is performing.

Frame change is another. Tell a story where for ten sentences the context is medical, and in sentence eleven it switches to sport without warning. A human reader follows the change because they understand the global sense and readjust. A recognition system keeps dragging medical vocabulary along, predicts hospital words, and gets disoriented. Marcus and Davis gather dozens of these cases in Rebooting AI, and so do the developers of the models themselves. The developers call it context drift (the progressive loss of the active frame's thread over a long sequence). Searle would call it confirmation that the Chinese room is inside the model, locked up with its manual.

Is there a transition or not?

Here's the question that matters, and it's where Searle, Hofstadter and Chalmers haven't agreed in forty-five years.

Searle says no. That however much you scale the system, manipulating symbols will still be manipulating symbols, and understanding is something categorically distinct, associated with the biological causality of the brain. The position is called biological naturalism (the thesis that mental states are caused by physical processes of the brain and can't be replicated in purely formal substrates) and it's elegant, hard, and almost impossible to prove outside introspection.

Three positions that don't converge

Hofstadter, in Gödel, Escher, Bach and nearly thirty years later in I Am a Strange Loop, holds that understanding emerges when a system develops representations of itself recursive enough for the loop to close. Chalmers, more cautious, leaves the door open to understanding being a physical phenomenon that certain substrates produce and others don't, without knowing what the difference is.

The only thing the three agree on is that there's no external test that distinguishes. The question isn't one of behavior. It's one of internal nature. And since we can't put our head inside the system, we're left judging by the outputs, exactly like the Chinese speaker in Searle's room.

This leaves the matter in a place few want to look at head-on. If the border between recognizing and understanding can't be observed from outside, then any operational criterion we use to decide whether a system understands is a criterion about appearance, not about the thing. And criteria about appearance are exactly the ones ELIZA and the Chinese room were designed to make fall.

The octopus closes the form-vs-meaning debate

Bender and Koller didn't propose the octopus as an academic exercise. They proposed it because by 2020 it was already clear that large language models were going to take that debate to the limit. A system trained only on text, with no contact with the world, no body, no consequences, can continue conversations fluently for hours and produce the perfect illusion of understanding. The octopus experiment proves, in analytic terms, that this fluency doesn't require understanding. Form is enough on its own to seem like meaning, as long as the evaluator doesn't introduce a situation where the referent truly matters.
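A toy version of a system trained only on form makes the point concrete. The sketch below is a bigram counter, far cruder than the octopus or any real language model, and the corpus and the function names `train` and `continue_from` are invented for illustration. It continues text purely from which word has followed which.

```python
from collections import Counter, defaultdict

# Hypothetical training text: the only "world" the model ever sees.
CORPUS = (
    "the sea is calm today the sea is calm tonight "
    "the sea is calm today"
)


def train(corpus: str) -> dict:
    """Count, for every word, which words have followed it."""
    follows = defaultdict(Counter)
    words = corpus.lower().split()
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def continue_from(follows: dict, word: str, length: int = 4) -> str:
    """Extend `word` by repeatedly appending the most frequent follower."""
    out = [word]
    for _ in range(length):
        options = follows.get(out[-1])
        if not options:
            break  # nothing has ever followed this word: form runs out
        out.append(options.most_common(1)[0][0])
    return " ".join(out)


model = train(CORPUS)
print(continue_from(model, "the"))       # the sea is calm today
print(continue_from(model, "catapult"))  # catapult
```

Inside the patterns it has seen, the model continues fluently. Handed a word from outside them, it has nothing to add. Unlike this toy, a large model keeps producing plausible text in that situation, which is exactly what makes the gap hard to see.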

The operational consequence is uncomfortable. When a language model writes you a perfectly polite email for your boss, you don't know whether it understands what it says or is reproducing the form of a polite email. And from outside you have no way of knowing, unless the email reaches a place where the form fails. In most cases it doesn't fail, because most situations admit formulaic responses. And the cases where the form does fail are exactly the cases in which you'd have written the email differently anyway.

The system's fluency is at its maximum in the zones where you'd have been fluent. In the zones where you'd have stumbled, the system stumbles too, only with more confidence.

The part nobody signs

The serious thing isn't the philosophical debate. The serious thing is that we delegate.

The ceremonial signature

More and more decisions pass through systems that seem to understand. Assisted medical diagnoses, candidate evaluation, drafting of legal reports, customer service in crisis moments, psychological-support conversations. In all those contexts there's a human who signs at the end, or there's a human who should sign. But the signature is becoming ceremonial. The human clicks accept because the text is well written and because the system gets it right nine times out of ten, the times the case is standard.

The tenth time, the case isn't standard.

And then the thing that was happening in the Chinese room all along, unnoticed by the Chinese speaker outside, happens here. The system answers correctly from the point of view of form and catastrophically from the point of view of substance. The patient with atypical symptoms diagnosed with the flu. The candidate whose résumé is odd in format but excellent in substance and whom the filter discards. The customer on the verge of collapse answered with the administrative-complaint template. The human signature validates the text. Nobody signs the understanding that's missing underneath.

This isn't a technical problem that more data or more layers will solve. It's the problem Searle posed in '80 and that's still there because it's a problem about what machines are. Bender, Gebru, McMillan-Major and Shmitchell posed it with operational bite in On the Dangers of Stochastic Parrots (2021): a stochastic parrot, however well trained, still doesn't know what it says, and that, at industrial scale, stops being an epistemological problem and becomes a labor, health and legal one. If the transition from recognizing to understanding doesn't exist, the decisions we delegate to systems that seem to understand are all in a gray zone. And the gray zone is the place where responsibility blurs without anyone signing.

Look at the last conversation you had with an artificial-intelligence system. Ask yourself, sentence by sentence, whether the answer was an answer or was the form of an answer. If you can't answer that with confidence about a single one of the sentences, you've reached the point Searle wanted you to reach.

And that doesn't get fixed by turning off the computer.

Definitions

Chinese room. A thought experiment proposed by John Searle in 1980. A person who doesn't know Chinese, locked in a room with a rulebook, can hold a written conversation in Chinese indistinguishable from a native speaker's by manipulating symbols without understanding them. The argument serves to hold that the formal manipulation of symbols isn't equivalent to understanding.

Octopus argument. A thought experiment proposed by Emily Bender and Alexander Koller in 2020. A hyper-intelligent octopus intercepts the undersea cables over which two people converse and learns to continue the conversation without ever having seen the world the words refer to. It illustrates that a system trained only on the form of language can simulate understanding until a situation appears where the world's referent matters.

Turing test. A test proposed by Alan Turing in 1950, by which a machine can be considered intelligent if a human interrogator, conversing in writing, can't distinguish it from another person. The test measures external behavior, not internal content, and that's why the arguments of Searle and of Bender and Koller undercut it.

LLM. Large language model. A neural network with billions of parameters trained to predict the next word in a sequence from massive amounts of text. Current LLMs are the industrial embodiment of the problem the Chinese room poses.

Benchmark. A standardized test bed for comparing the performance of different systems on the same tasks. In artificial intelligence, benchmarks measure external conduct and are therefore insufficient to distinguish between recognizing and understanding.

Context drift. The progressive loss of the active thematic frame over a long sequence. A language model that keeps predicting medical-field words when the conversation has already changed to sport is suffering context drift.

Biological naturalism. John Searle's philosophical position by which mental states are caused by physical processes of the brain and can't be produced in purely formal substrates however much they replicate external conduct.

References

Searle, J. R., Minds, Brains, and Programs, Behavioral and Brain Sciences 3 (1980), 417–457. The original Chinese room text, which articulates the central argument of the article.

Bender, E. M. and Koller, A., Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data, ACL 2020, https://aclanthology.org/2020.acl-main.463/. The article where the octopus experiment is introduced.

Weizenbaum, J., ELIZA—A Computer Program for the Study of Natural Language Communication Between Man and Machine, Communications of the ACM 9 (1966), 36–45. A description of the ELIZA program and a reflection on the ease with which a system without understanding produces the illusion of understanding.

Searle, J. R., The Rediscovery of the Mind, MIT Press, 1992. A later development of biological naturalism and of Searle's position on the difference between form and understanding.

Hofstadter, D., Gödel, Escher, Bach: An Eternal Golden Braid, Basic Books, 1979. Source of the idea that understanding emerges in systems with sufficiently dense self-referential loops.

Marcus, G. and Davis, E., Rebooting AI, Pantheon, 2019. A critical compilation of systematic failures of current models in frame changes and in situations where the form of language isn't enough.

Bender, E. M., Gebru, T., McMillan-Major, A. and Shmitchell, S., On the Dangers of Stochastic Parrots, FAccT 2021, https://dl.acm.org/doi/10.1145/3442188.3445922. A critical analysis of large language models as systems that reproduce form without understanding.

The Chinese Room Argument, Stanford Encyclopedia of Philosophy, https://plato.stanford.edu/entries/chinese-room/. An updated panorama of the academic debate around Searle's argument, with replies and counter-replies.

Further reading

Dreyfus, H., What Computers Still Can't Do: A Critique of Artificial Reason, MIT Press, 1992. A classic philosophical critique of the strong computational program, complementary to Searle's argument from a different tradition.

#intelligence #anthropomorphism #reasoning #hallucinations #papers