Calling a mistake by an LLM (large language model, a system that predicts text from probabilities) a "hallucination" is marketing. The machine doesn't see things that aren't there; it computes the most probable word, and sometimes the most probable word is false. Nor does it lie, because lying requires intent and here there is none. The correct phrase would be statistical invention, but it doesn't sell. And what the fault is called decides who gets handed the bill.
Schwartz, Avianca and Six Precedents That Didn't Exist
A New York lawyer, Steven Schwartz, thirty years of practice behind him, filed in 2023 before Judge Castel, in the case Mata v. Avianca, a brief citing six judicial precedents. Varghese v. China Southern Airlines. Shaboon v. Egypt Air. Petersen v. Iran Air. Cases with docket numbers, signing judges, paragraphs quoted verbatim. Avianca's defense tried to locate them. It couldn't. Neither could the judge. Schwartz had used ChatGPT and, when he asked the system whether the cases were real, ChatGPT told him yes. The sanction fell in June. Five thousand dollars, withdrawal of the brief, public shame in newspapers half the world over.
The press headlined in unison. "AI hallucinations reach the courts."
That word, "hallucination," was already inside the headline as if it were a neutral term from a technical glossary. It isn't. It never was. The choice of that word and not another is one of the most effective commercial moves in the recent history of tech marketing, and Schwartz's brief paid the semantic bill the industry had issued a couple of years earlier.
Where the Word Comes From and Why It Doesn't Fit
The systematic review by Ji et al., Survey of Hallucination in Natural Language Generation, published in ACM Computing Surveys in 2023, tried at least to pin down the definition. What we call hallucination is, translated without rhetoric, linguistically coherent content unrelated to the input or to confirmed sources. The taxonomy speaks of intrinsic hallucination (when the model contradicts the information in front of it) and extrinsic hallucination (when it adds information that can't be verified against the corpus, that is, the set of texts the system was trained on). It also subdivides into factual errors and logical errors. When the specific reviews on large language models arrived, like the one Lei Huang and his co-authors circulated as a preprint in late 2023 and later published in a journal, the field had already decided the word would stay. It's a useful definition. It's also a definition that doesn't need the word "hallucination" at all to hold up.
Because the word comes from somewhere else. It comes from psychiatry.
A hallucination, in the clinical sense, is the perception of a sensory stimulus in the absence of the real stimulus. The patient hears voices that don't exist, sees figures that aren't there, smells odors that have no source. There's a malfunctioning perceptual organ, a subject with subjective experience, an external world that doesn't match what's perceived. Three elements, all indispensable for the word to mean anything.
An LLM has no perceptual organ. It has no subjective experience. It has no external world to compare anything against. What it does is compute, given a context, the probability distribution over the next token (the minimal unit of text the model handles, usually a word or a word fragment) and sample from it. When the corpus it was trained on had a clear signal about the case Varghese v. China Southern Airlines, the real case comes out. When it didn't, the most probable continuation comes out. A plausible plaintiff name, a plausible airline name, a docket number in the correct format, a plausible jurisdiction. The computation is exactly the same in both cases. The system doesn't switch modes. It doesn't "disconnect from reality" because it was never connected to it. There's nothing to hallucinate because there's no perception.
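The point that the computation is the same in both cases can be made concrete with a toy sketch. It is not how any real model is implemented: the candidate tokens and their scores are invented for illustration, and `softmax` and `sample_next_token` are hypothetical helper names. What it shows is that one function handles a context with a strong training signal and a context with a weak one, with no branch that distinguishes them.

```python
import math
import random

def softmax(scores):
    """Turn raw scores into a probability distribution over tokens."""
    top = max(scores.values())
    exps = {tok: math.exp(s - top) for tok, s in scores.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

def sample_next_token(scores, rng):
    """Sample one token from the distribution. Nothing here checks facts."""
    probs = softmax(scores)
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]

# Invented scores (assumption): a context where the corpus gave a clear signal.
strong_signal = {"Madrid": 9.0, "Lisbon": 1.0, "Paris": 0.5}

# Invented scores (assumption): a context where the corpus gave almost none,
# so several plausible-looking case names are nearly equally likely.
weak_signal = {"Varghese": 1.2, "Shaboon": 1.1, "Petersen": 1.0}

rng = random.Random(0)
for scores in (strong_signal, weak_signal):
    # Same function, same code path, whatever the shape of the distribution.
    print(sample_next_token(scores, rng), softmax(scores))
```

In the first case one token holds nearly all the probability mass; in the second the mass is spread thin. The sampler returns a token either way, and nothing in its return value says which situation produced it.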
What there is instead is statistical invention. When the training signal behind a continuation is weak, the most probable continuation becomes indistinguishable from confabulation. And the word "invention" has the enormous disadvantage, from the point of view of whoever sells the product, that it describes what the product does. Bender, Gebru, McMillan-Major and Shmitchell had already said as much in On the Dangers of Stochastic Parrots (2021): a system trained only on form doesn't touch meaning, and we humans who read it fill that absence with our own. Melanie Mitchell, in Why AI Is Harder Than We Think (2021), added that the sector has spent decades confusing demonstration performance with real comprehension, and that every generation has tripped on the same step a little higher up.
What You Gain by Choosing the Other Word
Saying "hallucination" humanizes the system. It lends it an organ it doesn't have. It suggests the problem is exceptional, comparable to the occasional delirium of a sane mind that generally functions. A sensible person doesn't hallucinate all the time. They have episodes.
If we carry that frame over to the model, the model is someone sensible who now and then goes off. Which implies that the rest of the time it's sane, perceives correctly, tells the truth. The metaphor does almost all the rhetorical work without anyone having to argue for it.
Saying "statistical invention" allows none of those loans. It describes the mechanism. And describing the mechanism forces you to accept that there are no episodes: there's a continuous operation of probabilistic sampling whose result sometimes matches reality and sometimes doesn't, without the system having any internal lever to tell the two situations apart. The difference between hit and miss, from inside the model, doesn't exist. Answering "Madrid is the capital of Spain" looks the same from inside as answering "Madrid is the capital of Portugal." Both sentences are the most probable continuation the model found given its context. One matches the world. The other doesn't. But that match isn't a property of the sentence. It's a property of the corpus and of the distribution that produced it.
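A minimal sketch of that last claim, under loud assumptions: `TOY_MODEL` is a hypothetical lookup table standing in for a model, its probabilities are invented, and `REFERENCE` is a hypothetical fact table that lives outside it. The output for both contexts has the same shape, a string and a number, and whether it matches the world can only be decided by code that consults something the model never sees.

```python
from typing import Dict, Tuple

# Hypothetical stand-in for a model: context -> (continuation, probability).
# The numbers are invented for illustration.
TOY_MODEL: Dict[str, Tuple[str, float]] = {
    "The capital of Spain is": ("Madrid", 0.97),
    "The capital of Portugal is": ("Madrid", 0.61),
}

# Hypothetical external reference. It is not part of the model.
REFERENCE: Dict[str, str] = {
    "The capital of Spain is": "Madrid",
    "The capital of Portugal is": "Lisbon",
}

def complete(context: str) -> Tuple[str, float]:
    """Return the most probable continuation. There is no truth field."""
    return TOY_MODEL[context]

def matches_world(context: str, continuation: str) -> bool:
    """The check happens outside the model, against an external source."""
    return REFERENCE.get(context) == continuation

for context in TOY_MODEL:
    text, prob = complete(context)
    print(context, text, prob, matches_world(context, text))
```

Both calls to `complete` return the same kind of object. Only `matches_world`, which depends on `REFERENCE`, separates the hit from the miss.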
Why Saying It Right Would Be Expensive
Admitting this in public is commercially devastating. If what the system does is sample probabilities over tokens, with no access to the truth, selling the product as an "assistant that answers questions" becomes awkward. Because "answering a question" implies knowing, and here there's no knowing, there's a statistical distribution. If instead what the system does is "sometimes hallucinate," then most of the time the system genuinely answers and the problem is reduced to an intermittent fault, almost cosmetic, blamable on future versions that will fix it. The first frame means admitting the product isn't what's being sold. The second lets you keep selling it.
The choice of word wasn't accidental. It was precise.
Why It Isn't a Lie Either
There's a symmetric temptation, above all among critics of the sector, to say that what the LLM does, then, is lie. The word sounds fair, because the harm an invented case produces in a court brief is the same harm a human lie would produce. Schwartz ended up just as sanctioned whether or not the maneuver was deliberate. The victim of a false diagnosis produced by a system ends up just as dead or maimed as if the doctor had lied knowingly. From the receiver's point of view, the effect is indistinguishable.
And yet the word doesn't fit.
Lying requires intent to deceive. It requires a subject who knows what the truth is and chooses to say something else. It requires a calculation about the listener's mental state and a will to manipulate them. None of that is in the model. The model doesn't know its output is false, doesn't know what knowing is, has no representation of the listener as a mind to steer toward. To say it lies is to lend it, in the opposite direction, exactly the attributes the word "hallucination" also lends it. Both metaphors humanize. One in a benevolent key, the other in an accusatory one. Both are false for the same reason.
Frankfurt and the Bullshitter With No Loyalty to Truth
Harry Frankfurt, in his little book On Bullshit, drew a distinction twenty years ago that fits here like a glove. Lying and bullshit (literally, talk produced without caring whether it's true or not) aren't the same thing. The liar knows the truth and conceals it. He cares about the truth, even if only to betray it. The bullshitter has no relation to the truth. He speaks with the sole aim of producing an effect on the listener, and the truth or falsity of what he says is accidental to that aim. For Frankfurt, bullshit is more corrosive than the lie, because the lie still recognizes truth as a reference and bullshit dissolves it.
An LLM, in that vocabulary, doesn't lie. It produces bullshit in its pure state. The output is optimized to produce an effect, not to correspond to anything. The distinction matters because it changes the kind of problem we're facing. A society can defend itself against liars by identifying them. It can't defend itself the same way against a system that has no position regarding what it says. There's no betrayal to reproach, because there was no initial loyalty.
The Legal Bill of the Word
Here is where the vocabulary stops being an academic matter and becomes a legal decision in disguise.
If the system "hallucinates," the fault lies with the user who didn't check. Schwartz didn't check. Schwartz asked the model whether the cases were real and settled for the answer. Bad professional practice, a clumsy decision, a lack of due diligence. The sanction falls on the lawyer, not on OpenAI. And in a way it's reasonable, because a legal professional has to verify sources. But the underlying reason the sanction falls only there, and the reason the manufacturer's liability hasn't even been seriously discussed, is that the word "hallucination" has pre-loaded the frame. The fault lies with the one who trusted, not with the one who sold a product that produces fictions with the same composure with which it produces facts.
If You Change the Word, the Defendant Changes
Change the word. Substitute "statistical invention" for "hallucination." Read the case again. ChatGPT invented six judicial precedents with the appearance of being real and handed them to the lawyer as if they were real. The lawyer asked whether they were real and the system invented the confirmation that they were.
Who sold a defective product? What obligation did the manufacturer have to label it as what it is before putting it in the hands of professionals?
The difference between the two frames is the difference between a sanctioned user and a mass-sued manufacturer. The difference between a tolerable externality and a gigantic regulatory cost. The difference between keeping the product on sale as is and having to redesign or label it so that the Schwartz case couldn't have happened. The word is the border between those two worlds. That's why the industry chose it, repeated it, slipped it into the papers, dropped it into the headlines, until it stuck as if it had always been there.
Gary Marcus has been saying it for years from his newsletter and is treated as that tiresome old man who doesn't get it. It's not that he doesn't get it. It's that he got it sooner and points at the exact spot where the language cheats. His insistence isn't a terminological tic. It's the only way to keep alive the question the word was designed to close.
The Receiver Who Pays the Difference
The product's victims don't need to know the word to feel the bill.
The patient to whom a system suggests a wrong dose, based on an invented study with the impeccable format of a medical citation, doesn't end up any less poisoned because someone calls that a hallucination. The candidate whose résumé, generated by an assistant tool, includes a nonexistent degree doesn't end up any less fired when HR verifies it. What changes, depending on which word is used, is who is held to account. The patient who should have consulted a human. The candidate who should have reviewed the document. Always the receiver, never the producer. The word "hallucination" assigns the responsibility before anyone discusses the responsibility. It does the defense's work before there's a trial.
Meanwhile, ever more fluent models are released, with fewer markers of doubt in the output. Fluency has improved more than reliability, and the surface confidence of the answer has grown faster than the quality of the content. The product sounds ever more sane. And the saner it sounds, the more the user is loaded with the obligation to discover that it keeps inventing with the same ease as ever, only now with better grammar.
The word the industry chose wasn't chosen to describe the phenomenon. It was chosen to assign the blame. It has been assigning it with notable efficiency for ten years, while those of us who pay the difference keep writing headlines with the word inside, as if it were neutral, as if it had been put there by nature and not by a corporate communications department whose job it was to decide what the thing that was going to fail would be called.
Definitions
LLM (Large Language Model). A system trained on large quantities of text that, given a context, computes the probability of the next fragment and produces output by sampling from that distribution. It doesn't verify facts. It has no access to a representation of the world separate from the text.
Token. The minimal unit of text the model handles internally. Usually a word, a word fragment or a sign. The model's predictions are produced token by token, not sentence by sentence.
Corpus. The set of texts the model was trained on. What the model "knows" is a statistical property of that corpus, not of the world.
Intrinsic hallucination. Output that contradicts information present in the input given to the model. It has material in front of it and deviates anyway.
Extrinsic hallucination. Output that adds information that can't be verified against any source in the corpus. The model fills the gap with a plausible continuation.
Statistical invention. A non-metaphorical description of the same phenomenon. The most probable continuation, when the training signal is weak or ambiguous, resembles a fact but corresponds to none.
Bullshit (in Harry Frankfurt's sense). Speech produced without interest in the truth or falsity of what's said, oriented solely toward the effect on the listener. Distinct from the lie, which does presuppose a recognized and concealed truth.
References
Ji, Z. et al., Survey of Hallucination in Natural Language Generation, ACM Computing Surveys 55, Article 248, 2023. arXiv:2202.03629 (https://arxiv.org/abs/2202.03629). The source of the technical definition of the phenomenon and of the intrinsic/extrinsic taxonomy used in the article.
Huang, L. et al., A Survey on Hallucination in Large Language Models. Principles, Taxonomy, Challenges, and Open Questions. Preprint at arXiv:2311.05232 (November 2023, https://arxiv.org/abs/2311.05232), later published in ACM Transactions on Information Systems (2025), DOI 10.1145/3703155 (https://dl.acm.org/doi/10.1145/3703155). The specific review on large language models cited in the article.
Bender, E., Gebru, T., McMillan-Major, A., Shmitchell, S., On the Dangers of Stochastic Parrots, FAccT 2021, DOI 10.1145/3442188.3445922 (https://dl.acm.org/doi/10.1145/3442188.3445922). The background for the argument on what a language model does and doesn't do when it produces text.
Mata v. Avianca, Inc., 22-cv-1461 (PKC), Southern District of New York, 2023. The case cited at the start of the article. Judicial opinion at https://www.courtlistener.com/docket/63107798/mata-v-avianca-inc/.
Marcus, G., newsletter Marcus on AI, https://garymarcus.substack.com. The sustained critique of the use of the term "hallucination" referred to in the section on the word's legal bill.
Mitchell, M., Why AI Is Harder Than We Think, arXiv:2104.12871, 2021. General context on the distance between what's sold and what the system does.
Frankfurt, H. G., On Bullshit, Princeton University Press, 2005. The source of the distinction between the lie and bullshit used in the section "Why It Isn't a Lie Either."