How a chat really works

Before you learn how to talk to an AI, it helps to know what you're talking to. Once I understood what really happens every time I send a message, I stopped asking it for impossible things and started getting what it can actually give. That's what I want you to see right away, on the very first step, without the detour I took.

This is the entry rung of the whole staircase: I take nothing for granted. Climb it well and the rest climb themselves.

Two ways of giving an answer

I'll start with a distinction that sounds like it's straight out of a textbook but explains almost everything. There are deterministic machines and probabilistic machines, and an AI chat is one of the latter.

A calculator is deterministic. You ask it 2 + 2 and it answers 4. Always. Today, tomorrow and a year from now, the same question gives the same answer, because it follows a fixed rule that leaves no room for doubt. It's reliable precisely because it's boring: it doesn't improvise.

An AI chat doesn't work like that. It's probabilistic, meaning it deals in the most likely, not in the certain. Faced with your message it doesn't apply a rule that produces one single correct output; it estimates which answer is the most plausible and composes it as it goes, leaning on that estimate. That's why it sometimes nails it and other times gets it wrong with the same poise, and why the same question can give you two different answers. It isn't broken when that happens: that's just how it works inside. Hold on to this idea, because it comes back in almost every step that follows.
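To make the contrast concrete, here is a minimal Python sketch. Both functions are hypothetical toys, not how a real chat is built. `calculator_add` follows a fixed rule. `toy_probabilistic_answer` picks among a few plausible answers using made-up weights, so repeated calls can differ.

```python
import random

def calculator_add(a: int, b: int) -> int:
    """Deterministic: a fixed rule, so the same input always gives the same output."""
    return a + b

def toy_probabilistic_answer(rng: random.Random) -> str:
    """Probabilistic (toy): picks one of several plausible answers, weighted by likelihood."""
    options = ["Paris", "Paris, France", "The capital is Paris"]
    weights = [0.6, 0.3, 0.1]  # hypothetical numbers, for illustration only
    return rng.choices(options, weights=weights, k=1)[0]

# The calculator never varies.
print([calculator_add(2, 2) for _ in range(3)])  # [4, 4, 4]

# The probabilistic toy can give a different answer on each try.
rng = random.Random()
print([toy_probabilistic_answer(rng) for _ in range(5)])
```

Run the last line a few times and the list changes, while the first one never does.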

What I thought was happening on the other side

For a while I imagined that, when I typed into the chat, my question travelled somewhere with stored answers. Like someone looking things up in a giant encyclopedia: the machine located the right entry and handed it back to me. And when the answer was very current, I assumed it had gone out to the internet to check it right then.

Both ideas are false, and together they make up the most common misunderstanding. An AI chat doesn't consult a database of correct answers, nor does it browse the internet in real time while answering you. There's no archive to pull the right sentence from. Knowing this adjusts what's reasonable to expect of it.

If there were an encyclopedia behind it, it would never be confidently wrong. And it is confidently wrong. That contradiction only fits when you see the real mechanism, which is that of a probabilistic machine, not of an archive you look things up in.

One word after another

This is how an AI chat works inside: it doesn't retrieve a whole answer, it builds it word by word. Or rather, piece by piece. Each time it has written a fragment, it asks itself which chunk of text is most likely to come next, places it, and asks itself again. And again. And again. Until it's done.
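The loop described above can be sketched in a few lines of Python. This is a toy, not a real language model. `NEXT_TOKEN_PROBS` is an invented table. To keep it short, the toy only looks at the last token, whereas a real model looks at everything written so far. It also always takes the single most likely option (a "greedy" choice) so that the output is predictable.

```python
# Hypothetical table: for each last token, the probability of each possible next token.
NEXT_TOKEN_PROBS = {
    "<start>": {"I": 0.7, "You": 0.3},
    "I": {"like": 0.6, "am": 0.4},
    "You": {"like": 0.5, "are": 0.5},
    "like": {"dogs": 0.5, "books": 0.3, "sports": 0.2},
    "dogs": {"<end>": 0.9, "and": 0.1},
}

def generate(max_tokens: int = 10) -> str:
    """Build a sentence one token at a time, always taking the most likely next token."""
    tokens = []
    last = "<start>"
    for _ in range(max_tokens):
        # Unknown tokens simply end the sentence in this toy.
        options = NEXT_TOKEN_PROBS.get(last, {"<end>": 1.0})
        nxt = max(options, key=options.get)  # pick the most likely next token
        if nxt == "<end>":
            break
        tokens.append(nxt)  # place it...
        last = nxt          # ...and ask again from there
    return " ".join(tokens)

print(generate())  # I like dogs
```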

It's autocomplete taken to the extreme. The same gesture as when your phone keyboard suggests the next word, but trained on a staggering amount of text and able to hold the thread across whole paragraphs. Each of those chunks is called a token: sometimes a whole word, sometimes just a piece of a word. It's the unit the machine uses to weave the sentence.
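To see what "a piece of a word" can mean, here is a toy tokenizer. Two assumptions apply. The vocabulary `VOCAB` is invented. The method (always take the longest piece found in the vocabulary) is a simplification: real tokenizers learn their pieces from data and may split the same text differently.

```python
# Hypothetical vocabulary of pieces the toy tokenizer knows about.
VOCAB = {"un", "believ", "able", "the", "cat", " "}

def toy_tokenize(text: str, vocab: set[str]) -> list[str]:
    """Split text into the longest known pieces, scanning left to right."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest possible piece first, then shorter ones.
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in vocab:
                tokens.append(piece)
                i = j
                break
        else:
            # No known piece starts here: the single character becomes its own token.
            tokens.append(text[i])
            i += 1
    return tokens

print(toy_tokenize("the unbelievable cat", VOCAB))
# ['the', ' ', 'un', 'believ', 'able', ' ', 'cat']
```

"the" and "cat" come out whole, while "unbelievable" is woven from three pieces.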

And "most likely" is literal. Faced with "I like," the model doesn't pull a word out of thin air: it works out that "dogs" might come next, or "sports," or "books," and assigns each option a number, its probability. It then picks one according to those numbers, usually one of the likeliest, and carries on. That's why you see the answer appear bit by bit, as if typed live. It isn't a decoration: the text is being generated at that very moment, one token after another.
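The "I like" example can be simulated directly. The numbers are invented for illustration. `sample_next` is a hypothetical helper that picks a token at random, weighted by its probability, using Python's `random.choices`. Over many draws "dogs" should come out most often, yet "sports" and "books" still appear.

```python
import random

# Hypothetical probabilities a model might assign after "I like".
next_token_probs = {"dogs": 0.5, "sports": 0.3, "books": 0.2}

def sample_next(probs: dict[str, float], rng: random.Random) -> str:
    """Pick one token at random, weighted by its probability."""
    tokens = list(probs)
    weights = list(probs.values())
    return rng.choices(tokens, weights=weights, k=1)[0]

rng = random.Random(42)  # fixed seed so the run is repeatable
counts = {token: 0 for token in next_token_probs}
for _ in range(10_000):
    counts[sample_next(next_token_probs, rng)] += 1

print(counts)  # roughly 5000 / 3000 / 2000, in that order
```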

Knowing isn't remembering

Here's the leap that cost me the most. When a person knows something, they remember it: they have a specific fact filed away in memory and they retrieve it. The model doesn't work that way. It doesn't store facts as cards it can go and look up.

What it has is something else. During its training it read an enormous amount of text and, from seeing it so much, gradually tuned millions of regularities about how words link together. When you ask it for the capital of France, it doesn't open a drawer with the card "France → Paris." It recomposes the pattern: across all the text it read, after "the capital of France is" came overwhelmingly "Paris." It gets it right because the pattern is extremely strong, not because it has it written down anywhere.
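Here is a rough stand-in for "recomposing the pattern." It counts what followed a prefix in a tiny, invented corpus and turns those counts into proportions. One caveat: a counting table like this is closer to an old-style n-gram model than to a modern language model. A modern model keeps no such table; it folds these regularities into its learned parameters. The only point here is that "Paris" wins because it followed that prefix most often.

```python
from collections import Counter

# Tiny invented corpus, for illustration only.
corpus = [
    "the capital of France is Paris",
    "the capital of France is Paris",
    "the capital of France is Paris",
    "the capital of France is beautiful",
]

def continuation_counts(corpus: list[str], prefix: str) -> Counter:
    """Count which word came right after `prefix` across the corpus."""
    counts = Counter()
    prefix_words = prefix.split()
    n = len(prefix_words)
    for sentence in corpus:
        words = sentence.split()
        for i in range(len(words) - n):
            if words[i:i + n] == prefix_words:
                counts[words[i + n]] += 1
    return counts

counts = continuation_counts(corpus, "the capital of France is")
total = sum(counts.values())
print({word: c / total for word, c in counts.items()})
# {'Paris': 0.75, 'beautiful': 0.25}
```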

This sounds like a fine point, but it carries a lot of weight. When the pattern is clear and consistent, it's right almost every time. When it's weak, ambiguous, or barely showed up in what it read, it keeps filling in with whatever looks like it would go there, even if it isn't true. And it does it in the same firm tone. It carries no internal warning that tells "this I know" apart from "this I'm making up." In both cases it does the same thing: pick a likely next token.
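The missing warning can be shown with two invented distributions. In one the pattern is strong. In the other it is weak, say a date the model barely saw. The hypothetical `pick_most_likely` returns a single token either way. This toy prints the top probability next to it, but the answer itself, the text a reader would see, carries no trace of how shaky it was.

```python
def pick_most_likely(probs: dict[str, float]) -> str:
    """Return the single most likely token, whatever its probability."""
    return max(probs, key=probs.get)

# Invented distributions: one clear-cut, one almost a coin toss.
strong = {"Paris": 0.97, "Lyon": 0.02, "Nice": 0.01}
weak = {"1987": 0.26, "1989": 0.25, "1991": 0.25, "1993": 0.24}

for name, probs in [("strong", strong), ("weak", weak)]:
    answer = pick_most_likely(probs)
    print(f"{name}: answer={answer!r}, top probability={probs[answer]:.2f}")
# strong: answer='Paris', top probability=0.97
# weak: answer='1987', top probability=0.26
```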

Nobody understands, everything is calculated

It's worth dropping another image: that there's someone on the other side understanding you. There isn't. There's a system that calculates probabilities over text, not a mind that grasps what you say or what it answers. It chains the plausible; it doesn't hold a meaning.

And since it learned from text written by people, it mimics the human tone very well. It'll tell you "I understand how you feel" or "I'm glad I can help," and it sounds warm. But the machine you're talking to has no emotions: it fakes the emotion because in the text it read people expressed it, not because it feels anything. It pays to keep this in mind, especially when it agrees with you too easily or talks to you like a friend. It isn't one; it reproduces the shape of one.

I don't say this so you'll distrust the tool, but so you'll use it well. Knowing there's a calculation on the other side and not a person, you stop looking for understanding and start looking for good results, which is what it can actually give you.

Why this explains almost everything

Almost everything I learned on this staircase grows from here. Because it composes the answer with probabilities, it can sound just as confident when it's right as when it's making things up. That failure has a name, hallucination, and it's the subject of the next step. Because it doesn't always pick the same one among several likely options, the same question can give you different answers on two tries. And because the whole answer rests on the text you put before it, the way you talk to it changes a lot of what you get back. That is what the art of asking for things well is about.
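The last two points fit in one toy. In this sketch, `TOY_MODEL` is an invented lookup from a whole prompt to next-token probabilities, standing in for a real model. Sampling from it shows that repeated tries can differ. It also shows that changing the wording of the request shifts which answers are likely.

```python
import random

# Hypothetical next-token probabilities that depend on the whole prompt.
TOY_MODEL = {
    "Suggest a pet:": {"dog": 0.5, "cat": 0.4, "fish": 0.1},
    "Suggest a quiet pet for a small flat:": {"fish": 0.6, "cat": 0.35, "dog": 0.05},
}

def answer(prompt: str, rng: random.Random) -> str:
    """Sample one answer, weighted by the toy probabilities for this prompt."""
    probs = TOY_MODEL[prompt]
    return rng.choices(list(probs), weights=list(probs.values()), k=1)[0]

rng = random.Random(0)
for prompt in TOY_MODEL:
    tries = [answer(prompt, rng) for _ in range(5)]
    print(prompt, tries)
```

The same prompt can yield different words across the five tries. Adding detail to the prompt moves the odds toward "fish."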

That's the ground you stand on from here. The rest is fine-tuning this same mechanism you've just seen.

Definitions

- Deterministic system: one that, given the same input, always gives the same output because it follows a fixed rule. A calculator: 2 + 2 is 4 today and always.
- Probabilistic system: one that deals not in the certain but in the most likely. It estimates which answer is the most plausible and composes it from that estimate, so its answer can vary from one try to the next.
- Language model: the system behind an AI chat. It learned from an enormous amount of text to estimate which word is likely to follow another, and that's what it uses to answer.
- Token: each chunk of text the model places at a time. Sometimes a whole word, sometimes just a piece of a word. It's the unit it uses to build the sentence.
- Next-token prediction: the gesture the model repeats over and over, looking at what's written so far and picking a likely token to come next. Chained many times over, it produces the whole answer.
- Hallucination: when the model fills in with what looks correct but isn't, and asserts it with the same poise as when it's right. I devote step E006 to it.
