Reasoning
In the two previous steps it became clear that a chat builds its answer chunk by chunk, picking the most likely word each time, and that the engine behind it is an LLM. If that's so, an uncomfortable question remains: how does something that only predicts the next chunk of text "reason"? Once I understood the trick, it stopped looking like magic and I began to know when to ask for it. That's what I want you to see right away.
The problem with blurting out the answer all at once
I'll start where it hurts. If the model only predicts what comes next, it has one clear weak spot: things that require several steps. A chained calculation, a problem with a catch, a logic question where order matters. There, trying to land the result in a single shot is asking it to guess the ending without walking the path.
Think of a riddle along the lines of "I'm twice as old as you were when I was the age you are now." A model that fires off the answer point-blank usually trips up, because there's no strong pattern leading straight from the wording to the right number. You have to work it out step by step. And an engine that predicts text, if it doesn't give itself room to lay out that working, has nowhere to put the intermediate sums.
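To make "working it out step by step" concrete, here is a minimal sketch of the scratch work for that riddle in Python. The function names and the 1 to 60 search range are my own choices for illustration. Note that the riddle as quoted only pins down the ratio between the two ages, not the ages themselves.

```python
from fractions import Fraction

def you_were(me: int, you: int) -> int:
    """Your age back when I was the age you are now."""
    years_ago = me - you      # step 1: how long ago I was your current age
    return you - years_ago    # step 2: your age at that moment

def satisfies_riddle(me: int, you: int) -> bool:
    # step 3: "I'm twice as old as you were" at that moment
    return me > you and me == 2 * you_were(me, you)

# Try every pair of ages in an arbitrary range (assumption: 1 to 60).
solutions = [(me, you)
             for me in range(1, 61)
             for you in range(1, 61)
             if satisfies_riddle(me, you)]

# Collect the ratio me / you for every pair that fits.
ratios = {Fraction(me, you) for me, you in solutions}

print(solutions[:3])  # [(4, 3), (8, 6), (12, 9)]
print(ratios)         # {Fraction(4, 3)}
```

None of the three steps is hard, but the final ratio of 4 to 3 only appears once the first two have been written down somewhere.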
Thinking on scratch paper
Here's the trick, and it's of an almost disappointing simplicity. Instead of blurting out the solution, the model first writes the intermediate steps and only gives the result at the end. Like someone who doesn't answer off the top of their head but grabs scratch paper, jots down what they work out, and arrives at the number in order. That's chain of thought.
The lovely thing is why it works, and it connects straight to the first step. The model predicts each chunk leaning on all the text already written before it. If all that's written is the bare wording of the problem, it has little to hold on to. But if it forces itself to write "first I work out this, which gives such-and-such; with that I get the next thing…", each step it lays down becomes context for the next. It's giving itself the footholds it needs. It's not that it understands more: by generating the path, each link makes the correct next link more likely.
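The mechanics of "each step becomes context for the next" can be sketched as a toy loop. This is not how any real model is implemented: `toy_model` is a hypothetical stand-in with a hard-coded script, and the empty string as a stop signal is my own convention. The only point is that each call sees everything written so far.

```python
from typing import Callable

def generate(prompt: str, next_chunk: Callable[[str], str], max_chunks: int = 20) -> str:
    """Toy generation loop: every new chunk is predicted from all the text so far."""
    text = prompt
    for _ in range(max_chunks):
        chunk = next_chunk(text)  # the "model" sees the prompt plus everything already written
        if chunk == "":           # hypothetical stop signal
            break
        text += chunk             # the new chunk becomes context for the next call
    return text

def toy_model(context: str) -> str:
    """Hypothetical stand-in for a model: a fixed script keyed on what is already written."""
    if "Step 1" not in context:
        return " Step 1: 3 boxes of 4 apples is 12 apples."
    if "Step 2" not in context:
        return " Step 2: 12 apples minus 5 eaten is 7."
    if "Answer" not in context:
        return " Answer: 7."
    return ""

result = generate("Q: 3 boxes of 4 apples, 5 eaten. How many are left?", toy_model)
print(result)
```

In the toy version, step 2 can only be produced after step 1 is on the page. With a real model the dependency is statistical rather than scripted, but the loop has the same shape.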
The technique was described in a 2022 paper by Google researchers, "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models", which showed that letting the model lay out those intermediate steps greatly improved its answers on arithmetic, commonsense and symbolic reasoning problems. It's not a homespun hunch; it's measured.
Asking it yourself or letting it do it alone
There are two ways this happens, and it's worth telling them apart, because they change how you work with the tool.
The first is asking it yourself. You just add to your message something like "think it through step by step before answering." That silly little phrase (in the famous experiments it was literally "Let's think step by step") is enough to get the model to stop firing and start laying out the reasoning. It's one of the most rewarding things I learned: one extra line in the question and the answer improves on anything with several stages.
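If you send questions from a script rather than typing them, that extra line can be added by a small helper. This is a minimal sketch: the function name `ask_step_by_step` and the sample question are made up for illustration, and it only builds the text of the message. Sending it to a model is left out.

```python
STEP_BY_STEP = "Let's think step by step."

def ask_step_by_step(question: str, cue: str = STEP_BY_STEP) -> str:
    """Return the question with a step-by-step cue appended on its own line."""
    question = question.strip()
    if cue.lower() in question.lower():
        return question  # the cue is already there; don't repeat it
    return f"{question}\n{cue}"

prompt = ask_step_by_step(
    "A train leaves at 9:40 and the trip takes 2 h 35 min. When does it arrive?"
)
print(prompt)
```

The exact wording of the cue matters less than the fact that it invites intermediate steps, so treat the default string as one option rather than a magic formula.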
The second is the model doing it without being asked, because it has been trained to. This is where reasoning models come from: variants that, before answering you, spend a while "thinking" on their own, generating that scratch paper internally. Sometimes they show it to you in summary, sometimes they hide it and you only see the conclusion, but inside they're doing the same thing, chaining steps before the destination.
The "thinking" that isn't thinking
Now comes the misunderstanding to knock down, and it's the same one as always with a new face. That "thinking" isn't consciousness, or understanding, or someone inside mulling over your problem. It's still, word for word, prediction of the next chunk of text. The only thing that's changed is what it predicts: before it generated only the destination, the answer; now it also generates the path, the steps. More visible text, exactly the same mechanism underneath.
That's why I prefer the scratch-paper image to the "reasoning" one. When a person reasons, there's an understanding holding up the steps. When the model "reasons," there's a sequence of likely tokens, each leaning on the previous ones, that happens to trace a path that usually leads to the right place. It works very well, and even so there's no one inside understanding the problem. It's calculation that mimics deliberation, not deliberation.
Being clear on this saves you grief. You'll see the model write "let me think this through carefully" and feel that it's genuinely making an effort. It isn't, in the human sense; it's generating the kind of text that, in its data, came before a good answer. It reproduces the shape of thinking, just as in the first step it reproduced the shape of feeling.
Reasoning helps, but doesn't cure
And here's the practical part, the one that changes how you use it. Reasoning step by step raises reliability, but it has two costs worth accepting from the start.
The first is that it costs more. All those intermediate steps are text the model generates, and generating text consumes time and, when you pay by usage, money: more tokens per answer. That's why a reasoning model takes longer to answer than one that fires. You're buying accuracy with patience. For a simple question it's wasteful; for a problem with several stages, it's exactly what you need.
The second is subtler and links to a step that's coming: reasoning doesn't eliminate errors. A reasoning model can walk a path that looks flawless, get an intermediate step wrong without warning and arrive blithely at a false conclusion, asserting it with the same poise as ever. Seeing the steps helps you catch the slip, but doesn't prevent it. The poise when it's wrong isn't cured by reasoning; I devote step E006 to it.
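The first of those two costs is easy to put into numbers. The sketch below uses invented figures throughout: the price per million tokens and both token counts are assumptions for illustration, not the rates of any real provider.

```python
def answer_cost(output_tokens: int, price_per_million: float) -> float:
    """Cost of the generated text for one answer, in the same currency as the price."""
    return output_tokens * price_per_million / 1_000_000

PRICE = 10.0  # hypothetical: 10 currency units per million generated tokens

direct = answer_cost(50, PRICE)            # assumed: a short, direct answer
with_steps = answer_cost(50 + 950, PRICE)  # assumed: the same answer plus 950 tokens of steps

print(f"direct: {direct:.4f}  with steps: {with_steps:.4f}")
print(f"ratio: {with_steps / direct:.0f}x")
```

Whatever the real prices are, the shape holds: the steps are billed like any other generated text, so the bill grows with the length of the path, not just the length of the answer.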
When you'll want each one
From all this comes a consequence you'll use a lot: there are "fast" models and models "that think," and there's no better one in the abstract. For a simple answer, a fact, a rephrasing, the fast one serves you and doesn't keep you waiting. For something with several pieces to fit together, the reasoning one earns what it costs. Knowing what you've got in front of you and what suits you is, already, part of the craft.
And that opens precisely the next step: how many models are out there and how they differ, beyond whether they think or fire. Choosing the tool well starts here.
Definitions
- Chain of thought: the trick by which the model writes the intermediate steps before giving the answer, instead of blurting it out all at once. Like solving on scratch paper. By generating the path, each step serves as a foothold for the next and it gets more right.
- Reasoning model: a variant of LLM trained to lay out that chain of thought on its own before answering, without being asked. It "thinks" for a while and then answers; in exchange it takes longer and consumes more.
- Fast model: the one that composes the answer directly, without stopping to lay out steps. It answers sooner and spends less, at the cost of failing more on anything with several stages.
- "Think it through step by step": the instruction you can add to your message to trigger chain of thought in a model that wouldn't do it on its own. One extra line that improves answers with several stages.
Further reading
- IBM, What is chain of thought (CoT) prompting? A clear explanation of step-by-step reasoning and why it improves answers. https://www.ibm.com/think/topics/chain-of-thoughts
- AltexSoft, Chain-of-Thought (CoT): Prompting and LLM Reasoning Explained. Plain-language, with before-and-after examples. https://www.altexsoft.com/blog/chain-of-thought-prompting/
- Fundación Innovación Bankinter, Cadena de pensamiento: cómo la IA descompone un problema complejo. Plain-language piece in Spanish, straight to the concept. https://www.fundacionbankinter.org/noticias/cadena-de-pensamiento-como-la-ia-descompone-un-problema-complejo/
- IFEMA Madrid, LLM: evolución hacia modelos razonadores. In Spanish, on the difference between ordinary LLMs and reasoning ones. https://www.ifema.es/noticias/tecnologia/que-es-un-llm-modelos-razonadores