How the chat works across different models

In the previous step it became clear that there are many models out there, not just one. Now I get to tell you what I discovered when I stopped reading about them and started using them in earnest: although they all work the same way inside, the chat feels different depending on which one you talk to. That changed the way I work, and I want you to see it right away.

Inside, the same engine

Let's start with what doesn't change. It doesn't matter whether you open ChatGPT, Claude or Gemini: underneath, all three do what we saw in the first step. They predict the next chunk of text, over and over, until they compose the answer. There's no secret mechanism one has and another lacks. They're all language models predicting tokens.
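If it helps to see that loop in miniature, here is a toy sketch in Python. The probability table, its numbers and the `generate` function are all invented for illustration. A real model computes the probabilities with billions of parameters and looks at the whole text so far, whereas this toy only looks at the previous word and always picks the most likely next one.

```python
# Toy next-token table: for each token, invented probabilities of what comes next.
# A real model computes these numbers with billions of parameters and looks at
# the whole text so far; this toy only looks at the previous token.
NEXT_TOKEN_PROBS: dict[str, dict[str, float]] = {
    "<start>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.3, "<end>": 0.2},
    "a": {"dog": 0.7, "cat": 0.3},
    "cat": {"sleeps": 0.8, "<end>": 0.2},
    "dog": {"barks": 0.9, "<end>": 0.1},
    "sleeps": {"<end>": 1.0},
    "barks": {"<end>": 1.0},
}


def generate(probs: dict[str, dict[str, float]], max_tokens: int = 10) -> str:
    """Predict the next token over and over until "<end>" or max_tokens."""
    tokens: list[str] = []
    current = "<start>"
    for _ in range(max_tokens):
        candidates = probs[current]
        # Greedy choice: take the single most likely next token.
        next_token = max(candidates, key=candidates.get)
        if next_token == "<end>":
            break
        tokens.append(next_token)
        current = next_token
    return " ".join(tokens)


print(generate(NEXT_TOKEN_PROBS))  # the cat sleeps
```

Real chats usually sample from those probabilities instead of always taking the top one, which is one reason the same question can get slightly different answers.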

For a while, that idea led me to a wrong conclusion: if the engine is the same, I thought, the chats must be interchangeable, the same thing with a different logo. Try one and you know them all. It took me a while to realise that this sounds reasonable and is false. The mechanism is shared, yes. What comes out of it isn't.

Why it feels different

The difference comes from how each model is raised. Here it helps to take a quick, high-level look at how one of these chats is trained, because that's where almost everything you later notice when using it comes from.

Training has, broadly speaking, two stages. First comes pre-training: the model reads enormous amounts of text and, through all that reading, adjusts its parameters to get the next word right. That's where its raw knowledge of the world comes from. But a model fresh out of that phase is rough: it knows a lot and behaves badly. That's why the second stage, fine-tuning, exists: this is where its character is polished. People step in and teach it, with example answers and with ratings of its answers, to respond usefully, politely and within certain limits. The rating-based part is known by the initials RLHF, reinforcement learning from human feedback. This whole second stage is where each house stamps its mark.
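To make pre-training a bit less abstract, here is a deliberately crude stand-in. Instead of adjusting billions of parameters with gradient descent, this hypothetical `pretrain` function just counts which word follows which in a tiny made-up corpus and turns the counts into probabilities. It only illustrates the idea of learning to predict the next word from lots of text; it says nothing about fine-tuning or RLHF.

```python
from collections import Counter, defaultdict


def pretrain(corpus: list[str]) -> dict[str, dict[str, float]]:
    """Toy 'pre-training': count which word follows which, then normalise."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for sentence in corpus:
        tokens = ["<start>"] + sentence.lower().split() + ["<end>"]
        # Every adjacent pair is one "what came next" example.
        for current, following in zip(tokens, tokens[1:]):
            counts[current][following] += 1
    # Turn raw counts into probabilities for each preceding token.
    return {
        current: {tok: n / sum(followers.values()) for tok, n in followers.items()}
        for current, followers in counts.items()
    }


# A tiny made-up corpus standing in for "enormous amounts of text".
corpus = ["the cat sleeps", "the cat eats", "the dog barks"]
model = pretrain(corpus)
print(model["the"])
```

With this corpus, after "the" the table gives "cat" two chances in three and "dog" one in three, simply because that's how often each appeared in the text it read.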

That's the crux. Each company chooses what text to feed its model, what it considers a good answer, which topics it dodges and what tone it wants it to speak in. Two models with the same engine but different upbringings come out with different personalities. Not because one is a machine and the other something else: both calculate probabilities over text. But their knobs ended up in different positions, and that shows the moment you ask them for something.
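Here is the "same engine, different upbringing" idea as a toy sketch. The `generate` function (the engine) is identical for both; only the invented probability tables (the knobs) differ, as if two hypothetical houses had tuned them toward a warmer or a more formal tone. The tables and their numbers are made up for illustration.

```python
def generate(probs: dict[str, dict[str, float]], max_tokens: int = 10) -> str:
    """Same engine for every model: pick the most likely next token repeatedly."""
    tokens: list[str] = []
    current = "<start>"
    for _ in range(max_tokens):
        candidates = probs[current]
        next_token = max(candidates, key=candidates.get)
        if next_token == "<end>":
            break
        tokens.append(next_token)
        current = next_token
    return " ".join(tokens)


# Two hypothetical "upbringings": same vocabulary, knobs in different positions.
warm_model = {
    "<start>": {"hi": 0.7, "dear": 0.3},
    "hi": {"there!": 0.9, "<end>": 0.1},
    "there!": {"<end>": 1.0},
    "dear": {"<end>": 1.0},
}
formal_model = {
    "<start>": {"hi": 0.2, "dear": 0.8},
    "dear": {"sir": 0.6, "madam": 0.4},
    "sir": {"<end>": 1.0},
    "madam": {"<end>": 1.0},
    "hi": {"<end>": 1.0},
}

print(generate(warm_model))    # hi there!
print(generate(formal_model))  # dear sir
```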

Where you notice it

Where does that difference show up when you're in front of the chat? In several places at once, and it's worth having them on file.

The first is tone. Ask three models to write you the same email and you'll see one sounds warmer and more natural, another more neutral and academic, another more correct but somewhat corporate. It isn't chance or your impression: it's the result of each one's fine-tuning. Some say one model writes more naturally than another, and there's usually something to it, though it shifts with each new version.

The second is how closely they follow your instructions and how long-winded they are. You ask one to be brief and it is; another tends to run on even when you ask for concision. Then there's how readily they say "no": some models refuse requests more often, others are more accommodating. That too comes from how they were tuned.

The third is what kind of task they're more at ease with. The Spanish-language comparisons tend to agree on a similar split: one leans toward writing more fluently, another researches better when it needs to lean on current data, another stands out at coding. Don't take that split as a law, because it keeps changing, but do take it as a hint that each model has its terrain.

Part of the difference isn't the model

Here's a nuance that took me a while to see and that saves you confusion. Not everything that sets one chat apart from another is in the model. Much of it is in what the application puts alongside it.

A chat might have a voice mode so you can talk to it out loud; another might search the internet while answering you; another connects to your files or to external services. Those capabilities don't come from the model's parameters: they're tools the app builds around it. So when one chat knows about today's events and another doesn't, that doesn't necessarily mean its model is smarter; it may simply have a search tool fitted alongside it that the other lacks. When you compare two chats, it's worth separating what comes from the model from what comes from the wrapper around it.
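The split between model and wrapper can be sketched like this, under clearly hypothetical names: `ChatApp`, `toy_model` and `fake_search` are invented for illustration and don't correspond to any real product's API. The same model function is used twice; only the app around it changes, and only the version with a search tool gets today's information into the text the model reads.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical type aliases: a model and a tool both take text and return text.
Model = Callable[[str], str]
Tool = Callable[[str], str]


@dataclass
class ChatApp:
    """Hypothetical chat application: a model plus optional tools around it."""
    model: Model
    tools: dict[str, Tool] = field(default_factory=dict)

    def answer(self, question: str) -> str:
        prompt = question
        # The app, not the model's parameters, runs the search tool
        # and pastes its results into the text the model will read.
        if "search" in self.tools:
            results = self.tools["search"](question)
            prompt = f"Search results: {results}\nQuestion: {question}"
        return self.model(prompt)


def toy_model(prompt: str) -> str:
    """Stand-in for the model: it can only use what is in its prompt."""
    prefix = "Search results: "
    if prompt.startswith(prefix):
        first_line = prompt.split("\n", 1)[0]
        return "According to the search: " + first_line[len(prefix):]
    return "I can only answer from what I learned in training."


def fake_search(query: str) -> str:
    """Hypothetical search tool returning a canned result."""
    return "it is sunny today"


plain_chat = ChatApp(model=toy_model)
search_chat = ChatApp(model=toy_model, tools={"search": fake_search})

print(plain_chat.answer("What's the weather like?"))
# I can only answer from what I learned in training.
print(search_chat.answer("What's the weather like?"))
# According to the search: it is sunny today
```

In this sketch the app decides when to search; in real products the model may request the search itself, but it's still the app that runs the tool and hands the results back as text.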

Is there really that much difference between big and small models?

A question I asked myself at first, now that I know about parameters: between a model with many billions of parameters and one with far fewer, is the difference noticeable? It is, yes, but not always the way you'd expect. Size matters (a large model captures finer patterns and tends to perform better on the hard stuff), but, as we already saw, it's only one piece: a small, well-tuned model can beat a huge, careless one on a specific task. The size label isn't an intelligence grade.

In day-to-day use, hands-on experience counts for more than the parameter count. Taking the same task and giving it to two models teaches you more than any comparison you read, because you see with your own eyes where each one diverges.

The best depends on the task

From all this I drew a conclusion that saved me a lot of going in circles. There's no sense in asking which is the best model in the abstract. The good question is: the best for which task?

Because one writes with more flair, another researches better with up-to-date data, another codes more finely, and all of them, without exception, can be confidently wrong. Having more than one on hand and knowing who to turn to depending on what you need is, for me, one of the skills that pays off most. We'll devote a whole stretch to choosing well further on.

For now it's enough to take this away: the engine is shared, but the character isn't. Yet beneath those different characters, all models share a few quirks, starting with the most treacherous: sometimes they make things up. That's the next step.

Definitions

- Training: the process by which a model, reading and correcting itself, adjusts its parameters until it knows how to predict text. From this comes everything the model knows and how it behaves.
- Pre-training: the first phase of training. The model reads enormous amounts of text and learns the patterns of language. It gives it raw knowledge, but still without manners.
- Fine-tuning: the second phase, where its character is polished. With human examples and ratings it's taught to answer usefully, politely and with limits. It's where each house stamps its mark.
- RLHF: short for "reinforcement learning from human feedback." The fine-tuning technique in which people rate the model's answers to teach it which ones are good. It explains much of each chat's tone.
- Open source (open model): a model whose parameters are published so anyone can download and run it on their own, instead of using it only through a company's app.
- Tool (of the chat): a capability the application builds around the model (searching the internet, voice mode, access to your files). It doesn't come from the model's parameters, but from what the app adds alongside it.
