When someone says they remember their first kiss, that they know the capital of Mongolia, that they play the piano without thinking, that they don't forget who they are after sleeping, and that they hold a phone number in their head for the six seconds it takes to dial it, they're describing five different things. Not different metaphors for one faculty. Five neural systems with different substrates, different forgetting curves, different pathologies and different modes of training. English calls them all memory. It's wrong, and the error would matter little if it weren't for the fact that there's an entire industry living, right now, off selling you just one of those five systems, badly done, under the generic label.
The day Tulving split the word in two
Let's start with the catalogue, which goes back quite a way. In 1972 Endel Tulving published a chapter in Organization of Memory separating two things that until then had been cheerfully lumped together: memory for facts about the world and memory for the events of your own life. Knowing that Lisbon is the capital of Portugal and remembering the day you were in Lisbon are different operations.
The first he called semantic: abstract content, with no temporal coordinates, no sense of having been there. It's what's in an encyclopedia, what's in your head after an exam, what you have when you can define a word. The second he called episodic: the recollection of a specific event, with its when, its where, its smell of a steep street, and above all with the awareness that it was you who was there.
Tulving added that the episodic comes accompanied by a particular kind of consciousness — the autonoetic (the one that recognises itself in the past) — that doesn't appear when you retrieve a semantic fact. Knowing that something happened and remembering having lived it are different mental experiences. You notice it immediately when a friend tells you a story of yours that you'd forgotten: until you retrieve it as your own, it's someone else's anecdote.
What the body knows without telling you
Thirteen years later, in 1985, Tulving widened the picture with a third system: procedural memory. It's the body's memory: that of the pianist who has stopped thinking about their fingers, the swimmer who enters the water, the native speaker who conjugates without recalling the rule. It's built through motor repetition, stored in cerebellar and basal-ganglia circuits, and it resists neurological deterioration much better than the others.
A patient with deep amnesia may not remember what they ate an hour ago and still play the piano correctly. The curious thing is that the procedural isn't accessible to language: if you ask a veteran pianist to explain exactly what their left hand does in bar 14, they won't be able to. And if they try, they'll get it wrong. There's a knowing that doesn't pass through consciousness, and that breaks when forced to pass through it. Kandel, as he recounts in In Search of Memory, spent a whole career chasing that cellular border: what makes a synapse become a skill and not a memory, what changes in the molecular machinery when a hand has been exercised enough.
The seven-second whiteboard
While Tulving was organising this catalogue in Toronto, Alan Baddeley and Graham Hitch published a different but compatible model in 1974: working memory. Don't confuse it with the short-term memory of the old textbooks, even though the two terms have been swapped for each other a thousand times.
Working memory is an active whiteboard, not a store. It holds information while you do something with it. It has a ridiculous capacity — on the order of about seven elements, fewer if the elements are complex — and a duration of a few seconds if you don't refresh it. It's what you use to multiply two two-digit numbers without paper, what you use to hold the start of a long sentence until the verb arrives, what overflows when someone dictates a whole address in one go.
Baddeley's model distinguished a phonological loop, a visuospatial sketchpad and a central executive sharing resources between the two. It's cognitive engineering, not metaphor.
The story that holds up what we call the self
Daniel Schacter later added a fifth category, the autobiographical, which some authors integrate within the episodic and others keep apart for an important reason: the autobiographical isn't a heap of loose episodes. It's the continuous narrative of the self.
It's the story you tell yourself about who you are, where you come from, what happened to you when you were nine, what that turned you into. It's not the sum of the episodes; it's the edit. It has gaps, it has fabrications, it has constant rewrites — the narrative at forty doesn't match the one at twenty even if the facts are the same — and it has a central role in holding identity together.
Damasio places it in the very substrate of extended consciousness: without a biography, there's no self that recognises itself in time. There's an organism, there are reactions, there's even a proto-self, but there's no one who says "I".
Five ways to train, five ways to fail
Five systems, then. Working, episodic, semantic, procedural, autobiographical. Each is trained differently.
The semantic with study, review, spaced repetition. The procedural with hours of motor practice. Working memory barely trains: it has an almost fixed ceiling in each individual. The episodic consolidates on its own during sleep, especially in the slow-wave phases. The autobiographical is built by narrating, and is lost when one stops narrating — the elderly who stop talking about their life don't lose the facts, they lose the thread.
Each fails differently too. Alzheimer's first razes recent episodic, leaves the semantic for years, preserves the procedural for a long time. Aphasia destroys the semantics of language without touching the motor procedural. A concussion can erase the last hours of episodic without altering anything else. And each declines differently in normal ageing: working memory worsens fast, the procedural holds astonishingly well, the semantic even grows into old age. Squire summed up in 2009, in Memory and Brain Systems, forty years of lesion neuropsychology: the dissociation between systems isn't theory, it's what you see when you compare one damaged brain with another. Each substrate has its pathology, and that's the cleanest proof that this isn't a single faculty in several disguises.
The question that's paying salaries
Now, on top of this, ask yourself the question that's paying salaries in Silicon Valley. Does an LLM (large language model, the ChatGPT-type systems) have memory?
What an LLM has by default is a context window (the chunk of text the model can read in a single turn). It's a buffer (a temporary holding space) into which the recent text of the conversation enters and from which the model computes the next word. Functionally, it resembles working memory: limited capacity, active content, erased when the session closes.
If you map the model onto the catalogue of five, the context window is working memory. Enormous working memory, hundreds of thousands of units (tokens) in some models, but working memory. Nothing more. When you close the tab, the whiteboard is wiped. The next conversation starts from zero, exactly like a patient with anterograde amnesia (the inability to fix new memories after the lesion).
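To make the whiteboard concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `ContextWindow` class is hypothetical, tokens are approximated by splitting on spaces, and real systems may refuse or summarise over-long input rather than silently dropping the oldest text.

```python
from collections import deque


class ContextWindow:
    """Toy stand-in for a context window: a bounded buffer of tokens."""

    def __init__(self, max_tokens):
        # A deque with maxlen discards the oldest items once it is full.
        self.tokens = deque(maxlen=max_tokens)

    def add_turn(self, text):
        # Whitespace splitting is a simplification; real tokenizers
        # work on sub-word fragments.
        self.tokens.extend(text.split())

    def visible_text(self):
        # What the model could "read" when computing its next word.
        return " ".join(self.tokens)

    def close_session(self):
        # Closing the tab: the whiteboard is wiped.
        self.tokens.clear()


window = ContextWindow(max_tokens=8)
window.add_turn("my name is Ana and I live in Lisbon")
window.add_turn("what is my name")
print(window.visible_text())
```

With a capacity of eight tokens, the name has already fallen off the edge by the time the question arrives. Nothing was forgotten in any psychological sense; the buffer was simply full.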
Frozen weights, fossilised semantics
The model also has something else that gets called "memory": the weights, the billions of numerical parameters learned during training and frozen afterwards. They are, if you like, a fossilised semantic memory. They contain patterns of language, statistical associations between concepts, facts of the world learned by massive exposure.
It isn't exactly human semantics — it isn't organised into conceptual networks, it doesn't admit metacognition about what it knows or doesn't know — but it plays a similar role when you ask it the capital of Mongolia. The important difference is that the weights aren't updated during the conversation. The model doesn't learn from you. The model answers you with what it already knew before you existed.
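A toy illustration of "frozen", under heavy assumptions: the "model" below is only a table of which word followed which in a tiny training text, nothing like a real neural network, and the names `train` and `predict_next` are hypothetical. The one thing it shares with an LLM is the shape of the arrangement: training writes the table once, and answering only reads it.

```python
from collections import Counter, defaultdict
from types import MappingProxyType


def train(corpus):
    """'Training': count which word follows which. Runs once, before any conversation."""
    counts = defaultdict(Counter)
    words = corpus.lower().split()
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    # Freeze: expose the learned table through a read-only view
    # (the top-level mapping rejects any new entries).
    return MappingProxyType({word: dict(c) for word, c in counts.items()})


def predict_next(weights, word):
    """Inference only reads the table; it never writes to it."""
    options = weights.get(word.lower())
    if not options:
        return None  # a word never seen in training has no entry
    return max(options, key=options.get)


weights = train(
    "the capital of mongolia is ulaanbaatar . the capital of portugal is lisbon ."
)
print(predict_next(weights, "capital"))
print(predict_next(weights, "Ana"))
```

A word that first appears in the conversation has no entry in the table, and nothing in `predict_next` can add one. Changing what the table contains means running `train` again, outside the conversation.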
The patch sold as memory
To patch this gap, a few years ago the industry came up with RAG (retrieval-augmented generation: generating text with the help of retrieved external documents). Lewis and colleagues formalised it at NeurIPS 2020.
It's a useful technique: you put documents in an external store, turn them into numerical vectors, and when the model gets a question it searches that store for the closest match and inserts it into the context window before answering. This lets it "remember" things that weren't in its original training, including your earlier conversations if you've saved them.
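The steps just described can be sketched in a few lines of Python. This is a toy under stated assumptions, not how any particular product works: `embed` is a hypothetical stand-in that counts words instead of calling a real embedding model, `VectorStore` and `build_prompt` are names made up for the example, and the store returns only the single closest document.

```python
import math
from collections import Counter


def embed(text):
    """Hypothetical stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class VectorStore:
    """External store: each document is saved next to its vector."""

    def __init__(self):
        self.items = []

    def add(self, document):
        self.items.append((document, embed(document)))

    def closest(self, question):
        # Assumes the store is not empty; max() raises ValueError otherwise.
        query = embed(question)
        return max(self.items, key=lambda item: cosine(query, item[1]))[0]


def build_prompt(store, question):
    # The retrieved text is pasted into the context window ahead of the question.
    retrieved = store.closest(question)
    return f"Context: {retrieved}\nQuestion: {question}"


store = VectorStore()
store.add("the user said their dog is called Bruno")
store.add("the user prefers answers in Portuguese")
print(build_prompt(store, "what is my dog called"))
```

Note what the store holds: text and a vector. There is no record of when the sentence was said or in what situation, only how closely its words resemble the question.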
Some commercial products package this and call it, plainly, "memory". It is, in a poor sense. It's external semantics indexed by similarity. It isn't episodic. It has no time stamp, no sense of having been lived, it isn't contextually reconstructed. It's a database query disguised with the word from cognitive psychology that sells best.
What a model doesn't have and can't have
The procedural, in an LLM, simply doesn't exist. There's no motor system trained over hours of practice. The only similar thing is fine-tuning (a fine readjustment of the model's weights from additional examples), and that happens outside the conversation, by the maker's decision, not the user's.
Nor is there autobiographical, and here's the hard border. A model doesn't have a life. There's no continuous thread from its "childhood" — it has no childhood — to this moment. There's no narrative of itself holding its identity together between sessions.
When a user asks the model to tell its story, the model improvises one from the patterns of human autobiography it ingested. It's a fictional autobiography generated on the fly. If you ask it tomorrow, it'll tell you another, unless someone has written it for it and saved it in the RAG. And even so, what's saved in the RAG would be the account, not the lived experience. Like asking an actor to recite someone else's diary.
Also missing, and this is the most interesting for what's coming, is autonoetic consciousness. Tulving's. The knowing-oneself-in-the-past. When you retrieve an episodic memory, you don't retrieve only a content; you retrieve the sense of having been you who was there. An LLM has nothing similar because it has no persistent self to bind the content to. It has a text in a window, and frozen weights, and optionally a vector store. There's no subject. There's no one who recognises themselves.
The trick is in the word
So consider what happens when a salesperson tells you their product "remembers" your conversations. The sentence is built so your brain processes it with the full word. Memory, plainly, the one of the five integrated systems, the one that holds your life together.
What the product actually does is save fragments of text in a store, vectorise them, and reinject them into the context window when some similarity algorithm deems them relevant. The difference between what the word suggests and what the system does is enormous, and the word is chosen so that difference shows as little as possible.
It isn't a linguistic accident. It's a saving on explanation and a displacement of responsibility. If the system "forgets" something important, the user reads it as a minor memory lapse, not as what it is: an architectural limitation of the product they bought believing they were buying something else.
Bender and others pointed this out in Stochastic Parrots in 2021, though with less bite than the topic deserved. Importing human cognitive vocabulary to describe LLMs isn't neutral. "Memory", "understand", "reason", "hallucinate" — all are words that activate in the listener a mental model of the human mind and project that model onto the artificial system. Every time the industry uses one of these words it's collecting, for free, the expectation the word carries in natural language. And every time it uses it without qualifying, it's papering over an ontological difference with a comfortable synonym.
Next time they tell you
The healthiest thing you can do next time you hear "this AI remembers" is ask which of the five. Working. Semantic. Episodic. Procedural. Autobiographical.
The technical answer is: the first, partially, and the second in a frozen, non-updatable form. The other three it doesn't have, has never had, and by current architecture can't have.
If whoever's selling you the product answers "all of them", you already know who you're talking to. If they answer "none of them fully", you also know, and you can probably trust the other things they say a bit more.
The word is the same. The things it names aren't.
Definitions
Episodic memory. A system that stores recollections of specific events lived by the subject, with coordinates of time and place, accompanied by the sense of having been there oneself.
Semantic memory. A system of abstract knowledge about the world, with no reference to the moment or place where it was learned. The capital of a country, the meaning of a word, the rules of a game.
Procedural memory. An implicit motor system that stores bodily skills acquired by repetition. Playing an instrument, swimming, typing. It doesn't pass through consciousness and deteriorates if one tries to verbalise it.
Working memory. An active mental whiteboard of very limited capacity (about seven elements, seconds of duration) in which information is manipulated while a cognitive task is executed.
Autobiographical memory. A continuous, edited narrative of the self, built from episodes but not reducible to them. It holds personal identity together over time.
Autonoetic consciousness. A type of consciousness, described by Tulving, that accompanies the retrieval of an episodic memory and consists of recognising oneself as the subject of the remembered event.
Anterograde amnesia. The inability to fix new memories from the moment of a brain lesion, even though prior memory remains intact.
LLM (large language model). A large-scale statistical model trained on enormous amounts of text that predicts the next word from a given context. The conversational systems of the ChatGPT type are its best-known implementation.
Context window. The maximum amount of text an LLM can have simultaneously present while generating a response. It's measured in units called tokens (fragments of words). It's wiped when the session ends.
Model weights. The billions of numerical parameters an LLM learns during its training and that are frozen when it's deployed. They statistically encode what the model "knows".
RAG (retrieval-augmented generation). A technique that adds to the LLM an external store of vectorised documents; given a question, the most similar fragments are retrieved and injected into the context window before generating the response.
Fine-tuning. A readjustment of some or all of the model's weights from additional examples, usually to specialise it in a domain or task. It happens outside conversations, in a separate training phase.
References
Tulving, Endel (1972). Episodic and Semantic Memory. Chapter in Organization of Memory (Academic Press). The work introducing the canonical distinction between episodic and semantic memory, the axis of the catalogue that structures this article.
Tulving, Endel (1985). How Many Memory Systems Are There? American Psychologist 40, pp. 385–398. Widens the earlier picture by adding procedural memory and formulating the notion of autonoetic consciousness associated with the episodic.
Baddeley, Alan & Hitch, Graham (1974). Working Memory. In The Psychology of Learning and Motivation, vol. 8 (Academic Press). Original formulation of the working-memory model with phonological loop, visuospatial sketchpad and central executive.
Schacter, Daniel L. (2001). The Seven Sins of Memory (Houghton Mifflin). Provides the operational taxonomy that includes autobiographical memory as a distinguishable system and describes the typical failure modes of each.
Squire, Larry R. (2009). Memory and Brain Systems: 1969–2009. Journal of Neuroscience 29, pp. 12711–12716. Neural synthesis of the multiple systems; support for the claims about differentiated substrates and patterns of selective damage.
Damasio, Antonio (1999). The Feeling of What Happens (Harcourt). Source of the thesis that autobiographical memory is the substrate of extended consciousness and that, without it, there's no self that recognises itself in time.
Kandel, Eric (2006). In Search of Memory (Norton). General neuroscientific background on the consolidation and cellular substrates of the different mnemic systems.
Lewis, Patrick; Perez, Ethan and others (2020). Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. NeurIPS 2020. Original formalisation of RAG, the technique cited in the article as the industrial patch to the persistence problem in LLMs.
Bender, Emily M. and others (2021). On the Dangers of Stochastic Parrots. FAccT 2021. Reference for the critique of the commercial use of human cognitive vocabulary (memory, understand, reason) applied to language models.