Affective computing. The thermometer and the fever

Rosalind Picard founded affective computing (the field that sets out to make machines detect and respond to emotions) in 1997 with a reasonable promise: to ignore the emotional component when designing machines was to design for a caricature of the user. Twenty-eight years later that promise has walked out of the lab and turned into product. Today there are systems that classify a child's boredom in class, a candidate's anxiety in an interview, a passenger's stress in an aeroplane cabin. They detect patterns with an acceptable statistical hit rate, but that isn't understanding an emotion. It's labelling it. The difference between "it recognises that you're sad" and "it knows what sadness is" is the same as the one between the thermometer and the fever.

A mercury thermometer tells you you're running 39.4. It doesn't know what being ill is. It has never shivered, never sweated, never felt that borrowed-body sensation where your head seems to weigh double. It measures a quantity and turns it into a number, and that number is useful precisely because the thermometer understands nothing.

Affective computing works the same way. And it's sold as if it didn't.

The founding promise and its industrial drift

Rosalind Picard published Affective Computing in 1997 from the MIT Media Lab. The book founded a field and, above all, founded a promise. Machines might one day detect human emotions, respond to them, fold them into the interaction. Picard's argument was defensible. If human cognition always runs with its emotional component engaged, a machine that ignores that component is interacting with a caricature of the user.

The trouble came later, when the promise left the lab.

Today there are systems that look at a child's face in class and report to the teacher whether they're bored, distracted or anxious. Systems that record a job interview and score the candidate by tone of voice, microexpression, blink rate. Systems installed in aeroplane cabins that flag a passenger as potentially dangerous if their skin conductance rises above a threshold. Software that, in a call centre, turns red when it detects "frustration" in the agent, so the supervisor knows it's time to step in. Cameras in supermarkets estimating the "satisfaction" of whoever walks out the door. Almost none of this is sold under the heading of surveillance. It's sold as wellbeing, safety, optimisation, assistance. The vocabulary is always kind.

And technically it works. That's the awkward part. If you train a model on enough faces labelled "sad", it will classify the next face with drooping brows and downturned corners correctly fairly often. The system recognises the pattern. The question is what you think it has done when it recognises it.

What the machine measures and what it infers

There are two things in play that get confused on purpose.

What the machine measures are physical signals. The geometry of the face's muscles in a given instant, the variations in tone and volume of the voice, the electrical conductance of the skin, the heart rate, pupil dilation, body posture. All of these are quantities. Just like temperature. So far, honest science.

What the machine infers from those signals is something else. An emotion label. "Joy", "sadness", "fear", "surprise", "disgust", "anger". The six usual suspects from Paul Ekman's paradigm, which for half a century was sold as universal before the evidence itself began dismantling it. The leap from signal to label isn't a technical leap, it's a cultural decision dressed up as computation. When a system claims you're sad because the corners of your mouth point down, it has understood nothing about your sadness. It has identified a visual pattern that a team of annotators — probably white, probably American, probably between twenty and forty — marked as sad on a spreadsheet five years ago.

That's labelling, not comprehension. And the difference matters because the decision taken next is taken with the label, not with the signal.
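The gap between signal and label can be made concrete with a minimal sketch. Everything in it is hypothetical: the two-number "signal" (mouth-corner angle and brow height), the four sample faces and the two annotator pools are invented for illustration, and the nearest-centroid classifier merely stands in for whatever model a real product uses. The point is that identical measurements produce different "emotions" depending on who filled in the spreadsheet.

```python
from math import dist

# Hypothetical measured signal: (mouth-corner angle in degrees, brow height in mm).
# A negative angle means the corners of the mouth point down.
Signal = tuple[float, float]

def train_centroids(samples: list[tuple[Signal, str]]) -> dict[str, Signal]:
    """Average the signals that annotators filed under each label."""
    sums: dict[str, list[float]] = {}
    for (angle, brow), label in samples:
        acc = sums.setdefault(label, [0.0, 0.0, 0.0])
        acc[0] += angle
        acc[1] += brow
        acc[2] += 1
    return {label: (a / n, b / n) for label, (a, b, n) in sums.items()}

def infer_label(signal: Signal, centroids: dict[str, Signal]) -> str:
    """Return the label whose centroid lies nearest to the measured signal."""
    return min(centroids, key=lambda label: dist(signal, centroids[label]))

# The same four faces, labelled by two invented annotator pools.
faces = [(-12.0, 2.0), (-10.0, 3.0), (8.0, 6.0), (10.0, 7.0)]
pool_a = ["sad", "sad", "happy", "happy"]
pool_b = ["concentrating", "concentrating", "happy", "happy"]

model_a = train_centroids(list(zip(faces, pool_a)))
model_b = train_centroids(list(zip(faces, pool_b)))

new_face = (-11.0, 2.5)                    # the measurement: identical for both models
label_a = infer_label(new_face, model_a)   # the inference: "sad"
label_b = infer_label(new_face, model_b)   # the inference: "concentrating"
```

Nothing in the arithmetic distinguishes the two models. The only thing that changed between `label_a` and `label_b` is the annotators' vocabulary.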

What Barrett broke

Lisa Feldman Barrett published How Emotions Are Made in 2017 and, two years later, signed with four colleagues a huge review in Psychological Science in the Public Interest titled Emotional Expressions Reconsidered. The title is polite. The content is devastating. After reviewing more than a thousand studies, the authors reach a clear conclusion: there is no reliable map between facial configurations and emotions. People smile when they're uncomfortable and frown when they concentrate. They cry at weddings and funerals and over an insurance ad. Facial expression is a communicative act shaped by culture, context and the moment, not the mechanical imprint of an inner state.

Barrett's theory goes further. Emotions, she argues, aren't innate universals that the brain runs as inherited subroutines. They're constructions the brain assembles on the fly, combining bodily sensations with learned concepts. The word "melancholy" doesn't describe an emotion that existed before the word; to a large extent, the word organises the experience and makes it recognisable to whoever has it. A culture with an emotional vocabulary different from English feels different things, not the same things with different labels.

This is what affective computing can't take on board, so it carries on as though nothing were amiss.

Because if emotions are culturally constructed, each dataset (the set of data the model is trained on) encodes a specific culture as universal. When a multinational sells a system trained on Western faces to an Asian government to surveil Asian children, it isn't exporting neutral technology. It's exporting an emotional taxonomy. And the available studies show exactly what the theory predicts. Facial emotion-recognition systems perform worse with non-Western subjects, worse with dark skin, worse with women's faces. Buolamwini and Gebru already documented a similar pattern in 2018 for commercial gender classifiers, which performed worst on darker-skinned women, and the problem is inherited and magnified in emotion recognition, where the cultural variable weighs even more.
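A gap of this kind only becomes visible when accuracy is broken down by group instead of being reported as a single figure. The sketch below shows that disaggregation under loud assumptions: the group names, the labels and the eight records are invented toy values, not data from any study, and the "true" label is itself only an annotator's judgement.

```python
from collections import defaultdict

def accuracy_by_group(records):
    """records: iterable of (group, true_label, predicted_label) tuples."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for group, truth, predicted in records:
        totals[group] += 1
        hits[group] += int(truth == predicted)
    return {group: hits[group] / totals[group] for group in totals}

# Invented toy records, for illustration only.
records = [
    ("group_a", "sad", "sad"), ("group_a", "happy", "happy"),
    ("group_a", "sad", "sad"), ("group_a", "happy", "happy"),
    ("group_b", "sad", "angry"), ("group_b", "happy", "happy"),
    ("group_b", "sad", "sad"), ("group_b", "happy", "sad"),
]

per_group = accuracy_by_group(records)
overall = sum(truth == pred for _, truth, pred in records) / len(records)
```

With these toy values the headline figure is 0.75, which hides the fact that one group sits at 1.0 and the other at 0.5. A vendor quoting only `overall` would be telling the truth and concealing the problem at the same time.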

The placebo trick

There's a layer dirtier still than bias. Even if the system got it right, even if the dataset were perfect, even if the emotional taxonomy were defensible, there remains the problem of what happens when a user interacts with a machine that says "I understand how you feel".

Joseph Weizenbaum got a fright in 1966 when he saw that people, his secretary included, confided in ELIZA, a hundred-line program that reformulated their sentences as questions. The secretary, who had seen the code, asked Weizenbaum to leave the room so she could talk to the machine alone. Weizenbaum spent the rest of his life horrified by what he had built.

ELIZA had no camera, no microphone, nothing. It was a keyboard and rules. The emotional placebo worked all the same.
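How little machinery "a keyboard and rules" requires can be shown in a few lines. What follows is a sketch in the spirit of ELIZA, not Weizenbaum's original script: the three patterns, the pronoun table and the fallback reply are hypothetical choices made for this example.

```python
import re

# Hypothetical rules in the spirit of ELIZA; not Weizenbaum's original script.
RULES = [
    (re.compile(r"\bi feel (.+)", re.IGNORECASE), "Why do you feel {0}?"),
    (re.compile(r"\bi am (.+)", re.IGNORECASE), "How long have you been {0}?"),
    (re.compile(r"\bmy (.+)", re.IGNORECASE), "Tell me more about your {0}."),
]

# Swap first and second person so the echo reads as a reply.
REFLECTIONS = {"i": "you", "me": "you", "my": "your", "am": "are",
               "you": "I", "your": "my"}

def reflect(fragment: str) -> str:
    """Drop trailing punctuation and flip pronouns word by word."""
    words = fragment.rstrip(".!?").split()
    return " ".join(REFLECTIONS.get(word.lower(), word) for word in words)

def respond(sentence: str) -> str:
    """Turn the user's sentence into a question using the first matching rule."""
    for pattern, template in RULES:
        match = pattern.search(sentence)
        if match:
            return template.format(reflect(match.group(1)))
    return "Please go on."  # fallback when no rule matches

reply = respond("I feel nobody listens to me.")
# reply == "Why do you feel nobody listens to you?"
```

The program stores nothing and models nothing. It rearranges the user's own words, and that is enough for the reply to read as attention.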

From keyboard to prosodic modelling

Today the placebo is industrial. A chatbot (a program that converses by imitating a human interlocutor) with a warm voice and fine prosodic modelling, able to detect from the tremor in your voice that you're sad and to answer with an "I understand, what you're telling me is hard", produces in the listener the effect of having been heard. It produces it for real. The relief is real, even though there's no one inside.

This can be sold as a therapeutic advance. Some sell it that way, with a very straight face. The trouble is that a tool that produces relief with no one inside stops being empathy and turns into something murkier. If the relief works and the other party doesn't exist, what you've got is an efficient mechanism for extracting confidences, regulating moods and modulating behaviour without the user knowing who they're talking to. The question isn't whether the machine understands you. The question is who the relief it produces in you is working for.

Who measures whom

This is where the technical debate falls away and the one that matters appears.

Stark and Hoey laid it out in black and white in The Ethics of Emotion in Artificial Intelligence Systems, presented at FAccT 2021; Kate Crawford approached the same idea from another angle in the "Affect" chapter of Atlas of AI that same year. Affective computing is, above all, an instrument of power. Not for its precision, but for its asymmetry. Whoever deploys it is always above whoever is subjected to it. The child in the classroom doesn't decide to install the camera measuring their attention. The candidate for a job doesn't negotiate the algorithm that grades them by their blinking. The passenger at the airport doesn't sign consent for a conductance sensor to decide whether they go through the fast lane or stay in the back room. The call-centre agent doesn't audit the model deciding whether their tone was cordial enough with customer number 137 of the day.

The measure is always applied downward.

The measure that only goes one way

No one has ever seen an affective-computing system installed in the boardroom to detect whether executives are lying during the earnings presentation. Nor at ministry press conferences. Nor in judges' chambers to audit whether a verdict was drafted in an even-tempered emotional state. The technology exists. The cameras and microphones are right there. What's missing is the political will to apply it symmetrically, and that will isn't going to appear because the asymmetry is the feature, not the defect. Affective computing is interesting to whoever buys it precisely because it goes one way.

Article 19's report on the Chinese emotion-recognition market, published in 2021, describes the extreme case. But the extreme case is useful for lighting up the normalised one. What in China is done openly (sensors in classrooms reporting to school management which pupils "lose focus", cameras in police stations claiming to detect lies during interrogations) is done in Europe and the United States under euphemism. Platforms like HireVue sold facial analysis for job interviews until the controversy that broke in 2019 forced a partial retreat in 2021. They retreated on the facial part. The vocal and linguistic analysis is still standing. The business doesn't vanish, it reshuffles.

The problem that isn't technical

When a critic of affective computing points to the bias problem, the industry answers that the datasets will improve. When you point to the precision problem, the answer is that the models keep getting finer. When you point to the theory problem — that emotions aren't what the system assumes — the answer turns evasive and the subject changes. And when you point to the political problem — who measures whom, who receives the result, what gets decided with it — the answer is straight-up silence, or an appeal to the regulatory framework, which is the polite way of saying the problem isn't the manufacturer's.

Damasio, in Descartes' Error, argued three decades ago that without emotion there's no functional reason, that an emotionally flat brain makes bad decisions even with its logical circuitry intact. That argument was an invitation to take emotions seriously, not to treat them as one more parameter a sensor picks up and a classifier dispatches. The difference between the two readings changes everything. One invites complexity, the other invites product.

The thermometer doesn't understand the fever and is nonetheless useful. It works because the doctor reading the thermometer does understand, contextualises, decides. The question affective computing doesn't want you to ask isn't whether the thermometer measures well. It's who's reading the result, what they're going to do with it, and whether you're invited to that conversation or you're the body the measure is taken from.

Definitions

Affective computing. A field founded by Rosalind Picard in 1997 that studies the design of computing systems able to detect, interpret and respond to human emotional states from physiological, vocal or facial signals.

Microexpression. A very brief facial movement, fractions of a second long, that in Paul Ekman's paradigm is assumed to be an involuntary sign of a repressed emotion. Its validity as a reliable marker of inner state is one of the points contested by contemporary psychology.

Skin conductance. Variation in the electrical conductance of the skin associated with sweat-gland activity. Used as an indirect indicator of physiological arousal, not as a direct measure of any specific emotion.

Dataset. A structured set of data a machine-learning model is trained on. In emotion recognition, each sample is usually labelled by human annotators, which introduces the cultural biases of whoever does the labelling.

Ekman paradigm. A theoretical framework positing the existence of six or seven basic universal emotions (joy, sadness, fear, surprise, disgust, anger, contempt) with facial correlates identifiable across cultures. It's the implicit substrate of most commercial emotion-recognition products.

FAccT. ACM Conference on Fairness, Accountability and Transparency. The reference academic conference on the ethical and political implications of algorithmic systems.

References

Picard, R. W. (1997). Affective Computing. MIT Press. The field's founding work, cited at the start of the article as the origin of the promise and the research programme.

Barrett, L. F. (2017). How Emotions Are Made: The Secret Life of the Brain. Houghton Mifflin Harcourt. The basis of the critique of the supposedly universal and innate character of emotions.

Barrett, L. F., Adolphs, R., Marsella, S., Martinez, A. M. & Pollak, S. D. (2019). Emotional Expressions Reconsidered: Challenges to Inferring Emotion From Human Facial Movements. Psychological Science in the Public Interest, 20, 1–68. A critical review covering more than a thousand studies and dismantling the mechanical correspondence between face and emotion.

Crawford, K. (2021). Atlas of AI. Yale University Press. The "Affect" chapter, the basis of the political reading of affective computing as a power asymmetry.

Stark, L. & Hoey, J. (2021). The Ethics of Emotion in Artificial Intelligence Systems. FAccT 2021. A head-on critique of deploying emotion recognition in workplace and educational settings.

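Buolamwini, J. & Gebru, T. (2018). Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification. Conference on Fairness, Accountability and Transparency (FAT* 2018, the conference now known as FAccT), Proceedings of Machine Learning Research, 81. Empirical documentation of the systematic biases in commercial facial analysis by gender and skin tone.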

Damasio, A. (1994). Descartes' Error: Emotion, Reason, and the Human Brain. Putnam. The thesis on the impossibility of separating reason and emotion in cognitive functioning, mentioned at the end of the article.

Article 19 (2021). Emotional Entanglement: China's Emotion Recognition Market and Its Implications for Human Rights. https://www.article19.org. The source for the Chinese case on deploying emotion recognition in classrooms and police stations.

Weizenbaum, J. (1966). ELIZA — A Computer Program for the Study of Natural Language Communication Between Man and Machine. Communications of the ACM, 9(1), 36–45. The origin of the experiment that illustrates the placebo effect of simulated empathy.
