Where AI Is Really Thought About, and It's Not on LinkedIn

Today we're talking about a mistake of mine. I've been playing with AI for quite a few years. I started in 2005 with a language called Prolog, and ever since I've watched things speed up. Then one day (it must have been around 2022, before ChatGPT) I decided artificial intelligence wasn't going to be good for anything serious. My reason seemed solid to me at the time: AI's results aren't traceable. With a normal program I know what each line does. With a neural network I can't. That, I thought, disqualified the whole technology. In December 2022 ChatGPT exploded right under my nose and I understood how wrong I'd been. If I'd paid TensorFlow half the attention I spent despising it, today I'd understand much better what has happened. Where had I been looking? And, more importantly, where had it been happening?

Three Places and a Fourth

It was happening at MIT CSAIL, at Berkeley BAIR, at the CMU Robotics Institute. It was happening too, depending on the year, at Stanford and at Toronto. It's not a ranking —it's an ecosystem.

MIT CSAIL (Computer Science and Artificial Intelligence Laboratory) has its origin in the AI Lab, founded in 1959, and in Project MAC, founded in 1963 with DARPA funding and later renamed the Laboratory for Computer Science. The two labs merged in 2003. Today CSAIL is the largest lab on the MIT campus, with more than six hundred people. This matters because any technical idea that passes through there is discussed against sixty years of accumulated tradition. The internal conversation has decades of depth.

Berkeley AI Research —BAIR— was formally established in 1990, bringing together groups in computer vision, machine learning, natural language processing, robotics and planning. Two dozen professors, more than a hundred doctoral students. The place's historic specialty is reinforcement learning applied to robotics. That kind of work isn't done in a private master's program —it's done with physical labs and ten-year funding.

The CMU Robotics Institute, in Pittsburgh, is the oldest university-based robotics center in the world. Raj Reddy founded it in 1979. Its alumni are everywhere: DeepMind, Anthropic, Boston Dynamics, Tesla, Nvidia.

The detail that matters isn't the ranking. It's that a CMU paper is written by fifteen people who also collaborate with Google DeepMind and Anthropic. The border between academia and industry there is porous by design. In Spain the border is a rampart.

What I Didn't See

I'm telling you this from the personal experience of not having been there.

In 2009 I worked intensively on natural language processing. The state of the art then was lexical engines, which calculated the probability of the next word. It was exhausting for the machines of the time: the computational structures didn't fit in the available RAM. I can vouch for it. I decided then that AI "wasn't going to be good for anything" because the results weren't traceable. Mistake.
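
As a rough illustration of what "calculating the probability of the next word" meant, here is a minimal bigram sketch in Python. It is not the engine described above, only the simplest member of that family; the function names and the toy corpus are invented for the example.

```python
from collections import Counter, defaultdict


def train_bigram_model(text):
    """Count which word follows which, then turn the counts into probabilities."""
    words = text.lower().split()
    follower_counts = defaultdict(Counter)
    for current_word, next_word in zip(words, words[1:]):
        follower_counts[current_word][next_word] += 1

    model = {}
    for current_word, counts in follower_counts.items():
        total = sum(counts.values())
        model[current_word] = {w: c / total for w, c in counts.items()}
    return model


def next_word_probabilities(model, word):
    """Return the estimated next-word distribution (empty dict if the word is unseen)."""
    return model.get(word.lower(), {})


# Toy corpus, invented for this illustration.
corpus = "the cat sat on the mat and the cat slept"
model = train_bigram_model(corpus)

# "the" is followed by "cat" twice and "mat" once in the corpus.
print(next_word_probabilities(model, "the"))
```

The table stores one entry per observed word pair. With a real vocabulary and longer contexts than a single previous word, the number of entries grows very quickly, which gives a feel for why such structures outgrew the RAM of the time.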

While I reached that wrong conclusion and gave up, at MIT, Berkeley and CMU they were working on exactly how to make it fit. On how to scale the architectures. On how to harness GPUs for training. In 2012 AlexNet won ImageNet with deep networks on GPU and everything changed. In 2017 Vaswani and seven co-authors published "Attention Is All You Need" at NeurIPS. In 2018 Devlin and company published BERT. In 2020 GPT-3 came out.

I didn't pay attention to TensorFlow when it was released in 2015. If I had, today I'd understand more clearly what has happened. Luckily, what was done, when, where and by whom can be reconstructed paper by paper. The reconstruction is slow but possible. In 2005 it was nearly impossible. Today it's just a matter of patience.

What I'm telling you here isn't nostalgia. It's practical information. The AI that astonishes us today was already cooking in specific places a decade ago. Whoever followed those places now understands the moment we're living through with much more clarity. Rest assured the wave rising in front of us is going to change us from top to bottom, and whoever understood it sooner is going to surf it with an advantage.

The Trick to Reading a Paper Without Knowing Math

This is what the 13,000-euro private master's programs don't teach you, because if you knew it you wouldn't need the master's.

A Berkeley or MIT paper nearly always has the same structure:

Abstract: a dense summary.
Introduction: what problem they attack and why it matters.
Related work: what people did before and where they fell short.
Method: the new architecture or algorithm.
Experiments: what they tested and with what data.
Results: what they got.
Discussion: what the results mean.
Limitations: what doesn't work.
Conclusion: a final summary.

To understand 60% of the paper without advanced mathematical training, it's enough to read the abstract, the introduction, the method (skipping the equations, looking at the diagrams), the results (looking at the charts) and the limitations.

The limitations are the most honest section of the paper. It's where the authors admit which problems they didn't solve, which edge cases fail, which assumptions they didn't check. If a paper has no limitations section, or gets it done in two lines, be suspicious. If it runs to a page and a half with concrete details, it's probably a serious paper.

To find good papers there are three archives worth learning to navigate. Sorry, not "worth": you have to learn to navigate them if you want to understand what this is about.

arXiv. Specifically the categories cs.AI, cs.LG (machine learning), cs.CL (computational linguistics) and stat.ML. It's an open archive. The quality is uneven because there's no prior peer review, but serious authors publish there first: by the time a paper is presented at a conference, it is usually already on arXiv.

Papers With Code. An aggregator that links papers to their implementation and to comparable metrics. Useful for knowing whether a claim has been replicated.

OpenReview. Where the reviews of ICLR and other conferences are published. There you can read how the reviewers challenged the paper before its acceptance. If you want to understand why a paper is considered serious or not, the dialogue on OpenReview is the best clue.
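
For arXiv in particular, the categories named above can be followed programmatically. The sketch below uses arXiv's public query API, which returns an Atom feed; the helper names (`build_query_url`, `parse_feed`, `fetch_latest`) are hypothetical, and this is only one simple way to list recent submissions, not an official client.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

ARXIV_API = "http://export.arxiv.org/api/query"
ATOM = "{http://www.w3.org/2005/Atom}"  # XML namespace used by the Atom feed


def build_query_url(categories, max_results=5):
    """Build a URL asking for the newest submissions in any of the given categories."""
    search = " OR ".join(f"cat:{c}" for c in categories)
    params = {
        "search_query": search,
        "sortBy": "submittedDate",
        "sortOrder": "descending",
        "max_results": max_results,
    }
    return ARXIV_API + "?" + urllib.parse.urlencode(params)


def parse_feed(atom_xml):
    """Extract (title, link) pairs from the Atom feed returned by the API."""
    root = ET.fromstring(atom_xml)
    papers = []
    for entry in root.findall(ATOM + "entry"):
        # Titles may be wrapped across lines in the feed; collapse the whitespace.
        title = " ".join(entry.findtext(ATOM + "title", default="").split())
        link = entry.findtext(ATOM + "id", default="")
        papers.append((title, link))
    return papers


def fetch_latest(categories, max_results=5):
    """Perform the network request; only runs when called explicitly."""
    url = build_query_url(categories, max_results)
    with urllib.request.urlopen(url, timeout=30) as response:
        return parse_feed(response.read())
```

Calling `fetch_latest(["cs.AI", "cs.LG", "cs.CL", "stat.ML"])` should return the most recent titles with their arXiv links. From there the reading order is the one described earlier: abstract, introduction, diagrams, charts, limitations.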

The Psychological Obstacle

Reading papers isn't only for researchers. I say it because I was told the opposite for years and I believed it.

Anyone with structured curiosity can understand 60% of a Berkeley paper if they devote an afternoon to it. It's less effort than finishing a Netflix series. More useful than reading a newsletter summarized by an aggregator. More interesting than listening to a YouTuber explain the paper three months late, after reading no more than the abstract.

The obstacle is psychological. There's the effect of the paper as a sacred object —the idea that only a PhD grants the right to read it. There's the language obstacle —the papers are in English and the technical documentation isn't translated. There's the format —the double column in PDF intimidates the reader used to the web's vertical scroll.

All three obstacles are surmountable. The language is solved with DeepL or a generative assistant doing a first pass, with the reader reviewing the technical terms that shouldn't be translated. The format is solved by downloading the PDF to Zotero or a decent reader. The sacred effect is solved by reading two or three papers in a row and confirming they're no more obscure than any technical manual.

The Translated Leftovers

When an MIT paper goes through the usual translation chain —paper → tweet from a researcher → TechCrunch article → translation in Xataka → derivative article in El Confidencial → YouTube video in Spanish— it loses three critical things.

It loses the quantitative nuance. The paper says "improves 12% on this specific benchmark under these conditions." The translation says "improves dramatically." The Spanish-speaking reader receives "dramatically" without the figure or the conditions.

It loses the limitations. The paper devotes a whole page to saying "this doesn't work if you have less than a certain amount of data" or "this fails with sequences longer than a certain length." The generic translation omits the limitations because "they aren't the interesting part." The reader receives the glorious result and never receives the "except when."

It loses the rebuttal. The paper is published, discussed, others try to replicate it, some don't manage to, there are published corrections, there's debate. The generic translation captures only the moment of the initial publication. The reader receives the news and never receives the subsequent conversation.

Three losses. And all of them are avoided by going to the source.

Is It Necessary?

Is it necessary for a normal citizen to read MIT papers?

No.

But Spain does need an intermediate layer that reads them, translates them with judgment, takes the time to discuss them in Spanish, connects them to real local cases and puts them within reach of the average reader. That layer doesn't exist in sufficient quantity.

Compared with what exists in English —Jack Clark's Import AI, The Gradient, Last Week in AI, MIT Technology Review's The Algorithm, Lex Fridman's threads— the Spanish-language offering is insufficient. Whoever starts reading papers seriously without waiting for them to be translated is doing work the Spanish editorial ecosystem doesn't make easy for them. It's a shame. And it's an opportunity for whoever decides to build that missing intermediate layer.

Quick Definitions

CSAIL: MIT's Computer Science and Artificial Intelligence Laboratory. Formed in 2003 by the merger of the AI Lab (1959) and the Laboratory for Computer Science, formerly Project MAC (1963).
BAIR: Berkeley Artificial Intelligence Research. Established in 1990.
CMU Robotics Institute: founded in 1979 by Raj Reddy. The world's first university robotics center.
NeurIPS, ICML, ICLR: the three main conferences in machine learning. Competitive acceptance (around 25-30% of submitted papers depending on the year).
arXiv: an open archive of preprints. The categories cs.AI, cs.LG, cs.CL are the relevant ones for AI.
Papers With Code: an aggregator that links papers to implementation and metrics.
OpenReview: a platform where the reviews of ICLR and other conferences are published.
Limitations: the section of a paper where the authors admit the unsolved problems. A key quality indicator.

References

MIT CSAIL, Mission & History (official site, csail.mit.edu). The lab's origin in the 2003 merger of the AI Lab (founded in 1959) and the Laboratory for Computer Science, formerly Project MAC (1963, with DARPA funding). It's the largest lab on campus, with more than six hundred people.

Berkeley Artificial Intelligence Research (BAIR), official site (bair.berkeley.edu). Established in 1990; it brings together computer vision, machine learning, natural language processing, robotics and planning.

Robotics Institute, Carnegie Mellon University, official site (ri.cmu.edu). Founded in 1979 by Raj Reddy (with Angel Jordan); the world's first academic department dedicated exclusively to robotics.

Vaswani et al., Attention Is All You Need (NeurIPS, 2017; arXiv:1706.03762). Cited as a milestone of attention architectures. Eight signing authors.

Devlin et al., BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (2018; arXiv:1810.04805).

Brown et al., Language Models are Few-Shot Learners (GPT-3) (NeurIPS, 2020; arXiv:2005.14165).

Krizhevsky, Sutskever and Hinton, ImageNet Classification with Deep Convolutional Neural Networks (AlexNet, 2012). Cited as the moment deep networks on GPU won ImageNet.

TensorFlow, Google's machine-learning library, released as open source in November 2015 (tensorflow.org).

arXiv, an open archive of preprints; the categories cs.AI, cs.LG and cs.CL are the relevant ones for AI.

Papers With Code, an aggregator that links papers to their implementation and metrics.

OpenReview, a platform where the reviews of ICLR and other conferences are published.

NeurIPS, ICML and ICLR acceptance rates, around 25-30% depending on the year (neurips.cc).

Import AI (Jack Clark, co-founder of Anthropic), The Gradient, Last Week in AI (Skynet Today) and The Algorithm (MIT Technology Review). Cited as examples of science communication in English starting from the primary source.
