What is an LLM?

The machine, the field, and the thing with tools.

Three letters, a lot of confusion

LLM stands for Large Language Model. It's the engine inside ChatGPT, Claude, Gemini, and every other AI chat tool you've heard about in the last two years. But people use "LLM" and "AI" and "agent" as if they mean the same thing. They don't. And the differences matter — not for trivia, but for understanding what you're actually working with.

Let's start at the bottom and build up.

A neural network, in sixty seconds

Forget everything you've seen in movies. An AI is not a brain in a jar. It's math.

Specifically, it's a structure called a neural network: a system of numbers organized in layers, where each layer takes input from the previous one, multiplies it by a set of weights (numbers that control how much each input matters), adds up the results, applies a simple nonlinear squashing step, and passes the result forward. That's it. Numbers in, math in the middle, numbers out.

The simplest possible neural network:

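[Figure: a simple neural network. Inputs x₁, x₂, x₃ (your data) connect to a hidden layer (the math) through weights such as ×0.8, ×1.2, and ×0.6. Each node sums its weighted inputs (Σ) and passes the result on to the output (the answer): cat? dog?]

Every line is a weight. Every node multiplies, adds, and passes forward.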

That's a neural network. Layers of multiplication and addition. No consciousness. No understanding. No feelings. Just arithmetic that happens to be arranged in a particular pattern.
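To make "multiply, add, pass forward" concrete, here is a minimal sketch of a tiny network in plain Python. The weights 0.8, 1.2, and 0.6 come from the figure; every other number, the function names, and the choice of a sigmoid as the squashing step between layers are assumptions made up for illustration.

```python
import math

def weighted_sum(inputs, weights):
    # Multiply each input by its weight, then add everything up.
    return sum(x * w for x, w in zip(inputs, weights))

def squash(value):
    # Assumed squashing step (a sigmoid): maps any number into the range 0..1.
    return 1.0 / (1.0 + math.exp(-value))

def layer(inputs, weight_rows):
    # One layer: each row of weights feeds one node.
    return [squash(weighted_sum(inputs, row)) for row in weight_rows]

def forward(inputs, hidden_weights, output_weights):
    # Numbers in, math in the middle, numbers out.
    hidden = layer(inputs, hidden_weights)
    return layer(hidden, output_weights)

# Made-up weights: 3 inputs -> 2 hidden nodes -> 2 outputs ("cat?", "dog?").
HIDDEN_WEIGHTS = [[0.8, 1.2, 0.6], [-0.5, 0.3, 0.9]]
OUTPUT_WEIGHTS = [[1.5, -1.0], [-1.5, 1.0]]

scores = forward([1.0, 0.5, -0.2], HIDDEN_WEIGHTS, OUTPUT_WEIGHTS)
print(dict(zip(["cat?", "dog?"], scores)))
```

Nothing here has been trained, so the two scores are meaningless. The point is only that the whole thing is arithmetic.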

Here's the strange part.

The unreasonable thing

In 1989, a mathematician named George Cybenko proved something that still feels like it shouldn't be true: a neural network with even one middle layer can approximate any continuous function, given enough weights.

A function, in this context, just means a relationship between inputs and outputs. "Given this photo, is it a cat?" is a function. "Given these words, what word comes next?" is a function. "Given this chess position, what's the best move?" is a function.

Cybenko proved that this stack of multiplications and additions — this comically simple structure — can learn to mimic any of those relationships. Not just some. Any.

Think of it like clay. Clay is just dirt and water. But with enough clay and enough shaping, you can sculpt anything — a face, a hand, a building, a landscape. The material is simple. The range of what it can represent is not.

Neural networks are mathematical clay. The material is multiplication. The range is everything.

[Figure: the same simple material can approximate any shape. Three panels compare a target function with its neural net approximation: a wave (smooth curve), a decision (yes/no boundary), and a spike (sharp peak).]

This is called the universal approximation theorem. It doesn't say the network will learn any function, just that a good enough set of weights exists if the network is big enough. Finding those weights takes training and examples, and nothing guarantees the training will get there. The gap between "can" and "will" is where all the engineering lives.
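One way to get a feel for the "can" half is to skip learning entirely and pick the weights by hand. The sketch below builds a one-hidden-layer network out of steep sigmoid "steps" that trace a staircase along a target curve. This is a common classroom illustration, not Cybenko's proof, and the function names and the steepness setting are assumptions chosen for the demo.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function (avoids overflow for large |z|).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def build_network(f, lo, hi, hidden_units):
    """Hand-pick weights so one hidden layer roughly traces f on [lo, hi]."""
    step = (hi - lo) / hidden_units
    steepness = 10.0 / step  # assumed setting: sharper steps for finer grids
    units = []
    for i in range(1, hidden_units + 1):
        left = lo + (i - 1) * step
        right = lo + i * step
        midpoint = (left + right) / 2
        # Each hidden node adds one small step of height f(right) - f(left).
        units.append((steepness, midpoint, f(right) - f(left)))
    start = f(lo)

    def network(x):
        return start + sum(h * sigmoid(k * (x - m)) for k, m, h in units)

    return network

def max_error(f, network, lo, hi, samples=1000):
    # Largest gap between target and approximation on an even grid.
    points = (lo + (hi - lo) * i / samples for i in range(samples + 1))
    return max(abs(f(x) - network(x)) for x in points)

coarse = build_network(math.sin, 0.0, 2 * math.pi, hidden_units=20)
fine = build_network(math.sin, 0.0, 2 * math.pi, hidden_units=200)
coarse_error = max_error(math.sin, coarse, 0.0, 2 * math.pi)
fine_error = max_error(math.sin, fine, 0.0, 2 * math.pi)
print(f"20 hidden nodes:  worst error {coarse_error:.4f}")
print(f"200 hidden nodes: worst error {fine_error:.4f}")
```

More hidden nodes means more clay, and the approximation tightens. Real networks find their weights through training rather than by construction like this.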

So what makes it "large"?

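A small neural network might have thousands of weights. A large language model has billions. The biggest are thought to run to hundreds of billions or more: GPT-4 reportedly has over a trillion, though neither OpenAI nor Anthropic has published official counts for their current models.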

Why so many? Because language is hard.

Recognizing a cat in a photo is one function. Predicting the next word in a sentence while understanding context, tone, ambiguity, sarcasm, metaphor, technical jargon, multiple languages, and the difference between "I saw her duck" (the animal) and "I saw her duck" (she ducked) — that's not one function. That's millions of overlapping functions, all tangled together, and the network needs enough weights to capture all of them at once.

Number of weights (parameters), each one a knob the network can turn:

10,000: spam filter (simple classifier)
25 million: image classifier ("is this a cat?")
1.5 billion: early LLM (GPT-2, 2019)
Hundreds of billions: modern LLM (Claude, GPT-4 scale)

The "large" in Large Language Model means: enough mathematical clay to sculpt the shape of human language. All of it. At once.
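Where do counts like these come from? For a plain fully connected network you can work it out by hand: every node in one layer has a weight to every node in the next. The sketch below does that arithmetic. It assumes the simplest possible wiring plus one extra "bias" number per node, neither of which the post specifies, and the function name is made up. Real LLMs are wired more elaborately, but the lesson carries over: wider and deeper layers multiply the count quickly.

```python
def count_weights(layer_sizes, include_biases=True):
    """Count the adjustable numbers in a fully connected network.

    layer_sizes lists the number of nodes per layer, input first.
    """
    total = 0
    for nodes_in, nodes_out in zip(layer_sizes, layer_sizes[1:]):
        total += nodes_in * nodes_out  # one weight per connection
        if include_biases:
            total += nodes_out         # assumed: one bias per receiving node
    return total

# 3 inputs, 4 hidden nodes, 2 outputs: (3*4 + 4) + (4*2 + 2)
print(count_weights([3, 4, 2]))
# Three layers of 1,000 nodes each is already in the millions.
print(count_weights([1000, 1000, 1000]))
```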

What makes it a "language model"?

This is the part that surprises people: an LLM does exactly one thing. It predicts the next word.

That's it. You give it a sequence of words, and it tells you which word is most likely to come next. Then that word gets added to the sequence, and it predicts the next next word. And so on. (Strictly speaking, the units are tokens, which are words or pieces of words, but "word" is close enough here.) Every response you've ever gotten from ChatGPT or Claude was generated one word at a time, each word picked from the continuations the model calculated were most probable given everything that came before.

[Figure: one word at a time; each prediction feeds the next. Step 1: "The capital of France is" → "Paris" (Paris 90%, Lyon 6%, ... 4%). Step 2: "The capital of France is Paris" → ".". Step 3: "The capital of France is Paris." → [done].]

The model doesn't "know" the answer. It calculates probabilities. The fact that this produces correct, coherent text is the unreasonable part.
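The loop itself is short enough to write down. In this sketch a hand-written lookup table stands in for the model, which is the big assumption: a real LLM computes these probabilities from its weights rather than looking them up. The Paris and Lyon numbers come from the figure; the probabilities for the later steps, the "<other>" placeholder, and the function names are invented for illustration.

```python
# Hypothetical stand-in for a trained model: text so far -> next-word odds.
TOY_MODEL = {
    "The capital of France is": {"Paris": 0.90, "Lyon": 0.06, "<other>": 0.04},
    "The capital of France is Paris": {".": 1.0},        # made-up probability
    "The capital of France is Paris.": {"[done]": 1.0},  # made-up probability
}

def next_word(text):
    # Pick the single most probable continuation.
    probabilities = TOY_MODEL[text]
    return max(probabilities, key=probabilities.get)

def generate(prompt, max_words=10):
    text = prompt
    for _ in range(max_words):
        word = next_word(text)
        if word == "[done]":
            break
        # Punctuation attaches directly; other words get a leading space.
        text += word if word == "." else " " + word
    return text

print(generate("The capital of France is"))
```

This version always takes the top choice. Chat tools typically add some randomness when picking, which is why the same question can get differently worded answers.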

"But wait," you're thinking. "If it's just predicting the next word, how does it write poetry? How does it debug code? How does it explain quantum physics?"

Because predicting the next word in human language — really predicting it, across every domain, every style, every level of complexity — turns out to require something that looks a lot like understanding. Not is understanding. Looks like understanding. The debate about whether it's "really" understanding is interesting but doesn't change what it can do for you on a Tuesday afternoon.

How it learned

The model learned by reading. A lot.

During training, the model was shown enormous amounts of text — books, websites, papers, code, conversations, encyclopedias — and for each passage, it tried to predict the next word. When it guessed wrong, its weights were adjusted slightly to make it guess better next time. This happened billions of times, across trillions of words.

Nobody told the model what a verb is. Nobody taught it grammar rules. Nobody explained that code has syntax or that French and English are different languages. It figured all of that out on its own, because those patterns help predict the next word, and predicting the next word is all it was trying to do.

[Figure: how the model learned, billions of times around this loop. Read a passage of text → predict the next word → compare to the actual word (right or wrong?) → adjust the weights just a tiny bit → repeat billions of times.]

Nobody teaches it grammar. It figures it out.
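Here is that loop at toy scale, as a rough sketch rather than how any production model is trained. The "model" is just a table of scores, one per pair of words, and it only ever looks at the single previous word. The corpus, the learning rate, and all the names are assumptions for the demo. The read, predict, compare, adjust cycle is the part that matches the description above.

```python
import math

corpus = "the cat sat on the mat . the cat sat on the sofa .".split()
vocab = sorted(set(corpus))
index = {word: i for i, word in enumerate(vocab)}

# weights[i][j]: score for word j following word i. Everything starts at zero.
weights = [[0.0] * len(vocab) for _ in vocab]

def softmax(scores):
    # Turn raw scores into probabilities that sum to 1.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def train_one_pass(learning_rate=0.1):
    total_loss = 0.0
    for previous, actual in zip(corpus, corpus[1:]):      # read
        row = weights[index[previous]]
        probabilities = softmax(row)                      # predict
        target = index[actual]
        total_loss += -math.log(probabilities[target])    # compare
        for j in range(len(vocab)):                       # adjust, a tiny bit
            error = probabilities[j] - (1.0 if j == target else 0.0)
            row[j] -= learning_rate * error
    return total_loss / (len(corpus) - 1)

losses = [train_one_pass() for _ in range(200)]           # repeat

def predict_next(word):
    probabilities = softmax(weights[index[word]])
    return vocab[probabilities.index(max(probabilities))]

print("after 'cat':", predict_next("cat"))
print("after 'on':", predict_next("on"))
print(f"average loss: first pass {losses[0]:.3f}, last pass {losses[-1]:.3f}")
```

Nobody told this table that "sat" tends to follow "cat". The scores drifted there because that made the guesses less wrong.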

Training is over, and the weights are frozen. The model you're talking to right now isn't learning from your conversation. It's not updating its weights. It's applying patterns it already learned, to new input it's never seen before. Like a musician who practiced ten thousand songs and can now improvise over a chord progression they've never heard: the practice is over, the performance is live.

OK. So what's "AI"?

AI — Artificial Intelligence — is the big umbrella. It's the whole field. An LLM is a type of AI, but AI includes a lot of things that aren't LLMs:

[Figure: AI, the whole field. Shown within it: spam filter (classification), chess engine (search + evaluation), face unlock (image recognition), Netflix recs (pattern matching), LLM (predicts the next word; ChatGPT, Claude, Gemini), and agent (LLM + tools).]

All LLMs are AI. Not all AI is an LLM. Agents are LLMs that can act.

When someone says "AI will take your job" or "AI is dangerous" or "AI is overhyped," the first useful question is: which AI? The spam filter? The chess engine? The language model? They're as different from each other as a bicycle, a submarine, and a helicopter are all "vehicles."

Then what's an agent?

This is where it gets interesting.

An LLM, by itself, is a brain in a jar. A very talented brain. It can think (predict the next word) — but it can't do anything. It can't read your email. It can't check a database. It can't open a file, run a search, or push a button. It's pure language in, language out.

An agent is an LLM with hands.

Specifically, an agent is an LLM that has been given access to tools — functions it can call to interact with the outside world — plus the ability to decide when to use them. The LLM does the thinking. The tools do the doing. The agent is the combination.

LLM alone:

"What's the weather in Pittsburgh?"
→ "I don't have access to real-time weather data. As of my last training data..."

Agent (LLM + weather tool):

"What's the weather in Pittsburgh?"
[calls weather API]
→ "It's 52°F and cloudy in Pittsburgh right now."

[Figure: the agent loop: think, act, observe, repeat. The LLM thinks and decides. 1. Think: "I need weather data." 2. Act: call the weather tool. 3. Observe: read the result. 4. Respond: "52°F and cloudy," or loop again. Tools the agent can reach for: file, web, code, mail, db, jira.]

The LLM decided it needed the weather tool. It called the tool. It read the result. It incorporated the result into its response. That loop — think, act, observe, think again — is what makes it an agent.
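Stripped to its skeleton, the loop might look something like the sketch below. Everything in it is a stand-in: `fake_llm` is a few if-statements pretending to be the model's decision step, `fake_weather_tool` returns canned data instead of calling a real weather service, and none of the names correspond to an actual agent framework. What it does show accurately is the shape of the think, act, observe cycle.

```python
def fake_weather_tool(city):
    # Hypothetical tool: canned data in place of a real weather API call.
    return {"city": city, "temp_f": 52, "conditions": "cloudy"}

TOOLS = {"weather": fake_weather_tool}

def fake_llm(question, observations):
    # Stand-in for the model: decide whether to call a tool or answer.
    if observations:
        latest = observations[-1]
        text = (f"It's {latest['temp_f']}°F and {latest['conditions']} "
                f"in {latest['city']} right now.")
        return {"action": "respond", "text": text}
    if "weather" in question.lower():
        return {"action": "call_tool", "tool": "weather", "argument": "Pittsburgh"}
    return {"action": "respond", "text": "I don't have a tool for that."}

def run_agent(question, max_steps=5):
    observations = []
    for _ in range(max_steps):
        decision = fake_llm(question, observations)       # 1. think
        if decision["action"] == "respond":               # 4. respond
            return decision["text"]
        tool = TOOLS[decision["tool"]]                    # 2. act
        observations.append(tool(decision["argument"]))   # 3. observe
    return "Stopped: too many steps."

print(run_agent("What's the weather in Pittsburgh?"))
```

The step limit is a design choice worth keeping in any real version: a loop that can act should also have a way to stop.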

Here are some tools an agent might have: files, the web, code, mail, a database, Jira.

Claude Code — the tool used to build this site — is an agent. It's Claude (the LLM) with access to your terminal (a text-based interface for controlling your computer), your files, and any other tools you connect to it. The LLM decides what to read, what to edit, what to run. You approve or reject. That's the trust model.
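The approve-or-reject idea can be sketched in a few lines. To be clear, this is not how Claude Code is implemented; it is a hypothetical illustration of the pattern, with invented tool names and an invented `gated_call` helper. The agent proposes a tool call, and nothing runs unless an approval function says yes.

```python
deleted_files = []  # lets us check that a rejected action never ran

def read_file(path):
    # Hypothetical harmless tool.
    return f"(contents of {path})"

def delete_file(path):
    # Hypothetical risky tool.
    deleted_files.append(path)
    return f"deleted {path}"

def gated_call(tool_name, tool, argument, approve):
    # Run the tool only if the human (or policy) approves this specific call.
    if not approve(tool_name, argument):
        return {"status": "rejected", "result": None}
    return {"status": "approved", "result": tool(argument)}

def approve_reads_only(tool_name, argument):
    # Example policy: allow reading, reject everything else.
    return tool_name == "read_file"

print(gated_call("read_file", read_file, "notes.txt", approve_reads_only))
print(gated_call("delete_file", delete_file, "notes.txt", approve_reads_only))
```

In an interactive tool the approval function would be a person clicking yes or no. Here it is a fixed rule so the example runs on its own.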

Putting it together

[Figure: the stack, bottom to top. Neural network (clay): math, layers of multiplication, the foundation. Large language model (sculptor): predicts the next word; language in, language out. Agent (sculptor + workshop): LLM + tools; can act in the world.]

Each layer builds on the one below it. More capability, more responsibility.

When you talk to Claude in a chat window, you're mostly talking to the LLM. When you use Claude Code or Claude Desktop with MCP (a protocol that connects AI tools to external systems), you're working with an agent.

Why this matters

Because the words shape the fear.

"AI is going to replace everyone" is a meaningless sentence. Which AI? The spam filter? The LLM? The agent? They have different capabilities, different limits, and different failure modes.

An LLM can't do anything you don't ask it to do. It doesn't have goals. It doesn't have initiative. It doesn't have hands. It predicts the next word. That's powerful — unreasonably powerful — but it's not autonomous.

An agent can do things, but only the things you've given it tools for and, in a setup like Claude Code, only when you approve. It's powerful and active, which is why the safety rules matter more for agents than for plain LLMs.

Understanding the stack — neural network at the bottom, LLM in the middle, agent at the top — means you know what you're working with. You know where the power comes from, where the limits are, and why the thing that writes your emails shouldn't have the same permissions as the thing that deploys your code.

It's not magic. It's math. Very good math, with a lot of weights, that learned to predict the next word so well that it accidentally learned to think.

Or something that looks like thinking. The jury's still out. But it writes good HTML.

Disclosure: This page was generated by Claude (Anthropic) under Bill's direction. The irony of an LLM explaining what an LLM is — and an agent explaining what an agent is — is noted and accepted. Bill reviewed every word. The clay metaphor is Claude's. The "brain in a jar with hands" is Bill's.