What is an LLM?

The machine, the field, and the thing with tools.

Three letters, a lot of confusion

LLM stands for Large Language Model. It's the engine inside ChatGPT, Claude, Gemini, and every other AI chat tool you've heard about in the last two years. But people use "LLM" and "AI" and "agent" as if they mean the same thing. They don't. And the differences matter — not for trivia, but for understanding what you're actually working with.

Let's start at the bottom and build up.

A neural network, in sixty seconds

Forget everything you've seen in movies. An AI is not a brain in a jar. It's math.

Specifically, it's a structure called a neural network — a system of numbers organized in layers, where each layer takes input from the previous one, multiplies it by a set of weights (numbers that control how much each input matters), and passes the result forward. That's it. Numbers in, math in the middle, numbers out.

The simplest possible neural network:

Input: some numbers (say, the pixels of an image, or the words of a sentence converted to numbers)
Middle layers: each number gets multiplied by a weight, the results get added up, and a simple rule decides whether to pass the signal forward or dampen it
Output: a number (or set of numbers) that represents the answer — "this is a cat," "the next word is probably 'the'"

That's a neural network. Layers of multiplication and addition. No consciousness. No understanding. No feelings. Just arithmetic that happens to be arranged in a particular pattern.

Here's the strange part.

The unreasonable thing

In 1989, a mathematician named George Cybenko proved something that still feels like it shouldn't be true: a neural network with even one middle layer can approximate any continuous function, given enough weights.

A function, in this context, just means a relationship between inputs and outputs. "Given this photo, is it a cat?" is a function. "Given these words, what word comes next?" is a function. "Given this chess position, what's the best move?" is a function.

Cybenko proved that this stack of multiplications and additions — this comically simple structure — can learn to mimic any of those relationships. Not just some. Any.

Think of it like clay. Clay is just dirt and water. But with enough clay and enough shaping, you can sculpt anything — a face, a hand, a building, a landscape. The material is simple. The range of what it can represent is not.

Neural networks are mathematical clay. The material is multiplication. The range is everything.

This is called the universal approximation theorem. It doesn't say the network will learn any function — just that it can, if it has enough weights and enough examples to learn from. The gap between "can" and "will" is where all the engineering lives.

So what makes it "large"?

A small neural network might have thousands of weights. A large language model has billions. Claude has hundreds of billions. GPT-4 reportedly has over a trillion.

Why so many? Because language is hard.

Recognizing a cat in a photo is one function. Predicting the next word in a sentence while understanding context, tone, ambiguity, sarcasm, metaphor, technical jargon, multiple languages, and the difference between "I saw her duck" (the animal) and "I saw her duck" (she ducked) — that's not one function. That's millions of overlapping functions, all tangled together, and the network needs enough weights to capture all of them at once.

The "large" in Large Language Model means: enough mathematical clay to sculpt the shape of human language. All of it. At once.

What makes it a "language model"?

This is the part that surprises people: an LLM does exactly one thing. It predicts the next word.

That's it. You give it a sequence of words, and it tells you which word is most likely to come next. Then that word gets added to the sequence, and it predicts the next next word. And so on. Every response you've ever gotten from ChatGPT or Claude was generated one word at a time, each word chosen because the model calculated it was the most probable continuation of everything that came before.

"But wait," you're thinking. "If it's just predicting the next word, how does it write poetry? How does it debug code? How does it explain quantum physics?"

Because predicting the next word in human language — really predicting it, across every domain, every style, every level of complexity — turns out to require something that looks a lot like understanding. Not is understanding. Looks like understanding. The debate about whether it's "really" understanding is interesting but doesn't change what it can do for you on a Tuesday afternoon.

How it learned

The model learned by reading. A lot.

During training, the model was shown enormous amounts of text — books, websites, papers, code, conversations, encyclopedias — and for each passage, it tried to predict the next word. When it guessed wrong, its weights were adjusted slightly to make it guess better next time. This happened billions of times, across trillions of words.

Nobody told the model what a verb is. Nobody taught it grammar rules. Nobody explained that code has syntax or that French and English are different languages. It figured all of that out on its own, because those patterns help predict the next word, and predicting the next word is all it was trying to do.

The training data is frozen. The model you're talking to right now isn't learning from your conversation. It's not updating its weights. It's applying patterns it already learned, to new input it's never seen before. Like a musician who practiced ten thousand songs and can now improvise over a chord progression they've never heard — the practice is over, the performance is live.

OK. So what's "AI"?

AI — Artificial Intelligence — is the big umbrella. It's the whole field. An LLM is a type of AI, but AI includes a lot of things that aren't LLMs:

The spam filter in your email — AI, but not an LLM. It classifies messages as spam or not-spam using a much simpler model.
The recommendation engine on Netflix — AI, but not an LLM. It matches patterns in what you've watched to predict what you'll like.
A chess engine — AI, but not an LLM. It searches through possible moves and evaluates positions.
Your phone's face recognition — AI, but not an LLM. It uses a neural network trained on images, not language.
Claude, ChatGPT, Gemini — AI, and specifically LLMs (with extra things bolted on, which we'll get to).

When someone says "AI will take your job" or "AI is dangerous" or "AI is overhyped," the first useful question is: which AI? The spam filter? The chess engine? The language model? They're as different from each other as a bicycle, a submarine, and a helicopter are all "vehicles."

Then what's an agent?

This is where it gets interesting.

An LLM, by itself, is a brain in a jar. A very talented brain. It can think (predict the next word) — but it can't do anything. It can't read your email. It can't check a database. It can't open a file, run a search, or push a button. It's pure language in, language out.

An agent is an LLM with hands.

Specifically, an agent is an LLM that has been given access to tools — functions it can call to interact with the outside world — plus the ability to decide when to use them. The LLM does the thinking. The tools do the doing. The agent is the combination.

LLM alone:

"What's the weather in Pittsburgh?"
→ "I don't have access to real-time weather data. As of my last training data..."

Agent (LLM + weather tool):

"What's the weather in Pittsburgh?"
→ [calls weather API]
→ "It's 52°F and cloudy in Pittsburgh right now."

The LLM decided it needed the weather tool. It called the tool. It read the result. It incorporated the result into its response. That loop — think, act, observe, think again — is what makes it an agent.

Here are some tools an agent might have:

File access — read and write files on your computer
Web search — look things up in real time
Code execution — write a script and run it
Email — read your inbox, draft messages
Database queries — ask a database a question and get data back
Task management — read and update tickets in Jira (a project tracking system) or similar tools

Claude Code — the tool used to build this site — is an agent. It's Claude (the LLM) with access to your terminal (a text-based interface for controlling your computer), your files, and any other tools you connect to it. The LLM decides what to read, what to edit, what to run. You approve or reject. That's the trust model.

Putting it together

When you talk to Claude in a chat window, you're mostly talking to the LLM. When you use Claude Code or Claude Desktop with MCP (a protocol that connects AI tools to external systems), you're working with an agent.

Why this matters

Because the words shape the fear.

"AI is going to replace everyone" is a meaningless sentence. Which AI? The spam filter? The LLM? The agent? They have different capabilities, different limits, and different failure modes.

An LLM can't do anything you don't ask it to do. It doesn't have goals. It doesn't have initiative. It doesn't have hands. It predicts the next word. That's powerful — unreasonably powerful — but it's not autonomous.

An agent can do things, but only the things you've given it tools for, and only when you approve. It's powerful and active, which is why the safety rules matter more for agents than for plain LLMs.

Understanding the stack — neural network at the bottom, LLM in the middle, agent at the top — means you know what you're working with. You know where the power comes from, where the limits are, and why the thing that writes your emails shouldn't have the same permissions as the thing that deploys your code.

It's not magic. It's math. Very good math, with a lot of weights, that learned to predict the next word so well that it accidentally learned to think.

Or something that looks like thinking. The jury's still out. But it writes good HTML.

← All posts

Disclosure: This page was generated by Claude (Anthropic) under Bill's direction. The irony of an LLM explaining what an LLM is — and an agent explaining what an agent is — is noted and accepted. Bill reviewed every word. The clay metaphor is Claude's. The "brain in a jar with hands" is Bill's.