How LLMs Actually Work

If you use ChatGPT, Claude, or Gemini and want to actually understand what is happening under the hood, this is for you.

No math, no jargon you have to look up. Just a clear mental model you can carry into everything else you do with AI.

1. What an LLM actually is

LLM stands for Large Language Model. Strip away the branding and it is one thing: a very large program that predicts the next piece of text, given the text so far.

That is it. It is not a database it looks things up in. It is not a person reasoning behind a screen. It is a system that has read a staggering amount of text and learned the patterns in how language tends to continue.

Everything impressive it does (answering questions, writing code, summarizing a document) comes out of that single ability, applied at enormous scale.

2. It is a next-word predictor

Give the model "The capital of France is" and it will predict "Paris" as the most likely next word. Give it "Once upon a" and it predicts "time."

It does this one piece at a time. It predicts the next piece, adds it to the text, then predicts the piece after that, and so on. A whole essay is just this loop running hundreds of times.
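
Here is a minimal sketch of that loop. The "model" is a hypothetical lookup table (`TOY_NEXT_WORD`) invented for this example. A real LLM computes its predictions with a neural network and works on tokens rather than whole words, but the predict, append, repeat shape is the same.

```python
# Hypothetical toy "model": maps the last word to the likeliest next word.
# This table is made up for illustration; a real LLM learns its predictions.
TOY_NEXT_WORD = {
    "once": "upon",
    "upon": "a",
    "a": "time",
    "time": "<end>",
}

def predict_next(words):
    """Return the toy model's guess for the next word, given the text so far."""
    if not words:
        return "<end>"
    return TOY_NEXT_WORD.get(words[-1], "<end>")

def generate(prompt, max_new_words=10):
    """Predict one piece, append it to the text, and repeat."""
    words = prompt.lower().split()
    for _ in range(max_new_words):
        next_word = predict_next(words)
        if next_word == "<end>":
            break  # the model signals that the text is finished
        words.append(next_word)
    return " ".join(words)

print(generate("Once upon"))  # prints "once upon a time"
```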

The surprising part is that predicting the next word well enough, at a big enough scale, forces the model to capture a great deal about grammar, facts, tone, and logic. To finish your sentences well, it had to learn how the world tends to be described.

3. Tokens, how it reads

The model does not read letters or whole words. It reads tokens: chunks of text, often a word or a piece of a word. "Understanding" might be split into "under", "stand", and "ing."

Two practical things follow from this:

Cost and limits are counted in tokens, not words. A rough rule for English text: 1 token is about 4 characters, or about 0.75 words. Other languages and code often use more tokens for the same amount of text.
Spelling-level tasks are hard for it. Because it sees tokens, not letters, asking "how many r's are in strawberry" can trip it up. It is not reading the way you are.
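
Both points can be sketched in a few lines. The estimate below uses only the 4-characters rule of thumb, not a real tokenizer, and the three-way split of "strawberry" is a hypothetical example; actual tokenizers may split the word differently.

```python
def estimate_tokens(text):
    """Rough token estimate from the ~4 characters per token rule of thumb."""
    if not text:
        return 0
    return max(1, round(len(text) / 4))

# Hypothetical split, for illustration only; real tokenizers may differ.
tokens = ["str", "aw", "berry"]

# The model works with chunks like these, not with individual letters.
# Ordinary code works on characters, so counting letters is trivial for it:
letter_count = "".join(tokens).count("r")

print(estimate_tokens("The capital of France is Paris."))  # prints 8
print(letter_count)  # prints 3
```

For exact counts, use the tokenizer that belongs to the specific model you are calling; the estimate is only good for ballpark budgeting.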

4. How it learned

Training happens in two big stages.

Pretraining. The model reads a huge slice of the internet, books, and code, and plays one game over and over: cover the next token, guess it, check the answer, adjust. Repeat billions of times. This is where it soaks up grammar, facts, and reasoning patterns. It is expensive and slow, and it produces a model that can continue text but is not yet helpful or safe.
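
A toy version of the pretraining game, as a sketch. It swaps the real technique for something much simpler: instead of a neural network whose weights are nudged after every wrong guess, it just counts which word followed which. The function names and the tiny corpus are made up for this example.

```python
from collections import Counter, defaultdict

def train(text):
    """Slide over the text: hide the next word, then record what it actually was."""
    counts = defaultdict(Counter)
    words = text.lower().split()
    for current, actual_next in zip(words, words[1:]):
        # "Adjust": strengthen the link between this word and the one that followed.
        counts[current][actual_next] += 1
    return counts

def guess_next(counts, word):
    """Guess the continuation seen most often after `word` during training."""
    followers = counts.get(word)
    return followers.most_common(1)[0][0] if followers else None

corpus = "the cat sat on the mat . the cat ate . the cat slept ."
model = train(corpus)
print(guess_next(model, "the"))  # prints "cat"
```

Even this crude counter picks up a pattern ("the" is usually followed by "cat" in its tiny corpus). Real pretraining plays the same game with a far more flexible model and vastly more text.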

Fine-tuning and alignment. Next, humans show it examples of good answers (supervised fine-tuning) and rate its responses so it can learn from those ratings (a process often called RLHF, reinforcement learning from human feedback). This is what turns a raw text-predictor into an assistant that follows instructions, stays helpful, and refuses harmful requests.

One key consequence: the model has a training cutoff. It only knows what existed when its training data was collected. Anything after that, it has not seen, unless a tool fetches it at the time you ask.

5. Attention, why it got good

The breakthrough that made modern LLMs possible is the transformer, and its core trick is called attention.

Attention lets the model, when predicting the next token, weigh which earlier words matter most right now. In "The trophy did not fit in the suitcase because it was too big," attention is how the model figures out that "it" means the trophy, not the suitcase.

You do not need the math. The takeaway: the model is constantly deciding which parts of your input to focus on. Clear, well-structured input gives it better things to attend to.
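
If you would rather see the idea in code than in equations, here is a minimal sketch of the weighing step for a single word. The two-number vectors are invented so that "it" lines up with "trophy"; a real model learns its own vectors, with far more dimensions, during training.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(query, keys):
    """Score each earlier word against the current one, then normalize."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    return softmax(scores)

# Made-up vectors, chosen by hand for this illustration.
words = ["trophy", "suitcase", "big"]
keys = [[2.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
query_for_it = [2.0, 0.0]  # hypothetical vector for the word "it"

weights = attention_weights(query_for_it, keys)
most_attended = words[weights.index(max(weights))]
print(most_attended)  # prints "trophy"
```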

6. The context window

The context window is everything the model can "see" at once: your prompt, the conversation so far, and any documents you paste in. It is measured in tokens.

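Think of it as short-term memory, not long-term memory. On its own, the model does not remember you between separate chats; anything that looks like memory is the app feeding saved notes back into the window. Within one chat, once the conversation grows past the window, the oldest parts typically fall out of view.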

Practical effects:

Paste the relevant material into the chat; do not assume it "knows" your document unless it is in the window.
In very long chats, it can lose track of something you said far earlier. Restate key details when it matters.
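
One way an app might handle an overfull window is sketched below: keep the newest messages that fit the budget and drop the rest. This is an assumption about a common approach, not a description of how any particular product works, and `rough_token_count` and `fit_to_window` are hypothetical helpers.

```python
def rough_token_count(text):
    """Rough size estimate using the ~4 characters per token rule of thumb."""
    return max(1, len(text) // 4)

def fit_to_window(messages, max_tokens):
    """Keep the newest messages that fit the budget; older ones fall out of view."""
    kept = []
    used = 0
    for message in reversed(messages):  # walk from newest to oldest
        cost = rough_token_count(message)
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    return list(reversed(kept))  # restore chronological order

chat = [
    "My name is Sam and I am allergic to peanuts.",
    "Suggest a quick dinner.",
    "Make it vegetarian.",
]
visible = fit_to_window(chat, max_tokens=12)
print(visible)  # the allergy message no longer fits
```

With this small budget, the first message (the one about the allergy) is the part that drops out, which is why restating key details late in a long chat helps.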

7. Why it makes things up

When a model states something false with total confidence, that is a hallucination. It is not lying, and it is not broken. It is doing exactly what it was built to do: produce the most plausible-sounding continuation.

If the true answer is not strongly represented in what it learned, it will still generate a fluent, confident guess, because fluent and confident is what next-token prediction rewards. It has no reliable built-in sense of "I do not actually know this."
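
A small sketch of why a guess still comes out. The probabilities are made up for illustration and do not come from any real model; the point is only that picking the likeliest continuation always returns something, whether the lead is overwhelming or barely there.

```python
def pick_answer(probabilities):
    """Return the single most likely continuation, however weak its lead is."""
    return max(probabilities, key=probabilities.get)

# Invented numbers for illustration only.
well_known = {"Paris": 0.97, "Lyon": 0.02, "Nice": 0.01}
obscure = {"1947": 0.26, "1952": 0.25, "1938": 0.25, "1961": 0.24}

print(pick_answer(well_known))  # prints "Paris"
print(pick_answer(obscure))     # prints "1947"
```

Both answers arrive as equally fluent text. Nothing in the output itself tells you the second one was close to a coin flip.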

This is the single most important thing to internalize: the model optimizes for plausible, not for true. Treat its output as a sharp first draft, and verify anything that matters, especially names, numbers, quotes, and citations.

8. What it is good at, and what it isn't

Good at: drafting and rewriting, summarizing, explaining concepts, translating, brainstorming, writing and debugging code, transforming text from one format to another, and getting you unstuck on a blank page.

Weak at: exact arithmetic, counting characters, anything that needs current or private information it was not given, and any claim where being confidently wrong is costly. It also has no real memory and no genuine understanding of consequences.

The fix for most of the weak spots is tools: connect it to a calculator, a search engine, or your own data (an approach known as retrieval-augmented generation, or RAG), and you cover the gaps while keeping the strengths.
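
As a sketch of the calculator case: the arithmetic is done by ordinary code, and the model only has to produce the expression. `calculate` is a hypothetical helper written for this example; how a given product actually passes tool requests between the model and the code is not shown here.

```python
import ast
import operator

# Only plain arithmetic is allowed; anything else is rejected.
OPERATORS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

def calculate(expression):
    """Evaluate a basic arithmetic expression exactly, without calling eval()."""
    def evaluate(node):
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPERATORS:
            return OPERATORS[type(node.op)](evaluate(node.left), evaluate(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -evaluate(node.operand)
        raise ValueError("unsupported expression")
    return evaluate(ast.parse(expression, mode="eval").body)

print(calculate("123456789 * 987654321"))  # prints 121932631112635269
```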

9. How to get better answers

You do not need tricks. You need to give a next-token predictor good things to predict from.

Give context. Paste the document, the example, the constraints. The model only knows what is in the window.
Be specific about the output. Say the format, length, and audience you want.
Show an example. One good example of what "right" looks like beats a paragraph of description.
Ask it to work step by step for anything with reasoning. Thinking out loud often improves its answers, because each written step becomes context for the next prediction.
Verify the important bits. Especially facts, figures, and citations.
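
The first four habits can be folded into a small prompt template. This is one possible layout, not a required format; `build_prompt` and its section labels are made up for this example, and the placeholder context is where your own material would go.

```python
def build_prompt(task, context, output_format, example=None, step_by_step=False):
    """Assemble a prompt that states the context, the desired output, and an example."""
    parts = [
        f"Context:\n{context}",
        f"Task:\n{task}",
        f"Output format:\n{output_format}",
    ]
    if example:
        parts.append(f"Example of a good answer:\n{example}")
    if step_by_step:
        parts.append("Work through this step by step before giving the final answer.")
    return "\n\n".join(parts)

prompt = build_prompt(
    task="Summarize the meeting notes for the team.",
    context="<paste the meeting notes here>",
    output_format="Three bullet points, plain language, under 60 words.",
    example="- Launch moved to May because testing ran long.",
)
print(prompt)
```

The fifth habit, verifying the important bits, stays with you: no template can do that part.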

That is the whole game. Once you see the model as a powerful, fallible pattern-machine that you steer with context, everything else about working with AI gets simpler.
