The Big Idea
What this project is, what an AI agent really is, and why most people get it wrong
You've used AI coding tools. But do you know how they work?
Tools like Claude Code, Cursor, and Copilot feel like magic. You type a request, and code appears. But under the hood, there's a surprisingly simple pattern making it all work.
This project — Learn Claude Code — reverse-engineers exactly how Claude Code operates, broken into 12 progressive sessions. By the end of this course, you'll understand every mechanism that makes an AI coding agent tick.
The claim that changes everything
An AI agent isn't a framework, a drag-and-drop workflow, or a prompt chain. The agent is the neural network — the trained model itself. Everything else is just the vehicle it drives.
This distinction matters because it tells you where to focus your energy. You don't build intelligence — Anthropic, OpenAI, and Google already did that. You build the harness — the world the intelligence operates in.
The driver and the vehicle
Think of a Formula 1 team. The driver (Lewis Hamilton, Max Verstappen) is the talent — years of training, instinct, split-second decisions. The car is everything the driver needs to express that talent: engine, tires, steering, telemetry, pit radio.
The driver doesn't design the car. The car designers don't race. But without a great car, even the best driver can't win.
The Agent (Driver)
Claude, GPT, Gemini — the trained LLM. It decides, reasons, and chooses actions. You don't build this.
The Harness (Vehicle)
Tools, knowledge, context management, permissions. Everything the model needs to act in a specific domain. This is what you build.
What's inside a harness?
Harness = Tools + Knowledge + Observation
+ Action Interfaces + Permissions
Tools: file read/write, shell, browser
Knowledge: docs, API specs, style guides
Observation: git diff, error logs, state
Action: CLI commands, API calls
Permissions: sandboxing, approval flows
A harness is everything the AI needs to work...
...split into five categories.
Tools = the hands. What can it physically do?
Knowledge = the expertise. What does it know?
Observation = the eyes. What can it see?
Action = the output. How does it affect the world?
Permissions = the guardrails. What's it NOT allowed to do?
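The five categories can be sketched as a plain config object. This is an illustration only, not a structure from the Claude Code source; every field name here is invented for the sketch.

```python
from dataclasses import dataclass, field

# Illustrative only: the five harness categories as a config object.
@dataclass
class Harness:
    tools: list[str] = field(default_factory=list)        # the hands
    knowledge: list[str] = field(default_factory=list)    # the expertise
    observation: list[str] = field(default_factory=list)  # the eyes
    actions: list[str] = field(default_factory=list)      # the output
    permissions: list[str] = field(default_factory=list)  # the guardrails

coding_harness = Harness(
    tools=["read_file", "write_file", "bash"],
    knowledge=["style guide", "API specs"],
    observation=["git diff", "error logs"],
    actions=["shell commands", "API calls"],
    permissions=["sandboxed filesystem", "approval before deploy"],
)
```

Swapping the field contents (tractor telemetry instead of git diffs, irrigation APIs instead of shell commands) is all it takes to describe a harness for a completely different domain.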
Claude Code, stripped to its essence
Here's the entire architecture of Claude Code — the AI coding tool used by thousands of engineers. It fits in 7 lines:
Claude Code = one agent loop
+ tools (bash, read, write, edit...)
+ on-demand skill loading
+ context compression
+ subagent spawning
+ task system with dependency graph
+ team coordination
One simple loop keeps the agent running...
Tools let the agent read/write files, run commands...
Skills load specialized knowledge only when needed...
Compression lets the agent forget strategically...
Subagents handle subtasks in isolated contexts...
Tasks persist goals to disk so they survive restarts...
Teams let multiple agents work on the same project.
That's it. Every component is a harness mechanism — part of the vehicle. The agent? It's Claude. A model trained by Anthropic. The harness gives Claude hands, eyes, and a workspace.
12 sessions, one mechanism at a time
This project teaches harness engineering through 12 progressive sessions. Each one adds exactly one mechanism to the codebase, building on the last.
Check your understanding
A new startup says they built an "AI agent" using drag-and-drop workflow blocks wired to ChatGPT. What did they actually build?
You want to build an AI agent that manages a farm. What would change compared to a coding agent?
The Engine
The deceptively simple while loop that powers every AI coding agent
Think of a postal sorting office
Imagine a postal sorting center. A letter arrives. The sorter reads it, decides which department handles it, sends it there, waits for a reply, then checks if there are more letters. If yes, repeat. If no, the shift is done.
That's literally how an AI agent works. The "letter" is a message. The "sorter" is the model. The "departments" are tools. And the whole thing runs in a while loop.
The entire secret in one diagram
The real code — all 15 lines
This is from agents/s01_agent_loop.py — the very first session. The entire agent loop that powers everything:
def agent_loop(messages: list):
    while True:
        response = client.messages.create(
            model=MODEL, system=SYSTEM,
            messages=messages, tools=TOOLS,
            max_tokens=8000,
        )
        messages.append({"role": "assistant",
                         "content": response.content})
        if response.stop_reason != "tool_use":
            return
        results = []
        for block in response.content:
            if block.type == "tool_use":
                output = run_bash(block.input["command"])
                results.append({...})
        messages.append({"role": "user",
                         "content": results})
Define the agent loop. It takes a conversation history.
Keep looping forever (until we explicitly stop).
Send the full conversation to Claude and ask for a response.
Tell it which AI model to use and what personality to have.
Give it the conversation so far AND the list of available tools.
Limit the response length.
(end of the API call)
Save what the AI said into the conversation history.
(so next time around the loop, it remembers what it already said)
KEY LINE: Did the model ask to use a tool? If NOT...
...then we're done. The model has nothing left to do. Exit.
Otherwise, collect the results of each tool call.
Look through everything the model said...
...find each tool request...
...actually run it (in this case, a shell command)...
...and save the result.
Add the tool results back to the conversation and loop again.
(The model will see what happened and decide what to do next.)
The MODEL decides when to call tools and when to stop. The CODE just executes what the model asks for. This is the fundamental difference between an agent and a script — the AI is in charge, not the programmer.
Watch it happen: a real conversation
Here's what happens inside the loop when you ask "list all Python files":
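The exact transcript depends on the model, but the message history grows in a shape like this. The following is a hand-written illustration of the loop's rounds, not captured output:

```python
# Hypothetical message history for "list all Python files".
# Illustrative only -- real content blocks come from the API.
history = [
    {"role": "user", "content": "list all Python files"},
    # Round 1: the model asks for a tool instead of answering directly.
    {"role": "assistant", "content": [
        {"type": "tool_use", "name": "bash",
         "input": {"command": "find . -name '*.py'"}}]},
    # The harness runs the command and feeds the output back.
    {"role": "user", "content": [
        {"type": "tool_result",
         "content": "./agents/s01_agent_loop.py\n./tools.py"}]},
    # Round 2: no tool_use block, so stop_reason != "tool_use" -> loop exits.
    {"role": "assistant", "content": [
        {"type": "text",
         "text": "Found 2 Python files: s01_agent_loop.py and tools.py"}]},
]
```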
Adding tools — the loop never changes
Session 1 has one tool: bash. Session 2 adds three more: read_file, write_file, edit_file. But the loop code is identical. Adding a tool just means adding one handler to a dispatch map:
TOOL_HANDLERS = {
    "bash": lambda **kw: run_bash(kw["command"]),
    "read_file": lambda **kw: run_read(kw["path"]),
    "write_file": lambda **kw: run_write(kw["path"], kw["content"]),
    "edit_file": lambda **kw: run_edit(kw["path"], ...),
}
This is the dispatch map — a directory of tool names → handlers.
When the model says "bash", run the shell command handler.
When it says "read_file", run the file reading handler.
When it says "write_file", create or overwrite a file.
When it says "edit_file", find-and-replace text in a file.
That's the whole map. Need a new tool? Add one line.
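Inside the loop, dispatching is a single dictionary lookup. Here is a runnable sketch with stub handlers standing in for the real run_* functions (the stubs and the dispatch helper name are illustrative, not the course's exact code):

```python
# Stub handlers stand in for the real run_* functions.
def run_bash(command):
    return f"ran: {command}"

def run_read(path):
    return f"contents of {path}"

TOOL_HANDLERS = {
    "bash": lambda **kw: run_bash(kw["command"]),
    "read_file": lambda **kw: run_read(kw["path"]),
}

def dispatch(tool_name, tool_input):
    # One lookup replaces any if/elif chain. Unknown tools fail softly,
    # so the error string can be fed back to the model as a tool_result.
    handler = TOOL_HANDLERS.get(tool_name)
    if handler is None:
        return f"error: unknown tool {tool_name!r}"
    return handler(**tool_input)

print(dispatch("read_file", {"path": "README.md"}))  # contents of README.md
```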
When you tell an AI coding tool "read the README," it's using this exact pattern — looking up "read_file" in a dispatch map. Understanding this means you can predict what tools are available and phrase your requests to match. Instead of "show me what's in README.md," you can say "read file README.md" — directly mapping to how the tool works.
Check your understanding
The agent loop keeps running until... what exactly stops it?
You want to add a "search_web" tool to the agent. What do you change?
Staying on Track
How the agent plans its work, tracks progress, and delegates subtasks
An agent without a plan drifts
Imagine a pilot flying across the Pacific. Without a flight plan, they'd just fly in a vague direction and maybe land somewhere. With a checklist of waypoints, they know exactly where they are, what's next, and how far they've come.
AI agents have the same problem. Give them a big task ("build a website with login") and they can lose track of what they've done, what's left, and whether they're even on the right track. The fix? Give them a todo list they manage themselves.
The agent's flight plan: TodoManager
In session 3 (s03_todo_write.py), the agent gets a todo tool. It creates a plan, marks tasks in-progress, then marks them done. Here's what the agent's internal checklist looks like:
[ ] #1: Set up project structure
[>] #2: Create login form ← doing now
[x] #3: Add CSS styles ← done
[ ] #4: Connect to database
[ ] #5: Add error handling
(1/5 completed)
Not started yet — waiting its turn.
Currently working on this — only ONE can be active at a time.
Finished! Checked off permanently.
Coming up next.
Coming up after that.
A progress tracker so you (and the agent) can see the state at a glance.
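A minimal version of this checklist fits in a small class. This sketch uses invented method names, not the course's exact TodoManager API; what it preserves is the one rule that matters, that only one item can be in progress at a time:

```python
# Illustrative TodoManager sketch (method names are assumptions).
class TodoManager:
    def __init__(self):
        self.items = []  # each item: {"text": str, "status": str}

    def add(self, text):
        self.items.append({"text": text, "status": "pending"})

    def start(self, index):
        # Enforce the "only ONE active at a time" rule.
        for item in self.items:
            if item["status"] == "in_progress":
                item["status"] = "pending"
        self.items[index]["status"] = "in_progress"

    def complete(self, index):
        self.items[index]["status"] = "completed"

    def render(self):
        marks = {"pending": "[ ]", "in_progress": "[>]", "completed": "[x]"}
        done = sum(i["status"] == "completed" for i in self.items)
        lines = [f"{marks[i['status']]} #{n + 1}: {i['text']}"
                 for n, i in enumerate(self.items)]
        return "\n".join(lines) + f"\n({done}/{len(self.items)} completed)"
```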
The nudge: "Hey, update your todos"
Agents can forget to update their plan — they get absorbed in the work. The fix is hilariously simple: if the agent hasn't touched its todo list in 3 tool calls, the harness injects a gentle reminder:
rounds_since_todo = 0 if used_todo else rounds_since_todo + 1
if rounds_since_todo >= 3:
    results.append({
        "type": "text",
        "text": "<reminder>Update your todos.</reminder>"
    })
Count how many rounds since the agent last updated its todos.
If it's been 3 or more rounds without an update...
...sneak a reminder into the tool results...
(disguised as a regular message)
...that says "Hey, update your plan!"
(The agent sees this and goes "oh right, let me check my list.")
Delegation: clean rooms for subtasks
When a task gets complex, the main agent can spawn a subagent — a child agent with a completely fresh, empty memory. The subagent works on one specific subtask, then returns only a summary. Like sending a colleague to investigate something and bring back a one-page report.
The subagent pattern is a form of process isolation. Each child gets a fresh conversation history (messages = []), so the noise from reading 20 files doesn't pollute the parent's thinking. The parent stays focused on the big picture.
The code: fresh context, summary return
def run_subagent(prompt: str) -> str:
    sub_messages = [{"role": "user",
                     "content": prompt}]
    for _ in range(30):
        response = client.messages.create(
            model=MODEL,
            system=SUBAGENT_SYSTEM,
            messages=sub_messages, ...)
        if response.stop_reason != "tool_use":
            break
        # execute tools, append results...
    return "\n".join(b.text for b in response.content
                     if b.type == "text")
Define a function that spawns a subagent with one task.
Start with a fresh, empty conversation.
The only thing in it is the task prompt.
Safety limit: max 30 loop cycles (so it can't run forever).
Run the same agent loop pattern we learned in Module 2.
(Same model, but a different system prompt for subagents.)
(Using the child's fresh message history, not the parent's.)
When the model stops calling tools...
...exit the loop.
(While it's working, execute tools and feed results back.)
Return ONLY the final text summary — everything else is discarded.
Check your understanding
Your agent needs to investigate two unrelated things: how the payment system works AND how the notification system works. What approach keeps things cleanest?
What problem does the "nag reminder" solve?
Knowledge & Memory
How the agent loads expertise on demand and keeps working forever without running out of memory
The library card catalog
Imagine a university library. You don't carry every book in your backpack — you carry a catalog card that lists what books exist. When you need one, you walk to the shelf and grab it.
AI agents work the same way. Stuffing all knowledge into the system prompt wastes precious context window space. Instead, you show the catalog upfront and load the full book only when needed.
Two-layer skill loading
Session 5 (s05_skill_loading.py) implements this library pattern with two layers:
Layer 1: The Catalog
Short descriptions of each skill go in the system prompt. Costs ~100 tokens per skill — almost free.
Layer 2: The Book
When the model calls load_skill("pdf"), the full skill body is returned via tool_result. The model reads it, follows the instructions, then the knowledge stays in context as long as needed.
You are a coding agent.
Skills available:
- pdf: Process PDF files...
- code-review: Review code...
- mcp-builder: Build MCP servers...
The AI's identity and mission.
A short menu of available skills — just names and one-liners.
If someone asks about PDFs, I should load this skill first.
If someone asks for a code review, I have that skill.
If someone wants to build an MCP server, there's a skill for that.
Where skills live: SKILL.md files
Each skill is a folder with a SKILL.md file. The file has YAML frontmatter (the catalog entry) and a body (the full instructions):
---
name: pdf
description: "Process PDF files"
---
# PDF Processing Skill
Step 1: Extract text using PyPDF2...
Step 2: If scanned, use OCR via...
Step 3: Structure the output as...
Start of the metadata section (frontmatter).
Skill name — used to call load_skill("pdf").
Short description — shown in Layer 1 (the catalog).
End of metadata. Everything below is the body.
The full instructions start here — this is Layer 2.
Detailed step-by-step guidance the agent follows.
Edge cases and alternative approaches.
Output formatting rules.
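Splitting a SKILL.md into its catalog entry and its body takes only a few lines. This sketch uses naive "---" splitting instead of a YAML library to stay dependency-free; the real loader may well differ:

```python
# Sketch: split a SKILL.md into frontmatter metadata (Layer 1) and
# body (Layer 2). Naive parsing -- an assumption, not the real loader.
def parse_skill(text):
    _, frontmatter, body = text.split("---", 2)
    meta = {}
    for line in frontmatter.strip().splitlines():
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip().strip('"')
    return meta, body.strip()

skill_md = '''---
name: pdf
description: "Process PDF files"
---
# PDF Processing Skill
Step 1: Extract text...'''

meta, body = parse_skill(skill_md)
# Layer 1: only this one-liner goes in the system prompt.
catalog_line = f"- {meta['name']}: {meta['description']}"
# Layer 2: the full body is returned later via load_skill's tool_result.
```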
The memory problem: context fills up
Every message, every tool call, every file the agent reads — it all stays in the conversation history. After a long session, the context window fills up and the agent can't think anymore.
Session 6 (s06_context_compact.py) solves this with a three-layer compression pipeline — like a note-taking system that keeps essential information and discards the noise:
Before and after: compression in action
# 47,000 tokens of conversation:
User: "Fix the login bug"
AI: calls bash("grep -r login...")
Result: 500 lines of grep output
AI: calls read_file("auth.py")
Result: 200 lines of Python code
AI: calls bash("python -m pytest...")
Result: 300 lines of test output
AI: calls edit_file("auth.py", ...)
Result: "Edited auth.py"
# ... 30 more tool calls ...
~2,000 tokens after compression:
[Conversation compressed. Transcript: .transcripts/transcript_1712345678.jsonl]
Summary:
Fixed login bug in auth.py. The issue was a missing token refresh check on line 42.
Changed the validation logic to check token expiry before auth.
All 12 tests passing. auth.py and middleware.py were modified.
Full transcript saved to disk if we ever need the details.
Context compression is strategic forgetting — the agent deliberately discards details it no longer needs, while keeping key decisions, current state, and what's left to do. This is why AI agents can work on long, complex tasks without losing their mind. The full transcript is always saved to disk, so nothing is truly lost.
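The shape of the operation can be sketched in a few lines. In the real pipeline the summary is produced by the model itself; here it is passed in as a string, and the function name and signature are illustrative:

```python
# Sketch of strategic forgetting: replace everything but the last few
# messages with a pointer to the on-disk transcript plus a summary.
def compact(messages, summary, transcript_path, keep_last=2):
    recent = messages[-keep_last:]
    header = {"role": "user", "content":
              f"[Conversation compressed. Transcript: {transcript_path}]\n"
              f"Summary: {summary}"}
    return [header] + recent

messages = [{"role": "user", "content": f"msg {i}"} for i in range(40)]
compacted = compact(messages, "Fixed login bug in auth.py.",
                    ".transcripts/transcript_123.jsonl")
# 40 messages shrink to 3: one summary header plus the 2 most recent.
```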
Check your understanding
Why does the agent use two-layer skill loading instead of putting all skill instructions in the system prompt?
In micro_compact (Layer 1), why are read_file results preserved while other tool results get compressed?
Persistence
How the agent saves its goals to disk, tracks dependencies, and runs things in the background
The construction site whiteboard
On a construction site, the foreman doesn't keep the project plan in their head. It's on a whiteboard in the office — visible to everyone, surviving shift changes, and showing what blocks what. "Can't pour concrete until the foundation is inspected. Can't do plumbing until concrete is set."
The TodoManager from Module 3 lives in conversation memory — if the context gets compressed, the todos vanish. Session 7 (s07_task_system.py) moves the plan to disk, where it survives anything.
Tasks as files: surviving compression
Each task is a JSON file in a .tasks/ folder. Even if the conversation is completely reset, the tasks are still there:
{
  "id": 2,
  "subject": "Add user authentication",
  "description": "JWT-based login flow",
  "status": "pending",
  "blockedBy": [1],
  "owner": ""
}
Open this task's data file.
Task number 2.
What needs to be done — the goal.
More detail about how to do it.
Current state: not started yet.
KEY: Can't start until Task 1 is finished. This is the dependency.
No one is working on it yet (later used for team assignments).
End of task data.
Dependencies: what blocks what
When a task is completed, it automatically unblocks everything waiting on it. Like dominoes — finishing "set up database" automatically frees "add user auth" to start:
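The domino effect is a small amount of bookkeeping. This sketch works on an in-memory dict for brevity; the real version rewrites the JSON files in .tasks/, and the status names are taken from the task example above:

```python
# Sketch of dependency release: completing a task removes its id from
# every other task's blockedBy list.
def complete_task(tasks, done_id):
    for task in tasks.values():
        if task["id"] == done_id:
            task["status"] = "completed"
        elif done_id in task["blockedBy"]:
            task["blockedBy"].remove(done_id)
            if not task["blockedBy"]:
                task["status"] = "pending"  # fully unblocked, ready to start

tasks = {
    1: {"id": 1, "status": "in_progress", "blockedBy": []},
    2: {"id": 2, "status": "blocked", "blockedBy": [1]},
    3: {"id": 3, "status": "blocked", "blockedBy": [1, 2]},
}
complete_task(tasks, 1)
# Task 2 is now unblocked; task 3 still waits on task 2.
```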
Todos vs Tasks: what changed?
Todos (Session 3)
Live in conversation memory. Vanish on compression. Simple flat list. Good for single-session work.
Tasks (Session 7)
Live on disk as JSON files. Survive compression, restarts, even new conversations. Have dependencies (blockedBy). Can be assigned to team members.
This is one of the most important patterns in software: persisting state to disk. The conversation can be compressed, the agent can restart, but the task board on disk remains intact. Any new conversation can pick up where the last one left off by reading the .tasks/ folder.
Background work: the agent keeps thinking
Some operations take time — running a test suite, building a project, installing packages. Session 8 (s08_background_tasks.py) lets the agent kick off slow commands in a daemon thread and keep working on other things. When the background task finishes, a notification is injected into the conversation.
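The mechanics can be sketched with a daemon thread and a queue that the main loop drains between rounds. A sleep stands in for the slow shell command, and the notification format is invented for the example:

```python
import queue
import threading
import time

# Sketch: slow work runs in a daemon thread; a queue carries the
# "I'm done" notification back to the agent loop.
notifications = queue.Queue()

def run_in_background(task_id, slow_fn):
    def worker():
        result = slow_fn()
        notifications.put(f"<background>{task_id} finished: {result}</background>")
    threading.Thread(target=worker, daemon=True).start()

# A 0.1s sleep stands in for a real test-suite run.
run_in_background("pytest", lambda: (time.sleep(0.1), "12 passed")[-1])

# Meanwhile the agent loop keeps working; on its next round it drains
# the queue and appends any notifications to the tool results.
note = notifications.get(timeout=5)
```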
Check your understanding
The agent has been working for 2 hours and auto_compact triggers, compressing the entire conversation into a summary. What happens to the task plan?
Task 2 has "blockedBy": [1]. What happens when Task 1 is marked completed?
The Team
From solo agent to an entire squad — teammates, mailboxes, protocols, and autonomous work
From one-person crew to a film production
So far, it has been a one-person show: one agent, one task. But real projects need a team — like a film production. You have a director (lead agent), a cinematographer (frontend specialist), a sound engineer (backend specialist), and a props department (testing specialist). Everyone works in parallel, communicates through walkie-talkies, and follows shared protocols.
Sessions 9-12 build this entire coordination system, one layer at a time.
Subagents were disposable. Teammates persist.
Subagent (Session 4)
Spawned → does one task → returns summary → destroyed. Like hiring a temp worker. No memory between tasks.
Teammate (Session 9)
Spawned → works → goes idle → gets new work → works → ... → shuts down when asked. Like a full-time employee. Has a name, a role, and a persistent inbox.
Communication: file-based mailboxes
How do teammates talk? Through JSONL files — each teammate gets their own inbox file. Sending a message is literally appending a line to a file. Reading the inbox is reading the file and clearing it.
{"type":"message","from":"lead","content":"Fix the login bug in auth.py"}
{"type":"message","from":"bob","content":"I updated the DB schema, you may need to adjust the auth queries"}
{"type":"broadcast","from":"lead","content":"Deploy freeze at 5pm today"}
A direct message from the lead agent to Alice.
"Here's your assignment."
A message from another teammate, Bob.
"Heads up — I changed something that
might affect your work."
A broadcast sent to ALL teammates at once.
"Everyone stop pushing code at 5pm."
Files are the simplest possible IPC mechanism. No setup, no server, no dependencies. Append a line = send a message. Read and clear = check inbox. It works across threads, processes, even machines (via shared filesystem). The simplicity is the point.
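Send and check-inbox are each a few lines. This sketch assumes a .mailbox-style directory of one JSONL file per teammate; the directory layout and function names are illustrative:

```python
import json
import tempfile
from pathlib import Path

# Sketch of JSONL mailboxes: send = append one line, check = read and clear.
def send(mailbox_dir, to, msg):
    inbox = Path(mailbox_dir) / f"{to}.jsonl"
    with open(inbox, "a") as f:
        f.write(json.dumps(msg) + "\n")

def read_inbox(mailbox_dir, name):
    inbox = Path(mailbox_dir) / f"{name}.jsonl"
    if not inbox.exists():
        return []
    messages = [json.loads(line) for line in inbox.read_text().splitlines()]
    inbox.unlink()  # clearing the inbox = deleting the file
    return messages

box = tempfile.mkdtemp()
send(box, "alice", {"type": "message", "from": "lead", "content": "Fix the login bug"})
send(box, "alice", {"type": "broadcast", "from": "lead", "content": "Deploy freeze at 5pm"})
msgs = read_inbox(box, "alice")
# msgs now holds both messages; the inbox is empty until the next send.
```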
Watch a team coordinate
The lead spawns two teammates, assigns them work, and they communicate through their mailboxes:
The final evolution: autonomous teams
The journey from Session 9 to Session 12 adds three more layers of sophistication:
Protocols (Session 10). Shared communication rules: state machines for shutdown approval and plan review. One request-response pattern drives all negotiation between agents.
Autonomy (Session 11). Agents don't wait for assignments. They scan the task board, find unblocked tasks, claim them, and start working. The lead doesn't micromanage; teammates self-organize.
Isolation (Session 12). Each agent works in its own git worktree — a separate copy of the codebase. No file conflicts. Each agent's workspace is bound to their task ID.
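The self-assignment step can be sketched as a scan over the task board. The field names follow the task JSON shown earlier; the function name and the claiming rule (first unblocked, unowned, pending task wins) are illustrative assumptions:

```python
# Sketch of the idle cycle's claim step: take the first task that is
# pending, unblocked, and unowned; claiming = writing your name in.
def claim_next(tasks, me):
    for task in sorted(tasks, key=lambda t: t["id"]):
        if task["status"] == "pending" and not task["blockedBy"] and not task["owner"]:
            task["owner"] = me
            task["status"] = "in_progress"
            return task
    return None  # nothing claimable: go idle and re-scan later

board = [
    {"id": 1, "status": "completed", "blockedBy": [], "owner": "alice"},
    {"id": 2, "status": "pending", "blockedBy": [], "owner": ""},
    {"id": 3, "status": "pending", "blockedBy": [2], "owner": ""},
]
claimed = claim_next(board, "bob")
# bob claims task 2; task 3 stays blocked until task 2 completes.
```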
The complete architecture — all 12 sessions
Every session added exactly one mechanism. None changed the core loop. Here's the full stack:
Phase 1: THE LOOP
s01 while + stop_reason [1 tool]
s02 dispatch map: name→handler [4 tools]
Phase 2: PLANNING & KNOWLEDGE
s03 TodoManager + nag reminder [5 tools]
s04 fresh messages[] per child [5 tools]
s05 SKILL.md via tool_result [5 tools]
s06 3-layer compression [5 tools]
Phase 3: PERSISTENCE
s07 file-based CRUD + deps [8 tools]
s08 daemon threads + notify [6 tools]
Phase 4: TEAMS
s09 teammates + JSONL mailboxes [9 tools]
s10 shutdown + plan approval FSM [12 tools]
s11 idle cycle + auto-claim [14 tools]
s12 worktree isolation [16 tools]
Phase 1: Making the agent work at all.
The core loop — ask the model, run tools, repeat.
More tools, same loop — read, write, edit files.
Phase 2: Making the agent smart.
Self-tracking progress with gentle reminders.
Delegating subtasks with clean context isolation.
Loading specialized knowledge only when needed.
Forgetting strategically to work forever.
Phase 3: Making the agent reliable.
Goals that survive restarts, with dependency ordering.
Long tasks run in background; agent keeps thinking.
Phase 4: Making a team of agents.
Named agents talking through file-based inboxes.
Shared rules for coordination and approval.
Agents self-assign work from the task board.
Each agent in its own workspace — no conflicts.
The most elegant agent harness is one that provides powerful tools, rich knowledge, clean context, and safe boundaries — then gets out of the way. You don't script the intelligence. You don't build decision trees. You build the world the intelligence inhabits. Build great harnesses. The agent will do the rest.
Final check
How do teammates communicate with each other?
What makes Session 11 agents "autonomous" compared to Session 9 teammates?
Two agents are working on different features simultaneously. Without Session 12, what would go wrong?
You now understand how AI agents work.
From a single while True loop to an autonomous team of agents working in isolated workspaces — the entire progression is just harness engineering. The model is always the driver. You just built a better vehicle.
Ready to build your own? Check out the Learn Claude Code repository for the full source code, or start with python agents/s01_agent_loop.py and see the magic happen yourself.