The Big Idea
What this project is, what an AI agent really is, and why most people get it wrong
You've used AI coding tools. But do you know how they work?
Tools like Claude Code, Cursor, and Copilot feel like magic. You type a request, and code appears. But under the hood, there's a surprisingly simple pattern making it all work.
This project — Learn Claude Code — reverse-engineers exactly how Claude Code operates, broken into 12 progressive sessions. By the end of this course, you'll understand every mechanism that makes an AI coding agent tick.
The claim that changes everything
An AI agent isn't a framework, a drag-and-drop workflow, or a prompt chain. The agent is the neural network — the trained model itself. Everything else is just the vehicle it drives.
This distinction matters because it tells you where to focus your energy. You don't build intelligence — Anthropic, OpenAI, and Google already did that. You build the harness — the world the intelligence operates in.
The driver and the vehicle
Think of a Formula 1 team. The driver (Lewis Hamilton, Max Verstappen) is the talent — years of training, instinct, split-second decisions. The car is everything the driver needs to express that talent: engine, tires, steering, telemetry, pit radio.
The driver doesn't design the car. The car designers don't race. But without a great car, even the best driver can't win.
The Agent (Driver)
Claude, GPT, Gemini — the trained LLM. It decides, reasons, and chooses actions. You don't build this.
The Harness (Vehicle)
Tools, knowledge, context management, permissions. Everything the model needs to act in a specific domain. This is what you build.
What's inside a harness?
Harness = Tools + Knowledge + Observation
+ Action Interfaces + Permissions
Tools: file read/write, shell, browser
Knowledge: docs, API specs, style guides
Observation: git diff, error logs, state
Action: CLI commands, API calls
Permissions: sandboxing, approval flows
A harness is everything the AI needs to work...
...split into five categories.
Tools = the hands. What can it physically do?
Knowledge = the expertise. What does it know?
Observation = the eyes. What can it see?
Action = the output. How does it affect the world?
Permissions = the guardrails. What's it NOT allowed to do?
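The five categories can be sketched as a plain config object. This is an illustration only, not a structure from the Claude Code source; every field name here is invented for the sketch.

```python
from dataclasses import dataclass, field

# Illustrative only: the five harness categories as a config object.
@dataclass
class Harness:
    tools: list[str] = field(default_factory=list)        # the hands
    knowledge: list[str] = field(default_factory=list)    # the expertise
    observation: list[str] = field(default_factory=list)  # the eyes
    actions: list[str] = field(default_factory=list)      # the output
    permissions: list[str] = field(default_factory=list)  # the guardrails

coding_harness = Harness(
    tools=["read_file", "write_file", "bash"],
    knowledge=["style guide", "API specs"],
    observation=["git diff", "error logs"],
    actions=["shell commands", "API calls"],
    permissions=["sandboxed filesystem", "approval before deploy"],
)
```

Swapping the field contents (tractor telemetry instead of git diffs, irrigation APIs instead of shell commands) is all it takes to describe a harness for a completely different domain.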
Claude Code, stripped to its essence
Here's the entire architecture of Claude Code — the AI coding tool used by thousands of engineers. It fits in 7 lines:
Claude Code = one agent loop
+ tools (bash, read, write, edit...)
+ on-demand skill loading
+ context compression
+ subagent spawning
+ task system with dependency graph
+ team coordination
One simple loop keeps the agent running...
Tools let the agent read/write files, run commands...
Skills load specialized knowledge only when needed...
Compression lets the agent forget strategically...
Subagents handle subtasks in isolated contexts...
Tasks persist goals to disk so they survive restarts...
Teams let multiple agents work on the same project.
That's it. Every component is a harness mechanism — part of the vehicle. The agent? It's Claude. A model trained by Anthropic. The harness gives Claude hands, eyes, and a workspace.
12 sessions, one mechanism at a time
This project teaches harness engineering through 12 progressive sessions. Each one adds exactly one mechanism to the codebase, building on the last.
Check your understanding
A new startup says they built an "AI agent" using drag-and-drop workflow blocks wired to ChatGPT. What did they actually build?
You want to build an AI agent that manages a farm. What would change compared to a coding agent?
The Engine
The deceptively simple while loop that powers every AI coding agent
Think of a postal sorting office
Imagine a postal sorting center. A letter arrives. The sorter reads it, decides which department handles it, sends it there, waits for a reply, then checks if there are more letters. If yes, repeat. If no, the shift is done.
That's literally how an AI agent works. The "letter" is a message. The "sorter" is the model. The "departments" are tools. And the whole thing runs in a while loop.
The entire secret in one diagram
The real code — all 15 lines
This is from agents/s01_agent_loop.py — the very first session. The entire agent loop that powers everything:
def agent_loop(messages: list):
    while True:
        response = client.messages.create(
            model=MODEL, system=SYSTEM,
            messages=messages, tools=TOOLS,
            max_tokens=8000,
        )
        messages.append({"role": "assistant",
                         "content": response.content})
        if response.stop_reason != "tool_use":
            return
        results = []
        for block in response.content:
            if block.type == "tool_use":
                output = run_bash(block.input["command"])
                results.append({...})
        messages.append({"role": "user",
                         "content": results})
Define the agent loop. It takes a conversation history.
Keep looping forever (until we explicitly stop).
Send the full conversation to Claude and ask for a response.
Tell it which AI model to use and what personality to have.
Give it the conversation so far AND the list of available tools.
Limit the response length.
(end of the API call)
Save what the AI said into the conversation history.
(so next time around the loop, it remembers what it already said)
KEY LINE: Did the model ask to use a tool? If NOT...
...then we're done. The model has nothing left to do. Exit.
Otherwise, collect the results of each tool call.
Look through everything the model said...
...find each tool request...
...actually run it (in this case, a shell command)...
...and save the result.
Add the tool results back to the conversation and loop again.
(The model will see what happened and decide what to do next.)
The MODEL decides when to call tools and when to stop. The CODE just executes what the model asks for. This is the fundamental difference between an agent and a script — the AI is in charge, not the programmer.
Watch it happen: a real conversation
Here's what happens inside the loop when you ask "list all Python files":
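The exact transcript depends on the model, but the message history grows in a shape like this. The following is a hand-written illustration of the loop's rounds, not captured output:

```python
# Hypothetical message history for "list all Python files".
# Illustrative only -- real content blocks come from the API.
history = [
    {"role": "user", "content": "list all Python files"},
    # Round 1: the model asks for a tool instead of answering directly.
    {"role": "assistant", "content": [
        {"type": "tool_use", "name": "bash",
         "input": {"command": "find . -name '*.py'"}}]},
    # The harness runs the command and feeds the output back.
    {"role": "user", "content": [
        {"type": "tool_result",
         "content": "./agents/s01_agent_loop.py\n./tools.py"}]},
    # Round 2: no tool_use block, so stop_reason != "tool_use" -> loop exits.
    {"role": "assistant", "content": [
        {"type": "text",
         "text": "Found 2 Python files: s01_agent_loop.py and tools.py"}]},
]
```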
Adding tools — the loop never changes
Session 1 has one tool: bash. Session 2 adds three more: read_file, write_file, edit_file. But the loop code is identical. Adding a tool just means adding one handler to a dispatch map:
TOOL_HANDLERS = {
    "bash": lambda **kw: run_bash(kw["command"]),
    "read_file": lambda **kw: run_read(kw["path"]),
    "write_file": lambda **kw: run_write(kw["path"], kw["content"]),
    "edit_file": lambda **kw: run_edit(kw["path"], ...),
}
This is the dispatch map — a directory of tool names → handlers.
When the model says "bash", run the shell command handler.
When it says "read_file", run the file reading handler.
When it says "write_file", create or overwrite a file.
When it says "edit_file", find-and-replace text in a file.
That's the whole map. Need a new tool? Add one line.
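Inside the loop, dispatching is a single dictionary lookup. Here is a runnable sketch with stub handlers standing in for the real run_* functions (the stubs and the dispatch helper name are illustrative, not the course's exact code):

```python
# Stub handlers stand in for the real run_* functions.
def run_bash(command):
    return f"ran: {command}"

def run_read(path):
    return f"contents of {path}"

TOOL_HANDLERS = {
    "bash": lambda **kw: run_bash(kw["command"]),
    "read_file": lambda **kw: run_read(kw["path"]),
}

def dispatch(tool_name, tool_input):
    # One lookup replaces any if/elif chain. Unknown tools fail softly,
    # so the error string can be fed back to the model as a tool_result.
    handler = TOOL_HANDLERS.get(tool_name)
    if handler is None:
        return f"error: unknown tool {tool_name!r}"
    return handler(**tool_input)

print(dispatch("read_file", {"path": "README.md"}))  # contents of README.md
```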
When you tell an AI coding tool "read the README," it's using this exact pattern — looking up "read_file" in a dispatch map. Understanding this means you can predict what tools are available and phrase your requests to match. Instead of "show me what's in README.md," you can say "read file README.md" — directly mapping to how the tool works.
Check your understanding
The agent loop keeps running until... what exactly stops it?
You want to add a "search_web" tool to the agent. What do you change?
Staying on Track
How the agent plans its work, tracks progress, and delegates subtasks
An agent without a plan drifts
Imagine a pilot flying across the Pacific. Without a flight plan, they'd just fly in a vague direction and maybe land somewhere. With a checklist of waypoints, they know exactly where they are, what's next, and how far they've come.
AI agents have the same problem. Give them a big task ("build a website with login") and they can lose track of what they've done, what's left, and whether they're even on the right track. The fix? Give them a todo list they manage themselves.
The agent's flight plan: TodoManager
In session 3 (s03_todo_write.py), the agent gets a todo tool. It creates a plan, marks tasks in-progress, then marks them done. Here's what the agent's internal checklist looks like:
[ ] #1: Set up project structure
[>] #2: Create login form ← doing now
[x] #3: Add CSS styles ← done
[ ] #4: Connect to database
[ ] #5: Add error handling
(1/5 completed)
Not started yet — waiting its turn.
Currently working on this — only ONE can be active at a time.
Finished! Checked off permanently.
Coming up next.
Coming up after that.
A progress tracker so you (and the agent) can see the state at a glance.
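A minimal version of this checklist fits in a small class. This sketch uses invented method names, not the course's exact TodoManager API; what it preserves is the one rule that matters, that only one item can be in progress at a time:

```python
# Illustrative TodoManager sketch (method names are assumptions).
class TodoManager:
    def __init__(self):
        self.items = []  # each item: {"text": str, "status": str}

    def add(self, text):
        self.items.append({"text": text, "status": "pending"})

    def start(self, index):
        # Enforce the "only ONE active at a time" rule.
        for item in self.items:
            if item["status"] == "in_progress":
                item["status"] = "pending"
        self.items[index]["status"] = "in_progress"

    def complete(self, index):
        self.items[index]["status"] = "completed"

    def render(self):
        marks = {"pending": "[ ]", "in_progress": "[>]", "completed": "[x]"}
        done = sum(i["status"] == "completed" for i in self.items)
        lines = [f"{marks[i['status']]} #{n + 1}: {i['text']}"
                 for n, i in enumerate(self.items)]
        return "\n".join(lines) + f"\n({done}/{len(self.items)} completed)"
```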
The nudge: "Hey, update your todos"
Agents can forget to update their plan — they get absorbed in the work. The fix is hilariously simple: if the agent hasn't touched its todo list in 3 tool calls, the harness injects a gentle reminder:
rounds_since_todo = 0 if used_todo else rounds_since_todo + 1
if rounds_since_todo >= 3:
    results.append({
        "type": "text",
        "text": "<reminder>Update your todos.</reminder>"
    })
Count how many rounds since the agent last updated its todos.
If it's been 3 or more rounds without an update...
...sneak a reminder into the tool results...
(disguised as a regular message)
...that says "Hey, update your plan!"
(The agent sees this and goes "oh right, let me check my list.")
Delegation: clean rooms for subtasks
When a task gets complex, the main agent can spawn a subagent — a child agent with a completely fresh, empty memory. The subagent works on one specific subtask, then returns only a summary. Like sending a colleague to investigate something and bring back a one-page report.
The subagent pattern is a form of process isolation. Each child gets a fresh conversation history (messages = []), so the noise from reading 20 files doesn't pollute the parent's thinking. The parent stays focused on the big picture.
The code: fresh context, summary return
def run_subagent(prompt: str) -> str:
    sub_messages = [{"role": "user",
                     "content": prompt}]
    for _ in range(30):
        response = client.messages.create(
            model=MODEL,
            system=SUBAGENT_SYSTEM,
            messages=sub_messages, ...)
        if response.stop_reason != "tool_use":
            break
        # execute tools, append results...
    return "\n".join(b.text for b in response.content
                     if b.type == "text")
Define a function that spawns a subagent with one task.
Start with a fresh, empty conversation.
The only thing in it is the task prompt.
Safety limit: max 30 loop cycles (so it can't run forever).
Run the same agent loop pattern we learned in Module 2.
(Same model, but a different system prompt for subagents.)
(Using the child's fresh message history, not the parent's.)
When the model stops calling tools...
...exit the loop.
(While it's working, execute tools and feed results back.)
Return ONLY the final text summary — everything else is discarded.
Check your understanding
Your agent needs to investigate two unrelated things: how the payment system works AND how the notification system works. What approach keeps things cleanest?
What problem does the "nag reminder" solve?
Knowledge & Memory
How the agent loads expertise on demand and keeps working forever without running out of memory
The library card catalog
Imagine a university library. You don't carry every book in your backpack — you carry a catalog card that lists what books exist. When you need one, you walk to the shelf and grab it.
AI agents work the same way. Stuffing all knowledge into the system prompt wastes precious context window space. Instead, you show the catalog upfront and load the full book only when needed.
Two-layer skill loading
Session 5 (s05_skill_loading.py) implements this library pattern with two layers:
Layer 1: The Catalog
Short descriptions of each skill go in the system prompt. Costs ~100 tokens per skill — almost free.
Layer 2: The Book
When the model calls load_skill("pdf"), the full skill body is returned via tool_result. The model reads it, follows the instructions, then the knowledge stays in context as long as needed.
You are a coding agent.
Skills available:
- pdf: Process PDF files...
- code-review: Review code...
- mcp-builder: Build MCP servers...
The AI's identity and mission.
A short menu of available skills — just names and one-liners.
If someone asks about PDFs, I should load this skill first.
If someone asks for a code review, I have that skill.
If someone wants to build an MCP server, there's a skill for that.
Where skills live: SKILL.md files
Each skill is a folder with a SKILL.md file. The file has YAML frontmatter (the catalog entry) and a body (the full instructions):
---
name: pdf
description: "Process PDF files"
---
# PDF Processing Skill
Step 1: Extract text using PyPDF2...
Step 2: If scanned, use OCR via...
Step 3: Structure the output as...
Start of the metadata section (frontmatter).
Skill name — used to call load_skill("pdf").
Short description — shown in Layer 1 (the catalog).
End of metadata. Everything below is the body.
The full instructions start here — this is Layer 2.
Detailed step-by-step guidance the agent follows.
Edge cases and alternative approaches.
Output formatting rules.
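Splitting a SKILL.md into its catalog entry and its body takes only a few lines. This sketch uses naive "---" splitting instead of a YAML library to stay dependency-free; the real loader may well differ:

```python
# Sketch: split a SKILL.md into frontmatter metadata (Layer 1) and
# body (Layer 2). Naive parsing -- an assumption, not the real loader.
def parse_skill(text):
    _, frontmatter, body = text.split("---", 2)
    meta = {}
    for line in frontmatter.strip().splitlines():
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip().strip('"')
    return meta, body.strip()

skill_md = '''---
name: pdf
description: "Process PDF files"
---
# PDF Processing Skill
Step 1: Extract text...'''

meta, body = parse_skill(skill_md)
# Layer 1: only this one-liner goes in the system prompt.
catalog_line = f"- {meta['name']}: {meta['description']}"
# Layer 2: the full body is returned later via load_skill's tool_result.
```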
The memory problem: context fills up
Every message, every tool call, every file the agent reads — it all stays in the conversation history. After a long session, the context window fills up and the agent can't think anymore.
Session 6 (s06_context_compact.py) solves this with a three-layer compression pipeline — like a note-taking system that keeps essential information and discards the noise:
Before and after: compression in action
# 47,000 tokens of conversation:
User: "Fix the login bug"
AI: calls bash("grep -r login...")
Result: 500 lines of grep output
AI: calls read_file("auth.py")
Result: 200 lines of Python code
AI: calls bash("python -m pytest...")
Result: 300 lines of test output
AI: calls edit_file("auth.py", ...)
Result: "Edited auth.py"
# ... 30 more tool calls ...
~2,000 tokens after compression:
[Conversation compressed. Transcript: .transcripts/transcript_1712345678.jsonl]
Summary:
Fixed login bug in auth.py. The issue was a missing token refresh check on line 42.
Changed the validation logic to check token expiry before auth.
All 12 tests passing. auth.py and middleware.py were modified.
Full transcript saved to disk if we ever need the details.
Context compression is strategic forgetting — the agent deliberately discards details it no longer needs, while keeping key decisions, current state, and what's left to do. This is why AI agents can work on long, complex tasks without losing their mind. The full transcript is always saved to disk, so nothing is truly lost.
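The shape of the operation can be sketched in a few lines. In the real pipeline the summary is produced by the model itself; here it is passed in as a string, and the function name and signature are illustrative:

```python
# Sketch of strategic forgetting: replace everything but the last few
# messages with a pointer to the on-disk transcript plus a summary.
def compact(messages, summary, transcript_path, keep_last=2):
    recent = messages[-keep_last:]
    header = {"role": "user", "content":
              f"[Conversation compressed. Transcript: {transcript_path}]\n"
              f"Summary: {summary}"}
    return [header] + recent

messages = [{"role": "user", "content": f"msg {i}"} for i in range(40)]
compacted = compact(messages, "Fixed login bug in auth.py.",
                    ".transcripts/transcript_123.jsonl")
# 40 messages shrink to 3: one summary header plus the 2 most recent.
```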
Check your understanding
Why does the agent use two-layer skill loading instead of putting all skill instructions in the system prompt?
In micro_compact (Layer 1), why are read_file results preserved while other tool results get compressed?
Persistence
How the agent saves its goals to disk, tracks dependencies, and runs things in the background
The construction site whiteboard
On a construction site, the foreman doesn't keep the project plan in their head. It's on a whiteboard in the office — visible to everyone, surviving shift changes, and showing what blocks what. "Can't pour concrete until the foundation is inspected. Can't do plumbing until concrete is set."
The TodoManager from Module 3 lives in conversation memory — if the context gets compressed, the todos vanish. Session 7 (s07_task_system.py) moves the plan to disk, where it survives anything.
Tasks as files: surviving compression
Each task is a JSON file in a .tasks/ folder. Even if the conversation is completely reset, the tasks are still there:
{
  "id": 2,
  "subject": "Add user authentication",
  "description": "JWT-based login flow",
  "status": "pending",
  "blockedBy": [1],
  "owner": ""
}
Open this task's data file.
Task number 2.
What needs to be done — the goal.
More detail about how to do it.
Current state: not started yet.
KEY: Can't start until Task 1 is finished. This is the dependency.
No one is working on it yet (later used for team assignments).
End of task data.
Dependencies: what blocks what
When a task is completed, it automatically unblocks everything waiting on it. Like dominoes — finishing "set up database" automatically frees "add user auth" to start:
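The domino effect is a small amount of bookkeeping. This sketch works on an in-memory dict for brevity; the real version rewrites the JSON files in .tasks/, and the status names are taken from the task example above:

```python
# Sketch of dependency release: completing a task removes its id from
# every other task's blockedBy list.
def complete_task(tasks, done_id):
    for task in tasks.values():
        if task["id"] == done_id:
            task["status"] = "completed"
        elif done_id in task["blockedBy"]:
            task["blockedBy"].remove(done_id)
            if not task["blockedBy"]:
                task["status"] = "pending"  # fully unblocked, ready to start

tasks = {
    1: {"id": 1, "status": "in_progress", "blockedBy": []},
    2: {"id": 2, "status": "blocked", "blockedBy": [1]},
    3: {"id": 3, "status": "blocked", "blockedBy": [1, 2]},
}
complete_task(tasks, 1)
# Task 2 is now unblocked; task 3 still waits on task 2.
```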
Todos vs Tasks: what changed?
Todos (Session 3)
Live in conversation memory. Vanish on compression. Simple flat list. Good for single-session work.
Tasks (Session 7)
Live on disk as JSON files. Survive compression, restarts, even new conversations. Have dependencies (blockedBy). Can be assigned to team members.
This is one of the most important patterns in software: persisting state to disk. The conversation can be compressed, the agent can restart, but the task board on disk remains intact. Any new conversation can pick up where the last one left off by reading the .tasks/ folder.
Background work: the agent keeps thinking
Some operations take time — running a test suite, building a project, installing packages. Session 8 (s08_background_tasks.py) lets the agent kick off slow commands in a daemon thread and keep working on other things. When the background task finishes, a notification is injected into the conversation.
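The mechanics can be sketched with a daemon thread and a queue that the main loop drains between rounds. A sleep stands in for the slow shell command, and the notification format is invented for the example:

```python
import queue
import threading
import time

# Sketch: slow work runs in a daemon thread; a queue carries the
# "I'm done" notification back to the agent loop.
notifications = queue.Queue()

def run_in_background(task_id, slow_fn):
    def worker():
        result = slow_fn()
        notifications.put(f"<background>{task_id} finished: {result}</background>")
    threading.Thread(target=worker, daemon=True).start()

# A 0.1s sleep stands in for a real test-suite run.
run_in_background("pytest", lambda: (time.sleep(0.1), "12 passed")[-1])

# Meanwhile the agent loop keeps working; on its next round it drains
# the queue and appends any notifications to the tool results.
note = notifications.get(timeout=5)
```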
Check your understanding
The agent has been working for 2 hours and auto_compact triggers, compressing the entire conversation into a summary. What happens to the task plan?
Task 2 has "blockedBy": [1]. What happens when Task 1 is marked completed?
The Team
From solo agent to an entire squad — teammates, mailboxes, protocols, and autonomous work
From one-person crew to a film production
So far, it has been a one-person show: one agent, one task. But real projects need a team — like a film production. You have a director (lead agent), a cinematographer (frontend specialist), a sound engineer (backend specialist), and a props department (testing specialist). Everyone works in parallel, communicates through walkie-talkies, and follows shared protocols.
Sessions 9-12 build this entire coordination system, one layer at a time.
Subagents were disposable. Teammates persist.
Subagent (Session 4)
Spawned → does one task → returns summary → destroyed. Like hiring a temp worker. No memory between tasks.
Teammate (Session 9)
Spawned → works → goes idle → gets new work → works → ... → shuts down when asked. Like a full-time employee. Has a name, a role, and a persistent inbox.
Communication: file-based mailboxes
How do teammates talk? Through JSONL files — each teammate gets their own inbox file. Sending a message is literally appending a line to a file. Reading the inbox is reading the file and clearing it.
{"type":"message","from":"lead","content":"Fix the login bug in auth.py"}
{"type":"message","from":"bob","content":"I updated the DB schema, you may need to adjust the auth queries"}
{"type":"broadcast","from":"lead","content":"Deploy freeze at 5pm today"}
A direct message from the lead agent to Alice.
"Here's your assignment."
A message from another teammate, Bob.
"Heads up — I changed something that
might affect your work."
A broadcast sent to ALL teammates at once.
"Everyone stop pushing code at 5pm."
Files are the simplest possible IPC mechanism. No setup, no server, no dependencies. Append a line = send a message. Read and clear = check inbox. It works across threads, processes, even machines (via shared filesystem). The simplicity is the point.
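Send and check-inbox are each a few lines. This sketch assumes a .mailbox-style directory of one JSONL file per teammate; the directory layout and function names are illustrative:

```python
import json
import tempfile
from pathlib import Path

# Sketch of JSONL mailboxes: send = append one line, check = read and clear.
def send(mailbox_dir, to, msg):
    inbox = Path(mailbox_dir) / f"{to}.jsonl"
    with open(inbox, "a") as f:
        f.write(json.dumps(msg) + "\n")

def read_inbox(mailbox_dir, name):
    inbox = Path(mailbox_dir) / f"{name}.jsonl"
    if not inbox.exists():
        return []
    messages = [json.loads(line) for line in inbox.read_text().splitlines()]
    inbox.unlink()  # clearing the inbox = deleting the file
    return messages

box = tempfile.mkdtemp()
send(box, "alice", {"type": "message", "from": "lead", "content": "Fix the login bug"})
send(box, "alice", {"type": "broadcast", "from": "lead", "content": "Deploy freeze at 5pm"})
msgs = read_inbox(box, "alice")
# msgs now holds both messages; the inbox is empty until the next send.
```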
Watch a team coordinate
The lead spawns two teammates, assigns them work, and they communicate through their mailboxes:
The final evolution: autonomous teams
The journey from Session 9 to Session 12 adds three more layers of sophistication:
Protocols (Session 10). Shared communication rules: state machines for shutdown approval and plan review. One request-response pattern drives all negotiation between agents.
Autonomy (Session 11). Agents don't wait for assignments. They scan the task board, find unblocked tasks, claim them, and start working. The lead doesn't micromanage; teammates self-organize.
Isolation (Session 12). Each agent works in its own git worktree — a separate copy of the codebase. No file conflicts. Each agent's workspace is bound to their task ID.
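The self-assignment step can be sketched as a scan over the task board. The field names follow the task JSON shown earlier; the function name and the claiming rule (first unblocked, unowned, pending task wins) are illustrative assumptions:

```python
# Sketch of the idle cycle's claim step: take the first task that is
# pending, unblocked, and unowned; claiming = writing your name in.
def claim_next(tasks, me):
    for task in sorted(tasks, key=lambda t: t["id"]):
        if task["status"] == "pending" and not task["blockedBy"] and not task["owner"]:
            task["owner"] = me
            task["status"] = "in_progress"
            return task
    return None  # nothing claimable: go idle and re-scan later

board = [
    {"id": 1, "status": "completed", "blockedBy": [], "owner": "alice"},
    {"id": 2, "status": "pending", "blockedBy": [], "owner": ""},
    {"id": 3, "status": "pending", "blockedBy": [2], "owner": ""},
]
claimed = claim_next(board, "bob")
# bob claims task 2; task 3 stays blocked until task 2 completes.
```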
The complete architecture — all 12 sessions
Every session added exactly one mechanism. None changed the core loop. Here's the full stack:
Phase 1: THE LOOP
s01 while + stop_reason [1 tool]
s02 dispatch map: name→handler [4 tools]
Phase 2: PLANNING & KNOWLEDGE
s03 TodoManager + nag reminder [5 tools]
s04 fresh messages[] per child [5 tools]
s05 SKILL.md via tool_result [5 tools]
s06 3-layer compression [5 tools]
Phase 3: PERSISTENCE
s07 file-based CRUD + deps [8 tools]
s08 daemon threads + notify [6 tools]
Phase 4: TEAMS
s09 teammates + JSONL mailboxes [9 tools]
s10 shutdown + plan approval FSM [12 tools]
s11 idle cycle + auto-claim [14 tools]
s12 worktree isolation [16 tools]
Phase 1: Making the agent work at all.
The core loop — ask the model, run tools, repeat.
More tools, same loop — read, write, edit files.
Phase 2: Making the agent smart.
Self-tracking progress with gentle reminders.
Delegating subtasks with clean context isolation.
Loading specialized knowledge only when needed.
Forgetting strategically to work forever.
Phase 3: Making the agent reliable.
Goals that survive restarts, with dependency ordering.
Long tasks run in background; agent keeps thinking.
Phase 4: Making a team of agents.
Named agents talking through file-based inboxes.
Shared rules for coordination and approval.
Agents self-assign work from the task board.
Each agent in its own workspace — no conflicts.
The most elegant agent harness is one that provides powerful tools, rich knowledge, clean context, and safe boundaries — then gets out of the way. You don't script the intelligence. You don't build decision trees. You build the world the intelligence inhabits. Build great harnesses. The agent will do the rest.
Final check
How do teammates communicate with each other?
What makes Session 11 agents "autonomous" compared to Session 9 teammates?
Two agents are working on different features simultaneously. Without Session 12, what would go wrong?
You now understand how AI agents work.
From a single while True loop to an autonomous team of agents working in isolated workspaces — the entire progression is just harness engineering. The model is always the driver. You just built a better vehicle.
Ready to build your own? Check out the Learn Claude Code repository for the full source code, or start with python agents/s01_agent_loop.py and see the magic happen yourself.