
How AI Agents Actually Work: The Message Loop Explained

intermediate 20 min
Sources verified Dec 28, 2025
agents, internals, tool-calling, message-loop, architecture

Scenario

Context: You're building an AI agent and need to understand what happens behind the scenes

Goal: Understand the message structure, tool calling mechanics, and agent loop that power AI coding assistants

Anti-pattern: Treating agents as black boxes without understanding their mechanics

Tools: OpenAI API, Anthropic API, any LLM with tool use

Conversation

What Is an Agent?

An AI agent is an LLM in a loop that can take actions. The core pattern:

while True:
    response = llm.generate(messages)
    if response.has_tool_calls:
        results = execute_tools(response.tool_calls)
        messages.append(response)
        messages.append(results)
    else:
        return response.text

That's it. Everything else, from Claude Code and Cursor to Copilot agents, is a variation on this loop.

The Message Array: An Agent's Memory

Every agent conversation is a growing array of messages. Each message has a role:

Role        Purpose
system      Instructions, personality, constraints (set once)
user        Human input
assistant   LLM responses (may include tool calls)
tool        Results from tool execution

The LLM sees the entire array on every turn. This is how it "remembers" context.
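
Concretely, the message shapes look roughly like this in TypeScript, assuming an OpenAI-style chat format (provider SDKs differ slightly in field names):

// A minimal sketch of the message and tool-call shapes, assuming an
// OpenAI-style chat API. Provider SDKs ship richer types; this is just for orientation.
type Message =
  | { role: "system"; content: string }
  | { role: "user"; content: string }
  | { role: "assistant"; content: string | null; tool_calls?: ToolCall[] }
  | { role: "tool"; tool_call_id: string; content: string };

interface ToolCall {
  id: string;              // e.g. "call_abc123", echoed back in the tool result
  type: "function";
  function: {
    name: string;          // which tool to run
    arguments: string;     // JSON-encoded arguments; your code parses these
  };
}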

Example: A Simple File-Reading Agent

Let's trace through exactly what happens when you ask an agent to read a file.

Initial state:

👤 You
{
  "messages": [
    {
      "role": "system",
      "content": "You are a coding assistant. Use tools to help the user."
    },
    {
      "role": "user", 
      "content": "What's in src/config.ts?"
    }
  ],
  "tools": [
    {
      "name": "read_file",
      "description": "Read contents of a file",
      "parameters": {
        "type": "object",
        "properties": {
          "path": { "type": "string", "description": "File path to read" }
        },
        "required": ["path"]
      }
    }
  ]
}
This is the actual API request. The LLM receives messages + available tools.

Step 1: LLM Generates Response with Tool Call

The LLM doesn't output text—it outputs a structured tool call:

🤖 AI
{
  "role": "assistant",
  "content": null,
  "tool_calls": [
    {
      "id": "call_abc123",
      "type": "function",
      "function": {
        "name": "read_file",
        "arguments": "{\"path\": \"src/config.ts\"}"
      }
    }
  ]
}
Note: content is null. The LLM decided to call a tool instead of responding with text. The arguments are JSON-encoded.

Step 2: Your Code Executes the Tool

The agent framework (your code) intercepts this and runs the actual tool:

👤 You
// Agent loop pseudocode
const response = await llm.chat(messages, tools);

if (response.tool_calls) {
  // Append the assistant message first, so its tool_calls
  // precede the tool results that answer them
  messages.push(response);

  for (const call of response.tool_calls) {
    // Actually execute the tool
    const result = await executeTool(call.function.name,
                                     JSON.parse(call.function.arguments));

    // Append the tool result to messages
    messages.push({
      role: "tool",
      tool_call_id: call.id,
      content: result  // The actual file contents
    });
  }

  // Continue the loop - call the LLM again with the tool results in context
  return agentLoop(messages, tools);  // Recurse
}
The tool_call_id links the result back to the specific call. This is how the LLM knows which result corresponds to which request.

Step 3: Messages Array After Tool Execution

Now the messages array has grown:

🤖 AI
[
  { "role": "system", "content": "You are a coding assistant..." },
  { "role": "user", "content": "What's in src/config.ts?" },
  { 
    "role": "assistant", 
    "content": null,
    "tool_calls": [{ "id": "call_abc123", "function": { "name": "read_file", ... }}]
  },
  {
    "role": "tool",
    "tool_call_id": "call_abc123",
    "content": "export const config = {\n  port: 3000,\n  debug: true\n};"
  }
]
The LLM now sees: the original question, its own decision to read the file, AND the file contents. It has full context.

Step 4: LLM Generates Final Response

With the file contents in context, the LLM can now answer:

🤖 AI
{
  "role": "assistant",
  "content": "The `src/config.ts` file exports a config object with two properties:\n\n- `port`: 3000 (the server port)\n- `debug`: true (debug mode enabled)\n\nThis is a simple configuration file for your application settings."
}
No tool_calls this time—the LLM decided it has enough information to respond. The loop terminates.

Key Insight: The LLM Decides Everything

The LLM makes all decisions:

  • Whether to call a tool (or just respond)
  • Which tool to call
  • What arguments to pass
  • When to stop (respond without tool calls)

Your code just executes what it asks for and feeds results back.
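
For illustration, executeTool can be little more than a switch over tool names. A minimal sketch, assuming the read_file and write_file tools from this walkthrough (the write_file parameter names and the error handling are assumptions; real agents add validation and sandboxing):

import { promises as fs } from "fs";

// Hypothetical dispatcher: maps a tool name from the LLM's tool call to real code.
async function executeTool(name: string, args: any): Promise<string> {
  switch (name) {
    case "read_file":
      return fs.readFile(args.path, "utf8");
    case "write_file":
      await fs.writeFile(args.path, args.content, "utf8");
      return "File written successfully";
    default:
      // Return errors as text so the LLM can see them and adjust its plan
      return `Unknown tool: ${name}`;
  }
}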

Multi-Tool Chains: Reading Then Writing

Agents often chain multiple tools. Here's a "read file, modify it, write it back" flow:

User: "Add a timeout field to config.ts, default 30000"

Turn 1: LLM calls read_file("src/config.ts")
        → Tool returns file contents
        
Turn 2: LLM calls write_file("src/config.ts", "...updated contents...")
        → Tool returns "File written successfully"
        
Turn 3: LLM responds "Done! Added timeout: 30000 to the config."

Each turn adds to the messages array. By Turn 3, the LLM has seen:

  • The original request
  • Its decision to read
  • The file contents
  • Its decision to write
  • Confirmation of the write

Parallel Tool Calls

Modern LLMs can request multiple tools simultaneously:

{
  "tool_calls": [
    { "id": "call_1", "function": { "name": "read_file", "arguments": "{\"path\": \"src/a.ts\"}" }},
    { "id": "call_2", "function": { "name": "read_file", "arguments": "{\"path\": \"src/b.ts\"}" }},
    { "id": "call_3", "function": { "name": "read_file", "arguments": "{\"path\": \"src/c.ts\"}" }}
  ]
}

Your agent should execute these in parallel for efficiency, then return all results:

[
  { "role": "tool", "tool_call_id": "call_1", "content": "...contents of a.ts..." },
  { "role": "tool", "tool_call_id": "call_2", "content": "...contents of b.ts..." },
  { "role": "tool", "tool_call_id": "call_3", "content": "...contents of c.ts..." }
]
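
A short sketch of that fan-out with Promise.all; it is the same pattern used in the complete loop further down:

// Run every requested tool call concurrently, keeping the id -> result pairing intact.
const toolResults = await Promise.all(
  response.tool_calls.map(async (call) => ({
    role: "tool" as const,
    tool_call_id: call.id,
    content: await executeTool(call.function.name,
                               JSON.parse(call.function.arguments)),
  }))
);

messages.push(response);        // the assistant message with its tool_calls
messages.push(...toolResults);  // then all of the results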

Context Accumulation: The Blessing and Curse

Every turn adds to the messages array:

  • User messages
  • Assistant responses
  • Tool calls
  • Tool results (which can be large!)

Problem: Context windows have limits (128K-200K tokens typically).

Solutions agents use:

  1. Truncate old messages - Drop earlier turns (a minimal sketch follows this list)
  2. Summarize tool results - "File has 500 lines, showing first 100"
  3. Reference instead of inline - "See file contents in previous turn"
  4. Clear context - /clear commands start fresh
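
A minimal sketch of option 1, using the Message type sketched earlier and a rough 4-characters-per-token heuristic. Real agents use an actual tokenizer and trim at turn boundaries, so an assistant tool_calls message is never separated from its tool results:

// Crude context trimming (hypothetical helper, not a library API):
// keep the system prompt plus the newest messages that fit under a token budget.
function trimMessages(messages: Message[], maxTokens = 100_000): Message[] {
  const approxTokens = (m: Message) => Math.ceil((m.content ?? "").length / 4);

  const [system, ...rest] = messages;
  let budget = maxTokens - approxTokens(system);

  const kept: Message[] = [];
  for (const msg of [...rest].reverse()) {   // walk backwards from the newest message
    budget -= approxTokens(msg);
    if (budget < 0) break;
    kept.unshift(msg);
  }
  return [system, ...kept];
}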

Tool Schemas: Teaching the LLM What's Available

Tools are defined with JSON Schema. The schema teaches the LLM:

  • What the tool does (description)
  • What parameters it needs
  • Which parameters are required
{
  "name": "search_codebase",
  "description": "Search for text patterns across all files. Use for finding function definitions, usages, or specific strings.",
  "parameters": {
    "type": "object",
    "properties": {
      "pattern": {
        "type": "string",
        "description": "Regex pattern to search for"
      },
      "file_glob": {
        "type": "string", 
        "description": "Optional glob to filter files, e.g. '*.ts'"
      },
      "max_results": {
        "type": "integer",
        "description": "Maximum results to return",
        "default": 20
      }
    },
    "required": ["pattern"]
  }
}

The Complete Agent Loop

async function agentLoop(
  messages: Message[],
  tools: Tool[],
  maxIterations = 20
): Promise<string> {
  
  for (let i = 0; i < maxIterations; i++) {
    const response = await llm.chat({ messages, tools });
    
    // No tool calls = final response
    if (!response.tool_calls?.length) {
      return response.content;
    }
    
    // Execute all tool calls (in parallel)
    messages.push(response);  // Include assistant's tool_calls
    
    const toolResults = await Promise.all(
      response.tool_calls.map(async (call) => ({
        role: "tool" as const,
        tool_call_id: call.id,
        content: await executeTool(call.function.name, 
                                    JSON.parse(call.function.arguments))
      }))
    );
    
    messages.push(...toolResults);
  }
  
  throw new Error("Agent exceeded max iterations");
}
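
A rough sketch of wiring it up. The tool definitions and executeTool dispatcher are the hypothetical ones sketched earlier, and llm.chat stands in for your provider's SDK call:

// Hypothetical wiring: readFileTool / writeFileTool are JSON-schema tool
// definitions like the ones shown above.
const tools: Tool[] = [readFileTool, writeFileTool];

const messages: Message[] = [
  { role: "system", content: "You are a coding assistant. Use tools to help the user." },
  { role: "user", content: "Add a timeout field to config.ts, default 30000" },
];

const answer = await agentLoop(messages, tools, 10);
console.log(answer);  // e.g. "Done! Added timeout: 30000 to the config."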

Why This Matters for AI Engineers

Understanding the message loop lets you:

  1. Debug agent behavior - Inspect the messages array to see what the LLM saw
  2. Optimize context usage - Know what's eating your token budget
  3. Design better tools - Write descriptions that guide the LLM correctly
  4. Build guardrails - Intercept tool calls before execution (see the sketch after this list)
  5. Implement caching - Recognize repeated tool calls
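
As an example of item 4, a guardrail can be a thin wrapper around the dispatcher. A minimal sketch; the allow-list and the confirmWithUser hook are hypothetical:

// Guardrail sketch: read-only tools run freely, anything else needs approval.
const READ_ONLY_TOOLS = new Set(["read_file", "search_codebase"]);

async function executeToolGuarded(name: string, args: any): Promise<string> {
  if (!READ_ONLY_TOOLS.has(name)) {
    // confirmWithUser is a hypothetical prompt to the human operator
    const approved = await confirmWithUser(`Allow ${name}(${JSON.stringify(args)})?`);
    if (!approved) {
      // Feed the refusal back as a tool result so the LLM can adjust its plan
      return "Tool call rejected by the user.";
    }
  }
  return executeTool(name, args);
}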

The agent isn't magic—it's a loop with a very smart decision-maker inside.

Summary

Concept         What It Is
Messages array  The agent's memory; grows each turn
Tool call       LLM's structured request to execute something
Tool result     Your code's response, fed back to LLM
Agent loop      Generate → execute tools → append results → repeat
Termination     LLM responds without tool calls
tool_call_id    Links results to requests (enables parallel calls)

Every AI coding assistant—from simple chatbots to Claude Code—is built on this foundation.

Key Takeaways

  • An agent is an LLM in a loop that can call tools and see their results
  • The messages array is the agent's memory—it grows with each turn
  • The LLM decides when to call tools vs when to respond
  • tool_call_id links parallel tool results back to their requests
  • Context accumulation is both powerful (memory) and limiting (token budget)

