Context Windows
The context window is the maximum amount of text (in tokens) the model can "see" at once. This includes:
- Your system prompt
- Conversation history
- Any documents/context you inject
- The model's response so far
If your total exceeds the limit, you'll get an error or truncation.
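As a rough illustration, you can estimate the total before sending a request and trim or summarize when it would overflow. The 4-characters-per-token heuristic and the default limits below are assumptions for the sketch, not any provider's API; a real tokenizer (e.g. tiktoken) gives exact counts.

```ts
// Rough preflight check: does everything fit, with room left for the response?
// Heuristic: ~4 characters per token of English text (an assumption, not exact).
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

function fitsInContext(
  parts: string[],            // system prompt, history, injected documents
  contextLimit = 128_000,     // the model's context window
  responseReserve = 4_000     // space kept free for the model's reply
): boolean {
  const total = parts.reduce((sum, p) => sum + estimateTokens(p), 0);
  return total + responseReserve <= contextLimit;
}
```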
Common Context Limits (2025)
| Model | Context Window |
|---|---|
| GPT-4o | 128K tokens |
| Claude 3.5 Sonnet | 200K tokens |
| Gemini 1.5 Pro | 2M tokens |
| Llama 3.3 70B | 128K tokens |
A 128K-token context is roughly a 300-page book: at about 0.75 words per token, 128K tokens is ~96,000 words, or around 300 pages at ~300 words per page.
The "Lost in the Middle" Problem
Research ("Lost in the Middle," Liu et al., 2023) shows LLMs struggle with information in the middle of long contexts. They perform best on:
- Content at the beginning of the context
- Content at the end of the context
Information buried in the middle may be "forgotten" or overlooked. This has implications for prompt design.
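One way to act on this, sketched below under the assumption that your prompt splits into distinct instructions, reference material, and task parts, is to put critical instructions first and restate the task at the end, leaving bulk material in the middle:

```ts
// Place critical content where recall is strongest: the start and the end.
function layoutPrompt(instructions: string, referenceDocs: string[], task: string): string {
  return [
    instructions,                 // beginning: high recall
    ...referenceDocs,             // middle: bulk reference material
    `Task (restated): ${task}`,   // end: high recall
  ].join("\n\n");
}
```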
context_management.ts

A sketch of budget-aware context assembly. The `Message` and `ContextParams` types, the token heuristic, and the trivial `rankByRelevance` stub are assumptions added to make the example self-contained.

```ts
interface Message { role: "system" | "user" | "assistant"; content: string; }
interface ContextParams {
  systemPrompt: Message;
  documents: Message[];
  conversationHistory: Message[];
  maxTokens?: number;
}

// Rough token estimate: ~4 characters per token of English text
const estimateTokens = (msgs: Message[]): number =>
  msgs.reduce((sum, m) => sum + Math.ceil(m.content.length / 4), 0);

// Placeholder ranking: assume documents arrive ordered most-relevant first
// (a real system would rank by retrieval score or embedding similarity)
const rankByRelevance = (documents: Message[]): Message[] => documents;

function buildContext({
  systemPrompt,
  documents,
  conversationHistory,
  maxTokens = 128000
}: ContextParams): Message[] {
  // Always reserve space for the model's response
  const responseReserve = 4000;
  const available = maxTokens - responseReserve;
  // Priority: system prompt > recent history > documents
  let used = estimateTokens([systemPrompt]);
  // Keep recent conversation (recent messages matter most for coherence)
  const recentHistory = conversationHistory.slice(-10);
  used += estimateTokens(recentHistory);
  // Fill the remaining budget with the most relevant documents
  const relevantDocs: Message[] = [];
  for (const doc of rankByRelevance(documents)) {
    const cost = estimateTokens([doc]);
    if (used + cost > available) break;
    used += cost;
    relevantDocs.push(doc);
  }
  return [systemPrompt, ...relevantDocs, ...recentHistory];
}
```
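A hypothetical call might look like this; the inputs are illustrative placeholders:

```ts
const messages = buildContext({
  systemPrompt: { role: "system", content: "You are a helpful assistant." },
  documents: [{ role: "user", content: "Retrieved chunk: ..." }],        // e.g. RAG output
  conversationHistory: [{ role: "user", content: "Summarize the report." }],
  maxTokens: 128_000,
});
```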
Context Window Strategies
- Summarization: Compress old conversation into summaries
- RAG: Retrieve only relevant chunks instead of full documents
- Sliding window: Keep only the N most recent messages (combined with summarization in the sketch after this list)
- Priority ordering: Put critical info at start and end
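A minimal sketch combining the sliding-window and summarization strategies is shown below. The `summarize` parameter stands in for another LLM call and is an assumption of the sketch, not a real API:

```ts
interface ChatMessage { role: "system" | "user" | "assistant"; content: string; }

// Keep the last `keep` messages verbatim; fold everything older into one summary message.
async function slidingWindowWithSummary(
  history: ChatMessage[],
  summarize: (msgs: ChatMessage[]) => Promise<string>,  // placeholder for an LLM call
  keep = 10
): Promise<ChatMessage[]> {
  if (history.length <= keep) return history;
  const older = history.slice(0, -keep);   // messages to compress
  const recent = history.slice(-keep);     // messages to keep verbatim
  const summary = await summarize(older);
  return [
    { role: "system", content: `Summary of earlier conversation: ${summary}` },
    ...recent,
  ];
}
```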
Key Takeaways
- Context window = total tokens the model can process at once
- Includes prompt + history + injected content + response
- Larger contexts = higher cost and latency
- Information in the middle of long contexts may be overlooked
- RAG often beats stuffing everything into context
In This Platform
This platform's prompts are designed for token efficiency. Each prompt template has a focused purpose to avoid context bloat when multiple modules are combined.
Relevant Files:
- prompts/analysis.json
- config/scoring.json