Context Windows

Fundamentals · Beginner · 12 min

The context window is the maximum amount of text (in tokens) the model can "see" at once. This includes:

  • Your system prompt
  • Conversation history
  • Any documents/context you inject
  • The model's response so far

If the total exceeds the limit, the request either fails with an error or the input gets truncated, depending on the API.
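
For example, you can run a rough pre-flight check to estimate whether a request will fit before sending it. This is a minimal sketch assuming a crude ~4-characters-per-token heuristic; fitsInWindow and the constants are illustrative, and a real tokenizer (e.g. tiktoken for OpenAI models) gives exact counts:

// Rough pre-flight check: will the request fit in the window?
// Assumes ~4 characters per token; a real tokenizer gives exact counts.
const CONTEXT_LIMIT = 128000;  // e.g. GPT-4o
const RESPONSE_RESERVE = 4000; // leave room for the reply

function fitsInWindow(...parts: string[]): boolean {
  const estimated = parts
    .map((text) => Math.ceil(text.length / 4))
    .reduce((sum, tokens) => sum + tokens, 0);
  return estimated + RESPONSE_RESERVE <= CONTEXT_LIMIT;
}

// If this returns false, trim history, summarize, or retrieve fewer
// documents before calling the model instead of letting the request fail.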

Common Context Limits (2025)

Model               Context Window
GPT-4o              128K tokens
Claude 3.5 Sonnet   200K tokens
Gemini 1.5 Pro      2M tokens
Llama 3.3 70B       128K tokens

A 128K context is roughly a 300-page book: at ~0.75 words per token, 128,000 tokens comes to about 96,000 words, or 300 pages at ~320 words per page.

The "Lost in the Middle" Problem

Research shows LLMs struggle with information in the middle of long contexts (Liu et al., 2023, "Lost in the Middle"). They perform best on:

  1. Content at the beginning of the context
  2. Content at the end of the context

Information buried in the middle may be "forgotten" or overlooked. This has implications for prompt design.
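
One practical response is to restate the critical instruction after the bulk content, so it sits at both high-attention edges of the window. A minimal sketch (layoutPrompt is an illustrative helper, not part of any library):

// Put the task at the start AND restate it at the end; bulk documents
// go in the middle, where recall is weakest anyway.
function layoutPrompt(task: string, documents: string[]): string {
  return [
    task,                // beginning: high attention
    ...documents,        // middle: weakest recall
    `Reminder: ${task}`, // end: high attention again
  ].join("\n\n");
}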

The example below puts these ideas together: it budgets tokens across the system prompt, recent history, and retrieved documents, always reserving space for the model's response.

context_management.ts
// Managing context effectively
type Message = { role: "system" | "user" | "assistant"; content: string };

interface ContextParams {
  systemPrompt: string;
  documents: string[];
  conversationHistory: Message[];
  maxTokens?: number;
}

// Rough heuristic: ~4 characters per token for English text
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

// Relevance ranking (e.g. embedding similarity) is defined elsewhere
declare function rankByRelevance(documents: string[]): string[];

function buildContext({
  systemPrompt,
  documents,
  conversationHistory,
  maxTokens = 128000,
}: ContextParams): Message[] {
  // Always reserve space for the model's response
  const responseReserve = 4000;
  const available = maxTokens - responseReserve;

  // Priority: system prompt > recent history > documents
  let used = estimateTokens(systemPrompt);

  // Recent messages matter most for coherence: keep the last 10
  const recentHistory = conversationHistory.slice(-10);
  used += recentHistory.reduce((sum, m) => sum + estimateTokens(m.content), 0);

  // Fill the remaining budget with the most relevant documents,
  // stopping at the first one that would overflow
  const relevantDocs: string[] = [];
  for (const doc of rankByRelevance(documents)) {
    const cost = estimateTokens(doc);
    if (used + cost > available) break;
    used += cost;
    relevantDocs.push(doc);
  }

  return [
    { role: "system", content: systemPrompt },
    ...relevantDocs.map((content): Message => ({ role: "user", content })),
    ...recentHistory,
  ];
}

Context Window Strategies

  1. Summarization: Compress old conversation into summaries
  2. RAG: Retrieve only relevant chunks instead of full documents
  3. Sliding window: Keep only the N most recent messages (combined with summarization in the sketch after this list)
  4. Priority ordering: Put critical info at the start and end
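
A minimal sketch of how strategies 1 and 3 compose, where summarize is an assumed helper (e.g. a cheap LLM call, not shown) and Message is the shape from the example above:

// Sliding window plus summarization: keep the last N messages verbatim
// and compress everything older into one summary message.
declare function summarize(text: string): Promise<string>; // assumed helper

async function compactHistory(
  history: Message[],
  keepRecent = 10
): Promise<Message[]> {
  if (history.length <= keepRecent) return history;

  const older = history.slice(0, -keepRecent);
  const recent = history.slice(-keepRecent);

  const summary = await summarize(
    older.map((m) => `${m.role}: ${m.content}`).join("\n")
  );

  return [
    { role: "system", content: `Summary of earlier conversation: ${summary}` },
    ...recent,
  ];
}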

Key Takeaways

  • Context window = total tokens the model can process at once
  • Includes prompt + history + injected content + response
  • Larger contexts = higher cost and latency
  • Information in the middle of long contexts may be overlooked
  • RAG often beats stuffing everything into context

In This Platform

This platform's prompts are designed for token efficiency. Each prompt template has a focused purpose to avoid context bloat when multiple modules are combined.

Relevant Files:
  • prompts/analysis.json
  • config/scoring.json
