
Tokens & Tokenization

Fundamentals · Beginner · 10 min
Sources verified Dec 22

LLMs process text as tokens — chunks of characters that form the atomic units of input and output, directly affecting pricing and context limits.

LLMs don't process text character by character. Instead, they break text into tokens — chunks that might be words, parts of words, or punctuation.

For example:

  • "Hello" → 1 token
  • "indistinguishable" → 4 tokens
  • "    " (a run of spaces) → often 1 token
  • Code often tokenizes less efficiently than prose

How Tokenization Works

Modern LLMs use byte-pair encoding (BPE) or similar algorithms:

  1. Start with individual bytes/characters
  2. Iteratively merge the most frequent pairs
  3. Build a vocabulary of common subwords
  4. Result: frequent words become single tokens, rare words split into pieces

This is why "ChatGPT" might be 3 tokens while "cat" is 1.
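The merge loop described above can be sketched in a few lines. This is a toy, character-level illustration, not a real tokenizer: the `bpe` and `mostFrequentPair` helpers are hypothetical, and production tokenizers operate on bytes using a pretrained merge table rather than learning merges from the input itself.

```typescript
// Find the most frequent adjacent pair of tokens (must occur at least twice).
function mostFrequentPair(tokens: string[]): [string, string] | null {
  const counts = new Map<string, number>();
  for (let i = 0; i < tokens.length - 1; i++) {
    const key = tokens[i] + '\u0000' + tokens[i + 1];
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  let best: string | null = null;
  let bestCount = 1; // require at least two occurrences to merge
  for (const [key, count] of counts) {
    if (count > bestCount) { best = key; bestCount = count; }
  }
  return best ? (best.split('\u0000') as [string, string]) : null;
}

// Toy BPE: start from characters, then repeatedly merge the top pair.
function bpe(text: string, merges: number): string[] {
  let tokens = [...text]; // step 1: individual characters
  for (let m = 0; m < merges; m++) {
    const pair = mostFrequentPair(tokens); // step 2: most frequent pair
    if (!pair) break;
    const merged: string[] = [];
    for (let i = 0; i < tokens.length; i++) {
      if (i < tokens.length - 1 && tokens[i] === pair[0] && tokens[i + 1] === pair[1]) {
        merged.push(pair[0] + pair[1]); // step 3: grow the subword vocabulary
        i++; // skip the second half of the merged pair
      } else {
        merged.push(tokens[i]);
      }
    }
    tokens = merged;
  }
  return tokens; // step 4: frequent sequences end up as single tokens
}

console.log(bpe('aaabaaab', 2)); // ["aaa", "b", "aaa", "b"]
```

After two merges, the repeated `aaa` prefix has collapsed into a single token while the rare `b` stays separate, which is exactly the frequent-stays-whole, rare-gets-split behavior described above.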

token_counting.ts
// Approximate token counting
function estimateTokens(text: string): number {
  // Rule of thumb: ~4 characters per token for English prose (rough estimate only)
  return Math.ceil(text.length / 4);
}

// For production, use tiktoken (or the model's own tokenizer) for exact counts
import { encoding_for_model } from 'tiktoken';
const enc = encoding_for_model('gpt-4o');
const tokens = enc.encode('Hello, world!');
console.log(tokens.length); // 4
enc.free(); // release the WASM-backed encoder when done

Why Tokens Matter

  1. Pricing: you pay for input and output tokens separately (e.g., $3/million input tokens for Claude Sonnet)
  2. Context limits: models have maximum context windows (e.g., 128K tokens for GPT-4o)
  3. Response quality: important context near the edge of a long window may be effectively "forgotten"
  4. Speed: more tokens mean longer generation time
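To make the pricing point concrete, here is a back-of-envelope cost estimator. The `ModelPricing` interface is a made-up shape for this sketch, and the $3/$15 rates below are illustrative assumptions (only the $3 input rate is quoted above); always check your provider's current price sheet.

```typescript
// Back-of-envelope API cost estimate from token counts.
interface ModelPricing {
  inputPerMillion: number;  // USD per 1M input tokens
  outputPerMillion: number; // USD per 1M output tokens
}

function estimateCost(
  inputTokens: number,
  outputTokens: number,
  pricing: ModelPricing
): number {
  return (
    (inputTokens / 1_000_000) * pricing.inputPerMillion +
    (outputTokens / 1_000_000) * pricing.outputPerMillion
  );
}

// Hypothetical rates: $3/M input, $15/M output
const pricing: ModelPricing = { inputPerMillion: 3, outputPerMillion: 15 };
console.log(estimateCost(1_000_000, 1_000_000, pricing)); // 18
```

At those assumed rates, a full million tokens in and out costs $18, which is why trimming prompts and capping output length matters at scale.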

Key Takeaways

  • Tokens are chunks of text, not characters or words
  • 1 token ≈ 4 English characters or ~0.75 words
  • Pricing and limits are based on token counts
  • Use tiktoken or model APIs for accurate counting
  • Code and non-English text often tokenize less efficiently

In This Platform

Token awareness matters when designing prompts. Our system prompts in prompts/*.json are written to be concise, avoiding unnecessary verbosity that would waste tokens.

Relevant Files:
  • prompts/analysis.json
  • prompts/recommendations.json

