TDD for AI Agents
Test-Driven Development adapted for AI coding agents, using tests as anchors to prevent 'vibe coding' and ensure verifiable behavior.
Test-Driven Development (TDD) becomes even more valuable when working with AI agents. Kent Beck, creator of TDD, calls it a 'superpower' for agentic development. Tests provide the stable anchor that prevents 'vibe coding' - without them, agents iterate endlessly based on vibes rather than verifiable criteria.
The Red-Green-Generate workflow adapts TDD for AI: (1) write a failing test that specifies the desired behavior, (2) let the agent generate the implementation, (3) verify the test passes. This creates a clear exit condition - the agent iterates until tests pass, not until it 'feels done'. One caveat is the Tautology Trap: if the same agent writes both the test and the implementation, the test merely restates the code and verifies nothing, so keep the test authored by a human or a separate agent.
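The loop can be sketched in plain TypeScript without a test framework; the feature here (`slugify`) and its edge cases are invented for illustration, not taken from the source:

```typescript
// Hypothetical feature: slugify(title). Names and cases are illustrative only.

// Step 1 (Red): the spec, written before any implementation exists.
function specSlugify(slugify: (s: string) => string): void {
  const checks: Array<[string, string]> = [
    ["Hello World", "hello-world"],          // basic case
    ["  Trim  Me  ", "trim-me"],             // edge case A: extra whitespace
    ["Already-Slugged", "already-slugged"],  // edge case B: existing hyphens
  ];
  for (const [input, expected] of checks) {
    const got = slugify(input);
    if (got !== expected) {
      throw new Error(`slugify(${JSON.stringify(input)}) = ${JSON.stringify(got)}, want ${JSON.stringify(expected)}`);
    }
  }
}

// Step 2 (Green): the agent-generated implementation, iterated until the spec passes.
const slugify = (s: string): string =>
  s.trim().toLowerCase().replace(/\s+/g, "-");

// Step 3 (Verify): a thrown error means 'keep iterating'; no error means done.
specSlugify(slugify);
console.log("all specs pass");
```

The thrown error is the exit condition made concrete: the agent keeps regenerating until `specSlugify` runs silently.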
Traditional TDD assumes determinism - run a test, get the same result. AI agents are non-deterministic, so tests need flexibility. Use scoring rubrics (0-100) instead of binary pass/fail. Evaluate behaviors and reasoning, not just outputs. Run tests multiple times to catch flaky agent behavior.
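A minimal sketch of rubric scoring in TypeScript; the criteria, weights, and sample outputs are assumptions made up for illustration:

```typescript
// A criterion returns a partial credit score in [0, 1] for one behavior.
interface Criterion {
  name: string;
  weight: number;
  check: (output: string) => number;
}

// Aggregate weighted criteria into a 0-100 score instead of binary pass/fail.
function score(output: string, rubric: Criterion[]): number {
  const total = rubric.reduce((sum, c) => sum + c.weight, 0);
  const earned = rubric.reduce((sum, c) => sum + c.weight * c.check(output), 0);
  return Math.round((earned / total) * 100);
}

// Illustrative rubric: checks a behavior and a constraint, not an exact string.
const rubric: Criterion[] = [
  { name: "mentions refund policy", weight: 2, check: o => (o.includes("refund") ? 1 : 0) },
  { name: "stays under 50 words",   weight: 1, check: o => (o.split(/\s+/).length <= 50 ? 1 : 0) },
];

// Score several samples of the same prompt to surface flaky behavior.
const samples = ["We offer a refund within 30 days.", "Contact support for help."];
const scores = samples.map(s => score(s, rubric));
console.log(scores); // [100, 33]
```

Gating on the mean or minimum across repeated runs, rather than a single sample, is what catches the non-determinism that binary pass/fail hides.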
Spec-First Prompting extends TDD to conversations with AI. Before asking for an implementation, ask for the test: 'Create a Jest test that fails for [Feature X] covering edge cases A and B. Do not implement yet.' If the agent can write the right test, it has demonstrated that it understands the requirement.
There are two automation loops. The inner loop (Claude Code hooks) provides fast feedback during generation: configure PostToolUse hooks to run type checking or linting automatically when files change, so the agent 'feels' compiler errors immediately and self-corrects. The outer loop (Git hooks) is the quality gate before commit: pre-commit hooks block commits when tests fail, catching issues the agent missed.
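As a sketch of the inner loop, a PostToolUse hook in Claude Code's settings file might run the TypeScript compiler after every file edit. The matcher and command below are illustrative; verify the exact schema against the current Claude Code hooks documentation:

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          { "type": "command", "command": "npx tsc --noEmit" }
        ]
      }
    ]
  }
}
```

For the outer loop, a minimal `.git/hooks/pre-commit` script that runs the test suite (for example `npm test`) and exits non-zero on failure is enough to block the commit.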
TDD for agents shifts review earlier. Instead of reviewing generated code (low leverage), review the test specification (high leverage). If the tests are right, the implementation is very likely to be too. This aligns with the 'shift left' principle - catch problems early, when they're cheap to fix.
Key Takeaways
- TDD is a 'superpower' for AI agents - tests anchor behavior instead of relying on vibes
- Red-Green-Generate: write failing test first, let agent implement, verify test passes
- The Tautology Trap: never let the same agent write both test and implementation
- Use scoring rubrics (0-100) instead of binary pass/fail for non-deterministic systems
- Spec-First Prompting: ask agent to write tests before implementation to validate understanding
- Inner loop (Claude Code hooks): fast feedback during generation via auto-linting
- Outer loop (Git hooks): quality gate that blocks commits with failing tests
- Review test specifications (high leverage) instead of generated code (low leverage)
In This Platform
This platform uses TDD principles at build time: Zod schemas define expected structure (the 'test'), build.js validates all content against schemas (the 'verification'), and the build fails if validation fails. Every concept, exercise, and source must pass schema validation before deployment.
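The real gate lives in build.js and the Zod schemas under frontend/src/schemas/survey.ts; the following is a dependency-free TypeScript sketch of the same validate-or-fail pattern, with a `Concept` shape and rules invented purely for illustration:

```typescript
// Hypothetical content shape; the platform's actual Zod schemas will differ.
interface Concept {
  title: string;
  summary: string;
  sources: string[];
}

// The 'test': structural rules every entry must satisfy.
function validateConcept(c: Concept, index: number): string[] {
  const issues: string[] = [];
  if (c.title.trim() === "") issues.push(`entry ${index}: title is empty`);
  if (c.summary.trim() === "") issues.push(`entry ${index}: summary is empty`);
  if (c.sources.length === 0) issues.push(`entry ${index}: no sources`);
  return issues;
}

// The 'verification': run every entry through the rules; any issue fails the build.
const entries: Concept[] = [
  { title: "TDD for AI Agents", summary: "Tests as anchors.", sources: ["build.js"] },
  { title: "", summary: "Missing a title.", sources: [] }, // fails two rules
];
const issues = entries.flatMap(validateConcept);
console.log(issues.length === 0 ? "build ok" : `build failed:\n${issues.join("\n")}`);
```

The key property is the same as in the build: validation is all-or-nothing, so a single malformed entry stops deployment rather than shipping silently.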
- build.js
- frontend/src/schemas/survey.ts
- test/unit/schemas.test.ts