Agent-first IDEs (Cursor 2.0, Google Antigravity, Devin 2.0) can now orchestrate multiple autonomous agents in parallel. The new risks include review fatigue (humans cannot meaningfully review 500+ line diffs), agentic security threats (codified in the OWASP Top 10 for Agentic Applications), and multi-agent coordination failures. Mature users know how to supervise agents, apply the 'Least Agency' principle, and use layered verification (AI review + human review + tests).
Cursor 2.0: Composer Model and Multi-Agent Architecture
Cursor (Anysphere)
Cursor 2.0 represents the shift to agent-first IDEs: purpose-built Composer model for low-latency coding, up to 8 parallel agents, native browser integration, sandboxed execution. The MoE architecture and RL training show vendor investment in specialized coding models. Critical for understanding the 'agentic IDE' paradigm.
Key Findings:
Cursor 2.0 released October 29, 2025
Composer: first in-house large coding model, 4x faster than comparable models
Multi-agent: up to 8 independent AI agents in parallel via git worktrees
Is AI Creating a New Code Review Bottleneck for Senior Engineers?
The New Stack
Documents the emerging 'AI Productivity Paradox': AI increases output but creates review bottlenecks. The 91% increase in PR review times despite 21% more tasks completed captures how the bottleneck shifts from writing code to reviewing it. Critical for organizational integration dimension.
Key Findings:
Teams with heavy AI use completed 21% more tasks but PR review times increased 91%
67% of developers spend more time debugging AI-generated code
Late 2025 saw a paradigm shift to 'agent-first' IDEs: Cursor 2.0's Composer model enables 8 parallel agents, Google Antigravity's Manager view orchestrates autonomous agents, and Devin 2.0's 67% PR merge rate shows agents are production-ready. OWASP's Top 10 for Agentic AI (Dec 2025) formalized security risks including goal hijacking and tool misuse. The new skill is 'agent supervision' - not writing code, but directing and verifying AI that writes code.
Assessment Questions (10)
Maximum possible score: 46 points
Q1 (single choice, 4 pts)
How often do you use agent mode or autonomous coding features?
[0]Never / Not available to me
[1]Rarely - I prefer manual control
[2]Sometimes - for specific types of tasks
[3]Regularly - it's part of my workflow
[4]Frequently - I supervise agents daily
GitHub Copilot Agent Mode Documentation
GitHub
GitHub Copilot's agent mode enables autonomous multi-file editing, allowing AI to plan and execute complex changes across a codebase without step-by-step human approval. This capability requires careful supervision practices since agents can introduce cascading errors across multiple files. Critical for agentic_supervision dimension - assessing how organizations manage autonomous AI coding.
The 2025 DORA report introduces the 'AI Capabilities Model' identifying seven practices that amplify AI benefits. The core insight is that AI is an 'amplifier' - it magnifies existing organizational strengths AND weaknesses. Key stats: 89% of orgs prioritizing AI, 76% of devs using daily, but 39% have low trust. The trust research is critical: developers who trust AI more are more productive, but trust must be earned through organizational support (policies, training time, addressing concerns). The 451% adoption increase from acceptable-use policies is remarkable - clarity enables adoption.
Key Findings:
89% of organizations prioritizing AI integration into applications
76% of technologists rely on AI for parts of their daily work
75% of developers report positive productivity impact from AI
What scope of tasks do you delegate to agent mode?
[1]Single file edits
[1]Multi-file refactoring
[1]Implementing new features
[1]Bug fixes with test updates
[1]Framework/library migrations
[0]I don't use agent mode
BMAD-METHOD: Breakthrough Method for Agile AI Driven Development
BMad Code
BMAD represents the multi-agent orchestration approach to AI development. Unlike simple chat-based AI assistance, BMAD uses specialized agents (Analyst, Architect, Developer, QA) coordinated by an orchestrator. Key innovation: zero context loss between tasks. Represents advanced maturity in agentic workflows.
Key Findings:
19+ specialized AI agents with distinct roles (Analyst, Architect, Developer, QA)
50+ workflows covering development scenarios
Scale-adaptive intelligence adjusts to task complexity
[5]Layered approach: AI review + human review + tests
Note: Using AI to review AI (e.g., asking Opus to review changes made by Flash) is an emerging best practice
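To make the layered pattern concrete, here is a minimal sketch of such a verification gate in Python. The `call_model` helper is hypothetical (a stand-in for whatever LLM SDK the team uses), and `git` and `pytest` are assumed to be available:

```python
import subprocess

def call_model(model: str, prompt: str) -> str:
    """Hypothetical LLM wrapper - replace with your provider's SDK call."""
    raise NotImplementedError

def get_diff(base: str = "main") -> str:
    """Collect the agent's changes as a unified diff against the base branch."""
    return subprocess.run(
        ["git", "diff", base], capture_output=True, text=True, check=True
    ).stdout

def tests_pass() -> bool:
    """Layer 1: the automated test suite must pass before any review."""
    return subprocess.run(["pytest", "-q"]).returncode == 0

def layered_verification() -> None:
    if not tests_pass():
        raise SystemExit("Tests failed - reject before spending review effort.")
    # Layer 2: a second (ideally stronger) model critiques the first model's diff.
    critique = call_model(
        "reviewer-model",  # e.g., a stronger model than the one that wrote the code
        "Review this diff written by another AI agent. Flag bugs, security "
        "issues, and unjustified changes:\n\n" + get_diff(),
    )
    # Layer 3: a human reads the critique and makes the final call.
    print(critique)
    input("Approve merge? (Ctrl+C to reject) ")
```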
AI Code Review and the Best AI Code Review Tools in 2025
Qodo
Comprehensive overview of AI code review tools and AI-reviewing-AI patterns. Key for agentic_supervision dimension - validates that AI reviewing AI is an emerging best practice.
Key Findings:
84% of developers now using AI tools, 41% of code is AI-generated
Leading AI review tools: CodeRabbit, Codacy Guardrails, Snyk DeepCode
AI-to-AI review is an emerging pattern (AI reviews AI-generated code)
Official guidance on multi-agent patterns including debate (multiple models reviewing each other), TDD splits (one writes tests, another implements), and Architect/Implementer separation. Research shows diverse model debate (Claude + Gemini + GPT) achieves 91% on GSM-8K vs 82% for identical models.
Key Findings:
Separate Claude instances can communicate via shared scratchpads
Multi-agent debate with diverse models outperforms single-model approaches
Writer/Reviewer and TDD splits improve output quality
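A minimal sketch of the debate pattern, assuming a hypothetical `call_model` wrapper around each provider's API; the round structure and the majority vote over diverse models are the essential ideas:

```python
from collections import Counter

def call_model(model: str, prompt: str) -> str:
    """Hypothetical wrapper around your LLM provider's API."""
    raise NotImplementedError

def debate(question: str, models: list[str], rounds: int = 2) -> str:
    """Each model answers, then revises after reading the other models'
    answers; the most common final answer wins. Diverse models
    (e.g., Claude + Gemini + GPT) disagree more usefully than clones."""
    answers = {m: call_model(m, question) for m in models}
    for _ in range(rounds):
        for m in models:
            others = "\n---\n".join(a for k, a in answers.items() if k != m)
            answers[m] = call_model(
                m,
                f"{question}\n\nOther agents answered:\n{others}\n\n"
                "Reconsider and state your final answer.",
            )
    return Counter(answers.values()).most_common(1)[0][0]
```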
How often do you intervene during an agent mode session?
[1]Never - I let it finish completely
[2]Rarely - only if obviously going wrong
[3]Sometimes - I course-correct when needed
[4]Regularly - I treat it as pair programming
[3]Frequently - I stay engaged throughout
Note: 'Pair programming' approach scores highest - active but not micromanaging
The 'Trust, But Verify' Pattern For AI-Assisted Engineering
This article provides the conceptual framework for our trust_calibration dimension. The three principles (Blind Trust is Vulnerability, Copilot Not Autopilot, Human Accountability Remains) directly inform our survey questions. The emphasis on verification over speed aligns with METR findings. Practical guidance includes starting conservatively with AI on low-stakes tasks.
Key Findings:
Blind trust in AI-generated code is a vulnerability
AI tools function as 'Copilot, Not Autopilot'
Human verification is the new development bottleneck
How often do you reject or rollback agent mode changes?
[0]Never - I always accept
[2]Rarely - less than 10% of the time
[4]Sometimes - 10-30% of the time
[3]Often - 30-50% of the time
[1]Very often - more than 50%
Note: 10-30% rollback rate indicates calibrated supervision. Never or >50% suggests miscalibration.
Research: Quantifying GitHub Copilot's impact in the enterprise with Accenture
GitHub/Accenture
This is the primary source for the 30% acceptance rate benchmark and the 88% code retention statistic. The 95% enjoyment and 90% fulfillment stats are powerful for adoption justification. The 84% increase in successful builds directly supports the claim that AI doesn't sacrifice quality for speed. Published May 2024, so represents mature Copilot usage patterns.
Key Findings:
95% of developers said they enjoyed coding more with GitHub Copilot
90% of developers felt more fulfilled with their jobs when using GitHub Copilot
Developers accepted around 30% of GitHub Copilot's suggestions
[5]I use structured multi-session workflows (BMAD, Spec Kit)
Note: Multi-session memory is a 2025 frontier capability. Beads (Steve Yegge) and BMAD represent mature approaches to context continuity.
Introducing Beads: A Coding Agent Memory System
Steve Yegge
Beads solves the 'context loss' problem in multi-session AI development. Rather than storing tasks in unstructured markdown, Beads uses Git-backed JSONL files that agents can query for 'ready' work. Key for long-horizon tasks spanning multiple days or sessions. Represents the frontier of AI workflow tooling for persistent memory.
Key Findings:
Git-backed issue tracker designed for AI coding agents
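The core mechanism is simple enough to sketch in Python. The schema below ('id', 'status', 'deps') and the file name are illustrative assumptions, not Beads' actual format:

```python
import json
from pathlib import Path

def load_issues(path: Path) -> dict[str, dict]:
    """One issue per JSONL line. Field names ('id', 'status', 'deps')
    are illustrative, not Beads' actual schema."""
    issues = {}
    for line in path.read_text().splitlines():
        if line.strip():
            issue = json.loads(line)
            issues[issue["id"]] = issue
    return issues

def ready_work(issues: dict[str, dict]) -> list[dict]:
    """An issue is 'ready' when it is open and every dependency is closed -
    the query an agent runs at session start to pick its next task."""
    def closed(dep: str) -> bool:
        return issues.get(dep, {}).get("status") == "closed"
    return [
        i for i in issues.values()
        if i.get("status") == "open" and all(closed(d) for d in i.get("deps", []))
    ]

path = Path("issues.jsonl")  # Git-tracked, so task state survives context resets
if path.exists():
    for issue in ready_work(load_issues(path)):
        print(issue["id"])
```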
Do you review agent execution plans BEFORE letting agents make changes?
[1]No - I let agents work autonomously
[2]Sometimes - for larger tasks
[3]Usually - I use plan mode or similar
[4]Always - I review and approve plans before execution
[5]I edit and refine agent plans before execution
Note: Devin 2.0 introduced interactive planning: users can edit/approve execution plans before task execution. Google Antigravity uses Artifacts to make agent reasoning visible. Pre-execution review is a critical control.
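A minimal sketch of such a pre-execution gate; the `PlanStep` structure is an illustrative assumption, not any vendor's actual plan format:

```python
from dataclasses import dataclass, field

@dataclass
class PlanStep:
    description: str
    files: list[str] = field(default_factory=list)

def review_plan(steps: list[PlanStep]) -> list[PlanStep]:
    """Pre-execution gate: show every proposed step and let the human
    drop or keep each one before the agent touches a single file."""
    approved = []
    for n, step in enumerate(steps, 1):
        print(f"{n}. {step.description} (files: {', '.join(step.files) or 'none'})")
        if input("   keep this step? [y/n] ").strip().lower() == "y":
            approved.append(step)
    return approved

plan = [
    PlanStep("Add null check to parser", ["src/parser.py"]),
    PlanStep("Regenerate lockfile", ["poetry.lock"]),
]
approved = review_plan(plan)  # only approved steps are handed to the agent
```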
Devin 2.0: Performance Review and Enterprise Metrics
Cognition
Devin 2.0 shows maturation of autonomous coding agents: 4x faster, 67% merge rate, enterprise adoption (Goldman Sachs, Nubank). The $20/month pricing democratizes access. The 'interactive planning' feature addresses human oversight concerns. Critical for understanding the enterprise autonomous coding landscape.
Key Findings:
Devin 2.0 released April 2025 with $20/month Core plan (down from $500)
4x faster problem solving, 2x more efficient resource consumption
Google Antigravity: Agent-First Development Platform
Google
Google Antigravity represents the 'agent-first IDE' paradigm: agents work autonomously while humans supervise via Manager view. The Artifacts system addresses trust by making agent reasoning visible. Multi-model support (Gemini, Claude, GPT) shows the future is model-agnostic. Critical for agentic_supervision dimension.
Key Findings:
Announced November 18, 2025 alongside Gemini 3
Agent-first IDE paradigm (vs AI-assisted coding)
Two views: Editor view (traditional) and Manager view (agent orchestration)
Claude Code: Best Practices for Agentic Coding
Anthropic
Official best practices for Claude Code agentic development. Key for agentic_supervision dimension - demonstrates multi-file autonomous editing capabilities and supervision approaches.
Key Findings:
Claude Code is a command line tool for agentic coding
CLAUDE.md provides project-specific context and instructions
[5]I apply layered controls based on OWASP Agentic AI guidelines
Note: OWASP Top 10 for Agentic AI (Dec 2025) identifies critical risks: goal hijacking, tool misuse, privilege escalation. 'Least Agency' principle: only grant agents the minimum autonomy required. Layered controls combine scope limits + permissions + monitoring.
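A minimal sketch of what 'Least Agency' plus layered controls can look like in code; the tool names and policy shape are illustrative assumptions, not the OWASP framework's own API:

```python
import logging

# 'Least Agency': grant only the tools and paths this task actually needs.
# Tool and path names below are illustrative.
POLICY = {
    "allowed_tools": {"read_file", "edit_file", "run_tests"},  # no shell, no network
    "allowed_paths": ("src/", "tests/"),                       # scope limit
}

logging.basicConfig(level=logging.INFO)
audit = logging.getLogger("agent.audit")

def authorize(tool: str, path: str) -> bool:
    """Layered control: a permission check plus an audit log on every tool call."""
    allowed = tool in POLICY["allowed_tools"] and path.startswith(POLICY["allowed_paths"])
    audit.info("tool=%s path=%s allowed=%s", tool, path, allowed)  # monitoring layer
    return allowed

assert authorize("edit_file", "src/app.py")
assert not authorize("run_shell", "/etc/passwd")  # denied: tool never granted
```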
OWASP Top 10 for Agentic Applications 2026
OWASP GenAI Security Project
The first OWASP security framework specifically for agentic AI systems. The 'Least Agency' principle is critical for our agentic_supervision dimension. Key risks (goal hijacking, tool misuse, rogue agents) directly inform supervision recommendations. Released December 2025, this is the authoritative security guide for AI coding agents.
How do you approach multi-agent workflows (multiple AI agents working in parallel)?
[0]Never used multi-agent workflows
[1]Tried but prefer single-agent for simplicity
[2]Use occasionally for independent tasks
[3]Regularly use with isolation strategies (git worktrees, remote machines)
[4]Advanced: orchestrate agents with Manager view or similar tooling
Note: Cursor 2.0's 8-parallel-agent architecture and Google Antigravity's Manager view represent the multi-agent frontier. Proper isolation (git worktrees) prevents agents from interfering with each other.
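The worktree isolation strategy is straightforward to script yourself. A minimal sketch in Python (the branch naming and directory layout are arbitrary choices for illustration, not Cursor's actual mechanism):

```python
import subprocess
from pathlib import Path

def spawn_worktrees(tasks: list[str], repo: Path) -> list[Path]:
    """One git worktree per agent: each gets its own branch and its own
    working directory, so parallel agents cannot clobber each other's edits."""
    dirs = []
    for i, task in enumerate(tasks):
        wt = repo.parent / f"agent-{i}-{task}"
        subprocess.run(
            ["git", "-C", str(repo), "worktree", "add",
             "-b", f"agent/{task}", str(wt)],
            check=True,
        )
        dirs.append(wt)  # launch each agent with this directory as its cwd
    return dirs

# Example: three agents on three independent tasks, merged back via normal PRs.
spawn_worktrees(["parser-fix", "docs", "ci-speedup"], Path("."))
```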
Under what conditions do you use autonomous/YOLO modes (skip permission prompts)?
[3]Never - I always review each action
[3]Rarely - only for trivial tasks
[5]In sandboxed/throwaway environments only (containers, VMs, temp branches)
[1]On most projects after initial setup
[5]With explicit safeguards (network disabled, no prod credentials, isolated)
Note: YOLO/autonomous modes are powerful for prototyping but dangerous in production. Anthropic guidance: only in containers/VMs with network disabled. Sandboxed + safeguards scores highest.
YOLO Mode Safety Guidelines for AI Agents
MarkTechPost
Safety guidelines for running AI agents in autonomous mode. Key rule: only skip permission prompts in sandboxed environments (containers, VMs) without network access to production. Acceptable for prototyping; never for production work.
Key Findings:
YOLO mode (--dangerously-skip-permissions) only safe in sandboxed environments
Never use with network access to sensitive systems
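These rules can be enforced with a preflight check before enabling any autonomous mode. A minimal sketch; the sandbox heuristics and credential variable names are illustrative assumptions, and a real setup should fail closed:

```python
import os
import socket
from pathlib import Path

def looks_sandboxed() -> bool:
    """Heuristic container check (covers Docker and Podman; illustrative only)."""
    return Path("/.dockerenv").exists() or Path("/run/.containerenv").exists()

def network_disabled(host: str = "8.8.8.8", port: int = 53) -> bool:
    """Crude probe: if an outbound socket opens, the sandbox leaks."""
    try:
        socket.create_connection((host, port), timeout=1).close()
        return False
    except OSError:
        return True

def no_prod_credentials() -> bool:
    """Refuse if anything resembling a production secret is in the environment.
    The variable names are examples; extend the list for your stack."""
    risky = ("AWS_SECRET_ACCESS_KEY", "DATABASE_URL", "PROD_API_KEY")
    return not any(k in os.environ for k in risky)

if looks_sandboxed() and network_disabled() and no_prod_credentials():
    print("Preflight passed - autonomous mode acceptable in this sandbox.")
else:
    raise SystemExit("Unsafe environment: keep permission prompts on.")
```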