Agent-first IDEs (Cursor 2.0, Google Antigravity, Devin 2.0) can now orchestrate multiple autonomous agents in parallel. The new risks include review fatigue (humans cannot meaningfully review 500+ line diffs), agentic security threats (codified in the OWASP Top 10 for Agentic Applications), and multi-agent coordination failures. Mature users know how to supervise agents, apply the 'Least Agency' principle, and use layered verification (AI review + human review + tests).
Cursor 2.0: Composer Model and Multi-Agent Architecture
Cursor (Anysphere)
Cursor 2.0 represents the shift to agent-first IDEs: purpose-built Composer model for low-latency coding, up to 8 parallel agents, native browser integration, sandboxed execution. The MoE architecture and RL training show vendor investment in specialized coding models. Critical for understanding the 'agentic IDE' paradigm.
Key Findings:
Cursor 2.0 released October 29, 2025
Composer: first in-house large coding model, 4x faster than comparable models
Multi-agent: up to 8 independent AI agents in parallel via git worktrees
Is AI Creating a New Code Review Bottleneck for Senior Engineers?
The New Stack
Documents the emerging 'AI Productivity Paradox': AI increases output but creates review bottlenecks. The 91% increase in PR review times despite 21% more tasks completed captures how the bottleneck shifts from writing code to reviewing it. Critical for organizational integration dimension.
Key Findings:
Teams with heavy AI use completed 21% more tasks but PR review times increased 91%
67% of developers spend more time debugging AI-generated code
Late 2025 saw a paradigm shift to 'agent-first' IDEs: Cursor 2.0's Composer model enables 8 parallel agents, Google Antigravity's Manager view orchestrates autonomous agents, and Devin 2.0's 67% PR merge rate shows agents are production-ready. OWASP's Top 10 for Agentic AI (Dec 2025) formalized security risks including goal hijacking and tool misuse. The new skill is 'agent supervision' - not writing code, but directing and verifying AI that writes code.
Assessment Questions (10)
Maximum possible score: 46 points
Q1 (single choice, 4 pts)
How often do you use agent mode or autonomous coding features?
[0]Never / Not available to me
[1]Rarely - I prefer manual control
[2]Sometimes - for specific types of tasks
[3]Regularly - it's part of my workflow
[4]Frequently - I supervise agents daily
GitHub Copilot Agent Mode Documentation
GitHub
GitHub Copilot's agent mode enables autonomous multi-file editing, allowing AI to plan and execute complex changes across a codebase without step-by-step human approval. This capability requires careful supervision practices since agents can introduce cascading errors across multiple files. Critical for agentic_supervision dimension - assessing how organizations manage autonomous AI coding.
The 2025 DORA report introduces the 'AI Capabilities Model' identifying seven practices that amplify AI benefits. The core insight is that AI is an 'amplifier' - it magnifies existing organizational strengths AND weaknesses. Key stats: 89% of orgs prioritizing AI, 76% of devs using daily, but 39% have low trust. The trust research is critical: developers who trust AI more are more productive, but trust must be earned through organizational support (policies, training time, addressing concerns). The 451% adoption increase from acceptable-use policies is remarkable - clarity enables adoption.
Key Findings:
89% of organizations prioritizing AI integration into applications
76% of technologists rely on AI for parts of their daily work
75% of developers report positive productivity impact from AI
What scope of tasks do you delegate to agent mode?
[1]Single file edits
[1]Multi-file refactoring
[1]Implementing new features
[1]Bug fixes with test updates
[1]Framework/library migrations
[0]I don't use agent mode
BMAD-METHOD: Breakthrough Method for Agile AI Driven Development
BMad Code
BMAD represents the multi-agent orchestration approach to AI development. Unlike simple chat-based AI assistance, BMAD uses specialized agents (Analyst, Architect, Developer, QA) coordinated by an orchestrator. Key innovation: zero context loss between tasks. Represents advanced maturity in agentic workflows.
Key Findings:
19+ specialized AI agents with distinct roles (Analyst, Architect, Developer, QA)
50+ workflows covering development scenarios
Scale-adaptive intelligence adjusts to task complexity
[5]Layered approach: AI review + human review + tests
Note: Using AI to review AI (e.g., asking Opus to review changes made by Flash) is an emerging best practice
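To make the layered pattern concrete, here is a minimal sketch of such a verification gate in Python. The `call_model` helper is hypothetical (a stand-in for whatever LLM SDK the team uses), and `git` and `pytest` are assumed to be available:

```python
import subprocess

def call_model(model: str, prompt: str) -> str:
    """Hypothetical LLM wrapper - replace with your provider's SDK call."""
    raise NotImplementedError

def get_diff(base: str = "main") -> str:
    """Collect the agent's changes as a unified diff against the base branch."""
    return subprocess.run(
        ["git", "diff", base], capture_output=True, text=True, check=True
    ).stdout

def tests_pass() -> bool:
    """Layer 1: the automated test suite must pass before any review."""
    return subprocess.run(["pytest", "-q"]).returncode == 0

def layered_verification() -> None:
    if not tests_pass():
        raise SystemExit("Tests failed - reject before spending review effort.")
    # Layer 2: a second (ideally stronger) model critiques the first model's diff.
    critique = call_model(
        "reviewer-model",  # e.g., a stronger model than the one that wrote the code
        "Review this diff written by another AI agent. Flag bugs, security "
        "issues, and unjustified changes:\n\n" + get_diff(),
    )
    # Layer 3: a human reads the critique and makes the final call.
    print(critique)
    input("Approve merge? (Ctrl+C to reject) ")
```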
AI Code Review and the Best AI Code Review Tools in 2025
Qodo
Comprehensive overview of AI code review tools and AI-reviewing-AI patterns. Key for agentic_supervision dimension - validates that AI reviewing AI is an emerging best practice.
Key Findings:
84% of developers now using AI tools, 41% of code is AI-generated
Leading AI review tools: CodeRabbit, Codacy Guardrails, Snyk DeepCode
AI-to-AI review is an emerging pattern (AI reviews AI-generated code)
Official guidance on multi-agent patterns including debate (multiple models reviewing each other), TDD splits (one writes tests, another implements), and Architect/Implementer separation. Research shows diverse model debate (Claude + Gemini + GPT) achieves 91% on GSM-8K vs 82% for identical models.
Key Findings:
Separate Claude instances can communicate via shared scratchpads
Multi-agent debate with diverse models outperforms single-model approaches
Writer/Reviewer and TDD splits improve output quality
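A minimal sketch of the debate pattern, assuming a hypothetical `call_model` wrapper around each provider's API; the round structure and the majority vote over diverse models are the essential ideas:

```python
from collections import Counter

def call_model(model: str, prompt: str) -> str:
    """Hypothetical wrapper around your LLM provider's API."""
    raise NotImplementedError

def debate(question: str, models: list[str], rounds: int = 2) -> str:
    """Each model answers, then revises after reading the other models'
    answers; the most common final answer wins. Diverse models
    (e.g., Claude + Gemini + GPT) disagree more usefully than clones."""
    answers = {m: call_model(m, question) for m in models}
    for _ in range(rounds):
        for m in models:
            others = "\n---\n".join(a for k, a in answers.items() if k != m)
            answers[m] = call_model(
                m,
                f"{question}\n\nOther agents answered:\n{others}\n\n"
                "Reconsider and state your final answer.",
            )
    return Counter(answers.values()).most_common(1)[0][0]
```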
How often do you intervene during an agent mode session?
[1]Never - I let it finish completely
[2]Rarely - only if obviously going wrong
[3]Sometimes - I course-correct when needed
[4]Regularly - I treat it as pair programming
[3]Frequently - I stay engaged throughout
Note: 'Pair programming' approach scores highest - active but not micromanaging
The 'Trust, But Verify' Pattern For AI-Assisted Engineering
This article provides the conceptual framework for our trust_calibration dimension. The three principles (Blind Trust is Vulnerability, Copilot Not Autopilot, Human Accountability Remains) directly inform our survey questions. The emphasis on verification over speed aligns with METR findings. Practical guidance includes starting conservatively with AI on low-stakes tasks.
Key Findings:
Blind trust in AI-generated code is a vulnerability
AI tools function as 'Copilot, Not Autopilot'
Human verification is the new development bottleneck
How often do you reject or rollback agent mode changes?
[0]Never - I always accept
[2]Rarely - less than 10% of the time
[4]Sometimes - 10-30% of the time
[3]Often - 30-50% of the time
[1]Very often - more than 50%
Note: 10-30% rollback rate indicates calibrated supervision. Never or >50% suggests miscalibration.
Research: Quantifying GitHub Copilot's impact in the enterprise with Accenture
GitHub/Accenture
This is the primary source for the 30% acceptance rate benchmark and the 88% code retention statistic. The 95% enjoyment and 90% fulfillment stats are powerful for adoption justification. The 84% increase in successful builds directly supports the claim that AI doesn't sacrifice quality for speed. Published May 2024, so represents mature Copilot usage patterns.
Key Findings:
95% of developers said they enjoyed coding more with GitHub Copilot
90% of developers felt more fulfilled with their jobs when using GitHub Copilot
Developers accepted around 30% of GitHub Copilot's suggestions
[5]I use structured multi-session workflows (BMAD, Spec Kit)
Note: Multi-session memory is a 2025 frontier capability. Beads (Steve Yegge) and BMAD represent mature approaches to context continuity.
Introducing Beads: A Coding Agent Memory System
Steve Yegge
Beads solves the 'context loss' problem in multi-session AI development. Rather than storing tasks in unstructured markdown, Beads uses Git-backed JSONL files that agents can query for 'ready' work. Key for long-horizon tasks spanning multiple days or sessions. Represents the frontier of AI workflow tooling for persistent memory.
Key Findings:
Git-backed issue tracker designed for AI coding agents
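The core mechanism is simple enough to sketch in Python. The schema below ('id', 'status', 'deps') and the file name are illustrative assumptions, not Beads' actual format:

```python
import json
from pathlib import Path

def load_issues(path: Path) -> dict[str, dict]:
    """One issue per JSONL line. Field names ('id', 'status', 'deps')
    are illustrative, not Beads' actual schema."""
    issues = {}
    for line in path.read_text().splitlines():
        if line.strip():
            issue = json.loads(line)
            issues[issue["id"]] = issue
    return issues

def ready_work(issues: dict[str, dict]) -> list[dict]:
    """An issue is 'ready' when it is open and every dependency is closed -
    the query an agent runs at session start to pick its next task."""
    def closed(dep: str) -> bool:
        return issues.get(dep, {}).get("status") == "closed"
    return [
        i for i in issues.values()
        if i.get("status") == "open" and all(closed(d) for d in i.get("deps", []))
    ]

path = Path("issues.jsonl")  # Git-tracked, so task state survives context resets
if path.exists():
    for issue in ready_work(load_issues(path)):
        print(issue["id"])
```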
Do you review agent execution plans BEFORE letting agents make changes?
[1]No - I let agents work autonomously
[2]Sometimes - for larger tasks
[3]Usually - I use plan mode or similar
[4]Always - I review and approve plans before execution
[5]I edit and refine agent plans before execution
Note: Devin 2.0 introduced interactive planning: users can edit/approve execution plans before task execution. Google Antigravity uses Artifacts to make agent reasoning visible. Pre-execution review is a critical control.
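A minimal sketch of such a pre-execution gate; the `PlanStep` structure is an illustrative assumption, not any vendor's actual plan format:

```python
from dataclasses import dataclass, field

@dataclass
class PlanStep:
    description: str
    files: list[str] = field(default_factory=list)

def review_plan(steps: list[PlanStep]) -> list[PlanStep]:
    """Pre-execution gate: show every proposed step and let the human
    drop or keep each one before the agent touches a single file."""
    approved = []
    for n, step in enumerate(steps, 1):
        print(f"{n}. {step.description} (files: {', '.join(step.files) or 'none'})")
        if input("   keep this step? [y/n] ").strip().lower() == "y":
            approved.append(step)
    return approved

plan = [
    PlanStep("Add null check to parser", ["src/parser.py"]),
    PlanStep("Regenerate lockfile", ["poetry.lock"]),
]
approved = review_plan(plan)  # only approved steps are handed to the agent
```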
Devin 2.0: Performance Review and Enterprise Metrics
Cognition
Devin 2.0 shows maturation of autonomous coding agents: 4x faster, 67% merge rate, enterprise adoption (Goldman Sachs, Nubank). The $20/month pricing democratizes access. The 'interactive planning' feature addresses human oversight concerns. Critical for understanding the enterprise autonomous coding landscape.
Key Findings:
Devin 2.0 released April 2025 with $20/month Core plan (down from $500)
4x faster problem solving, 2x more efficient resource consumption
Google Antigravity: Agent-First Development Platform
Google
Google Antigravity represents the 'agent-first IDE' paradigm: agents work autonomously while humans supervise via Manager view. The Artifacts system addresses trust by making agent reasoning visible. Multi-model support (Gemini, Claude, GPT) shows the future is model-agnostic. Critical for agentic_supervision dimension.
Key Findings:
Announced November 18, 2025 alongside Gemini 3
Agent-first IDE paradigm (vs AI-assisted coding)
Two views: Editor view (traditional) and Manager view (agent orchestration)
Claude Code: Best Practices for Agentic Coding
Anthropic
Official best practices for Claude Code agentic development. Key for agentic_supervision dimension - demonstrates multi-file autonomous editing capabilities and supervision approaches.
Key Findings:
Claude Code is a command line tool for agentic coding
CLAUDE.md provides project-specific context and instructions
[5]I apply layered controls based on OWASP Agentic AI guidelines
Note: OWASP Top 10 for Agentic AI (Dec 2025) identifies critical risks: goal hijacking, tool misuse, privilege escalation. 'Least Agency' principle: only grant agents the minimum autonomy required. Layered controls combine scope limits + permissions + monitoring.
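A minimal sketch of what 'Least Agency' plus layered controls can look like in code; the tool names and policy shape are illustrative assumptions, not the OWASP framework's own API:

```python
import logging

# 'Least Agency': grant only the tools and paths this task actually needs.
# Tool and path names below are illustrative.
POLICY = {
    "allowed_tools": {"read_file", "edit_file", "run_tests"},  # no shell, no network
    "allowed_paths": ("src/", "tests/"),                       # scope limit
}

logging.basicConfig(level=logging.INFO)
audit = logging.getLogger("agent.audit")

def authorize(tool: str, path: str) -> bool:
    """Layered control: a permission check plus an audit log on every tool call."""
    allowed = tool in POLICY["allowed_tools"] and path.startswith(POLICY["allowed_paths"])
    audit.info("tool=%s path=%s allowed=%s", tool, path, allowed)  # monitoring layer
    return allowed

assert authorize("edit_file", "src/app.py")
assert not authorize("run_shell", "/etc/passwd")  # denied: tool never granted
```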
OWASP Top 10 for Agentic Applications 2026
OWASP GenAI Security Project
The first OWASP security framework specifically for agentic AI systems. The 'Least Agency' principle is critical for our agentic_supervision dimension. Key risks (goal hijacking, tool misuse, rogue agents) directly inform supervision recommendations. Released December 2025, this is the authoritative security guide for AI coding agents.
How do you approach multi-agent workflows (multiple AI agents working in parallel)?
[0]Never used multi-agent workflows
[1]Tried but prefer single-agent for simplicity
[2]Use occasionally for independent tasks
[3]Regularly use with isolation strategies (git worktrees, remote machines)
[4]Advanced: orchestrate agents with Manager view or similar tooling
Note: Cursor 2.0's 8-parallel-agent architecture and Google Antigravity's Manager view represent the multi-agent frontier. Proper isolation (git worktrees) prevents agents from interfering with each other.
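The worktree isolation strategy is straightforward to script yourself. A minimal sketch in Python (the branch naming and directory layout are arbitrary choices for illustration, not Cursor's actual mechanism):

```python
import subprocess
from pathlib import Path

def spawn_worktrees(tasks: list[str], repo: Path) -> list[Path]:
    """One git worktree per agent: each gets its own branch and its own
    working directory, so parallel agents cannot clobber each other's edits."""
    dirs = []
    for i, task in enumerate(tasks):
        wt = repo.parent / f"agent-{i}-{task}"
        subprocess.run(
            ["git", "-C", str(repo), "worktree", "add",
             "-b", f"agent/{task}", str(wt)],
            check=True,
        )
        dirs.append(wt)  # launch each agent with this directory as its cwd
    return dirs

# Example: three agents on three independent tasks, merged back via normal PRs.
spawn_worktrees(["parser-fix", "docs", "ci-speedup"], Path("."))
```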
Under what conditions do you use autonomous/YOLO modes (skip permission prompts)?
[3]Never - I always review each action
[3]Rarely - only for trivial tasks
[5]In sandboxed/throwaway environments only (containers, VMs, temp branches)
[1]On most projects after initial setup
[5]With explicit safeguards (network disabled, no prod credentials, isolated)
Note: YOLO/autonomous modes are powerful for prototyping but dangerous in production. Anthropic guidance: only in containers/VMs with network disabled. Sandboxed + safeguards scores highest.
YOLO Mode Safety Guidelines for AI Agents
MarkTechPost
Safety guidelines for running AI agents in autonomous mode. Key rule: only skip permission prompts in sandboxed environments (containers, VMs) without network access to production. Acceptable for prototyping; never for production work.
Key Findings:
YOLO mode (--dangerously-skip-permissions) only safe in sandboxed environments
Never use with network access to sensitive systems
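These rules can be enforced with a preflight check before enabling any autonomous mode. A minimal sketch; the sandbox heuristics and credential variable names are illustrative assumptions, and a real setup should fail closed:

```python
import os
import socket
from pathlib import Path

def looks_sandboxed() -> bool:
    """Heuristic container check (covers Docker and Podman; illustrative only)."""
    return Path("/.dockerenv").exists() or Path("/run/.containerenv").exists()

def network_disabled(host: str = "8.8.8.8", port: int = 53) -> bool:
    """Crude probe: if an outbound socket opens, the sandbox leaks."""
    try:
        socket.create_connection((host, port), timeout=1).close()
        return False
    except OSError:
        return True

def no_prod_credentials() -> bool:
    """Refuse if anything resembling a production secret is in the environment.
    The variable names are examples; extend the list for your stack."""
    risky = ("AWS_SECRET_ACCESS_KEY", "DATABASE_URL", "PROD_API_KEY")
    return not any(k in os.environ for k in risky)

if looks_sandboxed() and network_disabled() and no_prod_credentials():
    print("Preflight passed - autonomous mode acceptable in this sandbox.")
else:
    raise SystemExit("Unsafe environment: keep permission prompts on.")
```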