Veracode 2025: AI-generated code introduced security flaws in 45% of controlled tests (unreviewed raw output; rates vary by language, with Java at 72% and Python at 38%). Harness 2025: 67% of developers spend more time debugging AI-generated code than they did before. METR 2025: experienced developers perceived a 20% speedup from AI while actually being 19% slower, a dangerous perception gap. Review fatigue is now a critical concern.
October 2025 Update: GenAI Code Security Report
Veracode
Primary source for AI code security statistics: 45% overall failure rate, 72% for Java specifically. The 'bigger models ≠ more secure code' finding is critical for model_routing - security scanning is needed regardless of model. Java's 72% rate makes it the riskiest language for AI-generated code.
Key Findings:
AI-generated code introduced risky security flaws in 45% of tests
Java was the riskiest language with 72% security failure rate
XSS (CWE-80) defense failed in 86% of relevant code samples
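To make the CWE-80 finding concrete, here is a minimal Python sketch of the pattern behind basic XSS and the one-line escaping fix a reviewer or scanner needs to confirm in AI-generated code. The function names and markup are illustrative only (the Veracode tests spanned several languages, with Java the weakest), not taken from the report.

```python
import html

# Pattern commonly seen in unreviewed AI output: untrusted input interpolated
# directly into an HTML response (CWE-80, basic XSS).
def render_greeting_unsafe(username: str) -> str:
    return f"<p>Hello, {username}!</p>"  # a <script> payload would execute in the browser

# Minimal fix: escape untrusted input before it reaches HTML.
def render_greeting_safe(username: str) -> str:
    return f"<p>Hello, {html.escape(username)}!</p>"

if __name__ == "__main__":
    payload = '<script>alert("xss")</script>'
    print(render_greeting_unsafe(payload))  # script tag survives intact
    print(render_greeting_safe(payload))    # rendered inert as &lt;script&gt;...
```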
State of Software Delivery Report 2025: The Role of AI in the SDLC
Harness
Critical counterweight to AI productivity hype. The 67% debugging overhead stat directly challenges simplistic 'AI makes you faster' narratives. The governance gaps (only 48% using approved tools, 60% lacking vulnerability assessment) highlight organizational maturity requirements. The 'blast radius' finding is particularly important for the trust_calibration dimension.
Key Findings:
67% of developers spend more time debugging AI-generated code
68% spend more time resolving AI-related security vulnerabilities
92% report AI increases 'blast radius' from bad deployments
Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity
METR
This is the most rigorous 2025 study on AI coding productivity. The RCT methodology (16 experienced developers, 246 tasks, $150/hr compensation) makes this highly credible. The 39-44 percentage point gap between perceived and actual productivity is the key insight for our trust_calibration dimension. This directly supports recommendations about not over-trusting AI suggestions and maintaining verification practices.
Key Findings:
Experienced developers were 19% slower with AI
Developers perceived 20% speedup (39-44 percentage point gap)
Self-reported productivity may not reflect reality
What do you typically do before accepting a Copilot suggestion?
[0]Accept immediately if it looks roughly right
[1]Quick glance - a few seconds review
[2]Careful read-through of the code
[3]Read-through plus mental/actual execution trace
[4]Full review including running tests
[5]Security-aware review (OWASP Top 10 check)
The 'Trust, But Verify' Pattern For AI-Assisted Engineering
This article provides the conceptual framework for our trust_calibration dimension. The three principles (Blind Trust is Vulnerability, Copilot Not Autopilot, Human Accountability Remains) directly inform our survey questions. The emphasis on verification over speed aligns with METR findings. Practical guidance includes starting conservatively with AI on low-stakes tasks.
Key Findings:
Blind trust in AI-generated code is a vulnerability
AI tools function as 'Copilot, Not Autopilot'
Human verification is the new development bottleneck
Supporting source: Veracode, October 2025 Update: GenAI Code Security Report (key findings above).
Approximately what percentage of Copilot suggestions do you accept?
[3]0-20%
[4]21-35%
[3]36-50%
[2]51-70%
[0]71-100%
Note: 2025 benchmarks: an acceptance rate of 30-33% is healthy. Healthcare/regulated teams average 50-60%; startups average around 75%, which is too high. Anything above 70% is a red flag.
Research: Quantifying GitHub Copilot's impact in the enterprise with Accenture
GitHub/Accenture
This is the primary source for the 30% acceptance rate benchmark and the 88% code retention statistic. The 95% enjoyment and 90% fulfillment stats are powerful for adoption justification. The 84% increase in successful builds directly supports the claim that AI doesn't sacrifice quality for speed. Published May 2024, so represents mature Copilot usage patterns.
Key Findings:
95% of developers said they enjoyed coding more with GitHub Copilot
90% of developers felt more fulfilled with their jobs when using GitHub Copilot
Developers accepted around 30% of GitHub Copilot's suggestions
Supporting source: Qodo, State of AI Code Quality 2025 (full entry below).
In the past month, how often did you discover an error in AI-generated code AFTER accepting it?
[3]Never
[4]1-2 times
[2]3-5 times
[1]6-10 times
[0]More than 10 times
Note: 1-2 times scores highest—indicates you use AI enough to encounter issues but catch most before acceptance
Supporting sources: Veracode, October 2025 Update: GenAI Code Security Report (key findings above); Qodo, State of AI Code Quality 2025 (full entry below).
Are you aware that AI-generated code introduced security flaws in 45% of coding tests (Veracode 2025)?
[1]No, this is surprising to me
[2]I knew there were some risks but not the extent
[4]Yes, and I've adjusted my review practices
[5]Yes, we have specific security scanning for AI code
Supporting source: Veracode, October 2025 Update: GenAI Code Security Report (key findings above).
CodeRabbit 2025 found AI code has 2.74x more XSS, 8x more I/O performance issues. Do you check for these patterns?
[1]No - I wasn't aware of these specific risk multipliers
[2]I know about them but don't specifically check
[3]I manually check for XSS and performance issues in AI code
[4]We use automated tools that catch these patterns
[5]Automated + manual review with focus on AI code hot spots
Note: CodeRabbit analyzed 470 PRs and found 1.7x more issues overall in AI code, with security (2.74x XSS) and performance (8x I/O issues) as the top concerns
State of AI vs Human Code Generation Report
CodeRabbit
This is the most rigorous empirical comparison of AI vs human code quality to date. The 1.7x issue rate and specific vulnerability multipliers (2.74x XSS, 1.88x password handling) are critical for trust_calibration recommendations. Key insight: AI makes the same kinds of mistakes humans do, just more often at larger scale. The 8x I/O performance issue rate shows AI favors simple patterns over efficiency.
Key Findings:
AI-generated PRs contain 1.7x more issues overall (10.83 vs 6.45 issues per PR)
AI PRs show 1.4-1.7x more critical and major issues
Logic and correctness issues 75% more common in AI PRs
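The 8x I/O figure reflects AI's preference for the simplest pattern that works, typically re-reading or re-querying data inside a loop. Below is a hedged Python sketch of that shape; the key=value config format and function names are made up for illustration and are not taken from the CodeRabbit report.

```python
from pathlib import Path

# Shape often produced by AI assistants: the whole file is re-read and
# re-parsed for every key that is looked up.
def lookup_slow(keys: list[str], config_path: Path) -> dict[str, str]:
    results = {}
    for key in keys:
        for line in config_path.read_text().splitlines():  # full file read per key
            name, _, value = line.partition("=")
            if name.strip() == key:
                results[key] = value.strip()
    return results

# Same behavior with a single read: parse once, answer every lookup from memory.
def lookup_fast(keys: list[str], config_path: Path) -> dict[str, str]:
    parsed = {}
    for line in config_path.read_text().splitlines():
        name, _, value = line.partition("=")
        parsed[name.strip()] = value.strip()
    return {key: parsed[key] for key in keys if key in parsed}
```

The same review heuristic applies to database access: an AI-generated loop that issues one query per item is the N+1 equivalent of lookup_slow.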
GitClear found 8x increase in code duplication from AI tools. Do you monitor for this?
[1]No - I didn't know AI increases duplication
[2]I'm aware but don't actively monitor
[3]I manually review for DRY violations in AI code
[4]We use code quality tools that flag duplication
[5]We actively refactor AI-introduced duplication
Note: GitClear 2025: AI makes it easier to add new code than to reuse existing code (a limited-context effect). Refactoring dropped from 25% to under 10% of changed lines.
AI Copilot Code Quality: 2025 Data Suggests 4x Growth in Code Clones
GitClear
GitClear's research on 211M lines of code shows AI is changing how code is written - more duplication, less refactoring. NOTE: Title says '4x Growth' (2025 projection); key finding is '8x increase' (actual 2024 data). We cite the 8x figure as it's measured data. The decline in refactoring suggests AI makes it easier to add new code than reuse existing code (limited context). Critical for understanding long-term maintainability implications.
Key Findings:
8x increase in duplicated code blocks during 2024
Refactoring dropped from 25% of changed lines (2021) to <10% (2024)
Copy/pasted (cloned) code rose from 8.3% to 12.3% (2021-2024)
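One lightweight way to monitor the duplication trend GitClear describes is a clone check in the pipeline. The Python sketch below is illustrative only and is not GitClear's methodology: it hashes normalized sliding windows of lines and reports any block that appears more than once. The 6-line window and the src/ directory are assumptions; a dedicated clone detector or your existing code-quality tooling will be more robust.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

WINDOW = 6  # assumed minimum clone size: flag any 6-line block that appears twice

def normalize(line: str) -> str:
    # Collapse whitespace so formatting differences don't hide clones.
    return " ".join(line.split())

def find_duplicate_blocks(paths: list[Path], window: int = WINDOW) -> dict[str, list[tuple[str, int]]]:
    seen: dict[str, list[tuple[str, int]]] = defaultdict(list)
    for path in paths:
        lines = [normalize(line) for line in path.read_text().splitlines()]
        for start in range(max(0, len(lines) - window + 1)):
            block = "\n".join(lines[start:start + window])
            if block.strip():  # skip windows that are entirely blank
                digest = hashlib.sha1(block.encode()).hexdigest()
                seen[digest].append((str(path), start + 1))
    return {digest: hits for digest, hits in seen.items() if len(hits) > 1}

if __name__ == "__main__":
    clones = find_duplicate_blocks(sorted(Path("src").rglob("*.py")))
    for hits in clones.values():
        print("duplicated block at:", hits)
```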
With agent mode generating larger diffs, how do you manage review fatigue?
[0]I don't use agent mode / N/A
[1]I try to review everything but often skim large diffs
[2]I focus on critical paths and trust the rest
[3]I break large changes into smaller reviewable chunks
[4]I use AI code review tools (CodeRabbit, etc.) + human spot-check
[5]Layered review: AI review + focused human review + automated tests
Note: This is a new critical question for 2025. Layered AI-plus-human review is an emerging best practice; see the sketch below.
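For option [3] and the layered approaches, one concrete enforcement point is a CI gate on diff size. The Python sketch below is an assumption-laden illustration, not a recommendation from any cited report: the 400-line threshold and the origin/main base branch are placeholders. It counts changed lines with git diff --numstat and fails when a change is too large to review attentively, prompting the author to split it.

```python
import subprocess
import sys

MAX_CHANGED_LINES = 400  # assumed team threshold; above this, ask for the change to be split

def changed_lines(base: str = "origin/main", head: str = "HEAD") -> int:
    # `git diff --numstat` prints "added<TAB>deleted<TAB>path" per file.
    out = subprocess.run(
        ["git", "diff", "--numstat", f"{base}...{head}"],
        capture_output=True, text=True, check=True,
    ).stdout
    total = 0
    for line in out.splitlines():
        added, deleted, _path = line.split("\t", 2)
        if added.isdigit() and deleted.isdigit():  # binary files show "-" counts
            total += int(added) + int(deleted)
    return total

if __name__ == "__main__":
    total = changed_lines()
    if total > MAX_CHANGED_LINES:
        print(f"Diff touches {total} lines (limit {MAX_CHANGED_LINES}); split it into reviewable chunks.")
        sys.exit(1)
    print(f"Diff size OK ({total} changed lines).")
```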
State of AI Code Quality 2025
Qodo
This is the most comprehensive 2025 survey on AI code quality (609 developers). The key insight is the 'Confidence Flywheel' - context-rich suggestions reduce hallucinations, which improves quality, which builds trust. The finding that 80% of PRs don't receive human review when AI tools are enabled is critical for our agentic_supervision dimension. NOTE: The previously cited 1.7x issue rate and 41% commit stats were not found in the current report.
Key Findings:
82% of developers use AI coding tools daily or weekly
65% of developers say at least a quarter of each commit is AI-generated
Supporting source: The 'Trust, But Verify' Pattern For AI-Assisted Engineering (see entry above).
How often do you 'Accept All' AI changes without reading the diff? (Karpathy's 'vibe coding' pattern)
[5]Never - I always read diffs before accepting
[4]Rarely - only for trivial/obvious changes
[2]Sometimes - when I'm confident in the AI
[0]Often - I trust the AI's judgment
[0]Always - I fully 'vibe code' (accept without reading)
Note: Karpathy coined 'vibe coding' (Feb 2025) for accepting AI changes without reading them. Fast Company reported a 'vibe coding hangover' (Sep 2025) with 'development hell' consequences. Answering 'Sometimes' or more often (a score of 2 or below) is a critical risk flag.
Vibe Coding Definition (Original Tweet)
Andrej Karpathy
This tweet coined the term 'vibe coding' on February 3, 2025, defining it as a programming style where you 'forget that the code even exists' and 'Accept All' without reading diffs. Critically, Karpathy explicitly limits this to 'throwaway weekend projects' - a nuance often missed in subsequent coverage. The full quote shows he acknowledges the code grows 'beyond my usual comprehension' and he works around bugs rather than fixing them. This is essential context for our trust_calibration dimension: even the person who coined the term warns it's not for production work.
Key Findings:
Coined term 'vibe coding' for accepting AI changes without reading
Fast Company (September 2025)
This article documents the real-world consequences of 'vibe coding' practices going wrong. The Tea App case study is particularly powerful: a dating app built with minimal oversight leaked 72,000 images including driver's licenses due to a misconfigured Firebase bucket - a basic security error that proper review would have caught. PayPal engineer Jack Hays calls AI-generated code 'development hell' to maintain. Stack Overflow data shows declining trust (46% distrust vs 33% trust) and positive sentiment falling from 70% to 60%. This is essential evidence for our trust_calibration and agentic_supervision dimensions.
Key Findings:
Documented 'vibe coding hangover' phenomenon
Teams in 'development hell' from unreviewed AI code
Tea App data breach: 72,000 sensitive images leaked from unsecured Firebase
If you sometimes accept AI changes without reading them: in what contexts?
[2]Prototypes and throwaway code
[1]Test code generation
[2]Documentation and comments
[0]Refactoring existing code
[0]New feature implementation
[0]Production code
Note: Vibe coding on prototypes and docs is lower risk; on production code or new features it is a critical risk, hence the score of 0 for those contexts.
Supporting sources: Andrej Karpathy's original 'vibe coding' tweet and Fast Company's 'vibe coding hangover' coverage (see entries above).