GitHub Copilot now supports Claude (Opus/Sonnet), GPT-4, and Gemini, and Cursor and Windsurf offer even more options. Experienced users know that different models excel at different tasks, and that the roughly 100x cost difference between models matters. IDC predicts 70% of top enterprises will use dynamic model routing by 2028.
GitHub Copilot Multi-Model Support
GitHub
GitHub Copilot's multi-model support enables developers to choose the best model for each task. Key for model_routing dimension.
Key Findings:
GitHub Copilot supports multiple AI models from Anthropic, Google, and OpenAI
GitHub CEO: 'The era of a single model is over'
Developers can toggle between models during conversation
Anthropic API Pricing
Anthropic
Claude's tiered pricing shows a 25x cost difference from Opus ($5/$25) to Haiku ($1/$5) per million tokens, with the Batch API offering 50% discounts and prompt caching up to 90% savings. Understanding these tiers is fundamental to cost-aware model routing: developers must evaluate whether a task requires Opus's advanced reasoning or whether Haiku's speed and efficiency suffice. Key for model_routing dimension.
Key Findings:
Claude Opus 4.5: $5/$25 per million tokens (66% price drop from Opus 4.1)
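The pricing arithmetic above can be sketched as a quick cost estimator. This is a minimal illustration: the prices are the per-million-token figures quoted in this section, and the Batch API discount and prompt-cache savings are applied as simple multipliers, which is a simplification of the real billing rules.

```python
# Rough per-request cost estimator using the per-million-token prices
# quoted above (Opus 4.5: $5 in / $25 out; Haiku: $1 in / $5 out).
PRICES = {
    "opus-4.5": {"input": 5.00, "output": 25.00},
    "haiku": {"input": 1.00, "output": 5.00},
}

def estimate_cost(model, input_tokens, output_tokens,
                  batch=False, cached_input_tokens=0):
    """Estimate USD cost for one request.

    batch: Batch API modeled as a flat 50% discount on the request.
    cached_input_tokens: cache reads modeled at 10% of the normal
    input rate (the 'up to 90% savings' mentioned above).
    """
    p = PRICES[model]
    fresh = input_tokens - cached_input_tokens
    cost = (fresh * p["input"]
            + cached_input_tokens * p["input"] * 0.10
            + output_tokens * p["output"]) / 1_000_000
    return cost * 0.5 if batch else cost

# Same workload (100k in / 10k out) on both tiers:
opus_cost = estimate_cost("opus-4.5", 100_000, 10_000)   # $0.75
haiku_cost = estimate_cost("haiku", 100_000, 10_000)     # $0.15
```

Running the same workload through both tiers makes the routing question concrete: a 5x gap per request, before batch or caching discounts, so the decision hinges on whether the task actually needs the frontier model.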
IDC predicts model routing will become standard in enterprise AI. Different models excel at different tasks, and using a single model for everything means suboptimal results. Heavy users have the most to gain from routing.
Key Findings:
By 2028, 70% of top AI-driven enterprises will use multi-tool architectures with dynamic model routing
AI models work best when somewhat specialized for targeted use cases
Even SOTA models are delivered as mixtures of experts with routing
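Dynamic model routing as described above can be illustrated with a minimal rule table mapping task profiles to models. The task labels and model names here are placeholders for whatever your stack exposes, not a recommended production policy.

```python
# Toy dynamic router: map a task profile to a model tier.
# Task labels and model names are illustrative placeholders.
ROUTES = {
    "complex_refactor": "claude-opus",   # deep multi-file reasoning
    "boilerplate": "gemini-flash",       # fast and cheap
    "debugging": "reasoning-model",      # extended thinking
    "long_context": "gemini-pro-1m",     # 1M-token window
}

def route(task_type, default="claude-sonnet"):
    """Pick a model for a task; fall back to a general-purpose default."""
    return ROUTES.get(task_type, default)

print(route("boilerplate"))    # gemini-flash
print(route("code_review"))    # claude-sonnet (fallback)
```

Production routers classify tasks automatically rather than by explicit label, but the core idea is the same: a policy layer in front of multiple models, with a sensible general-purpose default.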
Claude Opus 4.5 leads SWE-bench (80.9%), but GPT-5-Codex (74.5%) and Gemini Flash (faster, cheaper) each excel at different tasks. Augment Code's production data shows developers assembling 'model alloys'—matching Sonnet 4.5 to multi-file reasoning, Sonnet 4.0 to fast structured tasks, GPT-5 to explanatory contexts. The skill gap has moved from 'which is best?' to 'best for what?'
Assessment Questions (5)
Maximum possible score: 19 points
Q1 · single choice · 4 pts
Are you aware that GitHub Copilot supports multiple AI models (Claude, GPT, Gemini)?
[0]No, I didn't know this
[1]Yes, but I always use the default
[2]Yes, I've tried different models occasionally
[4]Yes, I regularly switch based on the task
Q2 · single choice · 4 pts
How do you select AI models for different coding tasks?
[0]I use whatever is default—I don't think about model selection
[1]I use the same model for everything (my favorite)
[2]I have rough preferences (e.g., Claude for refactoring, GPT for docs)
[3]I systematically match models to tasks, considering cost and speed tradeoffs
[4]I assemble 'model alloys'—matching cognitive styles (reasoning vs fast) to task profiles
Developers are choosing older AI models — and the data explain why
Augment Code
Data-driven analysis showing that in production environments, developers are diversifying model usage rather than consolidating around the newest option. Sonnet 4.5 excels at multi-file reasoning but introduces latency; Sonnet 4.0 is faster and more consistent for structured tasks; GPT-5 excels at explanatory contexts. This supports the need for model routing strategies rather than single-model approaches.
Key Findings:
Model adoption is diversifying, not consolidating around one 'best' model
Developers match models to specific task profiles rather than always using newest
Sonnet 4.5 share dropped from 66% to 52% while Sonnet 4.0 rose from 23% to 37%
Q3 · multiple choice · 5 pts
Which of the following model-task pairings do you use?
[1]Claude Opus/Sonnet for complex refactoring or architecture
[1]Gemini Flash or GPT-3.5 for simple tasks (tests, docs)
[1]Reasoning models (o3, Claude thinking) for debugging
[1]Gemini for very long context (1M+ tokens)
[1]Thinking triggers (think hard, ultrathink) for complex problems
[0]I don't think about model selection
Note: Thinking triggers (ultrathink) activate extended reasoning budgets in Claude Code. Gemini Deep Think uses parallel reasoning. Different cognitive styles for different tasks.
Claude Opus 4.5
Anthropic
Claude Opus 4.5 sets a new bar for AI coding with 80.9% SWE-bench Verified. Key for model_routing dimension - represents current state-of-the-art for complex coding tasks.
Key Findings:
80.9% on SWE-bench Verified - first AI model over 80%
Gemini's 1M token context window is among the largest available, enabling whole-codebase understanding. Key for context_curation and model_routing dimensions.
Key Findings:
Gemini models support up to 1M token context window (1,048,576 tokens)
Can process hours of video, audio, and 60,000+ lines of code in single context
Gemini 2.5 Pro, 2.5 Flash, 3.0 Pro, 3.0 Flash all support 1M tokens
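Whether a codebase fits a 1M-token window can be estimated with a rough heuristic. The ~4 characters per token figure below is a common rule of thumb, not an exact tokenizer count, which varies by model and content.

```python
# Rough check of whether a codebase fits in a 1M-token context window.
# Assumes ~4 characters per token (a heuristic; real tokenizers vary).
CONTEXT_LIMIT = 1_048_576  # Gemini's 1M-token window, per above

def fits_in_context(total_chars, chars_per_token=4):
    """Return (fits, estimated_tokens) for a body of text."""
    tokens = total_chars / chars_per_token
    return tokens <= CONTEXT_LIMIT, int(tokens)

# 60,000 lines at ~40 chars/line ≈ 2.4M chars ≈ 600k tokens
ok, tokens = fits_in_context(60_000 * 40)
```

This back-of-the-envelope check is consistent with the claim above that 60,000+ lines of code can fit in a single 1M-token context.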
Gemini 2.5 Pro (March 2025) introduced 'thinking models' with a 1M-token context window. Deep Think mode extends inference time for complex reasoning tasks, achieving Bronze-level IMO performance. Gemini 3 Pro, announced in November 2025, replaces 2.5 Pro as the flagship. Critical for understanding the 'reasoning model' paradigm shift and extended thinking capabilities.
Key Findings:
Gemini 2.5 Pro released March 2025 with 1M token context window
Deep Think mode uses extended inference time for complex reasoning
Q4 · single choice · 3 pts
Are you aware of the cost differences between AI models?
[0]No, I don't think about cost
[1]Vaguely—I know some are more expensive
[2]Yes, I know approximate cost ratios
[3]Yes, I factor cost into model selection decisions
Q5 · single choice · 3 pts
When selecting a model for a task, do you consider speed/latency tradeoffs?
[0]No, I don't think about latency
[1]Sometimes—I notice when a model is slow but don't switch
[2]Yes, I use faster models for simple tasks to avoid waiting
[3]Yes, I balance latency, quality, and cost based on task urgency
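Balancing latency, quality, and cost by task urgency, as in the top-scoring answer above, can be sketched as a simple weighted score. The per-model numbers and weights below are invented for illustration, not measured benchmarks.

```python
# Sketch of balancing latency, quality, and cost by task urgency.
# Model stats and weights are illustrative placeholders.
MODELS = {
    "fast-small": {"quality": 0.6, "latency_s": 1.0, "cost": 0.1},
    "frontier":   {"quality": 0.9, "latency_s": 8.0, "cost": 1.0},
}

def pick_model(urgency):
    """urgency in [0, 1]: 1 = interactive work, latency dominates;
    0 = background batch job, quality per dollar dominates."""
    def score(name):
        s = MODELS[name]
        return (s["quality"]
                - urgency * 0.05 * s["latency_s"]
                - (1 - urgency) * 0.2 * s["cost"])
    return max(MODELS, key=score)
```

With these numbers, an interactive task (urgency 1.0) routes to the fast model and a background task (urgency 0.0) routes to the frontier model, capturing the tradeoff the question is probing.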