1. Define completion criteria and capability/regression evals.
2. Run baseline evals and capture failure signatures.
3. Decompose the task into agent-sized units (15-minute rule).
4. Route each unit to the appropriate model tier (Haiku, Sonnet, Opus).
5. Execute implementation for each unit.
6. Re-run evals and compare deltas.
7. Review code focusing on invariants, edge cases, and security.
8. Track cost metrics and escalate model tier only when necessary.