Recommended Agent Configuration for Coding and Review
Standard pre-commit tooling catches mechanical defects. The agent configuration described here covers what standard tooling cannot: semantic logic errors, subtle security patterns, missing timeout propagation, and concurrency anti-patterns. Both layers are required. Neither replaces the other.
For the pre-commit gate sequence this configuration enforces, see the Pipeline Reference Architecture. For the defect sources each gate addresses, see the Systemic Defect Fixes catalog.
System Architecture
The coding agent system has two levels. The orchestrator manages sessions and routes work. Specialized agents execute within a session’s boundaries. Review sub-agents run in parallel as a pre-commit gate, each responsible for exactly one defect concern.
Separation principle: The orchestrator does not write code. The implementation agent does not review code. Review agents do not modify code. Each agent has one responsibility. This is the same separation of concerns that pipeline enforcement applies at the CI level - brought to the pre-commit level.
Every agent boundary is a token budget boundary. What the orchestrator passes to the implementation agent, what it passes to the review orchestrator, and what each sub-agent receives and returns are all token cost decisions. The configuration below applies the tokenomics strategies concretely: model routing by task complexity, structured outputs between agents, prompt caching through stable system prompts placed first in each context, and minimum-necessary-context rules at every boundary.
The Orchestrator
The orchestrator manages session lifecycle and controls what context each agent receives. It does not generate implementation code. Its job is routing and context hygiene.
Recommended model tier: Small to mid. The orchestrator routes, assembles context, and writes session summaries. It does not reason about code. A frontier model here wastes tokens on a task that does not require frontier reasoning.
Responsibilities:
- Initialize each session with the correct context subset (per Small-Batch Sessions)
- Delegate implementation to the implementation agent
- Trigger the review orchestrator when the implementation agent reports completion
- Write the session summary on commit and reset context for the next session
- Enforce the pipeline-red rule (ACD constraint 8): if the pipeline is failing, route only to pipeline-restore mode; block new feature work
Rules injected into the orchestrator system prompt:
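The canonical rule text is implementation-specific; a plausible sketch, inferred from the responsibilities listed above:

```
- You route work and assemble context. You never write or modify code.
- Give each agent the minimum context its task requires, and nothing more.
- If the pipeline is red, route work only to pipeline-restore mode (/fix)
  and block all new feature work until it is green.
- On commit, write the session summary, then reset context before the
  next session starts.
```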
The Implementation Agent
The implementation agent generates test code and production code for the current BDD scenario. It operates within the context the orchestrator provides and does not reach outside that context.
Recommended model tier: Mid to frontier. Code generation and test-first implementation require strong reasoning. This is the highest-value task in the session - invest model capability here. Output verbosity should be controlled explicitly: the agent returns code only, not explanations or rationale, unless the orchestrator requests them.
Receives from the orchestrator:
- Intent summary
- The one BDD scenario for this session
- Feature description (constraints, architecture, performance budgets)
- Relevant existing files
- Prior session summary
Rules injected into the implementation agent system prompt:
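As with the orchestrator, the exact wording will vary; one plausible rule set for this agent's scope:

```
- Implement only the one BDD scenario provided for this session.
- Write the failing test before the production code.
- Return code only - no explanations or rationale unless the
  orchestrator explicitly requests them.
- Do not read or modify anything outside the context you were given.
```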
The Review Orchestrator
The review orchestrator runs after the implementation agent reports completion and before commit. It invokes all four review sub-agents in parallel against the staged diff, collects their findings, and returns a single structured decision.
Recommended model tier: Small. The review orchestrator does no reasoning itself - it invokes sub-agents and aggregates their structured output. A small model handles this coordination cheaply.
Receives:
- The staged diff for this session
- The BDD scenario being implemented (for intent alignment checks)
- The feature description (for architectural constraint checks)
Returns: A JSON object so the orchestrator can parse findings without a natural language step. Structured output here eliminates ambiguity and reduces the token cost of the aggregation step.
An empty findings array with "decision": "pass" means all sub-agents passed. A non-empty findings array always accompanies "decision": "block".
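An illustrative shape for that object - decision and findings come from the description above; the per-finding fields (agent, severity, location, summary) are assumptions about what a useful finding record carries:

```json
{
  "decision": "block",
  "findings": [
    {
      "agent": "security-review",
      "severity": "high",
      "location": "src/orders/handler.py:42",
      "summary": "State-changing operation has no authorization check"
    }
  ]
}
```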
Rules injected into the review orchestrator system prompt:
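A sketch of plausible rules, given that this agent coordinates and never reasons about code itself:

```
- Invoke all four review sub-agents in parallel against the staged diff.
- Aggregate their findings verbatim; do not reinterpret or filter them.
- Return the JSON decision object only - no natural language.
- Never modify code. If any sub-agent blocks, the decision is "block".
```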
Review Sub-Agents
Each sub-agent covers exactly one defect concern from the Systemic Defect Fixes catalog. They receive only the diff and the artifacts relevant to their specific check - not the full session context.
Semantic Review Agent
Recommended model tier: Mid to frontier. Logic correctness and intent alignment require genuine reasoning - a model that can follow execution paths, infer edge cases, and compare implementation against stated intent.
Defect sources addressed:
- Reliance on human review to catch preventable defects (Process & Deployment)
- Implicit domain knowledge not in code (Knowledge & Communication)
- Untested edge cases and error paths (Testing & Observability Gaps)
What it checks:
- Logic correctness: does the implementation produce the outputs the scenario specifies?
- Edge case coverage: does the implementation handle boundary values and error paths, or only the happy path the scenario explicitly describes?
- Intent alignment: does the implementation address the problem stated in the intent summary, or does it technically satisfy the test while missing the point?
- Test coupling: does the test verify observable behavior, or does it assert on implementation internals? (See Implementation Coupling Agent)
System prompt rules:
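Illustrative rules, derived from the checks above:

```
- Compare the implementation against the BDD scenario and the intent
  summary; flag divergence in behavior, not style.
- Flag unhandled boundary values and error paths, even when the
  scenario describes only the happy path.
- Flag tests that assert on implementation internals rather than
  observable behavior.
- Report findings only. Never propose or apply code changes.
```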
Security Review Agent
Recommended model tier: Mid to frontier. Identifying second-order injection, subtle authorization gaps, and missing audit events requires understanding data flow semantics, not just pattern matching. A smaller model will miss the cases that matter most.
Defect sources addressed:
- Injection vulnerabilities (subtle patterns beyond basic SAST) (Security & Compliance)
- Authentication and authorization gaps (Security & Compliance)
- Missing audit trails (Security & Compliance)
What it checks:
- Second-order injection and injection vectors that pattern-matching SAST rules miss
- Code paths that process user-controlled input without validation at the boundary
- State-changing operations that lack an authorization check
- State-changing operations that do not emit a structured audit event
- Privilege escalation patterns
Context it receives:
- Staged diff only; no broader system context needed
System prompt rules:
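One plausible rule set for this check:

```
- Trace user-controlled input from entry point to sink; flag any path
  without validation at the boundary.
- Flag state-changing operations that lack an authorization check or
  do not emit a structured audit event.
- Flag injection vectors, including second-order patterns that
  pattern-matching SAST rules miss.
- Report findings only. Never modify code.
```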
Performance Review Agent
Recommended model tier: Small to mid. Timeout and resource leak detection is primarily structural pattern recognition: find external calls, check for timeout configuration, trace resource allocations to their cleanup paths. A small to mid model handles this well and runs cheaply enough to be invoked on every commit without concern.
Defect sources addressed:
- Missing timeout and deadline enforcement (Performance & Resilience)
- Resource leaks (Performance & Resilience)
- Missing graceful degradation (Performance & Resilience)
What it checks:
- External calls (HTTP, database, queue, cache) without timeout configuration
- Timeout values that are set but not propagated through the call chain
- Resource allocations (connections, file handles, threads) without corresponding cleanup
- Calls to external dependencies with no fallback or circuit breaker when the feature description specifies a resilience requirement
Context it receives:
- Staged diff
- Feature description (for performance budgets and resilience requirements)
System prompt rules:
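A sketch consistent with the checks above:

```
- Flag external calls (HTTP, database, queue, cache) without an
  explicit timeout, and timeouts that are set but not propagated
  down the call chain.
- Trace every resource allocation (connection, file handle, thread)
  to its cleanup path; flag any that can leak.
- When the feature description specifies a resilience requirement,
  flag external calls with no fallback or circuit breaker.
- Report findings only. Never modify code.
```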
Concurrency Review Agent
Recommended model tier: Mid. Concurrency defects require reasoning about execution ordering and shared state - more than pattern matching but less open-ended than security semantics. A mid-tier model balances reasoning depth and cost here.
Defect sources addressed:
- Race conditions (anti-patterns beyond thread sanitizer detection) (Integration & Boundaries)
- Concurrency and ordering issues (Data & State)
What it checks:
- Shared mutable state accessed from concurrent paths without synchronization
- Operations that assume a specific ordering without enforcing it
- Anti-patterns that thread sanitizers cannot reliably detect: check-then-act sequences, non-atomic read-modify-write operations, and missing idempotency in message consumers
System prompt rules:
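Plausible rules for this check:

```
- Flag shared mutable state reachable from concurrent paths without
  synchronization.
- Flag check-then-act sequences and non-atomic read-modify-write
  operations on shared state.
- Flag operations that assume an ordering without enforcing it, and
  message consumers that are not idempotent under redelivery.
- Report findings only. Never modify code.
```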
Skills
Skills are reusable session procedures invoked by name. They encode the session discipline from Small-Batch Sessions so the orchestrator does not have to re-derive it each time.
/start-session
Loads the session context and prepares the implementation agent.
/review
Invokes the review orchestrator against all staged changes.
/end-session
Closes the session, validates all gates, writes the summary, and commits.
/fix
Enters pipeline-restore mode when the pipeline is red.
Hooks
Hooks run automatically as part of the commit process. They execute standard tooling - fast, deterministic, and free of AI cost - before the review orchestrator runs. The review orchestrator only runs if the hooks pass.
Pre-commit hook sequence:
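The canonical sequence is defined in the Pipeline Reference Architecture; derived from the Defect Source Coverage table below, it runs in roughly this order:

- Lint
- Type check
- Secret scan
- SAST (pattern-matching injection rules)
- Accessibility lint
- Thread sanitizer (language-specific, where applicable)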
Why the hook sequence matters: Standard tooling runs first because it is faster and cheaper than AI review. If the linter fails, there is no reason to invoke the review orchestrator. Deterministic checks fail fast; AI review runs only on changes that pass the baseline mechanical checks.
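A minimal sketch of that fail-fast gate, assuming a Python wrapper script; the tool commands (ruff, mypy, gitleaks, semgrep) and the agent review CLI are stand-ins for whatever your stack actually uses:

```python
import subprocess
import sys

# Deterministic checks first, roughly cheapest to most expensive.
# The first failure stops the sequence; AI review never runs.
MECHANICAL_GATES = [
    ["ruff", "check", "."],                      # lint
    ["mypy", "."],                               # type check
    ["gitleaks", "protect", "--staged"],         # secret scan
    ["semgrep", "--config", "auto", "--error"],  # SAST
]

def main() -> int:
    for cmd in MECHANICAL_GATES:
        if subprocess.run(cmd).returncode != 0:
            return 1  # fail fast: skip AI review entirely
    # All mechanical gates passed; invoke the review orchestrator.
    # "agent review --staged" is a hypothetical CLI for that step.
    return subprocess.run(["agent", "review", "--staged"]).returncode

if __name__ == "__main__":
    sys.exit(main())
```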
Token Budget
The tokenomics strategies apply directly to this configuration. Three decisions have the most impact on cost per session.
Model routing
Matching model tier to task complexity is the highest-leverage cost decision. Applied to this configuration:
| Agent | Recommended Tier | Why |
|---|---|---|
| Orchestrator | Small to mid | Routing and context assembly; no code reasoning required |
| Implementation Agent | Mid to frontier | Core code generation; the task that justifies frontier capability |
| Review Orchestrator | Small | Coordination only; returns structured output from sub-agents |
| Semantic Review | Mid to frontier | Logic and intent reasoning; requires genuine inference |
| Security Review | Mid to frontier | Security semantics; pattern-matching is insufficient |
| Performance Review | Small to mid | Structural pattern recognition; timeout and resource signatures |
| Concurrency Review | Mid | Concurrent execution semantics; more than patterns, less than security |
Running the implementation agent on a frontier model and routing the review orchestrator and performance review agent to smaller models cuts the token cost of a full session substantially compared to using one model for everything.
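As a configuration sketch - the agent keys and tier names are placeholders, and each range from the table is collapsed to a single choice:

```python
# Model tier per agent, mirroring the routing table above.
MODEL_ROUTING: dict[str, str] = {
    "orchestrator":        "small",
    "implementation":      "frontier",
    "review-orchestrator": "small",
    "semantic-review":     "frontier",
    "security-review":     "frontier",
    "performance-review":  "small",
    "concurrency-review":  "mid",
}

def model_for(agent: str) -> str:
    # Default to mid tier for any agent not explicitly routed.
    return MODEL_ROUTING.get(agent, "mid")
```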
Prompt caching
Each agent’s system prompt rules block is stable across every invocation. Place it at the top of every agent’s context - before the diff, before the session summary, before any dynamic content. This structure allows the server to cache the rules prefix and amortize its input cost across repeated calls.
The /start-session and /review skills assemble context in this order:
- Agent system prompt rules (stable - cached)
- Feature description (stable within a feature - often cached)
- BDD scenario for this session (changes per session)
- Staged diff or relevant files (changes per call)
- Prior session summary (changes per session)
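A sketch of that assembly, assuming each piece arrives as a string; the point is only that the most stable content leads, so the cached prefix survives across calls:

```python
def assemble_context(rules: str, feature: str, scenario: str,
                     diff_or_files: str, prior_summary: str) -> str:
    # Order matters for prompt caching: the server caches the longest
    # unchanged prefix, so stable content must come first.
    parts = [
        rules,          # stable across all invocations - cached
        feature,        # stable within a feature - often cached
        scenario,       # changes per session
        diff_or_files,  # changes per call
        prior_summary,  # changes per session
    ]
    return "\n\n".join(parts)
```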
Measuring cost per session
Track token spend at the session level, not the call level. A session that costs 10x the average is a design problem - usually an oversized context bundle passed to the implementation agent, or a review sub-agent receiving more content than its check requires.
Metrics to track per session:
- Total input tokens (implementation agent call + review sub-agent calls)
- Total output tokens (implementation output + review findings)
- Review block rate (how often the session cannot commit on first pass)
- Tokens per retry (cost of each implementation-review-fix cycle)
A rising per-session cost with a stable block rate means context is growing unnecessarily. A rising block rate without rising cost means the review agents are finding real issues without accumulating noise. See Tokenomics for the full measurement framework.
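A minimal record for those metrics, as a sketch - the field names are assumptions:

```python
from dataclasses import dataclass, field

@dataclass
class SessionMetrics:
    # Summed across the implementation call and all review sub-agent calls.
    input_tokens: int = 0
    output_tokens: int = 0
    # True when the review orchestrator blocked the first pass; the block
    # rate is the mean of this flag across sessions.
    blocked_first_pass: bool = False
    # Token cost of each implementation-review-fix cycle, in order.
    retry_tokens: list[int] = field(default_factory=list)
```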
Defect Source Coverage
This table maps each pre-commit defect source to the mechanism that covers it.
| Defect Source | Catalog Section | Covered By |
|---|---|---|
| Code style violations | Process & Deployment | Lint hook |
| Null/missing data assumptions | Data & State | Type-check hook |
| Secrets in source control | Security & Compliance | Secret-scan hook |
| Injection (pattern-matching) | Security & Compliance | SAST hook |
| Accessibility (structural) | Product & Discovery | Accessibility-lint hook |
| Race conditions (detectable) | Integration & Boundaries | Thread sanitizer (language-specific) |
| Logic errors, edge cases | Process & Deployment | Semantic review agent |
| Implicit domain knowledge | Knowledge & Communication | Semantic review agent |
| Untested paths | Testing & Observability Gaps | Semantic review agent |
| Injection (semantic/second-order) | Security & Compliance | Security review agent |
| Auth/authz gaps | Security & Compliance | Security review agent |
| Missing audit trails | Security & Compliance | Security review agent |
| Missing timeouts | Performance & Resilience | Performance review agent |
| Resource leaks | Performance & Resilience | Performance review agent |
| Missing graceful degradation | Performance & Resilience | Performance review agent |
| Race condition anti-patterns | Integration & Boundaries | Concurrency review agent |
| Non-idempotent consumers | Data & State | Concurrency review agent |
Defect sources not in this table are addressed at CI or acceptance test stages, not at pre-commit. See the Pipeline Reference Architecture for the full gate sequence.
Related Content
- Pipeline Enforcement and Expert Agents - how the same review agents operate as CI pipeline gates, not just pre-commit
- Small-Batch Sessions - the session discipline the orchestrator and skills enforce
- Tokenomics - the full optimization framework: model routing, context hygiene, structured outputs, prompt caching, and workflow-level measurement
- The Six First-Class Artifacts - the artifacts the implementation agent receives and the review agents verify against
- Pipeline Reference Architecture - the full gate sequence from pre-commit through production verification
- Systemic Defect Fixes - the defect source catalog that defines what each review agent is responsible for catching