Agentic CD
Phase 4 - Deliver on Demand | Adapted from MinimumCD.org
As AI coding agents become capable of generating production-ready code changes, the continuous deployment pipeline must evolve to handle agent-generated work with the same rigor applied to human-generated work - and in some cases, more rigor. Agentic CD defines the additional constraints and artifacts needed when agents contribute to the delivery pipeline.
What Is Agentic CD?
Agentic CD extends the Minimum CD framework to address a new category of contributor: AI agents that can generate, test, and propose code changes. These agents may operate autonomously (generating changes without human prompting) or collaboratively (assisting a human developer).
The core principle is simple: an agent-generated change must meet or exceed the same quality bar as a human-generated change. The pipeline does not care who wrote the code. It cares whether the code is correct, tested, and safe to deploy.
But agents introduce unique challenges that require additional constraints:
- Agents can generate changes faster than humans can review them
- Agents may lack context about organizational norms, business rules, or unstated constraints
- Agents cannot currently exercise judgment about risk in the same way humans can
- Agents may introduce subtle correctness issues that pass automated tests but violate intent
The Six First-Class Artifacts
Agentic CD defines six artifacts that must be explicitly maintained in a delivery pipeline that includes AI agents. These artifacts exist in human-driven CD as well, but they are often implicit. When agents are involved, they must be explicit.
1. Intent Description
What it is: A human-readable description of the desired change, written by a human.
Why it matters for agentic CD: The intent description is the agent’s “prompt” in the broadest sense. It defines what the change should accomplish, not how. Without a clear intent description, the agent may generate technically correct code that does not match what was needed.
Example:
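A minimal sketch; the feature and details here are invented for illustration:

```
Allow users to export their order history as a CSV file from the account
settings page. The export covers the last 12 months of orders and must not
include payment card details.
```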
Key property: The intent description is authored by a human. It is the human’s specification of what the agent should achieve. The agent does not write or modify the intent description.
2. User-Facing Behavior
What it is: A description of how the system should behave from the user’s perspective, expressed as observable outcomes.
Why it matters for agentic CD: Agents can generate code that satisfies tests but does not produce the expected user experience. User-facing behavior descriptions bridge the gap between technical correctness and user value.
Format: BDD scenarios work well here (see Small Batches):
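An illustrative scenario for the invented export feature above (the step wording is a sketch, not a prescribed format):

```gherkin
Scenario: Export order history as CSV
  Given a signed-in user with at least one order in the last 12 months
  When the user requests an order history export from account settings
  Then a CSV file is downloaded with one row per order
  And the file contains no payment card details
```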
3. Feature Description
What it is: A technical description of the feature’s architecture, dependencies, and integration points.
Why it matters for agentic CD: Agents need explicit architectural context that human developers often carry in their heads. The feature description tells the agent where the change fits in the system, what components it touches, and what constraints apply.
Example:
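One plausible shape, with component and endpoint names invented for illustration:

```
Component:    order-history-service
Touches:      GET /api/orders/export, CSV serializer, account settings UI
Depends on:   orders database (read-only), authentication middleware
Constraints:  export runs asynchronously; no new external dependencies;
              payment card fields are never serialized
```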
4. Executable Truth
What it is: Automated tests that define the correct behavior of the system. These tests are the authoritative source of truth for what the code should do.
Why it matters for agentic CD: For human developers, tests verify the code. For agent-generated code, tests also constrain the agent. If the tests are comprehensive, incorrect code is very unlikely to pass them. If the tests are shallow, the agent can generate code that passes tests but does not satisfy the intent.
Key principle: Executable truth must be written or reviewed by a human before the agent generates the implementation. This inverts the common practice of writing tests after code. In agentic CD, the tests come first because they are the specification.
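A minimal sketch of executable truth for the illustrative export feature above, written before the implementation exists; the module, function, and field names are placeholders, and the tests assume pytest:

```python
# test_order_export.py - written and reviewed by a human before the agent
# generates the implementation. The module under test does not exist yet.
from datetime import date, timedelta

from orders.export import rows_for_export  # placeholder; the agent implements this


def make_order(order_id, placed_on, card_number="4111111111111111"):
    """Minimal order record used only as test data."""
    return {"id": order_id, "placed_on": placed_on, "card_number": card_number}


def test_only_orders_from_last_twelve_months_are_exported():
    today = date(2024, 6, 1)
    recent = make_order("A-1", today - timedelta(days=30))
    stale = make_order("A-2", today - timedelta(days=400))

    rows = rows_for_export([recent, stale], today=today)

    assert [row["order_id"] for row in rows] == ["A-1"]


def test_exported_rows_never_contain_card_numbers():
    today = date(2024, 6, 1)
    order = make_order("A-3", today - timedelta(days=5))

    rows = rows_for_export([order], today=today)

    assert all("card_number" not in row for row in rows)
```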
5. Implementation
What it is: The actual code that implements the feature. In agentic CD, this may be generated entirely by the agent, co-authored by agent and human, or authored by a human with agent assistance.
Why it matters for agentic CD: The implementation is the artifact most likely to be agent-generated. The key requirement is that it must satisfy the executable truth (tests), conform to the feature description (architecture), and achieve the intent description (purpose).
Review requirements: Agent-generated implementation must be reviewed by a human before merging to trunk. The review focuses on:
- Does the implementation match the intent? (Not just “does it pass tests?”)
- Does it follow the architectural constraints in the feature description?
- Does it introduce unnecessary complexity, dependencies, or security risks?
- Would a human developer on the team understand and maintain this code?
6. System Constraints
What it is: Non-functional requirements, security policies, performance budgets, and organizational rules that apply to all changes.
Why it matters for agentic CD: Human developers internalize system constraints through experience and team norms. Agents need these constraints stated explicitly.
Examples:
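Illustrative entries (every team's list will differ):

- All endpoints require authentication; no anonymous access to customer data
- p95 latency budget of 300 ms for user-facing API calls
- No new third-party dependencies without a license and security review
- Personally identifiable information never appears in logs or error messages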
The Agentic CD Workflow
When an AI agent contributes to a CD pipeline, the workflow extends the standard CD pipeline:
1. HUMAN writes Intent Description
2. HUMAN writes or reviews User-Facing Behavior (BDD scenarios)
3. HUMAN writes or reviews Feature Description (architecture)
4. HUMAN writes or reviews Executable Truth (tests)
5. AGENT generates Implementation (code)
6. PIPELINE validates Implementation against Executable Truth (automated tests)
7. HUMAN reviews Implementation (code review)
8. PIPELINE deploys (same pipeline as any other change)
Key differences from standard CD:
- Steps 1-4 happen before the agent generates code (test-first is mandatory, not optional)
- Step 7 (human review) is mandatory for agent-generated code
- System constraints are checked automatically in the pipeline (Step 6)
Constraints for Agent-Generated Changes
Beyond the six artifacts, agentic CD imposes additional constraints on agent-generated changes:
Change Size Limits
Agent-generated changes must be small. Large agent-generated changes are harder to review and more likely to contain subtle issues.
Guideline: An agent-generated change should modify no more files and no more lines than a human would in a single commit. If the change is larger, break it into multiple sequential changes.
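One way to make the guideline enforceable is a small pipeline check. This is a sketch; the thresholds and the comparison against `origin/main` are illustrative defaults, not prescribed values:

```python
# check_change_size.py - fails the pipeline if a proposed change is too large.
# Thresholds are illustrative; tune them to your team's normal commit size.
import subprocess
import sys

MAX_FILES = 10
MAX_LINES = 200


def change_size(base: str = "origin/main") -> tuple[int, int]:
    """Return (files changed, lines added + removed) versus the base branch."""
    diff = subprocess.run(
        ["git", "diff", "--numstat", f"{base}...HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout
    files, lines = 0, 0
    for row in diff.splitlines():
        added, removed, _path = row.split("\t", 2)
        files += 1
        if added != "-":  # binary files report "-" for line counts
            lines += int(added) + int(removed)
    return files, lines


if __name__ == "__main__":
    files, lines = change_size()
    if files > MAX_FILES or lines > MAX_LINES:
        print(f"Change too large: {files} files, {lines} lines. Split it up.")
        sys.exit(1)
    print(f"Change size OK: {files} files, {lines} lines.")
```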
Mandatory Human Review
Every agent-generated change must be reviewed by a human before merging to trunk. This is a non-negotiable constraint. The purpose is not to check the agent’s “work” in a supervisory sense - it is to verify that the change matches the intent and fits the system.
Comprehensive Test Coverage
Agent-generated code must have higher test coverage than the team’s baseline. If the team’s baseline is 80% coverage, agent-generated code should target 90%+. This compensates for the reduced human oversight of the implementation details.
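A sketch of a coverage gate, assuming a `coverage.json` report with a `totals.percent_covered` field (the shape coverage.py's `coverage json` command emits) and a pipeline-provided environment variable; the numbers and variable name are illustrative:

```python
# coverage_gate.py - enforce a higher coverage bar for agent-generated changes.
import json
import os
import sys

BASELINE = 80.0       # team baseline
AGENT_TARGET = 90.0   # stricter target for agent-generated code


def main() -> int:
    with open("coverage.json") as report:
        percent = json.load(report)["totals"]["percent_covered"]
    # The pipeline sets this variable from provenance metadata.
    agent_generated = os.environ.get("CHANGE_GENERATED_BY_AGENT") == "true"
    required = AGENT_TARGET if agent_generated else BASELINE
    if percent < required:
        print(f"Coverage {percent:.1f}% is below the required {required:.1f}%")
        return 1
    print(f"Coverage {percent:.1f}% meets the required {required:.1f}%")
    return 0


if __name__ == "__main__":
    sys.exit(main())
```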
Provenance Tracking
The pipeline must record which changes were agent-generated, which agent generated them, and what prompt or intent description was used. This supports audit, debugging, and learning.
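A lightweight sketch of provenance recording; the field names, file location, and agent identifiers are illustrative, not a standard:

```python
# record_provenance.py - append a provenance record for an agent-generated change.
import json
import subprocess
from datetime import datetime, timezone
from pathlib import Path

PROVENANCE_LOG = Path("provenance/changes.jsonl")


def record(agent: str, model: str, intent_ref: str) -> None:
    """Record which agent produced the current commit and from which intent."""
    commit = subprocess.run(
        ["git", "rev-parse", "HEAD"], capture_output=True, text=True, check=True
    ).stdout.strip()
    entry = {
        "commit": commit,
        "generated_by": agent,
        "model": model,
        "intent_description": intent_ref,  # e.g. ticket ID or document path
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }
    PROVENANCE_LOG.parent.mkdir(parents=True, exist_ok=True)
    with PROVENANCE_LOG.open("a") as log:
        log.write(json.dumps(entry) + "\n")


if __name__ == "__main__":
    record(agent="code-agent", model="example-model-v1", intent_ref="INTENT-123")
```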
Getting Started with Agentic CD
Before jumping into agentic workflows, ensure your team has the prerequisite delivery practices in place. The AI Adoption Roadmap provides a step-by-step sequence: quality tools, clear requirements, hardened guardrails, and reduced delivery friction - all before accelerating with AI coding.
Phase 1: Agent as Assistant
The agent helps human developers write code, but the human makes all decisions and commits all changes. The pipeline does not know or care about agent involvement.
This is where most teams are today. It requires no pipeline changes.
Phase 2: Agent as Contributor
The agent generates complete changes based on intent descriptions and executable truth. A human reviews and merges. The pipeline validates.
Requires: Explicit intent descriptions, test-first workflow, human review gate.
Phase 3: Agent as Autonomous Contributor
The agent generates, tests, and proposes changes with minimal human involvement. Human review is still mandatory, but the agent handles the full cycle from intent to implementation.
Requires: All six first-class artifacts, comprehensive system constraints, provenance tracking, and high confidence in the executable truth.
Key Pitfalls
1. “We let the agent generate tests and code together”
If the agent writes both the tests and the code, the tests may be designed to pass the code rather than to verify the intent. Tests must be written or reviewed by a human before the agent generates the implementation. This is the most important constraint in agentic CD.
2. “The agent generates changes faster than we can review them”
This is a feature, not a bug - but only if you have the discipline not to merge unreviewed changes. The agent's speed should not pressure humans to review faster. WIP limits apply: if the review queue is full, the agent stops generating new changes.
3. “We trust the agent because it passed the tests”
Passing tests is necessary but not sufficient. Tests cannot verify intent, architectural fitness, or maintainability. Human review remains mandatory.
4. “We don’t track which changes are agent-generated”
Without provenance tracking, you cannot learn from agent-generated failures, audit agent behavior, or improve the agent’s constraints over time. Track provenance from the start.
Measuring Success
| Metric | Target | Why It Matters |
|---|---|---|
| Agent-generated change failure rate | Equal to or lower than human-generated | Confirms agent changes meet the same quality bar |
| Review time for agent-generated changes | Comparable to human-generated changes | Confirms changes are reviewable, not rubber-stamped |
| Test coverage for agent-generated code | Higher than baseline | Confirms the additional coverage constraint is met |
| Agent-generated changes with complete artifacts | 100% | Confirms the six-artifact workflow is being followed |
Next Step
For real-world examples of teams that have made the full journey to continuous deployment, see Experience Reports.
This content is adapted from MinimumCD.org, licensed under CC BY 4.0.