Agentic Continuous Delivery (ACD)

Extend continuous delivery with constraints, first-class artifacts, and practices for AI agent-generated changes.

Agentic continuous delivery (ACD) defines the additional constraints and artifacts needed when AI agents contribute to the delivery pipeline. The pipeline must handle agent-generated work with the same rigor applied to human-generated work, and in some cases, more rigor. These constraints assume the team already practices continuous delivery. Without that foundation, the agentic extensions have nothing to extend.

Don't put the AI cart before the CI horse - Integrating AI is software engineering. To be great at this, you need to be great at DevOps and CI.

What Is ACD?

An agent-generated change must meet or exceed the same quality bar as a human-generated change. The pipeline does not care who wrote the code. It cares whether the code is correct, tested, and safe to deploy.

ACD is the application of continuous delivery in environments where software changes are proposed by agents. It exists to reliably constrain agent autonomy without slowing delivery.

Without additional artifacts beyond what human-driven CD requires, agent-generated code accumulates drift, quality issues, and technical debt faster than teams can detect it. By the time test coverage gaps or architectural drift surface in production incidents, the accumulated debt is too large to address incrementally. Six first-class artifacts and eight constraints address this.

Agents introduce unique challenges that require these additional constraints:

  • Agents can generate changes faster than humans can review them
  • Agents may lack context about organizational norms, business rules, or unstated constraints
  • Agents cannot exercise judgment about risk in the same way humans can
  • Agents may introduce subtle correctness issues that pass automated tests but violate intent

Before jumping into agentic workflows, ensure your team has the prerequisite delivery practices in place. The AI Adoption Roadmap provides a step-by-step sequence: quality tools, clear requirements, hardened guardrails, and reduced delivery friction, all before accelerating with AI coding.

What You’ll Find in This Section

  1. AI Adoption Roadmap - the prerequisite sequence before adopting agentic workflows
  2. Agent-Assisted Specification - how agents help sharpen intent, draft BDD scenarios, and surface gaps in the specification stages
  3. The Six First-Class Artifacts - detailed definitions with examples for each artifact that agents and humans must maintain
  4. Pipeline Enforcement and Expert Agents - how quality gates and expert validation agents enforce ACD constraints automatically
  5. Pitfalls and Metrics - common failure modes and how to measure whether ACD is working
  6. Tokenomics - how to architect agents and code to minimize unnecessary token consumption without sacrificing quality
  7. Small-Batch Sessions - how to structure agent sessions so context stays manageable and commits stay small

ACD Extensions to MinimumCD

ACD extends MinimumCD with the following constraints:

  1. Explicit, human-owned intent exists for every change
  2. Intent and architecture are represented as first-class artifacts
  3. All first-class artifacts are versioned and delivered together with the change
  4. Intended behavior is represented independently of implementation
  5. Consistency between intent, tests, implementation, and architecture is enforced
  6. Agent-generated changes must comply with all documented constraints
  7. Agents implementing changes must not be able to promote those changes to production
  8. While the pipeline is red, agents may only generate changes restoring pipeline health

These constraints are not prescriptive practices. They describe the minimum conditions required to sustain delivery pace once agents are making changes to the system.
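Constraints 7 and 8 in particular are mechanically checkable at the pipeline boundary. Below is a minimal sketch in Python; the names (`Change`, `can_promote`, `can_accept_change`) are illustrative assumptions, not part of any specific pipeline tool:

```python
# Illustrative sketch of ACD constraints 7 and 8 as promotion-gate checks.
# All names here are hypothetical.
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    author_kind: str         # "human" or "agent"
    restores_pipeline: bool  # True if the change exists only to restore pipeline health


def can_promote(author_kind: str) -> bool:
    """Constraint 7: an agent that implements a change may not promote it."""
    return author_kind != "agent"


def can_accept_change(change: Change, pipeline_green: bool) -> bool:
    """Constraint 8: while the pipeline is red, agents may only
    generate changes that restore pipeline health."""
    if change.author_kind == "agent" and not pipeline_green:
        return change.restores_pipeline
    return True
```

In practice these checks would live in the pipeline itself (for example, as a branch-protection or promotion rule), so neither agents nor humans can bypass them by convention alone.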

The Six First-Class Artifacts

Every first-class artifact is part of the delivery contract, not a convenience. Agents may read any or all artifacts. Agents may generate some artifacts. Agents may not redefine the authority of any artifact. Humans own the accountability.

  1. Intent Description - why the change exists (human-owned)
  2. User-Facing Behavior - what users experience (externally observable)
  3. Feature Description - architectural trade-offs and constraints (engineering-owned)
  4. Executable Truth - automated tests that make intent falsifiable (pipeline-enforced)
  5. Implementation - the code (fully constrained by other artifacts)
  6. System Constraints - global invariants (system-level rules)

These artifacts are intentionally overlapping in content but non-overlapping in authority. When an agent detects a conflict between artifacts, it cannot resolve that conflict by modifying the artifact it does not own. See The Six First-Class Artifacts for the authority hierarchy, detailed definitions, and examples.
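Constraint 3 (artifacts versioned and delivered together with the change) reduces to a simple manifest gate. A minimal sketch; the artifact file names are hypothetical conventions, not prescribed by ACD:

```python
# Illustrative sketch: verify a change delivers all six first-class artifacts.
# The paths are hypothetical repository conventions.
REQUIRED_ARTIFACTS = {
    "intent.md",         # Intent Description (human-owned)
    "behavior.feature",  # User-Facing Behavior (BDD scenarios)
    "feature.md",        # Feature Description (architecture, trade-offs)
    "tests/",            # Executable Truth (automated tests)
    "src/",              # Implementation
    "constraints.md",    # System Constraints (global invariants)
}


def missing_artifacts(delivered: set[str]) -> set[str]:
    """Return required artifacts the change failed to deliver.

    An empty set means the change passes the gate."""
    return REQUIRED_ARTIFACTS - delivered
```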

The ACD Workflow

When an AI agent contributes to a CD pipeline, the workflow extends the standard pipeline:

| Stage | Actor | Activity |
| --- | --- | --- |
| Intent Definition | Human | Define Intent Description (why the change exists) |
| Behavior Specification | Human | Define User-Facing Behavior (BDD scenarios, the functional tests) |
| Architecture Specification | Human | Define Feature Description (architecture, constraints, performance budgets) |
| Acceptance Criteria | Human | Define acceptance criteria for non-functional tests (latency thresholds, security requirements, resource limits) |
| Test Generation | Agent | Generate test code from Behavior Specification, Architecture Specification, and Acceptance Criteria |
| Test Validation | Human → Agent | Validate test code is decoupled from implementation and faithful to specs |
| Implementation | Agent | Generate implementation |
| Pipeline Verification | Pipeline | Validate implementation against executable truth (automated tests) |
| Code Review | Human → Agent | Review implementation (code review) |
| Deployment | Pipeline | Deploy (same pipeline as any other change) |

Behavior Specification, Architecture Specification, and Acceptance Criteria together define the complete Executable Truth specification. Behavior Specification covers what the user experiences (BDD scenarios become the functional tests). Architecture Specification and Acceptance Criteria cover what the system must satisfy beyond user-visible behavior: performance budgets, security constraints, architectural boundaries, and operational requirements.

Key differences from standard CD:

  • The four specification stages (Intent Definition through Acceptance Criteria) happen before any code generation. Specification-first should already be standard practice without agents. Every downstream stage - Test Generation, Implementation, Code Review, and Deployment - depends on the quality of these specifications. With agents, that dependency becomes absolute: an agent cannot compensate for missing or ambiguous specifications the way a human sometimes can. This is not big upfront design. You specify the next small step, not the entire feature set. See Agent-Assisted Specification for how agents make this work fast.
  • Test Generation and Test Validation separate test definition from test code. Teams often conflate the two because they happen at the same time, but they are distinct activities. Defining tests means deciding what scenarios, edge cases, and acceptance criteria to verify. Test code is the machine-runnable implementation of those decisions. Humans define the tests before development begins. Agents generate the test code, which must be validated for behavior focus and spec fidelity before implementation starts.
  • System constraints are checked automatically in the pipeline during Pipeline Verification. This is standard CD practice. The difference is that agents require these constraints to be stated explicitly as artifacts rather than carried as team knowledge.
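One concrete form of Test Validation is a mechanical decoupling check: generated test code should exercise the specified behavior through public interfaces, not reach into implementation internals. A minimal sketch, where the off-limits module prefixes are a hypothetical project convention:

```python
# Illustrative sketch: flag generated test code that imports implementation
# internals instead of testing through the public interface.
# The module prefixes are hypothetical project conventions.
import ast

INTERNAL_PREFIXES = ("myapp.internal", "myapp.db")  # assumed off-limits to tests


def coupled_imports(test_source: str) -> list[str]:
    """Return imports in test code that reach into implementation internals."""
    tree = ast.parse(test_source)
    hits = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            hits += [a.name for a in node.names
                     if a.name.startswith(INTERNAL_PREFIXES)]
        elif isinstance(node, ast.ImportFrom) and node.module:
            if node.module.startswith(INTERNAL_PREFIXES):
                hits.append(node.module)
    return hits
```

A check like this catches only structural coupling; fidelity to the specification still needs review, by a human or an expert validation agent.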

Migrating Test Validation and Code Review to expert agents

Manual review at Test Validation and Code Review is a deliberate interim state, not the design. Every manual validation creates a batching point - failures become harder to trace, feedback loops extend, and unvalidated changes accumulate. When agents generate changes faster than humans review them, wait time dominates the delivery cycle.

The target state replaces manual review with expert validation agents using the same replacement cycle used throughout the CD migration:

| Stage | Starting State | Target State |
| --- | --- | --- |
| Test Validation | Human validates test code | Expert agent validates test code is decoupled from implementation and faithful to specs; human reviews exceptions |
| Code Review | Human reviews implementation | Expert agent validates architectural conformance and intent alignment; human reviews agent-flagged concerns |

  1. Start with human validation only (the workflow as shown above)
  2. Deploy an expert agent that runs in parallel with the human reviewer
  3. Compare results until you are confident the agent matches or exceeds human judgment
  4. Shift the human role from “review everything” to “review what the agent flags and spot-check according to risk”
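Step 3 needs a concrete agreement measure for the parallel-run phase. A minimal sketch, assuming paired accept/reject verdicts from the human reviewer and the candidate expert agent; the 0.95 threshold is a hypothetical team policy, not a standard:

```python
# Illustrative sketch: measure agent/human agreement during the parallel run.
# The threshold is a hypothetical team policy.
def agreement_rate(paired_verdicts: list[tuple[bool, bool]]) -> float:
    """Fraction of reviews where agent and human reached the same verdict.

    Each pair is (human_verdict, agent_verdict)."""
    if not paired_verdicts:
        return 0.0
    agree = sum(1 for human, agent in paired_verdicts if human == agent)
    return agree / len(paired_verdicts)


def ready_to_shift(paired_verdicts: list[tuple[bool, bool]],
                   threshold: float = 0.95) -> bool:
    """Shift humans to exception review once agreement is sustained."""
    return agreement_rate(paired_verdicts) >= threshold
```

Disagreements are as informative as the rate itself: each one is either a gap in the agent's validation rules or a specification ambiguity worth fixing upstream.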

What does not migrate: The four specification stages (Intent Definition through Acceptance Criteria) remain human responsibilities. Defining intent, specifying behavior, documenting architecture, and setting acceptance criteria require judgment about what matters to the business and the user. Agents validate whether specifications are met. Humans decide what the specifications should be.

See Pipeline Enforcement and Expert Agents for the full set of expert agents and how to adopt them.


Content contributed by Michael Kusters and Bryan Finster. Image contributed by Scott Prugh.


AI Adoption Roadmap

A prescriptive guide for incorporating AI into your delivery process safely - remove friction and add safety before accelerating with AI coding.

Agent-Assisted Specification

How to use agents as collaborators during specification and why small-scope specification is not big upfront design.

The Six First-Class Artifacts

Detailed definitions and examples for the six artifacts that agents and humans must maintain in an ACD pipeline.

Pipeline Enforcement and Expert Agents

How quality gates enforce ACD constraints and how expert validation agents extend the pipeline beyond standard tooling.

Pitfalls and Metrics

Common failure modes when adopting ACD and the metrics that tell you whether it is working.

Tokenomics: Optimizing Token Usage in Agent Architecture

How to architect agents and code to minimize unnecessary token consumption without sacrificing quality or capability.

Small-Batch Agent Sessions

How to structure agent sessions so context stays manageable, commits stay small, and the pipeline stays green.

Recommended Agent Configuration for Coding and Review

A recommended orchestrator, agent, and sub-agent configuration for coding and pre-commit review, with rules, skills, and hooks mapped to the defect sources catalog.