
Reference

Practice definitions, metrics, glossary, and other reference material.

Look up definitions, check metrics, or find resources for deeper reading.

Sections

1 - Pipeline Reference Architecture

Pipeline reference architectures for single-team, multi-team, and distributed service delivery, with quality gates sequenced by defect detection priority.

This section defines quality gates sequenced by defect detection priority and three pipeline patterns that apply them. Quality gates are derived from the Systemic Defect Fixes catalog and sequenced so the cheapest, fastest checks run first.

Gates marked with [Pre-Feature] must be in place and passing before any new feature work begins. They form the baseline safety net that every commit runs through. Adding features without these gates means defects accumulate faster than the team can detect them.

Gates whose names begin with "AI" are enhanced by AI: the AI shifts detection earlier or catches issues that rule-based tools miss. See the Systemic Defect Fixes catalog for details.

Quality Gates in Priority Sequence

The gate sequence follows a single principle: fail fast, fail cheap. Gates that catch the most common defects with the least execution time run first. Each gate listed below maps to one or more defect sources from the catalog.
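The cheapest-first ordering can be sketched in a few lines. This is a minimal illustration, assuming each gate is modeled with a hypothetical `Gate` record holding a name, an estimated cost in seconds, and a check that returns `True` on pass; real CI systems express the same ordering as stage definitions rather than code.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Gate:
    name: str
    est_seconds: float           # hypothetical cost estimate used only for ordering
    check: Callable[[], bool]    # returns True when the gate passes


def run_gates(gates: List[Gate]) -> Optional[str]:
    """Run gates cheapest-first; return the first failing gate's name, or None if all pass."""
    for gate in sorted(gates, key=lambda g: g.est_seconds):
        if not gate.check():
            return gate.name  # fail fast: every more expensive gate is skipped
    return None
```

Because the loop stops at the first failure, a lint error never pays for a 15-minute acceptance run.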

Pre-commit Gates

These run on the developer’s machine before code leaves the workstation. They provide sub-second to sub-minute feedback.

| Gate | Defect Sources Addressed | Catalog Section | Pre-Feature |
|---|---|---|---|
| Linting and formatting | Code style consistency, preventable review noise | Process & Deployment | Required |
| Static type checking | Null/missing data assumptions, type mismatches | Data & State | Required |
| Secret scanning | Secrets committed to source control | Security & Compliance | Required |
| SAST (injection patterns) | Injection vulnerabilities, taint analysis | Security & Compliance | Required |
| Race condition detection | Race conditions (thread sanitizers, where language supports it) | Integration & Boundaries | |
| Accessibility linting | Missing alt text, ARIA violations, contrast failures | Product & Discovery | |
| Solitary and sociable unit tests | Logic errors, unintended side effects, edge cases | Change & Complexity | Required |
| Contract tests | Interface mismatches, wrong assumptions about external system boundaries | Integration & Boundaries | Required |
| Timeout enforcement checks | Missing timeout and deadline enforcement | Performance & Resilience | |
| AI semantic code review | Logic errors, missing edge cases, subtle injection vectors beyond pattern matching | Process & Deployment, Security & Compliance | |

CI Stage 1: Build and Fast Tests < 5 min

These run on every commit to trunk.

| Gate | Defect Sources Addressed | Catalog Section | Pre-Feature |
|---|---|---|---|
| All pre-commit gates | Re-run in CI to catch anything bypassed locally | See Pre-commit Gates | Required |
| Compilation / build | Build reproducibility, dependency resolution | Dependency & Infrastructure | Required |
| Dependency vulnerability scan (SCA) | Known vulnerabilities in dependencies | Security & Compliance | Required |
| License compliance scan | License compliance violations | Security & Compliance | |
| Code complexity and duplication scoring | Accumulated technical debt | Change & Complexity | |
| AI change impact analysis | Semantic blast radius of changes; unintended side effects beyond syntactic dependencies | Change & Complexity | |
| AI vulnerability reachability analysis | Correlate CVEs with actual code usage paths to prioritize exploitable risks over theoretical ones | Security & Compliance | |
| Stage duration warning | Warn if Stage 1 exceeds 10 minutes; slow fast-feedback loops mask defects and delay trunk integration | Process & Deployment | |
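The stage duration warning reduces to a threshold check. A minimal sketch, assuming the 5-minute target from the stage heading and the 10-minute warning threshold from the table; the function name and the three status labels are illustrative.

```python
def stage_duration_status(elapsed_seconds: float,
                          target_seconds: float = 5 * 60,
                          warn_seconds: float = 10 * 60) -> str:
    """Classify a CI Stage 1 run against its target and warning thresholds."""
    if elapsed_seconds <= target_seconds:
        return "ok"
    if elapsed_seconds <= warn_seconds:
        return "over-target"  # slower than the < 5 min goal, not yet at the warning line
    return "warn"             # > 10 min: slow feedback is masking defects
```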

CD Stage 1: Contract and Boundary Validation < 10 min

These validate boundaries between components.

| Gate | Defect Sources Addressed | Catalog Section | Pre-Feature |
|---|---|---|---|
| Contract tests | Interface mismatches, wrong assumptions about upstream/downstream | Integration & Boundaries | Required |
| Schema migration validation | Schema migration and backward compatibility failures | Data & State | Required |
| Infrastructure-as-code drift detection | Configuration drift, environment differences | Dependency & Infrastructure | |
| Environment parity checks | Test environments not reflecting production | Testing & Observability Gaps | |
| AI boundary coverage analysis | Integration boundaries missing contract tests; semantic service relationship mapping | Testing & Observability Gaps | |
| AI behavioral assumption detection | Undocumented assumptions at service boundaries that contract tests don’t cover | Integration & Boundaries | |

CD Stage 2: Broader Automated Verification < 15 min

These run in parallel where possible.

| Gate | Defect Sources Addressed | Catalog Section | Pre-Feature |
|---|---|---|---|
| Mutation testing | Untested edge cases and error paths, weak assertions | Testing & Observability Gaps | |
| Performance benchmarks | Performance regressions | Performance & Resilience | |
| Resource leak detection | Resource leaks (memory, connections) | Performance & Resilience | |
| Security integration tests | Authentication and authorization gaps | Security & Compliance | |
| Compliance-as-code policy checks | Regulatory requirement gaps, missing audit trails | Security & Compliance | |
| SBOM generation | License compliance, dependency transparency | Security & Compliance | |
| Automated WCAG compliance scan | Full-page rendered accessibility checks with browser automation | Product & Discovery | |
| AI edge case test generation | Untested boundaries and error conditions identified from code path analysis | Testing & Observability Gaps | |
| AI authorization path analysis | Missing authorization checks and privilege escalation patterns in code paths | Security & Compliance | |
| AI resilience review | Single points of failure and missing fallback paths in architecture | Performance & Resilience | |
| AI regulatory mapping | Map regulatory requirements to implementation artifacts; flag uncovered controls | Security & Compliance | |

Acceptance Tests < 20 min

These validate user-facing behavior in a production-like environment.

| Gate | Defect Sources Addressed | Catalog Section | Pre-Feature |
|---|---|---|---|
| Acceptance tests | Implementation does not match acceptance criteria | Product & Discovery | |
| Load and capacity tests | Unknown capacity limits, slow response times | Performance & Resilience | |
| Chaos and resilience tests | Network partition handling, missing graceful degradation | Performance & Resilience | |
| Cache invalidation verification | Cache invalidation errors | Data & State | |
| Feature interaction tests | Unanticipated feature interactions | Change & Complexity | |
| AI intent alignment review | Acceptance criteria vs. user behavior data misalignment; specs that meet the letter but miss the intent | Product & Discovery | |

Out-of-Pipeline Verification

The following checks are non-deterministic - they depend on live environments, external systems, or real user behavior - and cannot be made into blocking pipeline gates without coupling your ability to deploy to factors outside your control. They run asynchronously or post-deployment and back up the deterministic pipeline with a continuous safety net. Failures trigger review, alerts, or rollback decisions. They never block a commit from reaching production.
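The difference between a blocking gate and an out-of-pipeline check can be modeled as a flag on each result. In this minimal sketch (the `CheckResult` type and `evaluate` function are hypothetical), only deterministic gates decide whether a deploy may proceed; failed out-of-pipeline checks become alerts.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class CheckResult:
    name: str
    passed: bool
    blocking: bool  # True for deterministic pipeline gates, False for out-of-pipeline checks


def evaluate(results: List[CheckResult]) -> Tuple[bool, List[str]]:
    """Return (may_deploy, alerts). Non-blocking failures raise alerts but never block."""
    may_deploy = all(r.passed for r in results if r.blocking)
    alerts = [r.name for r in results if not r.passed and not r.blocking]
    return may_deploy, alerts
```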

Integration Tests (Post-Deploy)

Integration tests validate that the test doubles used in contract tests still match the real services they simulate. They are non-deterministic because they exercise real service boundaries and their results depend on the current state of those services. They run on a schedule or post-deployment - not on every commit - and failures trigger review, not a pipeline block.

| Check | Defect Sources Addressed | Catalog Section | Pre-Feature |
|---|---|---|---|
| Provider verification | Interface drift between contract test doubles and real services | Integration & Boundaries | Required |
| Cross-service integration validation | Breaking changes at real service boundaries | Integration & Boundaries | Required |
| AI boundary coverage analysis | Integration boundaries missing contract tests; semantic service relationship mapping | Testing & Observability Gaps | |
| AI behavioral assumption detection | Undocumented assumptions at service boundaries that contract tests don’t cover | Integration & Boundaries | |
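Provider verification can be approximated by comparing what a test double promises with what the real service returns. This sketch assumes JSON-like dictionaries and checks only top-level field names and value types; `shape_drift` is a hypothetical helper, not part of Pact or any other contract-testing tool.

```python
from typing import Any, Dict, List


def shape_drift(double_response: Dict[str, Any], real_response: Dict[str, Any]) -> List[str]:
    """List fields where the test double no longer matches the real service."""
    problems = []
    for field, stub_value in double_response.items():
        if field not in real_response:
            problems.append(f"missing field: {field}")
        elif type(real_response[field]) is not type(stub_value):
            problems.append(
                f"type changed: {field} "
                f"({type(stub_value).__name__} -> {type(real_response[field]).__name__})"
            )
    return problems
```

Extra fields in the real response are deliberately ignored: consumers that tolerate unknown fields are not broken by additive changes.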

Production Verification

These run during and after deployment. They are not optional - they close the feedback loop.

| Gate | Defect Sources Addressed | Catalog Section | Pre-Feature |
|---|---|---|---|
| Health checks with auto-rollback | Inadequate rollback capability | Process & Deployment | |
| Canary or progressive deployment | Batching too many changes per release | Process & Deployment | |
| Real user monitoring and SLO checks | Slow user-facing response times, product-market misalignment | Performance & Resilience | |
| Structured audit logging verification | Missing audit trails | Security & Compliance | |
| AI change risk scoring | Automated risk assessment from change diff, deployment history, and blast radius analysis | Process & Deployment | |
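Auto-rollback during a canary needs an explicit decision rule. A minimal sketch, assuming error and request counts are available for the canary and the baseline over the same time window; the 1.5x ratio, 0.1% floor, and 500-request minimum are illustrative values, not recommendations.

```python
def canary_decision(canary_errors: int, canary_requests: int,
                    baseline_errors: int, baseline_requests: int,
                    max_ratio: float = 1.5, min_requests: int = 500) -> str:
    """Return 'wait', 'promote', or 'rollback' for a canary deployment."""
    if canary_requests < min_requests:
        return "wait"  # not enough canary traffic to judge yet
    canary_rate = canary_errors / canary_requests
    baseline_rate = baseline_errors / max(baseline_requests, 1)
    # Absolute floor so a zero-error baseline does not make a single canary error fatal.
    allowed = max(baseline_rate * max_ratio, 0.001)
    return "rollback" if canary_rate > allowed else "promote"
```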

Pre-Feature Baseline


Pipeline Patterns

These three patterns apply the quality gates above to progressively more complex team and deployment topologies. Most organizations start with Pattern 1 and evolve toward Pattern 3 as team count and deployment independence requirements grow.

  1. Single Team, Single Deployable - one team owns one modular monolith with a linear pipeline
  2. Multiple Teams, Single Deployable - multiple teams own sub-domain modules within a shared modular monolith, each with its own sub-pipeline feeding a thin integration pipeline
  3. Independent Teams, Independent Deployables - each team owns an independently deployable service with its own full pipeline and API contract verification

Mapping to the Defect Sources Catalog

Each quality gate above is derived from the Systemic Defect Fixes catalog. The catalog organizes defects by origin - product and discovery, integration, knowledge, change and complexity, testing gaps, process, data, dependencies, security, and performance. The pipeline gates are the automated enforcement points for the systemic prevention strategies described in the catalog.

Gates whose names begin with "AI" correspond to catalog entries where AI shifts detection earlier than current rule-based automation. For expert agent patterns that implement these gates in an agentic CD context, see ACD Pipeline Enforcement.

When adding or removing gates, consult the catalog to ensure that no defect category loses its detection point. A gate that seems redundant may be the only automated check for a specific defect source.
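That coverage check can be automated. A minimal sketch, assuming the pipeline's gates are recorded as a hypothetical mapping from gate name to the catalog categories it covers; the category names are taken from the catalog.

```python
from typing import Dict, Set

CATEGORIES = {
    "Product & Discovery", "Integration & Boundaries", "Knowledge & Communication",
    "Change & Complexity", "Testing & Observability Gaps", "Process & Deployment",
    "Data & State", "Dependency & Infrastructure", "Security & Compliance",
    "Performance & Resilience",
}


def coverage_gaps(gates: Dict[str, Set[str]]) -> Set[str]:
    """Catalog categories with no automated detection point at all."""
    return CATEGORIES - set().union(*gates.values())


def uncovered_after_removal(gates: Dict[str, Set[str]], removed: str) -> Set[str]:
    """Categories whose only automated detection point is the gate being removed."""
    remaining: Set[str] = set()
    for name, categories in gates.items():
        if name != removed:
            remaining |= categories
    return gates.get(removed, set()) - remaining
```

Running `uncovered_after_removal` before deleting a gate shows which categories would lose their only automated check.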

Further Reading

For a deeper treatment of pipeline design, stage sequencing, and deployment strategies, see Dave Farley’s Continuous Delivery Pipelines, which covers pipeline architecture patterns in detail.

1.1 - Single Team, Single Deployable

A linear pipeline pattern for a single team owning a modular monolith.

This architecture suits a team of up to 8-10 people owning a modular monolith - a single deployable application with well-defined internal module boundaries. The codebase is organized by domain, not by technical layer. Each module encapsulates its own data, logic, and interfaces, communicating with other modules through explicit internal APIs. The application deploys as one unit, but its internal structure makes it possible to reason about, test, and change one module without understanding the entire codebase. The pipeline is linear with parallel stages where dependencies allow.

Legend: Pre-Feature Gate, CI Stage, Parallel Verification, Acceptance, Production
```mermaid
graph TD
    classDef prefeature fill:#0d7a32,stroke:#0a6128,color:#fff
    classDef ci fill:#224968,stroke:#1a3a54,color:#fff
    classDef parallel fill:#30648e,stroke:#224968,color:#fff
    classDef accept fill:#6c757d,stroke:#565e64,color:#fff
    classDef prod fill:#a63123,stroke:#8a2518,color:#fff

    A["Pre-commit Gates<br/><small>Lint, Types, Secrets, SAST</small>"]:::prefeature
    B["Build + Unit Tests"]:::prefeature
    C["Contract + Schema Tests"]:::prefeature
    D["Security Scans"]:::parallel
    E["Performance Benchmarks"]:::parallel
    F["Acceptance Tests<br/><small>Production-Like Env</small>"]:::accept
    G["Create Immutable Artifact"]:::ci
    H["Deploy Canary / Progressive"]:::prod
    I["Health Checks + SLO Monitors<br/>Auto-Rollback"]:::prod

    A -->|"commit to trunk"| B
    B --> C
    C --> D & E
    D --> F
    E --> F
    F --> G
    G --> H
    H --> I
```

Key Characteristics

  • One pipeline, one artifact: The entire application builds and deploys as a single immutable artifact. There is no fan-out or fan-in.
  • Linear with parallel branches: Security scans and performance benchmarks run in parallel because neither depends on the other. Everything else is sequential.
  • Trunk-based development: All developers commit to trunk at least daily. The pipeline runs on every commit.
  • Total target time: Under 15 minutes from commit to production-ready artifact. Acceptance tests may extend this to 20 minutes for complex applications.
  • Ownership: The team owns the pipeline definition, which lives in the same repository as the application code.
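The linear-with-parallel-branches shape can be sketched with Python's `concurrent.futures`. Stages are assumed to be callables returning `True` on success, and `run_pipeline` is a hypothetical function; a CI system would declare the same dependency graph in its own configuration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Union

Stage = Callable[[], bool]
# A step is either a single stage or a list of independent stages that may run in parallel.
Step = Union[Stage, List[Stage]]


def run_pipeline(steps: List[Step]) -> bool:
    """Run steps in order; every stage in a parallel group must pass before moving on."""
    for step in steps:
        if isinstance(step, list):
            with ThreadPoolExecutor(max_workers=len(step)) as pool:
                results = list(pool.map(lambda stage: stage(), step))
            if not all(results):
                return False
        elif not step():
            return False
    return True
```

For the diagram above, the steps would be build, contract tests, the parallel group of security scans and performance benchmarks, then acceptance tests.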

When This Architecture Breaks Down

This architecture stops working when:

  • The system becomes too large for a single team to manage
  • Build times grow too long for the team to respond quickly, even after optimization
  • Different parts of the application need different deployment cadences

When these symptoms appear, consider splitting into the multi-team architecture or decomposing the application into independently deployable services with their own pipelines.

1.2 - Multiple Teams, Single Deployable

A sub-pipeline pattern for multiple teams contributing domain modules to a shared modular monolith.

This architecture suits organizations where multiple teams contribute to a single deployable modular monolith - a common pattern for large applications, mobile apps, or platforms where the final artifact must be assembled from team contributions.

The modular monolith structure is what makes multi-team ownership possible. Each team owns a specific module representing a bounded sub-domain of the application. Team A might own checkout and payments, Team B owns inventory and fulfillment, Team C owns user accounts and authentication. Modules communicate through explicit internal APIs, not by reaching into each other’s database tables or calling private methods. Each team’s sub-pipeline validates only their module. A shared integration pipeline assembles and verifies the combined result.

This ownership model is critical. Without clear module boundaries, teams step on each other’s code, sub-pipelines trigger on unrelated changes, and merge conflicts replace pipeline contention as the bottleneck. The module split must follow the application’s domain boundaries, not its technical layers. A team that owns “the database layer” or “the API controllers” will always be coupled to every other team. A team that owns “payments” can change its database, API, and UI independently. If the codebase is not yet structured as a modular monolith, restructure it before adopting this architecture; otherwise the sub-pipelines will constantly interfere with each other.

```mermaid
graph TD
    classDef prefeature fill:#0d7a32,stroke:#0a6128,color:#fff
    classDef team fill:#224968,stroke:#1a3a54,color:#fff
    classDef integration fill:#30648e,stroke:#224968,color:#fff
    classDef prod fill:#a63123,stroke:#8a2518,color:#fff

    subgraph teamA ["Payments Sub-Domain (Team A)"]
        A1["Pre-commit Gates"]:::prefeature
        A2["Build + Unit Tests"]:::prefeature
        A3["Contract Tests"]:::prefeature
        A4["Security + Perf"]:::team
        A1 --> A2 --> A3 --> A4
    end

    subgraph teamB ["Inventory Sub-Domain (Team B)"]
        B1["Pre-commit Gates"]:::prefeature
        B2["Build + Unit Tests"]:::prefeature
        B3["Contract Tests"]:::prefeature
        B4["Security + Perf"]:::team
        B1 --> B2 --> B3 --> B4
    end

    subgraph teamC ["Accounts Sub-Domain (Team C)"]
        C1["Pre-commit Gates"]:::prefeature
        C2["Build + Unit Tests"]:::prefeature
        C3["Contract Tests"]:::prefeature
        C4["Security + Perf"]:::team
        C1 --> C2 --> C3 --> C4
    end

    subgraph integ ["Integration Pipeline"]
        I1["Assemble Combined Artifact"]:::integration
        I2["Integration Contract Tests"]:::integration
        I3["Acceptance Tests<br/><small>Production-Like Env</small>"]:::integration
        I4["Create Immutable Artifact"]:::integration
        I1 --> I2 --> I3 --> I4
    end

    A4 --> I1
    B4 --> I1
    C4 --> I1

    I4 --> D1["Deploy Canary / Progressive"]:::prod
    D1 --> D2["Health Checks + SLO Monitors<br/>Auto-Rollback"]:::prod
```

Key Characteristics

  • Module ownership by domain: Each team owns a bounded module of the application’s functionality. Ownership is defined by domain, not by technical layer. The team is responsible for all code, tests, and pipeline configuration within their module.
  • Team-owned sub-pipelines: Each team runs their own pre-commit, build, unit test, contract test, and security gates independently. A team’s sub-pipeline validates only their module and is their fast feedback loop.
  • Contract tests at both levels: Teams run contract tests in their sub-pipeline to catch boundary issues at the module edges. The integration pipeline runs cross-module contract tests to verify the assembled result.
  • Integration pipeline is thin: The integration pipeline does not re-run each team’s tests. It validates only what cannot be validated in isolation - cross-module integration, the assembled artifact, and end-to-end acceptance tests.
  • Sub-pipeline target time: Under 10 minutes. This is the team’s primary feedback loop and must stay fast.
  • Integration pipeline target time: Under 15 minutes. If it grows beyond this, the integration test suite needs decomposition or the application needs architectural changes to enable independent deployment.
  • Trunk-based development with path filters: All teams commit to the same trunk. Sub-pipelines trigger based on path filters aligned to module boundaries, so a change to the payments module does not trigger the inventory sub-pipeline.
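Path filters amount to mapping changed files onto module directories. A minimal sketch, assuming each module lives under its own directory (the `MODULE_PATHS` layout is hypothetical) and that files outside every module, such as shared build configuration, should trigger all sub-pipelines.

```python
from typing import Dict, Iterable, Set

# Hypothetical repository layout: one directory per domain module.
MODULE_PATHS: Dict[str, str] = {
    "payments": "modules/payments/",
    "inventory": "modules/inventory/",
    "accounts": "modules/accounts/",
}


def triggered_subpipelines(changed_files: Iterable[str]) -> Set[str]:
    """Return the modules whose sub-pipelines should run for a commit."""
    triggered: Set[str] = set()
    for path in changed_files:
        matched = [m for m, prefix in MODULE_PATHS.items() if path.startswith(prefix)]
        # Assumption: a file outside every module (e.g. build config) affects all modules.
        triggered.update(matched or MODULE_PATHS.keys())
    return triggered
```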

Preventing the Integration Pipeline from Becoming a Bottleneck

The integration pipeline is a shared resource and the most likely bottleneck in this architecture. To keep it fast:

  1. Move tests left into sub-pipelines: Every test that can run in a sub-pipeline should run there. The integration pipeline should only contain tests that require the full assembled artifact.
  2. Use contract tests aggressively: Contract tests in sub-pipelines catch most integration issues without needing the full system. The integration pipeline’s contract tests are a verification layer, not the primary detection point.
  3. Run the integration pipeline on every commit to trunk: Do not batch. Batching creates large changesets that are harder to debug when they fail.
  4. Parallelize acceptance tests: Group acceptance tests by feature area and run groups in parallel.
  5. Monitor integration pipeline duration: Set an alert if it exceeds 15 minutes. Treat this the same as a failing test - fix it immediately.
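Item 4 can be sketched as grouping tests by feature area and running each group on its own worker. This assumes test identifiers carry a feature-area prefix such as `checkout/guest_purchase` and that the caller supplies `run_test`; both conventions are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def group_by_feature(test_ids: List[str]) -> Dict[str, List[str]]:
    """Group 'feature/test_name' identifiers by their feature-area prefix."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for test_id in test_ids:
        groups[test_id.split("/", 1)[0]].append(test_id)
    return dict(groups)


def run_groups_in_parallel(groups: Dict[str, List[str]],
                           run_test: Callable[[str], bool]) -> Dict[str, bool]:
    """Run each feature group on its own worker; a group passes only if all its tests pass."""
    def run_group(tests: List[str]) -> bool:
        return all([run_test(t) for t in tests])  # the list forces every test to run

    with ThreadPoolExecutor(max_workers=max(len(groups), 1)) as pool:
        futures = {feature: pool.submit(run_group, tests) for feature, tests in groups.items()}
        return {feature: f.result() for feature, f in futures.items()}
```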

When to Move Away from This Architecture

This architecture is a pragmatic pattern for organizations that cannot yet decompose their monolith into independently deployable services. The long-term goal is loose coupling - independent services with independent pipelines that do not need a shared integration step.

Signs you are ready to decompose:

  • Contract tests catch virtually all integration issues in sub-pipelines
  • The integration pipeline adds little value beyond what sub-pipelines already verify
  • Teams are blocked by integration pipeline queuing more than once per week
  • Different parts of the application need different deployment cadences

1.3 - Independent Teams, Independent Deployables

A fully independent pipeline pattern for teams deploying their own services in any order, with API contract verification replacing integration testing.

This is the target architecture for continuous delivery at scale. Each team owns an independently deployable service with its own pipeline, its own release cadence, and its own path to production. No team waits for another team to deploy. No integration pipeline serializes their work. The only shared infrastructure is the API contract layer that defines how services communicate.

This architecture demands disciplined API management. Without it, independent deployment is an illusion - teams deploy whenever they want, but they break each other constantly.

```mermaid
graph TD
    classDef prefeature fill:#0d7a32,stroke:#0a6128,color:#fff
    classDef team fill:#224968,stroke:#1a3a54,color:#fff
    classDef contract fill:#30648e,stroke:#224968,color:#fff
    classDef prod fill:#a63123,stroke:#8a2518,color:#fff
    classDef api fill:#6c757d,stroke:#565e64,color:#fff

    subgraph svcA ["Service A Pipeline (Team A)"]
        A1["Pre-commit Gates"]:::prefeature
        A2["Build + Unit Tests"]:::prefeature
        A3["Contract<br/>Verification"]:::prefeature
        A4["Security + Perf"]:::team
        A5["Acceptance Tests"]:::team
        A6["Create Immutable Artifact"]:::team
        A1 --> A2 --> A3 --> A4 --> A5 --> A6
    end

    subgraph svcB ["Service B Pipeline (Team B)"]
        B1["Pre-commit Gates"]:::prefeature
        B2["Build + Unit Tests"]:::prefeature
        B3["Contract<br/>Verification"]:::prefeature
        B4["Security + Perf"]:::team
        B5["Acceptance Tests"]:::team
        B6["Create Immutable Artifact"]:::team
        B1 --> B2 --> B3 --> B4 --> B5 --> B6
    end

    subgraph svcC ["Service C Pipeline (Team C)"]
        C1["Pre-commit Gates"]:::prefeature
        C2["Build + Unit Tests"]:::prefeature
        C3["Contract<br/>Verification"]:::prefeature
        C4["Security + Perf"]:::team
        C5["Acceptance Tests"]:::team
        C6["Create Immutable Artifact"]:::team
        C1 --> C2 --> C3 --> C4 --> C5 --> C6
    end

    subgraph apis ["API Schema Registry"]
        R1["Published API Schemas<br/><small>OpenAPI, AsyncAPI, Protobuf</small>"]:::api
        R2["Backward Compatibility<br/>Checks"]:::api
        R3["Consumer Pacts<br/><small>where available</small>"]:::api
        R1 --- R2 --- R3
    end

    A3 <-..->|"verify"| R3
    B3 <-..->|"verify"| R3
    C3 <-..->|"verify"| R3

    A6 --> A7["Deploy + Canary"]:::prod
    A7 --> A8["Health + SLOs"]:::prod

    B6 --> B7["Deploy + Canary"]:::prod
    B7 --> B8["Health + SLOs"]:::prod

    C6 --> C7["Deploy + Canary"]:::prod
    C7 --> C8["Health + SLOs"]:::prod
```
Legend: Pre-Feature Gate, Team Pipeline, API Schema Registry, Production

Key Characteristics

  • Fully independent deployment: Each team deploys on its own schedule. Team A can deploy ten times a day while Team C deploys once a week. No coordination is required.
  • No shared integration pipeline: There is no fan-in step. Each pipeline goes straight from artifact creation to production. This eliminates the integration bottleneck entirely.
  • Contract tests replace integration tests: Instead of testing all services together, each team verifies its API contracts independently. The level of contract verification depends on how much coordination is possible between teams (see contract verification approaches below).
  • Each team owns its full pipeline: From pre-commit to production monitoring. No shared pipeline definitions, no central platform team gating deployments.

Why API Management Is Critical

Independent deployment only works when teams can change their service without breaking others. This requires a shared understanding of API boundaries that is enforced automatically, not through meetings or documents that drift.

Without API management, independent pipelines create independent failures. Teams deploy incompatible changes, discover the breakage in production, and revert to coordinated releases to stop the bleeding. This is worse than the multi-team architecture because it creates the illusion of independence while delivering the reliability of chaos.

What API Management Requires

  1. Published API schemas: Every service publishes its API contract (OpenAPI, AsyncAPI, Protobuf, or equivalent) as a versioned artifact. The schema is the source of truth for what the service provides.

  2. Contract verification (see approaches below): At minimum, providers verify backward compatibility against their own published schema. Where cross-team coordination is feasible, consumer-driven contracts add stronger guarantees.

  3. Backward compatibility enforcement: Every API change is checked for backward compatibility against the published schema. Breaking changes require a new API version using the expand-then-contract pattern:

    • Deploy the new version alongside the old
    • Migrate consumers to the new version
    • Remove the old version only after all consumers have migrated
  4. Schema registry: A central registry (Confluent Schema Registry, a simple artifact repository, or a Pact Broker where consumer-driven contracts are used) stores published schemas. Pipelines pull from this registry to run compatibility checks. The registry is shared infrastructure, but it does not gate deployments - it provides data that each team’s pipeline uses to make its own go/no-go decision.

  5. API versioning strategy: Teams agree on a versioning convention (URL path versioning, header versioning, or semantic versioning for message schemas) and enforce it through pipeline gates. The convention must be simple enough that every team follows it without deliberation.
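The last step of the expand-then-contract pattern in item 3 can be enforced as a guard. A minimal sketch, assuming the provider can observe which API versions each consumer still calls (for example from access logs); the data shape and function names are hypothetical.

```python
from typing import Dict, Set


def blocking_consumers(version: str, consumer_versions: Dict[str, Set[str]]) -> Set[str]:
    """Consumers that still call `version` and must migrate before it is removed."""
    return {consumer for consumer, used in consumer_versions.items() if version in used}


def can_remove_version(version: str, consumer_versions: Dict[str, Set[str]]) -> bool:
    """True only when no known consumer still calls the given API version."""
    return not blocking_consumers(version, consumer_versions)
```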

Contract Verification Approaches

Not all teams can coordinate on shared contract tooling. The right approach depends on the relationship between provider and consumer teams. These approaches are listed from least to most coordination required. Use the strongest approach your context supports.

| Approach | How It Works | Coordination Required | Best When |
|---|---|---|---|
| Provider schema compatibility | Provider’s pipeline checks every change for backward compatibility against its own published schema (e.g., OpenAPI diff). No consumer involvement needed. | None between teams | Teams are in different organizations, or consumers are external/unknown |
| Provider-maintained consumer tests | Provider team writes tests that exercise known consumer usage patterns based on API analytics, documentation, or past breakage. | Minimal - provider observes consumers | Provider can see consumer traffic patterns but cannot require consumer participation |
| Consumer-driven contracts | Consumers publish pacts describing the subset of the provider API they depend on. Provider runs these pacts in its pipeline. See Contract Tests. | High - shared tooling, broker, and agreement to maintain pacts | Teams are in the same organization with shared tooling and willingness to maintain pacts |

Most organizations use a mix. Internal teams with shared tooling can adopt consumer-driven contracts. Teams consuming third-party or cross-organization APIs use provider schema compatibility checks and provider-maintained consumer tests.
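The idea behind consumer-driven contracts can be shown without any tooling: each consumer declares only the fields it reads, and the provider checks a candidate response against every declaration. This is a simplified sketch with hypothetical dotted field paths, not the Pact file format or the Pact Broker API.

```python
from typing import Any, Dict, List


def has_path(document: Dict[str, Any], dotted_path: str) -> bool:
    """True if a dotted path such as 'customer.id' exists in a nested dict."""
    node: Any = document
    for key in dotted_path.split("."):
        if not isinstance(node, dict) or key not in node:
            return False
        node = node[key]
    return True


def verify_consumer_contracts(provider_response: Dict[str, Any],
                              contracts: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map each consumer to the fields it relies on that the provider no longer returns."""
    failures = {}
    for consumer, required_fields in contracts.items():
        missing = [f for f in required_fields if not has_path(provider_response, f)]
        if missing:
            failures[consumer] = missing
    return failures
```

The provider remains free to drop or rename any field that no consumer declares, which is the extra latitude consumer-driven contracts give over checking the whole schema.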

The critical requirement is not which approach you use but that every provider pipeline verifies backward compatibility before deployment. The minimum viable contract verification is an automated schema diff against the published API - if the diff contains a breaking change, the pipeline fails.
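A minimal sketch of such a schema diff, assuming a simplified schema in which each operation lists its response fields with their types and its required request parameters. Real tools diff complete OpenAPI, AsyncAPI, or Protobuf definitions and apply many more rules; the schema shape and `breaking_changes` function here are hypothetical.

```python
from typing import Dict, List, Set, TypedDict


class Operation(TypedDict):
    response: Dict[str, str]    # response field name -> type name
    required_params: Set[str]   # request parameters a client must send


Schema = Dict[str, Operation]   # keyed by operation, e.g. "GET /orders/{id}"


def breaking_changes(published: Schema, candidate: Schema) -> List[str]:
    """Compare a candidate schema with the published one; return the breaking changes."""
    problems = []
    for op, old in published.items():
        new = candidate.get(op)
        if new is None:
            problems.append(f"{op}: operation removed")
            continue
        for field, old_type in old["response"].items():
            new_type = new["response"].get(field)
            if new_type is None:
                problems.append(f"{op}: response field '{field}' removed")
            elif new_type != old_type:
                problems.append(f"{op}: response field '{field}' changed {old_type} -> {new_type}")
        for param in sorted(new["required_params"] - old["required_params"]):
            problems.append(f"{op}: new required parameter '{param}'")
    return problems
```

A pipeline step would print the returned list and exit non-zero when it is not empty. Additive changes, such as a new response field or a new operation, pass.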

Additional Quality Gates for Distributed Architectures

| Gate | Defect Sources Addressed | Catalog Section |
|---|---|---|
| Provider schema backward compatibility | Interface mismatches from provider changes | Integration & Boundaries |
| Consumer-driven contract verification (where feasible) | Wrong assumptions about upstream/downstream | Integration & Boundaries |
| API schema backward compatibility check | Schema migration and backward compatibility failures | Data & State |
| Cross-service timeout propagation check | Missing timeout and deadline enforcement across boundaries | Performance & Resilience |
| Circuit breaker and fallback verification | Network partitions and partial failures handled wrong | Dependency & Infrastructure |
| Distributed tracing validation | Missing observability across service boundaries | Testing & Observability Gaps |
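The cross-service timeout propagation check relies on each hop forwarding its remaining time budget. A minimal sketch, assuming the budget travels between services as a relative duration (similar in spirit to gRPC's `grpc-timeout` header), because monotonic clocks cannot be compared across hosts; the `Deadline` class and the 50 ms margin are illustrative.

```python
import time
from typing import Callable

Clock = Callable[[], float]


class DeadlineExceeded(Exception):
    """Raised when a request has no time budget left for a downstream call."""


class Deadline:
    """A per-request deadline, rebuilt on each hop from the budget the caller sent."""

    def __init__(self, budget_seconds: float, clock: Clock = time.monotonic):
        self._clock = clock
        self._expires_at = clock() + budget_seconds

    def remaining(self) -> float:
        return self._expires_at - self._clock()

    def downstream_budget(self, margin: float = 0.05) -> float:
        """Budget to send to the next hop, keeping a margin for building our own response."""
        budget = self.remaining() - margin
        if budget <= 0:
            raise DeadlineExceeded("deadline already spent; do not call downstream")
        return budget
```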

When This Architecture Works

This architecture is the goal for organizations with:

  • Multiple teams that need different deployment cadences
  • Services with well-defined, stable API boundaries
  • Teams mature enough to own their full delivery pipeline
  • Investment in contract testing tooling and API governance

When This Architecture Fails

  • Shared database schemas: Multiple services can share a database engine without problems. The failure mode is shared schemas - when Service A and Service B both read from and write to the same tables, a schema migration by one service can break the other’s queries. Each service must own its own schema. If two services need the same data, expose it through an API or event, not through direct table access.
  • Synchronous dependency chains: If Service A calls Service B which calls Service C in the request path, a deployment of C can break A through B. Circuit breakers and fallbacks are required at every boundary, and contract tests must cover failure modes, not just success paths.
  • No contract verification discipline: If teams skip backward compatibility checks or let contract test failures slide, breakage shifts from the pipeline to production. The architecture degrades into uncoordinated deployments with production as the integration environment. At minimum, every provider must run automated schema compatibility checks - even without consumer-driven contracts.
  • Missing observability: When services deploy independently, debugging production issues requires distributed tracing, correlated logging, and SLO monitoring across service boundaries. Without this, independent deployment means independent troubleshooting with no way to trace cause and effect.
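Circuit breakers at each synchronous boundary keep a failing dependency from dragging down its callers. A minimal sketch, assuming a consecutive-failure threshold and a fixed cool-down before one trial call is let through; the class is hypothetical, and production services often rely on a resilience library or service mesh instead.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None  # None means the circuit is closed

    def call(self, fn: Callable[[], T], fallback: Callable[[], T]) -> T:
        if self.opened_at is not None and self.clock() - self.opened_at < self.reset_after:
            return fallback()  # open: skip the dependency and fail fast
        # Closed, or half-open after the cool-down: let the call through.
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
            return fallback()
        self.failures = 0
        self.opened_at = None  # success closes the circuit
        return result
```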

Relationship to the Other Architectures

Pattern 3 is where teams using Pattern 2 evolve. The progression is:

  1. Single team, single deployable - one team, one pipeline, one artifact
  2. Multiple teams, single deployable - multiple teams, sub-pipelines, shared integration step
  3. Independent teams, independent deployables - multiple teams, fully independent pipelines, contract-based integration

The move from 2 to 3 happens incrementally. Extract one service at a time. Give it its own pipeline. Establish contract tests between it and the monolith. When the contract tests are reliable, stop running the extracted service’s code through the integration pipeline. Repeat until the integration pipeline is empty.

2 - Systemic Defect Fixes

A catalog of defect sources across the delivery value stream with earliest detection points, AI shift-left opportunities, and systemic prevention strategies.

Defects do not appear randomly. They originate from specific, predictable sources in the delivery value stream. This reference catalogs those sources so teams can shift detection left, automate where possible, and apply AI where it adds real value to the feedback loop.

The goal is systems thinking: detect issues as early as possible in the value stream so feedback informs continuous improvement in how we work, not just reactive fixes to individual defects.

  • Marked cells = AI shifts detection earlier than current automation alone
  • Dark cells = current automation is sufficient; AI adds no additional value
  • No marker = AI assists at the current detection point but does not shift it earlier

How to Use This Catalog

  1. Pick your pain point. Find the category where your team loses the most time to defects or rework. Start there, not at the top.
  2. Focus on the Systemic Prevention column. Automated detection catches defects faster, but systemic prevention eliminates entire categories. Prioritize the prevention fix for each issue you selected.
  3. Measure before and after. Track defect escape rate by category and time-to-detection. If the systemic fix is working, both metrics improve within weeks.
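The two metrics in step 3 can be computed from ordinary defect records. A minimal sketch, assuming each record notes its catalog category, whether it was first found in production, and the hours from introduction to detection; the `Defect` record is hypothetical.

```python
from dataclasses import dataclass
from statistics import median
from typing import Dict, List


@dataclass
class Defect:
    category: str            # catalog category, e.g. "Data & State"
    escaped: bool            # True if first detected in production
    hours_to_detect: float   # time from introduction (e.g. the commit) to detection


def escape_rate_by_category(defects: List[Defect]) -> Dict[str, float]:
    """Fraction of each category's defects that were first found in production."""
    totals: Dict[str, int] = {}
    escaped: Dict[str, int] = {}
    for d in defects:
        totals[d.category] = totals.get(d.category, 0) + 1
        escaped[d.category] = escaped.get(d.category, 0) + int(d.escaped)
    return {c: escaped[c] / totals[c] for c in totals}


def median_time_to_detect(defects: List[Defect]) -> Dict[str, float]:
    """Median hours from introduction to detection, per category."""
    by_category: Dict[str, List[float]] = {}
    for d in defects:
        by_category.setdefault(d.category, []).append(d.hours_to_detect)
    return {c: median(hours) for c, hours in by_category.items()}
```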

Categories

| Category | What it covers |
|---|---|
| Product & Discovery | Wrong features, misaligned requirements, accessibility gaps - defects born before coding begins |
| Integration & Boundaries | Interface mismatches, behavioral assumptions, race conditions at service boundaries |
| Knowledge & Communication | Implicit domain knowledge, ambiguous requirements, tribal knowledge loss, divergent mental models |
| Change & Complexity | Unintended side effects, technical debt, feature interactions, configuration drift |
| Testing & Observability Gaps | Untested edge cases, missing contract tests, insufficient monitoring, environment parity |
| Process & Deployment | Long-lived branches, manual steps, large batches, inadequate rollback, work stacking |
| Data & State | Schema migration failures, null assumptions, concurrency issues, cache invalidation |
| Dependency & Infrastructure | Third-party breaking changes, environment differences, network partition handling |
| Security & Compliance | Vulnerabilities, secrets in source, auth gaps, injection, regulatory requirements, audit trails |
| Performance & Resilience | Regressions, resource leaks, capacity limits, missing timeouts, graceful degradation |

2.1 - Product & Discovery Defects

Defects that originate before a single line of code is written - the most expensive category because they compound through every downstream phase.

Issue | Earliest Detection (Automation) | Automated Detection | Earlier Detection with AI | Systemic Prevention
--- | --- | --- | --- | ---
Building the wrong thing | Discovery | Product analytics platforms, usage trend alerts | Synthesize user feedback, support tickets, and usage data to surface misalignment earlier than production metrics | Validated user research before backlog entry; dual-track agile
Solving a problem nobody has | Discovery | Support ticket clustering tools, feature adoption tracking | Semantic analysis of interview transcripts, forums, and support tickets to identify real vs. assumed pain | Problem validation as a stage gate; publish problem brief before solution
Correct problem, wrong solution | Discovery | A/B testing frameworks, feature flag cohort comparison | Evaluate prototypes against problem definitions; generate alternative approaches | Prototype multiple approaches; measurable success criteria first
Meets spec but misses user intent | Requirements | Session replay tools, rage-click and error-loop detection | Review acceptance criteria against user behavior data to flag misalignment | Acceptance criteria focused on user outcomes, not checklists
Over-engineering beyond need | Design | Static analysis for dead code and unused abstractions | Flag unnecessary abstraction layers and premature optimization in code review | YAGNI principle; justify every abstraction layer
Prioritizing wrong work | Discovery | DORA metrics versus business outcomes, WSJF scoring | Synthesize roadmap, customer data, and market signals to surface opportunity costs | WSJF prioritization with outcome data
Inaccessible UI excludes users | Pre-commit | axe-core, pa11y, Lighthouse accessibility audits | Current tooling sufficient | WCAG compliance as acceptance criteria; automated accessibility checks in pipeline

2.2 - Integration & Boundaries Defects

Defects at system boundaries that are invisible to unit tests and often survive until production. Contract testing and deliberate boundary design are the primary defenses.

Issue | Earliest Detection (Automation) | Automated Detection | Earlier Detection with AI | Systemic Prevention
--- | --- | --- | --- | ---
Interface mismatches | CI | Consumer-driven contract tests, API schema validators | Predict which consumers break from API changes based on usage patterns | Mandatory contract tests per boundary; API-first with generated clients
Wrong assumptions about upstream/downstream | Design | Chaos engineering platforms, synthetic transactions, fault injection | Review code and docs to identify undocumented behavioral assumptions | Document behavioral contracts; defensive coding at boundaries
Race conditions | Pre-commit | Thread sanitizers, race detectors, formal verification tools, fuzz testing | Flag concurrency anti-patterns but cannot replace formal detection tools | Idempotent design; queues over shared mutable state
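Dedicated tools such as Pact implement consumer-driven contract testing in full. The sketch below only illustrates the core idea under simplifying assumptions: the consumer publishes the fields and types it relies on, and the provider's CI checks a real response against that expectation. The contract format, `ORDER_CONTRACT`, and `contract_violations` are hypothetical.

```python
# Hypothetical consumer contract: field name -> Python type the consumer relies on.
ORDER_CONTRACT = {"id": str, "total_cents": int, "status": str}


def contract_violations(contract, response):
    """Return human-readable violations of the consumer's contract.

    Extra fields in the response are allowed, so providers can add data
    without breaking consumers (tolerant reader).
    """
    problems = []
    for field, expected_type in contract.items():
        if field not in response:
            problems.append(f"missing field: {field}")
        elif not isinstance(response[field], expected_type):
            actual = type(response[field]).__name__
            problems.append(f"{field}: expected {expected_type.__name__}, got {actual}")
    return problems
```

Running this check in the provider's pipeline, with one contract per consumer, turns an interface mismatch into a CI failure instead of a production incident.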

2.3 - Knowledge & Communication Defects

Defects that emerge from gaps between what people know and what the code expresses - the hardest to detect with automated tools and the easiest to prevent with team practices.

Issue | Earliest Detection (Automation) | Automated Detection | Earlier Detection with AI | Systemic Prevention
--- | --- | --- | --- | ---
Implicit domain knowledge not in code | Coding | Magic number detection, code ownership analytics | Identify undocumented business rules and knowledge gaps from code and test analysis | Domain-Driven Design with ubiquitous language; embed rules in code
Ambiguous requirements | Requirements | Flag stories without acceptance criteria, BDD spec coverage tracking | Review requirements for ambiguity, missing edge cases, and contradictions; generate test scenarios | Three Amigos before work; example mapping; executable specs
Tribal knowledge loss | Coding | Bus factor analysis from commit history, single-author concentration alerts | Generate documentation from code and tests; flag documentation drift from implementation | Pair/mob programming as default; rotate on-call; living docs
Divergent mental models across teams | Design | Divergent naming detection, contract test failures | Compare terminology and domain models across codebases to detect semantic mismatches | Shared domain models; explicit bounded contexts

2.4 - Change & Complexity Defects

Defects caused by the act of changing existing code. The larger the change and the longer it lives outside trunk, the higher the risk.

Issue | Earliest Detection (Automation) | Automated Detection | Earlier Detection with AI | Systemic Prevention
--- | --- | --- | --- | ---
Unintended side effects | CI | Automated test suites, mutation testing frameworks, change impact analysis | Reason about semantic change impact beyond syntactic dependencies; automated blast radius analysis | Small focused commits; trunk-based development; feature flags
Accumulated technical debt | CI | Complexity trends, duplication scoring, dependency cycle detection, quality gates | Identify architectural drift, abstraction decay, and calcified workarounds | Refactoring as part of every story; dedicated debt budget
Unanticipated feature interactions | Acceptance Tests | Combinatorial and pairwise testing, feature flag interaction matrix | Reason about feature interactions semantically; flag conflicts that testing matrices miss | Feature flags with controlled rollout; modular design; canary deployments
Configuration drift | CI | Infrastructure-as-code drift detection, environment diffing | Current tooling sufficient | Infrastructure as code; immutable infrastructure; GitOps
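Controlled rollout usually means enabling a flag for a stable slice of users. A minimal sketch, assuming flags are evaluated in-process and every user has a string ID; `is_enabled` is a hypothetical name, not a specific feature flag product's API.

```python
import hashlib


def is_enabled(flag_name: str, user_id: str, rollout_percent: int) -> bool:
    """Deterministically place a user in a bucket 0-99 and compare it with the rollout."""
    key = f"{flag_name}:{user_id}".encode("utf-8")
    # First 4 bytes of the hash give a stable, roughly uniform integer.
    bucket = int.from_bytes(hashlib.sha256(key).digest()[:4], "big") % 100
    return bucket < rollout_percent
```

Hashing the user ID keeps each user's experience stable across requests, so raising the percentage only adds users. Including the flag name in the key means different flags sample different users, which makes flag interactions visible across cohorts instead of always landing on the same early adopters.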

2.5 - Testing & Observability Gap Defects

Defects that survive because the safety net has holes. The fix is not more testing - it is better-targeted testing and observability that closes the specific gaps.

Issue | Earliest Detection (Automation) | Automated Detection | Earlier Detection with AI | Systemic Prevention
--- | --- | --- | --- | ---
Untested edge cases and error paths | CI | Mutation testing frameworks, branch coverage thresholds | Analyze code paths and generate tests for untested boundaries and error conditions | Property-based testing as standard; boundary value analysis
Missing contract tests at boundaries | CI | Boundary inventory versus contract test inventory | Identify boundaries lacking tests by understanding semantic service relationships | Mandatory contract tests per new boundary
Insufficient monitoring | Design | Observability coverage scoring, health endpoint checks, structured logging verification | Current tooling sufficient | Observability as non-functional requirement; SLOs for every user-facing path
Test environments don’t reflect production | CI | Automated environment parity checks, synthetic transaction comparison, infrastructure-as-code diff tools | Current tooling sufficient | Production-like data in staging; test in production with flags

2.6 - Process & Deployment Defects

Defects caused by the delivery process itself. Manual steps, large batches, and slow feedback loops create the conditions for failure.

Issue | Earliest Detection (Automation) | Automated Detection | Earlier Detection with AI | Systemic Prevention
--- | --- | --- | --- | ---
Long-lived branches | Pre-commit | Branch age alerts, merge conflict frequency, CI dashboard for branch count | Process change, not AI | Trunk-based development; merge at least daily
Manual pipeline steps | CI | Pipeline audit for manual gates, deployment lead time analysis | Automation, not AI | Automate every step from commit to production
Batching too many changes per release | CI | Changes-per-deploy metrics, deployment frequency tracking | CD practice, not AI | Every commit is a release candidate; single-piece flow
Inadequate rollback capability | CI | Automated rollback testing in CI, mean time to rollback measurement | Deployment patterns, not AI | Blue/green or canary deployments; auto-rollback on health failure
Reliance on human review to catch preventable defects | Coding | Linters, static analysis security testing, type systems, complexity scoring | Semantic code review for logic errors and missing edge cases that automated rules cannot express | Reserve human review for knowledge transfer and design decisions
Manual review of risks and compliance (CAB) | Design | Change lead time analysis, CAB effectiveness metrics | Automated change risk scoring from change diff and deployment history; blast radius analysis | Replace CAB with automated progressive delivery
Work stacking on individuals; everything started, nothing finished; PRs waiting days for review; uneven workloads; blocked work sits idle; completed work misses the intent | CI | Issue tracker reports where individuals have multiple items assigned simultaneously | Process change, not AI | See the Push-Based Work Assignment anti-pattern

2.7 - Data & State Defects

Data defects are particularly dangerous because they can corrupt persistent state. Unlike code defects, data corruption often cannot be fixed by deploying a new version.

Issue | Earliest Detection (Automation) | Automated Detection | Earlier Detection with AI | Systemic Prevention
--- | --- | --- | --- | ---
Schema migration and backward compatibility failures | CI | Schema compatibility validators, migration dry-runs | Predict downstream impact by understanding consumer usage patterns | Expand-then-contract schema migrations; never ship breaking changes
Null or missing data assumptions | Pre-commit | Null safety static analyzers, strict type systems | Flag code where optional fields are used without null checks | Null-safe type systems; Option/Maybe as default; validate at boundaries
Concurrency and ordering issues | CI | Thread sanitizers, load tests with randomized timing | Design patterns, not AI | Design for out-of-order delivery; idempotent consumers
Cache invalidation errors | Acceptance Tests | Cache consistency monitoring, TTL verification, stale data detection | Review cache invalidation logic for incomplete paths or mismatches | Short TTLs; event-driven invalidation
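An idempotent consumer tolerates redelivery, which most message brokers allow under at-least-once delivery. A minimal sketch, assuming every event carries a unique ID; `AccountProjection` is a hypothetical example and keeps its state in memory only.

```python
from dataclasses import dataclass, field


@dataclass
class AccountProjection:
    """Applies deposit events at most once, even if the broker redelivers them."""

    balance: int = 0
    seen_event_ids: set = field(default_factory=set)

    def handle(self, event_id: str, amount: int) -> bool:
        """Apply the event and return True, or return False for a duplicate."""
        if event_id in self.seen_event_ids:
            return False  # duplicate delivery: already applied, ignore
        self.balance += amount
        self.seen_event_ids.add(event_id)
        return True
```

Because deposits are additive, the final balance is the same regardless of delivery order, so this consumer also handles out-of-order delivery. In a real system, the processed-ID record and the state change would need to be persisted in the same transaction; otherwise a crash between them reintroduces duplicates.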

2.8 - Dependency & Infrastructure Defects

Defects that originate outside your codebase but break your system. The fix is to treat external dependencies as untrusted boundaries.

Issue | Earliest Detection (Automation) | Automated Detection | Earlier Detection with AI | Systemic Prevention
--- | --- | --- | --- | ---
Third-party library breaking changes | CI | Dependency update automation, software composition analysis for breaking versions | Review changelogs and API diffs to assess breaking change risk; predict compatibility issues | Pin dependencies; automated upgrade PRs with test gates
Infrastructure differences across environments | CI | Infrastructure-as-code drift detection, config comparison, environment parity scoring | IaC and GitOps, not AI | Single source of truth for all environments; containerization
Network partitions and partial failures handled incorrectly | Acceptance Tests | Chaos engineering platforms, synthetic transaction monitoring | Review architectures for missing failure handling patterns | Circuit breakers; retries; bulkheads as defaults; test failure modes explicitly
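Resilience libraries normally provide circuit breakers, so the sketch below only illustrates the state machine behind the "circuit breakers as defaults" prevention. It assumes a single-threaded caller; the class and its parameters are hypothetical, and the injectable `clock` exists so the behavior can be tested without waiting.

```python
import time


class CircuitBreaker:
    """Opens after N consecutive failures; allows a trial call after a cooldown."""

    def __init__(self, failure_threshold=3, reset_after_s=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn, fallback):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after_s:
                return fallback()  # open: fail fast without touching the dependency
            self.opened_at = None  # half-open: let one trial call through
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # (re)open the circuit
            return fallback()
        self.failures = 0  # success closes the circuit fully
        return result
```

In the half-open state the failure count is still at or above the threshold, so a single failed trial call reopens the circuit immediately, while one success resets it.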

2.9 - Security & Compliance Defects

Security and compliance defects are silent until they are catastrophic. They share a pattern: the gap between what the code does and what policy requires is invisible without deliberate, automated verification at every stage.

Issue | Earliest Detection (Automation) | Automated Detection | Earlier Detection with AI | Systemic Prevention
--- | --- | --- | --- | ---
Known vulnerabilities in dependencies | CI | Software composition analysis, CVE database scanning, dependency lock file auditing | Correlate vulnerability advisories with actual usage paths to prioritize exploitable risks over theoretical ones | Automated dependency updates with test gates; pin and audit all transitive dependencies
Secrets committed to source control | Pre-commit | Pre-commit secret scanners, entropy-based detection, git history auditing tools | Flag patterns that resemble credentials in code, config, and documentation | Secrets management platform; inject at runtime, never store in repo
Authentication and authorization gaps | Design | Security-focused integration tests, RBAC policy validators, access matrix verification | Review code paths for missing authorization checks and privilege escalation patterns | Centralized auth framework; deny-by-default access policies; automated access matrix tests
Injection vulnerabilities | Pre-commit | SAST tools, taint analysis, parameterized query enforcement | Identify subtle injection vectors that pattern-matching rules miss, including second-order injection | Input validation at boundaries; parameterized queries as default; content security policies
Regulatory requirement gaps | Requirements | Compliance-as-code policy engines, automated control mapping | Map regulatory requirements to implementation artifacts and flag uncovered controls | Compliance requirements as acceptance criteria; automated evidence collection
Missing audit trails | Design | Structured logging verification, audit event coverage scoring | Review code for state-changing operations that lack audit logging | Audit logging as a framework default; every state change emits a structured event
License compliance violations | CI | License scanning tools, SBOM generation and policy evaluation | Review license compatibility across the full dependency graph | Approved license allowlist enforced in CI; SBOM generated on every build
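Entropy-based detection, listed for secrets above, flags strings that look too random to be ordinary identifiers. A minimal sketch of the idea; the token pattern and the 4.0 bits-per-character threshold are illustrative assumptions, and real scanners combine entropy with provider-specific key patterns to cut false positives.

```python
import math
import re
from collections import Counter

# Candidate tokens: runs of 20+ characters typical of keys and tokens (assumed pattern).
TOKEN_RE = re.compile(r"[A-Za-z0-9+/=_\-]{20,}")


def shannon_entropy(s: str) -> float:
    """Bits of entropy per character, from the character frequencies in s."""
    counts = Counter(s)
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def suspicious_tokens(line: str, threshold: float = 4.0):
    """Return tokens in the line whose entropy suggests a random secret."""
    return [t for t in TOKEN_RE.findall(line) if shannon_entropy(t) >= threshold]
```

A pre-commit hook could run `suspicious_tokens` over each added line of the staged diff and block the commit when the list is non-empty.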

2.10 - Performance & Resilience Defects

Performance defects are rarely binary. They degrade gradually, often hiding behind averages until a threshold tips and the system fails under real load. Detection requires baselines, budgets, and automated enforcement - not periodic manual testing.

Issue | Earliest Detection (Automation) | Automated Detection | Earlier Detection with AI | Systemic Prevention
--- | --- | --- | --- | ---
Performance regressions | CI | Automated benchmark suites, performance budget enforcement in CI | Identify code changes likely to degrade performance from structural analysis before benchmarks run | Performance budgets enforced in CI; benchmark suite runs on every commit
Resource leaks | CI | Memory and connection pool profilers, leak detection in automated test runs | Flag allocation patterns without corresponding cleanup in code review | Resource management via language-level constructs (try-with-resources, RAII, using); pool size alerts
Unknown capacity limits | Acceptance Tests | Load testing frameworks, capacity threshold monitoring, saturation alerts | Predict capacity bottlenecks from architecture and traffic patterns | Regular automated load tests; capacity model updated with every architecture change
Missing timeout and deadline enforcement | Pre-commit | Static analysis for unbounded calls, integration test timeout verification | Identify call chains with missing or inconsistent timeout propagation | Default timeouts on all external calls; deadline propagation across service boundaries
Slow user-facing response times | CI | Real user monitoring, synthetic transaction baselines, web vitals tracking | Correlate frontend and backend telemetry to pinpoint latency sources | Response time SLOs per user-facing path; performance budgets for page weight and API latency
Missing graceful degradation | Design | Chaos engineering platforms, failure injection, circuit breaker verification | Review architectures for single points of failure and missing fallback paths | Design for partial failure; circuit breakers and fallbacks as defaults; game day exercises
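"Default timeouts on all external calls; deadline propagation across service boundaries" can be sketched as an absolute deadline that each hop consults before making an outbound call. This is an illustration under assumptions: the `Deadline` class is hypothetical, and RPC frameworks such as gRPC provide built-in deadline support that should be preferred where available.

```python
import time


class Deadline:
    """An absolute deadline created at the edge and passed down the call chain."""

    def __init__(self, timeout_s: float, clock=time.monotonic):
        self.clock = clock
        self.expires_at = clock() + timeout_s

    def remaining(self) -> float:
        return max(0.0, self.expires_at - self.clock())

    def timeout_for_call(self, default_s: float) -> float:
        """Timeout for the next outbound call: the default, capped by the time left."""
        left = self.remaining()
        if left == 0.0:
            # Do not start work the caller has already given up on.
            raise TimeoutError("deadline exceeded before outbound call")
        return min(default_s, left)
```

Every outbound call gets a timeout (the default), but never one longer than the user-facing request has left, so downstream services stop working on requests whose caller has already timed out.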

3 - CD Practices

Concise definitions of the core continuous delivery practices from MinimumCD.

These pages define the minimum practices required for continuous delivery. Each page covers what the practice is, why it matters, and what the minimum criteria are. For migration guidance and tactical how-to content, follow the links to the corresponding phase pages.

Core Practices

3.1 - Continuous Integration

Integrate work to trunk at least daily with automated testing to maintain a releasable codebase.

Definition

Continuous Integration (CI) is the activity of each developer integrating work to the trunk of version control at least daily and verifying that the work is, to the best of our knowledge, releasable.

CI is not just about tooling - it is fundamentally about team workflow and working agreements.

Minimum Activities Required

  1. Trunk-based development - all work integrates to trunk
  2. Work integrates to trunk at least daily (each developer, every day)
  3. Work has automated testing before merge to trunk
  4. Work is tested with other work automatically on merge
  5. All feature work stops when the build is red
  6. New work does not break delivered work

Why This Matters

Without CI, Teams Experience

  • Integration hell: Weeks or months of painful merge conflicts
  • Late defect detection: Bugs found after they are expensive to fix
  • Reduced collaboration: Developers work in isolation, losing context
  • Deployment fear: Large batches of untested changes create risk
  • Slower delivery: Time wasted on merge conflicts and rework
  • Quality erosion: Without rapid feedback, technical debt accumulates

With CI, Teams Achieve

  • Rapid feedback: Know within minutes if changes broke something
  • Smaller changes: Daily integration forces better work breakdown
  • Better collaboration: Team shares ownership of the codebase
  • Lower risk: Small, tested changes are easier to diagnose and fix
  • Faster delivery: No integration delays blocking deployment
  • Higher quality: Continuous testing catches issues early

What Is Improved

Teamwork

CI requires strong teamwork to function correctly. Key improvements:

  • Pull workflow: Team picks next important work instead of working from assignments
  • Code review cadence: Quick reviews (< 4 hours) keep work flowing
  • Pair programming: Real-time collaboration eliminates review delays
  • Shared ownership: Everyone maintains the codebase together
  • Team goals over individual tasks: Focus shifts from “my work” to “our progress”

Work Breakdown

CI forces better work decomposition:

  • Definition of Ready: Every story has testable acceptance criteria before work starts
  • Small batches: If the team can complete work in < 2 days, it is refined enough
  • Vertical slicing: Each change delivers a thin, tested slice of functionality
  • Incremental delivery: Features built incrementally, each step integrated daily

Testing

CI requires a shift in testing approach:

  • From writing tests after code is “complete” to writing tests before/during coding (TDD/BDD)
  • From testing implementation details to testing behavior and outcomes
  • From manual testing before deployment to automated testing on every commit
  • From separate QA phase to quality built into development

Migration Guidance

For detailed guidance on adopting CI practices during your CD migration, see:

Additional Resources

3.2 - Trunk-Based Development

All changes integrate into a single shared trunk with no intermediate branches.

“Trunk-based development has been shown to be a predictor of high performance in software development and delivery. It is characterized by fewer than three active branches in a code repository; branches and forks having very short lifetimes (e.g., less than a day) before being merged; and application teams rarely or never having ‘code lock’ periods when no one can check in code or do pull requests due to merging conflicts, code freezes, or stabilization phases.”

  • Accelerate by Nicole Forsgren Ph.D., Jez Humble & Gene Kim

Definition

Trunk-based development (TBD) is a team workflow where changes are integrated into the trunk with no intermediate integration (develop, test, etc.) branch. The two common workflows are making changes directly to the trunk or using very short-lived branches that branch from the trunk and integrate back into the trunk.

Release branches are an intermediate step that some choose on their path to continuous delivery while improving their quality processes in the pipeline. True CD releases from the trunk.

Minimum Activities Required

  • All changes integrate into the trunk
  • If branches from the trunk are used:
    • They originate from the trunk
    • They re-integrate to the trunk
    • They are short-lived and removed after the merge
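A branch age alert enforces "short-lived" mechanically. A minimal sketch, assuming you can obtain each branch's creation time (for example, the timestamp of its first commit not on trunk) and using the one-day lifetime from the Accelerate quote as the threshold; `stale_branches` and the trunk name `main` are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def stale_branches(branches, now, max_age=timedelta(days=1), trunk="main"):
    """Return names of non-trunk branches older than max_age.

    branches: iterable of (name, created_at) pairs with timezone-aware datetimes.
    """
    return sorted(
        name
        for name, created_at in branches
        if name != trunk and now - created_at > max_age
    )
```

Running this on a schedule and posting the result to the team channel makes long-lived branches visible while they are still cheap to merge.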

What Is Improved

  • Smaller changes: TBD emphasizes small, frequent changes that are easier for the team to review and more resistant to impactful merge conflicts. Conflicts become rare and trivial.
  • We must test: TBD requires us to implement tests as part of the development process.
  • Better teamwork: We need to work more closely as a team. This has many positive impacts, not least that we become more focused on getting the team’s highest priority done.
  • Better work definition: Small changes require us to decompose the work into a level of detail that helps uncover things that lack clarity or do not make sense. This provides much earlier feedback on potential quality issues.
  • Replaces process with engineering: Instead of creating a process where we control the release of features with branches, we can control the release of features with engineering techniques called evolutionary coding methods. These techniques also bring stability benefits that a branch-based process cannot provide.
  • Reduces risk: Long-lived branches carry two common risks. First, the change will not integrate cleanly and the merge conflicts result in broken or lost features. Second, the branch will be abandoned, usually because of the first reason.

Migration Guidance

For detailed guidance on adopting TBD during your CD migration, see:

Additional Resources

3.3 - Single Path to Production

All deployments flow through one automated pipeline - no exceptions.

Definition

The deployment pipeline is the single, standardized path for all changes to reach any environment - development, testing, staging, or production. No manual deployments, no side channels, no “quick fixes” bypassing the pipeline. If it is not deployed through the pipeline, it does not get deployed.

Key Principles

  1. Single path: All deployments flow through the same pipeline
  2. No exceptions: Even hotfixes and rollbacks go through the pipeline
  3. Automated: Deployment is triggered automatically after pipeline validation
  4. Auditable: Every deployment is tracked and traceable
  5. Consistent: The same process deploys to all environments

What Is Improved

  • Reliability: Every deployment is validated the same way
  • Traceability: Clear audit trail from commit to production
  • Consistency: Environments stay in sync
  • Speed: Automated deployments are faster than manual ones
  • Safety: Quality gates are never bypassed
  • Confidence: Teams trust that production matches what was tested
  • Recovery: Rollbacks are as reliable as forward deployments

Migration Guidance

For detailed guidance on establishing a single path to production, see:

Additional Resources

3.4 - Deterministic Pipeline

The same inputs to the pipeline always produce the same outputs.

Definition

A deterministic pipeline produces consistent, repeatable results. Given the same inputs (code, configuration, dependencies), the pipeline will always produce the same outputs and reach the same pass/fail verdict. The pipeline’s decision on whether a change is releasable is definitive - if it passes, deploy it; if it fails, fix it.

Key Principles

  1. Repeatable: Running the pipeline twice with identical inputs produces identical results
  2. Authoritative: The pipeline is the final arbiter of quality, not humans
  3. Immutable: No manual changes to artifacts or environments between pipeline stages
  4. Trustworthy: Teams trust the pipeline’s verdict without second-guessing

What Makes a Pipeline Deterministic

  • Version control everything: Source code, IaC, pipeline definitions, test data, dependency lockfiles, tool versions
  • Lock dependency versions: Always use lockfiles. Never rely on latest or version ranges.
  • Eliminate environmental variance: Containerize builds, pin image tags, install exact tool versions
  • Remove human intervention: No manual approvals in the critical path, no manual environment setup
  • Fix flaky tests immediately: Quarantine, fix, or delete. Never allow a “just re-run it” culture.
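A pipeline can reject unpinned dependencies before it builds anything. The sketch below assumes a Python project with a pip-style requirements file; other ecosystems rely on their own lockfiles (for example `package-lock.json`, `Cargo.lock`, or `go.sum`). The regular expression is a simplification that ignores line continuations and `--hash` options, and `unpinned_requirements` is a hypothetical helper.

```python
import re

# An exact pin: "name==version", optionally with extras and an environment marker.
PINNED = re.compile(r"^[A-Za-z0-9._\-\[\],]+==[^=*,;\s]+(\s*;.*)?$")


def unpinned_requirements(lines):
    """Return requirement lines that use ranges, wildcards, or no version at all."""
    problems = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options such as -r
            continue
        if not PINNED.match(line):
            problems.append(line)
    return problems
```

Failing the build when this list is non-empty keeps "latest" and version ranges from silently changing the pipeline's inputs between runs.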

What Is Improved

  • Quality increases: Real issues are never dismissed as “flaky tests”
  • Speed increases: No time wasted on test reruns or manual verification
  • Trust increases: Teams rely on the pipeline instead of adding manual gates
  • Debugging improves: Failures are reproducible, making root cause analysis easier
  • Delivery improves: Faster, more reliable path from commit to production

Migration Guidance

For detailed guidance on building a deterministic pipeline, see:

  • Deterministic Pipeline - Phase 2 pipeline practice with anti-pattern/good-pattern examples and getting started steps

Additional Resources

3.5 - Definition of Deployable

Automated criteria that determine when a change is ready for production.

Definition

The “definition of deployable” is your organization’s agreed-upon set of non-negotiable quality criteria that every artifact must pass before it can be deployed to any environment. This definition should be automated, enforced by the pipeline, and treated as the authoritative verdict on whether a change is ready for deployment.

Key Principles

  1. Pipeline is definitive: If the pipeline passes, the artifact is deployable - no exceptions
  2. Automated validation: All criteria are checked automatically, not manually
  3. Consistent across environments: The same standards apply whether deploying to test or production
  4. Fails fast: The pipeline rejects artifacts that do not meet the standard immediately

What Should Be in Your Definition

Your definition of deployable should include automated checks for:

  • Security: SAST scans, dependency vulnerability scans, secret detection
  • Functionality: Unit tests, integration tests, end-to-end tests, regression tests
  • Compliance: Audit trails, policy as code, change documentation
  • Performance: Response time thresholds, load test baselines, resource utilization
  • Reliability: Health check validation, graceful degradation tests, rollback verification
  • Code quality: Linting, static analysis, complexity metrics
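The principles above (pipeline is definitive, automated, consistent, fails fast) can be sketched as an ordered list of gates that every artifact runs through. This is an illustration, not a specific CI product's API: the gate functions are hypothetical stand-ins for real scanners and test suites, and the artifact is identified by a plain string such as a path or digest.

```python
from typing import Callable, List, Tuple

# Each gate is a (name, check) pair; check returns True when the artifact passes.
Gate = Tuple[str, Callable[[str], bool]]


def evaluate_deployable(artifact: str, gates: List[Gate]) -> Tuple[bool, str]:
    """Run gates in order and stop at the first failure (fail fast)."""
    for name, check in gates:
        if not check(artifact):
            return False, f"rejected by gate: {name}"
    return True, "deployable"
```

Ordering cheap gates such as linting and secret detection before slow ones such as end-to-end tests gives the fastest possible rejection, and using the same gate list for every environment keeps the verdict consistent.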

What Is Improved

  • Removes bottlenecks: No waiting for manual approval meetings
  • Increases quality: Automated checks catch more issues than manual reviews
  • Reduces cycle time: Deployable artifacts are identified in minutes, not days
  • Improves collaboration: Shared understanding of quality standards
  • Enables continuous delivery: Trust in the pipeline makes frequent deployments safe

Migration Guidance

For detailed guidance on defining what “deployable” means for your organization, see:

  • Deployable Definition - Phase 2 pipeline practice with progressive quality gates, context-specific definitions, and getting started steps

Additional Resources

3.6 - Immutable Artifacts

Build once, deploy everywhere. The artifact is never modified after creation.

Definition

Central to CD is that we are validating the artifact with the pipeline. It is built once and deployed to all environments. A common anti-pattern is building an artifact for each environment. The pipeline should generate immutable, versioned artifacts.

  • Immutable Pipeline: Failures should be addressed by changes in version control so that two executions with the same configuration always yield the same results. Never go to the failure point, make adjustments in the environment, and re-start from that point.

  • Immutable Artifacts: Some package management systems allow mutable pre-release versions. For example, Java projects built with Maven commonly use -SNAPSHOT versions, which can be republished under the same version number. This means the artifact’s behavior can change without the version changing. Version numbers are cheap. If we are to have an immutable pipeline, it must produce an immutable artifact. Never use or produce -SNAPSHOT versions.

Immutability provides the confidence to know that the results from the pipeline are real and repeatable.
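One way to enforce both rules is to record a content digest when the pipeline builds the artifact and check it before every deployment. A minimal sketch, assuming the artifact is available as bytes at deploy time; `artifact_digest` and `check_release` are hypothetical helpers.

```python
import hashlib


def artifact_digest(data: bytes) -> str:
    """Content address of the artifact: identical bytes always give an identical digest."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


def check_release(version: str, built_digest: str, deploying_bytes: bytes) -> None:
    """Refuse mutable versions and any artifact that differs from what the pipeline built."""
    if version.endswith("-SNAPSHOT"):
        raise ValueError(f"mutable version not allowed: {version}")
    if artifact_digest(deploying_bytes) != built_digest:
        raise ValueError("artifact bytes differ from the pipeline-built artifact")
```

A per-environment rebuild produces different bytes, so it fails the digest check even when the version string is unchanged.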

What Is Improved

  • Everything must be version controlled: source code, environment configurations, application configurations, and even test data. This reduces variability and improves the quality process.
  • Confidence in testing: The artifact validated in pre-production is byte-for-byte identical to what runs in production.
  • Faster rollback: Previous artifacts are unchanged in the artifact repository, ready to be redeployed.
  • Audit trail: Every artifact is traceable to a specific commit and pipeline run.

Migration Guidance

For detailed guidance on implementing immutable artifacts, see:

  • Immutable Artifacts - Phase 2 pipeline practice with anti-patterns, good patterns, and getting started steps

Additional Resources

3.7 - Production-Like Environments

Test in environments that mirror production to catch environment-specific issues early.

Definition

It is crucial to leverage pre-production environments in your CD pipeline to run all of your tests (unit, integration, UAT, manual QA, E2E) early and often. Test environments increase interaction with new features and exposure to bugs - both of which are important prerequisites for reliable software.

Types of Pre-Production Environments

Most organizations employ both static and short-lived environments and utilize them for case-specific stages of the SDLC:

  • Staging environment: The last environment that teams run automated tests against prior to deployment, particularly for testing interaction between all new features after a merge. Its infrastructure reflects production as closely as possible.

  • Ephemeral environments: Full-stack, on-demand environments spun up on every code change. Each ephemeral environment is leveraged in your pipeline to run E2E, unit, and integration tests on every code change. These environments are defined in version control, created and destroyed automatically on demand. They are short-lived by definition but should closely resemble production. They replace long-lived “static” environments and the maintenance required to keep those stable.

What Is Improved

  • Infrastructure is kept consistent: Test environments deliver results that reflect real-world performance. Fewer unexpected bugs reach production, because production-like data and dependencies let you run your entire test suite earlier.
  • Test against latest changes: These environments rebuild automatically on every code change, with no manual intervention.
  • Test before merge: Attaching an ephemeral environment to every PR enables E2E testing in your CI before code changes get deployed to staging.

Migration Guidance

For detailed guidance on implementing production-like environments, see:


3.8 - Rollback

Fast, automated recovery from any deployment.

Definition

Rollback on-demand means the ability to quickly and safely revert to a previous working version of your application at any time, without requiring special approval, manual intervention, or complex procedures. It should be as simple and reliable as deploying forward.

Key Principles

  1. Fast: Rollback completes in minutes, not hours. Target < 5 minutes.
  2. Automated: No manual steps or special procedures. Single command or click.
  3. Safe: Rollback is validated just like forward deployment.
  4. Simple: Any team member can execute it without specialized knowledge.
  5. Tested: Rollback mechanism is regularly tested, not just used in emergencies.

What Is Improved

  • Mean Time To Recovery (MTTR): Drops from hours to minutes
  • Deployment frequency: Increases due to reduced risk
  • Team confidence: Higher willingness to deploy
  • Customer satisfaction: Faster incident resolution
  • On-call burden: Reduced stress for on-call engineers

Migration Guidance

For detailed guidance on implementing rollback capability, see:

  • Rollback - Phase 2 pipeline practice with blue-green, canary, feature flag, and database-safe rollback patterns


3.9 - Application Configuration

Separate what varies between environments from what does not.

Definition

Application configuration defines the internal behavior of your application and is bundled with the artifact. It does not vary between environments. This is distinct from environment configuration (secrets, URLs, credentials), which varies by deployment.

We embrace The Twelve-Factor App config definitions:

  • Application Configuration: Internal to the app, does NOT vary by environment (feature flags, business rules, UI themes, default settings)
  • Environment Configuration: Varies by deployment (database URLs, API keys, service endpoints, credentials)
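The split between the two kinds of configuration can be sketched in Python. This is a minimal illustration under stated assumptions, not a prescribed implementation: every setting name (`max_cart_items`, `ui_theme`, `new_checkout_enabled`, `DATABASE_URL`, `PAYMENTS_API_KEY`) is hypothetical. Application settings are frozen constants that ship inside the artifact, while environment settings are read from the deployment's environment at startup.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class AppConfig:
    """Application configuration: built into the artifact, same everywhere.

    All setting names are hypothetical examples.
    """
    max_cart_items: int = 50            # business rule
    ui_theme: str = "light"             # UI theme
    new_checkout_enabled: bool = False  # feature flag default


# Created once at import time; frozen, so runtime modification raises an error.
APP_CONFIG = AppConfig()


@dataclass(frozen=True)
class EnvConfig:
    """Environment configuration: supplied by each deployment."""
    database_url: str
    payments_api_key: str


def load_env_config(environ: Mapping[str, str] = os.environ) -> EnvConfig:
    """Read per-deployment settings; a missing variable raises KeyError."""
    return EnvConfig(
        database_url=environ["DATABASE_URL"],
        payments_api_key=environ["PAYMENTS_API_KEY"],
    )
```

Because `AppConfig` lives in the source tree, changing a business rule requires a new commit and a new build, which keeps the behavior traceable to a specific artifact.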

Key Principles

Application configuration should be:

  1. Version controlled with the source code
  2. Deployed as part of the immutable artifact
  3. Testable in the CI pipeline
  4. Unchangeable after the artifact is built

What Is Improved

  • Immutability: The artifact tested in staging is identical to what runs in production
  • Traceability: You can trace any behavior back to a specific commit
  • Testability: Application behavior can be validated in the pipeline before deployment
  • Reliability: No configuration drift between environments caused by manual changes
  • Faster rollback: Rolling back an artifact rolls back all application configuration changes

Migration Guidance

For detailed guidance on managing application configuration, see:


4 - Metrics

Detailed definitions for key delivery metrics. Understand what to measure and why.

These metrics help you assess your current delivery performance and track improvement over time. Not all metrics are equally useful at every stage of a CD migration.

Leading Indicators

Leading indicators reflect the current state of team behaviors. They move immediately when those behaviors change, making them the most useful metrics for driving improvement during a CD migration. When a leading indicator is unhealthy, the cause is visible and addressable today.

Metric | What It Measures
Integration Frequency | How often code is integrated to trunk
Build Duration | Time from commit to artifact creation
Development Cycle Time | Time from starting work to delivery
Work in Progress | Amount of started but unfinished work

DORA Outcome Metrics

The four DORA key metrics are lagging indicators drawn from the DORA research program. They reflect the cumulative effect of many upstream behaviors and confirm that improvement work is having the expected systemic effect. Because they are outcome measures, they move slowly: changes in leading indicator behaviors take weeks or months to surface in these numbers. Use them to validate the direction of improvement, not to drive it.

Metric | What It Measures
Lead Time | Time from commit to production
Change Fail Rate | Percentage of changes requiring remediation
Mean Time to Repair | Time to restore service after failure
Release Frequency | How often releases reach production

4.1 - Integration Frequency

How often developers integrate code changes to the trunk. A leading indicator of CI maturity and small batch delivery.

Definition

Integration Frequency measures the average number of production-ready pull requests a team merges to trunk per day, normalized by team size. On a team of five developers, healthy continuous integration practice produces at least five integrations per day, roughly one per developer.

This metric is a direct indicator of how well a team practices Continuous Integration. Teams that integrate frequently work in small batches, receive fast feedback, and reduce the risk associated with large, infrequent merges.

Integration Frequency formula
integrationFrequency = mergedPullRequests / day / numberOfDevelopers

A value of 1.0 or higher per developer per day indicates that work is being decomposed into small, independently deliverable increments.

How to Measure

  1. Count trunk merges. Track the number of pull requests (or direct commits) merged to main or trunk each day.
  2. Normalize by team size. Divide the daily count by the number of developers actively contributing that day.
  3. Calculate the rolling average. Use a 5-day or 10-day rolling window to smooth daily variation and surface meaningful trends.
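The three measurement steps above can be sketched in Python, assuming you have already exported merge dates from your source control platform and know how many developers were active each day (both input shapes are assumptions for this example):

```python
from collections import defaultdict
from datetime import date


def daily_integration_frequency(
    merges: list[date], active_devs: dict[date, int]
) -> dict[date, float]:
    """Trunk merges per active developer for each day.

    merges: one date per pull request (or commit) merged to trunk.
    active_devs: number of developers who contributed on each date.
    """
    counts: dict[date, int] = defaultdict(int)
    for merged_on in merges:
        counts[merged_on] += 1
    # Days with no active developers are skipped to avoid dividing by zero.
    return {
        day: counts[day] / devs
        for day, devs in sorted(active_devs.items())
        if devs > 0
    }


def rolling_average(daily: dict[date, float], window: int = 5) -> dict[date, float]:
    """Trailing mean over the last `window` recorded days (not calendar days)."""
    days = sorted(daily)
    result = {}
    for i, day in enumerate(days):
        recent = [daily[d] for d in days[max(0, i - window + 1) : i + 1]]
        result[day] = sum(recent) / len(recent)
    return result
```

Only days listed in `active_devs` are reported, so weekends and holidays with no contributors do not pull the average down.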

Most source control platforms expose this data through their APIs:

  • GitHub: list merged pull requests via the REST or GraphQL API.
  • GitLab: query merged merge requests per project.
  • Bitbucket: use the pull request activity endpoint.

Alternatively, count commits to the default branch if pull requests are not used.

Targets

Level | Integration Frequency (per developer per day)
Low | Less than 1 per week
Medium | A few times per week
High | Once per day
Elite | Multiple times per day

The elite target aligns with trunk-based development, where developers push small changes to the trunk multiple times daily and rely on automated testing and feature flags to manage risk.

Common Pitfalls

  • Meaningless commits. Teams may inflate the count by integrating trivial or empty changes. Pair this metric with code review quality and defect rate.
  • Breaking the trunk. Pushing faster without adequate test coverage leads to a red build and slows the entire team. Always pair Integration Frequency with build success rate and Change Fail Rate.
  • Counting the wrong thing. Merges to long-lived feature branches do not count. Only merges to the trunk or main integration branch reflect true CI practice.
  • Ignoring quality. If defect rates rise as integration frequency increases, the team is skipping quality steps. Use defect rate as a guardrail metric.

Connection to CD

Integration Frequency is the foundational metric for Continuous Delivery. Without frequent integration, every downstream metric suffers:

  • Smaller batches reduce risk. Each integration carries less change, making failures easier to diagnose and fix.
  • Faster feedback loops. Frequent integration means the CI pipeline runs more often, catching issues within minutes instead of days.
  • Enables trunk-based development. High integration frequency is incompatible with long-lived branches. Teams naturally move toward short-lived branches or direct trunk commits.
  • Reduces merge conflicts. The longer code stays on a branch, the more likely it diverges from trunk. Frequent integration keeps the delta small.
  • Prerequisite for deployment frequency. You cannot deploy more often than you integrate. Improving this metric directly unblocks improvements to Release Frequency.

To improve Integration Frequency:

4.2 - Build Duration

Time from code commit to a deployable artifact. A leading indicator of feedback speed and the floor for mean time to repair.

Definition

Build Duration measures the elapsed time from when a developer pushes a commit until the CI pipeline produces a deployable artifact and all automated quality gates have passed. This includes compilation, unit tests, integration tests, static analysis, security scans, and artifact packaging.

Build Duration represents the minimum possible time between committing a change and having that change ready for production. It sets a hard floor on Lead Time and directly constrains how quickly a team can respond to production incidents.

Build Duration formula
buildDuration = artifactReadyTimestamp - commitPushTimestamp

This metric is sometimes referred to as “pipeline cycle time” or “CI cycle time.” The book Accelerate references it as part of “hard lead time.”

How to Measure

  1. Record the commit timestamp. Capture when the commit arrives at the CI server (webhook receipt or pipeline trigger time).
  2. Record the artifact-ready timestamp. Capture when the final pipeline stage completes successfully and the deployable artifact is published.
  3. Calculate the difference. Subtract the commit timestamp from the artifact-ready timestamp.
  4. Track the median and p95. The median shows typical performance. The 95th percentile reveals worst-case builds that block developers.
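A sketch of steps 3 and 4, plus a simple alert condition, assuming you have (push time, artifact-ready time) pairs for each pipeline run. The nearest-rank percentile method and the choice to alert when the p95 exceeds a 10-minute target are illustrative assumptions, not requirements:

```python
import math
from datetime import datetime
from statistics import median


def build_durations_minutes(runs: list[tuple[datetime, datetime]]) -> list[float]:
    """runs: (commit pushed / pipeline triggered, artifact ready) pairs."""
    return [(ready - pushed).total_seconds() / 60 for pushed, ready in runs]


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile, with pct between 0 and 100."""
    ordered = sorted(values)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def summarize(runs: list[tuple[datetime, datetime]], target_minutes: float = 10) -> dict:
    durations = build_durations_minutes(runs)
    p95 = percentile(durations, 95)
    return {
        "median": median(durations),
        "p95": p95,
        # Hypothetical alert rule: worst-case builds exceed the target.
        "alert": p95 > target_minutes,
    }
```

Measuring from push time rather than job start time keeps queue time inside the number, which matches what the developer actually waits for.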

Most CI platforms expose build duration natively:

  • GitHub Actions: createdAt and updatedAt on workflow runs.
  • GitLab CI: pipeline created_at and finished_at.
  • Jenkins: build start time and duration fields.
  • CircleCI: workflow duration in the Insights dashboard.

Set up alerts when builds exceed your target threshold so the team can investigate regressions immediately.

Targets

Level | Build Duration
Low | More than 30 minutes
Medium | 10 to 30 minutes
High | 5 to 10 minutes
Elite | Less than 5 minutes

The ten-minute threshold is a widely recognized guideline. Builds longer than ten minutes break developer flow, discourage frequent integration, and increase the cost of fixing failures.

Common Pitfalls

  • Removing tests to hit targets. Reducing test count or skipping test types (integration, security) lowers build duration but degrades quality. Always pair this metric with Change Fail Rate and defect rate.
  • Ignoring queue time. If builds wait in a queue before execution, the developer experiences the queue time as part of the feedback delay even though it is not technically “build” time. Measure wall-clock time from commit to result.
  • Optimizing the wrong stage. Profile the pipeline before optimizing. Often a single slow test suite or a sequential step that could run in parallel dominates the total duration.
  • Flaky tests. Tests that intermittently fail cause retries, effectively doubling or tripling build duration. Track flake rate alongside build duration.

Connection to CD

Build Duration is a critical bottleneck in the Continuous Delivery pipeline:

  • Constrains Mean Time to Repair. When production is down, the build pipeline is the minimum time to get a fix deployed. A 30-minute build means at least 30 minutes of downtime for any fix, no matter how small. Reducing build duration directly improves MTTR.
  • Enables frequent integration. Developers are unlikely to integrate multiple times per day if each integration takes 30 minutes to validate. Short builds encourage higher Integration Frequency.
  • Shortens feedback loops. The sooner a developer learns that a change broke something, the less context they have lost and the cheaper the fix. Builds under ten minutes keep developers in flow.
  • Supports continuous deployment. Automated deployment pipelines cannot deliver changes rapidly if the build stage is slow. Build duration is often the largest component of Lead Time.

To improve Build Duration:

  • Parallelize stages. Run unit tests, linting, and security scans concurrently rather than sequentially.
  • Replace slow end-to-end tests. Move heavyweight end-to-end tests to an asynchronous post-deploy verification stage. Use contract tests and service virtualization in the main pipeline.
  • Decompose large services. Smaller codebases compile and test faster. If build duration is stubbornly high, consider breaking the service into smaller domains.
  • Cache aggressively. Cache dependencies, Docker layers, and compilation artifacts between builds.
  • Set a build time budget. Alert the team whenever a new test or step pushes the build past your target, so test efficiency is continuously maintained.

4.3 - Development Cycle Time

Average time from when work starts until it is running in production. A leading indicator of batch size and delivery flow.

Definition

Development Cycle Time measures the elapsed time from when a developer begins work on a story or task until that work is deployed to production and available to users. It captures the full construction phase of delivery: coding, code review, testing, integration, and deployment.

Development Cycle Time formula
developmentCycleTime = productionDeployTimestamp - workStartedTimestamp

This is distinct from the broader value-stream sense of lead time, which also includes the time a request spends waiting in the backlog before work begins. Development Cycle Time excludes that wait and focuses exclusively on the active delivery phase.

The Accelerate research uses “lead time for changes” (measured from commit to production) as a key DORA metric. Development Cycle Time extends this slightly further back to when work starts, capturing the full development process including any time between starting work and the first commit.

How to Measure

  1. Record when work starts. Capture the timestamp when a story moves to “In Progress” in your issue tracker, or when the first commit for the story appears.
  2. Record when work reaches production. Capture the timestamp of the production deployment that includes the completed story.
  3. Calculate the difference. Subtract the start time from the production deploy time.
  4. Report the median and distribution. The median provides a typical value. The distribution (or a control chart) reveals variability and outliers that indicate process problems.
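A minimal sketch of the calculation and a histogram view of the distribution, assuming start and production-deploy timestamps per story are already available. The bucket edges (1, 2, 7, and 14 days) loosely mirror the target levels and are an assumption:

```python
from datetime import datetime
from statistics import median


def cycle_times_days(
    stories: dict[str, tuple[datetime, datetime]]
) -> dict[str, float]:
    """stories: story ID -> (work started, deployed to production)."""
    return {
        story: (deployed - started).total_seconds() / 86400
        for story, (started, deployed) in stories.items()
    }


def histogram(values, edges=(1, 2, 7, 14)) -> list[int]:
    """Counts per bucket: <1 day, 1-2, 2-7, 7-14, and 14+ days."""
    buckets = [0] * (len(edges) + 1)
    for value in values:
        # Bucket index = number of edges the value has reached.
        buckets[sum(1 for edge in edges if value >= edge)] += 1
    return buckets


def report(stories: dict[str, tuple[datetime, datetime]]) -> tuple[float, list[int]]:
    """Median cycle time in days, plus the bucketed distribution."""
    times = list(cycle_times_days(stories).values())
    return median(times), histogram(times)
```

A bimodal histogram, for example many stories under a day alongside a cluster beyond two weeks, is the kind of variability a median alone hides.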

Sources for this data include:

  • Issue trackers (Jira, GitHub Issues, Azure Boards): status transition timestamps.
  • Source control: first commit timestamp associated with a story.
  • Deployment logs: timestamp of production deployments linked to stories.

Linking stories to deployments is essential. Use commit message conventions (e.g., story IDs in commit messages) or deployment metadata to create this connection.
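One way to apply the commit message convention is a regular expression over the commits included in each deployment. The Jira-style key pattern below is an assumption; adapt it to your tracker, and note that it can produce false matches such as `UTF-8`:

```python
import re

# Hypothetical convention: keys such as "PAY-123" appear in commit messages.
STORY_ID = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")


def stories_in_deployment(commit_messages: list[str]) -> set[str]:
    """Return the set of story IDs referenced by a deployment's commits."""
    found: set[str] = set()
    for message in commit_messages:
        found.update(STORY_ID.findall(message))
    return found
```

Recording this set alongside each production deployment timestamp supplies the "deployed" end of the cycle time calculation.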

Targets

Level | Development Cycle Time
Low | More than 2 weeks
Medium | 1 to 2 weeks
High | 2 to 7 days
Elite | Less than 2 days

Elite teams deliver completed work to production within one to two days of starting it. This is achievable only when work is decomposed into small increments, the pipeline is fast, and deployment is automated.

Common Pitfalls

  • Marking work “Done” before it reaches production. If “Done” means “code complete” rather than “deployed,” the metric understates actual cycle time. The Definition of Done must include production deployment.
  • Skipping the backlog. Moving items from “Backlog” directly to “Done” after deploying hides the true wait time and development duration. Ensure stories pass through the standard workflow stages.
  • Splitting work into functional tasks. Breaking a story into separate “development,” “testing,” and “deployment” tasks obscures the end-to-end cycle time. Measure at the story or feature level.
  • Ignoring variability. A low average can hide a bimodal distribution where some stories take hours and others take weeks. Use a control chart or histogram to expose the full picture.
  • Optimizing for speed without quality. If cycle time drops but Change Fail Rate rises, the team is cutting corners. Use quality metrics as guardrails.

Connection to CD

Development Cycle Time is the most comprehensive measure of delivery flow and sits at the heart of Continuous Delivery:

  • Exposes bottlenecks. A long cycle time reveals where work gets stuck: waiting for code review, queued for testing, blocked by a manual approval, or delayed by a slow pipeline. Each bottleneck is a target for improvement.
  • Drives smaller batches. The only way to achieve a cycle time under two days is to decompose work into very small increments. This naturally leads to smaller changes, less risk, and faster feedback.
  • Reduces waste from changing priorities. Long cycle times mean work in progress is exposed to priority changes, context switches, and scope creep. Shorter cycles reduce the window of vulnerability.
  • Improves feedback quality. The sooner a change reaches production, the sooner the team gets real user feedback. Short cycle times enable rapid learning and course correction.
  • Subsumes other metrics. Cycle time is affected by Integration Frequency, Build Duration, and Work in Progress. Improving any of these upstream metrics will reduce cycle time.

To improve Development Cycle Time:

  • Decompose work into stories that can be completed and deployed within one to two days.
  • Remove handoffs between teams (e.g., separate dev and QA teams).
  • Automate the build and deploy pipeline to eliminate manual steps.
  • Improve test design so the pipeline runs faster without sacrificing coverage.
  • Limit Work in Progress so the team focuses on finishing work rather than starting new items.

4.4 - Lead Time

Total time from when a change is committed until it is running in production. A DORA lagging outcome metric for pipeline efficiency.

Definition

Lead Time measures the total elapsed time from when a code change is committed to the version control system until that change is successfully running in production. This is one of the four key metrics identified by the DORA (DevOps Research and Assessment) team as a predictor of software delivery performance. Lead Time is a lagging outcome metric: it reflects the cumulative effect of pipeline automation, work decomposition, and integration practices. Build Duration and Integration Frequency are the leading indicators to address first.

Lead Time formula
leadTime = productionDeployTimestamp - commitTimestamp

In the broader value stream, “lead time” can also refer to the time from a customer request to delivery. The DORA definition focuses specifically on the segment from commit to production, which the Accelerate research calls “lead time for changes.” This narrower definition captures the efficiency of your delivery pipeline and deployment process.

Lead Time includes Build Duration plus any additional time for deployment, approval gates, environment provisioning, and post-deploy verification. It is a superset of build time and a subset of Development Cycle Time, which also includes the coding phase before the first commit.

How to Measure

  1. Record the commit timestamp. Use the timestamp of the commit as recorded in source control (not the local author timestamp, but the time it was pushed or merged to the trunk).
  2. Record the production deployment timestamp. Capture when the deployment containing that commit completes successfully in production.
  3. Calculate the difference. Subtract the commit time from the deploy time.
  4. Aggregate across commits. Report the median lead time across all commits deployed in a given period (daily, weekly, or per release).
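The steps above can be sketched as follows. The sketch assumes each successful production deployment records the set of commit SHAs it contains, and it ends each commit's lead time at the first such deployment; the input shapes are hypothetical:

```python
from datetime import datetime
from statistics import median


def lead_times_hours(
    commits: dict[str, datetime],
    deployments: list[tuple[datetime, set[str]]],
) -> dict[str, float]:
    """Lead time per commit, in hours.

    commits: SHA -> time the commit was pushed or merged to trunk.
    deployments: (deployed_at, SHAs included) for successful production deploys.
    """
    result: dict[str, float] = {}
    # Walk deployments in time order so each commit keeps its first deploy.
    for deployed_at, shas in sorted(deployments, key=lambda d: d[0]):
        for sha in shas:
            if sha in commits and sha not in result:
                result[sha] = (deployed_at - commits[sha]).total_seconds() / 3600
    return result


def median_lead_time_hours(commits, deployments) -> float:
    """Median across all commits that have reached production."""
    return median(lead_times_hours(commits, deployments).values())
```

Commits that have not yet reached production are simply absent from the result; tracking how long they have been waiting is a useful companion signal.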

Data sources:

  • Source control: commit or merge timestamps from Git, GitHub, GitLab, etc.
  • Pipeline platform: pipeline completion times from Jenkins, GitHub Actions, GitLab CI, etc.
  • Deployment tooling: production deployment timestamps from Argo CD, Spinnaker, Flux, or custom scripts.

For teams practicing continuous deployment, lead time may be nearly identical to build duration. For teams with manual approval gates or scheduled release windows, lead time will be significantly longer.

Targets

Level | Lead Time for Changes
Low | More than 6 months
Medium | 1 to 6 months
High | 1 day to 1 week
Elite | Less than 1 hour

These levels are drawn from the DORA State of DevOps research. Elite performers deliver changes to production in under an hour from commit, enabled by fully automated pipelines and continuous deployment.

Common Pitfalls

  • Measuring only build time. Lead time includes everything after the commit, not just the CI pipeline. Manual approval gates, scheduled deployment windows, and environment provisioning delays must all be included.
  • Ignoring waiting time. A change may sit in a queue waiting for a release train, a change advisory board (CAB) review, or a deployment window. This wait time is part of lead time and often dominates the total.
  • Tracking requests instead of commits. Some teams measure from customer request to delivery. While valuable, this conflates backlog prioritization with delivery efficiency. Keep this metric focused on the commit-to-production segment.
  • Hiding items from the backlog. Requests tracked in spreadsheets or side channels before entering the backlog distort lead time measurements. Ensure all work enters the system of record promptly.
  • Reducing quality to reduce lead time. Shortening approval processes or skipping test stages reduces lead time at the cost of quality. Pair this metric with Change Fail Rate as a guardrail.

Connection to CD

Lead Time is one of the four DORA metrics and a direct measure of your delivery pipeline’s end-to-end efficiency:

  • Reveals pipeline bottlenecks. A large gap between build duration and lead time points to manual processes, approval queues, or deployment delays that the team can target for automation.
  • Measures the cost of failure recovery. When production breaks, lead time is the minimum time to deliver a fix (unless you roll back). This makes lead time a direct input to Mean Time to Repair.
  • Drives automation. The primary way to reduce lead time is to automate every step between commit and production: build, test, security scanning, environment provisioning, deployment, and verification.
  • Reflects deployment strategy. Teams using continuous deployment have lead times measured in minutes. Teams using weekly release trains have lead times measured in days. The metric makes the cost of batching visible.
  • Connects speed and stability. The DORA research shows that elite performers achieve both low lead time and low Change Fail Rate. Speed and quality are not trade-offs. They reinforce each other when the delivery system is well-designed.

To improve Lead Time:

  • Automate the deployment pipeline end to end, eliminating manual gates.
  • Replace change advisory board (CAB) reviews with automated policy checks and peer review.
  • Deploy on every successful build rather than batching changes into release trains.
  • Reduce Build Duration to shrink the largest component of lead time.
  • Monitor and eliminate environment provisioning delays.

4.5 - Change Fail Rate

Percentage of production deployments that cause a failure or require remediation. A DORA lagging outcome metric for delivery stability.

Definition

Change Fail Rate measures the percentage of deployments to production that result in degraded service, negative customer impact, or require immediate remediation such as a rollback, hotfix, or patch.

Change Fail Rate formula
changeFailRate = failedChangeCount / totalChangeCount * 100

A “failed change” includes any deployment that:

  • Is rolled back.
  • Requires a hotfix deployed within a short window (commonly 24 hours).
  • Triggers a production incident attributed to the change.
  • Requires manual intervention to restore service.
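The failure criteria and the formula can be expressed directly in code. This is a sketch: the `Deployment` fields are hypothetical flags you would populate from rollback records, hotfix tags, and incident links:

```python
from dataclasses import dataclass


@dataclass
class Deployment:
    """Hypothetical per-deployment flags, one per failure criterion."""
    rolled_back: bool = False
    hotfix_within_24h: bool = False
    caused_incident: bool = False
    manual_intervention: bool = False

    def failed(self) -> bool:
        # A change counts as failed if any single criterion applies.
        return (
            self.rolled_back
            or self.hotfix_within_24h
            or self.caused_incident
            or self.manual_intervention
        )


def change_fail_rate(deployments: list[Deployment]) -> float:
    """Percentage of deployments classified as failed."""
    if not deployments:
        return 0.0
    failed = sum(1 for d in deployments if d.failed())
    return failed / len(deployments) * 100
```

Returning 0.0 for a period with no deployments is a convenience choice; you may prefer to report no value at all.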

This is one of the four DORA key metrics. It measures the stability side of delivery performance, complementing the throughput metrics of Lead Time and Release Frequency. Change Fail Rate is a lagging outcome metric: it reflects the cumulative quality of your test coverage, change size practices, and pipeline gates. The leading indicator to improve first is Integration Frequency, since smaller batches fail less often and are easier to diagnose.

How to Measure

  1. Count total production deployments over a defined period (weekly, monthly).
  2. Count deployments classified as failures using the criteria above.
  3. Divide failures by total deployments and express as a percentage.

Data sources:

  • Deployment logs: total deployment count from your CD platform.
  • Incident management: incidents linked to specific deployments (PagerDuty, Opsgenie, ServiceNow).
  • Rollback records: deployments that were reverted, either manually or by automated rollback.
  • Hotfix tracking: deployments tagged as hotfixes or emergency changes.

Automate the classification where possible. For example, if a deployment is followed by another deployment of the same service within a defined window (e.g., one hour), flag the original as a potential failure for review.
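The redeploy heuristic can be sketched as below, assuming deployment records of (service name, deployment time). Flagged deployments are candidates for human review, not confirmed failures:

```python
from datetime import datetime, timedelta


def flag_potential_failures(
    deploys: list[tuple[str, datetime]],
    window: timedelta = timedelta(hours=1),
) -> list[tuple[str, datetime]]:
    """Return deployments followed by another deploy of the same service
    within `window`, which often indicates a rollback or hotfix."""
    flagged = []
    last_by_service: dict[str, datetime] = {}
    for service, deployed_at in sorted(deploys, key=lambda d: d[1]):
        previous = last_by_service.get(service)
        if previous is not None and deployed_at - previous <= window:
            flagged.append((service, previous))
        last_by_service[service] = deployed_at
    return flagged
```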

Targets

Level | Change Fail Rate
Low | 46 to 60%
Medium | 16 to 45%
High | 0 to 15%
Elite | 0 to 5%

These levels are drawn from the DORA State of DevOps research. Elite performers maintain a change fail rate below 5%, meaning fewer than 1 in 20 deployments causes a problem.

Common Pitfalls

  • Not recording failures. Deploying fixes without logging the original failure understates the true rate. Ensure every incident and rollback is tracked.
  • Reclassifying defects. Creating review processes that reclassify production defects as “feature requests” or “known limitations” hides real failures.
  • Inflating deployment count. Re-deploying the same working version to increase the denominator artificially lowers the rate. Only count deployments that contain new changes.
  • Pursuing zero defects at the cost of speed. An obsessive focus on eliminating all failures can slow Release Frequency to a crawl. A small failure rate with fast recovery is preferable to near-zero failures with monthly deployments.
  • Ignoring near-misses. Changes that cause degraded performance but do not trigger a full incident are still failures. Define clear criteria for what constitutes a failed change and apply them consistently.

Connection to CD

Change Fail Rate is the primary quality signal in a Continuous Delivery pipeline:

  • Validates pipeline quality gates. A rising change fail rate indicates that the automated tests, security scans, and quality checks in the pipeline are not catching enough defects. Each failure is an opportunity to add or improve a quality gate.
  • Enables confidence in frequent releases. Teams will only deploy frequently if they trust the pipeline. A low change fail rate builds this trust and supports higher Release Frequency.
  • Smaller changes fail less. The DORA research consistently shows that smaller, more frequent deployments have lower failure rates than large, infrequent releases. Improving Integration Frequency naturally improves this metric.
  • Drives root cause analysis. Each failed change should trigger a blameless investigation: what automated check could have caught this? The answers feed directly into pipeline improvements.
  • Balances throughput metrics. Change Fail Rate is the essential guardrail for Lead Time and Release Frequency. If those metrics improve while change fail rate worsens, the team is trading quality for speed.

To improve Change Fail Rate:

  • Deploy smaller changes more frequently to reduce the blast radius of failures.
  • Identify the root cause of each failure and add automated checks to prevent recurrence.
  • Strengthen the test suite, particularly integration and contract tests that validate interactions between services.
  • Implement progressive delivery (canary releases, feature flags) to limit the impact of defective changes before they reach all users.
  • Conduct blameless post-incident reviews and feed learnings back into the delivery pipeline.

4.6 - Mean Time to Repair

Average time from when a production incident is detected until service is restored. A DORA lagging outcome metric for recovery capability.

Definition

Mean Time to Repair (MTTR) measures the average elapsed time between when a production incident is detected and when it is fully resolved and service is restored to normal operation.

Mean Time to Repair formula
mttr = sum(resolvedTimestamp - detectedTimestamp) / incidentCount

MTTR reflects an organization’s ability to recover from failure. It encompasses detection, diagnosis, fix development, build, deployment, and verification. A short MTTR depends on the entire delivery system working well: fast builds, automated deployments, good observability, and practiced incident response.

The Accelerate research identifies MTTR as one of the four key DORA metrics and notes that “software delivery performance is a combination of lead time, release frequency, and MTTR.” It is the stability counterpart to the throughput metrics. MTTR is a lagging outcome metric: it reflects the combined effectiveness of observability, rollback capability, pipeline speed, and incident response practices. The leading indicators to address first are Build Duration (which sets the floor on how fast a fix can be deployed) and Release Frequency (teams that deploy often have well-rehearsed recovery procedures).

How to Measure

  1. Record the detection timestamp. This is when the team first becomes aware of the incident, typically when an alert fires, a customer reports an issue, or monitoring detects an anomaly.
  2. Record the resolution timestamp. This is when the incident is resolved and service is confirmed to be operating normally. Resolution means the customer impact has ended, not merely that a fix has been deployed.
  3. Calculate the duration for each incident.
  4. Compute the average across all incidents in a given period.

Data sources:

  • Incident management platforms: PagerDuty, Opsgenie, ServiceNow, or Statuspage provide incident lifecycle timestamps.
  • Monitoring and alerting: alert trigger times from Datadog, Prometheus Alertmanager, CloudWatch, or equivalent.
  • Deployment logs: timestamps of rollbacks or hotfix deployments.

Report both the mean and the median. The mean can be skewed by a single long outage, so the median gives a better sense of typical recovery time. Also track the longest repair time in each period to highlight worst-case incidents.
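A minimal sketch that reports the mean, the median, and the longest repair time for a period, assuming (detected, resolved) timestamp pairs exported from your incident management platform:

```python
from datetime import datetime
from statistics import mean, median


def repair_summary(incidents: list[tuple[datetime, datetime]]) -> dict[str, float]:
    """incidents: (detected_at, resolved_at) pairs. Durations in minutes."""
    durations = [
        (resolved - detected).total_seconds() / 60
        for detected, resolved in incidents
    ]
    return {
        "mttr": mean(durations),      # sensitive to a single long outage
        "median": median(durations),  # typical recovery time
        "longest": max(durations),    # worst-case incident in the period
    }
```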

Targets

Level | Mean Time to Repair
Low | More than 1 week
Medium | 1 day to 1 week
High | Less than 1 day
Elite | Less than 1 hour

Elite performers restore service in under one hour. This requires automated rollback or roll-forward capability, fast build pipelines, and well-practiced incident response processes.

Common Pitfalls

  • Closing incidents prematurely. Marking an incident as resolved before the customer impact has actually ended artificially deflates MTTR. Define “resolved” clearly and verify that service is truly restored.
  • Not counting detection time. If the team discovers a problem informally (e.g., a developer notices something odd) and fixes it before opening an incident, the time is not captured. Encourage consistent incident reporting.
  • Ignoring recurring incidents. If the same issue keeps reappearing, each individual MTTR may be short, but the cumulative impact is high. Track recurrence as a separate quality signal.
  • Conflating MTTR with MTTD. Mean Time to Detect (MTTD) and Mean Time to Repair overlap but are distinct. If you only measure from alert to resolution, you miss the detection gap, the time between when the problem starts and when it is detected. Both matter.
  • Optimizing MTTR without addressing root causes. Getting faster at fixing recurring problems is good, but preventing those problems in the first place is better. Pair MTTR with Change Fail Rate to ensure the number of incidents is also decreasing.

Connection to CD

MTTR is a direct measure of how well the entire Continuous Delivery system supports recovery:

  • Pipeline speed is the floor. The minimum possible MTTR for a roll-forward fix is the Build Duration plus deployment time. A 30-minute build means you cannot restore service via a code fix in less than 30 minutes. Reducing build duration directly reduces MTTR.
  • Automated deployment enables fast recovery. Teams that can deploy with one click or automatically can roll back or roll forward in minutes. Manual deployment processes add significant time to every incident.
  • Feature flags accelerate mitigation. If a failing change is behind a feature flag, the team can disable it in seconds without deploying new code. This can reduce MTTR from minutes to seconds for flag-protected changes.
  • Observability shortens detection and diagnosis. Good logging, metrics, and tracing help the team identify the cause of an incident quickly. Without observability, diagnosis dominates the repair timeline.
  • Practice improves performance. Teams that deploy frequently have more experience responding to issues. High Release Frequency correlates with lower MTTR because the team has well-rehearsed recovery procedures.
  • Trunk-based development simplifies rollback. When trunk is always deployable, the team can roll back to the previous commit. Long-lived branches and complex merge histories make rollback risky and slow.
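Here is a rough sketch of the flag-based mitigation described above. The `FeatureFlags` class and the JSON source it reads are hypothetical stand-ins for a real flag service or config store. The flag is evaluated on every call, so turning it off takes effect without a deployment.

```python
import json


class FeatureFlags:
    """Evaluates flags on every call, so a change to the source needs no redeploy."""

    def __init__(self, read_source):
        # read_source: hypothetical callable returning the current flags as JSON text.
        self._read_source = read_source

    def enabled(self, name, default=False):
        try:
            flags = json.loads(self._read_source())
            return bool(flags.get(name, default))
        except (ValueError, TypeError, AttributeError):
            # Unreadable or malformed source: fall back to the safe default.
            return default


def checkout(flags):
    # The risky change stays behind the flag until it is proven in production.
    if flags.enabled("new_checkout"):
        return "new checkout flow"
    return "existing checkout flow"
```

Falling back to the default (usually off) when the flag source is unreadable is a design choice. It keeps an outage in the flag store from re-enabling a change the team just switched off.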

To improve MTTR:

  • Keep the pipeline always deployable so a fix can be deployed at any time.
  • Reduce Build Duration to enable faster roll-forward.
  • Implement feature flags for large changes so they can be disabled without redeployment.
  • Invest in observability: structured logging, distributed tracing, and meaningful alerting.
  • Practice incident response regularly, including deploying rollbacks and hotfixes.
  • Conduct blameless post-incident reviews and feed learnings back into the pipeline and monitoring.

4.7 - Release Frequency

How often changes are deployed to production. A DORA lagging outcome metric that confirms delivery throughput.

Definition

Release Frequency (also called Deployment Frequency) measures how often a team successfully deploys changes to production. It is expressed as deployments per day, per week, or per month, depending on the team’s current cadence.

Release Frequency formula
releaseFrequency = productionDeployments / timePeriod

This is one of the four DORA key metrics and a lagging outcome metric. It reflects the cumulative effect of upstream behaviors: work decomposition, integration practices, test quality, and pipeline automation. Higher release frequency is a consequence of those behaviors improving, not a lever to pull directly. To improve release frequency, improve Integration Frequency and Development Cycle Time first.

Each deployment should deliver a meaningful change. Re-deploying the same artifact or deploying empty changes does not count.

How to Measure

  1. Count production deployments. Record each successful deployment to the production environment over a defined period.
  2. Exclude non-changes. Do not count re-deployments of unchanged artifacts, infrastructure-only changes (unless relevant), or deployments to non-production environments.
  3. Calculate frequency. Divide the count by the time period. Express as deployments per day (for high performers) or per week/month (for teams earlier in their journey).
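The three steps can be sketched in Python as follows. The `Deployment` record shape is an assumption, so adapt it to whatever your CD platform exports. The sketch also assumes all records belong to a single service and approximates "unchanged artifact" as the same version being deployed again back to back.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Deployment:
    # Hypothetical record shape for one service's deployment history.
    at: datetime
    environment: str
    succeeded: bool
    version: str


def release_frequency(deployments, period_days):
    """Meaningful production deployments per day over the period."""
    counted = 0
    last_version = None
    for d in sorted(deployments, key=lambda d: d.at):
        # Only successful production deployments are candidates.
        if d.environment != "production" or not d.succeeded:
            continue
        # Re-deploying the version already in production is not a change.
        if d.version == last_version:
            continue
        counted += 1
        last_version = d.version
    return counted / period_days


history = [
    Deployment(datetime(2024, 6, 3, 10), "production", True, "1.4.0"),
    Deployment(datetime(2024, 6, 3, 15), "production", True, "1.4.0"),  # re-deploy
    Deployment(datetime(2024, 6, 4, 11), "staging", True, "1.5.0"),     # not production
    Deployment(datetime(2024, 6, 5, 9), "production", False, "1.5.0"),  # failed
    Deployment(datetime(2024, 6, 5, 12), "production", True, "1.5.0"),
]
# 2 meaningful deployments in a 7-day window.
per_day = release_frequency(history, period_days=7)
```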

Data sources:

  • CD platforms: Argo CD, Spinnaker, Flux, Octopus Deploy, or similar tools track every deployment.
  • Pipeline logs: GitHub Actions, GitLab CI, Jenkins, and CircleCI record deployment job executions.
  • Cloud provider logs: AWS CodeDeploy, Azure DevOps, GCP Cloud Deploy, and Kubernetes audit logs.
  • Custom deployment scripts: Add a logging line that records the timestamp, service name, and version to a central log or metrics system.
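For the custom-script option, here is a minimal sketch of that logging line. It assumes the central system ingests JSON lines written to stdout. The event and field names are illustrative, not a standard schema.

```python
import json
from datetime import datetime, timezone


def record_deployment(service, version):
    """Print one JSON line describing a production deployment and return it."""
    event = {
        "event": "production_deployment",  # illustrative event name
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "service": service,
        "version": version,
    }
    line = json.dumps(event, sort_keys=True)
    print(line)  # assumes a log shipper forwards stdout to the central store
    return line


# Called as the last step of a successful deployment script.
record_deployment("checkout", "1.5.0")
```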

Targets

Level | Release Frequency
Low | Less than once per 6 months
Medium | Once per month to once per 6 months
High | Once per week to once per month
Elite | Multiple times per day

These levels are drawn from the DORA State of DevOps research. Elite performers deploy on demand, multiple times per day, with each deployment containing a small set of changes.

Common Pitfalls

  • Counting empty deployments. Re-deploying the same artifact or building artifacts that contain no changes inflates the metric without delivering value. Count only deployments with meaningful changes.
  • Ignoring failed deployments. If you count deployments that are immediately rolled back, the frequency looks good but the quality is poor. Pair with Change Fail Rate to get the full picture.
  • Equating frequency with value. Deploying frequently is a means, not an end. Deploying 10 times a day delivers no value if the changes do not meet user needs. Release Frequency measures capability, not outcome.
  • Batch releasing to hit a target. Combining multiple changes into a single release to deploy “more often” defeats the purpose. The goal is small, individual changes flowing through the pipeline independently.
  • Focusing on speed without quality. If release frequency increases but Change Fail Rate also increases, the team is releasing faster than its quality processes can support. Slow down and improve the pipeline.

Connection to CD

Release Frequency is the ultimate output metric of a Continuous Delivery pipeline:

  • Validates the entire delivery system. High release frequency is only possible when the pipeline is fast, tests are reliable, deployment is automated, and the team has confidence in the process. It is the end-to-end proof that CD is working.
  • Reduces deployment risk. Each deployment carries less change when deployments are frequent. Less change means less risk, easier rollback, and simpler debugging when something goes wrong.
  • Enables rapid feedback. Frequent releases get features and fixes in front of users sooner. This shortens the feedback loop and allows the team to course-correct before investing heavily in the wrong direction.
  • Exercises recovery capability. Teams that deploy frequently practice the deployment process daily. When a production incident occurs, the deployment process is well-rehearsed and reliable, directly improving Mean Time to Repair.
  • Decouples deploy from release. At high frequency, teams separate the act of deploying code from the act of enabling features for users. Feature flags, progressive delivery, and dark launches become standard practice.

To improve Release Frequency:

  • Reduce Development Cycle Time by decomposing work into smaller increments.
  • Remove manual handoffs to other teams (e.g., ops, QA, change management).
  • Automate every step of the deployment process, from build through production verification.
  • Replace manual change approval boards with automated policy checks and peer review.
  • Convert hard dependencies on other teams or services into soft dependencies using feature flags and service virtualization.
  • Adopt Trunk-Based Development so that trunk is always in a deployable state.

4.8 - Work in Progress

Number of work items started but not yet completed. A leading indicator of flow problems, context switching, and delivery delays.

Definition

Work in Progress (WIP) is the total count of work items that have been started but not yet completed and delivered to production. This includes all types of work: stories, defects, tasks, spikes, and any other items that a team member has begun but not finished.

Work in Progress formula
wip = countOf(items where status is between "started" and "done")

WIP is a leading indicator from Lean manufacturing. Unlike lagging metrics such as Development Cycle Time or Lead Time, WIP tells you about problems that are happening right now. High WIP predicts future delivery delays, increased cycle time, and lower quality.

Little’s Law provides the mathematical relationship:

Little’s Law: cycle time as a function of WIP
cycleTime = wip / throughput

If throughput (the rate at which items are completed) stays constant, increasing WIP directly increases cycle time. The only way to reduce cycle time without working faster is to reduce WIP.
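Here is a worked example of the relationship with made-up numbers. A team that completes 2 items per day and halves its WIP from 12 to 6 also halves its average cycle time, with no change in how fast anyone works. Little's Law describes long-run averages in a reasonably stable system, so treat the result as an expectation rather than a guarantee for any single item.

```python
def cycle_time_days(wip, throughput_per_day):
    """Little's Law: average cycle time = average WIP / average throughput."""
    if throughput_per_day <= 0:
        raise ValueError("throughput must be positive")
    return wip / throughput_per_day


# Hypothetical team finishing 2 items per day.
before = cycle_time_days(wip=12, throughput_per_day=2)  # 6.0 days
after = cycle_time_days(wip=6, throughput_per_day=2)    # 3.0 days
```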

How to Measure

  1. Count all in-progress items. At a regular cadence (daily or at each standup), count the number of items in any active state on your team’s board. Include everything between “To Do” and “Done.”
  2. Normalize by team size. Divide WIP by the number of team members to get a per-person ratio. This makes the metric comparable across teams of different sizes.
  3. Track over time. Record the WIP count daily and observe trends. A rising WIP count is an early warning of delivery problems.

Data sources:

  • Kanban boards: Jira, Azure Boards, Trello, GitHub Projects, or physical boards. Count cards in any column between the backlog and done.
  • Issue trackers: Query for items with an “In Progress,” “In Review,” “In QA,” or equivalent active status.
  • Manual count: At standup, ask: “How many things are we actively working on right now?”

The simplest and most effective approach is to make WIP visible by keeping the team board up to date and counting active items daily.
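A small sketch of the daily count and per-person normalization follows. It assumes the board can be exported as (item, status) pairs. The set of active statuses is also an assumption and should match your own board's columns between the backlog and done.

```python
# Statuses counted as work in progress; an assumption to align with your board.
ACTIVE_STATUSES = {"In Progress", "In Review", "In QA"}


def wip_snapshot(items, team_size):
    """Return (wip, wip_per_person) for a list of (item_id, status) pairs."""
    if team_size <= 0:
        raise ValueError("team_size must be positive")
    wip = sum(1 for _item_id, status in items if status in ACTIVE_STATUSES)
    return wip, wip / team_size


# Hypothetical board export for a team of five.
board = [
    ("STORY-1", "In Progress"),
    ("STORY-2", "In Review"),
    ("BUG-7", "In QA"),
    ("STORY-3", "To Do"),
    ("STORY-4", "Done"),
    ("SPIKE-2", "In Progress"),
]
wip, per_person = wip_snapshot(board, team_size=5)  # 4 items, 0.8 per person
```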

Targets

Level | WIP per Team
Low | More than 2x team size
Medium | Between 1x and 2x team size
High | Equal to team size
Elite | Less than team size (ideally half)

The guiding principle is that WIP should never exceed team size. A team of five should have at most five items in progress at any time. Elite teams often work in pairs, bringing WIP to roughly half the team size.

Common Pitfalls

  • Hiding work. Not moving items to “In Progress” when working on them keeps WIP artificially low. The board must reflect reality. If someone is working on it, it should be visible.
  • Marking items done prematurely. Moving items to “Done” before they are deployed to production understates WIP. The Definition of Done must include production deployment.
  • Creating micro-tasks. Splitting a single story into many small tasks (development, testing, code review, deployment) and tracking each separately inflates the item count without changing the actual work. Measure WIP at the story or feature level.
  • Ignoring unplanned work. Production support, urgent requests, and interruptions consume capacity but are often not tracked on the board. If the team is spending time on it, it is WIP and should be visible.
  • Setting WIP limits but not enforcing them. WIP limits only work if the team actually stops starting new work when the limit is reached. Treat WIP limits as a hard constraint, not a suggestion.

Connection to CD

WIP is the most actionable flow metric and directly impacts every aspect of Continuous Delivery:

  • Predicts cycle time. Per Little's Law, at a given throughput, WIP and cycle time are directly proportional. Reducing WIP is the fastest way to reduce Development Cycle Time without changing anything else about the delivery process.
  • Reduces context switching. When developers juggle multiple items, they lose time switching between contexts. Research consistently shows that each additional item in progress reduces effective productivity. Low WIP means more focus and faster completion.
  • Exposes blockers. When WIP limits are in place and an item gets blocked, the team cannot simply start something new. They must resolve the blocker first. This forces the team to address systemic problems rather than working around them.
  • Enables continuous flow. CD depends on a steady flow of small changes moving through the pipeline. High WIP creates irregular, bursty delivery. Low WIP creates smooth, predictable flow.
  • Improves quality. When teams focus on fewer items, each item gets more attention. Code reviews happen faster, testing is more thorough, and defects are caught sooner. This naturally reduces Change Fail Rate.
  • Supports trunk-based development. High WIP often correlates with many long-lived branches. Reducing WIP encourages developers to complete and integrate work before starting something new, which aligns with Integration Frequency goals.

To reduce WIP:

  • Set explicit WIP limits for the team and enforce them. Start with a limit equal to team size and reduce it over time.
  • Prioritize finishing work over starting new work. At standup, ask “What can I help finish?” before “What should I start?”
  • Prioritize code review and pairing to unblock teammates over picking up new items.
  • Make the board visible and accurate. Use it as the single source of truth for what the team is working on.
  • Identify and address recurring blockers that cause items to stall in progress.
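One way to make a WIP limit a hard constraint is to check it at the moment work is started, as in this hypothetical helper. `start_item` and `WipLimitReached` are illustrative names, not part of any board tool's API. When the limit is hit, the error message points the person back to the items already in progress.

```python
class WipLimitReached(Exception):
    """Raised when starting another item would exceed the WIP limit."""


def start_item(in_progress, item_id, wip_limit):
    """Add item_id to the in-progress set only if the limit allows it."""
    if len(in_progress) >= wip_limit:
        raise WipLimitReached(
            f"WIP limit of {wip_limit} reached; help finish one of "
            f"{sorted(in_progress)} before starting {item_id}"
        )
    in_progress.add(item_id)
    return in_progress


# Hypothetical team of three using a limit equal to team size.
active = {"STORY-1", "STORY-2"}
start_item(active, "STORY-3", wip_limit=3)  # allowed: 3 of 3
```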

5 - DORA Recommended Practices

The practices that drive software delivery performance, as identified by DORA research.

The DevOps Research and Assessment (DORA) research program has identified practices that predict high software delivery performance. These practices are not tools or technologies. They are cultural conditions and behaviors that enable teams to deliver software quickly, reliably, and sustainably.

This page organizes the DORA recommended practices by their relevance to each migration phase. Use it as a reference to understand which practices you are building at each stage of your journey and which ones to focus on next.

Practice Maturity by Phase

PracticePhase 0Phase 1Phase 2Phase 3Phase 4
Version controlPrerequisite
Continuous integrationPrimary
Deployment automationPrimary
Trunk-based developmentPrimary
Test automationPrimaryExpanded
Test data managementPrimary
Shift left on securityPrimary
Loosely coupled architecturePrimary
Empowered teamsOngoingOngoingOngoingOngoingOngoing
Customer feedbackPrimary
Value stream visibilityPrimaryRevisited
Working in small batchesStartedPrimary
Team experimentationOngoingOngoingOngoingOngoingOngoing
Limit WIPPrimary
Visual managementStartedOngoingOngoingOngoingOngoing
Monitoring and observabilityStartedExpandedPrimary
Proactive notificationPrimary
Generative cultureOngoingOngoingOngoingOngoingOngoing
Learning cultureOngoingOngoingOngoingOngoingOngoing
Collaboration among teamsStartedPrimary
Job satisfactionOngoingOngoingOngoingOngoingOngoing
Transformational leadershipOngoingOngoingOngoingOngoingOngoing

Continuous Delivery Practices

These practices directly support the mechanics of getting software from commit to production. They are the primary focus of Phases 1 and 2 of the migration.

Version Control

All production artifacts (application code, test code, infrastructure configuration, deployment scripts, and database schemas) are stored in version control and can be reproduced from a single source of truth.

Migration relevance: This is a prerequisite for Phase 1. If any part of your delivery process depends on files stored on a specific person’s machine or a shared drive, address that before beginning the migration.

Continuous Integration

Developers integrate their work to trunk at least daily. Each integration triggers an automated build and test process. Broken builds are fixed within minutes.

Migration relevance: Phase 1: Foundations. CI is the gateway practice. Without it, none of the pipeline practices in Phase 2 can function. See Build Automation and Trunk-Based Development.

Deployment Automation

Deployments are fully automated and can be triggered by anyone on the team. No manual steps are required between a green pipeline and production.

Migration relevance: Phase 2: Pipeline. Specifically, Single Path to Production and Rollback.

Trunk-Based Development

Developers work in small batches and merge to trunk at least daily. Branches, if used, are short-lived (less than one day). There are no long-lived feature branches.

Migration relevance: Phase 1: Trunk-Based Development. This is one of the first practices to establish because it enables CI.

Test Automation

A comprehensive suite of automated tests provides confidence that the software is deployable. Tests are reliable, fast, and maintained as carefully as production code.

Migration relevance: Phase 1: Testing Fundamentals. Also see the Testing reference section for guidance on specific test types.

Test Data Management

Test data is managed in a way that allows automated tests to run independently, repeatably, and without relying on shared mutable state. Tests can create and clean up their own data.
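Here is a minimal sketch of self-contained test data using Python's built-in `unittest` and `sqlite3`. Each test builds a private in-memory database in `setUp` and discards it in `tearDown`, so tests can run in any order and repeatedly with the same result. The `orders` table is a hypothetical example. The same pattern applies to per-test schemas or disposable containers in a real database.

```python
import sqlite3
import unittest


class OrderRepositoryTest(unittest.TestCase):
    def setUp(self):
        # Each test gets its own private in-memory database; nothing is shared.
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
        self.db.execute("INSERT INTO orders (total) VALUES (?)", (42.0,))

    def tearDown(self):
        # Closing the connection discards the in-memory database and its data.
        self.db.close()

    def test_reads_only_the_data_it_created(self):
        rows = self.db.execute("SELECT total FROM orders").fetchall()
        self.assertEqual(rows, [(42.0,)])

    def test_can_mutate_without_affecting_other_tests(self):
        self.db.execute("DELETE FROM orders")
        count = self.db.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
        self.assertEqual(count, 0)
```

Save it as a `test_*.py` module and run `python -m unittest` to discover it.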

Migration relevance: Becomes critical during Phase 2 when you need production-like environments and deterministic pipeline results.

Shift Left on Security

Security is integrated into the development process rather than added as a gate at the end. Automated security checks run in the pipeline. Security requirements are part of the definition of deployable.

Migration relevance: Integrated during Phase 2: Pipeline Architecture as automated quality gates rather than manual review steps.

Architecture Practices

These practices address the structural characteristics of your system that enable or prevent independent, frequent deployment.

Loosely Coupled Architecture

Teams can deploy their services independently without coordinating with other teams. Changes to one service do not require changes to other services. APIs have well-defined contracts.

Migration relevance: Phase 3: Architecture Decoupling. This practice becomes critical when optimizing for deployment frequency and small batch sizes.

Product and Process Practices

These practices address how work is planned, prioritized, and delivered.

Customer Feedback

Product decisions are informed by direct feedback from customers. Teams can observe how features are used in production and adjust accordingly.

Migration relevance: Becomes fully enabled in Phase 4: Deliver on Demand when every change reaches production quickly enough for real customer feedback to inform the next change.

Value Stream Visibility

The team has a clear view of the entire delivery process from request to production, including wait times, handoffs, and rework loops.

Migration relevance: Phase 0: Value Stream Mapping. This is the first activity in the migration because it informs every decision that follows.

Working in Small Batches

Work is broken down into small increments that can be completed, tested, and deployed independently. Each increment delivers measurable value or validated learning.

Migration relevance: Begins in Phase 1: Work Decomposition and is optimized in Phase 3: Small Batches.

Limit Work in Progress

Teams have explicit WIP limits that constrain the number of items in any stage of the delivery process. WIP limits are enforced and respected.

Migration relevance: Phase 3: Limiting WIP. Reducing WIP is one of the most effective ways to improve lead time and delivery predictability.

Visual Management

The state of all work is visible to the entire team through dashboards, boards, or other visual tools. Anyone can see what is in progress, what is blocked, and what has been deployed.

Migration relevance: All phases. Visual management supports the identification of constraints in Phase 0 and the enforcement of WIP limits in Phase 3.

Monitoring and Observability

Teams have access to production metrics, logs, and traces that allow them to understand system behavior, detect issues, and diagnose problems quickly.

Migration relevance: Critical for Phase 4: Progressive Rollout, where automated health checks determine whether a deployment proceeds or rolls back. Also supports a low Mean Time to Repair.

Proactive Notification

Teams are alerted to problems before customers are affected. Monitoring thresholds and anomaly detection trigger notifications that enable rapid response.

Migration relevance: Becomes critical in Phase 4 when deployments are continuous and automated. Proactive notification is what makes continuous deployment safe.

Collaboration Among Teams

Development, operations, security, and product teams work together rather than in silos. Handoffs are minimized. Shared responsibility replaces blame.

Migration relevance: All phases, but especially Phase 2: Pipeline where the pipeline must encode the quality criteria from all disciplines (security, testing, operations) into automated gates.

Practices Relevant in Every Phase

The following practices are not tied to a specific migration phase. They are conditions that support every phase and should be cultivated continuously throughout the migration.

Empowered Teams. Teams choose their own tools, technologies, and approaches within organizational guardrails. Teams that cannot make local decisions about their pipeline, test strategy, or deployment approach will be unable to iterate quickly enough to make progress.

Team Experimentation. Teams can try new ideas, tools, and approaches without requiring lengthy approval. Failed experiments are treated as learning, not waste. The migration itself is an experiment that requires psychological safety and organizational support.

Generative Culture. Following Ron Westrum’s typology, a generative culture is characterized by high cooperation, shared risk, and focus on the mission. Teams in pathological or bureaucratic cultures will struggle with every phase because practices like TBD and CI require trust and psychological safety.

Learning Culture. The organization invests in learning. Teams have time for experimentation, training, and knowledge sharing. The CD migration is a learning journey that requires time and space to learn new practices, make mistakes, and improve.

Job Satisfaction. Team members find their work meaningful and have the autonomy and resources to do it well. The migration should improve job satisfaction by reducing toil and giving teams faster feedback. If the migration is experienced as a burden, something is wrong with the approach.

Transformational Leadership. Leaders support the migration with vision, resources, and organizational air cover. Without leadership support, the migration will stall when it encounters the first organizational blocker.

6 - CD Dependency Tree

Visual guide showing how CD practices depend on and build upon each other.

The full interactive dependency tree is at practices.minimumcd.org. This page summarizes the key dependency chains and how they map to the migration phases in this guide.

Continuous delivery is not a single practice you adopt. It is a system of interdependent practices where each one supports and enables others. Understanding these dependencies helps you plan your migration in the right order, addressing foundational practices before building on them.

Using the Tree to Diagnose Problems

When something in your delivery process is not working, trace it through the dependency tree to find the root cause.

Deployments keep failing. Look at what feeds CD in the tree. Is your pipeline deterministic? Are you using immutable artifacts? Is your application config externalized? The failure is likely in one of the pipeline practices.

CI builds are constantly broken. Look at what feeds CI. Are developers actually practicing TBD (integrating daily)? Is the test suite reliable, or is it full of flaky tests? Is the build automated end-to-end? The broken builds are a symptom of a problem in the development practices layer.

You cannot reduce batch size. Look at what feeds small batches. Is work being decomposed into vertical slices? Are feature flags available so partial work can be deployed safely? Is the architecture decoupled enough to allow independent deployment? The batch size problem originates in one of these upstream practices.

Every feature requires cross-team coordination to deploy. Look at team structure. Are teams organized around domains they can deliver independently, or around technical layers that force handoffs for every feature? If deploying a feature requires the frontend team, backend team, and DBA team to coordinate a release window, the team structure is preventing independent delivery. No amount of pipeline automation fixes this. The team boundaries need to change.

Mapping to Migration Phases

The dependency tree directly informs the sequencing of migration phases:

Dependency Layer | Migration Phase | Why This Order
Development practices (BDD, trunk-based development) | Phase 1 - Foundations | These are prerequisites for CI, which is a prerequisite for everything else
Build and test infrastructure (build automation, automated testing, test environments) | Phase 1 and Phase 2 | You need reliable build and test infrastructure before you can build a reliable pipeline
Pipeline practices (application pipeline, immutable artifacts, configuration management, rollback) | Phase 2 - Pipeline | The pipeline depends on solid CI and development practices
Flow optimization (small batches, feature flags, WIP limits, metrics) | Phase 3 - Optimize | Optimization requires a working pipeline to optimize
Organizational practices (cross-functional teams, component ownership, developer-driven support) | All phases | These cross-cutting practices support every phase. Team structure should be addressed early because it constrains architecture and work decomposition

Understanding the Dependency Model

How Dependencies Work

CD sits at the top of the tree. It depends directly on many practices, each of which has its own dependencies. When practice A depends on practice B, it means B is a prerequisite or enabler for A. You cannot reliably adopt A without B in place.

For example, continuous delivery depends directly on:

Category | Direct Dependencies
Pipeline | Application pipeline, immutable artifacts, on-demand rollback, configuration management
Testing | Continuous testing, automated database changes, test environments
Integration | Continuous integration
Environment | Automated environment provisioning, monitoring and alerting
Organizational | Cross-functional product teams, developer-driven support, prioritized features
Development | ATDD, modular system design

Each of these has its own dependency chain. The application pipeline alone depends on automated testing, deployment automation, automated artifact versioning, and quality gates. Automated testing in turn depends on build automation. Build automation depends on version control and dependency management. The chain runs deep.
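The dependency chain above can be modeled as a graph. This Python sketch uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+) to derive an adoption order in which every prerequisite comes before the practices that need it. It also walks the graph to list everything a practice transitively depends on, which mirrors the tracing step used when diagnosing problems. Only the dependencies named in the paragraph are encoded; the full tree has many more.

```python
from graphlib import TopologicalSorter

# Practice -> practices it depends on, limited to the chain named in the text.
DEPENDS_ON = {
    "application pipeline": {"automated testing", "deployment automation",
                             "automated artifact versioning", "quality gates"},
    "automated testing": {"build automation"},
    "build automation": {"version control", "dependency management"},
}


def adoption_order(graph):
    """Practices ordered so every prerequisite precedes its dependents."""
    # Raises graphlib.CycleError if the graph contains a cycle.
    return list(TopologicalSorter(graph).static_order())


def prerequisites(graph, practice):
    """Every practice that `practice` transitively depends on."""
    seen = set()
    stack = list(graph.get(practice, ()))
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(graph.get(p, ()))
    return seen
```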

Key Dependency Chains

BDD enables testing enables CI enables CD

Behavior-Driven Development produces clear, testable acceptance criteria. Those criteria drive component testing and acceptance test-driven development. A comprehensive, fast test suite enables Continuous Integration with confidence. And CI is the foundational prerequisite for CD.

If your team skips BDD, stories are ambiguous. If stories are ambiguous, tests are incomplete or wrong. If tests are unreliable, CI is unreliable. And if CI is unreliable, CD is impossible.

Trunk-Based Development enables CI

CI requires that all developers integrate to a shared trunk at least once per day. If your team uses long-lived feature branches, you are not doing CI regardless of how often your build server runs. TBD is not optional for CD. It is a prerequisite.

Cross-functional teams enable component ownership enables modular systems

How teams are organized determines what they can deliver independently. A team organized around a domain (owning the services, data, and interfaces for that domain) can decompose work into vertical slices within their boundary and deploy without coordinating with other teams. A team organized around a technical layer (the “frontend team,” the “DBA team”) cannot. Every feature requires handoffs across layer teams, and deployment requires coordinating all of them.

Conway’s Law makes this structural: the system’s architecture will mirror the team structure. In the dependency tree, cross-functional product teams enable component ownership, which enables the modular system design that CD requires.

Version control is the root of everything

Nearly every automation practice traces back to version control. Build automation, configuration management, infrastructure automation, and component ownership all depend on it. If your version control practices are weak (infrequent commits, poor branching discipline, configuration stored outside version control), the entire tree above it is compromised.

7 - Glossary

Key terms and definitions used throughout this guide.

This glossary defines the terms used across every phase of the CD migration guide. Where a term has a specific meaning within a migration phase, the relevant phase is noted.

For terms related to agentic continuous delivery, AI agents, and LLMs, see the Agentic CD Glossary.

A

Acceptance Criteria

Concrete expectations for a change, expressed as observable outcomes that can be used as fitness functions - executed as deterministic tests or evaluated by review agents. In ACD, acceptance criteria include a done definition (what “done” looks like from an observer’s perspective) and an evaluation design (test cases with known-good outputs). They constrain the agent: comprehensive criteria prevent incorrect code from passing, while shallow criteria allow code that passes tests but violates intent. See Acceptance Criteria.

Referenced in: Agent-Assisted Specification, Agent Delivery Contract, AI Adoption Roadmap, AI-Generated Code Ships Without Developer Understanding, AI Is Generating Technical Debt Faster Than the Team Can Absorb It, AI Tooling Slows You Down Instead of Speeding You Up, CD Dependency Tree, Find Your Symptom, Pipeline Enforcement and Expert Agents, Pitfalls and Metrics, Rubber-Stamping AI-Generated Code, Small-Batch Agent Sessions, Testing Fundamentals, The Four Prompting Disciplines, Tokenomics: Optimizing Token Usage in Agent Architecture, Work Decomposition, Working Agreements

ACD (Agentic Continuous Delivery)

See Agentic CD Glossary.

Agent (AI)

See Agentic CD Glossary.

Agent Loop

See Agentic CD Glossary.

Agent Session

See Agentic CD Glossary.

Artifact

A packaged, versioned output of a build process (e.g., a container image, JAR file, or binary). In a CD pipeline, artifacts are built once and promoted through environments without modification. See Immutable Artifacts.

Referenced in: Agent-Assisted Specification, Agentic Architecture Patterns, Agentic Continuous Delivery (ACD), Build Automation, Build Duration, CD for Greenfield Projects, Coding and Review Agent Configuration, Data Pipelines and ML Models Have No Deployment Automation, Deployable Definition, Deployments Are One-Way Doors, Deterministic Pipeline, Developers Cannot Run the Pipeline Locally, DORA Recommended Practices, End-to-End Tests, Every Change Requires a Ticket and Approval Chain, Experience Reports, Component Tests, Independent Teams, Independent Deployables, Merge Freezes Before Deployments, Metrics-Driven Improvement, Missing Deployment Pipeline, Multiple Teams, Single Deployable, No Contract Testing Between Services, No Evidence of What Was Deployed or When, Pipeline Enforcement and Expert Agents, Pitfalls and Metrics, Rollback, Single Team, Single Deployable, Small-Batch Agent Sessions, The Agentic Development Learning Curve, The Build Runs Again for Every Environment, Agent Delivery Contract, The Team Ignores Alerts Because There Are Too Many, The Team Is Afraid to Deploy, Tightly Coupled Monolith, Tokenomics: Optimizing Token Usage in Agent Architecture, Working Agreements
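The following sketches "built once, promoted without modification". It assumes the pipeline records a SHA-256 digest of the artifact at build time and checks that digest at each promotion. The `promote` helper and the in-memory `deployed` map are hypothetical. Container registries and artifact repositories provide their own digest mechanisms.

```python
import hashlib


def digest(artifact_bytes):
    """Content digest that identifies exactly one build output."""
    return "sha256:" + hashlib.sha256(artifact_bytes).hexdigest()


def promote(artifact_bytes, built_digest, environment, deployed):
    """Record a deployment of the artifact, refusing bytes that differ from the build."""
    actual = digest(artifact_bytes)
    if actual != built_digest:
        raise ValueError(f"artifact modified since build: {actual} != {built_digest}")
    deployed[environment] = actual
    return deployed


artifact = b"example build output"  # stands in for an image, JAR, or binary
built = digest(artifact)            # computed once, in the build stage
deployed = {}
for env in ("test", "staging", "production"):
    promote(artifact, built, env, deployed)
```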

B

Black Box Testing

See Testing Glossary.

Baseline Metrics

The set of delivery measurements taken before beginning a migration, used as the benchmark against which improvement is tracked. See Phase 0 - Baseline Metrics.

Referenced in: Phase 0: Assess

Batch Size

The amount of change included in a single deployment. Smaller batches reduce risk, simplify debugging, and shorten feedback loops. Reducing batch size is a core focus of Phase 3 - Small Batches.

Referenced in: CD Dependency Tree, DORA Recommended Practices, FAQ, Hardening Sprints Are Needed Before Every Release, Metrics-Driven Improvement, Missing Deployment Pipeline, New Releases Introduce Regressions in Previously Working Functionality, Phase 2: Pipeline, Releases Are Infrequent and Painful, Small Batches

BDD (Behavior-Driven Development)

A collaboration practice where developers, testers, and product representatives define expected behavior using structured examples before code is written. BDD produces executable specifications that serve as both documentation and automated tests. BDD supports effective work decomposition by forcing clarity about what a story actually means before development begins.

Referenced in: Agent-Assisted Specification, Agentic Continuous Delivery (ACD), AI Tooling Slows You Down Instead of Speeding You Up, CD Dependency Tree, Coding and Review Agent Configuration, Getting Started: Where to Put What, Knowledge & Communication Defects, Pipeline Enforcement and Expert Agents, Pitfalls and Metrics, Small Batches, Small-Batch Agent Sessions, TBD Migration Guide, Agent Delivery Contract, Work Decomposition

Blue-Green Deployment

A deployment strategy that maintains two identical production environments. New code is deployed to the inactive environment, verified, and then traffic is switched. See Progressive Rollout.

Referenced in: Every Deployment Is Immediately Visible to All Users, Process & Deployment Defects
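
The deploy, verify, and switch sequence can be sketched as a tiny in-memory router. This is a minimal Python illustration, not a real load balancer API. `BlueGreenRouter`, `release`, and the `verify` callback are hypothetical names, and "deploying" is modeled as recording a version string for an environment.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class BlueGreenRouter:
    """Hypothetical router: all production traffic goes to exactly one environment."""

    versions: Dict[str, str] = field(default_factory=lambda: {"blue": "v1", "green": "v1"})
    live: str = "blue"

    @property
    def idle(self) -> str:
        return "green" if self.live == "blue" else "blue"

    def release(self, version: str, verify: Callable[[str], bool]) -> bool:
        """Deploy to the idle environment, verify it, then switch traffic."""
        target = self.idle
        self.versions[target] = version  # users are still served by self.live
        if not verify(target):
            return False  # no switch: the failed version never received traffic
        self.live = target  # single cut-over; the old version stays on the idle side
        return True
```

Because the previous version keeps running on the now-idle side, rolling back right after a release can be as simple as pointing traffic back (`router.live = router.idle`), as long as that environment has not been redeployed yet.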

Branch Lifetime

The elapsed time between creating a branch and merging it to trunk. CD requires branch lifetimes measured in hours, not days or weeks. Long branch lifetimes are a symptom of poor work decomposition or slow code review. See Trunk-Based Development.

Referenced in: AI Adoption Roadmap, FAQ, Feedback Takes Hours Instead of Minutes, Long-Lived Feature Branches, Merging Is Painful and Time-Consuming, Metrics-Driven Improvement, TBD Migration Guide
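
A sketch of how a team might flag long-lived branches, assuming creation and merge timestamps are already available (for example, exported from the repository host). The one-day threshold and the function names are illustrative assumptions.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Tuple

# Assumption: anything open longer than one day is too long for CD.
MAX_LIFETIME = timedelta(days=1)


def branch_lifetime(created: datetime, merged: Optional[datetime], now: datetime) -> timedelta:
    """Time from branch creation to merge into trunk (or to now, if still open)."""
    end = merged if merged is not None else now
    return end - created


def long_lived_branches(
    branches: Dict[str, Tuple[datetime, Optional[datetime]]], now: datetime
) -> List[str]:
    """Names of branches whose lifetime exceeds MAX_LIFETIME, sorted alphabetically."""
    return sorted(
        name
        for name, (created, merged) in branches.items()
        if branch_lifetime(created, merged, now) > MAX_LIFETIME
    )
```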

C

Canary Deployment

A deployment strategy where a new version is rolled out to a small subset of users or servers before full rollout. If the canary shows no issues, the deployment proceeds to 100%. See Progressive Rollout.

Referenced in: Change & Complexity Defects, Pipeline Enforcement and Expert Agents, Process & Deployment Defects, Progressive Rollout
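
The proceed-or-abort decision can be sketched as a loop over widening traffic steps. The step percentages, the absolute `tolerance` on error rate, and the `canary_error_rate` callback are assumptions made for illustration; real canary analysis typically compares several metrics over a soak period at each step.

```python
from typing import Callable, Sequence

# Hypothetical rollout schedule: percentage of traffic sent to the canary at each step.
DEFAULT_STEPS = (1, 5, 25, 50, 100)


def run_canary(
    canary_error_rate: Callable[[int], float],
    baseline_error_rate: float,
    tolerance: float = 0.01,
    steps: Sequence[int] = DEFAULT_STEPS,
) -> int:
    """Widen the rollout step by step; return the final traffic share (0 = rolled back)."""
    for percent in steps:
        # A real system would observe live metrics for a while at each step.
        if canary_error_rate(percent) > baseline_error_rate + tolerance:
            return 0  # canary is worse than the baseline: route all traffic back
    return steps[-1]
```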

CD (Continuous Delivery)

The practice of ensuring that every change to the codebase is always in a deployable state and can be released to production at any time through a fully automated pipeline. Continuous delivery does not require that every change is deployed automatically, but it requires that every change could be deployed automatically. This is the primary goal of this migration guide.

Referenced in: Agent-Assisted Specification, AI Adoption Roadmap, Agentic Continuous Delivery (ACD), CD Dependency Tree, CD for Greenfield Projects, Change Advisory Board Gates, Data Pipelines and ML Models Have No Deployment Automation, Deterministic Pipeline, DORA Recommended Practices, Experience Reports, FAQ, Feature Flags, Horizontal Slicing, Independent Teams, Independent Deployables, Inverted Test Pyramid, Knowledge Silos, Leadership Sees CD as a Technical Nice-to-Have, Learning Paths, Long-Lived Feature Branches, Manual Testing Only, Metrics-Driven Improvement, Missing Deployment Pipeline, Phase 0: Assess, Phase 1: Foundations, Phase 2: Pipeline, Phase 3: Optimize, Pipeline Enforcement and Expert Agents, Pipeline Reference Architecture, Process & Deployment Defects, Push-Based Work Assignment, Retrospectives, Rubber-Stamping AI-Generated Code, Small Batches, Team Membership Changes Constantly, Test Doubles, Testing Fundamentals, The Deployment Target Does Not Support Modern CI/CD Tooling, Thin-Spread Teams, Tightly Coupled Monolith, Unit Tests, Work Decomposition

Change Failure Rate (CFR)

The percentage of deployments to production that result in a degraded service and require remediation (e.g., rollback, hotfix, or patch). One of the four DORA metrics. See Metrics - Change Fail Rate.

Referenced in: Architecture Decoupling, CD for Greenfield Projects, Change Advisory Board Gates, Experience Reports, FAQ, Metrics-Driven Improvement, Phase 0: Assess, Pitfalls and Metrics, Retrospectives

CI (Continuous Integration)

The practice of integrating code changes to a shared trunk at least once per day, where each integration is verified by an automated build and test suite. CI is a prerequisite for CD, not a synonym. A team that runs automated builds on feature branches but merges weekly is not doing CI. See Build Automation.

Referenced in: Architecture Decoupling, CD Dependency Tree, CD for Greenfield Projects, Change & Complexity Defects, Data & State Defects, Data Pipelines and ML Models Have No Deployment Automation, Dependency & Infrastructure Defects, Deterministic Pipeline, Developers Cannot Run the Pipeline Locally, Experience Reports, FAQ, Feedback Takes Hours Instead of Minutes, Component Tests, Integration & Boundaries Defects, Inverted Test Pyramid, It Works on My Machine, Long-Lived Feature Branches, Manual Testing Only, Merge Freezes Before Deployments, Merging Is Painful and Time-Consuming, Metrics-Driven Improvement, Missing Deployment Pipeline, No Evidence of What Was Deployed or When, Performance & Resilience Defects, Pipeline Enforcement and Expert Agents, Pipeline Reference Architecture, Process & Deployment Defects, Coding and Review Agent Configuration, Agentic Architecture Patterns, Security & Compliance Defects, Security Review Is a Gate, Not a Guardrail, Services Reach Production with No Health Checks or Alerting, Small-Batch Agent Sessions, Symptoms for Developers, Test Suite Is Too Slow to Run, Testing & Observability Gap Defects, Tests Pass in One Environment but Fail in Another, Tests Randomly Pass or Fail, The Development Workflow Has Friction at Every Step, Unit Tests

Constraint

In the Theory of Constraints, the single factor most limiting the throughput of a system. During a CD migration, your job is to find and fix constraints in order of impact. See Identify Constraints.

Referenced in: Agent-Assisted Specification, Agent Delivery Contract, AI Is Generating Technical Debt Faster Than the Team Can Absorb It, Baseline Metrics, Build Automation, Current State Checklist, DORA Recommended Practices, Experience Reports, FAQ, Identify Constraints, Knowledge Silos, Learning Paths, Migrate to CD, Migrating Brownfield to CD, Multiple Services Must Be Deployed Together, Phase 0: Assess, Push-Based Work Assignment, Releases Are Infrequent and Painful, Releases Depend on One Person, Security Review Is a Gate, Not a Guardrail, Sprint Planning Is Dominated by Dependency Negotiation, The Agentic Development Learning Curve, The Four Prompting Disciplines, Untestable Architecture, Value Stream Mapping

Context (LLM)

See Agentic CD Glossary.

Context Window

See Agentic CD Glossary.

Context Engineering

See Agentic CD Glossary.

Continuous Deployment

An extension of continuous delivery where every change that passes the automated pipeline is deployed to production without manual intervention. Continuous delivery ensures every change can be deployed; continuous deployment ensures every change is deployed. See Phase 4 - Deliver on Demand.

Referenced in: AI Adoption Roadmap, Architecture Decoupling, Change Advisory Board Gates, DORA Recommended Practices, Experience Reports, FAQ, Feature Flags, Tightly Coupled Monolith

D

Deployable

A change that has passed all automated quality gates defined by the team and is ready for production deployment. The definition of deployable is codified in the pipeline, not decided by a person at deployment time. See Deployable Definition.

Referenced in: CD for Greenfield Projects, DORA Recommended Practices, Deployable Definition, Everything Started, Nothing Finished, Experience Reports, FAQ, Component Tests, Horizontal Slicing, Independent Teams, Independent Deployables, Long-Lived Feature Branches, Merge Freezes Before Deployments, Monolithic Work Items, Multiple Services Must Be Deployed Together, Multiple Teams, Single Deployable, Releases Are Infrequent and Painful, Rubber-Stamping AI-Generated Code, Small Batches, Team Alignment to Code, Trunk-Based Development, Work Decomposition, Work Items Take Days or Weeks to Complete, Working Agreements
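
Codifying "deployable" in the pipeline means the answer comes from gate results, not from a person. A minimal sketch, with hypothetical `Gate` objects standing in for whatever checks the team defines:

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Gate:
    """One automated quality gate; `check` receives a commit SHA and returns pass/fail."""

    name: str
    check: Callable[[str], bool]


def failed_gates(commit: str, gates: List[Gate]) -> List[str]:
    """Names of gates that rejected the commit, in pipeline order."""
    return [gate.name for gate in gates if not gate.check(commit)]


def is_deployable(commit: str, gates: List[Gate]) -> bool:
    """Deployable means every gate passed; there is no manual override parameter."""
    return not failed_gates(commit, gates)
```

This sketch evaluates every gate so it can report all failures; a real pipeline would usually order gates cheapest-first and stop at the first failure.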

Deployment Frequency

How often an organization successfully deploys to production. One of the four DORA metrics. See Metrics - Release Frequency.

Referenced in: Architecture Decoupling, CD for Greenfield Projects, Change Advisory Board Gates, DORA Recommended Practices, Experience Reports, Integration Frequency, Leadership Sees CD as a Technical Nice-to-Have, Metrics-Driven Improvement, Missing Deployment Pipeline, No Contract Testing Between Services, Phase 0: Assess, Process & Deployment Defects, Release Frequency, Retrospectives, Single Path to Production, TBD Migration Guide, The Team Is Caught Between Shipping Fast and Not Breaking Things, Tightly Coupled Monolith, Untestable Architecture

Development Cycle Time

The elapsed time from the first commit on a change to that change being deployable. This measures the efficiency of your development and pipeline process, excluding upstream wait times. See Metrics - Development Cycle Time.

Dependency

Any code, service, or resource whose behavior is defined outside the current module. Dependencies vary by location and ownership:

  • Internal dependency - code in another file or module within the same repository, or in another repository your team controls. Internal dependencies share your release cycle and your team can change them directly.
  • External dependency - a third-party library, external API, or managed service outside your team’s direct control.

The distinction matters for testing. Internal dependencies are part of your own codebase and should be exercised through real code paths in tests. Replacing them with test doubles couples your tests to implementation details and causes rippling failures during routine refactoring. Reserve test doubles for external dependencies and runtime connections where real invocation is impractical or non-deterministic.

See also: Hard Dependency, Soft Dependency.

Referenced in: Defect Feedback Loop, Testing Fundamentals, The Agentic Development Learning Curve, Work Decomposition

Declarative Agent

See Agentic CD Glossary.

Delivery Contract

See Agentic CD Glossary.

Done Definition

The observable outcomes portion of acceptance criteria. A done definition describes what “done” looks like from an independent observer’s perspective - someone who was not involved in the implementation. Combined with an evaluation design, done definitions form the testable boundary of a delivery contract. See Agent Delivery Contract.

Referenced in: Agent Delivery Contract, Agent-Assisted Specification

DORA Metrics

The four key metrics identified by the DORA (DevOps Research and Assessment) research program as predictive of software delivery performance: deployment frequency, lead time for changes, change failure rate, and mean time to restore service. See DORA Recommended Practices.

Referenced in: CD for Greenfield Projects, Change Fail Rate, Development Cycle Time, DORA Recommended Practices, Experience Reports, FAQ, Lead Time, Mean Time to Repair, Metrics-Driven Improvement, Phase 3: Optimize, Product & Discovery Defects, Release Frequency, Retrospectives, Small Batches, Work Decomposition
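
The four metrics can be computed from a simple record of production deployments. This Python sketch assumes each record carries one representative commit time, a failure flag, and detection and restore timestamps for failures. The `Deployment` type and its field names are hypothetical, and real measurement usually tracks lead time per commit rather than per deployment.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import List, Optional


@dataclass
class Deployment:
    committed: datetime                   # commit time of the change being deployed
    deployed: datetime                    # time it was running in production
    failed: bool = False                  # degraded service and needed remediation
    detected: Optional[datetime] = None   # when the failure was detected
    restored: Optional[datetime] = None   # when service was restored


def dora_metrics(deploys: List[Deployment], period_days: float) -> dict:
    if not deploys:
        raise ValueError("need at least one deployment in the period")
    failures = [d for d in deploys if d.failed]
    restores = [
        d.restored - d.detected
        for d in failures
        if d.detected is not None and d.restored is not None
    ]
    return {
        "deployment_frequency_per_day": len(deploys) / period_days,
        "median_lead_time": median(d.deployed - d.committed for d in deploys),
        "change_failure_rate": len(failures) / len(deploys),
        "mean_time_to_restore": sum(restores, timedelta()) / len(restores) if restores else None,
    }
```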

E

External Dependency

A dependency on code or services outside your team’s direct control. External dependencies include third-party libraries, public APIs, managed cloud services, and any resource whose release cycle and availability your team cannot influence.

External dependencies are the primary case where test doubles add value. A test double for an external API verifies your integration logic without relying on network availability or third-party rate limits. By contrast, mocking internal code - another class in the same repository or a module your team owns - creates fragile tests that break whenever the internal implementation changes, even when the behavior is correct.

When evaluating whether to mock something, ask: “Can my team change this code and release it in our pipeline?” If yes, it is an internal dependency and should be tested through real code paths. If no, it is an external dependency and a test double is appropriate.

See also: Dependency, Hard Dependency.

Referenced in: Testing Fundamentals
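
The rule of thumb above can be illustrated with a small Python test. `price_in` uses a team-owned function (`apply_discount`) and a third-party exchange-rate API. The test exercises the real internal code and substitutes only the external API with a hand-written fake. All names are hypothetical.

```python
import unittest
from typing import Dict, Protocol


class ExchangeRates(Protocol):
    """Boundary to a third-party rates API: an external dependency."""

    def rate(self, currency: str) -> float: ...


def apply_discount(amount: float, percent: float) -> float:
    """Internal code the team owns and releases through its own pipeline."""
    return round(amount * (1 - percent / 100), 2)


def price_in(currency: str, amount_usd: float, discount: float, rates: ExchangeRates) -> float:
    discounted = apply_discount(amount_usd, discount)  # internal: always the real code
    return round(discounted * rates.rate(currency), 2)  # external: injected


class FakeRates:
    """Test double for the external API: deterministic, no network, no rate limits."""

    def __init__(self, table: Dict[str, float]) -> None:
        self.table = table

    def rate(self, currency: str) -> float:
        return self.table[currency]


class PriceInTest(unittest.TestCase):
    def test_discount_then_conversion(self) -> None:
        # apply_discount is not mocked, so refactoring it cannot break this test
        # unless the observable price actually changes.
        self.assertEqual(price_in("EUR", 100.0, 10, FakeRates({"EUR": 0.5})), 45.0)
```

The test can be run with `python -m unittest` against the module that contains it.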

Evaluation Design

See Agentic CD Glossary.

Expert Agent

See Agentic CD Glossary.

F

Feature Team

A team organized around user-facing features or customer journeys rather than owned product subdomains. A feature team is cross-functional - it contains the skills to deliver a feature end-to-end - but it does not own a stable domain of code. Multiple feature teams may modify the same components, with no single team accountable for quality or consistency within them.

In practice:

  • Feature teams must re-orient on code they do not continuously maintain each time a feature requires it.
  • Quality agreements cannot be enforced within the team because other teams also modify the same code.
  • While feature teams appear to minimize inter-team dependencies, they produce the opposite: everyone who can change a component is effectively on the same large, loosely communicating team.

Feature teams are structurally equivalent to long-lived project teams.

Contrast with full-stack product team and subdomain product team, which achieve cross-functional delivery through stable domain ownership rather than feature-by-feature assembly.

Referenced in: Team Alignment to Code

Feature Flag

A mechanism that allows code to be deployed to production with new functionality disabled, then selectively enabled for specific users, percentages of traffic, or environments. Feature flags decouple deployment from release. See Feature Flags.

Referenced in: Architecture Decoupling, CD Dependency Tree, CD for Greenfield Projects, Change & Complexity Defects, Change Advisory Board Gates, Change Fail Rate, Database Migrations Block or Break Deployments, Deploying Stateful Services Causes Outages, Every Change Requires a Ticket and Approval Chain, Every Deployment Is Immediately Visible to All Users, Experience Reports, FAQ, Feature Flags, Hard-Coded Environment Assumptions, Horizontal Slicing, Integration Frequency, Long-Lived Feature Branches, Mean Time to Repair, Monolithic Work Items, Phase 3: Optimize, Pipeline Enforcement and Expert Agents, Product & Discovery Defects, Progressive Rollout, Rollback, Single Path to Production, Small Batches, TBD Migration Guide, Teams Cannot Change Their Own Pipeline Without Another Team, The Team Resists Merging to the Main Branch, Trunk-Based Development, Vendor Release Cycles Constrain the Team’s Deployment Frequency, Work Decomposition, Work Requires Sign-Off from Teams Not Involved in Delivery, Working Agreements
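
Percentage rollouts are commonly implemented by hashing a stable user identifier into a bucket. A minimal sketch, assuming an in-process `FeatureFlag` object (hypothetical; flag services expose their own APIs):

```python
import hashlib
from dataclasses import dataclass, field
from typing import Set


@dataclass
class FeatureFlag:
    """Hypothetical in-process flag; real systems usually read this state from a flag service."""

    name: str
    enabled: bool = False                                # off: code is deployed but dark
    percent: int = 0                                     # share of users (0-100) who get it
    allow_users: Set[str] = field(default_factory=set)   # explicit opt-ins, e.g. internal testers

    def bucket(self, user_id: str) -> int:
        """Stable bucket 0-99 per (flag, user), so a user's experience does not flicker."""
        digest = hashlib.sha256(f"{self.name}:{user_id}".encode()).hexdigest()
        return int(digest, 16) % 100

    def is_on_for(self, user_id: str) -> bool:
        if not self.enabled:
            return False
        return user_id in self.allow_users or self.bucket(user_id) < self.percent
```

Setting `enabled = False` turns the feature off for everyone without a redeploy. Raising `percent` widens the audience while keeping every user who already had the feature, since a user's bucket never changes.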

Flow Efficiency

The ratio of active work time to total elapsed time in a delivery process. A flow efficiency of 15% means that for every hour of actual work, roughly 5.7 hours are spent waiting. Value stream mapping reveals your flow efficiency. See Value Stream Mapping.

Referenced in: Value Stream Mapping
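
The arithmetic behind the 15% example, as a short Python sketch. The step values used to exercise it are invented for illustration: 3 hours of active work spread across 20 elapsed hours.

```python
from typing import List, Tuple


def flow_efficiency(steps: List[Tuple[float, float]]) -> float:
    """steps: (process_hours, wait_hours) for each step in a value stream map."""
    active = sum(process for process, _ in steps)
    elapsed = sum(process + wait for process, wait in steps)
    return active / elapsed


def wait_per_active_hour(efficiency: float) -> float:
    """Hours spent waiting for every hour of active work: (1 - FE) / FE."""
    return (1 - efficiency) / efficiency
```

For example, `flow_efficiency([(1.0, 3.0), (2.0, 14.0)])` gives 3 / 20 = 0.15, and at that efficiency each active hour carries (1 - 0.15) / 0.15 ≈ 5.67 hours of waiting, which rounds to the 5.7 quoted above.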

Full-Stack Product Team

A team that owns every layer of a user-facing capability - UI, API, and data store - and whose public interface is designed for human users. A vertical slice for a full-stack product team delivers one observable behavior from the user interface through to the database. The slice is done when a user can observe the behavior through that interface. Contrast with subdomain product team.

Referenced in: Horizontal Slicing, Small Batches, Work Decomposition

G

Guardrail

A safety constraint encoded in a pipeline, system prompt, or hook that limits what an agent can do. Guardrails are deterministic boundaries, not suggestions. Examples include pre-commit hooks that block secrets from being committed, pipeline gates that reject changes exceeding a complexity threshold, and system prompt rules that prevent an agent from modifying test specifications. Guardrails protect against both agent errors and hallucinations without requiring human intervention on every change. See Pipeline Enforcement and Expert Agents.

Referenced in: AI Adoption Roadmap, Coding and Review Agent Configuration, Pipeline Enforcement and Expert Agents, The Four Prompting Disciplines
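
A pre-commit secret check is one of the simplest guardrails to build. The sketch below scans only lines added in the staged diff against a few example patterns. The patterns are illustrative assumptions and far from complete, so a dedicated secret scanner is the realistic choice in a pipeline.

```python
import re
import subprocess
import sys
from typing import List

# Illustrative patterns only; real secret scanners ship much larger rule sets.
SECRET_PATTERNS = {
    "aws-access-key-id": re.compile(r"AKIA[0-9A-Z]{16}"),
    "private-key": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
    "hardcoded-credential": re.compile(
        r"(?i)(?:api[_-]?key|secret|password)\s*[:=]\s*['\"][^'\"]{8,}['\"]"
    ),
}


def find_secrets(diff: str) -> List[str]:
    """Names of rules matched by lines added in a unified diff (file headers skipped)."""
    added = [
        line[1:]
        for line in diff.splitlines()
        if line.startswith("+") and not line.startswith("+++")
    ]
    return sorted(
        {name for name, pattern in SECRET_PATTERNS.items() for line in added if pattern.search(line)}
    )


def main() -> int:
    diff = subprocess.run(
        ["git", "diff", "--cached", "--unified=0"],
        capture_output=True, text=True, check=True,
    ).stdout
    hits = find_secrets(diff)
    if hits:
        print(f"Commit blocked: possible secrets ({', '.join(hits)})", file=sys.stderr)
        return 1  # a non-zero exit from a pre-commit hook makes git abort the commit
    return 0
```

To use it as a hook, save it as an executable `.git/hooks/pre-commit` script that ends with `sys.exit(main())`.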

GitFlow

A branching model created by Vincent Driessen in 2010 that uses multiple long-lived branches (main, develop, release/*, hotfix/*, feature/*) with specific merge rules and directions. GitFlow was designed for infrequent, scheduled releases and is fundamentally incompatible with continuous delivery because it defers integration, creates multiple paths to production, and adds merge complexity. See the TBD Migration Guide for a step-by-step path from GitFlow to trunk-based development.

Referenced in: Single Path to Production, TBD Migration Guide, Trunk-Based Development

H

Hard Dependency

A dependency that must be resolved before work can proceed. In delivery, hard dependencies include things like waiting for another team’s API, a shared database migration, or an infrastructure provisioning request. Hard dependencies create queues and increase lead time. Eliminating hard dependencies is a focus of Architecture Decoupling.

Referenced in: Team Alignment to Code

Hallucination

See Agentic CD Glossary.

Hardening Sprint

A sprint dedicated to stabilizing and fixing defects before a release. The existence of hardening sprints is a strong signal that quality is not being built in during regular development. Teams practicing CD do not need hardening sprints because every commit is deployable. See Testing Fundamentals.

Referenced in: Hardening Sprints Are Needed Before Every Release

Hook (Agent)

See Agentic CD Glossary.

Hypothesis-Driven Development

An approach that frames every change as an experiment with a predicted outcome. Instead of specifying a change as a requirement to implement, the team states a hypothesis: “We believe [this change] will produce [this outcome] because [this reason].” After deployment, the team validates whether the predicted outcome occurred. Changes that confirm the hypothesis build confidence. Changes that refute it produce learning that informs the next hypothesis. This creates a feedback loop where every deployed change generates a signal, whether it “succeeds” or not. See Hypothesis-Driven Development for the full lifecycle and Agent Delivery Contract for how hypotheses integrate with specification artifacts.

Referenced in: Metrics-Driven Improvement, Agent Delivery Contract, Agent-Assisted Specification
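
The hypothesis template can be captured as structured data so that validation is explicit rather than a judgment call after the fact. A hypothetical sketch: the `Hypothesis` type, the metric, and the simple threshold comparison are assumptions, and a real evaluation would also need enough traffic to rule out noise.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    change: str             # what we will deploy
    outcome: str            # the predicted, observable result
    reason: str             # why we expect it
    metric: str             # hypothetical metric name used to validate the outcome
    min_improvement: float  # smallest rise in `metric` that counts as confirmation

    def statement(self) -> str:
        return f"We believe {self.change} will produce {self.outcome} because {self.reason}."

    def evaluate(self, before: float, after: float) -> str:
        """Compare the metric before and after deployment (assumes higher is better)."""
        return "confirmed" if after - before >= self.min_improvement else "refuted"
```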

I

Immutable Artifact

A build artifact that is never modified after creation. The same artifact that is tested in the pipeline is the exact artifact that is deployed to production. Configuration differences between environments are handled externally. See Immutable Artifacts.

Referenced in: CD Dependency Tree, FAQ, Merge Freezes Before Deployments
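
One way to enforce immutability is to record the artifact's content digest once, at build time, and check it before every promotion. A minimal sketch using SHA-256 (the function names are hypothetical). Container images give a similar guarantee when deployed by digest instead of by a mutable tag.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path) -> str:
    """Content digest of a build artifact, computed in 64 KiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as artifact:
        for chunk in iter(lambda: artifact.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_same_artifact(path: Path, tested_digest: str) -> bool:
    """True only if the file about to be deployed is byte-for-byte the one that was tested."""
    return sha256_file(path) == tested_digest
```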

Intent Engineering

See Agentic CD Glossary.

Integration Frequency

How often a developer integrates code to the shared trunk. CD requires at least daily integration. See Metrics - Integration Frequency.

Referenced in: The Team Has No Shared Agreements About How to Work

L

Lead Time for Changes

The elapsed time from when a commit is made to when it is successfully running in production. One of the four DORA metrics. See Metrics - Lead Time.

Referenced in: Architecture Decoupling, CD for Greenfield Projects, Development Cycle Time, FAQ, Lead Time, Leadership Sees CD as a Technical Nice-to-Have, Manual Testing Only, Metrics-Driven Improvement, Phase 0: Assess, Retrospectives, Security Review Is a Gate, Not a Guardrail, Working Agreements

M

Mean Time to Restore (MTTR)

The elapsed time from when a production incident is detected to when service is restored. One of the four DORA metrics. Teams practicing CD have short MTTR because deployments are small, rollback is automated, and the cause of failure is easy to identify. See Metrics - Mean Time to Repair.

Referenced in: Architecture Decoupling, CD for Greenfield Projects, Metrics-Driven Improvement, Retrospectives

Model Routing

See Agentic CD Glossary.

Modular Monolith

A single deployable application whose codebase is organized into well-defined modules with explicit boundaries. Each module encapsulates a bounded domain and communicates with other modules through defined interfaces, not by reaching into shared database tables or calling internal methods directly. The application deploys as one unit, but its internal structure allows teams to reason about, test, and change one module independently. See Pipeline Reference Architecture and Premature Microservices.

Referenced in: Multiple Teams, Single Deployable, Pipeline Reference Architecture, Single Team, Single Deployable, Team Alignment to Code

O

Orchestrator

See Agentic CD Glossary.

P

Pipeline

The automated sequence of build, test, and deployment stages that every change passes through on its way to production. See Phase 2 - Pipeline.

Referenced in: Agentic Continuous Delivery (ACD), AI Adoption Roadmap, CD Dependency Tree, CD for Greenfield Projects, Change Advisory Board Gates, Data Pipelines and ML Models Have No Deployment Automation, Database Migrations Block or Break Deployments, Deploying Stateful Services Causes Outages, Deployments Are One-Way Doors, Deterministic Pipeline, Developers Cannot Run the Pipeline Locally, DORA Recommended Practices, Each Language Has Its Own Ad Hoc Pipeline, Every Change Rebuilds the Entire Repository, Every Change Requires a Ticket and Approval Chain, Every Deployment Is Immediately Visible to All Users, Experience Reports, Feedback Takes Hours Instead of Minutes, Component Tests, Getting a Test Environment Requires Filing a Ticket, Getting Started: Where to Put What, High Coverage but Tests Miss Defects, Horizontal Slicing, Independent Teams, Independent Deployables, Inverted Test Pyramid, Leadership Sees CD as a Technical Nice-to-Have, Long-Lived Feature Branches, Manual Testing Only, Merge Freezes Before Deployments, Metrics-Driven Improvement, Missing Deployment Pipeline, No Evidence of What Was Deployed or When, Phase 1: Foundations, Phase 2: Pipeline, Phase 3: Optimize, Pipeline Enforcement and Expert Agents, Pipeline Reference Architecture, Pipelines Take Too Long, Pitfalls and Metrics, Process & Deployment Defects, Product & Discovery Defects, Production Issues Discovered by Customers, Production Problems Are Discovered Hours or Days Late, Push-Based Work Assignment, Retrospectives, Rubber-Stamping AI-Generated Code, Coding and Review Agent Configuration, Agentic Architecture Patterns, Recommended Patterns for Agentic Workflow Architecture, Releases Are Infrequent and Painful, Releases Depend on One Person, Security Review Is a Gate, Not a Guardrail, Services in the Same Portfolio Have Wildly Different Maturity Levels, Services Reach Production with No Health Checks or Alerting, Small-Batch Agent Sessions, Testing Fundamentals, Staging Passes but Production Fails, Symptoms for Developers, TBD Migration Guide, Team Alignment to Code, Teams Cannot Change Their Own Pipeline Without Another Team, Test Doubles, Test Environments Take Too Long to Reset Between Runs, Test Suite Is Too Slow to Run, Tests Pass in One Environment but Fail in Another, Tests Randomly Pass or Fail, The Agentic Development Learning Curve, The Build Runs Again for Every Environment, The Deployment Target Does Not Support Modern CI/CD Tooling, The Development Workflow Has Friction at Every Step, Agent Delivery Contract, The Team Ignores Alerts Because There Are Too Many, The Team Is Afraid to Deploy, The Team Is Caught Between Shipping Fast and Not Breaking Things, The Team Resists Merging to the Main Branch, Thin-Spread Teams, Tightly Coupled Monolith, Tokenomics: Optimizing Token Usage in Agent Architecture, Vendor Release Cycles Constrain the Team’s Deployment Frequency, Work Requires Sign-Off from Teams Not Involved in Delivery, Your Migration Journey

Production-Like Environment

A test or staging environment that matches production in configuration, infrastructure, and data characteristics. Testing in environments that differ from production is a common source of deployment failures. See Production-Like Environments.

Referenced in: CD for Greenfield Projects, DORA Recommended Practices, FAQ, Hard-Coded Environment Assumptions, Pipeline Enforcement and Expert Agents, Pipeline Reference Architecture, Progressive Rollout, Stakeholders See Working Software Only at Release Time, TBD Migration Guide

Prompt

See Agentic CD Glossary.

Prompt Caching

See Agentic CD Glossary.

Prompt Craft

See Agentic CD Glossary.

Prompting Discipline

See Agentic CD Glossary.

Programmatic Agent

See Agentic CD Glossary.

R

Rollback

The ability to revert a production deployment to a previous known-good state. CD requires automated rollback that takes minutes, not hours. See Rollback.

Referenced in: CD Dependency Tree, CD for Greenfield Projects, Change Advisory Board Gates, Change Fail Rate, Data Pipelines and ML Models Have No Deployment Automation, Database Migrations Block or Break Deployments, Deployable Definition, Deployments Are One-Way Doors, Every Change Requires a Ticket and Approval Chain, Experience Reports, Feature Flags, Horizontal Slicing, Mean Time to Repair, Metrics-Driven Improvement, Missing Deployment Pipeline, No Deployment Health Checks, Phase 2: Pipeline, Pipeline Reference Architecture, Pitfalls and Metrics, Process & Deployment Defects, Production Problems Are Discovered Hours or Days Late, Progressive Rollout, Release Frequency, Releases Depend on One Person, Single Path to Production, Symptoms for Developers, Systemic Defect Fixes, TBD Migration Guide, The Team Is Caught Between Shipping Fast and Not Breaking Things, Tightly Coupled Monolith, Work Decomposition
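
Automated rollback can be sketched as: deploy, run health checks, and redeploy the last known-good version on failure without waiting for a person. `Deployer` and the `healthy` callback are hypothetical stand-ins for a real deployment tool and its health checks.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Deployer:
    """Hypothetical deployer that remembers the last version to pass health checks."""

    known_good: str
    running: str = ""
    log: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        self.running = self.known_good

    def deploy(self, version: str, healthy: Callable[[str], bool]) -> bool:
        self.running = version
        self.log.append(f"deployed {version}")
        if healthy(version):
            self.known_good = version
            return True
        # Automated rollback: redeploy the previous artifact, no human decision needed.
        self.running = self.known_good
        self.log.append(f"rolled back to {self.known_good}")
        return False
```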

Repository Readiness

See Agentic CD Glossary.

S

Skill (Agent)

See Agentic CD Glossary.

Soft Dependency

A dependency that can be worked around or deferred. Unlike hard dependencies, soft dependencies do not block work but may influence sequencing or design decisions. Feature flags can turn many hard dependencies into soft dependencies by allowing incomplete integrations to be deployed in a disabled state.

Specification Engineering

See Agentic CD Glossary.

Story Points

A relative estimation unit used by some teams to forecast effort. Story points are frequently misused as a productivity metric, which creates perverse incentives to inflate estimates and discourages the small work decomposition that CD requires. If your organization uses story points as a velocity target, see Metrics-Driven Improvement.

Referenced in: Leadership Sees CD as a Technical Nice-to-Have, Some Developers Are Overloaded While Others Wait for Work, Team Burnout and Unsustainable Pace, Velocity as Individual Metric

Sub-agent

See Agentic CD Glossary.

Subdomain Product Team

A team that owns a bounded subdomain within a larger distributed system - full-stack within their service (API, business logic, data store) but not directly user-facing. Their public interface is designed for machines: other services or teams consume it through a defined API contract. A vertical slice for a subdomain product team delivers one observable behavior through that contract. The slice is done when the API satisfies the agreed behavior for its service consumers. Contrast with full-stack product team.

Referenced in: Horizontal Slicing, Small Batches, Work Decomposition

System Prompt

See Agentic CD Glossary.

T

TBD (Trunk-Based Development)

A source-control branching model where all developers integrate to a single shared branch (trunk) at least once per day. Short-lived feature branches (lasting less than a day) are acceptable; long-lived feature branches are not. TBD is a prerequisite for CI, which is in turn a prerequisite for CD. See Trunk-Based Development.

Referenced in: Build Automation, CD Dependency Tree, CD for Greenfield Projects, Change & Complexity Defects, DORA Recommended Practices, FAQ, Feature Flags, Integration Frequency, Long-Lived Feature Branches, Metrics-Driven Improvement, Multiple Teams, Single Deployable, Phase 1: Foundations, Process & Deployment Defects, Retrospectives, Single Team, Single Deployable, TBD Migration Guide, Team Membership Changes Constantly, The Team Resists Merging to the Main Branch, Trunk-Based Development, Work Decomposition, Work in Progress, Work Items Take Days or Weeks to Complete, Working Agreements

TDD (Test-Driven Development)

See Testing Glossary.

Referenced in: Testing Fundamentals

Token

See Agentic CD Glossary.

Tokenomics

See Agentic CD Glossary.

Tool Use

See Agentic CD Glossary.

Toil

Repetitive, manual work related to maintaining a production service that is automatable, has no lasting value, and scales linearly with service size. Examples include manual deployments, manual environment provisioning, and manual test execution. Eliminating toil is a primary benefit of building a CD pipeline.

Referenced in: AI Adoption Roadmap, Architecture Decoupling, Build Duration, CD Dependency Tree, Change Advisory Board Gates, Deployable Definition, DORA Recommended Practices, Experience Reports, Feature Flags, Lead Time, Progressive Rollout, Tightly Coupled Monolith, Your Migration Journey

U

Unplanned Work

Work that arrives outside the planned backlog - production incidents, urgent bug fixes, ad hoc requests. High levels of unplanned work indicate systemic quality or operational problems. Teams with high change failure rates generate their own unplanned work through failed deployments. Reducing unplanned work is a natural outcome of improving change failure rate through CD practices.

Referenced in: Team Burnout and Unsustainable Pace, Thin-Spread Teams

V

Virtual Service

See Testing Glossary.

Referenced in: Test Environments Take Too Long to Reset Between Runs

Value Stream Map

A visual representation of every step required to deliver a change from request to production, showing process time, wait time, and percent complete and accurate at each step. The foundational tool for Phase 0 - Assess.

Referenced in: FAQ, Phase 0: Assess

Vertical Sliced Story

A user story that delivers a thin slice of functionality across all layers of the system (UI, API, database, etc.) rather than a horizontal slice that implements one layer completely. Vertical slices are independently deployable and testable, which is essential for CD. Vertical slicing is a core technique in Work Decomposition.

Referenced in: Agent-Assisted Specification, CD Dependency Tree, CD for Greenfield Projects, Horizontal Slicing, Long-Lived Feature Branches, Monolithic Work Items, Small Batches, Small-Batch Agent Sessions, Sprint Planning Is Dominated by Dependency Negotiation, Stakeholders See Working Software Only at Release Time

W

WIP (Work in Progress)

The number of work items that have been started but not yet completed. High WIP increases lead time, reduces focus, and increases context-switching overhead. Limiting WIP is a key practice in Phase 3 - Limiting WIP.

Referenced in: Architecture Decoupling, CD Dependency Tree, Development Cycle Time, DORA Recommended Practices, Everything Started, Nothing Finished, Experience Reports, Feature Flags, Metrics-Driven Improvement, Phase 3: Optimize, Pitfalls and Metrics, Push-Based Work Assignment, Retrospectives, Retrospectives Produce No Real Change, Small Batches, Symptoms for Managers, TBD Migration Guide, Team Burnout and Unsustainable Pace, Team Membership Changes Constantly, The Team Has No Shared Agreements About How to Work, Tokenomics: Optimizing Token Usage in Agent Architecture, Work Decomposition, Work in Progress, Working Agreements
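
A WIP limit works as a pull rule: new work enters a stage only when there is capacity. A minimal sketch of a board column enforcing it (`WipLimitedColumn` is a hypothetical name):

```python
from typing import List


class WipLimitedColumn:
    """Hypothetical board column that refuses new work once its WIP limit is reached."""

    def __init__(self, name: str, limit: int) -> None:
        self.name = name
        self.limit = limit
        self.items: List[str] = []

    def pull(self, item: str) -> bool:
        if len(self.items) >= self.limit:
            return False  # finish something first instead of starting more
        self.items.append(item)
        return True

    def finish(self, item: str) -> None:
        self.items.remove(item)
```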

White Box Testing

See Testing Glossary.

Working Agreement

An explicit, documented set of team norms covering how work is defined, reviewed, tested, and deployed. Working agreements create shared expectations and reduce friction. See Working Agreements.

Referenced in: AI Tooling Slows You Down Instead of Speeding You Up, Pull Requests Sit for Days Waiting for Review, Rubber-Stamping AI-Generated Code, The Team Has No Shared Agreements About How to Work

8 - FAQ

Frequently asked questions about continuous delivery and this migration guide.

About This Guide

Why does this migration guide exist?

Many teams say they want to adopt continuous delivery but do not know where to start. The CD landscape is full of tools, frameworks, and advice, but there is no clear, sequenced path from “we deploy monthly” to “we can deploy any change at any time.” This guide provides that path.

It is built on the MinimumCD definition of continuous delivery and draws on practices from the Dojo Consortium and the DORA research. The content is organized as a phased migration journey from your current state to continuous delivery rather than as a description of what CD looks like when you are already there.

Who is this guide for?

This guide is for development teams, tech leads, and engineering managers who want to improve their software delivery practices. It is designed for teams that are currently deploying infrequently (monthly, quarterly, or less) and want to reach a state where any change can be deployed to production at any time.

You do not need to be starting from zero. If your team already has CI in place, you can begin with Phase 2: Pipeline. If you have a pipeline but deploy infrequently, start with Phase 3: Optimize. Use the Phase 0 assessment to find your starting point.

Should we adopt this guide as an organization or as a team?

Start with a single team. CD adoption works best when a team can experiment, learn, and iterate without waiting for organizational consensus. Once one team demonstrates results (shorter lead times, lower change failure rate, more frequent deployments), other teams will have a concrete example to follow.

Organizational adoption comes after team adoption, not before. The role of organizational leadership is to create the conditions for teams to succeed: stable team composition, tool funding, policy flexibility for deployment processes, and protection from pressure to cut corners on quality.

How do we use this guide for improvement?

Start with Phase 0: Assess. Map your value stream, measure your current performance, and identify your top constraints. Then work through the phases in order, focusing on one constraint at a time.

The guide is not a checklist to complete in sequence. It is a reference that helps you decide what to work on next. Some teams will spend months in Phase 1 building testing fundamentals. Others will move quickly to Phase 2 because they already have strong development practices. Your value stream map and metrics tell you where to invest.

Revisit your assessment periodically. As you improve, new constraints will emerge. The phases give you a framework for addressing them.

Continuous Delivery Concepts

What is the difference between continuous delivery and continuous deployment?

Continuous delivery means every change to the codebase is always in a deployable state and can be released to production at any time through a fully automated pipeline. The decision to deploy may still be made by a human, but the capability to deploy is always present.

Continuous deployment is an extension of continuous delivery where every change that passes the automated pipeline is deployed to production without manual intervention.

This migration guide takes you through continuous delivery (Phases 0-3) and then to continuous deployment (Phase 4). Continuous delivery is the prerequisite. You cannot safely automate deployment decisions until your pipeline reliably determines what is deployable.
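The distinction can be expressed as a single decision. The following is a minimal sketch. `Mode` and `should_deploy` are hypothetical names and are not features of any pipeline tool.

```python
from enum import Enum


class Mode(Enum):
    CONTINUOUS_DELIVERY = "delivery"      # a person decides when a green build ships
    CONTINUOUS_DEPLOYMENT = "deployment"  # every green build ships automatically


def should_deploy(pipeline_green: bool, mode: Mode, human_approved: bool = False) -> bool:
    """Return True if this build should go to production now."""
    if not pipeline_green:
        # In both models, only builds the pipeline has marked deployable may ship.
        return False
    if mode is Mode.CONTINUOUS_DEPLOYMENT:
        return True
    # Continuous delivery: the capability is always there; the decision is human.
    return human_approved
```

The pipeline check is identical in both modes. Continuous deployment removes only the final human decision, which is why a reliable pipeline has to come first.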

Is continuous delivery the same as having a CD pipeline?

No. Many teams have a CD pipeline tool (Jenkins, GitHub Actions, GitLab CI, etc.) but are not practicing continuous delivery. A pipeline tool is necessary but not sufficient. Continuous delivery also requires trunk-based development, comprehensive test automation, a single path to production, immutable artifacts, and the ability to deploy any green build.

If your team has a pipeline but uses long-lived feature branches, deploys only at the end of a sprint, or requires manual testing before a release, you have a pipeline tool but you are not practicing continuous delivery. The current-state checklist in Phase 0 helps you assess the gap.

What does “the pipeline is the only path to production” mean?

It means there is exactly one way for any change to reach production: through the automated pipeline. No one can SSH into a server and make a change. No one can skip the test suite for an “urgent” fix. No one can deploy from their local machine.

This constraint is what gives you confidence. If every change in production has been through the same build, test, and deployment process, you know what is running and how it got there. If exceptions are allowed, you lose that guarantee, and your ability to reason about production state degrades.

During your migration, establishing this single path is a key milestone in Phase 2.

What does “application configuration” mean in the context of CD?

Application configuration refers to values that change between environments but are not part of the application code: database connection strings, API endpoints, feature flag states, logging levels, and similar settings.

In a CD pipeline, configuration is externalized. It lives outside the artifact and is injected at deployment time. This is what makes immutable artifacts possible. You build the artifact once and deploy it to any environment by providing the appropriate configuration.

If configuration is embedded in the artifact (for example, hardcoded URLs or environment-specific config files baked into a container image), you must rebuild the artifact for each environment, which means the artifact you tested is not the artifact you deploy. This breaks the immutability guarantee. See Application Config.
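The following is a minimal Python sketch of externalized configuration. It assumes the deployment injects values as environment variables, which is one common mechanism; mounted config files or a configuration service follow the same pattern. The variable names `DATABASE_URL`, `PAYMENTS_API_URL`, and `LOG_LEVEL` are hypothetical.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class AppConfig:
    database_url: str
    payments_api_url: str
    log_level: str


def load_config(env: Mapping[str, str] = os.environ) -> AppConfig:
    """Read settings when the process starts, not when the artifact is built.

    The same artifact receives different values in each environment because
    each deployment injects a different set of variables.
    """
    return AppConfig(
        database_url=env["DATABASE_URL"],        # required: a KeyError fails fast at startup
        payments_api_url=env["PAYMENTS_API_URL"],
        log_level=env.get("LOG_LEVEL", "INFO"),  # optional, with a default
    )


# One artifact, two environments, two injected configurations.
staging = load_config({"DATABASE_URL": "postgres://staging-db/app",
                       "PAYMENTS_API_URL": "https://payments.staging.example"})
production = load_config({"DATABASE_URL": "postgres://prod-db/app",
                          "PAYMENTS_API_URL": "https://payments.example",
                          "LOG_LEVEL": "WARNING"})
```

Nothing environment-specific appears in the code, so the build never has to change when a new environment is added.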

What is an “immutable artifact” and why does it matter?

An immutable artifact is a build output (container image, binary, package) that is never modified after it is created. The exact artifact that passes your test suite is the exact artifact that is deployed to staging, and then to production. Nothing is recompiled, repackaged, or patched between environments.

This matters because it eliminates an entire category of deployment failures: “it worked in staging but not in production” caused by differences in the build. If the same bytes are deployed everywhere, build-related discrepancies are impossible.

Immutability requires externalizing configuration (see above) and storing artifacts in a registry or repository. See Immutable Artifacts.
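One way to enforce "build once, deploy the same bytes" is to record the artifact's content digest when it passes the pipeline and then verify that digest before every deployment. The following is a minimal Python sketch that assumes the artifact is a local file. Container registries provide the same guarantee when you deploy an image by digest instead of by a mutable tag. The function names are hypothetical.

```python
import hashlib
from pathlib import Path


def artifact_digest(path: Path) -> str:
    """Return the SHA-256 digest of an artifact, reading it in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return "sha256:" + h.hexdigest()


def verify_before_deploy(path: Path, tested_digest: str) -> None:
    """Refuse to deploy anything other than the exact bytes that passed the tests."""
    actual = artifact_digest(path)
    if actual != tested_digest:
        raise RuntimeError(f"artifact changed since testing: {actual} != {tested_digest}")
```

The pipeline stores the digest with the test results. Each environment's deployment step calls `verify_before_deploy`, so a rebuilt or patched artifact is rejected instead of deployed.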

What does “deployable” mean?

A change is deployable when it has passed all automated quality gates defined in the pipeline. The definition is codified in the pipeline itself, not decided by a person at deployment time.

A typical deployable definition includes:

  • All unit tests pass
  • All integration tests pass
  • All acceptance tests pass
  • Static analysis checks pass (linting, security scanning)
  • The artifact is built and stored in the artifact registry
  • Deployment to a production-like environment succeeds
  • Smoke tests in the production-like environment pass

If any of these gates fail, the change is not deployable. The pipeline makes this determination automatically and consistently. See Deployable Definition.
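Codifying the definition means the pipeline, not a person, evaluates it. The following is a minimal Python sketch that assumes each gate reduces to a pass/fail check. `Gate` and `first_failing_gate` are hypothetical names, and the lambdas stand in for real test runs and scans. The sketch stops at the first failing gate because one failure is enough to make the change not deployable.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Gate:
    name: str
    check: Callable[[], bool]  # returns True when the gate passes


def first_failing_gate(gates: Sequence[Gate]) -> Optional[str]:
    """Run gates in order and return the first failure, or None if all pass.

    None means the change is deployable; any gate name means it is not.
    """
    for gate in gates:
        if not gate.check():
            return gate.name  # later gates cannot make the change deployable
    return None


# Hypothetical results standing in for real pipeline stages.
gates = [
    Gate("unit tests", lambda: True),
    Gate("integration tests", lambda: True),
    Gate("static analysis", lambda: False),
    Gate("smoke tests in production-like environment", lambda: True),
]
print(first_failing_gate(gates))  # static analysis
```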

What is the difference between deployment and release?

Deployment is the act of putting code into a production environment.

Release is the act of making functionality available to users.

These are different events, and decoupling them is one of the most powerful techniques in CD. You can deploy code to production without releasing it to users by using feature flags. The code is running in production, but the new functionality is disabled. When you are ready, you enable the flag and the feature is released.

This decoupling is important because it separates the technical risk (will the deployment succeed?) from the business risk (will users like the feature?). You can manage each risk independently. Deployments become routine technical events. Releases become deliberate business decisions.
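The following is a minimal sketch of how a feature flag separates the two events. `FlagStore` is a hypothetical in-memory stand-in for whatever flag service or configuration source a team actually uses, and the flag name `new-checkout` is illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class FlagStore:
    """In-memory stand-in for a feature-flag service (hypothetical)."""
    enabled: set[str] = field(default_factory=set)

    def is_enabled(self, flag: str) -> bool:
        return flag in self.enabled


def checkout_page(flags: FlagStore) -> str:
    # Both code paths are deployed; the flag decides which one users see.
    if flags.is_enabled("new-checkout"):
        return "new checkout"
    return "legacy checkout"


flags = FlagStore()
print(checkout_page(flags))  # deployed but not released: legacy checkout
flags.enabled.add("new-checkout")
print(checkout_page(flags))  # released by turning on the flag: new checkout
```

Releasing the feature, or withdrawing it, becomes a flag change rather than a new deployment.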

Migration Questions

How long does the migration take?

It depends on where you start and how much organizational support you have. As a rough guide:

  • Phase 0 (Assess): 1-2 weeks
  • Phase 1 (Foundations): 1-6 months, depending on current testing and TBD maturity
  • Phase 2 (Pipeline): 1-3 months
  • Phase 3 (Optimize): 2-6 months
  • Phase 4 (Deliver on Demand): 1-3 months

These ranges assume a single team working on the migration alongside regular delivery work. The biggest variable is Phase 1: teams with no test automation or TBD practice will spend longer building foundations than teams that already have these in place.

Do not treat these timelines as commitments. The migration is an iterative improvement process, not a project with a deadline.

Do we stop delivering features during the migration?

No. The migration is done alongside regular delivery work, not instead of it. Each migration practice is adopted incrementally: you do not stop the world to rewrite your test suite or redesign your pipeline.

For example, in Phase 1 you adopt trunk-based development by reducing branch lifetimes gradually: from two weeks to one week to two days to same-day. You add automated tests incrementally, starting with the highest-risk code paths. You decompose work into smaller stories one sprint at a time.
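Gradual reduction is easier to sustain when the team can see which branches exceed the current target. The following is a minimal sketch that assumes branch start times are available from your version control system. `overdue_branches`, the branch names, and the 24-hour interpretation of "same-day" are illustrative assumptions.

```python
from datetime import datetime, timedelta, timezone


def overdue_branches(created: dict[str, datetime], max_age: timedelta,
                     now: datetime) -> list[str]:
    """Return the names of branches older than the current lifetime target."""
    return sorted(name for name, started in created.items() if now - started > max_age)


# Targets tighten step by step; "same-day" is approximated here as 24 hours.
TARGETS = [timedelta(weeks=2), timedelta(weeks=1), timedelta(days=2), timedelta(hours=24)]

now = datetime(2024, 5, 10, 12, 0, tzinfo=timezone.utc)
branches = {  # hypothetical branch names and start times
    "feature/search-filters": now - timedelta(days=3),
    "fix/login-timeout": now - timedelta(hours=6),
}
for target in TARGETS:
    print(target, overdue_branches(branches, target, now))
```

A three-day-old branch passes the two-week and one-week targets but is flagged once the team moves to two days. This flag is the signal to break that work into smaller pieces.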

The migration practices themselves improve your delivery speed, so the investment pays off as you go. Teams that have completed Phase 1 typically report delivering features faster than before, not slower.

What if our organization requires manual change approval (CAB)?

Many organizations have Change Advisory Board (CAB) processes that require manual approval before production deployments. This is one of the most common organizational blockers for CD.

The path forward is to replace the manual approval with automated evidence: a mature CD pipeline provides stronger safety guarantees than a committee meeting, and your DORA metrics can demonstrate this. Most CAB processes were designed for monthly releases with hundreds of changes per batch; when you deploy daily with one or two changes, the risk profile is fundamentally different. See CAB Gates for a detailed approach to this transition.

What if we have a monolithic architecture?

You can practice continuous delivery with a monolith. CD does not require microservices. Many of the highest-performing teams in the DORA research deploy monolithic applications multiple times per day.

What matters is that your architecture supports independent testing and deployment. A well-structured monolith with a comprehensive test suite and a reliable pipeline can achieve CD. A poorly structured collection of microservices with shared databases and coordinated releases cannot.

Architecture decoupling is addressed in Phase 3, but it is about enabling independent deployment and reducing coordination costs, not about adopting any particular architectural style.

What if our tests are slow or unreliable?

This is one of the most common starting conditions. A slow or flaky test suite undermines every CD practice: developers stop trusting the tests, broken builds are ignored, and the pipeline becomes a bottleneck rather than an enabler. The fix is incremental: quarantine flaky tests, parallelize execution, rebalance toward fast unit tests, and set a pipeline time budget (under 10 minutes). See Testing Fundamentals and the Testing reference section for detailed guidance.
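Quarantining starts with identifying which tests are flaky. The following is a minimal sketch of one common detection approach: rerun a test several times against unchanged code and check whether the outcomes disagree. `classify_test` and the default of 10 attempts are assumptions. Many test runners offer rerun plugins that serve a similar purpose.

```python
import itertools
from typing import Callable


def classify_test(run_test: Callable[[], bool], attempts: int = 10) -> str:
    """Rerun one test against unchanged code and classify the outcomes."""
    outcomes = {run_test() for _ in range(attempts)}
    if outcomes == {True}:
        return "stable-pass"
    if outcomes == {False}:
        return "stable-fail"  # a consistent failure, not flakiness
    return "flaky"  # passes and fails on the same code: quarantine and fix


# Deterministic stand-in for a test that fails one run in three.
results = itertools.cycle([True, True, False])
print(classify_test(lambda: next(results)))  # flaky
```

Rerunning reveals only the flakiness that appears within the attempts, so a "stable-pass" result is evidence rather than proof.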

Where do I start if I am not sure which phase applies to us?

Start with Phase 0: Assess. Complete the value stream mapping exercise, take baseline metrics, and fill out the current-state checklist. These activities will tell you exactly where you stand and which phase to begin with.

If you do not have time for a full assessment, ask yourself these questions:

  • Do all developers integrate to trunk at least daily? If no, start with Phase 1.
  • Do you have a single automated pipeline that every change goes through? If no, start with Phase 2.
  • Can you deploy any green build to production on demand? If no, focus on the gap between your current state and Phase 2 completion criteria.
  • Do you deploy at least weekly? If no, look at Phase 3 for batch size and flow optimization.
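The quick check can be read as an ordered decision, where the first "no" answer determines the focus. The following is a minimal sketch under that interpretation. `starting_point` is a hypothetical name, and returning `None` when every answer is yes is an assumption, because the questions above do not address that case.

```python
from typing import Optional


def starting_point(integrates_daily: bool, single_pipeline: bool,
                   can_deploy_any_green_build: bool,
                   deploys_weekly: bool) -> Optional[str]:
    """Walk the quick-check questions in order and return the first gap found."""
    if not integrates_daily:
        return "Phase 1"
    if not single_pipeline:
        return "Phase 2"
    if not can_deploy_any_green_build:
        return "Phase 2 completion criteria"
    if not deploys_weekly:
        return "Phase 3: batch size and flow"
    return None  # the quick check finds no gap
```

A team that integrates daily and has one pipeline but cannot yet deploy any green build would get "Phase 2 completion criteria", regardless of how often it deploys today.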

Is CD about speed or quality?

Quality. The purpose of the pipeline is to validate that an artifact is production-worthy or reject it. Do not chase daily deployments without first building confidence in your ability to detect failure. Move validation as close to the developer as possible: run it on the desktop, run it again on merge to trunk, run it again when the trunk changes.

Testing is not limited to component tests. You need to test for security, compliance, performance, and everything else required in your context. Set error budgets and do not exceed them. When your error budget is spent, stop shipping features and invest in pipeline hardening. When something breaks in production, harden the pipeline. When exploratory testing uncovers an edge case, harden the pipeline. The primary goal is to build efficient and effective quality gates. Only then can you move quickly.
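Error budgets are commonly derived from a service level objective (SLO): the budget is the share of a time window during which the service may fail to meet its target, calculated as (1 - SLO) × window. The following is a minimal sketch that assumes a time-based availability SLO over a 30-day window. Request-based SLOs count failed requests instead. The function names are hypothetical.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of unavailability an availability SLO allows in the window."""
    return (1 - slo) * window_days * 24 * 60


def can_ship_features(slo: float, bad_minutes: float, window_days: int = 30) -> bool:
    """True while budget remains; once it is spent, invest in pipeline hardening."""
    return bad_minutes < error_budget_minutes(slo, window_days)


budget = error_budget_minutes(0.999)  # a 99.9% target over 30 days
print(round(budget, 1))  # 43.2
print(can_ship_features(0.999, bad_minutes=50))  # False: budget exhausted
```

At 99.9%, the team has about 43 minutes of unavailability per 30 days. Fifty minutes of outages would exhaust the budget and shift the team's effort from features to hardening.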

9 - Resources

Books, videos, and further reading on continuous delivery and deployment.

This page collects the books, websites, and videos that inform the practices in this migration guide. Resources are organized by topic and annotated with which migration phase they are most relevant to.

Books

Continuous Delivery and Deployment

Modern Software Engineering by Dave Farley
Farley’s broader take on what it means to do software engineering well. Covers the principles behind CD - iterating toward a goal, getting fast feedback, working in small steps - and connects them to test-driven development, managing complexity, and designing for testability. Useful for teams that want to understand the why behind CD practices, not just the how.
Most relevant to: All phases
Continuous Delivery Pipelines by Dave Farley
A practical, focused guide to building CD pipelines. Farley covers pipeline design, testing strategies, and deployment patterns in a direct, implementation-oriented style. Start here if you want a concise guide to the pipeline practices in Phase 2.
Most relevant to: Phase 2: Pipeline
Continuous Delivery by Jez Humble and Dave Farley
The foundational text on CD. Published in 2010, it remains the most comprehensive treatment of the principles and practices that make continuous delivery work. Covers version control patterns, build automation, testing strategies, deployment pipelines, and release management. If you read one book before starting your migration, read this one.
Most relevant to: All phases
Accelerate by Nicole Forsgren, Jez Humble, and Gene Kim
Presents the DORA research findings that link technical practices to organizational performance. Covers the four key metrics (deployment frequency, lead time, change failure rate, MTTR) and the capabilities that predict high performance. Essential reading for anyone who needs to make the business case for a CD migration.
Most relevant to: Phase 0: Assess and Phase 3: Metrics-Driven Improvement
Engineering the Digital Transformation by Gary Gruver
Addresses the organizational and leadership challenges of large-scale delivery transformation. Gruver draws on his experience leading transformations at HP and other large enterprises. Particularly valuable for leaders sponsoring a migration who need to understand the change management, communication, and sequencing challenges ahead.
Most relevant to: Organizational leadership across all phases
Release It! by Michael T. Nygard
Covers the design and architecture patterns that make production systems resilient. Topics include stability patterns (circuit breakers, bulkheads, timeouts), deployment patterns, and the operational realities of running software at scale. Essential reading before entering Phase 4, where the team has the capability to deploy any change on demand.
Most relevant to: Phase 4: Deliver on Demand and Phase 2: Rollback
The DevOps Handbook by Gene Kim, Jez Humble, Patrick Debois, and John Willis
A practical companion to The Phoenix Project. Covers the Three Ways (flow, feedback, and continuous learning) and provides detailed guidance on implementing DevOps practices. Useful as a reference throughout the migration.
Most relevant to: All phases
The Phoenix Project by Gene Kim, Kevin Behr, and George Spafford
A novel that illustrates DevOps principles through the story of a fictional IT organization in crisis. Useful for building organizational understanding of why delivery improvement matters, especially for stakeholders who will not read a technical book.
Most relevant to: Building organizational buy-in during Phase 0

Testing

Growing Object-Oriented Software, Guided by Tests by Steve Freeman and Nat Pryce
The definitive guide to test-driven development in practice. Goes beyond unit testing to cover acceptance testing, test doubles, and how TDD drives design. Essential reading for Phase 1 testing fundamentals.
Most relevant to: Phase 1: Testing Fundamentals
Working Effectively with Legacy Code by Michael Feathers
Practical techniques for adding tests to untested code, breaking dependencies, and incrementally improving code that was not designed for testability. Indispensable if your migration starts with a codebase that has little or no automated testing.
Most relevant to: Phase 1: Testing Fundamentals

Work Decomposition and Flow

User Story Mapping by Jeff Patton
A practical guide to breaking features into deliverable increments using story maps. Patton’s approach directly supports the vertical slicing discipline required for small batch delivery.
Most relevant to: Phase 1: Work Decomposition
The Principles of Product Development Flow by Donald Reinertsen
A rigorous treatment of flow economics in product development. Covers queueing theory, batch size economics, WIP limits, and the cost of delay. Dense but transformative. Reading this book will change how you think about every aspect of your delivery process.
Most relevant to: Phase 3: Optimize
Making Work Visible by Dominica DeGrandis
Focuses on identifying and eliminating the “time thieves” that steal productivity: too much WIP, unknown dependencies, unplanned work, conflicting priorities, and neglected work. A practical companion to the WIP limiting practices in Phase 3.
Most relevant to: Phase 3: Limiting WIP

Databases

Refactoring Databases: Evolutionary Database Design by Scott Ambler and Pramod Sadalage
The definitive guide to managing database schema changes incrementally. Covers expand-contract migrations, backward-compatible schema changes, and techniques for evolving databases without downtime. Essential reading for teams whose deployment pipeline includes database changes.
Most relevant to: Phase 2: Pipeline and Phase 3: Small Batches

Architecture

Building Microservices by Sam Newman
Covers the architectural patterns that enable independent deployment, including service boundaries, API design, data management, and testing strategies for distributed systems.
Most relevant to: Phase 3: Architecture Decoupling
Team Topologies by Matthew Skelton and Manuel Pais
Addresses the relationship between team structure and software architecture (Conway’s Law in practice). Covers team types, interaction modes, and how to evolve team structures to support fast flow. Valuable for addressing the organizational blockers that surface throughout the migration.
Most relevant to: Organizational design across all phases

Websites

MinimumCD.org
Defines the minimum set of practices required to claim you are doing continuous delivery. This migration guide uses the MinimumCD definition as its target state. Start here to understand what CD actually requires.
Dojo Consortium
A community-maintained collection of CD practices, metrics definitions, and improvement patterns. Many of the definitions and frameworks in this guide are adapted from the Dojo Consortium’s work.
DORA (dora.dev)
The DevOps Research and Assessment site, which publishes the annual State of DevOps report and provides resources for measuring and improving delivery performance.
Trunk-Based Development
The comprehensive reference for trunk-based development patterns. Covers short-lived feature branches, feature flags, branch by abstraction, and release branching strategies.
Martin Fowler’s blog (martinfowler.com)
Martin Fowler’s site contains authoritative articles on continuous integration, continuous delivery, microservices, refactoring, and software design. Key articles include “Continuous Integration” and “Continuous Delivery.”
Google Cloud Architecture Center: DevOps
Google’s public documentation of the DORA capabilities, including self-assessment tools and implementation guidance.

Videos

“Modern Software Engineering” by Dave Farley (YouTube channel)
Dave Farley’s YouTube channel provides weekly videos covering CD practices, pipeline design, testing strategies, and software engineering principles. Accessible and practical.
Most relevant to: All phases
“Continuous Delivery” by Jez Humble (various conference talks)
Jez Humble’s conference presentations cover the principles and research behind CD. His talk “Why Continuous Delivery?” is an excellent introduction for teams and stakeholders who are new to the concept.
Most relevant to: Building understanding during Phase 0
“Refactoring” and “TDD” talks by Martin Fowler and Kent Beck
Foundational talks on the development practices that support CD. Understanding TDD and refactoring is essential for Phase 1 testing fundamentals.
Most relevant to: Phase 1: Foundations
“The Smallest Thing That Could Possibly Work” by Bryan Finster
Covers the work decomposition and small batch delivery practices that are central to this migration guide. Focuses on practical techniques for breaking work into vertical slices.
Most relevant to: Phase 1: Work Decomposition and Phase 3: Small Batches
“Real Example of a Deployment Pipeline in the Fintech Industry” by Dave Farley
A concrete walkthrough of a production deployment pipeline in a regulated financial services environment. Demonstrates that CD practices are compatible with compliance requirements.
Most relevant to: Phase 2: Pipeline

Blog Posts and Articles

Continuous Integration Certification by Martin Fowler
A short, practical test for whether your team is actually practicing continuous integration. Useful as a self-assessment during Phase 1.
Most relevant to: Phase 1: Foundations
Continuous Delivery: Anatomy of the Deployment Pipeline by Dave Farley
An article-length overview of deployment pipeline structure, covering commit stage, acceptance testing, and release stages. A good companion to the pipeline phase of this guide.
Most relevant to: Phase 2: Pipeline

Suggested Reading Order

If you are starting your migration and want to read in the most useful order:

  1. Accelerate, to understand the research and build the business case
  2. Continuous Delivery (Humble & Farley), to understand the full picture
  3. Continuous Delivery Pipelines (Farley), for practical pipeline implementation
  4. Working Effectively with Legacy Code, if your codebase lacks tests
  5. The Principles of Product Development Flow, to understand flow optimization
  6. Release It!, before moving to continuous deployment