Reference
Practice definitions, metrics, glossary, and other reference material.
Look up definitions, check metrics, or find resources for deeper reading.
Sections
1 - Pipeline Reference Architecture
Pipeline reference architectures for single-team, multi-team, and distributed service delivery, with quality gates sequenced by defect detection priority.
This section defines quality gates sequenced by defect detection priority and three
pipeline patterns that apply them. Quality gates are derived from the
Systemic Defect Fixes catalog and sequenced so the cheapest, fastest
checks run first.
Gates marked with [Pre-Feature] must be in place and passing before any new feature
work begins. They form the baseline safety net that every commit runs through. Adding
features without these gates means defects accumulate faster than the team can detect them.
Gates marked with ▲ are enhanced by AI - the AI shifts
detection earlier or catches issues that rule-based tools miss. See the
Systemic Defect Fixes catalog for details.
Quality Gates in Priority Sequence
The gate sequence follows a single principle: fail fast, fail cheap. Gates that catch
the most common defects with the least execution time run first. Each gate listed below
maps to one or more defect sources from the catalog.
Pre-commit Gates
These run on the developer’s machine before code leaves the workstation. They provide
sub-second to sub-minute feedback.
| Gate | Defect Sources Addressed | Catalog Section | Pre-Feature |
|---|---|---|---|
| Linting and formatting | Code style consistency, preventable review noise | Process & Deployment | Required |
| Static type checking | Null/missing data assumptions, type mismatches | Data & State | Required |
| Secret scanning | Secrets committed to source control | Security & Compliance | Required |
| SAST (injection patterns) | Injection vulnerabilities, taint analysis | Security & Compliance | Required |
| Race condition detection | Race conditions (via thread sanitizers, where the language supports them) | Integration & Boundaries | |
| Accessibility linting | Missing alt text, ARIA violations, contrast failures | Product & Discovery | |
| Solitary and sociable unit tests | Logic errors, unintended side effects, edge cases | Change & Complexity | Required |
| Contract tests | Interface mismatches, wrong assumptions about external system boundaries | Integration & Boundaries | Required |
| Timeout enforcement checks | Missing timeout and deadline enforcement | Performance & Resilience | |
| ▲ AI semantic code review | Logic errors, missing edge cases, subtle injection vectors beyond pattern matching | Process & Deployment, Security & Compliance | |
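These gates can be chained into a single hook that stops at the first failure, so the cheapest check is the first one a developer sees. A minimal sketch in Python; the tool names (ruff, mypy, gitleaks, semgrep, pytest) are illustrative stand-ins for whatever your stack uses:

```python
#!/usr/bin/env python3
"""Pre-commit hook sketch: run quality gates in fail-fast order.

Tool names are illustrative placeholders -- substitute your stack's
linter, type checker, secret scanner, and SAST tool.
"""
import subprocess
import sys

# Cheapest, fastest gates first: a lint failure should never have to
# wait for a SAST scan or the unit test suite to finish.
GATES = [
    ("lint + format", ["ruff", "check", "."]),
    ("static types", ["mypy", "."]),
    ("secret scan", ["gitleaks", "protect", "--staged"]),
    ("SAST (injection)", ["semgrep", "--config", "auto", "--error"]),
    ("unit tests", ["pytest", "-q", "--maxfail=1"]),
]

def main() -> int:
    for name, cmd in GATES:
        result = subprocess.run(cmd)
        if result.returncode != 0:
            print(f"gate failed: {name} -- commit blocked", file=sys.stderr)
            return result.returncode  # fail fast: skip remaining gates
    return 0

if __name__ == "__main__":
    sys.exit(main())
```

The same ordering principle carries into CI: when the hook is bypassed locally, Stage 1 re-runs these gates in the same cheap-first sequence.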
CI Stage 1: Build and Fast Tests < 5 min
These run on every commit to trunk.
| Gate | Defect Sources Addressed | Catalog Section | Pre-Feature |
|---|---|---|---|
| All pre-commit gates | Re-run in CI to catch anything bypassed locally | See Pre-commit Gates | Required |
| Compilation / build | Build reproducibility, dependency resolution | Dependency & Infrastructure | Required |
| Dependency vulnerability scan (SCA) | Known vulnerabilities in dependencies | Security & Compliance | Required |
| License compliance scan | License compliance violations | Security & Compliance | |
| Code complexity and duplication scoring | Accumulated technical debt | Change & Complexity | |
| ▲ AI change impact analysis | Semantic blast radius of changes; unintended side effects beyond syntactic dependencies | Change & Complexity | |
| ▲ AI vulnerability reachability analysis | Correlate CVEs with actual code usage paths to prioritize exploitable risks over theoretical ones | Security & Compliance | |
| Stage duration warning | Warn if Stage 1 exceeds 10 minutes; slow fast-feedback loops mask defects and delay trunk integration | Process & Deployment | |
CD Stage 1: Contract and Boundary Validation < 10 min
These validate boundaries between components.
CD Stage 2: Broader Automated Verification < 15 min
These run in parallel where possible.
Acceptance Tests < 20 min
These validate user-facing behavior in a production-like environment.
Out-of-Pipeline Verification
The following checks are non-deterministic - they depend on live environments, external
systems, or real user behavior - and cannot be made into blocking pipeline gates without
coupling your ability to deploy to factors outside your control. They run asynchronously
or post-deployment and back up the deterministic pipeline with a continuous safety net.
Failures trigger review, alerts, or rollback decisions. They never block a commit from
reaching production.
Integration Tests (Post-Deploy)
Integration tests validate that the
test doubles used in
contract tests still match the real services
they simulate. They are non-deterministic because they exercise real service boundaries
and their results depend on the current state of those services. They run on a schedule
or post-deployment - not on every commit - and failures trigger review, not a
pipeline block.
| Check | Defect Sources Addressed | Catalog Section | Pre-Feature |
|---|---|---|---|
| Provider verification | Interface drift between contract test doubles and real services | Integration & Boundaries | Required |
| Cross-service integration validation | Breaking changes at real service boundaries | Integration & Boundaries | Required |
| ▲ AI boundary coverage analysis | Integration boundaries missing contract tests; semantic service relationship mapping | Testing & Observability Gaps | |
| ▲ AI behavioral assumption detection | Undocumented assumptions at service boundaries that contract tests don’t cover | Integration & Boundaries | |
Production Verification
These run during and after deployment. They are not optional - they close the feedback loop.
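As an illustration only, a post-deploy verification loop might look like the sketch below: poll a health endpoint and an error-rate metric, and trigger rollback after sustained failure. The endpoints, thresholds, and rollback hook are all assumptions, not a prescribed interface:

```python
"""Post-deploy verification sketch: poll health and SLO signals,
roll back on sustained failure. Endpoint paths, thresholds, and the
rollback command are illustrative assumptions.
"""
import json
import subprocess
import time
import urllib.request

HEALTH_URL = "http://canary.internal/healthz"           # hypothetical
ERROR_RATE_URL = "http://metrics.internal/error_rate"   # hypothetical
ERROR_BUDGET = 0.01   # 1% error-rate threshold (assumed SLO)
CHECKS, INTERVAL_S = 30, 10

def fetch_error_rate() -> float:
    with urllib.request.urlopen(ERROR_RATE_URL, timeout=5) as resp:
        return float(json.load(resp)["error_rate"])

def healthy() -> bool:
    try:
        with urllib.request.urlopen(HEALTH_URL, timeout=5) as resp:
            return resp.status == 200 and fetch_error_rate() < ERROR_BUDGET
    except OSError:
        return False  # network failure counts as an unhealthy check

def main() -> None:
    failures = 0
    for _ in range(CHECKS):
        failures = failures + 1 if not healthy() else 0
        if failures >= 3:  # three consecutive bad checks => roll back
            subprocess.run(["./rollback.sh"], check=True)  # your hook
            raise SystemExit("rollback triggered")
        time.sleep(INTERVAL_S)
    print("deployment verified")

if __name__ == "__main__":
    main()
```

Note that the loop decides about the deployment, never about the commit: a failure here rolls back and opens a review, it does not block the pipeline.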
Pre-Feature Baseline
These gates must be active before starting feature work.
Without these gates passing on every commit to trunk, defects accumulate faster than the
team can detect them. If any are missing, add them before writing new features. The
Foundations phase covers how to establish
this baseline.
- Linting and formatting
- Static type checking
- Secret scanning
- SAST for injection patterns
- Compilation / build
- Solitary and sociable unit tests
- Contract tests at every integration boundary
- Dependency vulnerability scan
- Schema migration validation
Pipeline Patterns
These three patterns apply the quality gates above to progressively more complex team
and deployment topologies. Most organizations start with Pattern 1 and evolve toward
Pattern 3 as team count and deployment independence requirements grow.
- Single Team, Single Deployable - one team owns one
modular monolith with a linear pipeline
- Multiple Teams, Single Deployable - multiple teams own
sub-domain modules within a shared modular monolith, each with its own sub-pipeline
feeding a thin integration pipeline
- Independent Teams, Independent Deployables - each team
owns an independently deployable service with its own full pipeline and API contract
verification
Mapping to the Defect Sources Catalog
Each quality gate above is derived from the Systemic Defect Fixes
catalog. The catalog organizes defects by origin - product and discovery, integration,
knowledge, change and complexity, testing gaps, process, data, dependencies, security, and
performance. The pipeline gates are the automated enforcement points for the systemic
prevention strategies described in the catalog.
Gates marked with ▲ correspond to catalog entries where AI
shifts detection earlier than current rule-based automation. For expert agent patterns that
implement these gates in an agentic CD context, see
ACD Pipeline Enforcement.
When adding or removing gates, consult the catalog to ensure that no defect category loses
its detection point. A gate that seems redundant may be the only automated check for a
specific defect source.
Further Reading
For a deeper treatment of pipeline design, stage sequencing, and deployment strategies, see
Dave Farley’s
Continuous Delivery Pipelines, which covers pipeline
architecture patterns in detail.
1.1 - Single Team, Single Deployable
A linear pipeline pattern for a single team owning a modular monolith.
This architecture suits a team of up to 8-10 people owning a
modular monolith - a single deployable
application with well-defined internal module boundaries. The codebase is organized by
domain, not by technical layer. Each module encapsulates its own data, logic, and
interfaces, communicating with other modules through explicit internal APIs. The
application deploys as one unit, but its internal structure makes it possible to reason
about, test, and change one module without understanding the entire codebase. The pipeline
is linear with parallel stages where dependencies allow.
Legend: Pre-Feature Gate · CI Stage · Parallel Verification · Acceptance · Production
```mermaid
graph TD
classDef prefeature fill:#0d7a32,stroke:#0a6128,color:#fff
classDef ci fill:#224968,stroke:#1a3a54,color:#fff
classDef parallel fill:#30648e,stroke:#224968,color:#fff
classDef accept fill:#6c757d,stroke:#565e64,color:#fff
classDef prod fill:#a63123,stroke:#8a2518,color:#fff
A["Pre-commit Gates<br/><small>Lint, Types, Secrets, SAST</small>"]:::prefeature
B["Build + Unit Tests"]:::prefeature
C["Contract + Schema Tests"]:::prefeature
D["Security Scans"]:::parallel
E["Performance Benchmarks"]:::parallel
F["Acceptance Tests<br/><small>Production-Like Env</small>"]:::accept
G["Create Immutable Artifact"]:::ci
H["Deploy Canary / Progressive"]:::prod
I["Health Checks + SLO Monitors<br/>Auto-Rollback"]:::prod
A -->|"commit to trunk"| B
B --> C
C --> D & E
D --> F
E --> F
F --> G
G --> H
H --> I
```

Key Characteristics
- One pipeline, one artifact: The entire application builds and deploys as a single
immutable artifact. There is no fan-out or fan-in.
- Linear with parallel branches: Security scans and performance benchmarks run in
parallel because neither depends on the other. Everything else is sequential.
- Trunk-based development: All developers commit to trunk at least daily. The pipeline
runs on every commit.
- Total target time: Under 15 minutes from commit to production-ready artifact.
Acceptance tests may extend this to 20 minutes for complex applications.
- Ownership: The team owns the pipeline definition, which lives in the same repository
as the application code.
When This Architecture Breaks Down
This architecture stops working when:
- The system becomes too large for a single team to manage.
- Build times exceed fast-feedback targets even after optimization, eroding the team's ability to respond quickly
- Different parts of the application need different deployment cadences
When these symptoms appear, consider splitting into the
multi-team architecture or decomposing the application into
independently deployable services with their
own pipelines.
1.2 - Multiple Teams, Single Deployable
A sub-pipeline pattern for multiple teams contributing domain modules to a shared modular monolith.
This architecture suits organizations where multiple teams contribute to a single
deployable modular monolith - a common
pattern for large applications, mobile apps, or platforms where the final artifact must
be assembled from team contributions.
The modular monolith structure is what makes multi-team ownership possible. Each team
owns a specific module representing a bounded sub-domain of the application. Team A
might own checkout and payments, Team B owns inventory and fulfillment, Team C owns
user accounts and authentication. Modules communicate through explicit internal APIs,
not by reaching into each other’s database tables or calling private methods. Each
team’s sub-pipeline validates only their module. A shared integration pipeline assembles
and verifies the combined result.
This ownership model is critical. Without clear module boundaries, teams step on each
other’s code, sub-pipelines trigger on unrelated changes, and merge conflicts replace
pipeline contention as the bottleneck. The module split must follow the application’s
domain boundaries, not its technical layers. A team that owns “the database layer” or
“the API controllers” will always be coupled to every other team. A team that owns
“payments” can change its database, API, and UI independently. If the codebase is not
yet structured as a modular monolith, restructure it before adopting this architecture -
otherwise the sub-pipelines will constantly interfere with each other.
```mermaid
graph TD
classDef prefeature fill:#0d7a32,stroke:#0a6128,color:#fff
classDef team fill:#224968,stroke:#1a3a54,color:#fff
classDef integration fill:#30648e,stroke:#224968,color:#fff
classDef prod fill:#a63123,stroke:#8a2518,color:#fff
subgraph teamA ["Payments Sub-Domain (Team A)"]
A1["Pre-commit Gates"]:::prefeature
A2["Build + Unit Tests"]:::prefeature
A3["Contract Tests"]:::prefeature
A4["Security + Perf"]:::team
A1 --> A2 --> A3 --> A4
end
subgraph teamB ["Inventory Sub-Domain (Team B)"]
B1["Pre-commit Gates"]:::prefeature
B2["Build + Unit Tests"]:::prefeature
B3["Contract Tests"]:::prefeature
B4["Security + Perf"]:::team
B1 --> B2 --> B3 --> B4
end
subgraph teamC ["Accounts Sub-Domain (Team C)"]
C1["Pre-commit Gates"]:::prefeature
C2["Build + Unit Tests"]:::prefeature
C3["Contract Tests"]:::prefeature
C4["Security + Perf"]:::team
C1 --> C2 --> C3 --> C4
end
subgraph integ ["Integration Pipeline"]
I1["Assemble Combined Artifact"]:::integration
I2["Integration Contract Tests"]:::integration
I3["Acceptance Tests<br/><small>Production-Like Env</small>"]:::integration
I4["Create Immutable Artifact"]:::integration
I1 --> I2 --> I3 --> I4
end
A4 --> I1
B4 --> I1
C4 --> I1
I4 --> D1["Deploy Canary / Progressive"]:::prod
D1 --> D2["Health Checks + SLO Monitors<br/>Auto-Rollback"]:::prod
```

Key Characteristics
- Module ownership by domain: Each team owns a bounded module of the application’s
functionality. Ownership is defined by domain, not by technical layer. The team is
responsible for all code, tests, and pipeline configuration within their module.
- Team-owned sub-pipelines: Each team runs their own pre-commit, build, unit test,
contract test, and security gates independently. A team’s sub-pipeline validates only
their module and is their fast feedback loop.
- Contract tests at both levels: Teams run contract tests in their sub-pipeline to
catch boundary issues at the module edges. The integration pipeline runs cross-module
contract tests to verify the assembled result.
- Integration pipeline is thin: The integration pipeline does not re-run each team’s
tests. It validates only what cannot be validated in isolation - cross-module
integration, the assembled artifact, and end-to-end acceptance tests.
- Sub-pipeline target time: Under 10 minutes. This is the team’s primary feedback loop
and must stay fast.
- Integration pipeline target time: Under 15 minutes. If it grows beyond this, the
integration test suite needs decomposition or the application needs architectural changes
to enable independent deployment.
- Trunk-based development with path filters: All teams commit to the same trunk.
Sub-pipelines trigger based on path filters aligned to module boundaries, so a
change to the payments module does not trigger the inventory sub-pipeline.
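Path-filtered triggering is simple to implement. A sketch, assuming a modules/&lt;name&gt;/ repository layout (the mapping would mirror your actual module boundaries, and the changed-file list would normally come from your CI system's diff):

```python
"""Sub-pipeline trigger sketch: map changed paths to the owning
module's pipeline. The module-to-path mapping is an assumed layout.
"""
import subprocess

MODULE_PATHS = {              # module boundary -> path prefix (assumed)
    "payments":  "modules/payments/",
    "inventory": "modules/inventory/",
    "accounts":  "modules/accounts/",
}

def changed_files(base: str = "HEAD~1") -> list[str]:
    out = subprocess.run(
        ["git", "diff", "--name-only", base, "HEAD"],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.splitlines()

def pipelines_to_trigger(files: list[str]) -> set[str]:
    return {
        module
        for module, prefix in MODULE_PATHS.items()
        if any(f.startswith(prefix) for f in files)
    }

if __name__ == "__main__":
    for module in sorted(pipelines_to_trigger(changed_files())):
        print(f"trigger sub-pipeline: {module}")
```

The same mapping doubles as an ownership check: a commit that touches two modules' paths needs both teams' sub-pipelines, which is itself a signal that a boundary may be leaking.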
Preventing the Integration Pipeline from Becoming a Bottleneck
The integration pipeline is a shared resource and the most likely bottleneck in this
architecture. To keep it fast:
- Move tests left into sub-pipelines: Every test that can run in a sub-pipeline should
run there. The integration pipeline should only contain tests that require the full
assembled artifact.
- Use contract tests aggressively: Contract tests in sub-pipelines catch most
integration issues without needing the full system. The integration pipeline’s contract
tests are a verification layer, not the primary detection point.
- Run the integration pipeline on every commit to trunk: Do not batch. Batching
creates large changesets that are harder to debug when they fail.
- Parallelize acceptance tests: Group acceptance tests by feature area and run groups
in parallel.
- Monitor integration pipeline duration: Set an alert if it exceeds 15 minutes. Treat
this the same as a failing test - fix it immediately.
When to Move Away from This Architecture
This architecture is a pragmatic pattern for organizations that cannot yet decompose their
monolith into independently deployable services. The long-term goal is
loose coupling -
independent services with independent pipelines that do not need a shared integration step.
Signs you are ready to decompose:
- Contract tests catch virtually all integration issues in sub-pipelines
- The integration pipeline adds little value beyond what sub-pipelines already verify
- Teams are blocked by integration pipeline queuing more than once per week
- Different parts of the application need different deployment cadences
1.3 - Independent Teams, Independent Deployables
A fully independent pipeline pattern for teams deploying their own services in any order, with API contract verification replacing integration testing.
This is the target architecture for continuous delivery at scale. Each team owns an
independently deployable service with its own pipeline, its own release cadence, and
its own path to production. No team waits for another team to deploy. No integration
pipeline serializes their work. The only shared infrastructure is the API contract
layer that defines how services communicate.
This architecture demands disciplined API management. Without it, independent deployment
is an illusion - teams deploy whenever they want, but they break each other constantly.
```mermaid
graph TD
classDef prefeature fill:#0d7a32,stroke:#0a6128,color:#fff
classDef team fill:#224968,stroke:#1a3a54,color:#fff
classDef contract fill:#30648e,stroke:#224968,color:#fff
classDef prod fill:#a63123,stroke:#8a2518,color:#fff
classDef api fill:#6c757d,stroke:#565e64,color:#fff
subgraph svcA ["Service A Pipeline (Team A)"]
A1["Pre-commit Gates"]:::prefeature
A2["Build + Unit Tests"]:::prefeature
A3["Contract<br/>Verification"]:::prefeature
A4["Security + Perf"]:::team
A5["Acceptance Tests"]:::team
A6["Create Immutable Artifact"]:::team
A1 --> A2 --> A3 --> A4 --> A5 --> A6
end
subgraph svcB ["Service B Pipeline (Team B)"]
B1["Pre-commit Gates"]:::prefeature
B2["Build + Unit Tests"]:::prefeature
B3["Contract<br/>Verification"]:::prefeature
B4["Security + Perf"]:::team
B5["Acceptance Tests"]:::team
B6["Create Immutable Artifact"]:::team
B1 --> B2 --> B3 --> B4 --> B5 --> B6
end
subgraph svcC ["Service C Pipeline (Team C)"]
C1["Pre-commit Gates"]:::prefeature
C2["Build + Unit Tests"]:::prefeature
C3["Contract<br/>Verification"]:::prefeature
C4["Security + Perf"]:::team
C5["Acceptance Tests"]:::team
C6["Create Immutable Artifact"]:::team
C1 --> C2 --> C3 --> C4 --> C5 --> C6
end
subgraph apis ["API Schema Registry"]
R1["Published API Schemas<br/><small>OpenAPI, AsyncAPI, Protobuf</small>"]:::api
R2["Backward Compatibility<br/>Checks"]:::api
R3["Consumer Pacts<br/><small>where available</small>"]:::api
R1 --- R2 --- R3
end
A3 <-..->|"verify"| R3
B3 <-..->|"verify"| R3
C3 <-..->|"verify"| R3
A6 --> A7["Deploy + Canary"]:::prod
A7 --> A8["Health + SLOs"]:::prod
B6 --> B7["Deploy + Canary"]:::prod
B7 --> B8["Health + SLOs"]:::prod
C6 --> C7["Deploy + Canary"]:::prod
C7 --> C8["Health + SLOs"]:::prod
```

Legend: Pre-Feature Gate · Team Pipeline · API Schema Registry · Production
Key Characteristics
- Fully independent deployment: Each team deploys on its own schedule. Team A can
deploy ten times a day while Team C deploys once a week. No coordination is required.
- No shared integration pipeline: There is no fan-in step. Each pipeline goes
straight from artifact creation to production. This eliminates the integration bottleneck
entirely.
- Contract tests replace integration tests: Instead of testing all services together,
each team verifies its API contracts independently. The level of contract verification
depends on how much coordination is possible between teams (see
contract verification approaches below).
- Each team owns its full pipeline: From pre-commit to production monitoring. No
shared pipeline definitions, no central platform team gating deployments.
Why API Management Is Critical
Independent deployment only works when teams can change their service without breaking
others. This requires a shared understanding of API boundaries that is enforced
automatically, not through meetings or documents that drift.
Without API management, independent pipelines create independent failures. Teams
deploy incompatible changes, discover the breakage in production, and revert to
coordinated releases to stop the bleeding. This is worse than the multi-team architecture
because it creates the illusion of independence while delivering the reliability of chaos.
What API Management Requires
Published API schemas: Every service publishes its API contract (OpenAPI, AsyncAPI,
Protobuf, or equivalent) as a versioned artifact. The schema is the source of truth for
what the service provides.
Contract verification (see approaches below):
At minimum, providers verify backward compatibility against their own published schema.
Where cross-team coordination is feasible, consumer-driven contracts add stronger
guarantees.
Backward compatibility enforcement: Every API change is checked for backward
compatibility against the published schema. Breaking changes require a new API version
using the expand-then-contract pattern (sketched in code after this list):
- Deploy the new version alongside the old
- Migrate consumers to the new version
- Remove the old version only after all consumers have migrated
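A minimal illustration of the expand and contract steps at the database level, using the standard library's sqlite3 (the table, columns, and data are invented for the example; DROP COLUMN requires SQLite 3.35 or newer):

```python
"""Expand-then-contract sketch: rename users.fullname to display_name
without a breaking change. Schema and data are invented for illustration.
"""
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, fullname TEXT)")
db.execute("INSERT INTO users (fullname) VALUES ('Ada Lovelace')")

# Step 1 -- expand: add the new column alongside the old one and
# backfill it. Old and new consumers both keep working.
db.execute("ALTER TABLE users ADD COLUMN display_name TEXT")
db.execute("UPDATE users SET display_name = fullname")

# Step 2 -- migrate: consumers switch their reads and writes to
# display_name. (This happens in application deploys, not in SQL.)

# Step 3 -- contract: only after every consumer has migrated,
# remove the old column in a later release. (Needs SQLite >= 3.35.)
db.execute("ALTER TABLE users DROP COLUMN fullname")

print(db.execute("SELECT id, display_name FROM users").fetchall())
```

The key property is that every intermediate state is deployable: at no point does an old consumer and a new schema coexist incompatibly.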
Schema registry: A central registry (Confluent Schema Registry, a simple artifact
repository, or a Pact Broker where consumer-driven contracts are used) stores published
schemas. Pipelines pull from this registry to run compatibility checks. The registry is
shared infrastructure, but it does not gate deployments - it provides data that each
team’s pipeline uses to make its own go/no-go decision.
API versioning strategy: Teams agree on a versioning convention (URL path versioning,
header versioning, or semantic versioning for message schemas) and enforce it through
pipeline gates. The convention must be simple enough that every team follows it without
deliberation.
Contract Verification Approaches
Not all teams can coordinate on shared contract tooling. The right approach depends on
the relationship between provider and consumer teams. These approaches are listed from
least to most coordination required. Use the strongest approach your context supports.
| Approach | How It Works | Coordination Required | Best When |
|---|---|---|---|
| Provider schema compatibility | Provider’s pipeline checks every change for backward compatibility against its own published schema (e.g., OpenAPI diff). No consumer involvement needed. | None between teams | Teams are in different organizations, or consumers are external/unknown |
| Provider-maintained consumer tests | Provider team writes tests that exercise known consumer usage patterns based on API analytics, documentation, or past breakage. | Minimal - provider observes consumers | Provider can see consumer traffic patterns but cannot require consumer participation |
| Consumer-driven contracts | Consumers publish pacts describing the subset of the provider API they depend on. Provider runs these pacts in its pipeline. See Contract Tests. | High - shared tooling, broker, and agreement to maintain pacts | Teams are in the same organization with shared tooling and willingness to maintain pacts |
Most organizations use a mix. Internal teams with shared tooling can adopt consumer-driven
contracts. Teams consuming third-party or cross-organization APIs use provider schema
compatibility checks and provider-maintained consumer tests.
The critical requirement is not which approach you use but that every provider pipeline
verifies backward compatibility before deployment. The minimum viable contract
verification is an automated schema diff against the published API - if the diff contains
a breaking change, the pipeline fails.
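As a floor, that schema diff can be as small as the sketch below, which fails the build when the candidate OpenAPI document removes a path or operation the published spec exposes. This checks only one class of breaking change - removals - so treat it as a minimum, not a full compatibility checker; real diff tools also catch narrowed types, new required fields, and changed response shapes:

```python
"""Minimum viable contract verification sketch: fail if the new
OpenAPI JSON spec removes any path or operation the published spec
exposes. Removals only -- a floor, not a complete checker.
"""
import json
import sys

def operations(spec: dict) -> set[tuple[str, str]]:
    """Return the set of (path, http-method) pairs the spec exposes."""
    methods = {"get", "put", "post", "delete", "patch", "head", "options"}
    return {
        (path, method)
        for path, item in spec.get("paths", {}).items()
        for method in item
        if method in methods
    }

def main(published_file: str, candidate_file: str) -> int:
    with open(published_file) as f:
        published = operations(json.load(f))
    with open(candidate_file) as f:
        candidate = operations(json.load(f))
    removed = published - candidate
    for path, method in sorted(removed):
        print(f"BREAKING: removed {method.upper()} {path}", file=sys.stderr)
    return 1 if removed else 0

if __name__ == "__main__":
    sys.exit(main(sys.argv[1], sys.argv[2]))
```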
Additional Quality Gates for Distributed Architectures
| Gate | Defect Sources Addressed | Catalog Section |
|---|---|---|
| Provider schema backward compatibility | Interface mismatches from provider changes | Integration & Boundaries |
| Consumer-driven contract verification (where feasible) | Wrong assumptions about upstream/downstream | Integration & Boundaries |
| API schema backward compatibility check | Schema migration and backward compatibility failures | Data & State |
| Cross-service timeout propagation check | Missing timeout and deadline enforcement across boundaries | Performance & Resilience |
| Circuit breaker and fallback verification | Network partitions and partial failures handled wrong | Dependency & Infrastructure |
| Distributed tracing validation | Missing observability across service boundaries | Testing & Observability Gaps |
When This Architecture Works
This architecture is the goal for organizations with:
- Multiple teams that need different deployment cadences
- Services with well-defined, stable API boundaries
- Teams mature enough to own their full delivery pipeline
- Investment in contract testing tooling and API governance
When This Architecture Fails
- Shared database schemas: Multiple services can share a database engine without
problems. The failure mode is shared schemas - when Service A and Service B both read
from and write to the same tables, a schema migration by one service can break the
other’s queries. Each service must own its own schema. If two services need the same
data, expose it through an API or event, not through direct table access.
- Synchronous dependency chains: If Service A calls Service B which calls Service C
in the request path, a deployment of C can break A through B. Circuit breakers and
fallbacks are required at every boundary, and contract tests must cover failure modes,
not just success paths (a minimal circuit breaker sketch follows this list).
- No contract verification discipline: If teams skip backward compatibility checks
or let contract test failures slide, breakage shifts from the pipeline to production.
The architecture degrades into uncoordinated deployments with production as the
integration environment. At minimum, every provider must run automated schema
compatibility checks - even without consumer-driven contracts.
- Missing observability: When services deploy independently, debugging production
issues requires distributed tracing, correlated logging, and SLO monitoring across
service boundaries. Without this, independent deployment means independent
troubleshooting with no way to trace cause and effect.
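For the synchronous dependency chain failure mode above, the core mechanic is a circuit breaker: stop calling a failing dependency and fail fast until a cooldown elapses. A minimal sketch (thresholds are illustrative; production implementations add half-open probing, per-endpoint state, and metrics):

```python
"""Circuit breaker sketch: stop calling a failing dependency and
fail fast until a cooldown elapses. Thresholds are illustrative.
"""
import time

class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is skipped."""

class CircuitBreaker:
    def __init__(self, max_failures: int = 5, reset_after_s: float = 30.0):
        self.max_failures = max_failures
        self.reset_after_s = reset_after_s
        self.failures = 0
        self.opened_at: float | None = None

    def call(self, fn, *args, fallback=None, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after_s:
                if fallback is not None:
                    return fallback()          # degrade gracefully
                raise CircuitOpenError("dependency circuit is open")
            self.opened_at = None              # cooldown elapsed: retry
            self.failures = 0
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()  # trip the breaker
            raise
        self.failures = 0
        return result

# Usage sketch: breaker.call(fetch_inventory, sku, fallback=cached_count)
```

Contract tests for the failure path then assert two things: the breaker trips under injected faults, and the fallback response still satisfies the consumer's contract.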
Relationship to the Other Architectures
Architecture 3 is the target that Architecture 2 teams evolve toward. The progression is:
- Single team, single deployable - one team, one pipeline, one artifact
- Multiple teams, single deployable - multiple teams, sub-pipelines, shared
integration step
- Independent teams, independent deployables - multiple teams, fully independent
pipelines, contract-based integration
The move from 2 to 3 happens incrementally. Extract one service at a time. Give it
its own pipeline. Establish contract tests between it and the monolith. When the contract
tests are reliable, stop running the extracted service’s code through the integration
pipeline. Repeat until the integration pipeline is empty.
2 - Systemic Defect Fixes
A catalog of defect sources across the delivery value stream with earliest detection points, AI shift-left opportunities, and systemic prevention strategies.
Defects do not appear randomly. They originate from specific, predictable sources in the delivery
value stream. This reference catalogs those sources so teams can shift detection left, automate
where possible, and apply AI where it adds real value to the feedback loop.
The goal is systems thinking: detect issues as early as possible in the value stream so feedback informs continuous improvement in how we work, not just reactive fixes to individual defects.
- ▲ = AI shifts detection earlier than current automation alone
- “Current tooling sufficient” = existing automation already solves the problem; AI adds no additional value
- No marker = AI assists at the current detection point but does not shift it earlier
How to Use This Catalog
- Pick your pain point. Find the category where your team loses the most time to defects or rework. Start there, not at the top.
- Focus on the Systemic Prevention column. Automated detection catches defects faster, but systemic prevention eliminates entire categories. Prioritize the prevention fix for each issue you selected.
- Measure before and after. Track defect escape rate by category and time-to-detection. If the systemic fix is working, both metrics improve within weeks.
Detection points in the value stream, from earliest to latest: Discovery → Requirements → Design → Coding → Pre-commit → CI → Acceptance Tests → Production. Shift left: earlier detection is cheaper to fix.
Categories
| Category | What it covers |
|---|---|
| Product & Discovery | Wrong features, misaligned requirements, accessibility gaps - defects born before coding begins |
| Integration & Boundaries | Interface mismatches, behavioral assumptions, race conditions at service boundaries |
| Knowledge & Communication | Implicit domain knowledge, ambiguous requirements, tribal knowledge loss, divergent mental models |
| Change & Complexity | Unintended side effects, technical debt, feature interactions, configuration drift |
| Testing & Observability Gaps | Untested edge cases, missing contract tests, insufficient monitoring, environment parity |
| Process & Deployment | Long-lived branches, manual steps, large batches, inadequate rollback, work stacking |
| Data & State | Schema migration failures, null assumptions, concurrency issues, cache invalidation |
| Dependency & Infrastructure | Third-party breaking changes, environment differences, network partition handling |
| Security & Compliance | Vulnerabilities, secrets in source, auth gaps, injection, regulatory requirements, audit trails |
| Performance & Resilience | Regressions, resource leaks, capacity limits, missing timeouts, graceful degradation |
Where AI helps - and where it does not
AI adds the most value where detection requires reasoning across multiple signals that existing
tools cannot correlate: ambiguous requirements, undocumented assumptions, semantic code impact,
and knowledge gaps. Where deterministic tools already solve the problem (infrastructure drift,
null safety, branch age), AI adds cost without benefit. Look for the ▲ markers to find the highest-value AI opportunities.
2.1 - Product & Discovery Defects
Defects that originate before a single line of code is written - the most expensive category because they compound through every downstream phase.
These defects originate before a single line of code is written. They are the most expensive to
fix because they compound through every downstream phase.
| Issue | Earliest Detection (Automation) | Automated Detection | Earlier Detection with AI | Systemic Prevention |
|---|---|---|---|---|
| Building the wrong thing | Discovery | Product analytics platforms, usage trend alerts | ▲ Synthesize user feedback, support tickets, and usage data to surface misalignment earlier than production metrics | Validated user research before backlog entry; dual-track agile |
| Solving a problem nobody has | Discovery | Support ticket clustering tools, feature adoption tracking | ▲ Semantic analysis of interview transcripts, forums, and support tickets to identify real vs. assumed pain | Problem validation as a stage gate; publish problem brief before solution |
| Correct problem, wrong solution | Discovery | A/B testing frameworks, feature flag cohort comparison | Evaluate prototypes against problem definitions; generate alternative approaches | Prototype multiple approaches; measurable success criteria first |
| Meets spec but misses user intent | Requirements | Session replay tools, rage-click and error-loop detection | ▲ Review acceptance criteria against user behavior data to flag misalignment | Acceptance criteria focused on user outcomes, not checklists |
| Over-engineering beyond need | Design | Static analysis for dead code and unused abstractions | ▲ Flag unnecessary abstraction layers and premature optimization in code review | YAGNI principle; justify every abstraction layer |
| Prioritizing wrong work | Discovery | DORA metrics versus business outcomes, WSJF scoring | Synthesize roadmap, customer data, and market signals to surface opportunity costs | WSJF prioritization with outcome data |
| Inaccessible UI excludes users | Pre-commit | axe-core, pa11y, Lighthouse accessibility audits | Current tooling sufficient | WCAG compliance as acceptance criteria; automated accessibility checks in pipeline |
2.2 - Integration & Boundaries Defects
Defects at system boundaries that are invisible to unit tests and often survive until production. Contract testing and deliberate boundary design are the primary defenses.
Defects at system boundaries are invisible to unit tests and often survive until production.
Contract testing and deliberate boundary design are the primary defenses.
| Issue | Earliest Detection (Automation) | Automated Detection | Earlier Detection with AI | Systemic Prevention |
|---|---|---|---|---|
| Interface mismatches | CI | Consumer-driven contract tests, API schema validators | Predict which consumers break from API changes based on usage patterns | Mandatory contract tests per boundary; API-first with generated clients |
| Wrong assumptions about upstream/downstream | Design | Chaos engineering platforms, synthetic transactions, fault injection | ▲ Review code and docs to identify undocumented behavioral assumptions | Document behavioral contracts; defensive coding at boundaries |
| Race conditions | Pre-commit | Thread sanitizers, race detectors, formal verification tools, fuzz testing | Flag concurrency anti-patterns but cannot replace formal detection tools | Idempotent design; queues over shared mutable state |
2.3 - Knowledge & Communication Defects
Defects that emerge from gaps between what people know and what the code expresses - the hardest to detect with automated tools and the easiest to prevent with team practices.
These defects emerge from gaps between what people know and what the code expresses.
They are the hardest to detect with automated tools and the easiest to prevent with team practices.
| Issue | Earliest Detection (Automation) | Automated Detection | Earlier Detection with AI | Systemic Prevention |
|---|---|---|---|---|
| Implicit domain knowledge not in code | Coding | Magic number detection, code ownership analytics | ▲ Identify undocumented business rules and knowledge gaps from code and test analysis | Domain-Driven Design with ubiquitous language; embed rules in code |
| Ambiguous requirements | Requirements | Flag stories without acceptance criteria, BDD spec coverage tracking | ▲ Review requirements for ambiguity, missing edge cases, and contradictions; generate test scenarios | Three Amigos before work; example mapping; executable specs |
| Tribal knowledge loss | Coding | Bus factor analysis from commit history, single-author concentration alerts | ▲ Generate documentation from code and tests; flag documentation drift from implementation | Pair/mob programming as default; rotate on-call; living docs |
| Divergent mental models across teams | Design | Divergent naming detection, contract test failures | ▲ Compare terminology and domain models across codebases to detect semantic mismatches | Shared domain models; explicit bounded contexts |
2.4 - Change & Complexity Defects
Defects caused by the act of changing existing code. The larger the change and the longer it lives outside trunk, the higher the risk.
These defects are caused by the act of changing existing code. The larger the change and the
longer it lives outside trunk, the higher the risk.
| Issue | Earliest Detection (Automation) | Automated Detection | Earlier Detection with AI | Systemic Prevention |
|---|---|---|---|---|
| Unintended side effects | CI | Automated test suites, mutation testing frameworks, change impact analysis | ▲ Reason about semantic change impact beyond syntactic dependencies; automated blast radius analysis | Small focused commits; trunk-based development; feature flags |
| Accumulated technical debt | CI | Complexity trends, duplication scoring, dependency cycle detection, quality gates | ▲ Identify architectural drift, abstraction decay, and calcified workarounds | Refactoring as part of every story; dedicated debt budget |
| Unanticipated feature interactions | Acceptance Tests | Combinatorial and pairwise testing, feature flag interaction matrix | Reason about feature interactions semantically; flag conflicts testing matrices miss | Feature flags with controlled rollout; modular design; canary deployments |
| Configuration drift | CI | Infrastructure-as-code drift detection, environment diffing | Current tooling sufficient | Infrastructure as code; immutable infrastructure; GitOps |
2.5 - Testing & Observability Gap Defects
Defects that survive because the safety net has holes. The fix is not more testing - it is better-targeted testing and observability that closes the specific gaps.
These defects survive because the safety net has holes. The fix is not more testing: it is
better-targeted testing and observability that closes the specific gaps.
| Issue | Earliest Detection (Automation) | Automated Detection | Earlier Detection with AI | Systemic Prevention |
|---|---|---|---|---|
| Untested edge cases and error paths | CI | Mutation testing frameworks, branch coverage thresholds | ▲ Analyze code paths and generate tests for untested boundaries and error conditions | Property-based testing as standard; boundary value analysis |
| Missing contract tests at boundaries | CI | Boundary inventory versus contract test inventory | ▲ Identify boundaries lacking tests by understanding semantic service relationships | Mandatory contract tests per new boundary |
| Insufficient monitoring | Design | Observability coverage scoring, health endpoint checks, structured logging verification | Current tooling sufficient | Observability as non-functional requirement; SLOs for every user-facing path |
| Test environments don’t reflect production | CI | Automated environment parity checks, synthetic transaction comparison, infrastructure-as-code diff tools | Current tooling sufficient | Production-like data in staging; test in production with flags |
2.6 - Process & Deployment Defects
Defects caused by the delivery process itself. Manual steps, large batches, and slow feedback loops create the conditions for failure.
These defects are caused by the delivery process itself. Manual steps, large batches, and
slow feedback loops create the conditions for failure.
| Issue | Earliest Detection (Automation) | Automated Detection | Earlier Detection with AI | Systemic Prevention |
|---|---|---|---|---|
| Long-lived branches | Pre-commit | Branch age alerts, merge conflict frequency, CI dashboard for branch count | Process change, not AI | Trunk-based development; merge at least daily |
| Manual pipeline steps | CI | Pipeline audit for manual gates, deployment lead time analysis | Automation, not AI | Automate every step commit-to-production |
| Batching too many changes per release | CI | Changes-per-deploy metrics, deployment frequency tracking | CD practice, not AI | Every commit is a release candidate; single-piece flow |
| Inadequate rollback capability | CI | Automated rollback testing in CI, mean time to rollback measurement | Deployment patterns, not AI | Blue/green or canary deployments; auto-rollback on health failure |
| Reliance on human review to catch preventable defects | Coding | Linters, static analysis security testing, type systems, complexity scoring | ▲ Semantic code review for logic errors and missing edge cases that automated rules cannot express | Reserve human review for knowledge transfer and design decisions |
| Manual review of risks and compliance (CAB) | Design | Change lead time analysis, CAB effectiveness metrics | ▲ Automated change risk scoring from change diff and deployment history; blast radius analysis | Replace CAB with automated progressive delivery |
| Work stacking on individuals; everything started, nothing finished; PRs waiting days for review; uneven workloads; blocked work sits idle; completed work misses the intent | CI | Issue tracker reports where individuals have multiple items assigned simultaneously | Process change, not AI | Push-Based Work Assignment anti-pattern |
2.7 - Data & State Defects
Data defects are particularly dangerous because they can corrupt persistent state. Unlike code defects, data corruption often cannot be fixed by deploying a new version.
Data defects are particularly dangerous because they can corrupt persistent state. Unlike code
defects, data corruption often cannot be fixed by deploying a new version.
| Issue | Earliest Detection (Automation) | Automated Detection | Earlier Detection with AI | Systemic Prevention |
|---|---|---|---|---|
| Schema migration and backward compatibility failures | CI | Schema compatibility validators, migration dry-runs | Predict downstream impact by understanding consumer usage patterns | Expand-then-contract schema migrations; never breaking changes |
| Null or missing data assumptions | Pre-commit | Null safety static analyzers, strict type systems | Flag code where optional fields are used without null checks | Null-safe type systems; Option/Maybe as default; validate at boundaries |
| Concurrency and ordering issues | CI | Thread sanitizers, load tests with randomized timing | Design patterns, not AI | Design for out-of-order delivery; idempotent consumers |
| Cache invalidation errors | Acceptance Tests | Cache consistency monitoring, TTL verification, stale data detection | Review cache invalidation logic for incomplete paths or mismatches | Short TTLs; event-driven invalidation |
2.8 - Dependency & Infrastructure Defects
Defects that originate outside your codebase but break your system. The fix is to treat external dependencies as untrusted boundaries.
These defects originate outside your codebase but break your system. The fix is to treat
external dependencies as untrusted boundaries.
| Issue | Earliest Detection (Automation) | Automated Detection | Earlier Detection with AI | Systemic Prevention |
|---|---|---|---|---|
| Third-party library breaking changes | CI | Dependency update automation, software composition analysis for breaking versions | Review changelogs and API diffs to assess breaking change risk; predict compatibility issues | Pin dependencies; automated upgrade PRs with test gates |
| Infrastructure differences across environments | CI | Infrastructure-as-code drift detection, config comparison, environment parity scoring | IaC and GitOps, not AI | Single source of truth for all environments; containerization |
| Network partitions and partial failures handled wrong | Acceptance Tests | Chaos engineering platforms, synthetic transaction monitoring | Review architectures for missing failure handling patterns | Circuit breakers; retries; bulkheads as defaults; test failure modes explicitly |
2.9 - Security & Compliance Defects
Security and compliance defects are silent until they are catastrophic. The gap between what the code does and what policy requires is invisible without deliberate, automated verification at every stage.
Security and compliance defects are silent until they are catastrophic. They share a pattern:
the gap between what the code does and what policy requires is invisible without deliberate,
automated verification at every stage.
| Issue | Earliest Detection (Automation) | Automated Detection | Earlier Detection with AI | Systemic Prevention |
|---|---|---|---|---|
| Known vulnerabilities in dependencies | CI | Software composition analysis, CVE database scanning, dependency lock file auditing | ▲ Correlate vulnerability advisories with actual usage paths to prioritize exploitable risks over theoretical ones | Automated dependency updates with test gates; pin and audit all transitive dependencies |
| Secrets committed to source control | Pre-commit | Pre-commit secret scanners, entropy-based detection, git history auditing tools | Flag patterns that resemble credentials in code, config, and documentation | Secrets management platform; inject at runtime, never store in repo |
| Authentication and authorization gaps | Design | Security-focused integration tests, RBAC policy validators, access matrix verification | ▲ Review code paths for missing authorization checks and privilege escalation patterns | Centralized auth framework; deny-by-default access policies; automated access matrix tests |
| Injection vulnerabilities | Pre-commit | SAST tools, taint analysis, parameterized query enforcement | ▲ Identify subtle injection vectors that pattern-matching rules miss, including second-order injection | Input validation at boundaries; parameterized queries as default; content security policies |
| Regulatory requirement gaps | Requirements | Compliance-as-code policy engines, automated control mapping | ▲ Map regulatory requirements to implementation artifacts and flag uncovered controls | Compliance requirements as acceptance criteria; automated evidence collection |
| Missing audit trails | Design | Structured logging verification, audit event coverage scoring | Review code for state-changing operations that lack audit logging | Audit logging as a framework default; every state change emits a structured event |
| License compliance violations | CI | License scanning tools, SBOM generation and policy evaluation | Review license compatibility across the full dependency graph | Approved license allowlist enforced in CI; SBOM generated on every build |
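Entropy-based secret detection from the table above can be illustrated in a few lines: flag long tokens whose Shannon entropy suggests generated credentials rather than prose. A toy sketch (the 4.0-bit threshold and token pattern are assumptions; real scanners layer on known-format rules, git history scanning, and allowlists):

```python
"""Entropy-based secret detection sketch: flag tokens whose Shannon
entropy suggests generated credentials. Threshold and regex are
illustrative; real scanners add known-format rules and allowlists.
"""
import math
import re
import sys
from collections import Counter

TOKEN = re.compile(r"[A-Za-z0-9+/_\-=]{20,}")  # long, key-like strings
THRESHOLD_BITS = 4.0                            # assumed cutoff

def shannon_entropy(s: str) -> float:
    counts = Counter(s)
    return -sum(
        (n / len(s)) * math.log2(n / len(s)) for n in counts.values()
    )

def scan(path: str) -> int:
    hits = 0
    with open(path, errors="replace") as f:
        for lineno, line in enumerate(f, 1):
            for token in TOKEN.findall(line):
                if shannon_entropy(token) > THRESHOLD_BITS:
                    print(f"{path}:{lineno}: possible secret: {token[:8]}...")
                    hits += 1
    return hits

if __name__ == "__main__":
    total = sum(scan(p) for p in sys.argv[1:])
    sys.exit(1 if total else 0)
```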
2.10 - Performance & Resilience Defects
Performance defects degrade gradually, often hiding behind averages until a threshold tips and the system fails under real load. Detection requires baselines, budgets, and automated enforcement - not periodic manual testing.
Performance defects are rarely binary. They degrade gradually, often hiding behind averages
until a threshold tips and the system fails under real load. Detection requires baselines,
budgets, and automated enforcement - not periodic manual testing.
| Issue | Earliest Detection (Automation) | Automated Detection | Earlier Detection with AI | Systemic Prevention |
|---|---|---|---|---|
| Performance regressions | CI | Automated benchmark suites, performance budget enforcement in CI | ▲ Identify code changes likely to degrade performance from structural analysis before benchmarks run | Performance budgets enforced in CI; benchmark suite runs on every commit |
| Resource leaks | CI | Memory and connection pool profilers, leak detection in automated test runs | Flag allocation patterns without corresponding cleanup in code review | Resource management via language-level constructs (try-with-resources, RAII, using); pool size alerts |
| Unknown capacity limits | Acceptance Tests | Load testing frameworks, capacity threshold monitoring, saturation alerts | Predict capacity bottlenecks from architecture and traffic patterns | Regular automated load tests; capacity model updated with every architecture change |
| Missing timeout and deadline enforcement | Pre-commit | Static analysis for unbounded calls, integration test timeout verification | ▲ Identify call chains with missing or inconsistent timeout propagation | Default timeouts on all external calls; deadline propagation across service boundaries |
| Slow user-facing response times | CI | Real user monitoring, synthetic transaction baselines, web vitals tracking | Correlate frontend and backend telemetry to pinpoint latency sources | Response time SLOs per user-facing path; performance budgets for page weight and API latency |
| Missing graceful degradation | Design | Chaos engineering platforms, failure injection, circuit breaker verification | ▲ Review architectures for single points of failure and missing fallback paths | Design for partial failure; circuit breakers and fallbacks as defaults; game day exercises |
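Deadline propagation, flagged in the table above, means each hop derives its timeout from an absolute deadline set at the edge rather than from a fixed local value that could outlive what the caller is still waiting for. A sketch, assuming a hypothetical x-request-deadline header convention:

```python
"""Deadline propagation sketch: pass an absolute deadline across
service hops and derive each call's timeout from the time remaining.
The header name, cap, and budget are illustrative conventions.
"""
import time

DEADLINE_HEADER = "x-request-deadline"  # assumed convention

def remaining_budget(headers: dict[str, str]) -> float:
    """Seconds left before the caller's deadline expires."""
    deadline = float(headers[DEADLINE_HEADER])
    return deadline - time.time()

def call_downstream(headers: dict[str, str]) -> str:
    budget = remaining_budget(headers)
    if budget <= 0:
        raise TimeoutError("deadline already exceeded; fail fast")
    # Forward the same absolute deadline and use the remaining budget
    # as this hop's timeout, optionally capped per hop (cap assumed).
    timeout = min(budget, 2.0)
    # e.g. requests.get(url, headers=headers, timeout=timeout)
    return f"calling with timeout={timeout:.2f}s"

if __name__ == "__main__":
    incoming = {DEADLINE_HEADER: str(time.time() + 1.5)}  # 1.5s budget
    print(call_downstream(incoming))
```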
3 - CD Practices
Concise definitions of the core continuous delivery practices from MinimumCD.
These pages define the minimum practices required for continuous delivery. Each page covers
what the practice is, why it matters, and what the minimum criteria are. For migration
guidance and tactical how-to content, follow the links to the corresponding phase pages.
Core Practices
3.1 - Continuous Integration
Integrate work to trunk at least daily with automated testing to maintain a releasable codebase.
Definition
Continuous Integration (CI) is the activity of each developer integrating work to the trunk of version control at least daily and verifying that the work is, to the best of our knowledge, releasable.
CI is not just about tooling - it is fundamentally about team workflow and working agreements.
Minimum Activities Required
- Trunk-based development - all work integrates to trunk
- Work integrates to trunk at a minimum daily (each developer, every day)
- Work has automated testing before merge to trunk
- Work is tested with other work automatically on merge
- All feature work stops when the build is red
- New work does not break delivered work
Why This Matters
Without CI, Teams Experience
- Integration hell: Weeks or months of painful merge conflicts
- Late defect detection: Bugs found after they are expensive to fix
- Reduced collaboration: Developers work in isolation, losing context
- Deployment fear: Large batches of untested changes create risk
- Slower delivery: Time wasted on merge conflicts and rework
- Quality erosion: Without rapid feedback, technical debt accumulates
With CI, Teams Achieve
- Rapid feedback: Know within minutes if changes broke something
- Smaller changes: Daily integration forces better work breakdown
- Better collaboration: Team shares ownership of the codebase
- Lower risk: Small, tested changes are easier to diagnose and fix
- Faster delivery: No integration delays blocking deployment
- Higher quality: Continuous testing catches issues early
What Is Improved
Teamwork
CI requires strong teamwork to function correctly. Key improvements:
- Pull workflow: Team picks next important work instead of working from assignments
- Code review cadence: Quick reviews (< 4 hours) keep work flowing
- Pair programming: Real-time collaboration eliminates review delays
- Shared ownership: Everyone maintains the codebase together
- Team goals over individual tasks: Focus shifts from “my work” to “our progress”
Work Breakdown
CI forces better work decomposition:
- Definition of Ready: Every story has testable acceptance criteria before work starts
- Small batches: If the team can complete work in < 2 days, it is refined enough
- Vertical slicing: Each change delivers a thin, tested slice of functionality
- Incremental delivery: Features built incrementally, each step integrated daily
Testing
CI requires a shift in testing approach:
- From writing tests after code is “complete” to writing tests before/during coding (TDD/BDD)
- From testing implementation details to testing behavior and outcomes
- From manual testing before deployment to automated testing on every commit
- From separate QA phase to quality built into development
Migration Guidance
For detailed guidance on adopting CI practices during your CD migration, see the corresponding phase pages.
3.2 - Trunk-Based Development
All changes integrate into a single shared trunk with no intermediate branches.
“Trunk-based development has been shown to be a predictor of high performance in software development and delivery. It is characterized by fewer than three active branches in a code repository; branches and forks having very short lifetimes (e.g., less than a day) before being merged; and application teams rarely or never having ‘code lock’ periods when no one can check in code or do pull requests due to merging conflicts, code freezes, or stabilization phases.”
- Accelerate by Nicole Forsgren Ph.D., Jez Humble & Gene Kim
Definition
Trunk-based development (TBD) is a team workflow where changes are integrated into the trunk with no intermediate integration (develop, test, etc.) branch. The two common workflows are making changes directly to the trunk or using very short-lived branches that branch from the trunk and integrate back into the trunk.
Release branches are an intermediate step that some choose on their path to continuous delivery while improving their quality processes in the pipeline. True CD releases from the trunk.
Minimum Activities Required
- All changes integrate into the trunk
- If branches from the trunk are used:
- They originate from the trunk
- They re-integrate to the trunk
- They are short-lived and removed after the merge
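The short-lived requirement can be checked mechanically. A sketch that flags remote branches whose last commit is more than a day old, per the Accelerate characterization quoted above (the trunk branch names are assumptions; measuring divergence from trunk rather than last-commit age would be stricter):

```python
"""Branch age check sketch: flag remote branches whose last commit is
older than 24 hours -- a signal that integration is being deferred.
"""
import subprocess
import time

MAX_AGE_S = 24 * 3600  # "less than a day" per trunk-based development
TRUNK = ("origin/main", "origin/HEAD")  # assumed trunk names

def stale_branches() -> list[tuple[str, float]]:
    out = subprocess.run(
        ["git", "for-each-ref", "refs/remotes/origin",
         "--format=%(refname:short) %(committerdate:unix)"],
        capture_output=True, text=True, check=True,
    )
    now = time.time()
    stale = []
    for line in out.stdout.splitlines():
        name, unix_ts = line.rsplit(" ", 1)
        age = now - int(unix_ts)
        if name not in TRUNK and age > MAX_AGE_S:
            stale.append((name, age / 3600))
    return stale

if __name__ == "__main__":
    for name, hours in stale_branches():
        print(f"stale branch: {name} ({hours:.0f}h old) -- integrate or delete")
```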
What Is Improved
- Smaller changes: TBD emphasizes small, frequent changes that are easier for the team to review and more resistant to impactful merge conflicts. Conflicts become rare and trivial.
- We must test: TBD requires us to implement tests as part of the development process.
- Better teamwork: We need to work more closely as a team. This has many positive impacts, not least we will be more focused on getting the team’s highest priority done.
- Better work definition: Small changes require us to decompose the work into a level of detail that helps uncover things that lack clarity or do not make sense. This provides much earlier feedback on potential quality issues.
- Replaces process with engineering: Instead of creating a process where we control the release of features with branches, we can control the release of features with engineering techniques called evolutionary coding methods. These techniques have additional benefits related to stability that cannot be found when replaced by process.
- Reduces risk: Long-lived branches carry two common risks. First, the change will not integrate cleanly and the merge conflicts result in broken or lost features. Second, the branch will be abandoned, usually because of the first reason.
Migration Guidance
For detailed guidance on adopting TBD during your CD migration, see the corresponding phase pages.
3.3 - Single Path to Production
All deployments flow through one automated pipeline - no exceptions.
Definition
The deployment pipeline is the single, standardized path for all changes to reach any environment - development, testing, staging, or production. No manual deployments, no side channels, no “quick fixes” bypassing the pipeline. If it is not deployed through the pipeline, it does not get deployed.
Key Principles
- Single path: All deployments flow through the same pipeline
- No exceptions: Even hotfixes and rollbacks go through the pipeline
- Automated: Deployment is triggered automatically after pipeline validation
- Auditable: Every deployment is tracked and traceable
- Consistent: The same process deploys to all environments
What Is Improved
- Reliability: Every deployment is validated the same way
- Traceability: Clear audit trail from commit to production
- Consistency: Environments stay in sync
- Speed: Automated deployments are faster than manual
- Safety: Quality gates are never bypassed
- Confidence: Teams trust that production matches what was tested
- Recovery: Rollbacks are as reliable as forward deployments
Migration Guidance
For detailed guidance on establishing a single path to production, see the corresponding phase pages.
3.4 - Deterministic Pipeline
The same inputs to the pipeline always produce the same outputs.
Definition
A deterministic pipeline produces consistent, repeatable results. Given the same inputs (code, configuration, dependencies), the pipeline will always produce the same outputs and reach the same pass/fail verdict. The pipeline’s decision on whether a change is releasable is definitive - if it passes, deploy it; if it fails, fix it.
Key Principles
- Repeatable: Running the pipeline twice with identical inputs produces identical results
- Authoritative: The pipeline is the final arbiter of quality, not humans
- Immutable: No manual changes to artifacts or environments between pipeline stages
- Trustworthy: Teams trust the pipeline’s verdict without second-guessing
What Makes a Pipeline Deterministic
- Version control everything: Source code, IaC, pipeline definitions, test data, dependency lockfiles, tool versions
- Lock dependency versions: Always use lockfiles. Never rely on `latest` or version ranges (see the sketch after this list).
- Eliminate environmental variance: Containerize builds, pin image tags, install exact tool versions
- Remove human intervention: No manual approvals in the critical path, no manual environment setup
- Fix flaky tests immediately: Quarantine, fix, or delete. Never allow a “just re-run it” culture.
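Parts of this checklist can themselves be automated. A toy audit that flags floating container image tags and a missing dependency lockfile, two common sources of non-repeatable builds (the file names and rules are illustrative, not exhaustive):

```python
"""Determinism check sketch: flag floating container tags and missing
lockfiles. File names and rules are illustrative assumptions.
"""
import pathlib
import re
import sys

# Matches "FROM image:latest" or "FROM image" with no tag or digest.
FLOATING_TAG = re.compile(
    r"^FROM\s+\S+:latest\b|^FROM\s+[^:@\s]+\s*$", re.MULTILINE
)
LOCKFILES = ["package-lock.json", "poetry.lock", "Cargo.lock",
             "go.sum", "requirements.txt"]  # any one will do

def main(root: str = ".") -> int:
    problems = []
    base = pathlib.Path(root)
    for dockerfile in base.rglob("Dockerfile*"):
        if FLOATING_TAG.search(dockerfile.read_text(errors="replace")):
            problems.append(f"{dockerfile}: unpinned base image")
    if not any((base / name).exists() for name in LOCKFILES):
        problems.append("no dependency lockfile found at repo root")
    for p in problems:
        print(p, file=sys.stderr)
    return 1 if problems else 0

if __name__ == "__main__":
    sys.exit(main())
```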
What Is Improved
- Quality increases: Real issues are never dismissed as “flaky tests”
- Speed increases: No time wasted on test reruns or manual verification
- Trust increases: Teams rely on the pipeline instead of adding manual gates
- Debugging improves: Failures are reproducible, making root cause analysis easier
- Delivery improves: Faster, more reliable path from commit to production
Migration Guidance
For detailed guidance on building a deterministic pipeline, see:
- Deterministic Pipeline - Phase 2 pipeline practice with anti-pattern/good-pattern examples and getting started steps
Additional Resources
3.5 - Definition of Deployable
Automated criteria that determine when a change is ready for production.
Definition
The “definition of deployable” is your organization’s agreed-upon set of non-negotiable quality criteria that every artifact must pass before it can be deployed to any environment. This definition should be automated, enforced by the pipeline, and treated as the authoritative verdict on whether a change is ready for deployment.
Key Principles
- Pipeline is definitive: If the pipeline passes, the artifact is deployable - no exceptions
- Automated validation: All criteria are checked automatically, not manually
- Consistent across environments: The same standards apply whether deploying to test or production
- Fails fast: The pipeline rejects artifacts that do not meet the standard immediately
What Should Be in Your Definition
Your definition of deployable should include automated checks for:
- Security: SAST scans, dependency vulnerability scans, secret detection
- Functionality: Unit tests, integration tests, end-to-end tests, regression tests
- Compliance: Audit trails, policy as code, change documentation
- Performance: Response time thresholds, load test baselines, resource utilization
- Reliability: Health check validation, graceful degradation tests, rollback verification
- Code quality: Linting, static analysis, complexity metrics
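In code, the definition reduces to a hard conjunction of gate results. The sketch below is illustrative only (the gate names and result values are not a real pipeline API): every gate must pass, there is no override path, and failures are reported for fixing rather than waiving.

```python
# Hypothetical aggregation of gate results into a single deployable verdict.
# Gate names and results are illustrative; in a real pipeline each value
# would come from the corresponding stage.
from dataclasses import dataclass

@dataclass
class GateResult:
    name: str
    passed: bool
    detail: str = ""

def is_deployable(results: list[GateResult]) -> bool:
    """Deployable only if every gate passed - there is no override path."""
    failures = [r for r in results if not r.passed]
    for r in failures:
        print(f"gate failed: {r.name} {r.detail}".rstrip())
    return not failures

verdict = is_deployable([
    GateResult("sast-scan", True),
    GateResult("unit-tests", True),
    GateResult("dependency-scan", False, "(2 critical CVEs)"),
])
print("deployable" if verdict else "not deployable")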
What Is Improved
- Removes bottlenecks: No waiting for manual approval meetings
- Increases quality: Automated checks catch more issues than manual reviews
- Reduces cycle time: Deployable artifacts are identified in minutes, not days
- Improves collaboration: Shared understanding of quality standards
- Enables continuous delivery: Trust in the pipeline makes frequent deployments safe
Migration Guidance
For detailed guidance on defining what “deployable” means for your organization, see:
- Deployable Definition - Phase 2 pipeline practice with progressive quality gates, context-specific definitions, and getting started steps
Additional Resources
3.6 - Immutable Artifacts
Build once, deploy everywhere. The artifact is never modified after creation.
Definition
Central to CD is that the pipeline validates the artifact itself. It is built once and deployed to all environments. A common anti-pattern is building a separate artifact for each environment. The pipeline should generate immutable, versioned artifacts.
Immutable Pipeline: Failures should be addressed by changes in version control so that two executions with the same configuration always yield the same results. Never go to the failure point, make adjustments in the environment, and re-start from that point.
Immutable Artifacts: Some package management systems allow the creation of release candidate versions. For example, it is common to find -SNAPSHOT versions in Java. However, this means the artifact’s behavior can change without modifying the version. Version numbers are cheap. If we are to have an immutable pipeline, it must produce an immutable artifact. Never use or produce -SNAPSHOT versions.
Immutability provides the confidence to know that the results from the pipeline are real and repeatable.
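A lightweight way to honor this rule is to stamp every build with a version derived from the commit, so no two builds can share a version. A minimal sketch, assuming a git checkout; the base version is illustrative:

```python
# Hypothetical version stamp: every build gets a unique, immutable version
# derived from the commit, instead of a mutable -SNAPSHOT tag.
import subprocess
from datetime import datetime, timezone

def build_version(base: str = "1.4.0") -> str:
    sha = subprocess.check_output(
        ["git", "rev-parse", "--short", "HEAD"], text=True
    ).strip()
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d%H%M%S")
    return f"{base}-{stamp}+{sha}"  # e.g. 1.4.0-20240101123456+abc1234

print(build_version())
```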
What Is Improved
- Everything must be version controlled: source code, environment configurations, application configurations, and even test data. This reduces variability and improves the quality process.
- Confidence in testing: The artifact validated in pre-production is byte-for-byte identical to what runs in production.
- Faster rollback: Previous artifacts are unchanged in the artifact repository, ready to be redeployed.
- Audit trail: Every artifact is traceable to a specific commit and pipeline run.
Migration Guidance
For detailed guidance on implementing immutable artifacts, see:
- Immutable Artifacts - Phase 2 pipeline practice with anti-patterns, good patterns, and getting started steps
Additional Resources
3.7 - Production-Like Environments
Test in environments that mirror production to catch environment-specific issues early.
Definition
It is crucial to leverage pre-production environments in your CD pipeline to run all of your tests (unit, integration, UAT, manual QA, E2E) early and often. Test environments increase interaction with new features and expose bugs earlier - both essential for delivering reliable software.
Types of Pre-Production Environments
Most organizations employ both static and short-lived environments, using each for specific stages of the SDLC:
Staging environment: The last environment that teams run automated tests against prior to deployment, particularly for testing interaction between all new features after a merge. Its infrastructure reflects production as closely as possible.
Ephemeral environments: Full-stack, on-demand environments spun up on every code change. Each ephemeral environment is leveraged in your pipeline to run E2E, unit, and integration tests on every code change. These environments are defined in version control, created and destroyed automatically on demand. They are short-lived by definition but should closely resemble production. They replace long-lived “static” environments and the maintenance required to keep those stable.
What Is Improved
- Infrastructure is kept consistent: Test environments deliver results that reflect real-world performance. Fewer unexpected bugs reach production, since production-like data and dependencies let you run your entire test suite earlier.
- Test against latest changes: These environments rebuild upon code changes with no manual intervention.
- Test before merge: Attaching an ephemeral environment to every PR enables E2E testing in your CI before code changes get deployed to staging.
Migration Guidance
For detailed guidance on implementing production-like environments, see:
Additional Resources
3.8 - Rollback
Fast, automated recovery from any deployment.
Definition
Rollback on-demand means the ability to quickly and safely revert to a previous working version of your application at any time, without requiring special approval, manual intervention, or complex procedures. It should be as simple and reliable as deploying forward.
Key Principles
- Fast: Rollback completes in minutes, not hours. Target < 5 minutes.
- Automated: No manual steps or special procedures. Single command or click.
- Safe: Rollback is validated just like forward deployment.
- Simple: Any team member can execute it without specialized knowledge.
- Tested: Rollback mechanism is regularly tested, not just used in emergencies.
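To make these principles concrete, a rollback can literally be “deploy the previous version through the normal pipeline.” The sketch below is hypothetical: `deploy` stands in for whatever your platform exposes (kubectl, Argo CD, a deploy script), and the version history would come from your artifact repository.

```python
# Hypothetical one-command rollback: redeploy the previous known-good version
# through the same pipeline as a forward deployment. `deploy` stands in for
# your platform's deploy call; history would come from the artifact repository.
def deploy(service: str, version: str) -> None:
    print(f"deploying {service}@{version} via the standard pipeline")

def rollback(service: str, history: list[str]) -> str:
    """History is newest-first; redeploy the version before the current one."""
    if len(history) < 2:
        raise RuntimeError("no previous version to roll back to")
    previous = history[1]
    deploy(service, previous)  # validated exactly like a forward deployment
    return previous

rollback("checkout-service", ["2.3.1", "2.3.0", "2.2.9"])
```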
What Is Improved
- Mean Time To Recovery (MTTR): Drops from hours to minutes
- Deployment frequency: Increases due to reduced risk
- Team confidence: Higher willingness to deploy
- Customer satisfaction: Faster incident resolution
- On-call burden: Reduced stress for on-call engineers
Migration Guidance
For detailed guidance on implementing rollback capability, see:
- Rollback - Phase 2 pipeline practice with blue-green, canary, feature flag, and database-safe rollback patterns
Additional Resources
3.9 - Application Configuration
Separate what varies between environments from what does not.
Definition
Application configuration defines the internal behavior of your application and is bundled with the artifact. It does not vary between environments. This is distinct from environment configuration (secrets, URLs, credentials) which varies by deployment.
We embrace The Twelve-Factor App config definitions:
- Application Configuration: Internal to the app, does NOT vary by environment (feature flags, business rules, UI themes, default settings)
- Environment Configuration: Varies by deployment (database URLs, API keys, service endpoints, credentials)
Key Principles
Application configuration should be:
- Version controlled with the source code
- Deployed as part of the immutable artifact
- Testable in the CI pipeline
- Unchangeable after the artifact is built
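A minimal sketch of the split, following the Twelve-Factor guidance above: application config ships inside the artifact as a versioned file, while environment config is injected at deploy time through environment variables. File and variable names are illustrative.

```python
# Minimal sketch of the split. Application config ships inside the artifact
# (a versioned file); environment config is injected via environment
# variables at deploy time. File and variable names are illustrative.
import json
import os

# Bundled with the artifact - identical in every environment.
with open("app_config.json") as f:  # e.g. {"feature_flags": {...}, "page_size": 50}
    app_config = json.load(f)

# Injected per environment - never baked into the artifact.
env_config = {
    "database_url": os.environ["DATABASE_URL"],
    "api_key": os.environ["PAYMENT_API_KEY"],
}
```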
What Is Improved
- Immutability: The artifact tested in staging is identical to what runs in production
- Traceability: You can trace any behavior back to a specific commit
- Testability: Application behavior can be validated in the pipeline before deployment
- Reliability: No configuration drift between environments caused by manual changes
- Faster rollback: Rolling back an artifact rolls back all application configuration changes
Migration Guidance
For detailed guidance on managing application configuration, see:
Additional Resources
4 - Metrics
Detailed definitions for key delivery metrics. Understand what to measure and why.
These metrics help you assess your current delivery performance and track improvement
over time. Not all metrics are equally useful at every stage of a CD migration.
Leading Indicators
Leading indicators reflect the current state of team behaviors. They move immediately
when those behaviors change, making them the most useful metrics for driving improvement
during a CD migration. When a leading indicator is unhealthy, the cause is visible and
addressable today.
DORA Outcome Metrics
The four DORA key metrics are lagging indicators drawn from the DORA research program.
They reflect the cumulative effect of many upstream behaviors and confirm that improvement
work is having the expected systemic effect. Because they are outcome measures, they move
slowly: changes in leading indicator behaviors take weeks or months to surface in these
numbers. Use them to validate the direction of improvement, not to drive it.
4.1 - Integration Frequency
How often developers integrate code changes to the trunk. A leading indicator of CI maturity and small batch delivery.
Definition
Integration Frequency measures the average number of production-ready pull requests
a team merges to trunk per day, normalized by team size. On a team of five
developers, healthy continuous integration practice produces at least five
integrations per day, roughly one per developer.
This metric is a direct indicator of how well a team practices
Continuous Integration.
Teams that integrate frequently work in small batches, receive fast feedback, and
reduce the risk associated with large, infrequent merges.
A value of 1.0 or higher per developer per day indicates that work is being
decomposed into small, independently deliverable increments.
How to Measure
- Count trunk merges. Track the number of pull requests (or direct commits) merged to main or trunk each day.
- Normalize by team size. Divide the daily count by the number of developers actively contributing that day.
- Calculate the rolling average. Use a 5-day or 10-day rolling window to
smooth daily variation and surface meaningful trends.
Most source control platforms expose this data through their APIs:
- GitHub: list merged pull requests via the REST or GraphQL API.
- GitLab: query merged merge requests per project.
- Bitbucket: use the pull request activity endpoint.
Alternatively, count commits to the default branch if pull requests are not used.
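As a concrete starting point, the sketch below uses GitHub’s documented search endpoint to count pull requests merged to trunk today and normalize by team size. The repository name, trunk branch, and token variable are placeholders for your own values.

```python
# Sketch: count PRs merged to trunk today with GitHub's search endpoint and
# normalize by team size. Repo, branch, and token variable are placeholders.
import datetime
import os

import requests

def integrations_per_dev(repo: str, team_size: int) -> float:
    today = datetime.date.today().isoformat()
    resp = requests.get(
        "https://api.github.com/search/issues",
        params={"q": f"repo:{repo} is:pr is:merged base:main merged:{today}"},
        headers={"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}"},
    )
    resp.raise_for_status()
    return resp.json()["total_count"] / team_size

print(integrations_per_dev("acme/shop", team_size=5))
```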
Targets
| Level | Integration Frequency (per developer per day) |
|---|---|
| Low | Less than 1 per week |
| Medium | A few times per week |
| High | Once per day |
| Elite | Multiple times per day |
The elite target aligns with trunk-based development, where developers push small
changes to the trunk multiple times daily and rely on automated testing and feature
flags to manage risk.
Common Pitfalls
- Meaningless commits. Teams may inflate the count by integrating trivial or
empty changes. Pair this metric with code review quality and defect rate.
- Breaking the trunk. Pushing faster without adequate test coverage leads to a
red build and slows the entire team. Always pair Integration Frequency with build
success rate and Change Fail Rate.
- Counting the wrong thing. Merges to long-lived feature branches do not count.
Only merges to the trunk or main integration branch reflect true CI practice.
- Ignoring quality. If defect rates rise as integration
frequency increases, the team is skipping quality steps. Use defect rate as a
guardrail metric.
Connection to CD
Integration Frequency is the foundational metric for Continuous Delivery. Without
frequent integration, every downstream metric suffers:
- Smaller batches reduce risk. Each integration carries less change, making
failures easier to diagnose and fix.
- Faster feedback loops. Frequent integration means the CI pipeline runs more
often, catching issues within minutes instead of days.
- Enables trunk-based development. High integration frequency is incompatible
with long-lived branches. Teams naturally move toward short-lived branches or
direct trunk commits.
- Reduces merge conflicts. The longer code stays on a branch, the more likely
it diverges from trunk. Frequent integration keeps the delta small.
- Prerequisite for deployment frequency. You cannot deploy more often than you
integrate. Improving this metric directly unblocks improvements to
Release Frequency.
To improve Integration Frequency:
4.2 - Build Duration
Time from code commit to a deployable artifact. A leading indicator of feedback speed and the floor for mean time to repair.
Definition
Build Duration measures the elapsed time from when a developer pushes a commit
until the CI pipeline produces a deployable artifact and all automated quality
gates have passed. This includes compilation, unit tests, integration tests, static
analysis, security scans, and artifact packaging.
Build Duration represents the minimum possible time between deciding to make a
change and having that change ready for production. It sets a hard floor on
Lead Time and directly constrains how quickly a team can
respond to production incidents.
This metric is sometimes referred to as “pipeline cycle time” or “CI cycle time.”
The book Accelerate references it as part of “hard lead time.”
How to Measure
- Record the commit timestamp. Capture when the commit arrives at the CI
server (webhook receipt or pipeline trigger time).
- Record the artifact-ready timestamp. Capture when the final pipeline stage
completes successfully and the deployable artifact is published.
- Calculate the difference. Subtract the commit timestamp from the
artifact-ready timestamp.
- Track the median and p95. The median shows typical performance. The 95th
percentile reveals worst-case builds that block developers.
Most CI platforms expose build duration natively:
- GitHub Actions: createdAt and updatedAt on workflow runs.
- GitLab CI: pipeline created_at and finished_at.
- Jenkins: build start time and duration fields.
- CircleCI: workflow duration in the Insights dashboard.
Set up alerts when builds exceed your target threshold so the team can investigate
regressions immediately.
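A sketch of this measurement against the GitHub Actions runs endpoint noted above, reporting the median and p95 of recent successful runs. The repository and token are placeholders, and the sketch assumes at least a handful of completed runs.

```python
# Sketch: median and p95 build duration from recent successful workflow runs,
# using the created_at / updated_at fields mentioned above. Placeholders:
# repo name and GITHUB_TOKEN.
import os
import statistics
from datetime import datetime

import requests

def durations_minutes(repo: str) -> list[float]:
    resp = requests.get(
        f"https://api.github.com/repos/{repo}/actions/runs",
        params={"status": "success", "per_page": 100},
        headers={"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}"},
    )
    resp.raise_for_status()
    out = []
    for run in resp.json()["workflow_runs"]:
        start = datetime.fromisoformat(run["created_at"].replace("Z", "+00:00"))
        end = datetime.fromisoformat(run["updated_at"].replace("Z", "+00:00"))
        out.append((end - start).total_seconds() / 60)
    return sorted(out)  # assumes at least a few completed runs

d = durations_minutes("acme/shop")
print(f"median: {statistics.median(d):.1f} min, p95: {d[int(len(d) * 0.95)]:.1f} min")
```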
Targets
| Level | Build Duration |
|---|---|
| Low | More than 30 minutes |
| Medium | 10 to 30 minutes |
| High | 5 to 10 minutes |
| Elite | Less than 5 minutes |
The ten-minute threshold is a widely recognized guideline. Builds longer than ten
minutes break developer flow, discourage frequent integration, and increase the
cost of fixing failures.
Common Pitfalls
- Removing tests to hit targets. Reducing test count or skipping test types
(integration, security) lowers build duration but degrades quality. Always pair
this metric with Change Fail Rate and defect rate.
- Ignoring queue time. If builds wait in a queue before execution, the
developer experiences the queue time as part of the feedback delay even though it
is not technically “build” time. Measure wall-clock time from commit to result.
- Optimizing the wrong stage. Profile the pipeline before optimizing. Often a
single slow test suite or a sequential step that could run in parallel dominates
the total duration.
- Flaky tests. Tests that intermittently fail cause retries, effectively
doubling or tripling build duration. Track flake rate alongside build duration.
Connection to CD
Build Duration is a critical bottleneck in the Continuous Delivery pipeline:
- Constrains Mean Time to Repair. When production is down, the build pipeline
is the minimum time to get a fix deployed. A 30-minute build means at least 30
minutes of downtime for any fix, no matter how small. Reducing build duration
directly improves MTTR.
- Enables frequent integration. Developers are unlikely to integrate multiple
times per day if each integration takes 30 minutes to validate. Short builds
encourage higher Integration Frequency.
- Shortens feedback loops. The sooner a developer learns that a change broke
something, the less context they have lost and the cheaper the fix. Builds under
ten minutes keep developers in flow.
- Supports continuous deployment. Automated deployment pipelines cannot deliver
changes rapidly if the build stage is slow. Build duration is often the largest
component of Lead Time.
To improve Build Duration:
- Parallelize stages. Run unit tests, linting, and security scans concurrently
rather than sequentially.
- Replace slow end-to-end tests. Move heavyweight end-to-end tests to an
asynchronous post-deploy verification stage. Use contract tests and service
virtualization in the main pipeline.
- Decompose large services. Smaller codebases compile and test faster. If build
duration is stubbornly high, consider breaking the service into smaller domains.
- Cache aggressively. Cache dependencies, Docker layers, and compilation
artifacts between builds.
- Set a build time budget. Alert the team whenever a new test or step pushes
the build past your target, so test efficiency is continuously maintained.
4.3 - Development Cycle Time
Average time from when work starts until it is running in production. A leading indicator of batch size and delivery flow.
Definition
Development Cycle Time measures the elapsed time from when a developer begins work
on a story or task until that work is deployed to production and available to users.
It captures the full construction phase of delivery: coding, code review, testing,
integration, and deployment.
This is distinct from Lead Time, which includes the time a request
spends waiting in the backlog before work begins. Development Cycle Time focuses
exclusively on the active delivery phase.
The Accelerate research uses “lead time for changes” (measured from commit to
production) as a key DORA metric. Development Cycle Time extends this slightly
further back to when work starts, capturing the full development process including
any time between starting work and the first commit.
How to Measure
- Record when work starts. Capture the timestamp when a story moves to
“In Progress” in your issue tracker, or when the first commit for the story
appears.
- Record when work reaches production. Capture the timestamp of the
production deployment that includes the completed story.
- Calculate the difference. Subtract the start time from the production
deploy time.
- Report the median and distribution. The median provides a typical value.
The distribution (or a control chart) reveals variability and outliers that
indicate process problems.
Sources for this data include:
- Issue trackers (Jira, GitHub Issues, Azure Boards): status transition
timestamps.
- Source control: first commit timestamp associated with a story.
- Deployment logs: timestamp of production deployments linked to stories.
Linking stories to deployments is essential. Use commit message conventions (e.g.,
story IDs in commit messages) or deployment metadata to create this connection.
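Once stories are linked to deployments, the calculation itself is simple. A minimal sketch with illustrative data, where the start timestamp comes from the “In Progress” transition and the end timestamp from the production deploy:

```python
# Sketch: median development cycle time from (started, deployed) pairs.
# `started` would come from the issue tracker's "In Progress" transition,
# `deployed` from deployment metadata. Data here is illustrative.
import statistics
from datetime import datetime

stories = [
    ("PROJ-101", "2024-03-01T09:00", "2024-03-02T15:30"),
    ("PROJ-102", "2024-03-01T10:00", "2024-03-05T11:00"),
    ("PROJ-103", "2024-03-04T08:30", "2024-03-04T17:45"),
]

cycle_days = [
    (datetime.fromisoformat(done) - datetime.fromisoformat(start)).total_seconds() / 86400
    for _, start, done in stories
]
print(f"median cycle time: {statistics.median(cycle_days):.1f} days")
```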
Targets
| Level | Development Cycle Time |
|---|---|
| Low | More than 2 weeks |
| Medium | 1 to 2 weeks |
| High | 2 to 7 days |
| Elite | Less than 2 days |
Elite teams deliver completed work to production within one to two days of starting
it. This is achievable only when work is decomposed into small increments, the
pipeline is fast, and deployment is automated.
Common Pitfalls
- Marking work “Done” before it reaches production. If “Done” means “code
complete” rather than “deployed,” the metric understates actual cycle time. The
Definition of Done must include production deployment.
- Skipping the backlog. Moving items from “Backlog” directly to “Done” after
deploying hides the true wait time and development duration. Ensure stories pass
through the standard workflow stages.
- Splitting work into functional tasks. Breaking a story into separate
“development,” “testing,” and “deployment” tasks obscures the end-to-end cycle
time. Measure at the story or feature level.
- Ignoring variability. A low average can hide a bimodal distribution where
some stories take hours and others take weeks. Use a control chart or histogram
to expose the full picture.
- Optimizing for speed without quality. If cycle time drops but
Change Fail Rate rises, the team is cutting corners.
Use quality metrics as guardrails.
Connection to CD
Development Cycle Time is the most comprehensive measure of delivery flow and sits
at the heart of Continuous Delivery:
- Exposes bottlenecks. A long cycle time reveals where work gets stuck:
waiting for code review, queued for testing, blocked by a manual approval, or
delayed by a slow pipeline. Each bottleneck is a target for improvement.
- Drives smaller batches. The only way to achieve a cycle time under two days
is to decompose work into very small increments. This naturally leads to smaller
changes, less risk, and faster feedback.
- Reduces waste from changing priorities. Long cycle times mean work in progress
is exposed to priority changes, context switches, and scope creep. Shorter cycles
reduce the window of vulnerability.
- Improves feedback quality. The sooner a change reaches production, the sooner
the team gets real user feedback. Short cycle times enable rapid learning and
course correction.
- Subsumes other metrics. Cycle time is affected by Integration
Frequency, Build Duration,
and Work in Progress. Improving any of these upstream
metrics will reduce cycle time.
To improve Development Cycle Time:
- Decompose work into stories that can be completed and deployed within one to two
days.
- Remove handoffs between teams (e.g., separate dev and QA teams).
- Automate the build and deploy pipeline to eliminate manual steps.
- Improve test design so the pipeline runs faster without sacrificing coverage.
- Limit Work in Progress so the team focuses on finishing
work rather than starting new items.
4.4 - Lead Time
Total time from when a change is committed until it is running in production. A DORA lagging outcome metric for pipeline efficiency.
Definition
Lead Time measures the total elapsed time from when a code change is committed to
the version control system until that change is successfully running in production.
This is one of the four key metrics identified by the DORA (DevOps Research and
Assessment) team as a predictor of software delivery performance. Lead Time is a lagging
outcome metric: it reflects the cumulative effect of pipeline automation, work decomposition,
and integration practices. Improving Build Duration and
Integration Frequency are the leading indicators to address first.
In the broader value stream, “lead time” can also refer to the time from a customer
request to delivery. The DORA definition focuses specifically on the segment from
commit to production, which the Accelerate research calls “lead time for changes.”
This narrower definition captures the efficiency of your delivery pipeline and
deployment process.
Lead Time includes Build Duration plus any additional time
for deployment, approval gates, environment provisioning, and post-deploy
verification. It is a superset of build time and a subset of
Development Cycle Time, which also includes the
coding phase before the first commit.
How to Measure
- Record the commit timestamp. Use the timestamp of the commit as recorded in
source control (not the local author timestamp, but the time it was pushed or
merged to the trunk).
- Record the production deployment timestamp. Capture when the deployment
containing that commit completes successfully in production.
- Calculate the difference. Subtract the commit time from the deploy time.
- Aggregate across commits. Report the median lead time across all commits
deployed in a given period (daily, weekly, or per release).
Data sources:
- Source control: commit or merge timestamps from Git, GitHub, GitLab, etc.
- Pipeline platform: pipeline completion times from Jenkins, GitHub Actions,
GitLab CI, etc.
- Deployment tooling: production deployment timestamps from Argo CD, Spinnaker,
Flux, or custom scripts.
For teams practicing continuous deployment, lead time may be nearly identical to
build duration. For teams with manual approval gates or scheduled release windows,
lead time will be significantly longer.
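One low-tooling way to compute this is from the git history of a release: every commit newly included in a deployed revision shares that deployment’s timestamp. The sketch below assumes a git checkout; the tag names and deploy time are placeholders for values your deployment tooling would supply.

```python
# Sketch: median lead time for the commits first included in a release.
# Tag names and the deploy timestamp are placeholders.
import statistics
import subprocess
from datetime import datetime, timezone

def lead_times_hours(prev_tag: str, tag: str, deployed_at: datetime) -> list[float]:
    # %ct = committer timestamp (Unix epoch) for each commit in the range.
    out = subprocess.check_output(
        ["git", "log", "--format=%ct", f"{prev_tag}..{tag}"], text=True
    )
    return [
        (deployed_at - datetime.fromtimestamp(int(ts), tz=timezone.utc)).total_seconds() / 3600
        for ts in out.split()
    ]

hours = lead_times_hours("v41", "v42", datetime.now(timezone.utc))
print(f"median lead time: {statistics.median(hours):.1f} h")
```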
Targets
| Level | Lead Time for Changes |
|---|---|
| Low | More than 6 months |
| Medium | 1 to 6 months |
| High | 1 day to 1 week |
| Elite | Less than 1 hour |
These levels are drawn from the DORA State of DevOps research. Elite performers
deliver changes to production in under an hour from commit, enabled by fully
automated pipelines and continuous deployment.
Common Pitfalls
- Measuring only build time. Lead time includes everything after the commit,
not just the CI pipeline. Manual approval gates, scheduled deployment windows,
and environment provisioning delays must all be included.
- Ignoring waiting time. A change may sit in a queue waiting for a release
train, a change advisory board (CAB) review, or a deployment window. This wait
time is part of lead time and often dominates the total.
- Tracking requests instead of commits. Some teams measure from customer request
to delivery. While valuable, this conflates backlog prioritization with delivery
efficiency. Keep this metric focused on the commit-to-production segment.
- Hiding items from the backlog. Requests tracked in spreadsheets or side
channels before entering the backlog distort lead time measurements. Ensure all
work enters the system of record promptly.
- Reducing quality to reduce lead time. Shortening approval processes or
skipping test stages reduces lead time at the cost of quality. Pair this metric
with Change Fail Rate as a guardrail.
Connection to CD
Lead Time is one of the four DORA metrics and a direct measure of your delivery
pipeline’s end-to-end efficiency:
- Reveals pipeline bottlenecks. A large gap between build duration and lead time
points to manual processes, approval queues, or deployment delays that the team
can target for automation.
- Measures the cost of failure recovery. When production breaks, lead time is
the minimum time to deliver a fix (unless you roll back). This makes lead time
a direct input to Mean Time to Repair.
- Drives automation. The primary way to reduce lead time is to automate every
step between commit and production: build, test, security scanning, environment
provisioning, deployment, and verification.
- Reflects deployment strategy. Teams using continuous deployment have lead
times measured in minutes. Teams using weekly release trains have lead times
measured in days. The metric makes the cost of batching visible.
- Connects speed and stability. The DORA research shows that elite performers
achieve both low lead time and low Change Fail Rate.
Speed and quality are not trade-offs. They reinforce each other when the
delivery system is well-designed.
To improve Lead Time:
- Automate the deployment pipeline end to end, eliminating manual gates.
- Replace change advisory board (CAB) reviews with automated policy checks and
peer review.
- Deploy on every successful build rather than batching changes into release trains.
- Reduce Build Duration to shrink the largest component of
lead time.
- Monitor and eliminate environment provisioning delays.
4.5 - Change Fail Rate
Percentage of production deployments that cause a failure or require remediation. A DORA lagging outcome metric for delivery stability.
Definition
Change Fail Rate measures the percentage of deployments to production that result
in degraded service, negative customer impact, or require immediate remediation
such as a rollback, hotfix, or patch.
A “failed change” includes any deployment that:
- Is rolled back.
- Requires a hotfix deployed within a short window (commonly 24 hours).
- Triggers a production incident attributed to the change.
- Requires manual intervention to restore service.
This is one of the four DORA key metrics. It measures the stability side of
delivery performance, complementing the throughput metrics of
Lead Time and Release Frequency.
Change Fail Rate is a lagging outcome metric: it reflects the cumulative quality of your
test coverage, change size practices, and pipeline gates. The leading indicator to improve
first is Integration Frequency, since smaller batches
fail less often and are easier to diagnose.
How to Measure
- Count total production deployments over a defined period (weekly, monthly).
- Count deployments classified as failures using the criteria above.
- Divide failures by total deployments and express as a percentage.
Data sources:
- Deployment logs: total deployment count from your CD platform.
- Incident management: incidents linked to specific deployments (PagerDuty,
Opsgenie, ServiceNow).
- Rollback records: deployments that were reverted, either manually or by
automated rollback.
- Hotfix tracking: deployments tagged as hotfixes or emergency changes.
Automate the classification where possible. For example, if a deployment is
followed by another deployment of the same service within a defined window (e.g.,
one hour), flag the original as a potential failure for review.
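The redeploy-within-a-window heuristic just described is only a few lines of code. A sketch, with an illustrative one-hour window and timestamps; flagged deployments still need human review before being counted as failures:

```python
# Sketch of the heuristic above: a deployment followed by another deployment
# of the same service within the window is flagged as a potential failure.
# Window and timestamps are illustrative; flagged items still need review.
from datetime import datetime, timedelta

WINDOW = timedelta(hours=1)

def flag_suspects(deploys: list[datetime]) -> list[datetime]:
    """Deploys for a single service, sorted ascending."""
    return [first for first, second in zip(deploys, deploys[1:])
            if second - first <= WINDOW]

deploys = [
    datetime(2024, 3, 1, 9, 0),   # followed 40 minutes later -> flagged
    datetime(2024, 3, 1, 9, 40),
    datetime(2024, 3, 1, 15, 0),
]
print(flag_suspects(deploys))
```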
Targets
| Level | Change Fail Rate |
|---|---|
| Low | 46 to 60% |
| Medium | 16 to 45% |
| High | 0 to 15% |
| Elite | 0 to 5% |
These levels are drawn from the DORA State of DevOps research. Elite performers
maintain a change fail rate below 5%, meaning fewer than 1 in 20 deployments causes
a problem.
Common Pitfalls
- Not recording failures. Deploying fixes without logging the original failure
understates the true rate. Ensure every incident and rollback is tracked.
- Reclassifying defects. Creating review processes that reclassify production
defects as “feature requests” or “known limitations” hides real failures.
- Inflating deployment count. Re-deploying the same working version to increase
the denominator artificially lowers the rate. Only count deployments that contain
new changes.
- Pursuing zero defects at the cost of speed. An obsessive focus on eliminating
all failures can slow Release Frequency to a crawl. A
small failure rate with fast recovery is preferable to near-zero failures with
monthly deployments.
- Ignoring near-misses. Changes that cause degraded performance but do not
trigger a full incident are still failures. Define clear criteria for what
constitutes a failed change and apply them consistently.
Connection to CD
Change Fail Rate is the primary quality signal in a Continuous Delivery pipeline:
- Validates pipeline quality gates. A rising change fail rate indicates that
the automated tests, security scans, and quality checks in the pipeline are not
catching enough defects. Each failure is an opportunity to add or improve a
quality gate.
- Enables confidence in frequent releases. Teams will only deploy frequently
if they trust the pipeline. A low change fail rate builds this trust and
supports higher Release Frequency.
- Smaller changes fail less. The DORA research consistently shows that smaller,
more frequent deployments have lower failure rates than large, infrequent
releases. Improving Integration Frequency naturally
improves this metric.
- Drives root cause analysis. Each failed change should trigger a blameless
investigation: what automated check could have caught this? The answers feed
directly into pipeline improvements.
- Balances throughput metrics. Change Fail Rate is the essential guardrail for
Lead Time and Release Frequency. If
those metrics improve while change fail rate worsens, the team is trading quality
for speed.
To improve Change Fail Rate:
- Deploy smaller changes more frequently to reduce the blast radius of failures.
- Identify the root cause of each failure and add automated checks to prevent
recurrence.
- Strengthen the test suite, particularly integration and contract tests that
validate interactions between services.
- Implement progressive delivery (canary releases, feature flags) to limit the
impact of defective changes before they reach all users.
- Conduct blameless post-incident reviews and feed learnings back into the
delivery pipeline.
4.6 - Mean Time to Repair
Average time from when a production incident is detected until service is restored. A DORA lagging outcome metric for recovery capability.
Definition
Mean Time to Repair (MTTR) measures the average elapsed time between when a
production incident is detected and when it is fully resolved and service is
restored to normal operation.
MTTR reflects an organization’s ability to recover from failure. It encompasses
detection, diagnosis, fix development, build, deployment, and verification. A
short MTTR depends on the entire delivery system working well: fast builds,
automated deployments, good observability, and practiced incident response.
The Accelerate research identifies MTTR as one of the four key DORA metrics and
notes that “software delivery performance is a combination of lead time, release
frequency, and MTTR.” It is the stability counterpart to the throughput metrics.
MTTR is a lagging outcome metric: it reflects the combined effectiveness of observability,
rollback capability, pipeline speed, and incident response practices. The leading indicators
to address first are Build Duration (which sets the floor
on how fast a fix can be deployed) and Release Frequency
(teams that deploy often have well-rehearsed recovery procedures).
How to Measure
- Record the detection timestamp. This is when the team first becomes aware of
the incident, typically when an alert fires, a customer reports an issue, or
monitoring detects an anomaly.
- Record the resolution timestamp. This is when the incident is resolved and
service is confirmed to be operating normally. Resolution means the customer
impact has ended, not merely that a fix has been deployed.
- Calculate the duration for each incident.
- Compute the average across all incidents in a given period.
Data sources:
- Incident management platforms: PagerDuty, Opsgenie, ServiceNow, or
Statuspage provide incident lifecycle timestamps.
- Monitoring and alerting: alert trigger times from Datadog, Prometheus
Alertmanager, CloudWatch, or equivalent.
- Deployment logs: timestamps of rollbacks or hotfix deployments.
Report both the mean and the median. The mean can be skewed by a single long
outage, so the median gives a better sense of typical recovery time. Also track
the maximum MTTR per period to highlight worst-case incidents.
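The reporting itself is straightforward once incident timestamps are available. A sketch with illustrative data, printing the mean, median, and worst case:

```python
# Sketch: mean, median, and worst-case recovery time from incident records.
# Detection/resolution timestamps would come from your incident platform;
# the data here is illustrative.
import statistics
from datetime import datetime

incidents = [  # (detected, resolved)
    ("2024-03-02T10:15", "2024-03-02T10:52"),
    ("2024-03-09T23:05", "2024-03-10T01:40"),
    ("2024-03-17T14:00", "2024-03-17T14:25"),
]

minutes = [
    (datetime.fromisoformat(end) - datetime.fromisoformat(start)).total_seconds() / 60
    for start, end in incidents
]
print(f"mean: {statistics.mean(minutes):.0f} min, "
      f"median: {statistics.median(minutes):.0f} min, "
      f"max: {max(minutes):.0f} min")
```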
Targets
| Level | Mean Time to Repair |
|---|---|
| Low | More than 1 week |
| Medium | 1 day to 1 week |
| High | Less than 1 day |
| Elite | Less than 1 hour |
Elite performers restore service in under one hour. This requires automated
rollback or roll-forward capability, fast build pipelines, and well-practiced
incident response processes.
Common Pitfalls
- Closing incidents prematurely. Marking an incident as resolved before the
customer impact has actually ended artificially deflates MTTR. Define “resolved”
clearly and verify that service is truly restored.
- Not counting detection time. If the team discovers a problem informally
(e.g., a developer notices something odd) and fixes it before opening an
incident, the time is not captured. Encourage consistent incident reporting.
- Ignoring recurring incidents. If the same issue keeps reappearing, each
individual MTTR may be short, but the cumulative impact is high. Track recurrence
as a separate quality signal.
- Conflating MTTR with MTTD. Mean Time to Detect (MTTD) and Mean Time to
Repair overlap but are distinct. If you only measure from alert to resolution,
you miss the detection gap, the time between when the problem starts and when
it is detected. Both matter.
- Optimizing MTTR without addressing root causes. Getting faster at fixing
recurring problems is good, but preventing those problems in the first place is
better. Pair MTTR with Change Fail Rate to ensure the
number of incidents is also decreasing.
Connection to CD
MTTR is a direct measure of how well the entire Continuous Delivery system supports
recovery:
- Pipeline speed is the floor. The minimum possible MTTR for a roll-forward
fix is the Build Duration plus deployment time. A 30-minute
build means you cannot restore service via a code fix in less than 30 minutes.
Reducing build duration directly reduces MTTR.
- Automated deployment enables fast recovery. Teams that can deploy with one
click or automatically can roll back or roll forward in minutes. Manual
deployment processes add significant time to every incident.
- Feature flags accelerate mitigation. If a failing change is behind a feature
flag, the team can disable it in seconds without deploying new code. This can
reduce MTTR from minutes to seconds for flag-protected changes.
- Observability shortens detection and diagnosis. Good logging, metrics, and
tracing help the team identify the cause of an incident quickly. Without
observability, diagnosis dominates the repair timeline.
- Practice improves performance. Teams that deploy frequently have more
experience responding to issues. High Release Frequency
correlates with lower MTTR because the team has well-rehearsed recovery
procedures.
- Trunk-based development simplifies rollback. When trunk is always deployable,
the team can roll back to the previous commit. Long-lived branches and complex
merge histories make rollback risky and slow.
To improve MTTR:
- Keep the pipeline always deployable so a fix can be deployed at any time.
- Reduce Build Duration to enable faster roll-forward.
- Implement feature flags for large changes so they can be disabled without
redeployment.
- Invest in observability: structured logging, distributed tracing, and
meaningful alerting.
- Practice incident response regularly, including deploying rollbacks and hotfixes.
- Conduct blameless post-incident reviews and feed learnings back into the pipeline
and monitoring.
4.7 - Release Frequency
How often changes are deployed to production. A DORA lagging outcome metric that confirms delivery throughput.
Definition
Release Frequency (also called Deployment Frequency) measures how often a team
successfully deploys changes to production. It is expressed as deployments per day,
per week, or per month, depending on the team’s current cadence.
This is one of the four DORA key metrics and a lagging outcome metric. It reflects the
cumulative effect of upstream behaviors: work decomposition, integration practices, test
quality, and pipeline automation. Higher release frequency is a consequence of those behaviors
improving, not a lever to pull directly. To improve release frequency, improve
Integration Frequency and
Development Cycle Time first.
Each deployment should deliver a meaningful change. Re-deploying the same artifact
or deploying empty changes does not count.
How to Measure
- Count production deployments. Record each successful deployment to the
production environment over a defined period.
- Exclude non-changes. Do not count re-deployments of unchanged artifacts,
infrastructure-only changes (unless relevant), or deployments to non-production
environments.
- Calculate frequency. Divide the count by the time period. Express as
deployments per day (for high performers) or per week/month (for teams earlier
in their journey).
Data sources:
- CD platforms: Argo CD, Spinnaker, Flux, Octopus Deploy, or similar tools
track every deployment.
- Pipeline logs: GitHub Actions, GitLab CI, Jenkins, and CircleCI
record deployment job executions.
- Cloud provider logs: AWS CodeDeploy, Azure DevOps, GCP Cloud Deploy, and
Kubernetes audit logs.
- Custom deployment scripts: Add a logging line that records the timestamp,
service name, and version to a central log or metrics system.
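With a timestamp per meaningful deployment, frequency per period is a short aggregation. A sketch with illustrative data, grouping by ISO week:

```python
# Sketch: deployments per ISO week from a list of production deploy
# timestamps (one entry per meaningful deployment). Data is illustrative.
from collections import Counter
from datetime import datetime

deploy_times = [
    "2024-03-04T10:02", "2024-03-04T16:40", "2024-03-06T09:15",
    "2024-03-12T11:30", "2024-03-14T15:05",
]

per_week = Counter(datetime.fromisoformat(t).strftime("%G-W%V") for t in deploy_times)
for week, count in sorted(per_week.items()):
    print(f"{week}: {count} deployments")
```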
Targets
| Level | Release Frequency |
|---|---|
| Low | Less than once per 6 months |
| Medium | Once per month to once per 6 months |
| High | Once per week to once per month |
| Elite | Multiple times per day |
These levels are drawn from the DORA State of DevOps research. Elite performers
deploy on demand, multiple times per day, with each deployment containing a small
set of changes.
Common Pitfalls
- Counting empty deployments. Re-deploying the same artifact or building
artifacts that contain no changes inflates the metric without delivering value.
Count only deployments with meaningful changes.
- Ignoring failed deployments. If you count deployments that are immediately
rolled back, the frequency looks good but the quality is poor. Pair with
Change Fail Rate to get the full picture.
- Equating frequency with value. Deploying frequently is a means, not an end.
Deploying 10 times a day delivers no value if the changes do not meet user needs.
Release Frequency measures capability, not outcome.
- Batch releasing to hit a target. Combining multiple changes into a single
release to deploy “more often” defeats the purpose. The goal is small, individual
changes flowing through the pipeline independently.
- Focusing on speed without quality. If release frequency increases but
Change Fail Rate also increases, the team is releasing
faster than its quality processes can support. Slow down and improve the pipeline.
Connection to CD
Release Frequency is the ultimate output metric of a Continuous Delivery pipeline:
- Validates the entire delivery system. High release frequency is only possible
when the pipeline is fast, tests are reliable, deployment is automated, and the
team has confidence in the process. It is the end-to-end proof that CD is working.
- Reduces deployment risk. Each deployment carries less change when deployments
are frequent. Less change means less risk, easier rollback, and simpler
debugging when something goes wrong.
- Enables rapid feedback. Frequent releases get features and fixes in front of
users sooner. This shortens the feedback loop and allows the team to course-correct
before investing heavily in the wrong direction.
- Exercises recovery capability. Teams that deploy frequently practice the
deployment process daily. When a production incident occurs, the deployment
process is well-rehearsed and reliable, directly improving
Mean Time to Repair.
- Decouples deploy from release. At high frequency, teams separate the act of
deploying code from the act of enabling features for users. Feature flags,
progressive delivery, and dark launches become standard practice.
To improve Release Frequency:
- Reduce Development Cycle Time by decomposing work
into smaller increments.
- Remove manual handoffs to other teams (e.g., ops, QA, change management).
- Automate every step of the deployment process, from build through production
verification.
- Replace manual change approval boards with automated policy checks and peer
review.
- Convert hard dependencies on other teams or services into soft dependencies using
feature flags and service virtualization.
- Adopt Trunk-Based Development so that
trunk is always in a deployable state.
4.8 - Work in Progress
Number of work items started but not yet completed. A leading indicator of flow problems, context switching, and delivery delays.
Definition
Work in Progress (WIP) is the total count of work items that have been started but
not yet completed and delivered to production. This includes all types of work:
stories, defects, tasks, spikes, and any other items that a team member has begun
but not finished.
WIP is a leading indicator from Lean manufacturing. Unlike trailing metrics such as
Development Cycle Time or
Lead Time, WIP tells you about problems that are happening right
now. High WIP predicts future delivery delays, increased cycle time, and lower
quality.
Little’s Law provides the mathematical relationship: Cycle Time = WIP / Throughput.
If throughput (the rate at which items are completed) stays constant, increasing WIP
directly increases cycle time. The only way to reduce cycle time without working
faster is to reduce WIP.
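The relationship is easy to see in code. A minimal sketch, with illustrative numbers:

```python
# Little's Law in code: with throughput held constant, cycle time scales
# linearly with WIP. Numbers are illustrative.
def predicted_cycle_time_days(wip: int, throughput_per_day: float) -> float:
    return wip / throughput_per_day

for wip in (4, 8, 16):
    # A team finishing 2 items/day doubles its cycle time whenever WIP doubles.
    print(f"WIP={wip:>2} -> {predicted_cycle_time_days(wip, 2.0):.1f} days")
```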
How to Measure
- Count all in-progress items. At a regular cadence (daily or at each standup),
count the number of items in any active state on your team’s board. Include
everything between “To Do” and “Done.”
- Normalize by team size. Divide WIP by the number of team members to get a
per-person ratio. This makes the metric comparable across teams of different sizes.
- Track over time. Record the WIP count daily and observe trends. A rising WIP
count is an early warning of delivery problems.
Data sources:
- Kanban boards: Jira, Azure Boards, Trello, GitHub Projects, or physical
boards. Count cards in any column between the backlog and done.
- Issue trackers: Query for items with an “In Progress,” “In Review,”
“In QA,” or equivalent active status.
- Manual count: At standup, ask: “How many things are we actively working on
right now?”
The simplest and most effective approach is to make WIP visible by keeping the team
board up to date and counting active items daily.
Targets
| Level | WIP per Team |
|---|---|
| Low | More than 2x team size |
| Medium | Between 1x and 2x team size |
| High | Equal to team size |
| Elite | Less than team size (ideally half) |
The guiding principle is that WIP should never exceed team size. A team of five
should have at most five items in progress at any time. Elite teams often work
in pairs, bringing WIP to roughly half the team size.
Common Pitfalls
- Hiding work. Not moving items to “In Progress” when working on them keeps
WIP artificially low. The board must reflect reality. If someone is working on
it, it should be visible.
- Marking items done prematurely. Moving items to “Done” before they are
deployed to production understates WIP. The Definition of Done must include
production deployment.
- Creating micro-tasks. Splitting a single story into many small tasks
(development, testing, code review, deployment) and tracking each separately
inflates the item count without changing the actual work. Measure WIP at the
story or feature level.
- Ignoring unplanned work. Production support, urgent requests, and
interruptions consume capacity but are often not tracked on the board. If the
team is spending time on it, it is WIP and should be visible.
- Setting WIP limits but not enforcing them. WIP limits only work if the team
actually stops starting new work when the limit is reached. Treat WIP limits as
a hard constraint, not a suggestion.
Connection to CD
WIP is the most actionable flow metric and directly impacts every aspect of
Continuous Delivery:
- Predicts cycle time. Per Little’s Law, WIP and cycle time are directly
proportional. Reducing WIP is the fastest way to reduce
Development Cycle Time without changing anything
else about the delivery process.
- Reduces context switching. When developers juggle multiple items, they lose
time switching between contexts. Research consistently shows that each additional
item in progress reduces effective productivity. Low WIP means more focus and
faster completion.
- Exposes blockers. When WIP limits are in place and an item gets blocked, the
team cannot simply start something new. They must resolve the blocker first. This
forces the team to address systemic problems rather than working around them.
- Enables continuous flow. CD depends on a steady flow of small changes moving
through the pipeline. High WIP creates irregular, bursty delivery. Low WIP
creates smooth, predictable flow.
- Improves quality. When teams focus on fewer items, each item gets more
attention. Code reviews happen faster, testing is more thorough, and defects are
caught sooner. This naturally reduces Change Fail Rate.
- Supports trunk-based development. High WIP often correlates with many
long-lived branches. Reducing WIP encourages developers to complete and integrate
work before starting something new, which aligns with
Integration Frequency goals.
To reduce WIP:
- Set explicit WIP limits for the team and enforce them. Start with a limit equal
to team size and reduce it over time.
- Prioritize finishing work over starting new work. At standup, ask “What can I
help finish?” before “What should I start?”
- Prioritize code review and pairing to unblock teammates over picking up new items.
- Make the board visible and accurate. Use it as the single source of truth for
what the team is working on.
- Identify and address recurring blockers that cause items to stall in progress.
5 - DORA Recommended Practices
The practices that drive software delivery performance, as identified by DORA research.
The DevOps Research and Assessment (DORA) research program has identified practices that
predict high software delivery performance. These practices are not tools or technologies.
They are cultural conditions and behaviors that enable teams to deliver software quickly,
reliably, and sustainably.
This page organizes the DORA recommended practices by their relevance to each migration phase. Use it
as a reference to understand which practices you are building at each stage of your journey
and which ones to focus on next.
Using This Table
“Primary” means the phase where the practice is the main focus of improvement work.
“Ongoing” means the practice is relevant in every phase and should be continuously
nurtured. “Started” or “Expanded” means the practice is introduced or deepened in that
phase. No entry means the practice is not a primary concern in that phase, though it may
still be relevant.
Practice Maturity by Phase
| Practice | Phase 0 | Phase 1 | Phase 2 | Phase 3 | Phase 4 |
|---|---|---|---|---|---|
| Version control | Prerequisite | | | | |
| Continuous integration | | Primary | | | |
| Deployment automation | | | Primary | | |
| Trunk-based development | | Primary | | | |
| Test automation | | Primary | Expanded | | |
| Test data management | | | Primary | | |
| Shift left on security | | | Primary | | |
| Loosely coupled architecture | | | | Primary | |
| Empowered teams | Ongoing | Ongoing | Ongoing | Ongoing | Ongoing |
| Customer feedback | | | | | Primary |
| Value stream visibility | Primary | | | Revisited | |
| Working in small batches | | Started | | Primary | |
| Team experimentation | Ongoing | Ongoing | Ongoing | Ongoing | Ongoing |
| Limit WIP | | | | Primary | |
| Visual management | Started | Ongoing | Ongoing | Ongoing | Ongoing |
| Monitoring and observability | | | Started | Expanded | Primary |
| Proactive notification | | | | | Primary |
| Generative culture | Ongoing | Ongoing | Ongoing | Ongoing | Ongoing |
| Learning culture | Ongoing | Ongoing | Ongoing | Ongoing | Ongoing |
| Collaboration among teams | | Started | Primary | | |
| Job satisfaction | Ongoing | Ongoing | Ongoing | Ongoing | Ongoing |
| Transformational leadership | Ongoing | Ongoing | Ongoing | Ongoing | Ongoing |
Continuous Delivery Practices
These practices directly support the mechanics of getting software from commit to production.
They are the primary focus of Phases 1 and 2 of the migration.
Version Control
All production artifacts (application code, test code, infrastructure configuration,
deployment scripts, and database schemas) are stored in version control and can be
reproduced from a single source of truth.
Migration relevance: This is a prerequisite for Phase 1. If any part of your delivery
process depends on files stored on a specific person’s machine or a shared drive, address that
before beginning the migration.
Continuous Integration
Developers integrate their work to trunk at least daily. Each integration triggers an
automated build and test process. Broken builds are fixed within minutes.
Migration relevance: Phase 1: Foundations. CI is the gateway
practice. Without it, none of the pipeline practices in Phase 2 can function. See
Build Automation and
Trunk-Based Development.
Deployment Automation
Deployments are fully automated and can be triggered by anyone on the team. No manual steps
are required between a green pipeline and production.
Migration relevance: Phase 2: Pipeline. Specifically,
Single Path to Production and
Rollback.
Trunk-Based Development
Developers work in small batches and merge to trunk at least daily. Branches, if used, are
short-lived (less than one day). There are no long-lived feature branches.
Migration relevance: Phase 1: Trunk-Based Development.
This is one of the first practices to establish because it enables CI.
Test Automation
A comprehensive suite of automated tests provides confidence that the software is deployable.
Tests are reliable, fast, and maintained as carefully as production code.
Migration relevance: Phase 1: Testing Fundamentals.
Also see the Testing reference section for guidance on specific test types.
Test Data Management
Test data is managed in a way that allows automated tests to run independently, repeatably,
and without relying on shared mutable state. Tests can create and clean up their own data.
Migration relevance: Becomes critical during Phase 2 when you need
production-like environments and deterministic pipeline results.
Shift Left on Security
Security is integrated into the development process rather than added as a gate at the end.
Automated security checks run in the pipeline. Security requirements are part of the
definition of deployable.
Migration relevance: Integrated during Phase 2: Pipeline Architecture
as automated quality gates rather than manual review steps.
Architecture Practices
These practices address the structural characteristics of your system that enable or prevent
independent, frequent deployment.
Loosely Coupled Architecture
Teams can deploy their services independently without coordinating with other teams. Changes
to one service do not require changes to other services. APIs have well-defined contracts.
Migration relevance: Phase 3: Architecture Decoupling.
This practice becomes critical when optimizing for deployment frequency and small batch sizes.
Product and Process Practices
These practices address how work is planned, prioritized, and delivered.
Customer Feedback
Product decisions are informed by direct feedback from customers. Teams can observe how
features are used in production and adjust accordingly.
Migration relevance: Becomes fully enabled in Phase 4: Deliver on Demand
when every change reaches production quickly enough for real customer feedback to inform
the next change.
Value Stream Visibility
The team has a clear view of the entire delivery process from request to production, including
wait times, handoffs, and rework loops.
Migration relevance: Phase 0: Value Stream Mapping.
This is the first activity in the migration because it informs every decision that follows.
Working in Small Batches
Work is broken down into small increments that can be completed, tested, and deployed
independently. Each increment delivers measurable value or validated learning.
Migration relevance: Begins in Phase 1: Work Decomposition
and is optimized in Phase 3: Small Batches.
Limit Work in Progress
Teams have explicit WIP limits that constrain the number of items in any stage of the delivery
process. WIP limits are enforced and respected.
Migration relevance: Phase 3: Limiting WIP. Reducing WIP
is one of the most effective ways to improve lead time and delivery predictability.
Visual Management
The state of all work is visible to the entire team through dashboards, boards, or other
visual tools. Anyone can see what is in progress, what is blocked, and what has been deployed.
Migration relevance: All phases. Visual management supports the identification of
constraints in Phase 0 and the enforcement of WIP limits in Phase 3.
Monitoring and Observability
Teams have access to production metrics, logs, and traces that allow them to understand system
behavior, detect issues, and diagnose problems quickly.
Migration relevance: Critical for Phase 4: Progressive Rollout
where automated health checks determine whether a deployment proceeds or rolls back. Also
supports fast mean time to restore.
Proactive Notification
Teams are alerted to problems before customers are affected. Monitoring thresholds and
anomaly detection trigger notifications that enable rapid response.
Migration relevance: Becomes critical in Phase 4 when deployments are continuous and
automated. Proactive notification is what makes continuous deployment safe.
Collaboration Among Teams
Development, operations, security, and product teams work together rather than in silos.
Handoffs are minimized. Shared responsibility replaces blame.
Migration relevance: All phases, but especially Phase 2: Pipeline
where the pipeline must encode the quality criteria from all disciplines (security, testing,
operations) into automated gates.
Practices Relevant in Every Phase
The following practices are not tied to a specific migration phase. They are conditions
that support every phase and should be cultivated continuously throughout the migration.
Empowered Teams. Teams choose their own tools, technologies, and approaches within
organizational guardrails. Teams that cannot make local decisions about their pipeline, test
strategy, or deployment approach will be unable to iterate quickly enough to make progress.
Team Experimentation. Teams can try new ideas, tools, and approaches without requiring
lengthy approval. Failed experiments are treated as learning, not waste. The migration itself
is an experiment that requires psychological safety and organizational support.
Generative Culture. Following Ron Westrum’s typology, a generative culture is characterized
by high cooperation, shared risk, and focus on the mission. Teams in pathological or
bureaucratic cultures will struggle with every phase because practices like TBD and CI require
trust and psychological safety.
Learning Culture. The organization invests in learning. Teams have time for experimentation,
training, and knowledge sharing. The CD migration is a learning journey that requires time and
space to learn new practices, make mistakes, and improve.
Job Satisfaction. Team members find their work meaningful and have the autonomy and resources
to do it well. The migration should improve job satisfaction by reducing
toil and giving teams faster feedback. If the migration is experienced as a
burden, something is wrong with the approach.
Transformational Leadership. Leaders support the migration with vision, resources, and
organizational air cover. Without leadership support, the migration will stall when it
encounters the first organizational blocker.
6 - CD Dependency Tree
Visual guide showing how CD practices depend on and build upon each other.
The full interactive dependency tree is at
practices.minimumcd.org. This page summarizes the key
dependency chains and how they map to the migration phases in this guide.
Continuous delivery is not a single practice you adopt. It is a system of interdependent
practices where each one supports and enables others. Understanding these dependencies helps
you plan your migration in the right order, addressing foundational practices before building
on them.
Using the Tree to Diagnose Problems
When something in your delivery process is not working, trace it through the dependency tree
to find the root cause.
Deployments keep failing.
Look at what feeds CD in the tree. Is your pipeline deterministic? Are you using immutable artifacts? Is your application config externalized? The failure is likely in one of the
pipeline practices.
CI builds are constantly broken.
Look at what feeds CI. Are developers actually practicing TBD (integrating daily)? Is the test
suite reliable, or is it full of flaky tests? Is the build automated end-to-end? The broken
builds are a symptom of a problem in the development practices layer.
You cannot reduce batch size.
Look at what feeds small batches. Is work being decomposed into vertical slices? Are feature flags available so partial work can be deployed safely? Is the architecture decoupled enough
to allow independent deployment? The batch size problem originates in one of these upstream
practices.
Every feature requires cross-team coordination to deploy.
Look at team structure. Are teams organized around domains they can deliver independently, or
around technical layers that force handoffs for every feature? If deploying a feature requires
the frontend team, backend team, and DBA team to coordinate a release window, the team
structure is preventing independent delivery. No amount of pipeline automation fixes this.
The team boundaries need to change.
Migration Tip
When you encounter a problem, resist the urge to fix the symptom. Use the
dependency tree to trace the problem to its root cause.
Fixing the symptom (for example, adding more manual testing to catch deployment failures) will
not solve the underlying issue and often adds toil that makes things worse. Fix the dependency
that is broken, and the downstream problem resolves itself.
Mapping to Migration Phases
The dependency tree directly informs the sequencing of migration phases:
| Dependency Layer | Migration Phase | Why This Order |
|---|
| Development practices (BDD, trunk-based development) | Phase 1 - Foundations | These are prerequisites for CI, which is a prerequisite for everything else |
| Build and test infrastructure (build automation, automated testing, test environments) | Phase 1 and Phase 2 | You need reliable build and test infrastructure before you can build a reliable pipeline |
| Pipeline practices (application pipeline, immutable artifacts, configuration management, rollback) | Phase 2 - Pipeline | The pipeline depends on solid CI and development practices |
| Flow optimization (small batches, feature flags, WIP limits, metrics) | Phase 3 - Optimize | Optimization requires a working pipeline to optimize |
| Organizational practices (cross-functional teams, component ownership, developer-driven support) | All phases | These cross-cutting practices support every phase. Team structure should be addressed early because it constrains architecture and work decomposition |
Understanding the Dependency Model
How Dependencies Work
CD sits at the top of the tree. It depends directly on many practices, each of which has its own
dependencies. When practice A depends on practice B, it means B is a prerequisite or enabler
for A. You cannot reliably adopt A without B in place.
For example, continuous delivery depends directly on:
| Category | Direct Dependencies |
|---|
| Pipeline | Application pipeline, immutable artifacts, on-demand rollback, configuration management |
| Testing | Continuous testing, automated database changes, test environments |
| Integration | Continuous integration |
| Environment | Automated environment provisioning, monitoring and alerting |
| Organizational | Cross-functional product teams, developer-driven support, prioritized features |
| Development | ATDD, modular system design |
Each of these has its own dependency chain. The application pipeline alone depends on automated
testing, deployment automation, automated artifact versioning, and quality gates. Automated
testing in turn depends on build automation. Build automation depends on version control and
dependency management. The chain runs deep.
Key Dependency Chains
BDD enables testing enables CI enables CD
Behavior-Driven Development produces clear, testable acceptance criteria. Those criteria drive
component testing and acceptance test-driven development. A comprehensive, fast test suite
enables Continuous Integration with confidence. And CI is the foundational prerequisite for CD.
If your team skips BDD, stories are ambiguous. If stories are ambiguous, tests are incomplete
or wrong. If tests are unreliable, CI is unreliable. And if CI is unreliable, CD is impossible.
Trunk-Based Development enables CI
CI requires that all developers integrate to a shared trunk at least once per day. If your team
uses long-lived feature branches, you are not doing CI regardless of how often your build server
runs. TBD is not optional for CD. It is a prerequisite.
Cross-functional teams enable component ownership enables modular systems
How teams are organized determines what they can deliver independently. A team organized around a
domain (owning the services, data, and interfaces for that domain) can decompose work into
vertical slices within their boundary and deploy without
coordinating with other teams. A team organized around a technical layer (the “frontend team,”
the “DBA team”) cannot. Every feature requires handoffs across layer teams, and deployment
requires coordinating all of them.
Conway’s Law makes this structural: the system’s architecture will mirror the team structure.
In the dependency tree, cross-functional product teams enable component ownership, which enables
the modular system design that CD requires.
Version control is the root of everything
Nearly every automation practice traces back to version control. Build automation, configuration
management, infrastructure automation, and component ownership all depend on it. If your version
control practices are weak (infrequent commits, poor branching discipline, configuration stored
outside version control), the entire tree above it is compromised.
7 - Glossary
Key terms and definitions used throughout this guide.
This glossary defines the terms used across every phase of the CD migration guide. Where a term
has a specific meaning within a migration phase, the relevant phase is noted.
For terms related to agentic continuous delivery, AI agents, and LLMs, see the
Agentic CD Glossary.
A
Acceptance Criteria
Concrete expectations for a change, expressed as observable outcomes that can be used as fitness
functions - executed as deterministic tests or evaluated by review agents. In
ACD, acceptance criteria include a done definition (what
“done” looks like from an observer’s perspective) and an evaluation design (test cases with
known-good outputs). They constrain the agent: comprehensive criteria prevent incorrect code
from passing, while shallow criteria allow code that passes tests but violates intent. See
Acceptance Criteria.
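As a concrete illustration, here is a minimal sketch of an acceptance criterion expressed as
deterministic tests. The discount rule and the `apply_discount` function are hypothetical
stand-ins, not part of this guide.

```python
# Hypothetical sketch: one acceptance criterion ("orders over $100
# receive a 10% discount") expressed as deterministic tests. Amounts
# are integer cents; the stand-in implementation exists only to make
# the example runnable.

def apply_discount(total_cents: int) -> int:
    """Stand-in for your own code."""
    return total_cents * 90 // 100 if total_cents > 100_00 else total_cents

def test_orders_over_100_dollars_receive_10_percent_discount():
    # Observable outcome, checkable by someone who did not write the code.
    assert apply_discount(200_00) == 180_00

def test_orders_at_or_under_100_dollars_pay_full_price():
    # Boundary cases belong in the evaluation design too.
    assert apply_discount(100_00) == 100_00
```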
Referenced in:
Agent-Assisted Specification,
Agent Delivery Contract,
AI Adoption Roadmap,
AI-Generated Code Ships Without Developer Understanding,
AI Is Generating Technical Debt Faster Than the Team Can Absorb It,
AI Tooling Slows You Down Instead of Speeding You Up,
CD Dependency Tree,
Find Your Symptom,
Pipeline Enforcement and Expert Agents,
Pitfalls and Metrics,
Rubber-Stamping AI-Generated Code,
Small-Batch Agent Sessions,
Testing Fundamentals,
The Four Prompting Disciplines,
Tokenomics: Optimizing Token Usage in Agent Architecture,
Work Decomposition,
Working Agreements
ACD (Agentic Continuous Delivery)
See Agentic CD Glossary.
Agent (AI)
See Agentic CD Glossary.
Agent Loop
See Agentic CD Glossary.
Agent Session
See Agentic CD Glossary.
Artifact
A packaged, versioned output of a build process (e.g., a container image, JAR file, or binary).
In a CD pipeline, artifacts are built once and promoted through environments without
modification. See Immutable Artifacts.
Referenced in:
Agent-Assisted Specification,
Agentic Architecture Patterns,
Agentic Continuous Delivery (ACD),
Build Automation,
Build Duration,
CD for Greenfield Projects,
Coding and Review Agent Configuration,
Data Pipelines and ML Models Have No Deployment Automation,
Deployable Definition,
Deployments Are One-Way Doors,
Deterministic Pipeline,
Developers Cannot Run the Pipeline Locally,
DORA Recommended Practices,
End-to-End Tests,
Every Change Requires a Ticket and Approval Chain,
Experience Reports,
Component Tests,
Independent Teams, Independent Deployables,
Merge Freezes Before Deployments,
Metrics-Driven Improvement,
Missing Deployment Pipeline,
Multiple Teams, Single Deployable,
No Contract Testing Between Services,
No Evidence of What Was Deployed or When,
Pipeline Enforcement and Expert Agents,
Pitfalls and Metrics,
Rollback,
Single Team, Single Deployable,
Small-Batch Agent Sessions,
The Agentic Development Learning Curve,
The Build Runs Again for Every Environment,
Agent Delivery Contract,
The Team Ignores Alerts Because There Are Too Many,
The Team Is Afraid to Deploy,
Tightly Coupled Monolith,
Tokenomics: Optimizing Token Usage in Agent Architecture,
Working Agreements
B
Black Box Testing
See Testing Glossary.
Baseline Metrics
The set of delivery measurements taken before beginning a migration, used as the benchmark
against which improvement is tracked. See Phase 0 - Baseline Metrics.
Referenced in:
Phase 0: Assess
Batch Size
The amount of change included in a single deployment. Smaller batches reduce risk, simplify
debugging, and shorten feedback loops. Reducing batch size is a core focus of
Phase 3 - Small Batches.
Referenced in:
CD Dependency Tree,
DORA Recommended Practices,
FAQ,
Hardening Sprints Are Needed Before Every Release,
Metrics-Driven Improvement,
Missing Deployment Pipeline,
New Releases Introduce Regressions in Previously Working Functionality,
Phase 2: Pipeline,
Releases Are Infrequent and Painful,
Small Batches
BDD (Behavior-Driven Development)
A collaboration practice where developers, testers, and product representatives define expected
behavior using structured examples before code is written. BDD produces executable
specifications that serve as both documentation and automated tests. BDD supports effective
work decomposition by forcing clarity about what a
story actually means before development begins.
Referenced in:
Agent-Assisted Specification,
Agentic Continuous Delivery (ACD),
AI Tooling Slows You Down Instead of Speeding You Up,
CD Dependency Tree,
Coding and Review Agent Configuration,
Getting Started: Where to Put What,
Knowledge & Communication Defects,
Pipeline Enforcement and Expert Agents,
Pitfalls and Metrics,
Small Batches,
Small-Batch Agent Sessions,
TBD Migration Guide,
Agent Delivery Contract,
Work Decomposition
Blue-Green Deployment
A deployment strategy that maintains two identical production environments. New code is deployed
to the inactive environment, verified, and then traffic is switched. See
Progressive Rollout.
Referenced in:
Every Deployment Is Immediately Visible to All Users,
Process & Deployment Defects
Branch Lifetime
The elapsed time between creating a branch and merging it to trunk. CD requires branch lifetimes
measured in hours, not days or weeks. Long branch lifetimes are a symptom of poor work
decomposition or slow code review. See Trunk-Based Development.
Referenced in:
AI Adoption Roadmap,
FAQ,
Feedback Takes Hours Instead of Minutes,
Long-Lived Feature Branches,
Merging Is Painful and Time-Consuming,
Metrics-Driven Improvement,
TBD Migration Guide
C
Canary Deployment
A deployment strategy where a new version is rolled out to a small subset of users or servers
before full rollout. If the canary shows no issues, the deployment proceeds to 100%. See
Progressive Rollout.
Referenced in:
Change & Complexity Defects,
Pipeline Enforcement and Expert Agents,
Process & Deployment Defects,
Progressive Rollout
CD (Continuous Delivery)
The practice of ensuring that every change to the codebase is always in a deployable state and
can be released to production at any time through a fully automated pipeline. Continuous
delivery does not require that every change is deployed automatically, but it requires that
every change could be deployed automatically. This is the primary goal of this migration
guide.
Referenced in:
Agent-Assisted Specification,
AI Adoption Roadmap,
Agentic Continuous Delivery (ACD),
CD Dependency Tree,
CD for Greenfield Projects,
Change Advisory Board Gates,
Data Pipelines and ML Models Have No Deployment Automation,
Deterministic Pipeline,
DORA Recommended Practices,
Experience Reports,
FAQ,
Feature Flags,
Horizontal Slicing,
Independent Teams, Independent Deployables,
Inverted Test Pyramid,
Knowledge Silos,
Leadership Sees CD as a Technical Nice-to-Have,
Learning Paths,
Long-Lived Feature Branches,
Manual Testing Only,
Metrics-Driven Improvement,
Missing Deployment Pipeline,
Phase 0: Assess,
Phase 1: Foundations,
Phase 2: Pipeline,
Phase 3: Optimize,
Pipeline Enforcement and Expert Agents,
Pipeline Reference Architecture,
Process & Deployment Defects,
Push-Based Work Assignment,
Retrospectives,
Rubber-Stamping AI-Generated Code,
Small Batches,
Team Membership Changes Constantly,
Test Doubles,
Testing Fundamentals,
The Deployment Target Does Not Support Modern CI/CD Tooling,
Thin-Spread Teams,
Tightly Coupled Monolith,
Unit Tests,
Work Decomposition
Change Failure Rate (CFR)
The percentage of deployments to production that result in a degraded service and require
remediation (e.g., rollback, hotfix, or patch). One of the four DORA metrics. See
Metrics - Change Fail Rate.
Referenced in:
Architecture Decoupling,
CD for Greenfield Projects,
Change Advisory Board Gates,
Experience Reports,
FAQ,
Metrics-Driven Improvement,
Phase 0: Assess,
Pitfalls and Metrics,
Retrospectives
CI (Continuous Integration)
The practice of integrating code changes to a shared trunk at least once per day, where each
integration is verified by an automated build and test suite. CI is a prerequisite for CD, not
a synonym. A team that runs automated builds on feature branches but merges weekly is not doing
CI. See Build Automation.
Referenced in:
Architecture Decoupling,
CD Dependency Tree,
CD for Greenfield Projects,
Change & Complexity Defects,
Data & State Defects,
Data Pipelines and ML Models Have No Deployment Automation,
Dependency & Infrastructure Defects,
Deterministic Pipeline,
Developers Cannot Run the Pipeline Locally,
Experience Reports,
FAQ,
Feedback Takes Hours Instead of Minutes,
Component Tests,
Integration & Boundaries Defects,
Inverted Test Pyramid,
It Works on My Machine,
Long-Lived Feature Branches,
Manual Testing Only,
Merge Freezes Before Deployments,
Merging Is Painful and Time-Consuming,
Metrics-Driven Improvement,
Missing Deployment Pipeline,
No Evidence of What Was Deployed or When,
Performance & Resilience Defects,
Pipeline Enforcement and Expert Agents,
Pipeline Reference Architecture,
Process & Deployment Defects,
Coding and Review Agent Configuration,
Agentic Architecture Patterns,
Security & Compliance Defects,
Security Review Is a Gate, Not a Guardrail,
Services Reach Production with No Health Checks or Alerting,
Small-Batch Agent Sessions,
Symptoms for Developers,
Test Suite Is Too Slow to Run,
Testing & Observability Gap Defects,
Tests Pass in One Environment but Fail in Another,
Tests Randomly Pass or Fail,
The Development Workflow Has Friction at Every Step,
Unit Tests
Constraint
In the Theory of Constraints, the single factor most limiting the throughput of a system.
During a CD migration, your job is to find and fix constraints in order of impact. See
Identify Constraints.
Referenced in:
Agent-Assisted Specification,
Agent Delivery Contract,
AI Is Generating Technical Debt Faster Than the Team Can Absorb It,
Baseline Metrics,
Build Automation,
Current State Checklist,
DORA Recommended Practices,
Experience Reports,
FAQ,
Identify Constraints,
Knowledge Silos,
Learning Paths,
Migrate to CD,
Migrating Brownfield to CD,
Multiple Services Must Be Deployed Together,
Phase 0: Assess,
Push-Based Work Assignment,
Releases Are Infrequent and Painful,
Releases Depend on One Person,
Security Review Is a Gate, Not a Guardrail,
Sprint Planning Is Dominated by Dependency Negotiation,
The Agentic Development Learning Curve,
The Four Prompting Disciplines,
Untestable Architecture,
Value Stream Mapping
Context (LLM)
See Agentic CD Glossary.
Context Window
See Agentic CD Glossary.
Context Engineering
See Agentic CD Glossary.
Continuous Deployment
An extension of continuous delivery where every change that passes the automated pipeline is
deployed to production without manual intervention. Continuous delivery ensures every change
can be deployed; continuous deployment ensures every change is deployed. See
Phase 4 - Deliver on Demand.
Referenced in:
AI Adoption Roadmap,
Architecture Decoupling,
Change Advisory Board Gates,
DORA Recommended Practices,
Experience Reports,
FAQ,
Feature Flags,
Tightly Coupled Monolith
D
Deployable
A change that has passed all automated quality gates defined by the team and is ready for
production deployment. The definition of deployable is codified in the pipeline, not decided
by a person at deployment time. See Deployable Definition.
Referenced in:
CD for Greenfield Projects,
DORA Recommended Practices,
Deployable Definition,
Everything Started, Nothing Finished,
Experience Reports,
FAQ,
Component Tests,
Horizontal Slicing,
Independent Teams, Independent Deployables,
Long-Lived Feature Branches,
Merge Freezes Before Deployments,
Monolithic Work Items,
Multiple Services Must Be Deployed Together,
Multiple Teams, Single Deployable,
Releases Are Infrequent and Painful,
Rubber-Stamping AI-Generated Code,
Small Batches,
Team Alignment to Code,
Trunk-Based Development,
Work Decomposition,
Work Items Take Days or Weeks to Complete,
Working Agreements
Deployment Frequency
How often an organization successfully deploys to production. One of the four DORA metrics.
See Metrics - Release Frequency.
Referenced in:
Architecture Decoupling,
CD for Greenfield Projects,
Change Advisory Board Gates,
DORA Recommended Practices,
Experience Reports,
Integration Frequency,
Leadership Sees CD as a Technical Nice-to-Have,
Metrics-Driven Improvement,
Missing Deployment Pipeline,
No Contract Testing Between Services,
Phase 0: Assess,
Process & Deployment Defects,
Release Frequency,
Retrospectives,
Single Path to Production,
TBD Migration Guide,
The Team Is Caught Between Shipping Fast and Not Breaking Things,
Tightly Coupled Monolith,
Untestable Architecture
Development Cycle Time
The elapsed time from the first commit on a change to that change being deployable. This
measures the efficiency of your development and pipeline process, excluding upstream wait times.
See Metrics - Development Cycle Time.
Dependency
Code, service, or resource whose behavior is not defined in the current module. Dependencies
vary by location and ownership:
- Internal dependency - code in another file or module within the same repository, or in
another repository your team controls. Internal dependencies share your release cycle and
your team can change them directly.
- External dependency - a third-party library, external API, or
managed service outside your team’s direct control.
The distinction matters for testing. Internal dependencies are part of your own codebase and
should be exercised through real code paths in tests. Replacing them with
test doubles couples your tests to
implementation details and causes rippling failures during routine refactoring. Reserve test
doubles for external dependencies and runtime connections where real
invocation is impractical or non-deterministic.
See also: Hard Dependency, Soft Dependency.
Referenced in:
Defect Feedback Loop,
Testing Fundamentals,
The Agentic Development Learning Curve,
Work Decomposition
Declarative Agent
See Agentic CD Glossary.
Delivery Contract
See Agentic CD Glossary.
Done Definition
The observable outcomes portion of acceptance criteria. A done definition
describes what “done” looks like from an independent observer’s perspective - someone who was
not involved in the implementation. Combined with an evaluation design,
done definitions form the testable boundary of a delivery contract. See
Agent Delivery Contract.
Referenced in:
Agent Delivery Contract,
Agent-Assisted Specification
DORA Metrics
The four key metrics identified by the DORA (DevOps Research and Assessment) research program
as predictive of software delivery performance: deployment frequency, lead time for changes,
change failure rate, and mean time to restore service. See DORA Recommended Practices.
Referenced in:
CD for Greenfield Projects,
Change Fail Rate,
Development Cycle Time,
DORA Recommended Practices,
Experience Reports,
FAQ,
Lead Time,
Mean Time to Repair,
Metrics-Driven Improvement,
Phase 3: Optimize,
Product & Discovery Defects,
Release Frequency,
Retrospectives,
Small Batches,
Work Decomposition
E
External Dependency
A dependency on code or services outside your team’s direct control. External
dependencies include third-party libraries, public APIs, managed cloud services, and any
resource whose release cycle and availability your team cannot influence.
External dependencies are the primary case where test doubles add value. A test double for an
external API verifies your integration logic without relying on network availability or
third-party rate limits. By contrast, mocking internal code - another class in the same
repository or a module your team owns - creates fragile tests that break whenever the internal
implementation changes, even when the behavior is correct.
When evaluating whether to mock something, ask: “Can my team change this code and release it
in our pipeline?” If yes, it is an internal dependency and should be tested through real code
paths. If no, it is an external dependency and a test double is appropriate.
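A minimal sketch of the distinction, assuming Python's standard `unittest.mock`. The
`PriceConverter` class and its rate client are hypothetical; the point is that only the
external API is replaced with a double.

```python
# Hypothetical sketch: the external dependency (a third-party exchange
# rate API) is stubbed; the internal PriceConverter is exercised
# through its real code path.
from unittest.mock import Mock

class PriceConverter:
    """Internal code under test - never mocked."""
    def __init__(self, rate_client):
        self.rate_client = rate_client  # external dependency, injected

    def price_in_eur(self, usd_cents: int) -> int:
        return round(usd_cents * self.rate_client.usd_to_eur())

def test_conversion_without_calling_the_real_api():
    fake_rates = Mock()
    fake_rates.usd_to_eur.return_value = 0.92  # canned, deterministic rate
    converter = PriceConverter(fake_rates)
    assert converter.price_in_eur(1000) == 920  # $10.00 -> 9.20 EUR
```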
See also: Dependency, Hard Dependency.
Referenced in:
Testing Fundamentals
Evaluation Design
See Agentic CD Glossary.
Expert Agent
See Agentic CD Glossary.
F
Feature Team
A team organized around user-facing features or customer journeys rather than owned product
subdomains. A feature team is cross-functional - it contains the skills to deliver a feature
end-to-end - but it does not own a stable domain of code. Multiple feature teams may modify
the same components, with no single team accountable for quality or consistency within them.
In practice: feature teams must re-orient on code they do not continuously maintain each time
a feature requires it; quality agreements cannot be enforced within the team because other
teams also modify the same code; and while feature teams appear to minimize inter-team
dependencies, they produce the opposite - everyone who can change a component is effectively
on the same large, loosely communicating team. Feature teams are structurally equivalent to
long-lived project teams.
Contrast with full-stack product team and
subdomain product team, which achieve cross-functional delivery
through stable domain ownership rather than feature-by-feature assembly.
Referenced in:
Team Alignment to Code
Feature Flag
A mechanism that allows code to be deployed to production with new functionality disabled,
then selectively enabled for specific users, percentages of traffic, or environments. Feature
flags decouple deployment from release. See Feature Flags.
Referenced in:
Architecture Decoupling,
CD Dependency Tree,
CD for Greenfield Projects,
Change & Complexity Defects,
Change Advisory Board Gates,
Change Fail Rate,
Database Migrations Block or Break Deployments,
Deploying Stateful Services Causes Outages,
Every Change Requires a Ticket and Approval Chain,
Every Deployment Is Immediately Visible to All Users,
Experience Reports,
FAQ,
Feature Flags,
Hard-Coded Environment Assumptions,
Horizontal Slicing,
Integration Frequency,
Long-Lived Feature Branches,
Mean Time to Repair,
Monolithic Work Items,
Phase 3: Optimize,
Pipeline Enforcement and Expert Agents,
Product & Discovery Defects,
Progressive Rollout,
Rollback,
Single Path to Production,
Small Batches,
TBD Migration Guide,
Teams Cannot Change Their Own Pipeline Without Another Team,
The Team Resists Merging to the Main Branch,
Trunk-Based Development,
Vendor Release Cycles Constrain the Team’s Deployment Frequency,
Work Decomposition,
Work Requires Sign-Off from Teams Not Involved in Delivery,
Working Agreements
Flow Efficiency
The ratio of active work time to total elapsed time in a delivery process. A flow efficiency of
15% means that for every hour of actual work, roughly 5.7 hours are spent waiting. Value stream
mapping reveals your flow efficiency. See Value Stream Mapping.
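The arithmetic behind that figure, worked through:

```python
# Worked example of the flow efficiency arithmetic quoted above.
active_hours = 1.0
flow_efficiency = 0.15
total_elapsed = active_hours / flow_efficiency      # ~6.67 hours end to end
wait_hours = total_elapsed - active_hours           # ~5.67 hours waiting
print(f"{wait_hours:.1f} waiting hours per hour of work")  # -> 5.7
```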
Referenced in:
Value Stream Mapping
Full-Stack Product Team
A team that owns every layer of a user-facing capability - UI, API, and data store - and whose
public interface is designed for human users. A vertical slice for a full-stack product team
delivers one observable behavior from the user interface through to the database. The slice is
done when a user can observe the behavior through that interface. Contrast with
subdomain product team.
Referenced in:
Horizontal Slicing,
Small Batches,
Work Decomposition
G
Guardrail
A safety constraint encoded in a pipeline, system prompt, or
hook that limits what an agent can do. Guardrails are deterministic
boundaries, not suggestions. Examples include pre-commit hooks that block secrets from being
committed, pipeline gates that reject changes exceeding a complexity threshold, and system
prompt rules that prevent an agent from modifying test specifications. Guardrails protect
against both agent errors and hallucinations without requiring human
intervention on every change. See
Pipeline Enforcement and Expert Agents.
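A minimal sketch of a deterministic guardrail - a pre-commit hook that refuses a commit when
staged files match obvious secret patterns. The two regexes are illustrative only; a real gate
would use a maintained secret-scanning ruleset.

```python
#!/usr/bin/env python3
# Illustrative pre-commit guardrail: block commits whose staged files
# contain obvious secret patterns. A deterministic boundary, not a
# suggestion - a non-zero exit stops the commit.
import re
import subprocess
import sys
from pathlib import Path

SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                     # AWS key id shape
    re.compile(r"-----BEGIN (RSA )?PRIVATE KEY-----"),   # PEM private key
]

def staged_files() -> list[str]:
    result = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()

def main() -> int:
    for name in staged_files():
        path = Path(name)
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        if any(p.search(text) for p in SECRET_PATTERNS):
            print(f"BLOCKED: possible secret in {name}")
            return 1
    return 0

if __name__ == "__main__":
    sys.exit(main())
```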
Referenced in:
AI Adoption Roadmap,
Coding and Review Agent Configuration,
Pipeline Enforcement and Expert Agents,
The Four Prompting Disciplines
GitFlow
A branching model created by Vincent Driessen in 2010 that uses multiple long-lived branches
(main, develop, release/*, hotfix/*, feature/*) with specific merge rules and
directions. GitFlow was designed for infrequent, scheduled releases and is fundamentally
incompatible with continuous delivery because it defers integration, creates multiple paths
to production, and adds merge complexity. See the
TBD Migration Guide
for a step-by-step path from GitFlow to trunk-based development.
Referenced in:
Single Path to Production,
TBD Migration Guide,
Trunk-Based Development
H
Hard Dependency
A dependency that must be resolved before work can proceed. In delivery, hard dependencies
include things like waiting for another team’s API, a shared database migration, or an
infrastructure provisioning request. Hard dependencies create queues and increase lead time.
Eliminating hard dependencies is a focus of
Architecture Decoupling.
Referenced in:
Team Alignment to Code
Hallucination
See Agentic CD Glossary.
Hardening Sprint
A sprint dedicated to stabilizing and fixing defects before a release. The existence of
hardening sprints is a strong signal that quality is not being built in during regular
development. Teams practicing CD do not need hardening sprints because every commit is
deployable. See Testing Fundamentals.
Referenced in:
Hardening Sprints Are Needed Before Every Release
Hook (Agent)
See Agentic CD Glossary.
Hypothesis-Driven Development
An approach that frames every change as an experiment with a predicted outcome. Instead of
specifying a change as a requirement to implement, the team states a hypothesis: “We believe
[this change] will produce [this outcome] because [this reason].” After deployment, the team
validates whether the predicted outcome occurred. Changes that confirm the hypothesis build
confidence. Changes that refute it produce learning that informs the next hypothesis. This
creates a feedback loop where every deployed change generates a signal, whether it “succeeds”
or not. See Hypothesis-Driven Development
for the full lifecycle and
Agent Delivery Contract
for how hypotheses integrate with specification artifacts.
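One lightweight way to make the hypothesis checkable is to record it as structured data
alongside the change; the field names below are illustrative, not a prescribed format.

```python
# Illustrative only: a hypothesis captured as structured data so the
# team can verify the predicted outcome after deployment.
from dataclasses import dataclass

@dataclass
class Hypothesis:
    change: str
    predicted_outcome: str
    rationale: str
    validated: bool | None = None  # filled in once production data arrives

h = Hypothesis(
    change="show a delivery estimate on the product page",
    predicted_outcome="cart abandonment drops 5% within two weeks",
    rationale="exit surveys cite delivery uncertainty",
)
```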
Referenced in:
Metrics-Driven Improvement,
Agent Delivery Contract,
Agent-Assisted Specification
I
Immutable Artifact
A build artifact that is never modified after creation. The same artifact that is tested in the
pipeline is the exact artifact that is deployed to production. Configuration differences between
environments are handled externally. See Immutable Artifacts.
Referenced in:
CD Dependency Tree,
FAQ,
Merge Freezes Before Deployments
Intent Engineering
See Agentic CD Glossary.
Integration Frequency
How often a developer integrates code to the shared trunk. CD requires at least daily
integration. See Metrics - Integration Frequency.
Referenced in:
The Team Has No Shared Agreements About How to Work
L
Lead Time for Changes
The elapsed time from when a commit is made to when it is successfully running in production.
One of the four DORA metrics. See Metrics - Lead Time.
Referenced in:
Architecture Decoupling,
CD for Greenfield Projects,
Development Cycle Time,
FAQ,
Lead Time,
Leadership Sees CD as a Technical Nice-to-Have,
Manual Testing Only,
Metrics-Driven Improvement,
Phase 0: Assess,
Retrospectives,
Security Review Is a Gate, Not a Guardrail,
Working Agreements
M
Mean Time to Restore (MTTR)
The elapsed time from when a production incident is detected to when service is restored. One
of the four DORA metrics. Teams practicing CD have short MTTR because deployments are small,
rollback is automated, and the cause of failure is easy to identify. See
Metrics - Mean Time to Repair.
Referenced in:
Architecture Decoupling,
CD for Greenfield Projects,
Metrics-Driven Improvement,
Retrospectives
Model Routing
See Agentic CD Glossary.
Modular Monolith
A single deployable application whose codebase is organized into well-defined modules with
explicit boundaries. Each module encapsulates a bounded domain and communicates with other
modules through defined interfaces, not by reaching into shared database tables or calling
internal methods directly. The application deploys as one unit, but its internal structure
allows teams to reason about, test, and change one module independently. See
Pipeline Reference Architecture and
Premature Microservices.
Referenced in:
Multiple Teams, Single Deployable,
Pipeline Reference Architecture,
Single Team, Single Deployable,
Team Alignment to Code
O
Orchestrator
See Agentic CD Glossary.
P
Pipeline
The automated sequence of build, test, and deployment stages that every change passes through
on its way to production. See Phase 2 - Pipeline.
Referenced in:
Agentic Continuous Delivery (ACD),
AI Adoption Roadmap,
CD Dependency Tree,
CD for Greenfield Projects,
Change Advisory Board Gates,
Data Pipelines and ML Models Have No Deployment Automation,
Database Migrations Block or Break Deployments,
Deploying Stateful Services Causes Outages,
Deployments Are One-Way Doors,
Deterministic Pipeline,
Developers Cannot Run the Pipeline Locally,
DORA Recommended Practices,
Each Language Has Its Own Ad Hoc Pipeline,
Every Change Rebuilds the Entire Repository,
Every Change Requires a Ticket and Approval Chain,
Every Deployment Is Immediately Visible to All Users,
Experience Reports,
Feedback Takes Hours Instead of Minutes,
Component Tests,
Getting a Test Environment Requires Filing a Ticket,
Getting Started: Where to Put What,
High Coverage but Tests Miss Defects,
Horizontal Slicing,
Independent Teams, Independent Deployables,
Inverted Test Pyramid,
Leadership Sees CD as a Technical Nice-to-Have,
Long-Lived Feature Branches,
Manual Testing Only,
Merge Freezes Before Deployments,
Metrics-Driven Improvement,
Missing Deployment Pipeline,
No Evidence of What Was Deployed or When,
Phase 1: Foundations,
Phase 2: Pipeline,
Phase 3: Optimize,
Pipeline Enforcement and Expert Agents,
Pipeline Reference Architecture,
Pipelines Take Too Long,
Pitfalls and Metrics,
Process & Deployment Defects,
Product & Discovery Defects,
Production Issues Discovered by Customers,
Production Problems Are Discovered Hours or Days Late,
Push-Based Work Assignment,
Retrospectives,
Rubber-Stamping AI-Generated Code,
Coding and Review Agent Configuration,
Agentic Architecture Patterns,
Recommended Patterns for Agentic Workflow Architecture,
Releases Are Infrequent and Painful,
Releases Depend on One Person,
Security Review Is a Gate, Not a Guardrail,
Services in the Same Portfolio Have Wildly Different Maturity Levels,
Services Reach Production with No Health Checks or Alerting,
Small-Batch Agent Sessions,
Testing Fundamentals,
Staging Passes but Production Fails,
Symptoms for Developers,
TBD Migration Guide,
Team Alignment to Code,
Teams Cannot Change Their Own Pipeline Without Another Team,
Test Doubles,
Test Environments Take Too Long to Reset Between Runs,
Test Suite Is Too Slow to Run,
Tests Pass in One Environment but Fail in Another,
Tests Randomly Pass or Fail,
The Agentic Development Learning Curve,
The Build Runs Again for Every Environment,
The Deployment Target Does Not Support Modern CI/CD Tooling,
The Development Workflow Has Friction at Every Step,
Agent Delivery Contract,
The Team Ignores Alerts Because There Are Too Many,
The Team Is Afraid to Deploy,
The Team Is Caught Between Shipping Fast and Not Breaking Things,
The Team Resists Merging to the Main Branch,
Thin-Spread Teams,
Tightly Coupled Monolith,
Tokenomics: Optimizing Token Usage in Agent Architecture,
Vendor Release Cycles Constrain the Team’s Deployment Frequency,
Work Requires Sign-Off from Teams Not Involved in Delivery,
Your Migration Journey
Production-Like Environment
A test or staging environment that matches production in configuration, infrastructure, and
data characteristics. Testing in environments that differ from production is a common source
of deployment failures. See Production-Like Environments.
Referenced in:
CD for Greenfield Projects,
DORA Recommended Practices,
FAQ,
Hard-Coded Environment Assumptions,
Pipeline Enforcement and Expert Agents,
Pipeline Reference Architecture,
Progressive Rollout,
Stakeholders See Working Software Only at Release Time,
TBD Migration Guide
Prompt
See Agentic CD Glossary.
Prompt Caching
See Agentic CD Glossary.
Prompt Craft
See Agentic CD Glossary.
Prompting Discipline
See Agentic CD Glossary.
Programmatic Agent
See Agentic CD Glossary.
R
Rollback
The ability to revert a production deployment to a previous known-good state. CD requires
automated rollback that takes minutes, not hours. See Rollback.
Referenced in:
CD Dependency Tree,
CD for Greenfield Projects,
Change Advisory Board Gates,
Change Fail Rate,
Data Pipelines and ML Models Have No Deployment Automation,
Database Migrations Block or Break Deployments,
Deployable Definition,
Deployments Are One-Way Doors,
Every Change Requires a Ticket and Approval Chain,
Experience Reports,
Feature Flags,
Horizontal Slicing,
Mean Time to Repair,
Metrics-Driven Improvement,
Missing Deployment Pipeline,
No Deployment Health Checks,
Phase 2: Pipeline,
Pipeline Reference Architecture,
Pitfalls and Metrics,
Process & Deployment Defects,
Production Problems Are Discovered Hours or Days Late,
Progressive Rollout,
Release Frequency,
Releases Depend on One Person,
Single Path to Production,
Symptoms for Developers,
Systemic Defect Fixes,
TBD Migration Guide,
The Team Is Caught Between Shipping Fast and Not Breaking Things,
Tightly Coupled Monolith,
Work Decomposition
Repository Readiness
See Agentic CD Glossary.
S
Skill (Agent)
See Agentic CD Glossary.
Soft Dependency
A dependency that can be worked around or deferred. Unlike hard dependencies, soft dependencies
do not block work but may influence sequencing or design decisions. Feature flags can turn many
hard dependencies into soft dependencies by allowing incomplete integrations to be deployed in
a disabled state.
Specification Engineering
See Agentic CD Glossary.
Story Points
A relative estimation unit used by some teams to forecast effort. Story points are frequently
misused as a productivity metric, which creates perverse incentives to inflate estimates and
discourages the small work decomposition that CD requires. If your organization uses story
points as a velocity target, see Metrics-Driven Improvement.
Referenced in:
Leadership Sees CD as a Technical Nice-to-Have,
Some Developers Are Overloaded While Others Wait for Work,
Team Burnout and Unsustainable Pace,
Velocity as Individual Metric
Sub-agent
See Agentic CD Glossary.
Subdomain Product Team
A team that owns a bounded subdomain within a larger distributed system - full-stack within
their service (API, business logic, data store) but not directly user-facing. Their public
interface is designed for machines: other services or teams consume it through a defined API
contract. A vertical slice for a subdomain product team delivers one observable behavior
through that contract. The slice is done when the API satisfies the agreed behavior for its
service consumers. Contrast with full-stack product team.
Referenced in:
Horizontal Slicing,
Small Batches,
Work Decomposition
System Prompt
See Agentic CD Glossary.
T
TBD (Trunk-Based Development)
A source-control branching model where all developers integrate to a single shared branch
(trunk) at least once per day. Short-lived feature branches (less than a day) are acceptable.
Long-lived feature branches are not. TBD is a prerequisite for CI, which is in turn a
prerequisite for CD. See Trunk-Based Development.
Referenced in:
Build Automation,
CD Dependency Tree,
CD for Greenfield Projects,
Change & Complexity Defects,
DORA Recommended Practices,
FAQ,
Feature Flags,
Integration Frequency,
Long-Lived Feature Branches,
Metrics-Driven Improvement,
Multiple Teams, Single Deployable,
Phase 1: Foundations,
Process & Deployment Defects,
Retrospectives,
Single Team, Single Deployable,
TBD Migration Guide,
Team Membership Changes Constantly,
The Team Resists Merging to the Main Branch,
Trunk-Based Development,
Work Decomposition,
Work in Progress,
Work Items Take Days or Weeks to Complete,
Working Agreements
TDD (Test-Driven Development)
See Testing Glossary.
Referenced in:
Testing Fundamentals
Token
See Agentic CD Glossary.
Tokenomics
See Agentic CD Glossary.
Toil
Repetitive, manual work related to maintaining a production service that is automatable, has
no lasting value, and scales linearly with service size. Examples include manual deployments,
manual environment provisioning, and manual test execution. Eliminating toil is a primary
benefit of building a CD pipeline.
Referenced in:
AI Adoption Roadmap,
Architecture Decoupling,
Build Duration,
CD Dependency Tree,
Change Advisory Board Gates,
Deployable Definition,
DORA Recommended Practices,
Experience Reports,
Feature Flags,
Lead Time,
Progressive Rollout,
Tightly Coupled Monolith,
Your Migration Journey
U
Unplanned Work
Work that arrives outside the planned backlog - production incidents, urgent bug fixes,
ad hoc requests. High levels of unplanned work indicate systemic quality or operational
problems. Teams with high change failure rates generate their own unplanned work through
failed deployments. Reducing unplanned work is a natural outcome of improving change failure
rate through CD practices.
Referenced in:
Team Burnout and Unsustainable Pace,
Thin-Spread Teams
V
Virtual Service
See Testing Glossary.
Referenced in:
Test Environments Take Too Long to Reset Between Runs
Value Stream Map
A visual representation of every step required to deliver a change from request to production,
showing process time, wait time, and percent complete and accurate at each step. The
foundational tool for Phase 0 - Assess.
Referenced in:
FAQ,
Phase 0: Assess
Vertical Sliced Story
A user story that delivers a thin slice of functionality across all layers of the system
(UI, API, database, etc.) rather than a horizontal slice that implements one layer completely.
Vertical slices are independently deployable and testable, which is essential for CD. Vertical
slicing is a core technique in Work Decomposition.
Referenced in:
Agent-Assisted Specification,
CD Dependency Tree,
CD for Greenfield Projects,
Horizontal Slicing,
Long-Lived Feature Branches,
Monolithic Work Items,
Small Batches,
Small-Batch Agent Sessions,
Sprint Planning Is Dominated by Dependency Negotiation,
Stakeholders See Working Software Only at Release Time
W
WIP (Work in Progress)
The number of work items that have been started but not yet completed. High WIP increases lead
time, reduces focus, and increases context-switching overhead. Limiting WIP is a key practice
in Phase 3 - Limiting WIP.
Referenced in:
Architecture Decoupling,
CD Dependency Tree,
Development Cycle Time,
DORA Recommended Practices,
Everything Started, Nothing Finished,
Experience Reports,
Feature Flags,
Metrics-Driven Improvement,
Phase 3: Optimize,
Pitfalls and Metrics,
Push-Based Work Assignment,
Retrospectives,
Retrospectives Produce No Real Change,
Small Batches,
Symptoms for Managers,
TBD Migration Guide,
Team Burnout and Unsustainable Pace,
Team Membership Changes Constantly,
The Team Has No Shared Agreements About How to Work,
Tokenomics: Optimizing Token Usage in Agent Architecture,
Work Decomposition,
Work in Progress,
Working Agreements
White Box Testing
See Testing Glossary.
Working Agreement
An explicit, documented set of team norms covering how work is defined, reviewed, tested, and
deployed. Working agreements create shared expectations and reduce friction. See
Working Agreements.
Referenced in:
AI Tooling Slows You Down Instead of Speeding You Up,
Pull Requests Sit for Days Waiting for Review,
Rubber-Stamping AI-Generated Code,
The Team Has No Shared Agreements About How to Work
8 - FAQ
Frequently asked questions about continuous delivery and this migration guide.
About This Guide
Why does this migration guide exist?
Many teams say they want to adopt continuous delivery but do not know where to start. The CD
landscape is full of tools, frameworks, and advice, but there is no clear, sequenced path from
“we deploy monthly” to “we can deploy any change at any time.” This guide provides that path.
It is built on the MinimumCD definition of continuous delivery and
draws on practices from the Dojo Consortium and the
DORA research. The content is organized as a phased migration journey
from your current state to continuous delivery rather than as a description of what CD looks
like when you are already there.
Who is this guide for?
This guide is for development teams, tech leads, and engineering managers who want to improve
their software delivery practices. It is designed for teams that are currently deploying
infrequently (monthly, quarterly, or less) and want to reach a state where any change can be
deployed to production at any time.
You do not need to be starting from zero. If your team already has CI in place, you can begin
with Phase 2: Pipeline. If you have a pipeline but deploy infrequently, start
with Phase 3: Optimize. Use the Phase 0 assessment to find your
starting point.
Should we adopt this guide as an organization or as a team?
Start with a single team. CD adoption works best when a team can experiment, learn, and iterate
without waiting for organizational consensus. Once one team demonstrates results (shorter lead
times, lower change failure rate, more frequent deployments), other teams will have a concrete
example to follow.
Organizational adoption comes after team adoption, not before. The role of organizational
leadership is to create the conditions for teams to succeed: stable team composition, tool
funding, policy flexibility for deployment processes, and protection from pressure to cut
corners on quality.
How do we use this guide for improvement?
Start with Phase 0: Assess. Map your value stream, measure your current
performance, and identify your top constraints. Then work through the phases in order, focusing
on one constraint at a time.
The guide is not a checklist to complete in sequence. It is a reference that helps you decide
what to work on next. Some teams will spend months in Phase 1 building testing fundamentals.
Others will move quickly to Phase 2 because they already have strong development practices.
Your value stream map and metrics tell you where to invest.
Revisit your assessment periodically. As you improve, new constraints will emerge. The phases
give you a framework for addressing them.
Continuous Delivery Concepts
What is the difference between continuous delivery and continuous deployment?
Continuous delivery means every change to the codebase is always in a deployable state and
can be released to production at any time through a fully automated pipeline. The decision to
deploy may still be made by a human, but the capability to deploy is always present.
Continuous deployment is an extension of continuous delivery where every change that passes
the automated pipeline is deployed to production without manual intervention.
This migration guide takes you through continuous delivery (Phases 0-3) and then to continuous
deployment (Phase 4). Continuous delivery is the prerequisite. You cannot safely automate
deployment decisions until your pipeline reliably determines what is deployable.
Is continuous delivery the same as having a CD pipeline?
No. Many teams have a CD pipeline tool (Jenkins, GitHub Actions, GitLab CI, etc.) but are
not practicing continuous delivery. A pipeline tool is necessary but not sufficient.
Continuous delivery also requires trunk-based development, comprehensive test automation, a
single path to production, immutable artifacts, and the ability to deploy any green build.
If your team has a pipeline but uses long-lived feature branches, deploys only at the end of a
sprint, or requires manual testing before a release, you have a pipeline tool but you are not
practicing continuous delivery. The current-state checklist
in Phase 0 helps you assess the gap.
What does “the pipeline is the only path to production” mean?
It means there is exactly one way for any change to reach production: through the automated
pipeline. No one can SSH into a server and make a change. No one can skip the test suite for
an “urgent” fix. No one can deploy from their local machine.
This constraint is what gives you confidence. If every change in production has been through
the same build, test, and deployment process, you know what is running and how it got there.
If exceptions are allowed, you lose that guarantee, and your ability to reason about production
state degrades.
During your migration, establishing this single path is a key milestone in
Phase 2.
What does “application configuration” mean in the context of CD?
Application configuration refers to values that change between environments but are not part of
the application code: database connection strings, API endpoints, feature flag states, logging
levels, and similar settings.
In a CD pipeline, configuration is externalized. It lives outside the artifact and is injected
at deployment time. This is what makes immutable artifacts
possible. You build the artifact once and deploy it to any environment by providing the
appropriate configuration.
If configuration is embedded in the artifact (for example, hardcoded URLs or environment-specific
config files baked into a container image), you must rebuild the artifact for each environment,
which means the artifact you tested is not the artifact you deploy. This breaks the immutability
guarantee. See Application Config.
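A minimal sketch of externalized configuration, assuming environment variables injected at
deployment time; the variable names are hypothetical.

```python
# Hypothetical sketch: configuration read from the environment at
# startup, so one immutable artifact runs unchanged in every environment.
import os

class Config:
    def __init__(self) -> None:
        # Injected at deployment time - never baked into the artifact.
        self.database_url = os.environ["DATABASE_URL"]
        self.payments_api_base = os.environ["PAYMENTS_API_BASE"]
        self.log_level = os.environ.get("LOG_LEVEL", "INFO")

# Staging and production differ only in the environment they inject,
# not in the bytes they run.
```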
What is an “immutable artifact” and why does it matter?
An immutable artifact is a build output (container image, binary, package) that is never
modified after it is created. The exact artifact that passes your test suite is the exact
artifact that is deployed to staging, and then to production. Nothing is recompiled, repackaged,
or patched between environments.
This matters because it eliminates an entire category of deployment failures: “it worked in
staging but not in production” caused by differences in the build. If the same bytes are
deployed everywhere, build-related discrepancies are impossible.
Immutability requires externalizing configuration (see above) and storing artifacts in a
registry or repository. See Immutable Artifacts.
What does “deployable” mean?
A change is deployable when it has passed all automated quality gates defined in the pipeline.
The definition is codified in the pipeline itself, not decided by a person at deployment time.
A typical deployable definition includes:
- All unit tests pass
- All integration tests pass
- All acceptance tests pass
- Static analysis checks pass (linting, security scanning)
- The artifact is built and stored in the artifact registry
- Deployment to a production-like environment succeeds
- Smoke tests in the production-like environment pass
If any of these gates fail, the change is not deployable. The pipeline makes this determination
automatically and consistently. See Deployable Definition.
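A sketch of what codifying that definition can look like - a fail-fast gate runner. The
commands are placeholders for your own tooling; the point is that the pipeline, not a person,
makes the call.

```python
# Illustrative gate runner: each command stands in for your own tooling.
import subprocess
import sys

GATES = [
    ("unit tests", ["pytest", "tests/unit", "-q"]),
    ("integration tests", ["pytest", "tests/integration", "-q"]),
    ("static analysis", ["ruff", "check", "."]),
]

def main() -> int:
    for name, command in GATES:
        print(f"gate: {name}")
        if subprocess.run(command).returncode != 0:
            print(f"NOT DEPLOYABLE: {name} failed")
            return 1  # fail fast: the first broken gate stops the pipeline
    print("deployable: all gates passed")
    return 0

if __name__ == "__main__":
    sys.exit(main())
```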
What is the difference between deployment and release?
Deployment is the act of putting code into a production environment.
Release is the act of making functionality available to users.
These are different events, and decoupling them is one of the most powerful techniques in CD.
You can deploy code to production without releasing it to users by using
feature flags. The code is running in production, but the new
functionality is disabled. When you are ready, you enable the flag and the feature is released.
This decoupling is important because it separates the technical risk (will the deployment
succeed?) from the business risk (will users like the feature?). You can manage each risk
independently. Deployments become routine technical events. Releases become deliberate business
decisions.
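A minimal sketch of the mechanism. In practice the flag state comes from a flag service or
externalized configuration rather than a hard-coded dict, and every name here is hypothetical.

```python
# Hypothetical sketch: code deployed to production with the new
# behavior dark until the flag flips.
FLAGS = {"new_checkout": False}  # deployed, not yet released

def is_enabled(flag: str) -> bool:
    return FLAGS.get(flag, False)

def legacy_checkout_flow(cart: list) -> str:
    return f"legacy checkout for {len(cart)} items"

def new_checkout_flow(cart: list) -> str:
    return f"new checkout for {len(cart)} items"

def checkout(cart: list) -> str:
    if is_enabled("new_checkout"):
        return new_checkout_flow(cart)  # released when the flag flips
    return legacy_checkout_flow(cart)   # default path until then
```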
Migration Questions
How long does the migration take?
It depends on where you start and how much organizational support you have. As a rough guide:
- Phase 0 (Assess): 1-2 weeks
- Phase 1 (Foundations): 1-6 months, depending on current testing and TBD maturity
- Phase 2 (Pipeline): 1-3 months
- Phase 3 (Optimize): 2-6 months
- Phase 4 (Deliver on Demand): 1-3 months
These ranges assume a single team working on the migration alongside regular delivery work.
The biggest variable is Phase 1: teams with no test automation or TBD practice will spend
longer building foundations than teams that already have these in place.
Do not treat these timelines as commitments. The migration is an iterative improvement process,
not a project with a deadline.
Do we stop delivering features during the migration?
No. The migration is done alongside regular delivery work, not instead of it. Each migration
practice is adopted incrementally: you do not stop the world to rewrite your test suite or
redesign your pipeline.
For example, in Phase 1 you adopt trunk-based development by reducing branch lifetimes
gradually: from two weeks to one week to two days to same-day. You add automated tests
incrementally, starting with the highest-risk code paths. You decompose work into smaller
stories one sprint at a time.
The migration practices themselves improve your delivery speed, so the investment pays off
as you go. Teams that have completed Phase 1 typically report delivering features faster than
before, not slower.
What if our organization requires manual change approval (CAB)?
Many organizations have Change Advisory Board (CAB) processes that require manual approval
before production deployments. This is one of the most common organizational blockers for CD.
The path forward is to replace the manual approval with automated evidence: a mature CD
pipeline provides stronger safety guarantees than a committee meeting, and your DORA metrics
can demonstrate this. Most CAB processes were designed for monthly releases with hundreds of
changes per batch; when you deploy daily with one or two changes, the risk profile is
fundamentally different. See CAB Gates
for a detailed approach to this transition.
What if we have a monolithic architecture?
You can practice continuous delivery with a monolith. CD does not require microservices. Many
of the highest-performing teams in the DORA research deploy monolithic applications multiple
times per day.
What matters is that your architecture supports independent testing and deployment. A
well-structured monolith with a comprehensive test suite and a reliable pipeline can achieve
CD. A poorly structured collection of microservices with shared databases and coordinated
releases cannot.
Architecture decoupling is addressed in Phase 3, but
it is about enabling independent deployment and reducing coordination costs, not about adopting
any particular architectural style.
What if our tests are slow or unreliable?
This is one of the most common starting conditions. A slow or flaky test suite undermines
every CD practice: developers stop trusting the tests, broken builds are ignored, and the
pipeline becomes a bottleneck rather than an enabler. The fix is incremental: quarantine
flaky tests, parallelize execution, rebalance toward fast unit tests, and set a pipeline
time budget (under 10 minutes). See
Testing Fundamentals and the
Testing reference section for detailed guidance.
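One way to quarantine a flaky test, assuming pytest: register a team-convention marker and
exclude it from the gating run. The marker name is a convention, not a pytest built-in.

```python
# Illustrative quarantine, assuming pytest. Register the marker in
# pytest.ini:
#   [pytest]
#   markers =
#       quarantine: known-flaky tests excluded from the gating run
# The gating pipeline then runs: pytest -m "not quarantine"
import pytest

@pytest.mark.quarantine  # excluded from gating until made deterministic
def test_search_results_update_after_reindex():
    ...

def test_search_returns_exact_match():
    assert True  # reliable tests keep gating every commit
```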
Where do I start if I am not sure which phase applies to us?
Start with Phase 0: Assess. Complete the
value stream mapping exercise, take
baseline metrics, and fill out the
current-state checklist. These activities will tell you
exactly where you stand and which phase to begin with.
If you do not have time for a full assessment, ask yourself these questions:
- Do all developers integrate to trunk at least daily? If no, start with Phase 1.
- Do you have a single automated pipeline that every change goes through? If no, start with Phase 2.
- Can you deploy any green build to production on demand? If no, focus on the gap between your current state and Phase 2 completion criteria.
- Do you deploy at least weekly? If no, look at Phase 3 for batch size and flow optimization.
Is CD about speed or quality?
Quality. The purpose of the pipeline is to determine whether an artifact is production-worthy and to reject it if it is not. Do not chase daily deployments before you have built confidence in your ability to detect failure. Move validation as close to the developer as possible: run it on the developer's machine, run it again on merge to trunk, and run it again whenever trunk changes.
Testing is not limited to component tests. You need to test for security, compliance,
performance, and everything else required in your context. Set error budgets and do not exceed
them. When your error budget is spent, stop shipping features and invest in pipeline
hardening. When something breaks in production, harden the pipeline. When exploratory testing
uncovers an edge case, harden the pipeline. The primary goal is to build efficient and
effective quality gates. Only then can you move quickly.
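To make the error-budget rule concrete: a 99.9% availability target over a 30-day window allows roughly 43 minutes of downtime. The arithmetic, with illustrative SLO and window values, is sketched below.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Total allowed unavailability for the window, in minutes."""
    return (1.0 - slo) * window_days * 24 * 60

def budget_spent(downtime_minutes: float, slo: float, window_days: int = 30) -> bool:
    """True when the budget is exhausted and feature work should pause
    in favor of pipeline and reliability hardening."""
    return downtime_minutes >= error_budget_minutes(slo, window_days)

print(round(error_budget_minutes(0.999), 1))         # 43.2 minutes over 30 days
print(budget_spent(downtime_minutes=55, slo=0.999))  # True -> stop shipping features
```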
9 - Resources
Books, videos, and further reading on continuous delivery and deployment.
This page collects the books, websites, videos, and articles that inform the practices in this
migration guide. Resources are organized by topic and annotated with the migration phase they
are most relevant to.
Books
Continuous Delivery and Deployment
- Modern Software Engineering by Dave Farley
- Farley’s broader take on what it means to do software engineering well. Covers the principles
behind CD - iterating toward a goal, getting fast feedback, working in small steps - and
connects them to test-driven development, managing complexity, and designing for testability.
Useful for teams that want to understand the why behind CD practices, not just the how.
- Most relevant to: All phases
- Continuous Delivery Pipelines by Dave Farley
- A practical, focused guide to building CD pipelines. Farley covers pipeline design, testing
strategies, and deployment patterns in a direct, implementation-oriented style. Start here
if you want a concise guide to the pipeline practices in Phase 2.
- Most relevant to: Phase 2: Pipeline
- Continuous Delivery by Jez Humble and Dave Farley
- The foundational text on CD. Published in 2010, it remains the most comprehensive treatment
of the principles and practices that make continuous delivery work. Covers version control
patterns, build automation, testing strategies, deployment pipelines, and release management.
If you read one book before starting your migration, read this one.
- Most relevant to: All phases
- Accelerate by Nicole Forsgren, Jez Humble, and Gene Kim
- Presents the DORA research findings that link technical practices to organizational
performance. Covers the four key metrics (deployment frequency, lead time, change failure
rate, MTTR) and the capabilities that predict high performance. Essential reading for anyone
who needs to make the business case for a CD migration.
- Most relevant to: Phase 0: Assess and Phase 3: Metrics-Driven Improvement
- Engineering the Digital Transformation by Gary Gruver
- Addresses the organizational and leadership challenges of large-scale delivery
transformation. Gruver draws on his experience leading transformations at HP and other large
enterprises. Particularly valuable for leaders sponsoring a migration who need to understand
the change management, communication, and sequencing challenges ahead.
- Most relevant to: Organizational leadership across all phases
- Release It! by Michael T. Nygard
- Covers the design and architecture patterns that make production systems resilient. Topics
include stability patterns (circuit breakers, bulkheads, timeouts), deployment patterns, and
the operational realities of running software at scale. Essential reading before entering
Phase 4, where the team has the capability to deploy any change on demand.
- Most relevant to: Phase 4: Deliver on Demand and Phase 2: Rollback
- The DevOps Handbook by Gene Kim, Jez Humble, Patrick Debois, and John Willis
- A practical companion to The Phoenix Project. Covers the Three Ways (flow, feedback, and
continuous learning) and provides detailed guidance on implementing DevOps practices. Useful
as a reference throughout the migration.
- Most relevant to: All phases
- The Phoenix Project by Gene Kim, Kevin Behr, and George Spafford
- A novel that illustrates DevOps principles through the story of a fictional IT organization
in crisis. Useful for building organizational understanding of why delivery improvement
matters, especially for stakeholders who will not read a technical book.
- Most relevant to: Building organizational buy-in during Phase 0
Testing
- Growing Object-Oriented Software, Guided by Tests by Steve Freeman and Nat Pryce
- The definitive guide to test-driven development in practice. Goes beyond unit testing to
cover acceptance testing, test doubles, and how TDD drives design. Essential reading for
Phase 1 testing fundamentals.
- Most relevant to: Phase 1: Testing Fundamentals
- Working Effectively with Legacy Code by Michael Feathers
- Practical techniques for adding tests to untested code, breaking dependencies, and
incrementally improving code that was not designed for testability. Indispensable if your
migration starts with a codebase that has little or no automated testing.
- Most relevant to: Phase 1: Testing Fundamentals
Work Decomposition and Flow
- User Story Mapping by Jeff Patton
- A practical guide to breaking features into deliverable increments using story maps. Patton’s
approach directly supports the vertical slicing discipline required for small batch delivery.
- Most relevant to: Phase 1: Work Decomposition
- The Principles of Product Development Flow by Donald Reinertsen
- A rigorous treatment of flow economics in product development. Covers queue theory, batch
size economics, WIP limits, and the cost of delay. Dense but transformative. Reading this
book will change how you think about every aspect of your delivery process.
- Most relevant to: Phase 3: Optimize
- Making Work Visible by Dominica DeGrandis
- Focuses on identifying and eliminating the “time thieves” that steal productivity: too much
WIP, unknown dependencies, unplanned work, conflicting priorities, and neglected work. A
practical companion to the WIP limiting practices in Phase 3.
- Most relevant to: Phase 3: Limiting WIP
Databases
- Refactoring Databases: Evolutionary Database Design by Scott Ambler and Pramod Sadalage
- The definitive guide to managing database schema changes incrementally. Covers expand-contract
migrations, backward-compatible schema changes, and techniques for evolving databases without
downtime. Essential reading for teams whose deployment pipeline includes database changes.
- Most relevant to: Phase 2: Pipeline and Phase 3: Small Batches
Architecture
- Building Microservices by Sam Newman
- Covers the architectural patterns that enable independent deployment, including service
boundaries, API design, data management, and testing strategies for distributed systems.
- Most relevant to: Phase 3: Architecture Decoupling
- Team Topologies by Matthew Skelton and Manuel Pais
- Addresses the relationship between team structure and software architecture (Conway’s Law in
practice). Covers team types, interaction modes, and how to evolve team structures to support
fast flow. Valuable for addressing the organizational blockers that surface throughout the
migration.
- Most relevant to: Organizational design across all phases
Websites
- MinimumCD.org
- Defines the minimum set of practices required to claim you are doing continuous delivery.
This migration guide uses the MinimumCD definition as its target state. Start here to
understand what CD actually requires.
- Dojo Consortium
- A community-maintained collection of CD practices, metrics definitions, and improvement
patterns. Many of the definitions and frameworks in this guide are adapted from the Dojo
Consortium’s work.
- DORA (dora.dev)
- The DevOps Research and Assessment site, which publishes the annual State of DevOps report
and provides resources for measuring and improving delivery performance.
- Trunk-Based Development
- The comprehensive reference for trunk-based development patterns. Covers short-lived
feature branches, feature flags, branch by abstraction, and release branching strategies.
- Martin Fowler’s blog (martinfowler.com)
- Martin Fowler’s site contains authoritative articles on continuous integration, continuous
delivery, microservices, refactoring, and software design. Key articles include
“Continuous Integration” and “Continuous Delivery.”
- Google Cloud Architecture Center: DevOps
- Google’s public documentation of the DORA capabilities, including self-assessment tools and
implementation guidance.
Videos
- “Modern Software Engineering” by Dave Farley (YouTube channel)
- Dave Farley’s YouTube channel provides weekly videos covering CD practices, pipeline design,
testing strategies, and software engineering principles. Accessible and practical.
- Most relevant to: All phases
- “Continuous Delivery” by Jez Humble (various conference talks)
- Jez Humble’s conference presentations cover the principles and research behind CD. His talk
“Why Continuous Delivery?” is an excellent introduction for teams and stakeholders who are
new to the concept.
- Most relevant to: Building understanding during Phase 0
- “Refactoring” and “TDD” talks by Martin Fowler and Kent Beck
- Foundational talks on the development practices that support CD. Understanding TDD and
refactoring is essential for Phase 1 testing fundamentals.
- Most relevant to: Phase 1: Foundations
- “The Smallest Thing That Could Possibly Work” by Bryan Finster
- Covers the work decomposition and small batch delivery practices that are central to this
migration guide. Focuses on practical techniques for breaking work into vertical slices.
- Most relevant to: Phase 1: Work Decomposition and Phase 3: Small Batches
- “Real Example of a Deployment Pipeline in the Fintech Industry” by Dave Farley
- A concrete walkthrough of a production deployment pipeline in a regulated financial services
environment. Demonstrates that CD practices are compatible with compliance requirements.
- Most relevant to: Phase 2: Pipeline
Blog Posts and Articles
- Continuous Integration Certification by Martin Fowler
- A short, practical test for whether your team is actually practicing continuous integration.
Useful as a self-assessment during Phase 1.
- Most relevant to: Phase 1: Foundations
- Continuous Delivery: Anatomy of the Deployment Pipeline by Dave Farley
- An article-length overview of deployment pipeline structure, covering commit stage, acceptance
testing, and release stages. A good companion to the pipeline phase of this guide.
- Most relevant to: Phase 2: Pipeline
Recommended Reading Order
If you are starting your migration and want to read in the most useful order:
- Accelerate, to understand the research and build the business case
- Continuous Delivery (Humble & Farley), to understand the full picture
- Continuous Delivery Pipelines (Farley), for practical pipeline implementation
- Working Effectively with Legacy Code, if your codebase lacks tests
- The Principles of Product Development Flow, to understand flow optimization
- Release It!, before moving to continuous deployment
Migration Tip
You do not need to read all of these before starting your migration. Start with the practices
in Phase 1, read Accelerate for the business case, and refer to the other resources as you
reach the relevant migration phase. The most important thing is to start delivering
improvements, not to finish a reading list.