Reference

Supporting material: glossary, metrics definitions, testing guides, and additional resources.

This section provides reference material that supports your migration journey. Use it alongside the phase guides for detailed definitions, metrics, and patterns.

Contents

  • Glossary - Key terms and definitions
  • CD Dependency Tree - How CD practices depend on each other
  • Common Blockers - Frequently encountered obstacles and how to address them
  • Defect Sources - Defect causes across the delivery value stream with detection methods and AI enhancements
  • DORA Capabilities - The capabilities that drive software delivery performance
  • Resources - Books, videos, and further reading
  • Metrics - Detailed definitions for key delivery metrics
  • Testing - Testing types, patterns, and best practices

1 - Glossary

Key terms and definitions used throughout this guide.

Adapted from Dojo Consortium

This glossary defines the terms used across every phase of the CD migration guide. Where a term has a specific meaning within a migration phase, the relevant phase is noted.

A

Artifact

A packaged, versioned output of a build process (e.g., a container image, JAR file, or binary). In a CD pipeline, artifacts are built once and promoted through environments without modification. See Immutable Artifacts.

B

Baseline Metrics

The set of delivery measurements taken before beginning a migration, used as the benchmark against which improvement is tracked. See Phase 0 – Baseline Metrics.

Batch Size

The amount of change included in a single deployment. Smaller batches reduce risk, simplify debugging, and shorten feedback loops. Reducing batch size is a core focus of Phase 3 – Small Batches.

BDD (Behavior-Driven Development)

A collaboration practice where developers, testers, and product representatives define expected behavior using structured examples before code is written. BDD produces executable specifications that serve as both documentation and automated tests. BDD supports effective work decomposition by forcing clarity about what a story actually means before development begins.

Blue-Green Deployment

A deployment strategy that maintains two identical production environments. New code is deployed to the inactive environment, verified, and then traffic is switched. See Progressive Rollout.

Branch Lifetime

The elapsed time between creating a branch and merging it to trunk. CD requires branch lifetimes measured in hours, not days or weeks. Long branch lifetimes are a symptom of poor work decomposition or slow code review. See Trunk-Based Development.

C

Canary Deployment

A deployment strategy where a new version is rolled out to a small subset of users or servers before full rollout. If the canary shows no issues, the deployment proceeds to 100%. See Progressive Rollout.

CD (Continuous Delivery)

The practice of ensuring that every change to the codebase is always in a deployable state and can be released to production at any time through a fully automated pipeline. Continuous delivery does not require that every change is deployed automatically, but it requires that every change could be deployed automatically. This is the primary goal of this migration guide.

Change Failure Rate (CFR)

The percentage of deployments to production that result in a degraded service and require remediation (e.g., rollback, hotfix, or patch). One of the four DORA metrics. See Metrics – Change Fail Rate.

CI (Continuous Integration)

The practice of integrating code changes to a shared trunk at least once per day, where each integration is verified by an automated build and test suite. CI is a prerequisite for CD, not a synonym. A team that runs automated builds on feature branches but merges weekly is not doing CI. See Build Automation.

Constraint

In the Theory of Constraints, the single factor most limiting the throughput of a system. During a CD migration, your job is to find and fix constraints in order of impact. See Identify Constraints.

Continuous Deployment

An extension of continuous delivery where every change that passes the automated pipeline is deployed to production without manual intervention. Continuous delivery ensures every change can be deployed; continuous deployment ensures every change is deployed. See Phase 4 – Deliver on Demand.

D

Deployable

A change that has passed all automated quality gates defined by the team and is ready for production deployment. The definition of deployable is codified in the pipeline, not decided by a person at deployment time. See Deployable Definition.

Deployment Frequency

How often an organization successfully deploys to production. One of the four DORA metrics. See Metrics – Release Frequency.

Development Cycle Time

The elapsed time from when work starts on a change until that change is running in production. This measures the efficiency of your development and delivery process, excluding the time a request waits in the backlog before work begins. See Metrics – Development Cycle Time.

DORA Metrics

The four key metrics identified by the DORA (DevOps Research and Assessment) research program as predictive of software delivery performance: deployment frequency, lead time for changes, change failure rate, and mean time to restore service. See DORA Capabilities.

F

Feature Flag

A mechanism that allows code to be deployed to production with new functionality disabled, then selectively enabled for specific users, percentages of traffic, or environments. Feature flags decouple deployment from release. See Feature Flags.
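
As a minimal illustration of the idea (the flag name and in-memory flag store below are hypothetical, not a specific feature-flag product):

```python
# Minimal feature-flag sketch. Flag names and the in-memory store are
# hypothetical. The code path ships with the deployment; the flag decides
# who sees it, decoupling deployment from release.
FLAGS = {"new_checkout_flow": {"enabled": False, "allow_users": {"beta-tester-1"}}}

def is_enabled(flag_name: str, user_id: str) -> bool:
    flag = FLAGS.get(flag_name, {})
    return flag.get("enabled", False) or user_id in flag.get("allow_users", set())

def checkout(user_id: str) -> str:
    if is_enabled("new_checkout_flow", user_id):
        return "new checkout flow"   # deployed and selectively released
    return "existing checkout flow"  # default path for everyone else
```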

Flow Efficiency

The ratio of active work time to total elapsed time in a delivery process. A flow efficiency of 15% means that for every hour of actual work, roughly 5.7 hours are spent waiting. Value stream mapping reveals your flow efficiency. See Value Stream Mapping.

H

Hard Dependency

A dependency that must be resolved before work can proceed. In delivery, hard dependencies include things like waiting for another team’s API, a shared database migration, or an infrastructure provisioning request. Hard dependencies create queues and increase lead time. Eliminating hard dependencies is a focus of Architecture Decoupling.

Hardening Sprint

A sprint dedicated to stabilizing and fixing defects before a release. The existence of hardening sprints is a strong signal that quality is not being built in during regular development. Teams practicing CD do not need hardening sprints because every commit is deployable. See Common Blockers.

I

Immutable Artifact

A build artifact that is never modified after creation. The same artifact that is tested in the pipeline is the exact artifact that is deployed to production. Configuration differences between environments are handled externally. See Immutable Artifacts.

Integration Frequency

How often a developer integrates code to the shared trunk. CD requires at least daily integration. See Metrics – Integration Frequency.

L

Lead Time for Changes

The elapsed time from when a commit is made to when it is successfully running in production. One of the four DORA metrics. See Metrics – Lead Time.

M

Mean Time to Restore (MTTR)

The elapsed time from when a production incident is detected to when service is restored. One of the four DORA metrics. Teams practicing CD have short MTTR because deployments are small, rollback is automated, and the cause of failure is easy to identify. See Metrics – Mean Time to Repair.

P

Pipeline

The automated sequence of build, test, and deployment stages that every change passes through on its way to production. See Phase 2 – Pipeline.

Production-Like Environment

A test or staging environment that matches production in configuration, infrastructure, and data characteristics. Testing in environments that differ from production is a common source of deployment failures. See Production-Like Environments.

R

Rollback

The ability to revert a production deployment to a previous known-good state. CD requires automated rollback that takes minutes, not hours. See Rollback.

S

Soft Dependency

A dependency that can be worked around or deferred. Unlike hard dependencies, soft dependencies do not block work but may influence sequencing or design decisions. Feature flags can turn many hard dependencies into soft dependencies by allowing incomplete integrations to be deployed in a disabled state.

Story Points

A relative estimation unit used by some teams to forecast effort. Story points are frequently misused as a productivity metric, which creates perverse incentives to inflate estimates and discourages the small work decomposition that CD requires. If your organization uses story points as a velocity target, see Common Blockers.

T

TBD (Trunk-Based Development)

A source-control branching model where all developers integrate to a single shared branch (trunk) at least once per day. Short-lived feature branches (less than a day) are acceptable. Long-lived feature branches are not. TBD is a prerequisite for CI, which is in turn a prerequisite for CD. See Trunk-Based Development.

TDD (Test-Driven Development)

A development practice where tests are written before the production code that makes them pass. TDD supports CD by ensuring high test coverage, driving simple design, and producing a fast, reliable test suite. TDD feeds into the testing fundamentals required in Phase 1.

Toil

Repetitive, manual work related to maintaining a production service that is automatable, has no lasting value, and scales linearly with service size. Examples include manual deployments, manual environment provisioning, and manual test execution. Eliminating toil is a primary benefit of building a CD pipeline.

U

Unplanned Work

Work that arrives outside the planned backlog – production incidents, urgent bug fixes, ad hoc requests. High levels of unplanned work indicate systemic quality or operational problems. Teams with high change failure rates generate their own unplanned work through failed deployments. Reducing unplanned work is a natural outcome of improving change failure rate through CD practices.

V

Value Stream Map

A visual representation of every step required to deliver a change from request to production, showing process time, wait time, and percent complete and accurate at each step. The foundational tool for Phase 0 – Assess.

Vertical Sliced Story

A user story that delivers a thin slice of functionality across all layers of the system (UI, API, database, etc.) rather than a horizontal slice that implements one layer completely. Vertical slices are independently deployable and testable, which is essential for CD. Vertical slicing is a core technique in Work Decomposition.

W

WIP (Work in Progress)

The number of work items that have been started but not yet completed. High WIP increases lead time, reduces focus, and increases context-switching overhead. Limiting WIP is a key practice in Phase 3 – Limiting WIP.

Working Agreement

An explicit, documented set of team norms covering how work is defined, reviewed, tested, and deployed. Working agreements create shared expectations and reduce friction. See Working Agreements.


This content is adapted from the Dojo Consortium, licensed under CC BY 4.0.

2 - CD Dependency Tree

Visual guide showing how CD practices depend on and build upon each other.

Adapted from Dojo Consortium

Continuous delivery is not a single practice you adopt. It is a system of interdependent practices where each one supports and enables others. This dependency tree shows those relationships. Understanding the dependencies helps you plan your migration in the right order – addressing foundational practices before building on them.

The Dependency Tree

The diagram below shows how the core practices of CD relate to each other. Read it from bottom to top: lower practices enable higher ones. The migration phases in this guide are sequenced to follow these dependencies.

graph BT
    subgraph "Goal"
        CD["Continuous Delivery"]
    end

    subgraph "Continuous Integration"
        CI["Continuous Integration"]
    end

    subgraph "Development Practices"
        TBD["Trunk-Based Development"]
        TDD["Test-Driven Development"]
        BDD["Behavior-Driven Development"]
        WD["Work Decomposition"]
        CR["Code Review"]
    end

    subgraph "Build & Test Infrastructure"
        BA["Build Automation"]
        TS["Test Suite"]
        PLEnv["Production-Like Environments"]
    end

    subgraph "Pipeline Practices"
        SPP["Single Path to Production"]
        DP["Deterministic Pipeline"]
        IA["Immutable Artifacts"]
        AC["Application Config"]
        RB["Rollback"]
        DD["Deployable Definition"]
    end

    subgraph "Flow Optimization"
        SB["Small Batches"]
        FF["Feature Flags"]
        WIP["WIP Limits"]
        MDI["Metrics-Driven Improvement"]
    end

    subgraph "Organizational Practices"
        WA["Working Agreements"]
        Retro["Retrospectives"]
        AD["Architecture Decoupling"]
    end

    %% Development Practices feed CI
    TDD --> CI
    BDD --> TDD
    BDD --> WD
    TBD --> CI
    WD --> SB
    CR --> TBD

    %% Build infrastructure feeds CI
    BA --> CI
    TS --> CI
    TDD --> TS

    %% CI feeds pipeline
    CI --> SPP
    CI --> DP
    PLEnv --> DP

    %% Pipeline practices feed CD
    SPP --> CD
    DP --> CD
    IA --> CD
    AC --> IA
    RB --> CD
    DD --> CD

    %% Flow optimization feeds CD
    SB --> CD
    FF --> SB
    FF --> CD
    WIP --> SB
    MDI --> CD

    %% Organizational practices support everything
    WA --> CR
    WA --> DD
    Retro --> MDI
    AD --> FF
    AD --> SB

How to Read the Dependency Tree

Each arrow means “supports” or “enables.” When practice A has an arrow pointing to practice B, it means A is a prerequisite or enabler for B.

Key dependency chains to understand:

BDD enables TDD enables CI enables CD

Behavior-Driven Development produces clear, testable acceptance criteria. Those criteria drive Test-Driven Development at the code level. A comprehensive, fast test suite enables Continuous Integration with confidence. And CI is the foundational prerequisite for CD.

If your team skips BDD, stories are ambiguous. If stories are ambiguous, tests are incomplete or wrong. If tests are unreliable, CI is unreliable. And if CI is unreliable, CD is impossible.

Work Decomposition enables Small Batches enables CD

You cannot deploy small batches if your work items are large. Work decomposition – breaking features into vertical slices that can each be completed in two days or less – is what makes small batches possible. Small batches in turn reduce deployment risk and enable the rapid feedback that CD depends on.

Trunk-Based Development enables CI

CI requires that all developers integrate to a shared trunk at least once per day. If your team uses long-lived feature branches, you are not doing CI regardless of how often your build server runs. TBD is not optional for CD – it is a prerequisite.

Architecture Decoupling enables Feature Flags and Small Batches

Tightly coupled architectures force coordinated deployments. When changing service A requires simultaneously changing services B and C, small independent deployments become impossible. Architecture decoupling – through well-defined APIs, contract testing, and service boundaries – enables teams to deploy independently, use feature flags effectively, and maintain small batch sizes.

Mapping to Migration Phases

The dependency tree directly informs the sequencing of migration phases:

| Dependency Layer | Migration Phase | Why This Order |
|---|---|---|
| Development practices (TBD, TDD, BDD, work decomposition, code review) | Phase 1 – Foundations | These are prerequisites for CI, which is a prerequisite for everything else |
| Build and test infrastructure (build automation, test suite, production-like environments) | Phase 1 and Phase 2 | You need a reliable build and test infrastructure before you can build a reliable pipeline |
| Pipeline practices (single path, deterministic pipeline, immutable artifacts, config, rollback) | Phase 2 – Pipeline | The pipeline depends on solid CI and development practices |
| Flow optimization (small batches, feature flags, WIP limits, metrics) | Phase 3 – Optimize | Optimization requires a working pipeline to optimize |
| Organizational practices (working agreements, retrospectives, architecture decoupling) | All phases | These cross-cutting practices support every phase and should be established early |

Using the Tree to Diagnose Problems

When something in your delivery process is not working, trace it through the dependency tree to find the root cause.

Example 1: Deployments keep failing. Look at what feeds CD in the tree. Is your pipeline deterministic? Are you using immutable artifacts? Is your application config externalized? The failure is likely in one of the pipeline practices.

Example 2: CI builds are constantly broken. Look at what feeds CI. Are developers actually practicing TBD (integrating daily)? Is the test suite reliable, or is it full of flaky tests? Is the build automated end-to-end? The broken builds are a symptom of a problem in the development practices layer.

Example 3: You cannot reduce batch size. Look at what feeds small batches. Is work being decomposed into vertical slices? Are feature flags available so partial work can be deployed safely? Is the architecture decoupled enough to allow independent deployment? The batch size problem originates in one of these upstream practices.

Practices Not Shown

The tree above focuses on the core technical and process practices. Several important supporting practices are not shown for clarity but are covered elsewhere in this guide:

  • Observability and monitoring – essential for progressive rollout and fast incident response
  • Security automation – integrated into the pipeline as automated checks rather than manual gates
  • Database change management – a common constraint addressed during pipeline architecture
  • Team topology and organizational design – addressed through working agreements and architectural decoupling

This content is adapted from the Dojo Consortium, licensed under CC BY 4.0.

3 - Common Blockers

Frequently encountered obstacles on the path to CD and how to address them.

Adapted from Dojo Consortium

Every team migrating to continuous delivery will encounter obstacles. Some are technical. Most are not. The blockers listed here are drawn from patterns observed across hundreds of teams attempting the journey to CD. Recognizing them early helps you address root causes rather than fight symptoms.

Work Breakdown Problems

Stories Too Large

What it looks like: User stories regularly take more than a week to complete. Developers work on a single story for days without integrating. Sprint commitments are frequently missed because “the story was bigger than we thought.”

Why it blocks CD: Large stories mean large batches. Large batches mean infrequent integration. Infrequent integration means painful merges, delayed feedback, and high-risk deployments. You cannot practice continuous integration – the prerequisite for CD – if your work items take a week.

What to do: Adopt vertical slicing. Every story should deliver a thin slice of user-visible functionality across all layers of the system. Target a maximum of two days from start to done. See Work Decomposition.

No Vertical Slicing

What it looks like: Stories are organized by technical layer (“build the API,” “build the database schema,” “build the UI”) rather than by user-visible behavior. Multiple stories must be completed before anything is demonstrable or testable end-to-end.

Why it blocks CD: Horizontal slices cannot be independently deployed or tested. They create hard dependencies between stories and teams. Nothing is deployable until all layers are assembled, which forces large-batch releases.

What to do: Rewrite stories as vertical slices that deliver end-to-end functionality, even if the initial slice is minimal. A single form field that saves to the database and displays a confirmation is a vertical slice. An entire database schema with no UI is not.

Team Workflow Problems

Too Much Work in Progress

What it looks like: Every developer is working on a different story. The team has 8 items in progress and 0 items done. Standup meetings are long because everyone has a different context to report on. Nothing is finished, but everything is started.

Why it blocks CD: High WIP destroys flow. When everything is in progress, nothing gets the focused attention needed to finish. Context switching between items adds overhead. The delivery pipeline sees sporadic, large commits rather than a steady stream of small ones.

What to do: Set explicit WIP limits. A team of 6 developers should have no more than 3-4 items in progress at any time. The goal is to finish work, not to start it. See Limiting WIP.

Distant Date Commitments

What it looks like: The team has committed to delivering a specific scope by a date months in the future. The commitment was made before the work was understood. Progress is tracked against the original plan, and “falling behind” triggers pressure to cut corners.

Why it blocks CD: Fixed-scope, fixed-date commitments incentivize large batches. Teams hoard changes until the deadline, then deploy everything at once. There is no incentive to deliver incrementally because the commitment is about the whole scope, not about continuous flow. When the deadline pressure mounts, testing is the first thing cut.

What to do: Shift to continuous delivery of small increments. Report progress by showing working software in production, not by comparing actuals to a Gantt chart. If date commitments are required by the organization, negotiate on scope rather than on quality.

Velocity Used as a Productivity Metric

What it looks like: Management tracks story points completed per sprint as a measure of team productivity. Teams are compared by velocity. There is pressure to increase velocity every sprint.

Why it blocks CD: When velocity is a target, it ceases to be a useful measure (Goodhart’s Law). Teams inflate estimates to look productive. Stories get larger because larger stories have more points. The incentive is to maximize points, not to deliver small, frequent, valuable changes to production.

What to do: Replace velocity with DORA metrics – deployment frequency, lead time, change failure rate, and mean time to restore. These measure delivery outcomes rather than output volume.

Manual Testing Gates

Hardening Sprints

What it looks like: The team allocates one or more sprints after “feature complete” to stabilize, fix bugs, and prepare for release. Code is frozen during hardening. Testers run manual regression suites. Bug counts are tracked on a burndown chart.

Why it blocks CD: A hardening sprint is an admission that the normal development process does not produce deployable software. If you need a dedicated period to make code production-ready, you are not continuously delivering – you are doing waterfall with shorter phases. Hardening sprints add weeks of delay and encourage teams to accumulate technical debt during feature sprints because “we’ll fix it in hardening.”

What to do: Eliminate the need for hardening by building quality in. Adopt TDD to ensure test coverage. Use a CI pipeline that runs the full test suite on every commit. Define “deployable” as an automated pipeline outcome, not as a manual assessment. See Testing Fundamentals and Deployable Definition.

Manual Regression Testing

What it looks like: Every release requires a manual regression test cycle that takes days or weeks. Testers execute scripted test cases against the application. New features are tested manually before they are considered done.

Why it blocks CD: Manual regression testing scales linearly with application size and inversely with delivery frequency. The more features you add, the longer regression takes. The longer regression takes, the less frequently you can deploy. This is the opposite of CD.

What to do: Automate regression tests. Not all at once – start with the highest-risk areas and the tests that block deployments most frequently. Your automated test suite should give you the same confidence as manual regression, but in minutes rather than days. See Testing Fundamentals.

Organizational Anti-Patterns

Meaningless Retrospectives

What it looks like: Retrospectives happen on schedule, but action items are never completed. The same problems surface every sprint. The team has stopped believing that retrospectives lead to change.

Why it blocks CD: CD requires continuous improvement. If the mechanism for identifying and addressing process problems is broken, systemic issues accumulate. The same blockers will persist indefinitely.

What to do: Limit retrospective action items to one or two per sprint and track them as work items with the same visibility as feature work. Make the action items specific and completable. “Improve testing” is not an action item. “Automate the login flow regression test” is. See Retrospectives.

Team Instability

What it looks like: Team members are frequently reassigned to other projects. New people join and leave every few sprints. The team never builds shared context or working agreements.

Why it blocks CD: CD practices depend on team discipline and shared understanding. TBD requires trust between developers. Code review speed depends on familiarity with the codebase. Working agreements require a stable group to establish and maintain. Constantly reshuffling teams means constantly restarting the journey.

What to do: Advocate for stable, long-lived teams. The team should own a product or service for its full lifecycle, not be assembled for a project and disbanded when it ends.

One Delivery per Sprint

What it looks like: The team delivers to production once per sprint, typically at the end. All stories from the sprint are bundled into a single release. The “sprint demo” is the first time stakeholders see working software.

Why it blocks CD: One delivery per sprint is not continuous delivery. It is a two-week batch release with Agile terminology. If something breaks in the batch, any of the changes could be the cause. Rollback means losing the entire sprint’s work. Feedback is delayed by weeks.

What to do: Start deploying individual stories as they are completed, not at the end of the sprint. This requires a working CI pipeline, trunk-based development, and the ability to deploy independently. These are the outcomes of Phase 1 and Phase 2.

Anti-Patterns Summary

The table below maps each common blocker to its root cause and the migration phase that addresses it.

| Blocker | Root Cause | Migration Phase |
|---|---|---|
| Stories take a week or more | No vertical slicing discipline | Phase 1 – Work Decomposition |
| Too much WIP | No WIP limits; starting valued over finishing | Phase 3 – Limiting WIP |
| Hardening sprints | Quality not built in during development | Phase 1 – Testing Fundamentals |
| Manual regression testing | Test automation insufficient | Phase 1 – Testing Fundamentals |
| One delivery per sprint | Batch mindset; no pipeline | Phase 2 – Pipeline |
| Meaningless retrospectives | No accountability for improvement actions | Phase 3 – Retrospectives |
| Velocity as productivity metric | Measuring output instead of outcomes | Phase 3 – Metrics-Driven Improvement |
| Team instability | Organizational project-based staffing | Organizational change (all phases) |
| Distant date commitments | Fixed-scope commitments made too early | Incremental delivery + stakeholder education |
| Flaky tests tolerated | Tests not maintained as production code | Phase 1 – Testing Fundamentals |
| Long-lived feature branches | No TBD practice | Phase 1 – Trunk-Based Development |
| Manual deployments | No deployment automation | Phase 2 – Single Path to Production |

This content is adapted from the Dojo Consortium, licensed under CC BY 4.0.

4 - DORA Capabilities

The capabilities that drive software delivery performance, as identified by DORA research.

Adapted from Dojo Consortium

The DevOps Research and Assessment (DORA) research program has identified capabilities that predict high software delivery performance. These capabilities are not tools or technologies – they are practices and cultural conditions that enable teams to deliver software quickly, reliably, and sustainably.

This page organizes the DORA capabilities by their relevance to each migration phase. Use it as a reference to understand which capabilities you are building at each stage of your journey and which ones to focus on next.

Continuous Delivery Capabilities

These capabilities directly support the mechanics of getting software from commit to production. They are the primary focus of Phases 1 and 2 of the migration.

Version Control

All production artifacts – application code, test code, infrastructure configuration, deployment scripts, and database schemas – are stored in version control and can be reproduced from a single source of truth.

Migration relevance: This is a prerequisite for Phase 1. If any part of your delivery process depends on files stored on a specific person’s machine or a shared drive, address that before beginning the migration.

Continuous Integration

Developers integrate their work to trunk at least daily. Each integration triggers an automated build and test process. Broken builds are fixed within minutes.

Migration relevance: Phase 1 – Foundations. CI is the gateway capability. Without it, none of the pipeline practices in Phase 2 can function. See Build Automation and Trunk-Based Development.

Deployment Automation

Deployments are fully automated and can be triggered by anyone on the team. No manual steps are required between a green pipeline and production.

Migration relevance: Phase 2 – Pipeline. Specifically, Single Path to Production and Rollback.

Trunk-Based Development

Developers work in small batches and merge to trunk at least daily. Branches, if used, are short-lived (less than one day). There are no long-lived feature branches.

Migration relevance: Phase 1 – Trunk-Based Development. This is one of the first capabilities to establish because it enables CI.

Test Automation

A comprehensive suite of automated tests provides confidence that the software is deployable. Tests are reliable, fast, and maintained as carefully as production code.

Migration relevance: Phase 1 – Testing Fundamentals. Also see the Testing reference section for guidance on specific test types.

Test Data Management

Test data is managed in a way that allows automated tests to run independently, repeatably, and without relying on shared mutable state. Tests can create and clean up their own data.

Migration relevance: Becomes critical during Phase 2 when you need production-like environments and deterministic pipeline results.
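
As a small illustration of tests that create and clean up their own data, the sketch below uses pytest with an in-memory SQLite database; the table and fixture names are hypothetical.

```python
# Sketch of self-contained test data using pytest and in-memory SQLite.
# Table and fixture names are hypothetical; the point is that each test
# creates the data it needs and leaves no shared state behind.
import sqlite3
import pytest

@pytest.fixture
def customer_db():
    conn = sqlite3.connect(":memory:")  # fresh database per test
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    yield conn
    conn.close()                        # nothing persists after the test

def test_customer_insert(customer_db):
    customer_db.execute("INSERT INTO customers (name) VALUES (?)", ("Ada",))
    count = customer_db.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
    assert count == 1
```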

Shift Left on Security

Security is integrated into the development process rather than added as a gate at the end. Automated security checks run in the pipeline. Security requirements are part of the definition of deployable.

Migration relevance: Integrated during Phase 2 – Pipeline Architecture as automated quality gates rather than manual review steps.

Architecture Capabilities

These capabilities address the structural characteristics of your system that enable or prevent independent, frequent deployment.

Loosely Coupled Architecture

Teams can deploy their services independently without coordinating with other teams. Changes to one service do not require changes to other services. APIs have well-defined contracts.

Migration relevance: Phase 3 – Architecture Decoupling. This capability becomes critical when optimizing for deployment frequency and small batch sizes.

Empowered Teams

Teams choose their own tools, technologies, and approaches within organizational guardrails. They do not need approval from a central architecture board for implementation decisions.

Migration relevance: All phases. Teams that cannot make local decisions about their pipeline, test strategy, or deployment approach will be unable to iterate quickly enough to make progress.

Product and Process Capabilities

These capabilities address how work is planned, prioritized, and delivered.

Customer Feedback

Product decisions are informed by direct feedback from customers. Teams can observe how features are used in production and adjust accordingly.

Migration relevance: Becomes fully enabled in Phase 4 – Deliver on Demand when every change reaches production quickly enough for real customer feedback to inform the next change.

Value Stream Visibility

The team has a clear view of the entire delivery process from request to production, including wait times, handoffs, and rework loops.

Migration relevance: Phase 0 – Value Stream Mapping. This is the first activity in the migration because it informs every decision that follows.

Working in Small Batches

Work is broken down into small increments that can be completed, tested, and deployed independently. Each increment delivers measurable value or validated learning.

Migration relevance: Begins in Phase 1 – Work Decomposition and is optimized in Phase 3 – Small Batches.

Team Experimentation

Teams can try new ideas, tools, and approaches without requiring approval through a lengthy process. Failed experiments are treated as learning, not as waste.

Migration relevance: All phases. The migration itself is an experiment. Teams need the psychological safety and organizational support to try new practices, fail occasionally, and adjust.

Lean Management Capabilities

These capabilities address how work is managed, measured, and improved.

Limit Work in Progress

Teams have explicit WIP limits that constrain the number of items in any stage of the delivery process. WIP limits are enforced and respected.

Migration relevance: Phase 3 – Limiting WIP. Reducing WIP is one of the most effective ways to improve lead time and delivery predictability.

Visual Management

The state of all work is visible to the entire team through dashboards, boards, or other visual tools. Anyone can see what is in progress, what is blocked, and what has been deployed.

Migration relevance: All phases. Visual management supports the identification of constraints in Phase 0 and the enforcement of WIP limits in Phase 3.

Monitoring and Observability

Teams have access to production metrics, logs, and traces that allow them to understand system behavior, detect issues, and diagnose problems quickly.

Migration relevance: Critical for Phase 4 – Progressive Rollout where automated health checks determine whether a deployment proceeds or rolls back. Also supports fast mean time to restore.

Proactive Notification

Teams are alerted to problems before customers are affected. Monitoring thresholds and anomaly detection trigger notifications that enable rapid response.

Migration relevance: Becomes critical in Phase 4 when deployments are continuous and automated. Proactive notification is what makes continuous deployment safe.

Cultural Capabilities

These capabilities address the human and organizational conditions that enable high performance.

Generative Culture

Following Ron Westrum’s organizational typology, a generative culture is characterized by high cooperation, shared risk, and a focus on the mission. Messengers are not punished. Failures are treated as learning opportunities. New ideas are welcomed.

Migration relevance: All phases. A generative culture is not a phase you implement – it is a condition you cultivate continuously. Teams in pathological or bureaucratic cultures will struggle with every phase of the migration because practices like TBD and CI require trust and psychological safety.

Learning Culture

The organization invests in learning. Teams have time for experimentation, training, and conference attendance. Knowledge is shared across teams.

Migration relevance: All phases. The CD migration is a learning journey. Teams need time and space to learn new practices, make mistakes, and improve.

Collaboration Among Teams

Development, operations, security, and product teams work together rather than in silos. Handoffs are minimized. Shared responsibility replaces blame.

Migration relevance: All phases, but especially Phase 2 – Pipeline where the pipeline must encode the quality criteria from all disciplines (security, testing, operations) into automated gates.

Job Satisfaction

Team members find their work meaningful and have the autonomy and resources to do it well. High job satisfaction predicts high delivery performance (the relationship is bidirectional).

Migration relevance: The migration itself should improve job satisfaction by reducing toil, eliminating painful manual processes, and giving teams faster feedback on their work. If the migration is experienced as a burden rather than an improvement, something is wrong with the approach.

Transformational Leadership

Leaders support the migration with vision, resources, and organizational air cover. They remove impediments, set direction, and create the conditions for teams to succeed without micromanaging the details.

Migration relevance: All phases. Without leadership support, the migration will stall when it encounters the first organizational blocker (budget for tools, policy changes for deployment processes, cross-team coordination).

Capability Maturity by Phase

The following table maps each DORA capability to the migration phase where it is most actively developed:

| Capability | Phase 0 | Phase 1 | Phase 2 | Phase 3 | Phase 4 |
|---|---|---|---|---|---|
| Version control | Prerequisite | | | | |
| Continuous integration | | Primary | | | |
| Deployment automation | | | Primary | | |
| Trunk-based development | | Primary | | | |
| Test automation | | Primary | Expanded | | |
| Test data management | | | Primary | | |
| Shift left on security | | | Primary | | |
| Loosely coupled architecture | | | | Primary | |
| Empowered teams | Ongoing | Ongoing | Ongoing | Ongoing | Ongoing |
| Customer feedback | | | | | Primary |
| Value stream visibility | Primary | | | Revisited | |
| Working in small batches | | Started | | Primary | |
| Team experimentation | Ongoing | Ongoing | Ongoing | Ongoing | Ongoing |
| Limit WIP | | | | Primary | |
| Visual management | Started | Ongoing | Ongoing | Ongoing | Ongoing |
| Monitoring and observability | | | Started | Expanded | Primary |
| Proactive notification | | | | | Primary |
| Generative culture | Ongoing | Ongoing | Ongoing | Ongoing | Ongoing |
| Learning culture | Ongoing | Ongoing | Ongoing | Ongoing | Ongoing |
| Collaboration among teams | | Started | Primary | | |
| Job satisfaction | Ongoing | Ongoing | Ongoing | Ongoing | Ongoing |
| Transformational leadership | Ongoing | Ongoing | Ongoing | Ongoing | Ongoing |

This content is adapted from the Dojo Consortium, licensed under CC BY 4.0.

5 - Resources

Books, videos, and further reading on continuous delivery and deployment.

Adapted from MinimumCD.org

This page collects the books, websites, and videos that inform the practices in this migration guide. Resources are organized by topic and annotated with which migration phase they are most relevant to.

Books

Continuous Delivery and Deployment

Continuous Delivery Pipelines by Dave Farley
A practical, focused guide to building CD pipelines. Farley covers pipeline design, testing strategies, and deployment patterns in a direct, implementation-oriented style. Start here if you want a concise guide to the pipeline practices in Phase 2.
Most relevant to: Phase 2 – Pipeline
Continuous Delivery by Jez Humble and Dave Farley
The foundational text on CD. Published in 2010, it remains the most comprehensive treatment of the principles and practices that make continuous delivery work. Covers version control patterns, build automation, testing strategies, deployment pipelines, and release management. If you read one book before starting your migration, read this one.
Most relevant to: All phases
Accelerate by Nicole Forsgren, Jez Humble, and Gene Kim
Presents the DORA research findings that link technical practices to organizational performance. Covers the four key metrics (deployment frequency, lead time, change failure rate, MTTR) and the capabilities that predict high performance. Essential reading for anyone who needs to make the business case for a CD migration.
Most relevant to: Phase 0 – Assess and Phase 3 – Metrics-Driven Improvement
Engineering the Digital Transformation by Gary Gruver
Addresses the organizational and leadership challenges of large-scale delivery transformation. Gruver draws on his experience leading transformations at HP and other large enterprises. Particularly valuable for leaders sponsoring a migration who need to understand the change management, communication, and sequencing challenges ahead.
Most relevant to: Organizational leadership across all phases
Release It! by Michael T. Nygard
Covers the design and architecture patterns that make production systems resilient. Topics include stability patterns (circuit breakers, bulkheads, timeouts), deployment patterns, and the operational realities of running software at scale. Essential reading before entering Phase 4, where the team has the capability to deploy any change on demand.
Most relevant to: Phase 4 – Deliver on Demand and Phase 2 – Rollback
The DevOps Handbook by Gene Kim, Jez Humble, Patrick Debois, and John Willis
A practical companion to The Phoenix Project. Covers the Three Ways (flow, feedback, and continuous learning) and provides detailed guidance on implementing DevOps practices. Useful as a reference throughout the migration.
Most relevant to: All phases
The Phoenix Project by Gene Kim, Kevin Behr, and George Spafford
A novel that illustrates DevOps principles through the story of a fictional IT organization in crisis. Useful for building organizational understanding of why delivery improvement matters, especially for stakeholders who will not read a technical book.
Most relevant to: Building organizational buy-in during Phase 0

Testing

Growing Object-Oriented Software, Guided by Tests by Steve Freeman and Nat Pryce
The definitive guide to test-driven development in practice. Goes beyond unit testing to cover acceptance testing, test doubles, and how TDD drives design. Essential reading for Phase 1 testing fundamentals.
Most relevant to: Phase 1 – Testing Fundamentals
Working Effectively with Legacy Code by Michael Feathers
Practical techniques for adding tests to untested code, breaking dependencies, and incrementally improving code that was not designed for testability. Indispensable if your migration starts with a codebase that has little or no automated testing.
Most relevant to: Phase 1 – Testing Fundamentals

Work Decomposition and Flow

User Story Mapping by Jeff Patton
A practical guide to breaking features into deliverable increments using story maps. Patton’s approach directly supports the vertical slicing discipline required for small batch delivery.
Most relevant to: Phase 1 – Work Decomposition
The Principles of Product Development Flow by Donald Reinertsen
A rigorous treatment of flow economics in product development. Covers queue theory, batch size economics, WIP limits, and the cost of delay. Dense but transformative. Reading this book will change how you think about every aspect of your delivery process.
Most relevant to: Phase 3 – Optimize
Making Work Visible by Dominica DeGrandis
Focuses on identifying and eliminating the “time thieves” that steal productivity: too much WIP, unknown dependencies, unplanned work, conflicting priorities, and neglected work. A practical companion to the WIP limiting practices in Phase 3.
Most relevant to: Phase 3 – Limiting WIP

Architecture

Building Microservices by Sam Newman
Covers the architectural patterns that enable independent deployment, including service boundaries, API design, data management, and testing strategies for distributed systems.
Most relevant to: Phase 3 – Architecture Decoupling
Team Topologies by Matthew Skelton and Manuel Pais
Addresses the relationship between team structure and software architecture (Conway’s Law in practice). Covers team types, interaction modes, and how to evolve team structures to support fast flow. Valuable for addressing the organizational blockers that surface throughout the migration.
Most relevant to: Organizational design across all phases

Websites

MinimumCD.org
Defines the minimum set of practices required to claim you are doing continuous delivery. This migration guide uses the MinimumCD definition as its target state. Start here to understand what CD actually requires.
Dojo Consortium
A community-maintained collection of CD practices, metrics definitions, and improvement patterns. Many of the definitions and frameworks in this guide are adapted from the Dojo Consortium’s work.
DORA (dora.dev)
The DevOps Research and Assessment site, which publishes the annual State of DevOps report and provides resources for measuring and improving delivery performance.
Trunk-Based Development
The comprehensive reference for trunk-based development patterns. Covers short-lived feature branches, feature flags, branch by abstraction, and release branching strategies.
Martin Fowler’s blog (martinfowler.com)
Martin Fowler’s site contains authoritative articles on continuous integration, continuous delivery, microservices, refactoring, and software design. Key articles include “Continuous Integration” and “Continuous Delivery.”
Google Cloud Architecture Center – DevOps
Google’s public documentation of the DORA capabilities, including self-assessment tools and implementation guidance.

Videos

“Continuous Delivery” by Dave Farley (YouTube channel)
Dave Farley’s YouTube channel provides weekly videos covering CD practices, pipeline design, testing strategies, and software engineering principles. Accessible and practical.
Most relevant to: All phases
“Continuous Delivery” by Jez Humble (various conference talks)
Jez Humble’s conference presentations cover the principles and research behind CD. His talk “Why Continuous Delivery?” is an excellent introduction for teams and stakeholders who are new to the concept.
Most relevant to: Building understanding during Phase 0
“Refactoring” and “TDD” talks by Martin Fowler and Kent Beck
Foundational talks on the development practices that support CD. Understanding TDD and refactoring is essential for Phase 1 testing fundamentals.
Most relevant to: Phase 1 – Foundations
“The Smallest Thing That Could Possibly Work” by Bryan Finster
Covers the work decomposition and small batch delivery practices that are central to this migration guide. Focuses on practical techniques for breaking work into vertical slices.
Most relevant to: Phase 1 – Work Decomposition and Phase 3 – Small Batches

If you are starting your migration and want to read in the most useful order:

  1. Accelerate – to understand the research and build the business case
  2. Continuous Delivery (Humble & Farley) – to understand the full picture
  3. Continuous Delivery Pipelines (Farley) – for practical pipeline implementation
  4. Working Effectively with Legacy Code – if your codebase lacks tests
  5. The Principles of Product Development Flow – to understand flow optimization
  6. Release It! – before moving to continuous deployment

This content is adapted from MinimumCD.org, licensed under CC BY 4.0.

6 - Metrics

Detailed definitions for key delivery metrics. Understand what to measure and why.

Adapted from Dojo Consortium

These metrics help you assess your current delivery performance and track improvement over time. Start with the metrics most relevant to your current phase.

Key Metrics

| Metric | What It Measures |
|---|---|
| Integration Frequency | How often code is integrated to trunk |
| Build Duration | Time from commit to artifact creation |
| Development Cycle Time | Time from starting work to delivery |
| Lead Time | Time from request to delivery |
| Change Fail Rate | Percentage of changes requiring remediation |
| Mean Time to Repair | Time to restore service after failure |
| Release Frequency | How often releases reach production |
| Work in Progress | Amount of started but unfinished work |

This content is adapted from the Dojo Consortium, licensed under CC BY 4.0.

6.1 - Integration Frequency

How often developers integrate code changes to the trunk – a leading indicator of CI maturity and small batch delivery.

Adapted from Dojo Consortium

Definition

Integration Frequency measures the average number of production-ready pull requests a team merges to trunk per day, normalized by team size. On a team of five developers, healthy continuous integration practice produces at least five integrations per day – roughly one per developer.

This metric is a direct indicator of how well a team practices Continuous Integration. Teams that integrate frequently work in small batches, receive fast feedback, and reduce the risk associated with large, infrequent merges.

integrationFrequency = mergedPullRequests / day / numberOfDevelopers

A value of 1.0 or higher per developer per day indicates that work is being decomposed into small, independently deliverable increments.

How to Measure

  1. Count trunk merges. Track the number of pull requests (or direct commits) merged to main or trunk each day.
  2. Normalize by team size. Divide the daily count by the number of developers actively contributing that day.
  3. Calculate the rolling average. Use a 5-day or 10-day rolling window to smooth daily variation and surface meaningful trends.

Most source control platforms expose this data through their APIs:

  • GitHub – list merged pull requests via the REST or GraphQL API.
  • GitLab – query merged merge requests per project.
  • Bitbucket – use the pull request activity endpoint.

Alternatively, count commits to the default branch if pull requests are not used.
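
As one possible starting point, the sketch below counts the last day's merges to trunk via the GitHub REST API and normalizes by team size; the repository name, token variable, and team size are placeholders.

```python
# Rough sketch: count pull requests merged to trunk in the last day and
# normalize by team size. REPO, GITHUB_TOKEN, and TEAM_SIZE are placeholders.
import os
from datetime import datetime, timedelta, timezone
import requests  # third-party: pip install requests

REPO = "your-org/your-service"   # placeholder repository
TEAM_SIZE = 5                    # developers actively contributing
since = datetime.now(timezone.utc) - timedelta(days=1)

headers = {"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}"}
resp = requests.get(
    f"https://api.github.com/repos/{REPO}/pulls",
    params={"state": "closed", "base": "main", "sort": "updated",
            "direction": "desc", "per_page": 100},
    headers=headers,
    timeout=30,
)
resp.raise_for_status()

# Keep only PRs that were actually merged within the window.
merged_last_day = [
    pr for pr in resp.json()
    if pr["merged_at"]
    and datetime.fromisoformat(pr["merged_at"].replace("Z", "+00:00")) >= since
]
integration_frequency = len(merged_last_day) / TEAM_SIZE
print(f"{len(merged_last_day)} merges to trunk; "
      f"{integration_frequency:.1f} per developer per day")
```

In practice you would run this daily and feed the result into a 5- or 10-day rolling average, as described above.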

Targets

| Level | Integration Frequency (per developer per day) |
|---|---|
| Low | Less than 1 per week |
| Medium | A few times per week |
| High | Once per day |
| Elite | Multiple times per day |

The elite target aligns with trunk-based development, where developers push small changes to the trunk multiple times daily and rely on automated testing and feature flags to manage risk.

Common Pitfalls

  • Meaningless commits. Teams may inflate the count by integrating trivial or empty changes. Pair this metric with code review quality and defect rate.
  • Breaking the trunk. Pushing faster without adequate test coverage leads to a red build and slows the entire team. Always pair Integration Frequency with build success rate and Change Fail Rate.
  • Counting the wrong thing. Merges to long-lived feature branches do not count. Only merges to the trunk or main integration branch reflect true CI practice.
  • Ignoring quality. If defect rates rise as integration frequency increases, the team is skipping quality steps. Use defect rate as a guardrail metric.

Connection to CD

Integration Frequency is the foundational metric for Continuous Delivery. Without frequent integration, every downstream metric suffers:

  • Smaller batches reduce risk. Each integration carries less change, making failures easier to diagnose and fix.
  • Faster feedback loops. Frequent integration means the CI pipeline runs more often, catching issues within minutes instead of days.
  • Enables trunk-based development. High integration frequency is incompatible with long-lived branches. Teams naturally move toward short-lived branches or direct trunk commits.
  • Reduces merge conflicts. The longer code stays on a branch, the more likely it diverges from trunk. Frequent integration keeps the delta small.
  • Prerequisite for deployment frequency. You cannot deploy more often than you integrate. Improving this metric directly unblocks improvements to Release Frequency.

To improve Integration Frequency:

  • Decompose stories into smaller increments using Behavior-Driven Development.
  • Use Test-Driven Development to produce modular, independently testable code.
  • Adopt feature flags or branch by abstraction to decouple integration from release.
  • Practice Trunk-Based Development with short-lived branches lasting less than one day.

This content is adapted from the Dojo Consortium, licensed under CC BY 4.0.

6.2 - Build Duration

Time from code commit to a deployable artifact – a critical constraint on feedback speed and mean time to repair.

Adapted from Dojo Consortium

Definition

Build Duration measures the elapsed time from when a developer pushes a commit until the CI pipeline produces a deployable artifact and all automated quality gates have passed. This includes compilation, unit tests, integration tests, static analysis, security scans, and artifact packaging.

Build Duration represents the minimum possible time between deciding to make a change and having that change ready for production. It sets a hard floor on Lead Time and directly constrains how quickly a team can respond to production incidents.

buildDuration = artifactReadyTimestamp - commitPushTimestamp

This metric is sometimes referred to as “pipeline cycle time” or “CI cycle time.” The book Accelerate references it as part of “hard lead time.”

How to Measure

  1. Record the commit timestamp. Capture when the commit arrives at the CI server (webhook receipt or pipeline trigger time).
  2. Record the artifact-ready timestamp. Capture when the final pipeline stage completes successfully and the deployable artifact is published.
  3. Calculate the difference. Subtract the commit timestamp from the artifact-ready timestamp.
  4. Track the median and p95. The median shows typical performance. The 95th percentile reveals worst-case builds that block developers.

Most CI platforms expose build duration natively:

  • GitHub Actions – createdAt and updatedAt on workflow runs.
  • GitLab CI – pipeline created_at and finished_at.
  • Jenkins – build start time and duration fields.
  • CircleCI – workflow duration in the Insights dashboard.

Set up alerts when builds exceed your target threshold so the team can investigate regressions immediately.
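
As an illustration, the sketch below computes the median and 95th percentile from commit-push and artifact-ready timestamp pairs exported from your CI system; the sample timestamps are placeholders.

```python
# Minimal sketch: compute median and p95 build duration from
# (commit_pushed, artifact_ready) timestamp pairs exported from your CI
# system. The sample data below is a placeholder.
from datetime import datetime
from statistics import median, quantiles

builds = [
    ("2024-05-01T09:00:00+00:00", "2024-05-01T09:07:30+00:00"),
    ("2024-05-01T11:15:00+00:00", "2024-05-01T11:26:10+00:00"),
    ("2024-05-02T14:02:00+00:00", "2024-05-02T14:08:45+00:00"),
]

durations_min = [
    (datetime.fromisoformat(done) - datetime.fromisoformat(pushed)).total_seconds() / 60
    for pushed, done in builds
]

p95 = quantiles(durations_min, n=20)[-1]  # last of 19 cut points ≈ 95th percentile
print(f"median: {median(durations_min):.1f} min, p95: {p95:.1f} min")
```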

Targets

| Level | Build Duration |
|---|---|
| Low | More than 30 minutes |
| Medium | 10 – 30 minutes |
| High | 5 – 10 minutes |
| Elite | Less than 5 minutes |

The ten-minute threshold is a widely recognized guideline. Builds longer than ten minutes break developer flow, discourage frequent integration, and increase the cost of fixing failures.

Common Pitfalls

  • Removing tests to hit targets. Reducing test count or skipping test types (integration, security) lowers build duration but degrades quality. Always pair this metric with Change Fail Rate and defect rate.
  • Ignoring queue time. If builds wait in a queue before execution, the developer experiences the queue time as part of the feedback delay even though it is not technically “build” time. Measure wall-clock time from commit to result.
  • Optimizing the wrong stage. Profile the pipeline before optimizing. Often a single slow test suite or a sequential step that could run in parallel dominates the total duration.
  • Flaky tests. Tests that intermittently fail cause retries, effectively doubling or tripling build duration. Track flake rate alongside build duration.

Connection to CD

Build Duration is a critical bottleneck in the Continuous Delivery pipeline:

  • Constrains Mean Time to Repair. When production is down, the build pipeline is the minimum time to get a fix deployed. A 30-minute build means at least 30 minutes of downtime for any fix, no matter how small. Reducing build duration directly improves MTTR.
  • Enables frequent integration. Developers are unlikely to integrate multiple times per day if each integration takes 30 minutes to validate. Short builds encourage higher Integration Frequency.
  • Shortens feedback loops. The sooner a developer learns that a change broke something, the less context they have lost and the cheaper the fix. Builds under ten minutes keep developers in flow.
  • Supports continuous deployment. Automated deployment pipelines cannot deliver changes rapidly if the build stage is slow. Build duration is often the largest component of Lead Time.

To improve Build Duration:

  • Parallelize stages. Run unit tests, linting, and security scans concurrently rather than sequentially.
  • Replace slow end-to-end tests. Move heavyweight end-to-end tests to an asynchronous post-deploy verification stage. Use contract tests and service virtualization in the main pipeline.
  • Decompose large services. Smaller codebases compile and test faster. If build duration is stubbornly high, consider breaking the service into smaller domains.
  • Cache aggressively. Cache dependencies, Docker layers, and compilation artifacts between builds.
  • Set a build time budget. Alert the team whenever a new test or step pushes the build past your target, so test efficiency is continuously maintained.

This content is adapted from the Dojo Consortium, licensed under CC BY 4.0.

6.3 - Development Cycle Time

Average time from when work starts until it is running in production – a key flow metric for identifying delivery bottlenecks.

Adapted from Dojo Consortium

Definition

Development Cycle Time measures the elapsed time from when a developer begins work on a story or task until that work is deployed to production and available to users. It captures the full construction phase of delivery: coding, code review, testing, integration, and deployment.

developmentCycleTime = productionDeployTimestamp - workStartedTimestamp

This is distinct from Lead Time, which includes the time a request spends waiting in the backlog before work begins. Development Cycle Time focuses exclusively on the active delivery phase.

The Accelerate research uses “lead time for changes” (measured from commit to production) as a key DORA metric. Development Cycle Time extends this slightly further back to when work starts, capturing the full development process including any time between starting work and the first commit.

How to Measure

  1. Record when work starts. Capture the timestamp when a story moves to “In Progress” in your issue tracker, or when the first commit for the story appears.
  2. Record when work reaches production. Capture the timestamp of the production deployment that includes the completed story.
  3. Calculate the difference. Subtract the start time from the production deploy time.
  4. Report the median and distribution. The median provides a typical value. The distribution (or a control chart) reveals variability and outliers that indicate process problems.

Sources for this data include:

  • Issue trackers (Jira, GitHub Issues, Azure Boards) – status transition timestamps.
  • Source control – first commit timestamp associated with a story.
  • Deployment logs – timestamp of production deployments linked to stories.

Linking stories to deployments is essential. Use commit message conventions (e.g., story IDs in commit messages) or deployment metadata to create this connection.
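
A minimal sketch of that linkage and the resulting calculation, assuming commit messages carry the story ID (e.g., “ABC-123”) and that each deployment record lists the commit messages it contains (all field names are assumptions):

// Find the production deployment that contains a story's commits and
// compute Development Cycle Time in days. Field names are illustrative.
function developmentCycleTimeDays(story, deployments) {
  const idPattern = new RegExp(`\\b${story.id}\\b`); // e.g. "ABC-123"
  const deploy = deployments.find((d) =>
    d.commitMessages.some((message) => idPattern.test(message))
  );
  if (!deploy) return null; // story has not reached production yet

  const elapsedMs = new Date(deploy.finishedAt) - new Date(story.startedAt);
  return elapsedMs / (1000 * 60 * 60 * 24);
}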

Targets

Level    Development Cycle Time
Low      More than 2 weeks
Medium   1 – 2 weeks
High     2 – 7 days
Elite    Less than 2 days

Elite teams deliver completed work to production within one to two days of starting it. This is achievable only when work is decomposed into small increments, the pipeline is fast, and deployment is automated.

Common Pitfalls

  • Marking work “Done” before it reaches production. If “Done” means “code complete” rather than “deployed,” the metric understates actual cycle time. The Definition of Done must include production deployment.
  • Skipping workflow stages. Moving items from “Backlog” directly to “Done” after deploying hides the true wait time and development duration. Ensure stories pass through the standard workflow stages.
  • Splitting work into functional tasks. Breaking a story into separate “development,” “testing,” and “deployment” tasks obscures the end-to-end cycle time. Measure at the story or feature level.
  • Ignoring variability. A low average can hide a bimodal distribution where some stories take hours and others take weeks. Use a control chart or histogram to expose the full picture.
  • Optimizing for speed without quality. If cycle time drops but Change Fail Rate rises, the team is cutting corners. Use quality metrics as guardrails.

Connection to CD

Development Cycle Time is the most comprehensive measure of delivery flow and sits at the heart of Continuous Delivery:

  • Exposes bottlenecks. A long cycle time reveals where work gets stuck – waiting for code review, queued for testing, blocked by a manual approval, or delayed by a slow pipeline. Each bottleneck is a target for improvement.
  • Drives smaller batches. The only way to achieve a cycle time under two days is to decompose work into very small increments. This naturally leads to smaller changes, less risk, and faster feedback.
  • Reduces waste from changing priorities. Long cycle times mean work in progress is exposed to priority changes, context switches, and scope creep. Shorter cycles reduce the window of vulnerability.
  • Improves feedback quality. The sooner a change reaches production, the sooner the team gets real user feedback. Short cycle times enable rapid learning and course correction.
  • Subsumes other metrics. Cycle time is affected by Integration Frequency, Build Duration, and Work in Progress. Improving any of these upstream metrics will reduce cycle time.

To improve Development Cycle Time:

  • Decompose work into stories that can be completed and deployed within one to two days.
  • Remove handoffs between teams (e.g., separate dev and QA teams).
  • Automate the build and deploy pipeline to eliminate manual steps.
  • Improve test design so the pipeline runs faster without sacrificing coverage.
  • Limit Work in Progress so the team focuses on finishing work rather than starting new items.

This content is adapted from the Dojo Consortium, licensed under CC BY 4.0.

6.4 - Lead Time

Total time from when a change is committed until it is running in production – a DORA key metric for delivery throughput.

Adapted from Dojo Consortium

Definition

Lead Time measures the total elapsed time from when a code change is committed to the version control system until that change is successfully running in production. This is one of the four key metrics identified by the DORA (DevOps Research and Assessment) team as a predictor of software delivery performance.

leadTime = productionDeployTimestamp - commitTimestamp

In the broader value stream, “lead time” can also refer to the time from a customer request to delivery. The DORA definition focuses specifically on the segment from commit to production, which the Accelerate research calls “lead time for changes.” This narrower definition captures the efficiency of your delivery pipeline and deployment process.

Lead Time includes Build Duration plus any additional time for deployment, approval gates, environment provisioning, and post-deploy verification. It is a superset of build time and a subset of Development Cycle Time, which also includes the coding phase before the first commit.

How to Measure

  1. Record the commit timestamp. Use the timestamp of the commit as recorded in source control (not the local author timestamp, but the time it was pushed or merged to the trunk).
  2. Record the production deployment timestamp. Capture when the deployment containing that commit completes successfully in production.
  3. Calculate the difference. Subtract the commit time from the deploy time.
  4. Aggregate across commits. Report the median lead time across all commits deployed in a given period (daily, weekly, or per release).

Data sources:

  • Source control – commit or merge timestamps from Git, GitHub, GitLab, etc.
  • CI/CD platform – pipeline completion times from Jenkins, GitHub Actions, GitLab CI, etc.
  • Deployment tooling – production deployment timestamps from Argo CD, Spinnaker, Flux, or custom scripts.

For teams practicing continuous deployment, lead time may be nearly identical to build duration. For teams with manual approval gates or scheduled release windows, lead time will be significantly longer.
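
A minimal sketch of the aggregation in step 4, assuming each deployed commit record carries the timestamp it reached trunk and the timestamp of the production deployment that shipped it (field names are assumptions):

// Median lead time in hours across a set of deployed commits.
function medianLeadTimeHours(deployedCommits) {
  const hours = deployedCommits
    .map((c) => (new Date(c.deployedAt) - new Date(c.committedAt)) / 3600000)
    .sort((a, b) => a - b);

  const mid = Math.floor(hours.length / 2);
  return hours.length % 2 !== 0
    ? hours[mid]
    : (hours[mid - 1] + hours[mid]) / 2;
}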

Targets

Level    Lead Time for Changes
Low      More than 6 months
Medium   1 – 6 months
High     1 day – 1 week
Elite    Less than 1 hour

These levels are drawn from the DORA State of DevOps research. Elite performers deliver changes to production in under an hour from commit, enabled by fully automated pipelines and continuous deployment.

Common Pitfalls

  • Measuring only build time. Lead time includes everything after the commit, not just the CI pipeline. Manual approval gates, scheduled deployment windows, and environment provisioning delays must all be included.
  • Ignoring waiting time. A change may sit in a queue waiting for a release train, a change advisory board (CAB) review, or a deployment window. This wait time is part of lead time and often dominates the total.
  • Tracking requests instead of commits. Some teams measure from customer request to delivery. While valuable, this conflates backlog prioritization with delivery efficiency. Keep this metric focused on the commit-to-production segment.
  • Hiding items from the backlog. Requests tracked in spreadsheets or side channels before entering the backlog distort lead time measurements. Ensure all work enters the system of record promptly.
  • Reducing quality to reduce lead time. Shortening approval processes or skipping test stages reduces lead time at the cost of quality. Pair this metric with Change Fail Rate as a guardrail.

Connection to CD

Lead Time is one of the four DORA metrics and a direct measure of your delivery pipeline’s end-to-end efficiency:

  • Reveals pipeline bottlenecks. A large gap between build duration and lead time points to manual processes, approval queues, or deployment delays that the team can target for automation.
  • Measures the cost of failure recovery. When production breaks, lead time is the minimum time to deliver a fix (unless you roll back). This makes lead time a direct input to Mean Time to Repair.
  • Drives automation. The primary way to reduce lead time is to automate every step between commit and production: build, test, security scanning, environment provisioning, deployment, and verification.
  • Reflects deployment strategy. Teams using continuous deployment have lead times measured in minutes. Teams using weekly release trains have lead times measured in days. The metric makes the cost of batching visible.
  • Connects speed and stability. The DORA research shows that elite performers achieve both low lead time and low Change Fail Rate. Speed and quality are not trade-offs – they reinforce each other when the delivery system is well-designed.

To improve Lead Time:

  • Automate the deployment pipeline end to end, eliminating manual gates.
  • Replace change advisory board (CAB) reviews with automated policy checks and peer review.
  • Deploy on every successful build rather than batching changes into release trains.
  • Reduce Build Duration to shrink the largest component of lead time.
  • Monitor and eliminate environment provisioning delays.

This content is adapted from the Dojo Consortium, licensed under CC BY 4.0.

6.5 - Change Fail Rate

Percentage of production deployments that cause a failure or require remediation – a DORA key metric for delivery stability.

Adapted from Dojo Consortium

Definition

Change Fail Rate measures the percentage of deployments to production that result in degraded service, negative customer impact, or require immediate remediation such as a rollback, hotfix, or patch.

changeFailRate = failedChangeCount / totalChangeCount * 100

A “failed change” includes any deployment that:

  • Is rolled back.
  • Requires a hotfix deployed within a short window (commonly 24 hours).
  • Triggers a production incident attributed to the change.
  • Requires manual intervention to restore service.

This is one of the four DORA key metrics. It measures the stability side of delivery performance, complementing the throughput metrics of Lead Time and Release Frequency.

How to Measure

  1. Count total production deployments over a defined period (weekly, monthly).
  2. Count deployments classified as failures using the criteria above.
  3. Divide failures by total deployments and express as a percentage.

Data sources:

  • Deployment logs – total deployment count from your CD platform.
  • Incident management – incidents linked to specific deployments (PagerDuty, Opsgenie, ServiceNow).
  • Rollback records – deployments that were reverted, either manually or by automated rollback.
  • Hotfix tracking – deployments tagged as hotfixes or emergency changes.

Automate the classification where possible. For example, if a deployment is followed by another deployment of the same service within a defined window (e.g., one hour), flag the original as a potential failure for review.
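
A minimal sketch of that heuristic (the deployment record shape is an assumption, and flagged deployments still need human review before being counted as failures):

// Flag deployments that were followed by another deployment of the same
// service within the window – candidates for "failed change" classification.
function flagPotentialFailures(deployments, windowMs = 60 * 60 * 1000) {
  const byService = new Map();
  for (const d of deployments) {
    if (!byService.has(d.service)) byService.set(d.service, []);
    byService.get(d.service).push(d);
  }

  const flagged = [];
  for (const list of byService.values()) {
    list.sort((a, b) => new Date(a.finishedAt) - new Date(b.finishedAt));
    for (let i = 0; i < list.length - 1; i++) {
      const gap = new Date(list[i + 1].finishedAt) - new Date(list[i].finishedAt);
      if (gap <= windowMs) flagged.push(list[i]);
    }
  }
  return flagged;
}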

Targets

Level    Change Fail Rate
Low      46 – 60%
Medium   16 – 45%
High     0 – 15%
Elite    0 – 5%

These levels are drawn from the DORA State of DevOps research. Elite performers maintain a change fail rate below 5%, meaning fewer than 1 in 20 deployments causes a problem.

Common Pitfalls

  • Not recording failures. Deploying fixes without logging the original failure understates the true rate. Ensure every incident and rollback is tracked.
  • Reclassifying defects. Creating review processes that reclassify production defects as “feature requests” or “known limitations” hides real failures.
  • Inflating deployment count. Re-deploying the same working version to increase the denominator artificially lowers the rate. Only count deployments that contain new changes.
  • Pursuing zero defects at the cost of speed. An obsessive focus on eliminating all failures can slow Release Frequency to a crawl. A small failure rate with fast recovery is preferable to near-zero failures with monthly deployments.
  • Ignoring near-misses. Changes that cause degraded performance but do not trigger a full incident are still failures. Define clear criteria for what constitutes a failed change and apply them consistently.

Connection to CD

Change Fail Rate is the primary quality signal in a Continuous Delivery pipeline:

  • Validates pipeline quality gates. A rising change fail rate indicates that the automated tests, security scans, and quality checks in the pipeline are not catching enough defects. Each failure is an opportunity to add or improve a quality gate.
  • Enables confidence in frequent releases. Teams will only deploy frequently if they trust the pipeline. A low change fail rate builds this trust and supports higher Release Frequency.
  • Smaller changes fail less. The DORA research consistently shows that smaller, more frequent deployments have lower failure rates than large, infrequent releases. Improving Integration Frequency naturally improves this metric.
  • Drives root cause analysis. Each failed change should trigger a blameless investigation: what automated check could have caught this? The answers feed directly into pipeline improvements.
  • Balances throughput metrics. Change Fail Rate is the essential guardrail for Lead Time and Release Frequency. If those metrics improve while change fail rate worsens, the team is trading quality for speed.

To improve Change Fail Rate:

  • Deploy smaller changes more frequently to reduce the blast radius of failures.
  • Identify the root cause of each failure and add automated checks to prevent recurrence.
  • Strengthen the test suite, particularly integration and contract tests that validate interactions between services.
  • Implement progressive delivery (canary releases, feature flags) to limit the impact of defective changes before they reach all users.
  • Conduct blameless post-incident reviews and feed learnings back into the delivery pipeline.

This content is adapted from the Dojo Consortium, licensed under CC BY 4.0.

6.6 - Mean Time to Repair

Average time from when a production incident is detected until service is restored – a DORA key metric for recovery capability.

Adapted from Dojo Consortium

Definition

Mean Time to Repair (MTTR) measures the average elapsed time between when a production incident is detected and when it is fully resolved and service is restored to normal operation.

mttr = sum(resolvedTimestamp - detectedTimestamp) / incidentCount

MTTR reflects an organization’s ability to recover from failure. It encompasses detection, diagnosis, fix development, build, deployment, and verification. A short MTTR depends on the entire delivery system working well – fast builds, automated deployments, good observability, and practiced incident response.

The Accelerate research identifies MTTR as one of the four key DORA metrics and notes that “software delivery performance is a combination of lead time, release frequency, and MTTR.” It is the stability counterpart to the throughput metrics.

How to Measure

  1. Record the detection timestamp. This is when the team first becomes aware of the incident – typically when an alert fires, a customer reports an issue, or monitoring detects an anomaly.
  2. Record the resolution timestamp. This is when the incident is resolved and service is confirmed to be operating normally. Resolution means the customer impact has ended, not merely that a fix has been deployed.
  3. Calculate the duration for each incident.
  4. Compute the average across all incidents in a given period.

Data sources:

  • Incident management platforms – PagerDuty, Opsgenie, ServiceNow, or Statuspage provide incident lifecycle timestamps.
  • Monitoring and alerting – alert trigger times from Datadog, Prometheus Alertmanager, CloudWatch, or equivalent.
  • Deployment logs – timestamps of rollbacks or hotfix deployments.

Report both the mean and the median. The mean can be skewed by a single long outage, so the median gives a better sense of typical recovery time. Also track the maximum MTTR per period to highlight worst-case incidents.
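
A minimal sketch of the calculation, assuming incident records expose detection and resolution timestamps (field names are assumptions):

// Mean, median, and maximum time-to-repair in minutes for a set of incidents.
function mttrMinutes(incidents) {
  const durations = incidents
    .map((i) => (new Date(i.resolvedAt) - new Date(i.detectedAt)) / 60000)
    .sort((a, b) => a - b);

  const mean = durations.reduce((sum, d) => sum + d, 0) / durations.length;
  const mid = Math.floor(durations.length / 2);
  const median =
    durations.length % 2 !== 0
      ? durations[mid]
      : (durations[mid - 1] + durations[mid]) / 2;

  return { mean, median, max: durations[durations.length - 1] };
}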

Targets

Level    Mean Time to Repair
Low      More than 1 week
Medium   1 day – 1 week
High     Less than 1 day
Elite    Less than 1 hour

Elite performers restore service in under one hour. This requires automated rollback or roll-forward capability, fast build pipelines, and well-practiced incident response processes.

Common Pitfalls

  • Closing incidents prematurely. Marking an incident as resolved before the customer impact has actually ended artificially deflates MTTR. Define “resolved” clearly and verify that service is truly restored.
  • Not counting detection time. If the team discovers a problem informally (e.g., a developer notices something odd) and fixes it before opening an incident, the time is not captured. Encourage consistent incident reporting.
  • Ignoring recurring incidents. If the same issue keeps reappearing, each individual MTTR may be short, but the cumulative impact is high. Track recurrence as a separate quality signal.
  • Conflating MTTR with MTTD. Mean Time to Detect (MTTD) and Mean Time to Repair overlap but are distinct. If you only measure from alert to resolution, you miss the detection gap – the time between when the problem starts and when it is detected. Both matter.
  • Optimizing MTTR without addressing root causes. Getting faster at fixing recurring problems is good, but preventing those problems in the first place is better. Pair MTTR with Change Fail Rate to ensure the number of incidents is also decreasing.

Connection to CD

MTTR is a direct measure of how well the entire Continuous Delivery system supports recovery:

  • Pipeline speed is the floor. The minimum possible MTTR for a roll-forward fix is the Build Duration plus deployment time. A 30-minute build means you cannot restore service via a code fix in less than 30 minutes. Reducing build duration directly reduces MTTR.
  • Automated deployment enables fast recovery. Teams that can deploy with one click or automatically can roll back or roll forward in minutes. Manual deployment processes add significant time to every incident.
  • Feature flags accelerate mitigation. If a failing change is behind a feature flag, the team can disable it in seconds without deploying new code. This can reduce MTTR from minutes to seconds for flag-protected changes.
  • Observability shortens detection and diagnosis. Good logging, metrics, and tracing help the team identify the cause of an incident quickly. Without observability, diagnosis dominates the repair timeline.
  • Practice improves performance. Teams that deploy frequently have more experience responding to issues. High Release Frequency correlates with lower MTTR because the team has well-rehearsed recovery procedures.
  • Trunk-based development simplifies rollback. When trunk is always deployable, the team can roll back to the previous commit. Long-lived branches and complex merge histories make rollback risky and slow.

To improve MTTR:

  • Keep the pipeline always deployable so a fix can be deployed at any time.
  • Reduce Build Duration to enable faster roll-forward.
  • Implement feature flags for large changes so they can be disabled without redeployment.
  • Invest in observability – structured logging, distributed tracing, and meaningful alerting.
  • Practice incident response regularly, including deploying rollbacks and hotfixes.
  • Conduct blameless post-incident reviews and feed learnings back into the pipeline and monitoring.

This content is adapted from the Dojo Consortium, licensed under CC BY 4.0.

6.7 - Release Frequency

How often changes are deployed to production – a DORA key metric for delivery throughput and team capability.

Adapted from Dojo Consortium

Definition

Release Frequency (also called Deployment Frequency) measures how often a team successfully deploys changes to production. It is expressed as deployments per day, per week, or per month, depending on the team’s current cadence.

releaseFrequency = productionDeployments / timePeriod

This is one of the four DORA key metrics. It measures the throughput side of delivery performance – how rapidly the team can get completed work into the hands of users. Higher release frequency enables faster feedback, smaller batch sizes, and reduced deployment risk.

Each deployment should deliver a meaningful change. Re-deploying the same artifact or deploying empty changes does not count.

How to Measure

  1. Count production deployments. Record each successful deployment to the production environment over a defined period.
  2. Exclude non-changes. Do not count re-deployments of unchanged artifacts, infrastructure-only changes (unless relevant), or deployments to non-production environments.
  3. Calculate frequency. Divide the count by the time period. Express as deployments per day (for high performers) or per week/month (for teams earlier in their journey).

Data sources:

  • CD platforms – Argo CD, Spinnaker, Flux, Octopus Deploy, or similar tools track every deployment.
  • CI/CD pipeline logs – GitHub Actions, GitLab CI, Jenkins, and CircleCI record deployment job executions.
  • Cloud provider logs – AWS CodeDeploy, Azure DevOps, GCP Cloud Deploy, and Kubernetes audit logs.
  • Custom deployment scripts – Add a logging line that records the timestamp, service name, and version to a central log or metrics system, as sketched below.
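
For the custom-script option, a minimal sketch of such a logging line (the destination path and field names are illustrative; in practice the event would go to your central log or metrics system):

// Append one JSON line per production deployment so release frequency can
// be computed later. Destination path and field names are illustrative.
const { appendFileSync } = require("fs");

function recordDeployment(service, version) {
  const event = {
    timestamp: new Date().toISOString(),
    service,
    version,
  };
  appendFileSync("/var/log/deployments.jsonl", JSON.stringify(event) + "\n");
}

recordDeployment("orders-service", "1.42.0"); // example invocation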

Targets

Level    Release Frequency
Low      Less than once per 6 months
Medium   Once per month to once per 6 months
High     Once per week to once per month
Elite    Multiple times per day

These levels are drawn from the DORA State of DevOps research. Elite performers deploy on demand, multiple times per day, with each deployment containing a small set of changes.

Common Pitfalls

  • Counting empty deployments. Re-deploying the same artifact or building artifacts that contain no changes inflates the metric without delivering value. Count only deployments with meaningful changes.
  • Ignoring failed deployments. If you count deployments that are immediately rolled back, the frequency looks good but the quality is poor. Pair with Change Fail Rate to get the full picture.
  • Equating frequency with value. Deploying frequently is a means, not an end. Deploying 10 times a day delivers no value if the changes do not meet user needs. Release Frequency measures capability, not outcome.
  • Batch releasing to hit a target. Combining multiple changes into a single release to deploy “more often” defeats the purpose. The goal is small, individual changes flowing through the pipeline independently.
  • Focusing on speed without quality. If release frequency increases but Change Fail Rate also increases, the team is releasing faster than its quality processes can support. Slow down and improve the pipeline.

Connection to CD

Release Frequency is the ultimate output metric of a Continuous Delivery pipeline:

  • Validates the entire delivery system. High release frequency is only possible when the pipeline is fast, tests are reliable, deployment is automated, and the team has confidence in the process. It is the end-to-end proof that CD is working.
  • Reduces deployment risk. Each deployment carries less change when deployments are frequent. Less change means less risk, easier rollback, and simpler debugging when something goes wrong.
  • Enables rapid feedback. Frequent releases get features and fixes in front of users sooner. This shortens the feedback loop and allows the team to course-correct before investing heavily in the wrong direction.
  • Exercises recovery capability. Teams that deploy frequently practice the deployment process daily. When a production incident occurs, the deployment process is well-rehearsed and reliable, directly improving Mean Time to Repair.
  • Decouples deploy from release. At high frequency, teams separate the act of deploying code from the act of enabling features for users. Feature flags, progressive delivery, and dark launches become standard practice.

To improve Release Frequency:

  • Reduce Development Cycle Time by decomposing work into smaller increments.
  • Remove manual handoffs to other teams (e.g., ops, QA, change management).
  • Automate every step of the deployment process, from build through production verification.
  • Replace manual change approval boards with automated policy checks and peer review.
  • Convert hard dependencies on other teams or services into soft dependencies using feature flags and service virtualization.
  • Adopt Trunk-Based Development so that trunk is always in a deployable state.

This content is adapted from the Dojo Consortium, licensed under CC BY 4.0.

6.8 - Work in Progress

Number of work items started but not yet completed – a leading indicator of flow problems, context switching, and delivery delays.

Adapted from Dojo Consortium

Definition

Work in Progress (WIP) is the total count of work items that have been started but not yet completed and delivered to production. This includes all types of work: stories, defects, tasks, spikes, and any other items that a team member has begun but not finished.

wip = countOf(items where status is between "started" and "done")

WIP is a leading indicator from Lean manufacturing. Unlike trailing metrics such as Development Cycle Time or Lead Time, WIP tells you about problems that are happening right now. High WIP predicts future delivery delays, increased cycle time, and lower quality.

Little’s Law provides the mathematical relationship:

cycleTime = wip / throughput

If throughput (the rate at which items are completed) stays constant, increasing WIP directly increases cycle time. The only way to reduce cycle time without working faster is to reduce WIP.
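
A quick worked example with illustrative numbers:

// Little's Law with illustrative numbers: a team completing 5 items per week
// while holding 10 items in progress averages a 2-week cycle time.
const wip = 10;        // items started but not yet done
const throughput = 5;  // items completed per week
const cycleTime = wip / throughput;
console.log(`Average cycle time: ${cycleTime} weeks`); // "Average cycle time: 2 weeks"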

How to Measure

  1. Count all in-progress items. At a regular cadence (daily or at each standup), count the number of items in any active state on your team’s board. Include everything between “To Do” and “Done.”
  2. Normalize by team size. Divide WIP by the number of team members to get a per-person ratio. This makes the metric comparable across teams of different sizes.
  3. Track over time. Record the WIP count daily and observe trends. A rising WIP count is an early warning of delivery problems.

Data sources:

  • Kanban boards – Jira, Azure Boards, Trello, GitHub Projects, or physical boards. Count cards in any column between the backlog and done.
  • Issue trackers – Query for items with an “In Progress,” “In Review,” “In QA,” or equivalent active status.
  • Manual count – At standup, ask: “How many things are we actively working on right now?”

The simplest and most effective approach is to make WIP visible by keeping the team board up to date and counting active items daily.
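
A minimal sketch of the daily count, assuming board items have been exported with a status field (the status names are assumptions; adjust them to your workflow):

// Count items in active states and normalize by team size.
const ACTIVE_STATUSES = new Set(["In Progress", "In Review", "In QA"]);

function wipStats(items, teamSize) {
  const wip = items.filter((item) => ACTIVE_STATUSES.has(item.status)).length;
  return { wip, wipPerPerson: wip / teamSize };
}

// Example: wipStats(boardItems, 5) -> { wip: 8, wipPerPerson: 1.6 }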

Targets

Level    WIP per Team
Low      More than 2x team size
Medium   Between 1x and 2x team size
High     Equal to team size
Elite    Less than team size (ideally half)

The guiding principle is that WIP should never exceed team size. A team of five should have at most five items in progress at any time. Elite teams often work in pairs, bringing WIP to roughly half the team size.

Common Pitfalls

  • Hiding work. Not moving items to “In Progress” when working on them keeps WIP artificially low. The board must reflect reality. If someone is working on it, it should be visible.
  • Marking items done prematurely. Moving items to “Done” before they are deployed to production understates WIP. The Definition of Done must include production deployment.
  • Creating micro-tasks. Splitting a single story into many small tasks (development, testing, code review, deployment) and tracking each separately inflates the item count without changing the actual work. Measure WIP at the story or feature level.
  • Ignoring unplanned work. Production support, urgent requests, and interruptions consume capacity but are often not tracked on the board. If the team is spending time on it, it is WIP and should be visible.
  • Setting WIP limits but not enforcing them. WIP limits only work if the team actually stops starting new work when the limit is reached. Treat WIP limits as a hard constraint, not a suggestion.

Connection to CD

WIP is the most actionable flow metric and directly impacts every aspect of Continuous Delivery:

  • Predicts cycle time. Per Little’s Law, WIP and cycle time are directly proportional. Reducing WIP is the fastest way to reduce Development Cycle Time without changing anything else about the delivery process.
  • Reduces context switching. When developers juggle multiple items, they lose time switching between contexts. Research consistently shows that each additional item in progress reduces effective productivity. Low WIP means more focus and faster completion.
  • Exposes blockers. When WIP limits are in place and an item gets blocked, the team cannot simply start something new. They must resolve the blocker first. This forces the team to address systemic problems rather than working around them.
  • Enables continuous flow. CD depends on a steady flow of small changes moving through the pipeline. High WIP creates irregular, bursty delivery. Low WIP creates smooth, predictable flow.
  • Improves quality. When teams focus on fewer items, each item gets more attention. Code reviews happen faster, testing is more thorough, and defects are caught sooner. This naturally reduces Change Fail Rate.
  • Supports trunk-based development. High WIP often correlates with many long-lived branches. Reducing WIP encourages developers to complete and integrate work before starting something new, which aligns with Integration Frequency goals.

To reduce WIP:

  • Set explicit WIP limits for the team and enforce them. Start with a limit equal to team size and reduce it over time.
  • Prioritize finishing work over starting new work. At standup, ask “What can I help finish?” before “What should I start?”
  • Prioritize code review and pairing to unblock teammates over picking up new items.
  • Make the board visible and accurate. Use it as the single source of truth for what the team is working on.
  • Identify and address recurring blockers that cause items to stall in progress.

This content is adapted from the Dojo Consortium, licensed under CC BY 4.0.

7 - Testing

Testing types, patterns, and best practices for building confidence in your delivery pipeline.

Adapted from Dojo Consortium

A reliable test suite is essential for continuous delivery. These pages cover the different types of tests, when to use each, and best practices for test architecture.

Test Types

Type                 Purpose
Unit Tests           Verify individual components in isolation
Integration Tests    Verify components work together
Functional Tests     Verify user-facing behavior
End-to-End Tests     Verify complete user workflows
Contract Tests       Verify API contracts between services
Static Analysis      Catch issues without running code
Test Doubles         Patterns for isolating dependencies in tests

This content is adapted from the Dojo Consortium, licensed under CC BY 4.0.

7.1 - Unit Tests

Fast, deterministic tests that verify individual functions, methods, or components in isolation with test doubles for dependencies.

Adapted from Dojo Consortium

Definition

A unit test is a deterministic test that exercises a discrete unit of the application – such as a function, method, or UI component – in isolation to determine whether it behaves as expected. All external dependencies are replaced with test doubles so the test runs quickly and produces the same result every time.

When testing the behavior of functions, prefer testing public APIs (methods, interfaces, exported functions) over private internals. Testing private implementation details creates change-detector tests that break during routine refactoring without adding safety.

The purpose of unit tests is to:

  • Verify the functionality of a single unit (method, class, function) in isolation.
  • Cover high-complexity logic where many input permutations exist, such as business rules, calculations, and state transitions.
  • Keep cyclomatic complexity visible and manageable through good separation of concerns.

When to Use

  • During development – run the relevant subset of unit tests continuously while writing code. TDD (Red-Green-Refactor) is the most effective workflow.
  • On every commit – use pre-commit hooks or watch-mode test runners so broken tests never reach the remote repository.
  • In CI – execute the full unit test suite on every pull request and on the trunk after merge to verify nothing was missed locally.

Unit tests are the right choice when the behavior under test can be exercised without network access, file system access, or database connections. If you need any of those, you likely need an integration test or a functional test instead.

Characteristics

Property        Value
Speed           Milliseconds per test
Determinism     Always deterministic
Scope           Single function, method, or component
Dependencies    All replaced with test doubles
Network         None
Database        None
Breaks build    Yes

Examples

A JavaScript unit test verifying a pure utility function:

// castArray.test.js
describe("castArray", () => {
  it("should wrap non-array items in an array", () => {
    expect(castArray(1)).toEqual([1]);
    expect(castArray("a")).toEqual(["a"]);
    expect(castArray({ a: 1 })).toEqual([{ a: 1 }]);
  });

  it("should return array values by reference", () => {
    const array = [1];
    expect(castArray(array)).toBe(array);
  });

  it("should return an empty array when no arguments are given", () => {
    expect(castArray()).toEqual([]);
  });
});

A Java unit test using Mockito to isolate the system under test:

@Test
public void shouldReturnUserDetails() {
    // Arrange
    User mockUser = new User("Ada", "Engineering");
    when(userService.getUserInfo("u123")).thenReturn(mockUser);

    // Act
    User result = userController.getUser("u123");

    // Assert
    assertEquals("Ada", result.getName());
    assertEquals("Engineering", result.getDepartment());
}

Anti-Patterns

  • Testing private methods – private implementations are meant to change. Test the public interface that calls them instead.
  • No assertions – a test that runs code without asserting anything provides false confidence. Lint rules like jest/expect-expect can catch this.
  • Disabling or skipping tests – skipped tests erode confidence over time. Fix or remove them.
  • Testing implementation details – asserting on internal state or call order rather than observable output creates brittle tests that break during refactoring.
  • Ice cream cone testing – relying primarily on slow E2E tests while neglecting fast unit tests inverts the test pyramid and slows feedback.
  • Chasing coverage numbers – gaming coverage metrics (e.g., running code paths without meaningful assertions) creates a false sense of confidence. Focus on use-case coverage instead.

Connection to CD Pipeline

Unit tests occupy the base of the test pyramid. They run in the earliest stages of the CI/CD pipeline and provide the fastest feedback loop:

  1. Local development – watch mode reruns tests on every save.
  2. Pre-commit – hooks run the suite before code reaches version control.
  3. PR verification – CI runs the full suite and blocks merge on failure.
  4. Trunk verification – CI reruns tests on the merged HEAD to catch integration issues.

Because unit tests are fast and deterministic, they should always break the build on failure. A healthy CD pipeline depends on a large, reliable unit test suite that gives developers confidence to ship small changes frequently.


This content is adapted from the Dojo Consortium, licensed under CC BY 4.0.

7.2 - Integration Tests

Deterministic tests that verify how units interact together or with external system boundaries using test doubles for non-deterministic dependencies.

Adapted from Dojo Consortium

Definition

An integration test is a deterministic test that verifies how the unit under test interacts with other units without directly accessing external sub-systems. It may validate multiple units working together (sometimes called a “sociable unit test”) or the portion of the code that interfaces with an external network dependency while using a test double to represent that dependency.

For clarity: an “integration test” is not a test that broadly integrates multiple sub-systems. That is an end-to-end test.

When to Use

Integration tests provide the best balance of speed, confidence, and cost. Use them when:

  • You need to verify that multiple units collaborate correctly – for example, a service calling a repository that calls a data mapper.
  • You need to validate the interface layer to an external system (HTTP client, message producer, database query) while keeping the external system replaced by a test double.
  • You want to confirm that a refactoring did not break behavior. Integration tests that avoid testing implementation details survive refactors without modification.
  • You are building a front-end component that composes child components and needs to verify the assembled behavior from the user’s perspective.

If the test requires a live network call to a system outside localhost, it is either a contract test or an E2E test.

Characteristics

Property        Value
Speed           Milliseconds to low seconds
Determinism     Always deterministic
Scope           Multiple units or a unit plus its boundary
Dependencies    External systems replaced with test doubles
Network         Localhost only
Database        Localhost / in-memory only
Breaks build    Yes

Examples

A JavaScript integration test verifying that a connector returns structured data:

describe("retrieving Hygieia data", () => {
  it("should return counts of merged pull requests per day", async () => {
    const result = await hygieiaConnector.getResultsByDay(
      hygieiaConfigs.integrationFrequencyRoute,
      testTeam,
      startDate,
      endDate
    );

    expect(result.status).toEqual(200);
    expect(result.data).toBeInstanceOf(Array);
    expect(result.data[0]).toHaveProperty("value");
    expect(result.data[0]).toHaveProperty("dateStr");
  });

  it("should return an empty array if the team does not exist", async () => {
    const result = await hygieiaConnector.getResultsByDay(
      hygieiaConfigs.integrationFrequencyRoute,
      0,
      startDate,
      endDate
    );
    expect(result.data).toEqual([]);
  });
});

Subcategories

Service integration tests – Validate how the system under test responds to information from an external service. Use virtual services or static mocks; pair with contract tests to keep the doubles current.

Database integration tests – Validate query logic against a controlled data store. Prefer in-memory databases, isolated DB instances, or personalized datasets over shared live data.

Front-end integration tests – Render the component tree and interact with it the way a user would. Follow the accessibility order of operations for element selection: visible text and labels first, ARIA roles second, test IDs only as a last resort.
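
For the database subcategory above, a minimal sketch of a query-logic test against an in-memory SQLite store (assumes the better-sqlite3 package and a Jest-style runner; the schema and query are illustrative):

// Database integration test: validate query logic against an isolated,
// in-memory data store instead of shared live data.
const Database = require("better-sqlite3");

describe("users by department query", () => {
  let db;

  beforeEach(() => {
    // Fresh in-memory database per test keeps runs deterministic and isolated.
    db = new Database(":memory:");
    db.exec("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, department TEXT)");
    const insert = db.prepare("INSERT INTO users (name, department) VALUES (?, ?)");
    insert.run("Ada", "Engineering");
    insert.run("Grace", "Research");
  });

  it("returns only users in the requested department", () => {
    const rows = db
      .prepare("SELECT name FROM users WHERE department = ?")
      .all("Engineering");
    expect(rows).toEqual([{ name: "Ada" }]);
  });
});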

Anti-Patterns

  • Peeking behind the curtain – using tools that expose component internals (e.g., Enzyme’s instance() or state()) instead of testing from the user’s perspective.
  • Mocking too aggressively – replacing every collaborator turns an integration test into a unit test and removes the value of testing real interactions. Only mock what is necessary to maintain determinism.
  • Testing implementation details – asserting on internal state, private methods, or call counts rather than observable output.
  • Introducing a test user – creating an artificial actor that would never exist in production. Write tests from the perspective of a real end-user or API consumer.
  • Tolerating flaky tests – non-deterministic integration tests erode trust. Fix or remove them immediately.
  • Duplicating E2E scope – if the test integrates multiple deployed sub-systems with live network calls, it belongs in the E2E category, not here.

Connection to CD Pipeline

Integration tests form the largest portion of a healthy test suite (the “trophy” or the middle of the pyramid). They run alongside unit tests in the earliest CI stages:

  1. Local development – run in watch mode or before committing.
  2. PR verification – CI executes the full suite; failures block merge.
  3. Trunk verification – CI reruns on the merged HEAD.

Because they are deterministic and fast, integration tests should always break the build. A team whose refactors break many tests likely has too few integration tests and too many fine-grained unit tests. As Kent C. Dodds advises: “Write tests, not too many, mostly integration.”


This content is adapted from the Dojo Consortium, licensed under CC BY 4.0.

7.3 - Functional Tests

Deterministic tests that verify all modules of a sub-system work together from the actor’s perspective, using test doubles for external dependencies.

Adapted from Dojo Consortium

Definition

A functional test is a deterministic test that verifies all modules of a sub-system are working together. It introduces an actor – typically a user interacting with the UI or a consumer calling an API – and validates the ingress and egress of that actor within the system boundary. External sub-systems are replaced with test doubles to keep the test deterministic.

Functional tests cover broad-spectrum behavior: UI interactions, presentation logic, and business logic flowing through the full sub-system. They differ from end-to-end tests in that side effects are mocked and never cross boundaries outside the system’s control.

Functional tests are sometimes called component tests.

When to Use

  • You need to verify a complete user-facing feature from input to output within a single deployable unit (e.g., a service or a front-end application).
  • You want to test how the UI, business logic, and data layers interact without depending on live external services.
  • You need to simulate realistic user workflows – filling in forms, navigating pages, submitting API requests – while keeping the test fast and repeatable.
  • You are validating acceptance criteria for a user story and want a test that maps directly to the specified behavior.

If the test needs to reach a live external dependency, it is an E2E test. If it tests a single unit in isolation, it is a unit test.

Characteristics

Property        Value
Speed           Seconds (slower than unit, faster than E2E)
Determinism     Always deterministic
Scope           All modules within a single sub-system
Dependencies    External systems replaced with test doubles
Network         Localhost only
Database        Localhost / in-memory only
Breaks build    Yes

Examples

A functional test for a REST API using an in-process server and mocked downstream services:

describe("POST /orders", () => {
  it("should create an order and return 201", async () => {
    // Arrange: mock the inventory service response
    nock("https://inventory.internal")
      .get("/stock/item-42")
      .reply(200, { available: true, quantity: 10 });

    // Act: send a request through the full application stack
    const response = await request(app)
      .post("/orders")
      .send({ itemId: "item-42", quantity: 2 });

    // Assert: verify the user-facing response
    expect(response.status).toBe(201);
    expect(response.body.orderId).toBeDefined();
    expect(response.body.status).toBe("confirmed");
  });

  it("should return 409 when inventory is insufficient", async () => {
    nock("https://inventory.internal")
      .get("/stock/item-42")
      .reply(200, { available: true, quantity: 0 });

    const response = await request(app)
      .post("/orders")
      .send({ itemId: "item-42", quantity: 2 });

    expect(response.status).toBe(409);
    expect(response.body.error).toMatch(/insufficient/i);
  });
});

A front-end functional test exercising a login flow with a mocked auth service:

describe("Login page", () => {
  it("should redirect to the dashboard after successful login", async () => {
    mockAuthService.login.mockResolvedValue({ token: "abc123" });

    render(<App />);
    await userEvent.type(screen.getByLabelText("Email"), "ada@example.com");
    await userEvent.type(screen.getByLabelText("Password"), "s3cret");
    await userEvent.click(screen.getByRole("button", { name: "Sign in" }));

    expect(await screen.findByText("Dashboard")).toBeInTheDocument();
  });
});

Anti-Patterns

  • Using live external services – this makes the test non-deterministic and slow. Use test doubles for anything outside the sub-system boundary.
  • Testing through the database – sharing a live database between tests introduces ordering dependencies and flakiness. Use in-memory databases or mocked data layers.
  • Ignoring the actor’s perspective – functional tests should interact with the system the way a user or consumer would. Reaching into internal APIs or bypassing the UI defeats the purpose.
  • Duplicating unit test coverage – functional tests should focus on feature-level behavior and happy/critical paths, not every edge case. Leave permutation testing to unit tests.
  • Slow test setup – if spinning up the sub-system takes too long, invest in faster bootstrapping (in-memory stores, lazy initialization) rather than skipping functional tests.

Connection to CD Pipeline

Functional tests run after unit and integration tests in the pipeline, typically as part of the same CI stage:

  1. PR verification – functional tests run against the sub-system in isolation, giving confidence that the feature works before merge.
  2. Trunk verification – the same tests run on the merged HEAD to catch conflicts.
  3. Pre-deployment gate – functional tests can serve as the final deterministic gate before a build artifact is promoted to a staging environment.

Because functional tests are deterministic, they should break the build on failure. They are more expensive than unit and integration tests, so teams should focus on happy-path and critical-path scenarios while keeping the total count manageable.


This content is adapted from the Dojo Consortium, licensed under CC BY 4.0.

7.4 - End-to-End Tests

Non-deterministic tests that validate the entire software system along with its integration with external interfaces and production-like scenarios.

Adapted from Dojo Consortium

Definition

End-to-end (E2E) tests validate the entire software system, including its integration with external interfaces. They exercise complete production-like scenarios using real (or production-like) data and environments to simulate real-time settings. No test doubles are used – the test hits live services, databases, and third-party integrations just as a real user would.

Because they depend on external systems, E2E tests are typically non-deterministic: they can fail for reasons unrelated to code correctness, such as network instability or third-party outages.

When to Use

E2E tests should be the least-used test type due to their high cost in execution time and maintenance. Use them for:

  • Happy-path validation of critical business flows (e.g., user signup, checkout, payment processing).
  • Smoke testing a deployed environment to verify that key integrations are functioning.
  • Cross-team workflows that span multiple sub-systems and cannot be tested any other way.

Do not use E2E tests to cover edge cases, error handling, or input validation – those scenarios belong in unit, integration, or functional tests.

Vertical vs. Horizontal E2E Tests

Vertical E2E tests target features under the control of a single team:

  • Favoriting an item and verifying it persists across refresh.
  • Creating a saved list and adding items to it.

Horizontal E2E tests span multiple teams:

  • Navigating from the homepage through search, item detail, cart, and checkout.

Horizontal tests are significantly more complex and fragile. Due to their large failure surface area, they are not suitable for blocking release pipelines.

Characteristics

Property        Value
Speed           Seconds to minutes per test
Determinism     Typically non-deterministic
Scope           Full system including external integrations
Dependencies    Real services, databases, third-party APIs
Network         Full network access
Database        Live databases
Breaks build    Generally no (see guidance below)

Examples

A vertical E2E test verifying user lookup through a live web interface:

@Test
public void verifyValidUserLookup() throws Exception {
    // Act -- interact with the live application
    homePage.getUserData("validUserId");
    waitForElement(By.xpath("//span[@id='name']"));

    // Assert -- verify real data returned from the live backend
    assertEquals("Ada Lovelace", homePage.getName());
    assertEquals("Engineering", homePage.getOrgName());
    assertEquals("Grace Hopper", homePage.getManagerName());
}

A browser-based E2E test using a tool like Playwright:

test("user can add an item to cart and check out", async ({ page }) => {
  await page.goto("https://staging.example.com");
  await page.getByRole("link", { name: "Running Shoes" }).click();
  await page.getByRole("button", { name: "Add to Cart" }).click();

  await page.getByRole("link", { name: "Cart" }).click();
  await expect(page.getByText("Running Shoes")).toBeVisible();

  await page.getByRole("button", { name: "Checkout" }).click();
  await expect(page.getByText("Order confirmed")).toBeVisible();
});

Anti-Patterns

  • Using E2E tests as the primary safety net – this is the “ice cream cone” anti-pattern. E2E tests are slow and fragile; the majority of your confidence should come from unit and integration tests.
  • Blocking the pipeline with horizontal E2E tests – these tests span too many teams and failure surfaces. Run them asynchronously and review failures out of band.
  • Ignoring flaky failures – E2E tests often fail for environmental reasons. Track the frequency and root cause of failures. If a test is not providing signal, fix it or remove it.
  • Testing edge cases in E2E – exhaustive input validation and error-path testing should happen in cheaper, faster test types.
  • Not capturing failure context – E2E failures are expensive to debug. Capture screenshots, network logs, and video recordings automatically on failure, as in the configuration sketch after this list.
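
As one way to address the last anti-pattern, a minimal configuration sketch for capturing failure context automatically (assumes Playwright Test is the E2E runner; adapt the idea to your own tooling):

// playwright.config.js – capture artifacts only when tests fail, so E2E
// failures arrive with the context needed to debug them.
const { defineConfig } = require("@playwright/test");

module.exports = defineConfig({
  use: {
    screenshot: "only-on-failure", // save a screenshot for each failed test
    video: "retain-on-failure",    // keep video recordings of failed tests only
    trace: "on-first-retry",       // record a trace (network, console, DOM) on retry
  },
});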

Connection to CD Pipeline

E2E tests run in the later stages of the delivery pipeline, after the build artifact has passed all deterministic tests and has been deployed to a staging or pre-production environment:

  1. Post-deployment smoke tests – a small, fast suite of vertical E2E tests verifies that the deployment succeeded and critical paths work.
  2. Scheduled regression suites – broader E2E suites (including horizontal tests) run on a schedule rather than on every commit.
  3. Production monitoring – customer experience alarms (synthetic monitoring) are a form of continuous E2E testing that runs in production.

Because E2E tests are non-deterministic, they should not break the build in most cases. A team may choose to gate on a small set of highly reliable vertical E2E tests, but must invest in reducing false positives to make this valuable. CD pipelines should be optimized for rapid recovery of production issues rather than attempting to prevent all defects with slow, fragile E2E gates.


This content is adapted from the Dojo Consortium, licensed under CC BY 4.0.

7.5 - Contract Tests

Non-deterministic tests that validate test doubles by verifying API contract format against live external systems.

Adapted from Dojo Consortium

Definition

A contract test validates that the test doubles used in integration tests still accurately represent the real external system. Contract tests run against the live external sub-system and exercise the portion of the code that interfaces with it. Because they depend on live services, contract tests are non-deterministic and should not break the build. Instead, failures should trigger a review to determine whether the contract has changed and the test doubles need updating.

A contract test validates contract format, not specific data. It verifies that response structures, field names, types, and status codes match expectations – not that particular values are returned.

Contract tests have two perspectives:

  • Provider – the team that owns the API verifies that all changes are backwards compatible (unless a new API version is introduced). Every build should validate the provider contract.
  • Consumer – the team that depends on the API verifies that they can still consume the properties they need, following Postel’s Law: “Be conservative in what you do, be liberal in what you accept from others.”

When to Use

  • You have integration tests that use test doubles (mocks, stubs, recorded responses) to represent external services, and you need assurance those doubles remain accurate.
  • You consume a third-party or cross-team API that may change without notice.
  • You provide an API to other teams and want to ensure that your changes do not break their expectations (consumer-driven contracts).
  • You are adopting contract-driven development, where contracts are defined during design so that provider and consumer teams can work in parallel using shared mocks and fakes.

Characteristics

Property        Value
Speed           Seconds (depends on network latency)
Determinism     Non-deterministic (hits live services)
Scope           Interface boundary between two systems
Dependencies    Live external sub-system
Network         Yes – calls the real dependency
Database        Depends on the provider
Breaks build    No – failures trigger review, not build failure

Examples

A provider contract test verifying that an API response matches the expected schema:

describe("GET /users/:id contract", () => {
  it("should return a response matching the user schema", async () => {
    const response = await fetch("https://api.partner.com/users/1");
    const body = await response.json();

    // Validate structure, not specific data
    expect(response.status).toBe(200);
    expect(body).toHaveProperty("id");
    expect(typeof body.id).toBe("number");
    expect(body).toHaveProperty("name");
    expect(typeof body.name).toBe("string");
    expect(body).toHaveProperty("email");
    expect(typeof body.email).toBe("string");
  });
});

A consumer-driven contract test using Pact:

describe("Order Service - Inventory Provider Contract", () => {
  it("should receive stock availability in the expected format", async () => {
    // Define the expected interaction
    await provider.addInteraction({
      state: "item-42 is in stock",
      uponReceiving: "a request for item-42 stock",
      withRequest: { method: "GET", path: "/stock/item-42" },
      willRespondWith: {
        status: 200,
        body: {
          available: Matchers.boolean(true),
          quantity: Matchers.integer(10),
        },
      },
    });

    // Exercise the consumer code against the mock provider
    const result = await inventoryClient.checkStock("item-42");
    expect(result.available).toBe(true);
  });
});

Anti-Patterns

  • Using contract tests to validate business logic – contract tests verify structure and format, not behavior. Business logic belongs in functional tests.
  • Breaking the build on contract test failure – because these tests hit live systems, failures may be caused by network issues or temporary outages, not actual contract changes. Treat failures as signals to investigate.
  • Neglecting to update test doubles – when a contract test fails because the upstream API changed, the test doubles in your integration tests must be updated to match. Ignoring failures defeats the purpose.
  • Running contract tests too infrequently – the frequency should be proportional to the volatility of the interface. Highly active APIs need more frequent contract validation.
  • Testing specific data values – asserting that name equals "Alice" makes the test brittle. Assert on types, required fields, and response codes instead.

Connection to CD Pipeline

Contract tests run asynchronously from the main CI build, typically on a schedule:

  1. Provider side – provider contract tests (schema validation, response code checks) are often implemented as deterministic unit tests and run on every commit as part of the provider’s CI pipeline.
  2. Consumer side – consumer contract tests run on a schedule (e.g., hourly or daily) against the live provider. Failures are reviewed and may trigger updates to test doubles or conversations between teams.
  3. Consumer-driven contracts – when using tools like Pact, the consumer publishes contract expectations and the provider runs them continuously. Both teams communicate when contracts break.
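
A minimal sketch of the deterministic provider-side check from step 1, assuming the provider publishes a JSON Schema and validates it with the Ajv library; getUserHandler and userSchema are hypothetical names:

const Ajv = require("ajv");

test("GET /users/:id response conforms to the published user schema", async () => {
  // userSchema is the JSON Schema shared with consumers (hypothetical)
  const userSchema = {
    type: "object",
    required: ["id", "name", "email"],
    properties: {
      id: { type: "number" },
      name: { type: "string" },
      email: { type: "string" },
    },
  };

  // getUserHandler is the provider's own request handler, invoked in-process,
  // so the check involves no network and stays deterministic
  const response = await getUserHandler({ params: { id: "1" } });

  const validate = new Ajv().compile(userSchema);
  expect(validate(response.body)).toBe(true);
});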

Contract tests are the bridge that keeps your fast, deterministic integration test suite honest. Without them, test doubles can silently drift from reality, and your integration tests provide false confidence.


This content is adapted from the Dojo Consortium, licensed under CC BY 4.0.

7.6 - Static Analysis

Code analysis tools that evaluate non-running code for security vulnerabilities, complexity, and best practice violations.

Adapted from Dojo Consortium

Definition

Static analysis (also called static testing) evaluates non-running code against rules for known good practices. Unlike other test types that execute code and observe behavior, static analysis inspects source code, configuration files, and dependency manifests to detect problems before the code ever runs.

Static analysis serves several key purposes:

  • Catches errors that would otherwise surface at runtime.
  • Warns of excessive complexity that degrades the ability to change code safely.
  • Identifies security vulnerabilities and coding patterns that provide attack vectors.
  • Enforces coding standards by removing subjective style debates from code reviews.
  • Alerts to dependency issues – outdated packages, known CVEs, license incompatibilities, or supply-chain compromises.

When to Use

Static analysis should run continuously, at every stage where feedback is possible:

  • In the IDE – real-time feedback as developers type, via editor plugins and language server integrations.
  • On save – format-on-save and lint-on-save catch issues immediately.
  • Pre-commit – hooks prevent problematic code from entering version control.
  • In CI – the full suite of static checks runs on every PR and on the trunk after merge, verifying that earlier local checks were not bypassed.

Static analysis is always applicable. Every project, regardless of language or platform, benefits from linting, formatting, and dependency scanning.
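
As a minimal sketch, assuming an npm-based project with ESLint, TypeScript, and npm audit available, the checks can be exposed as npm scripts so that pre-commit hooks and CI invoke identical commands:

{
  "scripts": {
    "lint": "eslint . --max-warnings 0",
    "typecheck": "tsc --noEmit",
    "audit:deps": "npm audit --audit-level=high",
    "verify": "npm run lint && npm run typecheck && npm run audit:deps"
  }
}

Running the same verify script locally and in CI keeps the two environments from drifting apart.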

Characteristics

Property       Value
Speed          Seconds (typically the fastest test category)
Determinism    Always deterministic
Scope          Entire codebase (source, config, dependencies)
Dependencies   None (analyzes code at rest)
Network        None (except dependency scanners)
Database       None
Breaks build   Yes

Examples

Linting

A .eslintrc.json configuration enforcing test quality rules via eslint-plugin-jest:

{
  "plugins": ["jest"],
  "rules": {
    "jest/no-disabled-tests": "warn",
    "jest/expect-expect": "error",
    "jest/no-commented-out-tests": "error",
    "jest/valid-expect": "error",
    "no-unused-vars": "error",
    "no-console": "warn"
  }
}

Type Checking

TypeScript catches type mismatches at compile time, eliminating entire classes of runtime errors:

function calculateTotal(price: number, quantity: number): number {
  return price * quantity;
}

// Static analysis error: Argument of type 'string' is not assignable
// to parameter of type 'number'.
calculateTotal("19.99", 3);

Dependency Scanning

Tools like npm audit, Snyk, or Dependabot scan for known vulnerabilities:

$ npm audit
found 2 vulnerabilities (1 moderate, 1 high)
  moderate: Prototype Pollution in lodash < 4.17.21
  high:     Remote Code Execution in log4j < 2.17.1

Types of Static Analysis

Type                  Purpose
Linting               Catches common errors and enforces best practices
Formatting            Enforces consistent code style, removing subjective debates
Complexity analysis   Flags overly deep or long code blocks that breed defects
Type checking         Prevents type-related bugs, replacing some unit tests
Security scanning     Detects known vulnerabilities and dangerous coding patterns
Dependency scanning   Checks for outdated, hijacked, or incompatibly licensed dependencies
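
Complexity analysis is typically configured through the same linter as style rules. As a sketch, assuming ESLint (whose core includes these rules), thresholds might look like:

{
  "rules": {
    "complexity": ["warn", 10],
    "max-depth": ["warn", 3],
    "max-lines-per-function": ["warn", 50],
    "max-params": ["warn", 4]
  }
}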

Anti-Patterns

  • Disabling rules instead of fixing code – suppressing linter warnings or ignoring security findings erodes the value of static analysis over time.
  • Not customizing rules – default rulesets are a starting point. Write custom rules for patterns that come up repeatedly in code reviews.
  • Running static analysis only in CI – by the time CI reports a formatting error, the developer has context-switched. IDE plugins and pre-commit hooks provide immediate feedback.
  • Ignoring dependency vulnerabilities – known CVEs in dependencies are a direct attack vector. Treat high-severity findings as build-breaking.
  • Treating static analysis as optional – static checks should be mandatory and enforced. If developers can bypass them, they will.

Connection to CD Pipeline

Static analysis is the first gate in the CD pipeline, providing the fastest feedback:

  1. IDE / local development – plugins run in real time as code is written.
  2. Pre-commit – hooks run linters and formatters, blocking commits that violate rules.
  3. PR verification – CI runs the full static analysis suite (linting, type checking, security scanning, dependency auditing) and blocks merge on failure.
  4. Trunk verification – the same checks re-run on the merged HEAD to catch anything missed.
  5. Scheduled scans – dependency and security scanners run on a schedule to catch newly disclosed vulnerabilities in existing dependencies.

Because static analysis requires no running code, no test environment, and no external dependencies, it is the cheapest and fastest form of quality verification. A mature CD pipeline treats static analysis failures the same as test failures: they break the build.


This content is adapted from the Dojo Consortium, licensed under CC BY 4.0.

7.7 - Test Doubles

Patterns for isolating dependencies in tests: stubs, mocks, fakes, spies, and dummies.

Adapted from Dojo Consortium

Definition

Test doubles are stand-in objects that replace real production dependencies during testing. The term comes from the film industry’s “stunt double” – just as a stunt double replaces an actor for dangerous scenes, a test double replaces a costly or non-deterministic dependency to make tests fast, isolated, and reliable.

Test doubles allow you to:

  • Remove non-determinism by replacing network calls, databases, and file systems with predictable substitutes.
  • Control test conditions by forcing specific states, error conditions, or edge cases that would be difficult to reproduce with real dependencies.
  • Increase speed by eliminating slow I/O operations.
  • Isolate the system under test so that failures point directly to the code being tested, not to an external dependency.

Types of Test Doubles

  • Dummy – Passed around but never actually used; fills parameter lists. Example: a required logger parameter in a constructor.
  • Stub – Provides canned answers to calls made during the test and does not respond to anything outside what is programmed. Example: returning a fixed user object from a repository.
  • Spy – A stub that also records information about how it was called (arguments, call count, order). Example: verifying that an analytics event was sent once.
  • Mock – Pre-programmed with expectations about which calls will be made; verification happens on the mock itself. Example: asserting that sendEmail() was called with specific arguments.
  • Fake – Has a working implementation, but takes shortcuts not suitable for production. Example: an in-memory database replacing PostgreSQL.

Choosing the Right Double

  • Use stubs when you need to supply data but do not care how it was requested.
  • Use spies when you need to verify call arguments or call count.
  • Use mocks when the interaction itself is the primary thing being verified.
  • Use fakes when you need realistic behavior but cannot use the real system.
  • Use dummies when a parameter is required by the interface but irrelevant to the test.

When to Use

Test doubles are used in every layer of deterministic testing:

  • Unit tests – nearly all dependencies are replaced with test doubles to achieve full isolation.
  • Integration tests – external sub-systems (APIs, databases, message queues) are replaced, but internal collaborators remain real.
  • Functional tests – dependencies that cross the sub-system boundary are replaced to maintain determinism.

Test doubles should become progressively rarer in later pipeline stages; end-to-end tests use no test doubles by design.

Examples

A JavaScript stub providing a canned response:

// Stub: return a fixed user regardless of input
const userRepository = {
  findById: jest.fn().mockResolvedValue({
    id: "u1",
    name: "Ada Lovelace",
    email: "ada@example.com",
  }),
};

// userService is assumed to be constructed with this repository injected
const user = await userService.getUser("u1");
expect(user.name).toBe("Ada Lovelace");

A Java spy verifying interaction:

// Requires: import static org.mockito.Mockito.*; and import static org.junit.Assert.assertEquals;
@Test
public void shouldCallUserServiceExactlyOnce() {
    UserService spyService = Mockito.spy(userService);
    doReturn(testUser).when(spyService).getUserInfo("u123");

    User result = spyService.getUserInfo("u123");

    verify(spyService, times(1)).getUserInfo("u123");
    assertEquals("Ada", result.getName());
}
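
A JavaScript mock verifying an interaction with specific arguments (a minimal sketch using Jest mock functions; createUserService, notificationService, and registerUser are hypothetical names):

// Mock: the interaction itself is what the test verifies
const notificationService = { sendWelcomeEmail: jest.fn() };
// Assumes createUserService sends a welcome email on registration (hypothetical)
const userService = createUserService({ notificationService });

await userService.registerUser({ id: "u2", email: "grace@example.com" });

expect(notificationService.sendWelcomeEmail).toHaveBeenCalledTimes(1);
expect(notificationService.sendWelcomeEmail).toHaveBeenCalledWith("grace@example.com");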

A fake in-memory repository:

class FakeUserRepository {
  constructor() {
    this.users = new Map();
  }
  save(user) {
    this.users.set(user.id, user);
  }
  findById(id) {
    return this.users.get(id) || null;
  }
}

Anti-Patterns

  • Mocking what you do not own – wrapping a third-party API in a thin adapter and mocking the adapter is safer than mocking the third-party API directly. Direct mocks couple your tests to the library’s implementation. See the adapter sketch after this list.
  • Over-mocking – replacing every collaborator with a mock turns the test into a mirror of the implementation. Tests become brittle and break on every refactor. Only mock what is necessary to maintain determinism.
  • Not validating test doubles – if the real dependency changes its contract, your test doubles silently drift. Use contract tests to keep doubles honest.
  • Complex mock setup – if setting up mocks requires dozens of lines, the system under test may have too many dependencies. Consider refactoring the production code rather than adding more mocks.
  • Using mocks to test implementation details – asserting on the exact sequence and count of internal method calls creates change-detector tests. Prefer asserting on observable output.
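
A sketch of the adapter approach from the first anti-pattern above, assuming a hypothetical vendor payment SDK; PaymentGateway is the team-owned adapter that tests mock instead of the vendor client:

// Thin adapter owned by the team; only this class touches the vendor SDK
class PaymentGateway {
  constructor(vendorClient) {
    this.vendorClient = vendorClient;
  }

  async charge(amountCents, cardToken) {
    // Translate the vendor's response shape into one the codebase owns
    const result = await this.vendorClient.createCharge({
      amount: amountCents,
      source: cardToken,
    });
    return { chargeId: result.id, succeeded: result.status === "succeeded" };
  }
}

// In tests, mock the adapter rather than the vendor SDK
const gateway = { charge: jest.fn().mockResolvedValue({ chargeId: "c1", succeeded: true }) };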

Connection to CD Pipeline

Test doubles are a foundational technique that enables the fast, deterministic tests required for continuous delivery:

  • Early pipeline stages (static analysis, unit tests, integration tests) rely heavily on test doubles to stay fast and deterministic. This is where the majority of defects are caught.
  • Later pipeline stages (E2E tests, production monitoring) use fewer or no test doubles, trading speed for realism.
  • Contract tests run asynchronously to validate that test doubles still match reality, closing the gap between the deterministic and non-deterministic stages of the pipeline.

The guiding principle from Justin Searls applies: “Don’t poke too many holes in reality.” Use test doubles when you must, but prefer real implementations when they are fast and deterministic.


This content is adapted from the Dojo Consortium, licensed under CC BY 4.0.