Phase 2: Pipeline

Build the automated path from commit to production: a single, deterministic pipeline that deploys immutable artifacts.

Key question: “Can we deploy any commit automatically?”

This phase creates the delivery pipeline - the automated path that takes every commit through build, test, and deployment stages. When done right, the pipeline is the only way changes reach production.

What You’ll Do

  1. Establish a single path to production - One pipeline for all changes
  2. Make the pipeline deterministic - Same inputs always produce same outputs
  3. Define “deployable” - Clear criteria for what’s ready to ship
  4. Use immutable artifacts - Build once, deploy everywhere
  5. Externalize application config - Separate config from code
  6. Use production-like environments - Test in environments that match production
  7. Design your pipeline architecture - Efficient quality gates for your context
  8. Enable rollback - Fast recovery from any deployment

Why This Phase Matters

The pipeline is the backbone of continuous delivery. It replaces manual handoffs with automated quality gates, ensures every change goes through the same validation process, and makes deployment a routine, low-risk event.

When You’re Ready to Move On

You’re ready for Phase 3: Optimize when:

  • Every change reaches production through the same automated pipeline
  • The pipeline produces the same result for the same inputs
  • You can deploy any green build to production with confidence
  • Rollback takes minutes, not hours

1 - Single Path to Production

All changes reach production through the same automated pipeline - no exceptions.

Definition

A single path to production means that every change - whether it is a feature, a bug fix, a configuration update, or an infrastructure change - follows the same automated pipeline to reach production. There is exactly one route from a developer’s commit to a running production system. No side doors. No emergency shortcuts. No “just this once” manual deployments.

This is the most fundamental constraint of a continuous delivery pipeline. If you allow multiple paths, you cannot reason about the state of production. You lose the ability to guarantee that every change has been validated, and you undermine every other practice in this phase.

Why It Matters for CD Migration

Teams migrating to continuous delivery often carry legacy deployment processes - a manual runbook for “emergency” fixes, a separate path for database changes, or a distinct workflow for infrastructure updates. Each additional path is a source of unvalidated risk.

Establishing a single path to production is the first pipeline practice because every subsequent practice depends on it. A deterministic pipeline only works if all changes flow through it. Immutable artifacts are only trustworthy if no other mechanism can alter what reaches production. Your deployable definition is meaningless if changes can bypass the gates.

Key Principles

One pipeline for all changes

Every type of change uses the same pipeline:

  • Application code - features, fixes, refactors
  • Infrastructure as Code - Terraform, CloudFormation, Pulumi, Ansible
  • Pipeline definitions - the pipeline itself is versioned and deployed through the pipeline
  • Configuration changes - environment variables, feature flags, routing rules
  • Database migrations - schema changes, data migrations

Same pipeline for all environments

The pipeline that deploys to development is the same pipeline that deploys to staging and production. The only difference between environments is the configuration injected at deployment time. If your staging deployment uses a different mechanism than your production deployment, you are not testing the deployment process itself.

No manual deployments

If a human can bypass the pipeline and push a change directly to production, the single path is broken. This includes:

  • SSH access to production servers for ad-hoc changes
  • Direct container image pushes outside the pipeline
  • Console-based configuration changes that are not captured in version control
  • “Break glass” procedures that skip validation stages

Anti-Patterns

Integration branches and multi-branch deployment paths

Using separate branches (such as develop, release, hotfix) that each have their own deployment workflow creates multiple paths. GitFlow is a common source of this anti-pattern. When a hotfix branch deploys through a different pipeline than the develop branch, you cannot be confident that the hotfix has undergone the same validation.

Integration Branch:

trunk -> integration <- features

This creates two merge structures instead of one. When trunk changes, you merge to the integration branch immediately. When features change, you merge to integration at least daily. The integration branch lives a parallel life to trunk, acting as a temporary container for partially finished features. This attempts to mimic feature flags to keep inactive features out of production but adds complexity and accumulates abandoned features that stay unfinished forever.

GitFlow (multiple long-lived branches):

master (production)
  |
develop (integration)
  |
feature branches -> develop
  |
release branches -> master
  |
hotfix branches -> master -> develop

GitFlow creates multiple merge patterns depending on change type:

  • Features: feature -> develop -> release -> master
  • Hotfixes: hotfix -> master AND hotfix -> develop
  • Releases: develop -> release -> master

Different types of changes follow different paths to production. Multiple long-lived branches (master, develop, release) create merge complexity. Hotfixes have a different path than features, release branches delay integration and create batch deployments, and merge conflicts multiply across integration points.

The correct approach is direct trunk integration - all work integrates directly to trunk using the same process:

trunk <- features
trunk <- bugfixes
trunk <- hotfixes

Environment-specific pipelines

Building a separate pipeline for staging versus production - or worse, manually deploying to staging and only using automation for production - means you are not testing your deployment process in lower environments.

“Emergency” manual deployments

The most dangerous anti-pattern is the manual deployment reserved for emergencies. Under pressure, teams bypass the pipeline “just this once,” introducing an unvalidated change into production. The fix for this is not to allow exceptions - it is to make the pipeline fast enough that it is always the fastest path to production.

Separate pipelines for different change types

Having one pipeline for application code, another for infrastructure, and yet another for database changes means that coordinated changes across these layers are never validated together.

Good Patterns

Feature flags

Use feature flags to decouple deployment from release. Code can be merged and deployed through the pipeline while the feature remains hidden behind a flag. This eliminates the need for long-lived branches and separate deployment paths for “not-ready” features.

// Feature code lives in trunk, controlled by flags
if (featureFlags.newCheckout) {
  return renderNewCheckout()
}
return renderOldCheckout()

Branch by abstraction

For large-scale refactors or technology migrations, use branch by abstraction to make incremental changes that can be deployed through the standard pipeline at every step. Create an abstraction layer, build the new implementation behind it, switch over incrementally, and remove the old implementation - all through the same pipeline.

// Old behavior behind abstraction
class PaymentProcessor {
  process() {
    // Gradually replace implementation while maintaining interface
  }
}

Dark launching

Deploy new functionality to production without exposing it to users. The code runs in production, processes real data, and generates real metrics - but its output is not shown to users. This validates the change under production conditions while managing risk.

// New API route exists but isn't exposed to users
router.post('/api/v2/checkout', newCheckoutHandler)

// Final commit: update client to use new route

Connect tests last

When building a new integration, start by deploying the code without connecting it to the live dependency. Validate the deployment, the configuration, and the basic behavior first. Connect to the real dependency as the final step. This keeps the change deployable through the pipeline at every stage of development.

// Build new feature code, integrate to trunk
// Connect to UI/API only in final commit
function newCheckoutFlow() {
  // Complete implementation ready
}

// Final commit: wire it up
<button onClick={newCheckoutFlow}>Checkout</button>

How to Get Started

Step 1: Map your current deployment paths

Document every way that changes currently reach production. Include manual processes, scripts, CI/CD pipelines, direct deployments, and any emergency procedures. You will likely find more paths than you expected.

Step 2: Identify the primary path

Choose or build one pipeline that will become the single path. This pipeline should be the most automated and well-tested path you have. All other paths will converge into it.

Step 3: Eliminate the easiest alternate paths first

Start by removing the deployment paths that are used least frequently or are easiest to replace. For each path you eliminate, migrate its changes into the primary pipeline.

Step 4: Make the pipeline fast enough for emergencies

The most common reason teams maintain manual deployment shortcuts is that the pipeline is too slow for urgent fixes. If your pipeline takes 45 minutes and an incident requires a fix in 10, the team will bypass the pipeline. Invest in pipeline speed so that the automated path is always the fastest option.

Step 5: Remove break-glass access

Once the pipeline is fast and reliable, remove the ability to deploy outside of it. Revoke direct production access. Disable manual deployment scripts. Make the pipeline the only way.

Example Implementation

Single Pipeline for Everything

# .github/workflows/deploy.yml
name: Deployment Pipeline

on:
  push:
    branches: [main]
  workflow_dispatch: # Manual trigger for rollbacks

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - run: npm ci
      - run: npm test
      - run: npm run lint
      - run: npm run security-scan

  build:
    needs: validate
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - run: npm ci
      - run: npm run build
      - run: docker build -t app:${{ github.sha }} .
      - run: docker push app:${{ github.sha }}

  deploy-staging:
    needs: build
    runs-on: ubuntu-latest
    steps:
      - run: kubectl set image deployment/app app=app:${{ github.sha }}
      - run: kubectl rollout status deployment/app

  smoke-test:
    needs: deploy-staging
    runs-on: ubuntu-latest
    steps:
      - run: npm run smoke-test:staging

  deploy-production:
    needs: smoke-test
    runs-on: ubuntu-latest
    steps:
      - run: kubectl set image deployment/app app=app:${{ github.sha }}
      - run: kubectl rollout status deployment/app

Every deployment - normal, hotfix, or rollback - uses this pipeline. Consistent, validated, traceable.

FAQ

What if the pipeline is broken and we need to deploy a critical fix?

Fix the pipeline first. If your pipeline is so fragile that it cannot deploy critical fixes, that is a pipeline problem, not a process problem. Invest in pipeline reliability.

What about emergency hotfixes that cannot wait for the full pipeline?

The pipeline should be fast enough to handle emergencies. If it is not, optimize the pipeline. A “fast-track” mode that skips some tests is acceptable, but it must still be the same pipeline, not a separate manual process.

Can we manually patch production “just this once”?

No. “Just this once” becomes “just this once again.” Manual production changes always create problems. Commit the fix, push through the pipeline, deploy.

What if deploying through the pipeline takes too long?

Optimize your pipeline:

  1. Parallelize tests
  2. Use faster test environments
  3. Implement progressive deployment (canary, blue-green)
  4. Cache dependencies
  5. Optimize build times

A well-optimized pipeline should deploy to production in under 30 minutes.

Can operators make manual changes for maintenance?

Infrastructure maintenance (patching servers, scaling resources) is separate from application deployment. However, application deployment must still only happen through the pipeline.

Health Metrics

  • Pipeline deployment rate: Should be 100% (all deployments go through pipeline)
  • Manual override rate: Should be 0%
  • Hotfix pipeline time: Should be less than 30 minutes
  • Rollback success rate: Should be greater than 99%
  • Deployment frequency: Should increase over time as confidence grows

Connection to the Pipeline Phase

Single path to production is the foundation of Phase 2. Without it, every other pipeline practice is compromised:

  • Deterministic pipeline requires all changes to flow through it to provide guarantees
  • Deployable definition must be enforced by a single set of gates
  • Immutable artifacts are only trustworthy when produced by a known, consistent process
  • Rollback relies on the pipeline to deploy the previous version through the same path

Establishing this practice first creates the constraint that makes the rest of the pipeline meaningful.

2 - Deterministic Pipeline

The same inputs to the pipeline always produce the same outputs.

Definition

A deterministic pipeline produces consistent, repeatable results. Given the same commit, the same environment definition, and the same configuration, the pipeline will build the same artifact, run the same tests, and produce the same outcome - every time. There is no variance introduced by uncontrolled dependencies, environmental drift, manual intervention, or non-deterministic test behavior.

Determinism is what transforms a pipeline from “a script that usually works” into a reliable delivery system. When the pipeline is deterministic, a green build means something. A failed build points to a real problem. Teams can trust the signal.

Why It Matters for CD Migration

Non-deterministic pipelines are the single largest source of wasted time in delivery organizations. When builds fail randomly, teams learn to ignore failures. When the same commit passes on retry, teams stop investigating root causes. When different environments produce different results, teams lose confidence in pre-production validation.

During a CD migration, teams are building trust in automation. Every flaky test, every “works on my machine” failure, and every environment-specific inconsistency erodes that trust. A deterministic pipeline is what earns the team’s confidence that automation can replace manual verification.

Key Principles

Version control everything

Every input to the pipeline must be version controlled:

  • Application source code - the obvious one
  • Infrastructure as Code - the environment definitions themselves
  • Pipeline definitions - the CI/CD configuration files
  • Test data and fixtures - the data used by automated tests
  • Dependency lockfiles - exact versions of every dependency (e.g., package-lock.json, Pipfile.lock, go.sum)
  • Tool versions - the versions of compilers, runtimes, linters, and build tools

If an input to the pipeline is not version controlled, it can change without notice, and the pipeline is no longer deterministic.

Lock dependency versions

Floating dependency versions (version ranges, “latest” tags) are a common source of non-determinism. A build that worked yesterday can break today because a transitive dependency released a new version overnight.

Use lockfiles to pin exact versions of every dependency. Commit lockfiles to version control. Update dependencies intentionally through pull requests, not implicitly through builds.

Eliminate environmental variance

The pipeline should run in a controlled, reproducible environment. Containerize build steps so that the build environment is defined in code and does not drift over time. Use the same base images in CI as in production. Pin tool versions explicitly rather than relying on whatever is installed on the build agent.

Remove human intervention

Any manual step in the pipeline is a source of variance. A human choosing which tests to run, deciding whether to skip a stage, or manually approving a step introduces non-determinism. The pipeline should run from commit to deployment without human decisions.

This does not mean humans have no role - it means the pipeline’s behavior is fully determined by its inputs, not by who is watching it run.

Fix flaky tests immediately

A flaky test is a test that sometimes passes and sometimes fails for the same code. Flaky tests are the most insidious form of non-determinism because they train teams to distrust the test suite.

When a flaky test is detected, the response must be immediate:

  1. Quarantine the test - remove it from the pipeline so it does not block other changes
  2. Fix it or delete it - flaky tests provide negative value; they are worse than no test
  3. Investigate the root cause - flakiness often indicates a real problem (race conditions, shared state, time dependencies, external service reliance)

Never allow a culture of “just re-run it” to take hold. Every re-run masks a real problem.

Example: Non-Deterministic vs Deterministic Pipeline

Seeing anti-patterns and good patterns side by side makes the difference concrete.

Anti-Pattern: Non-Deterministic Pipeline

# Bad: Uses floating versions
dependencies:
  nodejs: "latest"
  postgres: "14"  # No minor/patch version

# Bad: Relies on external state
test:
  - curl https://api.example.com/test-data
  - run_tests --use-production-data

# Bad: Time-dependent tests
test('shows current date', () => {
  expect(getDate()).toBe(new Date())  // Fails at midnight!
})

# Bad: Manual steps
deploy:
  - echo "Manually verify staging before approving"
  - wait_for_approval

Results vary based on when the pipeline runs, what is in production, which dependency versions are “latest,” and human availability.

Good Pattern: Deterministic Pipeline

# Good: Pinned versions
dependencies:
  nodejs: "18.17.1"
  postgres: "14.9"

# Good: Version-controlled test data
test:
  - docker-compose up -d
  - ./scripts/seed-test-data.sh  # From version control
  - npm run test

# Good: Deterministic time handling
test('shows date', () => {
  const mockDate = new Date('2024-01-15')
  jest.useFakeTimers().setSystemTime(mockDate)
  expect(getDate()).toEqual(mockDate)
})

# Good: Automated verification
deploy:
  - deploy_to_staging
  - run_smoke_tests
  - if: smoke_tests_pass
    deploy_to_production

Same inputs always produce same outputs. Pipeline results are trustworthy and reproducible.

Anti-Patterns

Unpinned dependencies

Using version ranges like ^1.2.0 or >=2.0 in dependency declarations without a lockfile means the build resolves different versions on different days. This applies to application dependencies, build plugins, CI tool versions, and base container images.

Shared, mutable build environments

Build agents that accumulate state between builds (cached files, installed packages, leftover containers) produce different results depending on what ran previously. Each build should start from a clean, known state.

Tests that depend on external services

Tests that call live external APIs, depend on shared databases, or rely on network resources introduce uncontrolled variance. External services change, experience outages, and respond with different latency - all of which make the pipeline non-deterministic.

Time-dependent tests

Tests that depend on the current time, current date, or elapsed time are inherently non-deterministic. A test that passes at 2:00 PM and fails at midnight is not testing your application - it is testing the clock.

Manual retry culture

Teams that routinely re-run failed pipelines without investigating the failure have accepted non-determinism as normal. This is a cultural anti-pattern that must be addressed alongside the technical ones.

Good Patterns

Containerized build environments

Define your build environment as a container image. Pin the base image version. Install exact versions of all tools. Run every build in a fresh instance of this container. This eliminates variance from the build environment.

Hermetic builds

A hermetic build is one that does not access the network during the build process. All dependencies are pre-fetched and cached. The build can run identically on any machine, at any time, with or without network access.

Contract tests for external dependencies

Replace live calls to external services with contract tests. These tests verify that your code interacts correctly with an external service’s API contract without actually calling the service. Combine with service virtualization or test doubles for integration tests.
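
As a minimal sketch (not tied to any particular contract-testing framework), the test below checks the client's parsing logic against a recorded example of the provider's response; the parseOrder function and the response fields are illustrative assumptions:

// contract.spec.js - illustrative sketch; parseOrder and the response fields are assumptions
const assert = require('node:assert')
const { test } = require('node:test')

// Recorded example of the provider's documented response, kept in version control
const contractExample = {
  orderId: 'ord-123',
  status: 'CONFIRMED',
  total: 4999,
}

// The same parsing logic the application uses against the live service
function parseOrder(payload) {
  return { id: payload.orderId, status: payload.status, totalCents: payload.total }
}

test('client understands the provider contract without calling the live service', () => {
  const order = parseOrder(contractExample)
  assert.strictEqual(order.id, 'ord-123')
  assert.strictEqual(order.status, 'CONFIRMED')
  assert.strictEqual(order.totalCents, 4999)
})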

Deterministic test ordering

Run tests in a fixed, deterministic order - or better, ensure every test is independent and can run in any order. Many test frameworks default to random ordering to detect inter-test dependencies; use this during development but ensure no ordering dependencies exist.

Immutable CI infrastructure

Treat CI build agents as cattle, not pets. Provision them from images. Replace them rather than updating them. Never allow state to accumulate on a build agent between pipeline runs.

Tactical Patterns

Immutable Build Containers

Define your build environment as a versioned container image with every dependency pinned:

# Dockerfile.build - version controlled
FROM node:18.17.1-alpine3.18

RUN apk add --no-cache \
    python3=3.11.5-r0 \
    make=4.4.1-r1

WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci

Every build runs inside a fresh instance of this image. No drift, no accumulated state.

Dependency Lockfiles

Always use dependency lockfiles. This is essential for deterministic builds:

// package-lock.json (ALWAYS commit to version control)
{
  "dependencies": {
    "express": {
      "version": "4.18.2",
      "resolved": "https://registry.npmjs.org/express/-/express-4.18.2.tgz",
      "integrity": "sha512-5/PsL6iGPdfQ/..."
    }
  }
}

Rules for lockfiles:

  • Use npm ci in CI (not npm install) - npm ci installs exactly what the lockfile specifies
  • Never add lockfiles to .gitignore - they must be committed
  • Avoid version ranges in production dependencies - no ^, ~, or >= without a lockfile enforcing exact resolution
  • Never rely on “latest” tags for any dependency, base image, or tool

Quarantine Pattern for Flaky Tests

When a flaky test is detected, move it to quarantine immediately. Do not leave it in the main suite where it erodes trust in the pipeline:

// tests/quarantine/flaky-test.spec.js
describe.skip('Quarantined: Flaky integration test', () => {
  // Quarantined due to intermittent timeout
  // Tracking issue: #1234
  // Fix deadline: 2024-02-01
  it('should respond within timeout', () => {
    // Test code
  })
})

Quarantine is not a permanent home. Every quarantined test must have:

  1. A tracking issue linked in the test file
  2. A deadline for resolution (no more than one sprint)
  3. A clear root cause investigation plan

If a quarantined test cannot be fixed by the deadline, delete it and write a better test.

Hermetic Test Environments

Give each pipeline run a fresh, isolated environment with no shared state:

# GitHub Actions example
jobs:
  test:
    runs-on: ubuntu-22.04
    services:
      postgres:
        image: postgres:14.9
        env:
          POSTGRES_DB: testdb
          POSTGRES_PASSWORD: testpass
    steps:
      - uses: actions/checkout@v3
      - run: npm ci
      - run: npm test
      # Each workflow run gets a fresh database

How to Get Started

Step 1: Audit your pipeline inputs

List every input to your pipeline that is not version controlled. This includes dependency versions, tool versions, environment configurations, test data, and pipeline definitions themselves.

Step 2: Add lockfiles and pin versions

For every dependency manager in your project, ensure a lockfile is committed to version control. Pin CI tool versions explicitly. Pin base image versions in Dockerfiles.

Step 3: Containerize the build

Move your build steps into containers with explicitly defined environments. This is often the highest-leverage change for improving determinism.

Step 4: Identify and fix flaky tests

Review your test history for tests that have both passed and failed for the same commit. Quarantine them immediately and fix or remove them within a defined time window (such as one sprint).

Step 5: Monitor pipeline determinism

Track the rate of pipeline failures that are resolved by re-running without code changes. This metric (sometimes called the “re-run rate”) directly measures non-determinism. Drive it to zero.
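
As a rough illustration, the re-run rate can be derived from pipeline run records; the record shape below ({ commit, status }) is an assumption, not a specific CI provider's API:

// rerun-rate.js - illustrative sketch; the run record shape is an assumption
function rerunRate(runs) {
  const statusesByCommit = new Map()
  for (const run of runs) {
    const statuses = statusesByCommit.get(run.commit) || []
    statuses.push(run.status)
    statusesByCommit.set(run.commit, statuses)
  }
  // A commit that both failed and passed with no code change signals non-determinism
  const commits = [...statusesByCommit.values()]
  const flaky = commits.filter((s) => s.includes('failed') && s.includes('passed')).length
  return flaky / commits.length
}

console.log(rerunRate([
  { commit: 'a1f9', status: 'passed' },
  { commit: 'b2c4', status: 'failed' },
  { commit: 'b2c4', status: 'passed' }, // re-run of the same commit
])) // 0.5 - half of the commits needed a re-run to go green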

FAQ

What if a test is occasionally flaky but hard to reproduce?

This is still a problem. Flaky tests indicate either a real bug in your code (race conditions, shared state) or a problem with your test (dependency on external state, timing sensitivity). Both need to be fixed. Quarantine the test, investigate thoroughly, and fix the root cause.

Can we use retries to handle flaky tests?

Retries mask problems rather than fixing them. A test that passes on retry is hiding a failure, not succeeding. Fix the flakiness instead of retrying.

How do we handle tests that involve randomness?

Seed your random number generators with a fixed seed in tests:

// Deterministic randomness
const rng = new Random(12345) // Fixed seed
const result = shuffle(array, rng)
expect(result).toEqual([3, 1, 4, 2]) // Predictable

What if our deployment requires manual verification?

Manual verification can happen after deployment, not before. Deploy automatically based on pipeline results, then verify in production using automated smoke tests or observability tooling. If verification fails, roll back automatically.
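
A minimal sketch of that flow, with a placeholder health-check URL and a generic rollback command standing in for whatever your platform provides:

// post-deploy-verify.js - illustrative; the URL and rollback command are placeholders
const { execSync } = require('node:child_process')

async function verifyDeployment() {
  const response = await fetch('https://app.example.com/health') // placeholder endpoint
  if (!response.ok) {
    // Automated verification failed: roll back without waiting for a human decision
    execSync('kubectl rollout undo deployment/app', { stdio: 'inherit' })
    process.exit(1)
  }
  console.log('Deployment verified')
}

verifyDeployment()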

Should the pipeline ever be non-deterministic?

There are rare cases where controlled non-determinism is useful (chaos engineering, fuzz testing), but these should be:

  1. Explicitly designed and documented
  2. Separate from the core deployment pipeline
  3. Reproducible via saved seeds or recorded inputs

Health Metrics

Track these metrics to measure your pipeline’s determinism:

  • Test flakiness rate - percentage of test runs that produce different results for the same commit. Target less than 1%, ideally zero.
  • Pipeline re-run rate - percentage of pipeline failures resolved by re-running without code changes. This directly measures non-determinism. Target zero.
  • Time to fix flaky tests - elapsed time from detection to resolution. Target less than one day.
  • Manual override rate - how often someone manually approves, skips, or re-runs a stage. Target near zero.

Connection to the Pipeline Phase

Determinism is what gives the single path to production its authority. If the pipeline produces inconsistent results, teams will work around it. A deterministic pipeline is also the prerequisite for a meaningful deployable definition - your quality gates are only as reliable as the pipeline that enforces them.

When the pipeline is deterministic, immutable artifacts become trustworthy: you know that the artifact was built by a consistent, repeatable process, and its validation results are real.

3 - Deployable Definition

Clear, automated criteria that determine when a change is ready for production.

Definition

A deployable definition is the set of automated quality criteria that every artifact must satisfy before it is considered ready for production. It is the pipeline’s answer to the question: “How do we know this is safe to deploy?”

This is not a checklist that a human reviews. It is a set of automated gates - executable validations built into the pipeline - that every change must pass. If the pipeline is green, the artifact is deployable. If the pipeline is red, it is not. There is no ambiguity, no judgment call, and no “looks good enough.”

Why It Matters for CD Migration

Without a clear, automated deployable definition, teams rely on human judgment to decide when something is ready to ship. This creates bottlenecks (waiting for approval), variance (different people apply different standards), and fear (nobody is confident the change is safe). All three are enemies of continuous delivery.

During a CD migration, the deployable definition replaces manual approval processes with automated confidence. It is what allows a team to say “any green build can go to production” - which is the prerequisite for continuous deployment.

Key Principles

The definition must be automated

Every criterion in the deployable definition is enforced by an automated check in the pipeline. If a requirement cannot be automated, either find a way to automate it or question whether it belongs in the deployment path.

The definition must be comprehensive

The deployable definition should cover all dimensions of quality that matter for production readiness:

Security

  • Static Application Security Testing (SAST) - scan source code for known vulnerability patterns
  • Dependency vulnerability scanning - check all dependencies against known vulnerability databases (CVE lists)
  • Secret detection - verify that no credentials, API keys, or tokens are present in the codebase
  • Container image scanning - if deploying containers, scan images for known vulnerabilities
  • License compliance - verify that dependency licenses are compatible with your distribution requirements

Functionality

  • Unit tests - fast, isolated tests that verify individual components behave correctly
  • Integration tests - tests that verify components work together correctly
  • End-to-end tests - tests that verify the system works from the user’s perspective
  • Regression tests - tests that verify previously fixed defects have not reappeared
  • Contract tests - tests that verify APIs conform to their published contracts

Compliance

  • Audit trail - the pipeline itself produces the compliance artifact: who changed what, when, and what validations it passed
  • Policy as code - organizational policies (e.g., “no deployments on Friday”) encoded as pipeline logic
  • Change documentation - automatically generated from commit metadata and pipeline results

Performance

  • Performance benchmarks - verify that key operations complete within acceptable thresholds
  • Load test baselines - verify that the system handles expected load without degradation
  • Resource utilization checks - verify that the change does not introduce memory leaks or excessive CPU usage

Reliability

  • Health check validation - verify that the application starts up correctly and responds to health checks
  • Graceful degradation tests - verify that the system behaves acceptably when dependencies fail
  • Rollback verification - verify that the deployment can be rolled back (see Rollback)

Code Quality

  • Linting and static analysis - enforce code style and detect common errors
  • Code coverage thresholds - not as a target, but as a safety net to detect large untested areas
  • Complexity metrics - flag code that exceeds complexity thresholds for review

The definition must be fast

A deployable definition that takes hours to evaluate will not support continuous delivery. The entire pipeline - including all deployable definition checks - should complete in minutes, not hours. This often requires running checks in parallel, investing in test infrastructure, and making hard choices about which slow checks provide enough value to keep.

The definition must be maintained

The deployable definition is a living document. As the system evolves, new failure modes emerge, and the definition should be updated to catch them. When a production incident occurs, the team should ask: “What automated check could have caught this?” and add it to the definition.

Anti-Patterns

Manual approval gates

Requiring a human to review and approve a deployment after the pipeline has passed all automated checks is an anti-pattern. It adds latency, creates bottlenecks, and implies that the automated checks are not sufficient. If a human must approve, it means your automated definition is incomplete - fix the definition rather than adding a manual gate.

“Good enough” tolerance

Allowing deployments when some checks fail because “that test always fails” or “it is only a warning” degrades the deployable definition to meaninglessness. Either the check matters and must pass, or it does not matter and should be removed.

Post-deployment validation only

Running validation only after deployment to production (production smoke tests, manual QA in production) means you are using production users to find problems. Pre-deployment validation must be comprehensive enough that post-deployment checks are a safety net, not the primary quality gate.

Inconsistent definitions across teams

When different teams have different deployable definitions, organizational confidence in deployment varies. While the specific checks may differ by service, the categories of validation (security, functionality, performance, compliance) should be consistent.

Good Patterns

Pipeline gates as policy

Encode the deployable definition as pipeline stages that block progression. A change cannot move from build to test, or from test to deployment, unless the preceding stage passes completely. The pipeline enforces the definition; no human override is possible.

Shift-left validation

Run the fastest, most frequently failing checks first. Unit tests and linting run before integration tests. Integration tests run before end-to-end tests. Security scans run in parallel with test stages. This gives developers the fastest possible feedback.

Continuous definition improvement

After every production incident, add or improve a check in the deployable definition that would have caught the issue. Over time, the definition becomes a comprehensive record of everything the team has learned about quality.
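
For example, if an incident was caused by an empty cart crashing checkout, the team might add a regression test like the hypothetical one below so the pipeline catches that failure mode from then on:

// checkout-incident.regression.spec.js - hypothetical example tied to a past incident
const assert = require('node:assert')
const { test } = require('node:test')

// Stand-in for the real checkout module; in practice this is imported from the codebase
function calculateTotal(items) {
  return items.reduce((sum, item) => sum + item.priceCents, 0)
}

test('regression: empty cart must not break checkout', () => {
  // Before the fix, an empty cart threw an error and took checkout down
  assert.strictEqual(calculateTotal([]), 0)
})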

Progressive quality gates

Structure the pipeline to fail fast on quick checks, then run progressively more expensive validations. This gives developers the fastest possible feedback while still running comprehensive checks:

Stage 1: Fast Feedback (< 5 min)
  - Linting
  - Unit tests
  - Security scan

Stage 2: Integration (< 15 min)
  - Integration tests
  - Database migrations
  - API contract tests

Stage 3: Comprehensive (< 30 min)
  - E2E tests
  - Performance tests
  - Compliance checks

Each stage acts as a gate. If Stage 1 fails, the pipeline stops immediately rather than wasting time on slower checks that will not matter.

Context-specific definitions

While the categories of validation should be consistent across the organization, the specific checks may vary by deployment target. Define a base set of checks that always apply, then layer additional checks for higher-risk environments:

# Base definition (always required)
base_deployable:
  - unit_tests: pass
  - security_scan: pass
  - code_coverage: >= 80%

# Production-specific (additional requirements)
production_deployable:
  - load_tests: pass
  - disaster_recovery_tested: true
  - runbook_updated: true

# Feature branch (relaxed for experimentation)
feature_deployable:
  - unit_tests: pass
  - security_scan: no_critical

This approach lets teams move fast during development while maintaining rigorous standards for production deployments.

Error budget approach

Use error budgets to connect the deployable definition to production reliability. When the service is within its error budget, the pipeline allows normal deployment. When the error budget is exhausted, the pipeline shifts focus to reliability work:

definition_of_deployable:
  error_budget_remaining: > 0
  slo_compliance: >= 99.9%
  recent_incidents: < 2 per week

This creates a self-correcting system. Teams that ship changes causing incidents consume their error budget, which automatically tightens the deployment criteria until reliability improves.

Visible, shared definitions

Make the deployable definition visible to all team members. Display the current pipeline status on dashboards. When a check fails, provide clear, actionable feedback about what failed and why. The definition should be understood by everyone, not hidden in pipeline configuration.

How to Get Started

Step 1: Document your current “definition of done”

Write down every check that currently happens before a deployment - automated or manual. Include formal checks (tests, scans) and informal ones (someone eyeballs the logs, someone clicks through the UI).

Step 2: Classify each check

For each check, determine: Is it automated? Is it fast? Is it reliable? Is it actually catching real problems? This reveals which checks are already pipeline-ready and which need work.

Step 3: Automate the manual checks

For every manual check, determine how to automate it. A human clicking through the UI becomes an end-to-end test. A human reviewing logs becomes an automated log analysis step. A manager approving a deployment becomes a set of automated policy checks.
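
For instance, a tester's manual click-through of the checkout flow can become a scripted browser test. The sketch below assumes Playwright is available; the URL, selectors, and flow are placeholders:

// e2e/checkout.spec.js - sketch assuming @playwright/test; URL and selectors are placeholders
const { test, expect } = require('@playwright/test')

test('user can complete checkout', async ({ page }) => {
  await page.goto('https://staging.example.com') // placeholder environment URL
  await page.getByRole('button', { name: 'Add to cart' }).click()
  await page.getByRole('link', { name: 'Checkout' }).click()
  await expect(page.getByText('Order confirmed')).toBeVisible()
})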

Step 4: Build the pipeline gates

Organize your automated checks into pipeline stages. Fast checks first, slower checks later. All checks must pass for the artifact to be considered deployable.

Step 5: Remove manual approvals

Once the automated definition is comprehensive enough that a green build genuinely means “safe to deploy,” remove manual approval gates. This is often the most culturally challenging step.

Connection to the Pipeline Phase

The deployable definition is the contract between the pipeline and the organization. It is what makes the single path to production trustworthy - because every change that passes through the path has been validated against a clear, comprehensive standard.

Combined with a deterministic pipeline, the deployable definition ensures that green means green and red means red. Combined with immutable artifacts, it ensures that the artifact you validated is the artifact you deploy. It is the bridge between automated process and organizational confidence.

Health Metrics

Track these metrics to evaluate whether your deployable definition is well-calibrated:

  • Pipeline pass rate - should be 70-90%. Too high suggests tests are too lax and not catching real problems. Too low suggests tests are too strict or too flaky, causing unnecessary rework.
  • Pipeline execution time - should be under 30 minutes for full validation. Longer pipelines slow feedback and discourage frequent commits.
  • Production incident rate - should decrease over time as the definition improves and catches more failure modes before deployment.
  • Manual override rate - should be near zero. Frequent manual overrides indicate the automated definition is incomplete or that the team does not trust it.

FAQ

Who decides what goes in the deployable definition?

The entire team - developers, QA, operations, security, and product - should collaboratively define these standards. The definition should reflect genuine risks and requirements, not arbitrary bureaucracy. If a check does not prevent a real production problem, question whether it belongs.

What if the pipeline passes but a bug reaches production?

This indicates a gap in the deployable definition. Add a test that catches that class of failure in the future. Over time, every production incident should result in a stronger definition. This is how the definition becomes a comprehensive record of everything the team has learned about quality.

Can we skip pipeline checks for urgent hotfixes?

No. If the pipeline cannot validate a hotfix quickly enough, the problem is with the pipeline, not the process. Fix the pipeline speed rather than bypassing quality checks. Bypassing checks for “urgent” changes is how critical bugs compound in production.

How strict should the definition be?

Strict enough to prevent production incidents, but not so strict that it becomes a bottleneck. If the pipeline rejects 90% of commits, standards may be too rigid or tests may be too flaky. If production incidents are frequent, standards are too lax. Use the health metrics above to calibrate.

Should manual testing be part of the definition?

Manual exploratory testing is valuable for discovering edge cases, but it should inform the definition, not be the definition. When manual testing discovers a defect, automate a test for that failure mode. Over time, manual testing shifts from gatekeeping to exploration.

What about requirements that cannot be tested automatically?

Some requirements - like UX quality or nuanced accessibility - are harder to automate fully. For these:

  1. Automate what you can (accessibility scanners, visual regression tests)
  2. Make remaining manual checks lightweight and concurrent, not deployment blockers
  3. Continuously work to automate more as tooling improves

Related Practices

  • Hardening Sprints - a symptom indicating the deployable definition is incomplete, forcing manual quality efforts before release
  • Infrequent Releases - often caused by unclear or manual criteria for what is ready to ship
  • Manual Deployments - an anti-pattern that automated quality gates in the deployable definition replace
  • Deterministic Pipeline - the Pipeline practice that ensures deployable definition checks produce reliable results
  • Change Fail Rate - a key metric that improves as the deployable definition becomes more comprehensive
  • Testing Fundamentals - the Foundations practice that provides the test suite enforced by the deployable definition

4 - Immutable Artifacts

Build once, deploy everywhere. The same artifact is used in every environment.

Definition

An immutable artifact is a build output that is created exactly once and deployed to every environment without modification. The binary, container image, or package that runs in production is byte-for-byte identical to the one that passed through testing. Nothing is recompiled, repackaged, or altered between environments.

“Build once, deploy everywhere” is the core principle. The artifact is sealed at build time. Configuration is injected at deployment time (see Application Configuration), but the artifact itself never changes.

Why It Matters for CD Migration

If you build a separate artifact for each environment - or worse, make manual adjustments to artifacts at deployment time - you can never be certain that what you tested is what you deployed. Every rebuild introduces the possibility of variance: a different dependency resolved, a different compiler flag applied, a different snapshot of the source.

Immutable artifacts eliminate an entire class of “works in staging, fails in production” problems. They provide confidence that the pipeline results are real: the artifact that passed every quality gate is the exact artifact running in production.

For teams migrating to CD, this practice is a concrete, mechanical step that delivers immediate trust. Once the team sees that the same container image flows from CI to staging to production, the deployment process becomes verifiable instead of hopeful.

Key Principles

Build once

The artifact is produced exactly once, during the build stage of the pipeline. It is stored in an artifact repository (such as a container registry, Maven repository, npm registry, or object store) and every subsequent stage of the pipeline - and every environment - pulls and deploys that same artifact.

No manual adjustments

Artifacts are never modified after creation. This means:

  • No recompilation for different environments
  • No patching binaries in staging to fix a test failure
  • No adding environment-specific files into a container image after the build
  • No editing properties files inside a deployed artifact

Version everything that goes into the build

Because the artifact is built once and cannot be changed, every input must be correct at build time:

  • Source code - committed to version control at a specific commit hash
  • Dependencies - locked to exact versions via lockfiles
  • Build tools - pinned to specific versions
  • Build configuration - stored in version control alongside the source

Tag and trace

Every artifact must be traceable back to the exact commit, pipeline run, and set of inputs that produced it. Use content-addressable identifiers (such as container image digests), semantic version tags, or build metadata that links the artifact to its source.

Anti-Patterns

Rebuilding per environment

Building the artifact separately for development, staging, and production - even from the same source - means each artifact is a different build. Different builds can produce different results due to non-deterministic build processes, updated dependencies, or changed build environments.

SNAPSHOT or mutable versions

Using version identifiers like -SNAPSHOT (Maven), latest (container images), or unversioned “current” references means the same version label can point to different artifacts at different times. This makes it impossible to know exactly what is deployed. This applies to both the artifacts you produce and the dependencies you consume. A dependency pinned to a -SNAPSHOT version can change underneath you between builds, silently altering your artifact’s behavior without any version change. Version numbers are cheap - assign a new one for every meaningful change rather than reusing a mutable label.

Manual intervention at failure points

When a deployment fails, the fix must go through the pipeline. Manually patching the artifact, restarting with modified configuration, or applying a hotfix directly to the running system breaks immutability and bypasses the quality gates.

Environment-specific builds

Build scripts that use conditionals like “if production, include X” create environment-coupled artifacts. The artifact should be environment-agnostic; environment configuration handles the differences.

Artifacts that self-modify

Applications that write to their own deployment directory, modify their own configuration files at runtime, or store state alongside the application binary are not truly immutable. Runtime state must be stored externally.

Good Patterns

Container images as immutable artifacts

Container images are an excellent vehicle for immutable artifacts. A container image built in CI, pushed to a registry with a content-addressable digest, and pulled into each environment is inherently immutable. The image that ran in staging is provably identical to the image running in production.

Artifact promotion

Instead of rebuilding for each environment, promote the same artifact through environments. The pipeline builds the artifact once, deploys it to a test environment, validates it, then promotes it (deploys the same artifact) to staging, then production. The artifact never changes; only the environment it runs in changes.

Content-addressable storage

Use content-addressable identifiers (SHA-256 digests, content hashes) rather than mutable tags as the primary artifact reference. A content-addressed artifact is immutable by definition: changing any byte changes the address.

Signed artifacts

Digitally sign artifacts at build time and verify the signature before deployment. This guarantees that the artifact has not been tampered with between the build and the deployment. This is especially important for supply chain security.

Reproducible builds

Strive for builds where the same source input produces a bit-for-bit identical artifact. While not always achievable (timestamps, non-deterministic linkers), getting close makes it possible to verify that an artifact was produced from its claimed source.

How to Get Started

Step 1: Separate build from deployment

If your pipeline currently rebuilds for each environment, restructure it into two distinct phases: a build phase that produces a single artifact, and a deployment phase that takes that artifact and deploys it to a target environment with the appropriate configuration.

Step 2: Set up an artifact repository

Choose an artifact repository appropriate for your technology stack - a container registry for container images, a package registry for libraries, or an object store for compiled binaries. All downstream pipeline stages pull from this repository.

Step 3: Eliminate mutable version references

Replace latest tags, -SNAPSHOT versions, and any other mutable version identifier with immutable references. Use commit-hash-based tags, semantic versions, or content-addressable digests.

Step 4: Implement artifact promotion

Modify your pipeline to deploy the same artifact to each environment in sequence. The pipeline should pull the artifact from the repository by its immutable identifier and deploy it without modification.

Step 5: Add traceability

Ensure every deployed artifact can be traced back to its source commit, build log, and pipeline run. Label container images with build metadata. Store build provenance alongside the artifact in the repository.

Step 6: Verify immutability

Periodically verify that what is running in production matches what the pipeline built. Compare image digests, checksums, or signatures. This catches any manual modifications that may have bypassed the pipeline.
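
As an illustration, a scheduled verification job could recompute the artifact's digest and compare it with the digest recorded by the pipeline; the file path and the environment variable holding the expected digest are assumptions:

// verify-artifact.js - illustrative; artifact path and expected-digest source are assumptions
const crypto = require('node:crypto')
const fs = require('node:fs')

function sha256(filePath) {
  return crypto.createHash('sha256').update(fs.readFileSync(filePath)).digest('hex')
}

const expected = process.env.EXPECTED_ARTIFACT_DIGEST // recorded by the pipeline at build time
const actual = sha256('./artifact/app.tar.gz')

if (actual !== expected) {
  throw new Error(`Artifact digest mismatch: expected ${expected}, got ${actual}`)
}
console.log('Running artifact matches the pipeline build')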

Connection to the Pipeline Phase

Immutable artifacts are the physical manifestation of trust in the pipeline. The single path to production ensures all changes flow through the pipeline. The deterministic pipeline ensures the build is repeatable. The deployable definition ensures the artifact meets quality criteria. Immutability ensures that the validated artifact - and only that artifact - reaches production.

This practice also directly supports rollback: because previous artifacts are stored unchanged in the artifact repository, rolling back is simply deploying a previous known-good artifact.

Related Practices

  • Staging Passes, Production Fails - a symptom eliminated when the same artifact is deployed to every environment
  • Snowflake Environments - an anti-pattern that undermines artifact immutability through environment-specific builds
  • Application Configuration - the Pipeline practice that enables immutability by externalizing environment-specific values
  • Deterministic Pipeline - the Pipeline practice that ensures the build process itself is repeatable
  • Rollback - the Pipeline practice that relies on stored immutable artifacts for fast recovery
  • Change Fail Rate - a metric that improves when validated artifacts are deployed without modification

5 - Application Configuration

Separate configuration from code so the same artifact works in every environment.

Definition

Application configuration is the practice of correctly separating what varies between environments from what does not, so that a single immutable artifact can run in any environment. This distinction - drawn from the Twelve-Factor App methodology - is essential for continuous delivery.

There are two distinct types of configuration:

  • Application config - settings that define how the application behaves, are the same in every environment, and should be bundled with the artifact. Examples: routing rules, feature flag defaults, serialization formats, timeout policies, retry strategies.

  • Environment config - settings that vary by deployment target and must be injected at deployment time. Examples: database connection strings, API endpoint URLs, credentials, resource limits, logging levels for that environment.

Getting this distinction right is critical. Bundling environment config into the artifact breaks immutability. Externalizing application config that does not vary creates unnecessary complexity and fragility.

Why It Matters for CD Migration

Configuration is where many CD migrations stall. Teams that have been deploying manually often have configuration tangled with code - hardcoded URLs, environment-specific build profiles, configuration files that are manually edited during deployment. Untangling this is a prerequisite for immutable artifacts and automated deployments.

When configuration is handled correctly, the same artifact flows through every environment without modification, environment-specific values are injected at deployment time, and feature behavior can be changed without redeploying. This enables the deployment speed and safety that continuous delivery requires.

Key Principles

Bundle what does not vary

Application configuration that is identical across all environments belongs inside the artifact. This includes:

  • Default feature flag values - the static, compile-time defaults for feature flags
  • Application routing and mapping rules - URL patterns, API route definitions
  • Serialization and encoding settings - JSON configuration, character encoding
  • Internal timeout and retry policies - backoff strategies, circuit breaker thresholds
  • Validation rules - input validation constraints, business rule parameters

These values are part of the application’s behavior definition. They should be version controlled with the source code and deployed as part of the artifact.

Externalize what varies

Environment configuration that changes between deployment targets must be injected at deployment time:

  • Database connection strings - different databases for test, staging, production
  • External service URLs - different endpoints for downstream dependencies
  • Credentials and secrets - always injected, never bundled, never in version control
  • Resource limits - memory, CPU, connection pool sizes tuned per environment
  • Environment-specific logging levels - verbose in development, structured in production
  • Feature flag overrides - dynamic flag values managed by an external flag service

Feature flags: static vs. dynamic

Feature flags deserve special attention because they span both categories:

  • Static feature flags - compiled into the artifact as default values. They define the initial state of a feature when the application starts. Changing them requires a new build and deployment.

  • Dynamic feature flags - read from an external service at runtime. They can be toggled without deploying. Use these for operational toggles (kill switches, gradual rollouts) and experiment flags (A/B tests).

A well-designed feature flag system uses static defaults (bundled in the artifact) that can be overridden by a dynamic source (external flag service). If the flag service is unavailable, the application falls back to its static defaults - a safe, predictable behavior.
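
A minimal sketch of that layering; the flag names and the FLAG_SERVICE_URL environment variable are assumptions:

// feature-flags.js - illustrative sketch; flag names and FLAG_SERVICE_URL are assumptions
const DEFAULT_FLAGS = {
  newCheckout: false, // static default, compiled into the artifact
  darkMode: true,
}

async function loadFlags() {
  try {
    const response = await fetch(process.env.FLAG_SERVICE_URL) // dynamic source, outside the artifact
    const remoteFlags = await response.json()
    return { ...DEFAULT_FLAGS, ...remoteFlags } // dynamic values override the bundled defaults
  } catch (err) {
    // Flag service unreachable: fall back to the safe, predictable static defaults
    return { ...DEFAULT_FLAGS }
  }
}

module.exports = { loadFlags, DEFAULT_FLAGS }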

Anti-Patterns

Hardcoded environment-specific values

Database URLs, API endpoints, or credentials embedded directly in source code or configuration files that are baked into the artifact. This forces a different build per environment and makes secrets visible in version control.

Externalizing everything

Moving all configuration to an external service - including values that never change between environments - creates unnecessary runtime dependencies. If the configuration service is down and a value that is identical in every environment cannot be read, the application fails to start for no good reason.

Environment-specific build profiles

Build systems that use profiles like mvn package -P production or Webpack configurations that toggle behavior based on NODE_ENV at build time create environment-coupled artifacts. The artifact must be the same regardless of where it will run.

Configuration files edited during deployment

Manually editing application.properties, .env files, or YAML configurations on the server during or after deployment is error-prone, unrepeatable, and invisible to the pipeline. All configuration injection must be automated.

Secrets in version control

Credentials, API keys, certificates, and tokens must never be stored in version control - not even in “private” repositories, not even encrypted with simple mechanisms. Use a secrets manager (Vault, AWS Secrets Manager, Azure Key Vault) and inject secrets at deployment time.

Good Patterns

Environment variables for environment config

Following the Twelve-Factor App approach, inject environment-specific values as environment variables. This is universally supported across languages and platforms, works with containers and orchestrators, and keeps the artifact clean.

Layered configuration

Use a configuration framework that supports layering:

  1. Defaults - bundled in the artifact (application config)
  2. Environment overrides - injected via environment variables or mounted config files
  3. Dynamic overrides - read from a feature flag service or configuration service at runtime

Each layer overrides the previous one. The application always has a working default, and environment-specific or dynamic values override only what needs to change.
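
A sketch of this layering is simply merging three sources in order of precedence; the property names and the dynamic source are illustrative.

// Layer 1: defaults bundled in the artifact (application config)
const defaults = { pageSize: 50, retryLimit: 3, logLevel: 'info' }

// Layer 2: environment overrides injected at deployment time
const envOverrides = {
  ...(process.env.LOG_LEVEL ? { logLevel: process.env.LOG_LEVEL } : {}),
}

// Layer 3: dynamic overrides read at runtime (e.g. from a flag or config service)
function resolveConfig(dynamicOverrides: Partial<typeof defaults> = {}) {
  // Later layers override earlier ones; defaults always provide a working value
  return { ...defaults, ...envOverrides, ...dynamicOverrides }
}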

Config maps and secrets in orchestrators

Kubernetes ConfigMaps and Secrets (or equivalent mechanisms in other orchestrators) provide a clean separation between the artifact (the container image) and the environment-specific configuration. The image is immutable; the configuration is injected at pod startup.

Secrets management with rotation

Use a dedicated secrets manager that supports automatic rotation, audit logging, and fine-grained access control. The application retrieves secrets at startup or on-demand, and the secrets manager handles rotation without requiring redeployment.
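
For example, with AWS Secrets Manager the application can fetch a credential at startup or on a refresh interval instead of bundling it; a hedged sketch using the AWS SDK v3, with a placeholder secret name:

import { SecretsManagerClient, GetSecretValueCommand } from '@aws-sdk/client-secrets-manager'

// Retrieve a database credential at runtime; rotation happens in the
// secrets manager, not through a redeployment of the application.
async function getDbPassword(): Promise<string> {
  const client = new SecretsManagerClient({})
  const result = await client.send(
    new GetSecretValueCommand({ SecretId: 'prod/orders-service/db-password' }),
  )
  if (!result.SecretString) throw new Error('secret has no string value')
  return result.SecretString
}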

Configuration validation at startup

The application should validate its configuration at startup and fail fast with a clear error message if required configuration is missing or invalid. This catches configuration errors immediately rather than allowing the application to start in a broken state.
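
A minimal fail-fast check might look like the following sketch; the variable names are examples only.

// Validate required environment configuration before the app starts serving
const required = ['DATABASE_URL', 'PAYMENTS_API_URL', 'LOG_LEVEL']

export function validateConfig(env: NodeJS.ProcessEnv = process.env): void {
  const missing = required.filter((name) => !env[name])
  if (missing.length > 0) {
    // Fail fast with an actionable message instead of starting in a broken state
    throw new Error(`Missing required configuration: ${missing.join(', ')}`)
  }
}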

How to Get Started

Step 1: Inventory your configuration

List every configuration value your application uses. For each one, determine: Does this value change between environments? If yes, it is environment config. If no, it is application config.

Step 2: Move environment config out of the artifact

For every environment-specific value currently bundled in the artifact (hardcoded URLs, build profiles, environment-specific property files), extract it and inject it via environment variable, config map, or secrets manager.

Step 3: Bundle application config with the code

For every value that does not vary between environments, ensure it is committed to version control alongside the source code and included in the artifact at build time. Remove it from any external configuration system where it adds unnecessary complexity.

Step 4: Implement feature flags properly

Set up a feature flag framework with static defaults in the code and an external flag service for dynamic overrides. Ensure the application degrades gracefully if the flag service is unavailable.

Step 5: Remove environment-specific build profiles

Eliminate any build-time branching based on target environment. The build produces one artifact. Period.

Step 6: Automate configuration injection

Ensure that configuration injection is fully automated in the deployment pipeline. No human should manually set environment variables or edit configuration files during deployment.

Common Questions

How do I change application config for a specific environment?

You should not need to. If a value needs to vary by environment, it is environment configuration and should be injected via environment variables or a secrets manager. Application configuration is the same everywhere by definition.

What if I need to hotfix a config value in production?

If it is truly application configuration, make the change in code, commit it, let the pipeline validate it, and deploy the new artifact. Hotfixing config outside the pipeline defeats the purpose of immutable artifacts.

What about config that changes frequently?

If a value changes frequently enough that redeploying is impractical, it might be data, not configuration. Consider whether it belongs in a database or content management system instead. Configuration should be relatively stable - it defines how the application behaves, not what content it serves.

Measuring Progress

Track these metrics to confirm that configuration is being handled correctly:

  • Configuration drift incidents - should drop to zero once application config is bundled immutably in the artifact
  • Config-related rollbacks - track how often configuration changes cause production rollbacks; this should decrease steadily
  • Time from config commit to production - should match your normal deployment cycle time, confirming that config changes flow through the same pipeline as code changes

Connection to the Pipeline Phase

Application configuration is the enabler that makes immutable artifacts practical. An artifact can only be truly immutable if it does not contain environment-specific values that would need to change between deployments.

Correct configuration separation also supports production-like environments - because the same artifact runs everywhere, the only difference between environments is the injected configuration, which is itself version controlled and automated.

When configuration is externalized correctly, rollback becomes straightforward: deploy the previous artifact with the appropriate configuration, and the system returns to its prior state.

6 - Production-Like Environments

Test in environments that match production to catch environment-specific issues early.

Phase 2 - Pipeline

Definition

Production-like environments are pre-production environments that mirror the infrastructure, configuration, and behavior of production closely enough that passing tests in these environments provides genuine confidence that the change will work in production.

“Production-like” does not mean “identical to production” in every dimension. It means that the aspects of the environment relevant to the tests being run match production sufficiently to produce a valid signal. A unit test environment needs the right runtime version. An integration test environment needs the right service topology. A staging environment needs the right infrastructure, networking, and data characteristics.

Why It Matters for CD Migration

The gap between pre-production environments and production is where deployment failures hide. Teams that test in environments that differ significantly from production - in operating system, database version, network topology, resource constraints, or configuration - routinely discover issues only after deployment.

For a CD migration, production-like environments are what transform pre-production testing from “we hope this works” to “we know this works.” They close the gap between the pipeline’s quality signal and the reality of production, making it safe to deploy automatically.

Key Principles

Staging reflects production infrastructure

Your staging environment should match production in the dimensions that affect application behavior:

  • Infrastructure platform - same cloud provider, same orchestrator, same service mesh
  • Network topology - same load balancer configuration, same DNS resolution patterns, same firewall rules
  • Database engine and version - same database type, same version, same configuration parameters
  • Operating system and runtime - same OS distribution, same runtime version, same system libraries
  • Service dependencies - same versions of downstream services, or accurate test doubles

Staging does not necessarily need the same scale as production (fewer replicas, smaller instances), but the architecture must be the same.

Environments are version controlled

Every aspect of the environment that can be defined in code must be:

  • Infrastructure definitions - Terraform, CloudFormation, Pulumi, or equivalent
  • Configuration - Kubernetes manifests, Helm charts, Ansible playbooks
  • Network policies - security groups, firewall rules, service mesh configuration
  • Monitoring and alerting - the same observability configuration in all environments

Version-controlled environments can be reproduced, compared, and audited. Manual environment configuration cannot.

Ephemeral environments

Ephemeral environments are full-stack, on-demand, short-lived environments spun up for a specific purpose - a pull request, a test run, a demo - and destroyed when that purpose is complete.

Key characteristics of ephemeral environments:

  • Full-stack - they include the application and all of its dependencies (databases, message queues, caches, downstream services), not just the application in isolation
  • On-demand - any developer or pipeline can spin one up at any time without waiting for a shared resource
  • Short-lived - they exist for hours or days, not weeks or months. This prevents configuration drift and stale state
  • Version controlled - the environment definition is in code, and the environment is created from a specific version of that code
  • Isolated - they do not share resources with other environments. No shared databases, no shared queues, no shared service instances

Ephemeral environments replace long-lived “static” environments - “development,” “QA1,” “QA2,” “testing” - along with the maintenance burden of keeping them stable. They also eliminate the “shared staging” bottleneck, where multiple teams compete for a single pre-production environment and block each other’s progress.

Data is representative

The data in pre-production environments must be representative of production data in structure, volume, and characteristics. This does not mean using production data directly (which raises security and privacy concerns). It means:

  • Schema matches production - same tables, same columns, same constraints
  • Volume is realistic - tests run against data sets large enough to reveal performance issues
  • Data characteristics are representative - edge cases, special characters, null values, and data distributions that match what the application will encounter
  • Data is anonymized - if production data is used as a seed, all personally identifiable information is removed or masked

Anti-Patterns

Shared, long-lived staging environments

A single staging environment shared by multiple teams becomes a bottleneck and a source of conflicts. Teams overwrite each other’s changes, queue up for access, and encounter failures caused by other teams’ work. Long-lived environments also drift from production as manual changes accumulate.

Environments that differ from production in critical ways

Running a different database version in staging than production, using a different operating system, or skipping the load balancer that exists in production creates blind spots where issues hide until they reach production.

“It works on my laptop” as validation

Developer laptops are the least production-like environment available. They have different operating systems, different resource constraints, different network characteristics, and different installed software. Local validation is valuable for fast feedback during development, but it does not replace testing in a production-like environment.

Manual environment provisioning

Environments created by manually clicking through cloud consoles, running ad-hoc scripts, or following runbooks are unreproducible and drift over time. If you cannot destroy and recreate the environment from code in minutes, it is not suitable for continuous delivery.

Synthetic-only test data

Using only hand-crafted test data with a few happy-path records misses the issues that emerge with production-scale data: slow queries, missing indexes, encoding problems, and edge cases that only appear in real-world data distributions.

Good Patterns

Infrastructure as Code for all environments

Define every environment - from local development to production - using the same Infrastructure as Code templates. The differences between environments are captured in configuration variables (instance sizes, replica counts, domain names), not in different templates.
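
For instance, with Pulumi (one of the IaC tools named above) the same TypeScript program can serve every environment, with only per-stack configuration differing - e.g. Pulumi.staging.yaml versus Pulumi.production.yaml. This is a hedged sketch, not a complete stack definition.

import * as pulumi from '@pulumi/pulumi'
import * as aws from '@pulumi/aws'

// One template for all environments; differences live in per-stack config values
const config = new pulumi.Config()
const instanceType = config.require('instanceType')     // e.g. t3.small vs m5.large
const instanceCount = config.requireNumber('instanceCount')

const servers = []
for (let i = 0; i < instanceCount; i++) {
  servers.push(new aws.ec2.Instance(`app-server-${i}`, {
    ami: config.require('amiId'),
    instanceType,
  }))
}

export const serverIds = servers.map((s) => s.id)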

Environment-per-pull-request

Automatically provision a full-stack ephemeral environment for every pull request. Run the full test suite against this environment. Tear it down when the pull request is merged or closed. This provides isolated, production-like validation for every change.

Production data sampling and anonymization

Build an automated pipeline that samples production data, anonymizes it (removing PII, masking sensitive fields), and loads it into pre-production environments. This provides realistic data without security or privacy risks.
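
A sketch of the anonymization step, masking PII fields in sampled rows before they are loaded into a pre-production database; the record shape and field names are examples.

interface UserRow {
  id: number
  email: string
  fullName: string
  phone: string
  createdAt: string   // non-sensitive fields pass through unchanged
}

// Replace identifying values with deterministic placeholders so that
// structure, volume, and distributions are preserved but no PII leaks.
function anonymize(row: UserRow): UserRow {
  return {
    ...row,
    email: `user${row.id}@example.test`,
    fullName: `User ${row.id}`,
    phone: '000-000-0000',
  }
}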

Service virtualization for external dependencies

For external dependencies that cannot be replicated in pre-production (third-party APIs, partner systems), use service virtualization to create realistic test doubles that mimic the behavior, latency, and error modes of the real service.

Environment parity monitoring

Continuously compare pre-production environments against production to detect drift. Alert when the infrastructure, configuration, or service versions diverge. Tools that compare Terraform state, Kubernetes configurations, or cloud resource inventories can automate this comparison.
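
One simple form of this check is to diff the deployed component versions in staging against production and report anything that diverges; a sketch with hypothetical inventories:

// Component -> version inventories, e.g. collected from cluster or cloud APIs
type Inventory = Record<string, string>

function findDrift(staging: Inventory, production: Inventory): string[] {
  const components = new Set([...Object.keys(staging), ...Object.keys(production)])
  const drift: string[] = []
  for (const name of components) {
    if (staging[name] !== production[name]) {
      drift.push(`${name}: staging=${staging[name] ?? 'missing'} production=${production[name] ?? 'missing'}`)
    }
  }
  return drift // a non-empty result should trigger an alert
}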

Namespaced environments in shared clusters

In Kubernetes or similar platforms, use namespaces to create isolated environments within a shared cluster. Each namespace gets its own set of services, databases, and configuration, providing isolation without the cost of separate clusters.

How to Get Started

Step 1: Audit environment parity

Compare your current pre-production environments against production across every relevant dimension: infrastructure, configuration, data, service versions, network topology. List every difference.

Step 2: Infrastructure-as-Code your environments

If your environments are not yet defined in code, start here. Define your production environment in Terraform, CloudFormation, or equivalent. Then create pre-production environments from the same definitions with different parameter values.

Step 3: Address the highest-risk parity gaps

From your audit, identify the differences most likely to cause production failures - typically database version mismatches, missing infrastructure components, or network configuration differences. Fix these first.

Step 4: Implement ephemeral environments

Build the tooling to spin up and tear down full-stack environments on demand. Start with a simplified version (perhaps without full data replication) and iterate toward full production parity.

Step 5: Automate data provisioning

Create an automated pipeline for generating or sampling representative test data. Include anonymization, schema validation, and data refresh on a regular schedule.

Step 6: Monitor and maintain parity

Set up automated checks that compare pre-production environments to production and alert on drift. Make parity a continuous concern, not a one-time setup.

Connection to the Pipeline Phase

Production-like environments are where the pipeline’s quality gates run. Without production-like environments, the deployable definition produces a false signal - tests pass in an environment that does not resemble production, and failures appear only after deployment.

Immutable artifacts flow through these environments unchanged, with only configuration varying. This combination - same artifact, production-like environment, environment-specific configuration - is what gives the pipeline its predictive power.

Production-like environments also support effective rollback testing: you can validate that a rollback works correctly in a staging environment before relying on it in production.

7 - Pipeline Architecture

Design efficient quality gates for your delivery system’s context.

Phase 2 - Pipeline | Adapted from Dojo Consortium

Definition

Pipeline architecture is the structural design of your delivery pipeline - how stages are organized, how quality gates are sequenced, how feedback loops operate, and how the pipeline evolves over time. It encompasses both the technical design of the pipeline and the improvement journey that a team follows from an initial, fragile pipeline to a mature, resilient delivery system.

Good pipeline architecture is not achieved in a single step. Teams progress through recognizable states, applying the Theory of Constraints to systematically identify and resolve bottlenecks. The goal is a loosely coupled architecture where independent services can be built, tested, and deployed independently through their own pipelines.

Why It Matters for CD Migration

Most teams beginning a CD migration have a pipeline that is somewhere between “barely functional” and “works most of the time.” The pipeline may be slow, fragile, or tightly coupled to other systems. Improving it requires a deliberate architectural approach - not just adding more stages or more tests, but designing the pipeline for the flow characteristics that continuous delivery demands.

Understanding where your pipeline architecture currently stands, and what the next improvement looks like, prevents teams from either stalling at a “good enough” state or attempting to jump directly to a target state that their context cannot support.

Three Architecture States

Teams typically progress through three recognizable states on their journey to mature pipeline architecture. Understanding which state you are in determines what improvements to prioritize.

Entangled (Requires Remediation)

In the entangled state, the pipeline has significant structural problems that prevent reliable delivery:

  • Multiple applications share a single pipeline - a change to one application triggers builds and tests for all applications, causing unnecessary delays and false failures
  • Shared, mutable infrastructure - pipeline stages depend on shared databases, shared environments, or shared services that introduce coupling and contention
  • Manual stages interrupt automated flow - manual approval gates, manual test execution, or manual environment provisioning block the pipeline for hours or days
  • No clear ownership - the pipeline is maintained by a central team, and application teams cannot modify it without filing tickets and waiting
  • Build times measured in hours - the pipeline is so slow that developers batch changes and avoid running it
  • Flaky tests are accepted - the team routinely re-runs failed pipelines, and failures are assumed to be transient

Remediation priorities:

  1. Separate pipelines for separate applications
  2. Remove manual stages or parallelize them out of the critical path
  3. Fix or remove flaky tests
  4. Establish clear pipeline ownership with the application team

Tightly Coupled (Transitional)

In the tightly coupled state, each application has its own pipeline, but pipelines depend on each other or on shared resources:

  • Integration tests span multiple services - a pipeline for service A runs integration tests that require service B, C, and D to be deployed in a specific state
  • Shared test environments - multiple pipelines deploy to the same staging environment, creating contention and sequencing constraints
  • Coordinated deployments - deploying service A requires simultaneously deploying service B, which requires coordinating two pipelines
  • Shared build infrastructure - pipelines compete for limited build agent capacity, causing queuing delays
  • Pipeline definitions are centralized - a shared pipeline library controls the structure, and application teams cannot customize it for their needs

Improvement priorities:

  1. Replace cross-service integration tests with contract tests
  2. Implement ephemeral environments to eliminate shared environment contention
  3. Decouple service deployments using backward-compatible changes and feature flags
  4. Give teams ownership of their pipeline definitions
  5. Scale build infrastructure to eliminate queuing

Loosely Coupled (Goal)

In the loosely coupled state, each service has an independent pipeline that can build, test, and deploy without depending on other services’ pipelines:

  • Independent deployability - any service can be deployed at any time without coordinating with other teams
  • Contract-based integration - services verify their interactions through contract tests, not cross-service integration tests
  • Ephemeral, isolated environments - each pipeline creates its own test environment and tears it down when done
  • Team-owned pipelines - each team controls their pipeline definition and can optimize it for their service’s needs
  • Fast feedback - the pipeline completes in minutes, providing rapid feedback to developers
  • Self-service infrastructure - teams provision their own pipeline infrastructure without waiting for a central team

Applying the Theory of Constraints

Pipeline improvement follows the Theory of Constraints: identify the single biggest bottleneck, resolve it, and repeat. The key steps:

Step 1: Identify the constraint

Measure where time is spent in the pipeline. Common constraints include:

  • Slow test suites - tests that take 30+ minutes dominate the pipeline duration
  • Queuing for shared resources - pipelines waiting for build agents, shared environments, or manual approvals
  • Flaky failures and re-runs - time lost to investigating and re-running non-deterministic failures
  • Large batch sizes - pipelines triggered by large, infrequent commits that take longer to build and are harder to debug when they fail

Step 2: Exploit the constraint

Get the maximum throughput from the current constraint without changing the architecture:

  • Parallelize test execution across multiple agents
  • Cache dependencies to speed up the build stage
  • Prioritize pipeline runs (trunk commits before branch builds)
  • Deduplicate unnecessary work (skip unchanged modules)

Step 3: Subordinate everything else to the constraint

Ensure that other parts of the system do not overwhelm the constraint:

  • If the test stage is the bottleneck, do not add more tests without first making existing tests faster
  • If the build stage is the bottleneck, do not add more build steps without first optimizing the build

Step 4: Elevate the constraint

If exploiting the constraint is not sufficient, invest in removing it:

  • Rewrite slow tests to be faster
  • Replace shared environments with ephemeral environments
  • Replace manual gates with automated checks
  • Split monolithic pipelines into independent service pipelines

Step 5: Repeat

Once a constraint is resolved, a new constraint will emerge. This is expected. The pipeline improves through continuous iteration, not through a single redesign.

Key Design Principles

Fast feedback first

Organize pipeline stages so that the fastest checks run first. A developer should know within minutes if their change has an obvious problem (compilation failure, linting error, unit test failure). Slower checks (integration tests, security scans, performance tests) run after the fast checks pass.

Fail fast, fail clearly

When the pipeline fails, it should fail as early as possible and produce a clear, actionable error message. A developer should be able to read the failure output and know exactly what to fix without digging through logs.

Parallelize where possible

Stages that do not depend on each other should run in parallel. Security scans can run alongside integration tests. Linting can run alongside compilation. Parallelization is the most effective way to reduce pipeline duration without removing checks.

Pipeline as code

The pipeline definition lives in the same repository as the application it builds and deploys. This gives the team full ownership and allows the pipeline to evolve alongside the application.

Observability

Instrument the pipeline itself with metrics and monitoring. Track:

  • Lead time - time from commit to production deployment
  • Pipeline duration - time from pipeline start to completion
  • Failure rate - percentage of pipeline runs that fail
  • Recovery time - time from failure detection to successful re-run
  • Queue time - time spent waiting before the pipeline starts

These metrics identify bottlenecks and measure improvement over time.
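
As an illustration, several of these metrics fall directly out of pipeline run records; the record shape below is hypothetical.

interface PipelineRun {
  queuedAt: number     // epoch millis
  startedAt: number
  finishedAt: number
  succeeded: boolean
}

function pipelineMetrics(runs: PipelineRun[]) {
  const avg = (xs: number[]) => xs.reduce((a, b) => a + b, 0) / xs.length
  return {
    // Average wall-clock pipeline duration in minutes
    durationMinutes: avg(runs.map((r) => (r.finishedAt - r.startedAt) / 60000)),
    // Average time spent waiting before the pipeline starts, in minutes
    queueMinutes: avg(runs.map((r) => (r.startedAt - r.queuedAt) / 60000)),
    // Fraction of runs that failed
    failureRate: runs.filter((r) => !r.succeeded).length / runs.length,
  }
}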

Anti-Patterns

The “grand redesign”

Attempting to redesign the entire pipeline at once, rather than iteratively improving the biggest constraint, is a common failure mode. Grand redesigns take too long, introduce too much risk, and often fail to address the actual problems.

Central pipeline teams that own all pipelines

A central team that controls all pipeline definitions creates a bottleneck. Application teams wait for changes, cannot customize pipelines for their context, and are disconnected from their own delivery process.

Optimizing non-constraints

Speeding up a pipeline stage that is not the bottleneck does not improve overall delivery time. Measure before optimizing.

Monolithic pipeline for microservices

Running all microservices through a single pipeline that builds and deploys everything together defeats the purpose of a microservice architecture. Each service should have its own independent pipeline.

How to Get Started

Step 1: Assess your current state

Determine which architecture state - entangled, tightly coupled, or loosely coupled - best describes your current pipeline. Be honest about where you are.

Step 2: Measure your pipeline

Instrument your pipeline to measure duration, failure rates, queue times, and bottlenecks. You cannot improve what you do not measure.

Step 3: Identify the top constraint

Using your measurements, identify the single biggest bottleneck in your pipeline. This is where you focus first.

Step 4: Apply the Theory of Constraints cycle

Exploit, subordinate, and if necessary elevate the constraint. Then measure again and identify the next constraint.

Step 5: Evolve toward loose coupling

With each improvement cycle, move toward independent, team-owned pipelines that can build, test, and deploy services independently. This is a journey of months or years, not days.

Connection to the Pipeline Phase

Pipeline architecture is where all the other practices in this phase come together. The single path to production defines the route. The deterministic pipeline ensures reliability. The deployable definition defines the quality gates. The architecture determines how these elements are organized, sequenced, and optimized for flow.

As teams mature their pipeline architecture toward loose coupling, they build the foundation for Phase 3: Optimize - where the focus shifts from building the pipeline to improving its speed and reliability.


This content is adapted from the Dojo Consortium, licensed under CC BY 4.0.

  • Slow Pipelines - a symptom directly addressed by applying the Theory of Constraints to pipeline architecture
  • Coordinated Deployments - a symptom of tightly coupled pipeline architecture
  • No Fast Feedback - a symptom that pipeline architecture improvements resolve through stage ordering and parallelization
  • Missing Deployment Pipeline - the anti-pattern that pipeline architecture replaces
  • Release Frequency - a key metric that improves as pipeline architecture matures toward loose coupling
  • Phase 3: Optimize - the next phase, which builds on mature pipeline architecture

8 - Rollback

Enable fast recovery from any deployment by maintaining the ability to roll back.

Phase 2 - Pipeline

Definition

Rollback is the ability to quickly and safely revert a production deployment to a previous known-good state. It is the safety net that makes continuous delivery possible: because you can always undo a deployment, deploying becomes a low-risk, routine operation.

Rollback is not a backup plan for when things go catastrophically wrong. It is a standard operational capability that should be exercised regularly and trusted completely. Every deployment to production should be accompanied by a tested, automated, fast rollback mechanism.

Why It Matters for CD Migration

Fear of deployment is the single biggest cultural barrier to continuous delivery. Teams that have experienced painful, irreversible deployments develop a natural aversion to deploying frequently. They batch changes, delay releases, and add manual approval gates - all of which slow delivery and increase risk.

Reliable, fast rollback breaks this cycle. When the team knows that any deployment can be reversed in minutes, the perceived risk of deployment drops dramatically. Smaller, more frequent deployments become possible. The feedback loop tightens. The entire delivery system improves.

Key Principles

Fast

Rollback must complete in minutes, not hours. A rollback that takes an hour to execute is not a rollback - it is a prolonged outage with a recovery plan. Target rollback times of 5 minutes or less for the deployment mechanism itself. If the previous artifact is already in the artifact repository and the deployment mechanism is automated, there is no reason rollback should take longer than a fresh deployment.

Automated

Rollback must be a single command or a single click - or better, fully automated based on health checks. It should not require:

  • SSH access to production servers
  • Manual editing of configuration files
  • Running scripts with environment-specific parameters from memory
  • Coordinating multiple teams to roll back multiple services simultaneously

Safe

Rollback must not make things worse. This means:

  • Rolling back must not lose data
  • Rolling back must not corrupt state
  • Rolling back must not break other services that depend on the rolled-back service
  • Rolling back must not require downtime beyond what the deployment mechanism itself imposes

Simple

The rollback procedure should be understandable by any team member, including those who did not perform the original deployment. It should not require specialized knowledge, deep system understanding, or heroic troubleshooting.

Tested

Rollback must be tested regularly, not just documented. A rollback procedure that has never been exercised is a rollback procedure that will fail when you need it most. Include rollback verification in your deployable definition and practice rollback as part of routine deployment validation.

Rollback Strategies

Blue-Green Deployment

Maintain two identical production environments - blue and green. At any time, one is live (serving traffic) and the other is idle. To deploy, deploy to the idle environment, verify it, and switch traffic. To roll back, switch traffic back to the previous environment.

Blue (current): v1.2.3
Green (idle):   v1.2.2

Issue detected in Blue
  |
Switch traffic to Green (v1.2.2)
  |
Instant rollback (< 30 seconds)

Advantages:

  • Rollback is instantaneous - just a traffic switch
  • The previous version remains running and warm
  • Zero-downtime deployment and rollback

Considerations:

  • Requires double the infrastructure (though the idle environment can be scaled down)
  • Database changes must be backward-compatible across both versions
  • Session state must be externalized so it survives the switch

Canary Deployment

Deploy the new version to a small subset of production infrastructure (the “canary”) and route a percentage of traffic to it. Monitor the canary for errors, latency, and business metrics. If the canary is healthy, gradually increase traffic. If problems appear, route all traffic back to the previous version.

Deploy v1.2.3 to 10% of servers
  |
Issue detected in monitoring
  |
Automatically route that 10% back to v1.2.2
  |
Issue contained, minimal user impact

Advantages:

  • Limits blast radius - problems affect only a fraction of users
  • Provides real production data for validation before full rollout
  • Rollback is fast - stop sending traffic to the canary

Considerations:

  • Requires traffic routing infrastructure (service mesh, load balancer configuration)
  • Both versions must be able to run simultaneously
  • Monitoring must be sophisticated enough to detect subtle problems in the canary

Feature Flag Rollback

When a deployment introduces new behavior behind a feature flag, rollback can be as simple as turning off the flag. The code remains deployed, but the new behavior is disabled. This is the fastest possible rollback - it requires no deployment at all.

// Feature flag controls new behavior
if (featureFlags.isEnabled('new-checkout')) {
  return renderNewCheckout()
}
return renderOldCheckout()

// Rollback: Toggle flag off via configuration
// No deployment needed, instant effect

Advantages:

  • Instantaneous - no deployment, no traffic switch
  • Granular - roll back a single feature without affecting other changes
  • No infrastructure changes required

Considerations:

  • Requires a feature flag system with runtime toggle capability
  • Only works for changes that are behind flags
  • Feature flag debt (old flags that are never cleaned up) must be managed

Database-Safe Rollback with Expand-Contract

Database schema changes are the most common obstacle to rollback. If a deployment changes the database schema, rolling back the application code may fail if the old code is incompatible with the new schema.

The expand-contract pattern (also called parallel change) solves this:

  1. Expand - add new columns, tables, or structures alongside the existing ones. The old application code continues to work. Deploy this change.
  2. Migrate - update the application to write to both old and new structures, and read from the new structure. Deploy this change. Backfill historical data.
  3. Contract - once all application versions using the old structure are retired, remove the old columns or tables. Deploy this change.

At every step, the previous application version remains compatible with the current database schema. Rollback is always safe.

-- Safe: Additive change (expand)
ALTER TABLE users ADD COLUMN phone VARCHAR(20);
-- Old code ignores the new column
-- New code uses the new column
-- Rolling back code does not break anything

-- Unsafe: Destructive change
ALTER TABLE users DROP COLUMN email;
-- Old code breaks because email column is gone
-- Rollback requires schema rollback (risky)

Anti-pattern: Destructive schema changes (dropping columns, renaming tables, changing types) deployed simultaneously with the application code change that requires them. This makes rollback impossible because the old code cannot work with the new schema.

Anti-Patterns

“We’ll fix forward”

Relying exclusively on fixing forward (deploying a new fix rather than rolling back) is dangerous when the system is actively degraded. Fix-forward should be an option when the issue is well-understood and the fix is quick. Rollback should be the default when the issue is unclear or the fix will take time. Both capabilities must exist.

Rollback as a documented procedure only

A rollback procedure that exists only in a runbook, wiki, or someone’s memory is not a reliable rollback capability. Procedures that are not automated and regularly tested will fail under the pressure of a production incident.

Coupled service rollbacks

When rolling back service A requires simultaneously rolling back services B and C, you do not have independent rollback capability. Design services to be backward-compatible so that each service can be rolled back independently.

Destructive database migrations

Schema changes that destroy data or break backward compatibility make rollback impossible. Always use the expand-contract pattern for schema changes.

Manual rollback requiring specialized knowledge

If only one person on the team knows how to perform a rollback, the team does not have a rollback capability - it has a single point of failure. Rollback must be simple enough for any team member to execute.

Good Patterns

Automated rollback on health check failure

Configure the deployment system to automatically roll back if the new version fails health checks within a defined window after deployment. This removes the need for a human to detect the problem and initiate the rollback.
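
A sketch of the watch-and-rollback loop, assuming a health endpoint and a rollback command provided elsewhere (both hypothetical):

// Poll the new version's health endpoint for a fixed window after deployment;
// trigger the automated rollback if it fails repeatedly.
async function watchDeployment(healthUrl: string, rollback: () => Promise<void>) {
  const windowMs = 5 * 60 * 1000
  const deadline = Date.now() + windowMs
  let consecutiveFailures = 0

  while (Date.now() < deadline) {
    try {
      const res = await fetch(healthUrl, { signal: AbortSignal.timeout(3000) })
      consecutiveFailures = res.ok ? 0 : consecutiveFailures + 1
    } catch {
      consecutiveFailures += 1
    }
    if (consecutiveFailures >= 3) {
      await rollback()   // no human in the loop
      return
    }
    await new Promise((resolve) => setTimeout(resolve, 10_000))
  }
}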

Rollback testing in staging

As part of every deployment to staging, deploy the new version, verify it, then roll it back and verify the rollback. This ensures that rollback works for every release, not just in theory.

Artifact retention

Retain previous artifact versions in the artifact repository so that rollback is always possible. Define a retention policy (for example, keep the last 10 production-deployed versions) and ensure that rollback targets are always available.

Deployment log and audit trail

Maintain a clear record of what is currently deployed, what was previously deployed, and when changes occurred. This makes it easy to identify the correct rollback target and verify that the rollback was successful.

Rollback runbook exercises

Regularly practice rollback as a team exercise - not just as part of automated testing, but as a deliberate drill. This builds team confidence and identifies gaps in the process.

How to Get Started

Step 1: Document your current rollback capability

Can you roll back your current production deployment right now? How long would it take? Who would need to be involved? What could go wrong? Be honest about the answers.

Step 2: Implement a basic automated rollback

Start with the simplest mechanism available for your deployment platform - redeploying the previous container image, switching a load balancer target, or reverting a Kubernetes deployment. Automate this as a single command.
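
For a Kubernetes deployment, the single command can be a thin wrapper around kubectl rollout undo; a sketch in which the deployment name is a placeholder:

import { execSync } from 'node:child_process'

// Roll the Deployment back to its previous ReplicaSet and wait for it to settle.
function rollback(deployment = 'orders-service'): void {
  execSync(`kubectl rollout undo deployment/${deployment}`, { stdio: 'inherit' })
  execSync(`kubectl rollout status deployment/${deployment} --timeout=120s`, { stdio: 'inherit' })
}

rollback()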

Step 3: Test the rollback

Deploy a change to staging, then roll it back. Verify that the system returns to its previous state. Make this a standard part of your deployment validation.

Step 4: Address database compatibility

Audit your database migration practices. If you are making destructive schema changes, shift to the expand-contract pattern. Ensure that the previous application version is always compatible with the current database schema.

Step 5: Reduce rollback time

Measure how long rollback takes. Identify and eliminate delays - slow artifact downloads, slow startup times, manual steps. Target rollback completion in under 5 minutes.

Step 6: Build team confidence

Practice rollback regularly. Demonstrate it during deployment reviews. Make it a normal part of operations, not an emergency procedure. When the team trusts rollback, they will trust deployment.

Connection to the Pipeline Phase

Rollback is the capstone of the Pipeline phase. It is what makes the rest of the phase safe.

With rollback in place, the team has the confidence to deploy frequently, which is the foundation for Phase 3: Optimize and ultimately Phase 4: Deliver on Demand.

FAQ

How far back should we be able to roll back?

At minimum, keep the last 3 to 5 production releases available for rollback. Ideally, retain any production release from the past 30 to 90 days. Balance storage costs with rollback flexibility by defining a retention policy for your artifact repository.

What if the database schema changed?

Design schema changes to be backward-compatible:

  • Use the expand-contract pattern described above
  • Make schema changes in a separate deployment from the code changes that depend on them
  • Test that the old application code works with the new schema before deploying the code change

What if we need to roll back the database too?

Database rollbacks are inherently risky because they can destroy data. Instead of rolling back the database:

  1. Design schema changes to support application rollback (backward compatibility)
  2. Use feature flags to disable code that depends on the new schema
  3. If absolutely necessary, maintain tested database rollback scripts - but treat this as a last resort

Should rollback require approval?

No. The on-call engineer should be empowered to roll back immediately without waiting for approval. Speed of recovery is critical during an incident. Post-rollback review is appropriate, but requiring approval before rollback adds delay when every minute counts.

How do we test rollback?

  1. Practice regularly - perform rollback drills during low-traffic periods
  2. Automate testing - include rollback verification in your pipeline
  3. Use staging - test rollback in staging before every production deployment
  4. Run chaos exercises - randomly trigger rollbacks to ensure they work under realistic conditions

What if rollback fails?

Have a contingency plan:

  1. Roll forward to the next known-good version
  2. Use feature flags to disable the problematic behavior
  3. Have an out-of-band deployment method as a last resort

If rollback is regularly tested, failures should be extremely rare.

How long should rollback take?

Target under 5 minutes from the decision to roll back to service restored.

Typical breakdown:

  • Trigger rollback: under 30 seconds
  • Deploy previous artifact: 2 to 3 minutes
  • Verify with smoke tests: 1 to 2 minutes

What about configuration changes?

Configuration should be versioned and separated from the application artifact. Rolling back the artifact should not require separately rolling back environment configuration. See Application Configuration for how to achieve this.

  • Fear of Deploying - the symptom that reliable rollback capability directly resolves
  • Infrequent Releases - a symptom driven by deployment risk that rollback mitigates
  • Manual Deployments - an anti-pattern incompatible with fast, automated rollback
  • Immutable Artifacts - the Pipeline practice that makes rollback reliable by preserving previous artifacts
  • Mean Time to Repair - a key metric that rollback capability directly improves
  • Feature Flags - an Optimize practice that provides an alternative rollback mechanism at the feature level