Phase 1: Foundations

Establish the essential practices for daily integration, testing, and small work decomposition.

Key question: “Can we integrate safely every day?”

This phase establishes the development practices that make continuous delivery possible. Without these foundations, pipeline automation just speeds up a broken process.

What You’ll Do

  1. Adopt trunk-based development - Integrate to trunk at least daily
  2. Build testing fundamentals - Create a fast, reliable test suite
  3. Automate your build - One command to build, test, and package
  4. Decompose work - Break features into small, deliverable increments
  5. Streamline code review - Fast, effective review that doesn’t block flow
  6. Establish working agreements - Shared definitions of done and ready
  7. Everything as code - Infrastructure, pipelines, schemas, monitoring, and security policies in version control, delivered through pipelines

Why This Phase Matters

These practices are the prerequisites for everything that follows. Trunk-based development eliminates merge hell. Testing fundamentals give you the confidence to deploy frequently. Small work decomposition reduces risk per change. Together, they create the feedback loops that drive continuous improvement.

When You’re Ready to Move On

You’re ready for Phase 2: Pipeline when:

  • All developers integrate to trunk at least once per day
  • Your test suite catches real defects and runs in under 10 minutes
  • You can build and package your application with a single command
  • Most work items are completable within 2 days

1 - Trunk-Based Development

Integrate all work to the trunk at least once per day to enable continuous integration.

Phase 1 - Foundations | Adapted from MinimumCD.org

Trunk-based development is the first foundation to establish. Without daily integration to a shared trunk, the rest of the CD migration cannot succeed. This page covers the core practice, two migration paths, and a tactical guide for getting started.

What Is Trunk-Based Development?

Trunk-based development (TBD) is a branching strategy where all developers integrate their work into a single shared branch - the trunk - at least once per day. The trunk is always kept in a releasable state.

This is a non-negotiable prerequisite for continuous delivery. If your team is not integrating to trunk daily, you are not doing CI, and you cannot do CD. There is no workaround.

“If it hurts, do it more often, and bring the pain forward.”

- Jez Humble, Continuous Delivery

What TBD Is Not

  • It is not “everyone commits directly to main with no guardrails.” You still test, review, and validate work - you just do it in small increments.
  • It is not incompatible with code review. It requires review to happen quickly.
  • It is not reckless. It is the opposite: small, frequent integrations are far safer than large, infrequent merges.

What Trunk-Based Development Improves

Problem | How TBD Helps
Merge conflicts | Small changes integrated frequently rarely conflict
Integration risk | Bugs are caught within hours, not weeks
Long-lived branches diverge from reality | The trunk always reflects the current state of the codebase
“Works on my branch” syndrome | Everyone shares the same integration point
Slow feedback | CI runs on every integration, giving immediate signal
Large batch deployments | Small changes are individually deployable
Fear of deployment | Each change is small enough to reason about

Two Migration Paths

There are two valid approaches to trunk-based development. Both satisfy the minimum CD requirement of daily integration. Choose the one that fits your team’s current maturity and constraints.

Path 1: Short-Lived Branches

Developers create branches that live for less than 24 hours. Work is done on the branch, reviewed quickly, and merged to trunk within a single day.

How it works:

  1. Pull the latest trunk
  2. Create a short-lived branch
  3. Make small, focused changes
  4. Open a pull request (or use pair programming as the review)
  5. Merge to trunk before end of day
  6. The branch is deleted after merge

Best for teams that:

  • Currently use long-lived feature branches and need a stepping stone
  • Have regulatory requirements for traceable review records
  • Use pull request workflows they want to keep (but make faster)
  • Are new to TBD and want a gradual transition

Key constraint: The branch must merge to trunk within 24 hours. If it does not, you have a long-lived branch and you have lost the benefit of TBD.

Path 2: Direct Trunk Commits

Developers commit directly to trunk. Quality is ensured through pre-commit checks, pair programming, and strong automated testing.

How it works:

  1. Pull the latest trunk
  2. Make a small, tested change locally
  3. Run the local build and test suite
  4. Push directly to trunk
  5. CI validates the commit immediately

Best for teams that:

  • Have strong automated test coverage
  • Practice pair or mob programming (which provides real-time review)
  • Want maximum integration frequency
  • Have high trust and shared code ownership

Key constraint: This requires excellent test coverage and a culture where the team owns quality collectively. Without these, direct trunk commits become reckless.

How to Choose Your Path

Ask these questions:

  1. Do you have automated tests that catch real defects? If no, start with Path 1 and invest in testing fundamentals in parallel.
  2. Does your organization require documented review approvals? If yes, use Path 1 with rapid pull requests.
  3. Does your team practice pair programming? If yes, Path 2 may work immediately - pairing is a continuous review process.
  4. How large is your team? Teams of 2-4 can adopt Path 2 more easily. Larger teams may start with Path 1 and transition later.

Both paths are valid. The important thing is daily integration to trunk. Do not spend weeks debating which path to use. Pick one, start today, and adjust.

Essential Supporting Practices

Trunk-based development does not work in isolation. These supporting practices make daily integration safe and sustainable.

Feature Flags

When you integrate to trunk daily, incomplete features will exist on trunk. Feature flags let you merge code that is not yet ready for users.

# Simple feature flag example
if feature_flags.is_enabled("new-checkout-flow", user):
    return new_checkout(cart)
else:
    return legacy_checkout(cart)

Rules for feature flags in TBD:

  • Use flags to decouple deployment from release
  • Remove flags within days or weeks - they are temporary by design
  • Keep flag logic simple; avoid nested or dependent flags
  • Test both flag states in your automated test suite
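
The last rule is the one most often skipped, so here is a minimal sketch of exercising both flag states, assuming pytest. The flag store and checkout logic are inline stand-ins for the snippet above, not a real feature-flag library.

import pytest

FLAGS = {"new-checkout-flow": False}

def is_enabled(name):
    return FLAGS.get(name, False)

def checkout_total(prices):
    if is_enabled("new-checkout-flow"):
        return round(sum(prices), 2)                    # new flow
    return sum(int(p * 100) for p in prices) / 100      # legacy flow

@pytest.mark.parametrize("flag_state", [True, False])
def test_checkout_total_under_both_flag_states(flag_state):
    # Whichever state the flag is in, the user-visible behavior must hold.
    FLAGS["new-checkout-flow"] = flag_state
    assert checkout_total([10.00, 2.50]) == 12.50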

Feature flags are covered in more depth in Phase 3: Optimize.

Commit Small, Commit Often

Each commit should be a small, coherent change that leaves trunk in a working state. If you are committing once a day in a large batch, you are not getting the benefit of TBD.

Guidelines:

  • Each commit should be independently deployable
  • A commit should represent a single logical change
  • If you cannot describe the change in one sentence, it is too big
  • Target multiple commits per day, not one large commit at end of day

Test-Driven Development (TDD) and ATDD

TDD provides the safety net that makes frequent integration sustainable. When every change is accompanied by tests, you can integrate confidently.

  • TDD: Write the test before the code. Red, green, refactor.
  • ATDD (Acceptance Test-Driven Development): Write acceptance criteria as executable tests before implementation.

Both practices ensure that your test suite grows with your code and that trunk remains releasable.

Getting Started: A Tactical Guide

Step 1: Shorten Your Branches (Week 1)

If your team currently uses long-lived feature branches, start by shortening their lifespan.

Current State | Target
Branches live for weeks | Branches live for < 1 week
Merge once per sprint | Merge multiple times per week
Large merge conflicts are normal | Conflicts are rare and small

Action: Set a team agreement that no branch lives longer than 2 days. Track branch age as a metric.
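
If your platform does not already report branch age, a small script is enough to start. A sketch that reads it straight from git; the remote name and threshold are assumptions to adjust for your team.

import subprocess
import time

MAX_AGE_HOURS = 48  # the team agreement from this step

def stale_branches(remote="origin"):
    output = subprocess.run(
        ["git", "for-each-ref",
         "--format=%(refname:short) %(committerdate:unix)",
         f"refs/remotes/{remote}"],
        capture_output=True, text=True, check=True,
    ).stdout
    now = time.time()
    for line in output.splitlines():
        name, _, timestamp = line.rpartition(" ")
        if not timestamp.isdigit() or name.endswith(("/HEAD", "/main")):
            continue
        age_hours = (now - int(timestamp)) / 3600
        if age_hours > MAX_AGE_HOURS:
            yield name, round(age_hours)

if __name__ == "__main__":
    for branch, age in stale_branches():
        print(f"{branch}: last commit {age} hours ago")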

Step 2: Integrate Daily (Week 2-3)

Tighten the window from 2 days to 1 day.

Action:

  • Every developer merges to trunk at least once per day, every day they write code
  • If work is not complete, use a feature flag or other technique to merge safely
  • Track integration frequency as your primary metric

Step 3: Ensure Trunk Stays Green (Week 2-3)

Daily integration is only useful if trunk remains in a releasable state.

Action:

  • Run your test suite on every merge to trunk
  • If the build breaks, fixing it becomes the team’s top priority
  • Establish a working agreement: “broken build = stop the line” (see Working Agreements)

Step 4: Remove the Safety Net of Long Branches (Week 4+)

Once the team is integrating daily with a green trunk, eliminate the option of long-lived branches.

Action:

  • Configure branch protection rules to warn or block branches older than 24 hours
  • Remove any workflow that depends on long-lived branches (e.g., “dev” or “release” branches)
  • Celebrate the transition - this is a significant shift in how the team works

Key Pitfalls

1. “We integrate daily, but we also keep our feature branches”

If you are merging to trunk daily but also maintaining a long-lived feature branch, you are not doing TBD. The feature branch will diverge, and merging it later will be painful. The integration to trunk must be the only integration point.

2. “Our builds are too slow for frequent integration”

If your CI pipeline takes 30 minutes, integrating multiple times a day feels impractical. This is a real constraint - address it by investing in build automation and parallelizing your test suite. Target a build time under 10 minutes.

3. “We can’t integrate incomplete features to trunk”

Yes, you can. Use feature flags to hide incomplete work from users. The code exists on trunk, but the feature is not active. This is a standard practice at every company that practices CD.

4. “Code review takes too long for daily integration”

If pull request reviews take 2 days, daily integration is impossible. The solution is to change how you review: pair programming provides continuous review, mob programming reviews in real time, and small changes can be reviewed asynchronously in minutes. See Code Review for specific techniques.

5. “What if someone pushes a bad commit to trunk?”

This is why you have automated tests, CI, and the “broken build = top priority” agreement. Bad commits will happen. The question is how fast you detect and fix them. With TBD and CI, the answer is minutes, not days.

Measuring Success

Track these metrics to verify your TBD adoption:

Metric | Target | Why It Matters
Integration frequency | At least 1 per developer per day | Confirms daily integration is happening
Branch age | < 24 hours | Catches long-lived branches
Build duration | < 10 minutes | Enables frequent integration without frustration
Merge conflict frequency | Decreasing over time | Confirms small changes reduce conflicts

Further Reading

This page covers the essentials for Phase 1 of your migration. For detailed guidance on specific scenarios, see the full source material at MinimumCD.org.

Next Step

Once your team is integrating to trunk daily, build the test suite that makes that integration trustworthy. Continue to Testing Fundamentals.


This content is adapted from MinimumCD.org, licensed under CC BY 4.0.

2 - Testing Fundamentals

Build a test architecture that gives your pipeline the confidence to deploy any change, even when dependencies outside your control are unavailable.

Phase 1 - Foundations | Adapted from Dojo Consortium

Before you can trust your pipeline, you need a test suite that is fast, deterministic, and catches real defects. But a collection of tests is not enough. You need a test architecture - a deliberate structure where different types of tests work together to give you the confidence to deploy every change, regardless of whether external systems are up, slow, or behaving unexpectedly.

Why Testing Is a Foundation

Continuous delivery requires that trunk always be releasable. The only way to know trunk is releasable is to test it - automatically, on every change. Without a reliable test suite, daily integration is just daily risk.

In many organizations, testing is the single biggest obstacle to CD adoption. Not because teams lack tests, but because the tests they have are slow, flaky, poorly structured, and - most critically - unable to give the pipeline a reliable answer to the question: is this change safe to deploy?

Testing Goals for CD

Your test suite must meet these criteria before it can support continuous delivery:

Goal | Target | Why
Fast | Full suite completes in under 10 minutes | Developers need feedback before context-switching
Deterministic | Same code always produces the same test result | Flaky tests destroy trust and get ignored
Catches real bugs | Tests fail when behavior is wrong, not when implementation changes | Brittle tests create noise, not signal
Independent of external systems | Pipeline can determine deployability without any dependency being available | Your ability to deploy cannot be held hostage by someone else’s outage

If your test suite does not meet these criteria today, improving it is your highest-priority foundation work.

Beyond the Test Pyramid

The test pyramid - many unit tests at the base, fewer integration tests in the middle, a handful of end-to-end tests at the top - has been the dominant mental model for test strategy since Mike Cohn introduced it. The core insight is sound: push testing as low as possible. Lower-level tests are faster, more deterministic, and cheaper to maintain. Higher-level tests are slower, more brittle, and more expensive.

But as a prescriptive model, the pyramid is overly simplistic. Teams that treat it as a rigid ratio end up in unproductive debates about whether they have “too many” integration tests or “not enough” unit tests. The shape of your test distribution matters far less than whether your tests, taken together, give you the confidence to deploy.

What actually matters

The pyramid’s principle - write tests with different granularity - remains correct. But for CD, the question is not “do we have the right pyramid shape?” The question is:

Can our pipeline determine that a change is safe to deploy without depending on any system we do not control?

This reframes the testing conversation. Instead of counting tests by type and trying to match a diagram, you design a test architecture where:

  1. Fast, deterministic tests catch the vast majority of defects and run on every commit. These tests use test doubles for anything outside the team’s control. They give you a reliable go/no-go signal in minutes.

  2. Contract tests verify that your test doubles still match reality. They run asynchronously and catch drift between your assumptions and the real world - without blocking your pipeline.

  3. A small number of non-deterministic tests validate that the fully integrated system works. These run post-deployment and provide monitoring, not gating.

This structure means your pipeline can confidently say “yes, deploy this” even if a downstream API is having an outage, a third-party service is slow, or a partner team hasn’t deployed their latest changes yet. Your ability to deliver is decoupled from the reliability of systems you do not own.

The anti-pattern: the ice cream cone

Most teams that struggle with CD have an inverted test distribution - too many slow, expensive end-to-end tests and too few fast, focused tests.

        ┌─────────────────────────┐
        │    Manual Testing       │  ← Most testing happens here
        ├─────────────────────────┤
        │   End-to-End Tests      │  ← Slow, flaky, expensive
        ├─────────────────────────┤
        │  Integration Tests      │  ← Some, but not enough
        ├───────────┬─────────────┘
        │Unit Tests │              ← Too few
        └───────────┘

The ice cream cone makes CD impossible. Manual testing gates block every release. End-to-end tests take hours, fail randomly, and depend on external systems being healthy. The pipeline cannot give a fast, reliable answer about deployability, so deployments become high-ceremony events.

Test Architecture for the CD Pipeline

A test architecture is the deliberate structure of how different test types work together across your pipeline to give you deployment confidence. Each layer has a specific role, and the layers reinforce each other.

Layer 1: Unit tests - verify logic in isolation

Unit tests exercise individual functions, methods, or components with all external dependencies replaced by test doubles. They are the fastest and most deterministic tests you have.

Role in CD: Catch logic errors, regressions, and edge cases instantly. Provide the tightest feedback loop - developers should see results in seconds while coding.

What they cannot do: Verify that components work together, that your code correctly calls external services, or that the system behaves correctly as a whole.
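
A minimal example of the kind of test that belongs in this layer, assuming pytest as the runner; the shipping rule is hypothetical. No I/O, no doubles beyond plain values, so it runs in milliseconds and is fully deterministic.

def free_shipping_applies(order_total, country):
    return country == "US" and order_total >= 100.0

def test_free_shipping_at_the_threshold():
    assert free_shipping_applies(100.0, "US")

def test_no_free_shipping_for_international_orders():
    assert not free_shipping_applies(150.0, "DE")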

See Unit Tests for detailed guidance.

Layer 2: Integration tests - verify boundaries

Integration tests verify that components interact correctly at their boundaries: database queries return the expected data, HTTP clients serialize requests correctly, message producers format messages as expected. External systems are replaced with test doubles, but internal collaborators are real.

Role in CD: Catch the bugs that unit tests miss - mismatched interfaces, serialization errors, query bugs. These tests are fast enough to run on every commit but realistic enough to catch real integration failures.

What they cannot do: Verify that the system works end-to-end from a user’s perspective, or that your assumptions about external services are still correct.

The line between unit tests and integration tests is often debated. As Ham Vocke writes in The Practical Test Pyramid: the naming matters less than the discipline. The key question is whether the test is fast, deterministic, and tests something your unit tests cannot. If yes, it belongs here.
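
As an illustrative sketch of a boundary test, the example below exercises a small repository against an in-memory SQLite database (standard library only; the repository and schema are hypothetical). It is fast and deterministic, yet catches query and mapping bugs that a unit test with a mocked database would miss.

import sqlite3

class CustomerRepository:
    def __init__(self, conn):
        self.conn = conn

    def add(self, name):
        self.conn.execute("INSERT INTO customers (name) VALUES (?)", (name,))

    def find_by_name(self, name):
        return self.conn.execute(
            "SELECT id, name FROM customers WHERE name = ?", (name,)
        ).fetchone()

def test_repository_stores_and_finds_customers():
    conn = sqlite3.connect(":memory:")   # real SQL, no external server required
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    repo = CustomerRepository(conn)
    repo.add("Ada")
    assert repo.find_by_name("Ada")[1] == "Ada"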

See Integration Tests for detailed guidance.

Layer 3: Functional tests - verify your system works in isolation

Functional tests (also called component tests) exercise your entire sub-system - your service, your application - from the outside, as a user or consumer would interact with it. All external dependencies are replaced with test doubles. The test boots your application, sends real HTTP requests or simulates real user interactions, and verifies the responses.

Role in CD: This is the layer that proves your system works as a complete unit, independent of everything else. Functional tests answer: “if we deploy this service right now, will it behave correctly for every interaction that is within our control?” Because all external dependencies are stubbed, these tests are deterministic and fast. They can run on every commit.

Why this layer is critical for CD: Functional tests are what allow you to deploy with confidence even when dependencies outside your control are unavailable. Your test doubles simulate the expected behavior of those dependencies. As long as your doubles are accurate (which is what contract tests verify), your functional tests prove your system handles those interactions correctly.
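
A minimal sketch of a functional test, assuming Flask and pytest; the app factory, endpoint, and rates client are hypothetical. The external rates service is replaced by an in-process fake, so the test boots the whole service and stays deterministic.

from flask import Flask, jsonify

class FakeRatesClient:
    def rate_for(self, currency):        # mimics the real client's interface
        return 1.08

def create_app(rates_client):
    app = Flask(__name__)

    @app.get("/quote/<currency>")
    def quote(currency):
        return jsonify({"currency": currency,
                        "rate": rates_client.rate_for(currency)})

    return app

def test_quote_endpoint_returns_rate_from_dependency():
    client = create_app(FakeRatesClient()).test_client()
    resp = client.get("/quote/EUR")
    assert resp.status_code == 200
    assert resp.get_json() == {"currency": "EUR", "rate": 1.08}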

See Functional Tests for detailed guidance.

Layer 4: Contract tests - verify your assumptions about others

Contract tests validate that the test doubles you use in layers 1-3 still accurately represent the real external systems. They run against live dependencies and check contract format - response structures, field names, types, and status codes - not specific data values.

Role in CD: Contract tests are the bridge between your fast, deterministic test suite and the real world. Without them, your test doubles can silently drift from reality, and your functional tests provide false confidence. With them, you know that the assumptions baked into your test doubles are still correct.

Consumer-driven contracts take this further: the consumer of an API publishes expectations (using tools like Pact), and the provider runs those expectations as part of their build. Both teams know immediately when a change would break the contract.

Contract tests are non-deterministic because they hit live systems. They should not block your pipeline. Instead, failures trigger a review: has the contract changed, or was it a transient network issue? If the contract has changed, update your test doubles and re-verify.
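
A minimal sketch of a contract test against a hypothetical rates API (the URL and field names are placeholders). It asserts structure and types, never specific values, and it runs on a schedule rather than in the gating suite.

import requests

def test_rates_api_contract():
    # Hits the live dependency; scheduled, never blocks the pipeline.
    resp = requests.get("https://rates.example.com/api/v1/rates/EUR", timeout=10)
    assert resp.status_code == 200
    body = resp.json()
    # Shape and types only - the actual rate value is free to change.
    assert isinstance(body.get("currency"), str)
    assert isinstance(body.get("rate"), (int, float))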

See Contract Tests for detailed guidance.

Layer 5: End-to-end tests - verify the integrated system post-deployment

End-to-end tests validate complete user journeys through the fully integrated system with no test doubles. They run against real services, real databases, and real third-party integrations.

Role in CD: E2E tests are monitoring, not gating. They run after deployment to verify that the integrated system works. A small suite of smoke tests can run immediately post-deployment to catch gross integration failures. Broader E2E suites run on a schedule.

Why E2E tests should not gate your pipeline: E2E tests are non-deterministic. They fail for reasons unrelated to your change - network blips, third-party outages, shared environment instability. If your pipeline depends on E2E tests passing before you can deploy, your deployment frequency is limited by the reliability of every system in the chain. This is the opposite of the independence CD requires.

See End-to-End Tests for detailed guidance.

How the layers work together

Pipeline stage    Test layer              Deterministic?   Blocks deploy?
─────────────────────────────────────────────────────────────────────────
On every commit   Unit tests              Yes              Yes
                  Integration tests       Yes              Yes
                  Functional tests        Yes              Yes

Asynchronous      Contract tests          No               No (triggers review)

Post-deployment   E2E smoke tests         No               Triggers rollback if critical
                  Synthetic monitoring    No               Triggers alerts

The critical insight: everything that blocks deployment is deterministic and under your control. Everything that involves external systems runs asynchronously or post-deployment. This is what gives you the independence to deploy any time, regardless of the state of the world around you.

Week 1 Action Plan

If your test suite is not yet ready to support CD, use this focused action plan to make immediate progress.

Day 1-2: Audit your current test suite

Assess where you stand before making changes.

Actions:

  • Run your full test suite 3 times. Note total duration and any tests that pass intermittently (flaky tests).
  • Count tests by type: unit, integration, functional, end-to-end.
  • Identify tests that require external dependencies (databases, APIs, file systems) to run.
  • Record your baseline: total test count, pass rate, duration, flaky test count.
  • Map each test type to a pipeline stage. Which tests gate deployment? Which run asynchronously? Which tests couple your deployment to external systems?

Output: A clear picture of your test distribution and the specific problems to address.

Day 2-3: Fix or remove flaky tests

Flaky tests are worse than no tests. They train developers to ignore failures, which means real failures also get ignored.

Actions:

  • Quarantine all flaky tests immediately. Move them to a separate suite that does not block the build.
  • For each quarantined test, decide: fix it (if the behavior it tests matters) or delete it (if it does not).
  • Common causes of flakiness: timing dependencies, shared mutable state, reliance on external services, test order dependencies.
  • Target: zero flaky tests in your main test suite by end of week.

Day 3-4: Decouple your pipeline from external dependencies

This is the highest-leverage change for CD. Identify every test that calls a real external service and replace that dependency with a test double.

Actions:

  • List every external service your tests depend on: databases, APIs, message queues, file storage, third-party services.
  • For each dependency, decide the right test double approach:
    • In-memory fakes for databases (e.g., SQLite, H2, testcontainers with local instances).
    • HTTP stubs for external APIs (e.g., WireMock, nock, MSW).
    • Fakes for message queues, email services, and other infrastructure.
  • Replace the dependencies in your unit, integration, and functional tests.
  • Move the original tests that hit real services into a separate suite - these become your starting contract tests or E2E smoke tests.

Output: A test suite where everything that blocks the build is deterministic and runs without network access to external systems.
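
One common shape for this change is constructor injection plus a hand-rolled fake. A sketch with hypothetical names: the production sender talks to SMTP, the fake records calls in memory, and only the fake appears in the gating suite.

class SmtpEmailSender:
    """Production implementation; talks to a real SMTP host (omitted here)."""
    def send(self, to, subject, body):
        raise NotImplementedError("requires a live SMTP host")

class FakeEmailSender:
    """Test double for the gating suite; records sends in memory."""
    def __init__(self):
        self.sent = []
    def send(self, to, subject, body):
        self.sent.append((to, subject, body))

def notify_order_shipped(sender, order):
    sender.send(order["email"], "Your order shipped",
                f"Order {order['id']} is on its way")

def test_notification_sent_without_touching_real_smtp():
    sender = FakeEmailSender()
    notify_order_shipped(sender, {"id": 42, "email": "user@example.com"})
    assert sender.sent == [("user@example.com", "Your order shipped",
                            "Order 42 is on its way")]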

Day 4-5: Add functional tests for critical paths

If you don’t have functional tests (component tests) that exercise your whole service in isolation, start with the most critical paths.

Actions:

  • Identify the 3-5 most critical user journeys or API endpoints in your application.
  • Write a functional test for each: boot the application, stub external dependencies, send a real request or simulate a real user action, verify the response.
  • Each functional test should prove that the feature works correctly assuming external dependencies behave as expected (which your test doubles encode).
  • Run these in CI on every commit.

Day 5: Set up contract tests for your most important dependency

Pick the external dependency that changes most frequently or has caused the most production issues. Set up a contract test for it.

Actions:

  • Write a contract test that validates the response structure (types, required fields, status codes) of the dependency’s API.
  • Run it on a schedule (e.g., every hour or daily), not on every commit.
  • When it fails, update your test doubles to match the new reality and re-verify your functional tests.
  • If the dependency is owned by another team in your organization, explore consumer-driven contracts with a tool like Pact.

Test-Driven Development (TDD)

TDD is the practice of writing the test before the code. It is the most effective way to build a reliable test suite because it ensures every piece of behavior has a corresponding test.

The TDD cycle:

  1. Red: Write a failing test that describes the behavior you want.
  2. Green: Write the minimum code to make the test pass.
  3. Refactor: Improve the code without changing the behavior. The test ensures you do not break anything.
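
One cycle, illustrated with a deliberately tiny example (pytest-style asserts; the function and its behavior are hypothetical):

# Red: this test is written first and fails because slugify does not exist yet.
def test_slugify_replaces_spaces_and_lowercases():
    assert slugify("Continuous Delivery") == "continuous-delivery"

# Green: the minimum code that makes the test pass.
def slugify(title):
    return title.lower().replace(" ", "-")

# Refactor: rename, extract, or simplify as needed - the test pins the behavior.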

Why TDD supports CD:

  • Every change is automatically covered by a test
  • The test suite grows proportionally with the codebase
  • Tests describe behavior, not implementation, making them more resilient to refactoring
  • Developers get immediate feedback on whether their change works

TDD is not mandatory for CD, but teams that practice TDD consistently have significantly faster and more reliable test suites.

Getting started with TDD

If your team is new to TDD, start small:

  1. Pick one new feature or bug fix this week.
  2. Write the test first, watch it fail.
  3. Write the code to make it pass.
  4. Refactor.
  5. Repeat for the next change.

Do not try to retroactively TDD your entire codebase. Apply TDD to new code and to any code you modify.

Testing Matrix

Use this reference to decide what type of test to write and where it runs in your pipeline.

What You Need to Verify | Test Type | Speed | Deterministic? | Blocks Deploy?
A function or method behaves correctly | Unit | Milliseconds | Yes | Yes
Components interact correctly at a boundary | Integration | Milliseconds to seconds | Yes | Yes
Your whole service works in isolation | Functional | Seconds | Yes | Yes
Your test doubles match reality | Contract | Seconds | No | No
A critical user journey works end-to-end | E2E | Minutes | No | No
Code quality, security, and style compliance | Static Analysis | Seconds | Yes | Yes

Best Practices Summary

Do

  • Run tests on every commit. If tests do not run automatically, they will be skipped.
  • Keep the deterministic suite under 10 minutes. If it is slower, developers will stop running it locally.
  • Fix broken tests immediately. A broken test is equivalent to a broken build.
  • Delete tests that do not provide value. A test that never fails and tests trivial behavior is maintenance cost with no benefit.
  • Test behavior, not implementation. Tests should verify what the code does, not how it does it. As Ham Vocke advises: “if I enter values x and y, will the result be z?” - not the sequence of internal calls that produce z.
  • Use test doubles for external dependencies. Your deterministic tests should run without network access to external systems.
  • Validate test doubles with contract tests. Test doubles that drift from reality give false confidence.
  • Treat test code as production code. Give it the same care, review, and refactoring attention.

Do Not

  • Do not tolerate flaky tests. Quarantine or delete them immediately.
  • Do not gate your pipeline on non-deterministic tests. E2E and contract test failures should trigger review or alerts, not block deployment.
  • Do not couple your deployment to external system availability. If a third-party API being down prevents you from deploying, your test architecture has a critical gap.
  • Do not write tests after the fact as a checkbox exercise. Tests written without understanding the behavior they verify add noise, not value.
  • Do not test private methods directly. Test the public interface; private methods are tested indirectly.
  • Do not share mutable state between tests. Each test should set up and tear down its own state.
  • Do not use sleep/wait for timing-dependent tests. Use explicit waits, polling, or event-driven assertions (see the polling sketch after this list).
  • Do not require a running database or external service for unit tests. That makes them integration tests - which is fine, but categorize them correctly.
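
For the sleep/wait rule above, a small polling helper is usually all that is needed. This sketch re-checks a condition until it holds or a deadline passes; the default timeout and interval are arbitrary values to tune.

import time

def wait_for(condition, timeout=5.0, interval=0.05):
    """Poll the condition until it returns True or the deadline passes."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(interval)
    raise AssertionError(f"condition not met within {timeout}s")

# Usage in a test (consumer object is hypothetical):
#   wait_for(lambda: queue_consumer.processed_count >= 3)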

Using Tests to Find and Eliminate Defect Sources

A test suite that catches bugs is good. A test suite that helps you stop producing those bugs is transformational. Every test failure is evidence of a defect, and every defect has a source. If you treat test failures only as things to fix, you are doing rework. If you treat them as diagnostic data about where your process breaks down, you can make systemic changes that prevent entire categories of defects from occurring.

This is the difference between a team that writes more tests to catch more bugs and a team that changes how it works so that fewer bugs are created in the first place.

Trace every defect to its origin

When a test catches a defect - or worse, when a defect escapes to production - ask: where was this defect introduced, and what would have prevented it from being created?

Defects do not originate randomly. They cluster around specific causes, and each cause has a systemic fix:

Where Defects Originate | Example Defects | Detection Method | Systemic Fix
Requirements | Building the right thing wrong, or the wrong thing right | UX analytics, task completion tracking, A/B testing | Acceptance criteria as user outcomes, not implementation tasks. Three Amigos sessions before work starts. Example mapping to surface edge cases before coding begins.
Missing domain knowledge | Business rules encoded incorrectly, implicit assumptions | Magic number detection, knowledge-concentration metrics | Embed domain rules in code using ubiquitous language (DDD). Pair programming to spread knowledge. Living documentation generated from code.
Integration boundaries | Interface mismatches, wrong assumptions about upstream behavior | Consumer-driven contract tests, schema validation | Contract tests mandatory per boundary. API-first design. Document behavioral contracts, not just data schemas.
Untested edge cases | Null handling, boundary values, error paths | Mutation testing, branch coverage thresholds, property-based testing | Require a test for every bug fix. Adopt property-based testing for logic with many input permutations. Boundary value analysis as a standard practice.
Unintended side effects | Change to module A breaks module B | Mutation testing, change impact analysis | Small focused commits. Trunk-based development (integrate daily so side effects surface immediately). Modular design with clear boundaries.
Accumulated complexity | Defects cluster in the most complex, most-changed files | Complexity trends, duplication scoring, dependency cycle detection | Refactoring as part of every story, not deferred to a “tech debt sprint.” Dedicated complexity budget.
Long-lived branches | Merge conflicts, integration failures, stale code | Branch age alerts, merge conflict frequency | Trunk-based development. Merge at least daily. CI rejects stale branches.
Configuration drift | Works in staging, fails in production | IaC drift detection, environment comparison, smoke tests | All infrastructure as code. Same provisioning for every environment. Immutable infrastructure.
Data assumptions | Null pointer exceptions, schema migration failures | Null safety static analysis, schema compatibility checks, migration dry-runs | Enforce null-safe types. Expand-then-contract for all schema changes.

Build a defect feedback loop

Knowing the categories is not enough. You need a process that systematically connects test failures to root causes and root causes to systemic fixes.

Step 1: Classify every defect. When a test fails or a bug is reported, tag it with its origin category from the table above. This takes seconds and builds a dataset over time.

Step 2: Look for patterns. Monthly (or during retrospectives), review the defect classifications. Which categories appear most often? That is where your process is weakest.

Step 3: Apply the systemic fix, not just the local fix. When you fix a bug, also ask: what systemic change would prevent this entire category of bug? If most defects come from integration boundaries, the fix is not “write more integration tests” - it is “make contract tests mandatory for every new boundary.” If most defects come from untested edge cases, the fix is not “increase code coverage” - it is “adopt property-based testing as a standard practice.”

Step 4: Measure whether the fix works. Track defect counts by category over time. If you applied a systemic fix for integration boundary defects and the count does not drop, the fix is not working and you need a different approach.

The test-for-every-bug-fix rule

One of the most effective systemic practices: every bug fix must include a test that reproduces the bug before the fix and passes after. This is non-negotiable for CD because:

  • It proves the fix actually addresses the defect (not just the symptom).
  • It prevents the same defect from recurring.
  • It builds test coverage exactly where the codebase is weakest - the places where bugs actually occur.
  • Over time, it shifts your test suite from “tests we thought to write” to “tests that cover real failure modes.”
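
In practice the rule looks like this: a test named after the defect, written to fail on the old code and pass on the fix. The bug number and discount logic below are hypothetical.

def apply_discount(total, percent):
    # Fix for bug #1234: the percentage was previously applied twice
    # for discounts above 50%.
    return round(total * (1 - percent / 100), 2)

def test_discount_applied_once_for_large_percentages_bug_1234():
    assert apply_discount(200.00, 60) == 80.00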

Advanced detection techniques

As your test architecture matures, add techniques that find defects humans overlook:

Technique | What It Finds | When to Adopt
Mutation testing (Stryker, PIT) | Tests that pass but do not actually verify behavior - your test suite’s blind spots | When basic coverage is in place but defect escape rate is not dropping
Property-based testing | Edge cases and boundary conditions across large input spaces that example-based tests miss | When defects cluster around unexpected input combinations
Chaos engineering | Failure modes in distributed systems - what happens when a dependency is slow, returns errors, or disappears | When you have functional tests and contract tests in place and need confidence in failure handling
Static analysis and linting | Null safety violations, type errors, security vulnerabilities, dead code | From day one - these are cheap and fast

For more examples of mapping defect origins to detection methods and systemic corrections, see the CD Defect Detection and Remediation Patterns.

Measuring Success

Metric | Target | Why It Matters
Deterministic suite duration | < 10 minutes | Enables fast feedback loops
Flaky test count | 0 in pipeline-gating suite | Maintains trust in test results
External dependencies in gating tests | 0 | Ensures deployment independence
Test coverage trend | Increasing | Confirms new code is being tested
Defect escape rate | Decreasing | Confirms tests catch real bugs
Contract test freshness | All passing within last 24 hours | Confirms test doubles are current

Next Step

With a reliable test suite in place, automate your build process so that building, testing, and packaging happens with a single command. Continue to Build Automation.


This content is adapted from the Dojo Consortium, licensed under CC BY 4.0. Additional concepts drawn from Ham Vocke, The Practical Test Pyramid, and Toby Clemson, Testing Strategies in a Microservice Architecture.

3 - Build Automation

Automate your build process so a single command builds, tests, and packages your application.

Phase 1 - Foundations | Adapted from Dojo Consortium

Build automation is the mechanism that turns trunk-based development and testing into a continuous integration loop. If you cannot build, test, and package your application with a single command, you cannot automate your pipeline. This page covers the practices that make your build reproducible, fast, and trustworthy.

What Build Automation Means

Build automation is the practice of scripting every step required to go from source code to a deployable artifact. A single command - or a single CI trigger - should execute the entire sequence:

  1. Compile the source code (if applicable)
  2. Run all automated tests
  3. Package the application into a deployable artifact (container image, binary, archive)
  4. Report the result (pass or fail, with details)

No manual steps. No “run this script, then do that.” No tribal knowledge about which flags to set or which order to run things. One command, every time, same result.

The Litmus Test

Ask yourself: “Can a new team member clone the repository and produce a deployable artifact with a single command within 15 minutes?”

If the answer is no, your build is not fully automated.

Why Build Automation Matters for CD

CD Requirement | How Build Automation Supports It
Reproducibility | The same commit always produces the same artifact, on any machine
Speed | Automated builds can be optimized, cached, and parallelized
Confidence | If the build passes, the artifact is trustworthy
Developer experience | Developers run the same build locally that CI runs, eliminating “works on my machine”
Pipeline foundation | The CI/CD pipeline is just the build running automatically on every commit

Without build automation, every other practice in this guide breaks down. You cannot have continuous integration if the build requires manual intervention. You cannot have a deterministic pipeline if the build produces different results depending on who runs it.

Key Practices

1. Version-Controlled Build Scripts

Your build configuration lives in the same repository as your code. It is versioned, reviewed, and tested alongside the application.

What belongs in version control:

  • Build scripts (Makefile, build.gradle, package.json scripts, Dockerfile)
  • Dependency manifests (requirements.txt, go.mod, pom.xml, package-lock.json)
  • CI/CD pipeline definitions (.github/workflows, .gitlab-ci.yml, Jenkinsfile)
  • Environment setup scripts (docker-compose.yml for local development)

What does not belong in version control:

  • Secrets and credentials (use secret management tools)
  • Environment-specific configuration values (use environment variables or config management)
  • Generated artifacts (build outputs, compiled binaries)

Anti-pattern: Build instructions that exist only in a wiki, a Confluence page, or one developer’s head. If the build steps are not in the repository, they will drift from reality.

2. Dependency Management

All dependencies must be declared explicitly and resolved deterministically.

Practices:

  • Lock files: Use lock files (package-lock.json, Pipfile.lock, go.sum) to pin exact dependency versions. Check lock files into version control.
  • Reproducible resolution: Running the dependency install twice should produce identical results.
  • No undeclared dependencies: Your build should not rely on tools or libraries that happen to be installed on the build machine. If you need it, declare it.
  • Dependency scanning: Automate vulnerability scanning of dependencies as part of the build. Do not wait for a separate security review.

Anti-pattern: “It builds on Jenkins because Jenkins has Java 11 installed, but the Dockerfile uses Java 17.” The build must declare and control its own runtime.

3. Build Caching

Fast builds keep developers in flow. Caching is the primary mechanism for build speed.

What to cache:

  • Dependencies: Download once, reuse across builds. Most build tools (npm, Maven, Gradle, pip) support a local cache.
  • Compilation outputs: Incremental compilation avoids rebuilding unchanged modules.
  • Docker layers: Structure your Dockerfile so that rarely-changing layers (OS, dependencies) are cached and only the application code layer is rebuilt.
  • Test fixtures: Prebuilt test data or container images used by tests.

Guidelines:

  • Cache aggressively for local development and CI
  • Invalidate caches when dependencies or build configuration change
  • Do not cache test results - tests must always run

4. Single Build Script Entry Point

Developers, CI, and CD should all use the same entry point.

# Example: Makefile as the single entry point

GIT_SHA := $(shell git rev-parse --short HEAD)

.PHONY: all build test package clean

all: build test package

build:
	./gradlew compileJava

test:
	./gradlew test

package:
	docker build -t myapp:$(GIT_SHA) .

clean:
	./gradlew clean
	docker rmi myapp:$(GIT_SHA) || true

The CI server runs make all. A developer runs make all. The result is the same. There is no separate “CI build script” that diverges from what developers run locally.

5. Artifact Versioning

Every build artifact must be traceable to the exact commit that produced it.

Practices:

  • Tag artifacts with the Git commit SHA or a build number derived from it
  • Store build metadata (commit, branch, timestamp, builder) in the artifact or alongside it
  • Never overwrite an existing artifact - if the version exists, the artifact is immutable

This becomes critical in Phase 2 when you establish immutable artifact practices.

CI Server Setup Basics

The CI server is the mechanism that runs your build automatically. In Phase 1, the setup is straightforward:

What the CI Server Does

  1. Watches the trunk for new commits
  2. Runs the build (the same command a developer would run locally)
  3. Reports the result (pass/fail, test results, build duration)
  4. Notifies the team if the build fails

Minimum CI Configuration

Regardless of which CI tool you use (GitHub Actions, GitLab CI, Jenkins, CircleCI), the configuration follows the same pattern:

# Conceptual CI configuration (adapt to your tool)
trigger:
  branch: main  # Run on every commit to trunk

steps:
  - checkout: source code
  - install: dependencies
  - run: build
  - run: tests
  - run: package
  - report: test results and build status

CI Principles for Phase 1

  • Run on every commit. Not nightly, not weekly, not “when someone remembers.” Every commit to trunk triggers a build.
  • Keep the build green. A failing build is the team’s top priority. Work stops until trunk is green again. (See Working Agreements.)
  • Run the same build everywhere. The CI server runs the same script as local development. No CI-only steps that developers cannot reproduce.
  • Fail fast. Run the fastest checks first (compilation, unit tests) before the slower ones (integration tests, packaging).

Build Time Targets

Build speed directly affects developer productivity and integration frequency. If the build takes 30 minutes, developers will not integrate multiple times per day.

Build Phase | Target | Rationale
Compilation | < 1 minute | Developers need instant feedback on syntax and type errors
Unit tests | < 3 minutes | Fast enough to run before every commit
Integration tests | < 5 minutes | Must complete before the developer context-switches
Full build (compile + test + package) | < 10 minutes | The outer bound for fast feedback

If Your Build Is Too Slow

Slow builds are a common constraint that blocks CD adoption. Address them systematically:

  1. Profile the build. Identify which steps take the most time. Optimize the bottleneck, not everything.
  2. Parallelize tests. Most test frameworks support parallel execution. Run independent test suites concurrently.
  3. Use build caching. Avoid recompiling or re-downloading unchanged dependencies.
  4. Split the build. Run fast checks (lint, compile, unit tests) as a “fast feedback” stage. Run slower checks (integration tests, security scans) as a second stage.
  5. Upgrade build hardware. Sometimes the fastest optimization is more CPU and RAM.

The target is under 10 minutes for the feedback loop that developers use on every commit. Longer-running validation (E2E tests, performance tests) can run in a separate stage.

Common Anti-Patterns

Manual Build Steps

Symptom: The build process includes steps like “open this tool and click Run” or “SSH into the build server and execute this script.”

Problem: Manual steps are error-prone, slow, and cannot be parallelized or cached. They are the single biggest obstacle to build automation.

Fix: Script every step. If a human must perform the step today, write a script that performs it tomorrow.

Environment-Specific Builds

Symptom: The build produces different artifacts for different environments (dev, staging, production). Or the build only works on specific machines because of pre-installed tools.

Problem: Environment-specific builds mean you are not testing the same artifact you deploy. Bugs that appear in production but not in staging become impossible to diagnose.

Fix: Build one artifact and configure it per environment at deployment time. The artifact is immutable; the configuration is external. (See Application Config in Phase 2.)

Build Scripts That Only Run in CI

Symptom: The CI pipeline has build steps that developers cannot run locally. Local development uses a different build process.

Problem: Developers cannot reproduce CI failures locally, leading to slow debugging cycles and “push and pray” development.

Fix: Use a single build entry point (Makefile, build script) that both CI and developers use. CI configuration should only add triggers and notifications, not build logic.

Missing Dependency Pinning

Symptom: Builds break randomly because a dependency released a new version overnight.

Problem: Without pinned dependencies, the build is non-deterministic. The same code can produce different results on different days.

Fix: Use lock files. Pin all dependency versions. Update dependencies intentionally, not accidentally.

Long Build Queues

Symptom: Developers commit to trunk, but the build does not run for 20 minutes because the CI server is processing a queue.

Problem: Delayed feedback defeats the purpose of CI. If developers do not see the result of their commit for 30 minutes, they have already moved on.

Fix: Ensure your CI infrastructure can handle your team’s commit frequency. Use parallel build agents. Prioritize builds on the main branch.

Measuring Success

Metric | Target | Why It Matters
Build duration | < 10 minutes | Enables fast feedback and frequent integration
Build success rate | > 95% | Indicates reliable, reproducible builds
Time from commit to build result | < 15 minutes (including queue time) | Measures the full feedback loop
Developer ability to build locally | 100% of team | Confirms the build is portable and documented

Next Step

With build automation in place, you can build, test, and package your application reliably. The next foundation is ensuring that the work you integrate daily is small enough to be safe. Continue to Work Decomposition.


This content is adapted from the Dojo Consortium, licensed under CC BY 4.0.

4 - Work Decomposition

Break features into small, deliverable increments that can be completed in 2 days or less.

Phase 1 - Foundations | Adapted from Dojo Consortium

Trunk-based development requires daily integration, and daily integration requires small work. If a feature takes two weeks to build, you cannot integrate it daily without decomposing it first. This page covers the techniques for breaking work into small, deliverable increments that flow through your pipeline continuously.

Why Small Work Matters for CD

Continuous delivery depends on a simple equation: small changes, integrated frequently, are safer than large changes integrated rarely.

Every practice in Phase 1 reinforces this:

  • Trunk-based development requires that you integrate at least daily. You cannot integrate a two-week feature daily unless you decompose it.
  • Testing fundamentals work best when each change is small enough to test thoroughly.
  • Code review is fast when the change is small. A 50-line change can be reviewed in minutes. A 2,000-line change takes hours - if it gets reviewed at all.

The data supports this. The DORA research consistently shows that smaller batch sizes correlate with higher delivery performance. Small changes have:

  • Lower risk: If a small change breaks something, the blast radius is limited, and the cause is obvious.
  • Faster feedback: A small change gets through the pipeline quickly. You learn whether it works today, not next week.
  • Easier rollback: Rolling back a 50-line change is straightforward. Rolling back a 2,000-line change often requires a new deployment.
  • Better flow: Small work items move through the system predictably. Large work items block queues and create bottlenecks.

The 2-Day Rule

If a work item takes longer than 2 days to complete, it is too big.

This is not arbitrary. Two days gives you at least one integration to trunk per day (the minimum for TBD) and allows for the natural rhythm of development: plan, implement, test, integrate, move on.

When a developer says “this will take a week,” the answer is not “go faster.” The answer is “break it into smaller pieces.”

What “Complete” Means

A work item is complete when it is:

  • Integrated to trunk
  • All tests pass
  • The change is deployable (even if the feature is not yet user-visible)
  • It meets the Definition of Done

If a story requires a feature flag to hide incomplete user-facing behavior, that is fine. The code is still integrated, tested, and deployable.

Story Slicing Techniques

Story slicing is the practice of breaking user stories into the smallest possible increments that still deliver value or make progress toward delivering value.

The INVEST Criteria

Good stories follow INVEST:

Criterion | Meaning | Why It Matters for CD
Independent | Can be developed and deployed without waiting for other stories | Enables parallel work and avoids blocking
Negotiable | Details can be discussed and adjusted | Allows the team to find the smallest valuable slice
Valuable | Delivers something meaningful to the user or the system | Prevents “technical stories” that do not move the product forward
Estimable | Small enough that the team can reasonably estimate it | Large stories are unestimable because they hide unknowns
Small | Completable within 2 days | Enables daily integration and fast feedback
Testable | Has clear acceptance criteria that can be automated | Supports the testing foundation

Vertical Slicing

The most important slicing technique for CD is vertical slicing: cutting through all layers of the application to deliver a thin but complete slice of functionality.

Vertical slice (correct):

“As a user, I can log in with my email and password.”

This slice touches the UI (login form), the API (authentication endpoint), and the database (user lookup). It is deployable and testable end-to-end.

Horizontal slice (anti-pattern):

“Build the database schema for user accounts.” “Build the authentication API.” “Build the login form UI.”

Each horizontal slice is incomplete on its own. None is deployable. None is testable end-to-end. They create dependencies between work items and block flow.

Slicing Strategies

When a story feels too big, apply one of these strategies:

Strategy | How It Works | Example
By workflow step | Implement one step of a multi-step process | “User can add items to cart” (before “user can checkout”)
By business rule | Implement one rule at a time | “Orders over $100 get free shipping” (before “orders ship to international addresses”)
By data variation | Handle one data type first | “Support credit card payments” (before “support PayPal”)
By operation | Implement CRUD operations separately | “Create a new customer” (before “edit customer” or “delete customer”)
By performance | Get it working first, optimize later | “Search returns results” (before “search returns results in under 200ms”)
By platform | Support one platform first | “Works on desktop web” (before “works on mobile”)
Happy path first | Implement the success case first | “User completes checkout” (before “user sees error when payment fails”)

Example: Decomposing a Feature

Original story (too big):

“As a user, I can manage my profile including name, email, avatar, password, notification preferences, and two-factor authentication.”

Decomposed into vertical slices:

  1. “User can view their current profile information” (read-only display)
  2. “User can update their name” (simplest edit)
  3. “User can update their email with verification” (adds email flow)
  4. “User can upload an avatar image” (adds file handling)
  5. “User can change their password” (adds security validation)
  6. “User can configure notification preferences” (adds preferences)
  7. “User can enable two-factor authentication” (adds 2FA flow)

Each slice is independently deployable, testable, and completable within 2 days. Each delivers incremental value. The feature is built up over a series of small deliveries rather than one large batch.

BDD as a Decomposition Tool

Behavior-Driven Development (BDD) is not just a testing practice - it is a powerful tool for decomposing work into small, clear increments.

Three Amigos

Before work begins, hold a brief “Three Amigos” session with three perspectives:

  • Business/Product: What should this feature do? What is the expected behavior?
  • Development: How will we build it? What are the technical considerations?
  • Testing: How will we verify it? What are the edge cases?

This 15-30 minute conversation accomplishes two things:

  1. Shared understanding: Everyone agrees on what “done” looks like before work begins.
  2. Natural decomposition: Discussing specific scenarios reveals natural slice boundaries.

Specification by Example

Write acceptance criteria as concrete examples, not abstract requirements.

Abstract (hard to slice):

“The system should validate user input.”

Concrete (easy to slice):

  • Given an email field, when the user enters “not-an-email”, then the form shows “Please enter a valid email address.”
  • Given a password field, when the user enters fewer than 8 characters, then the form shows “Password must be at least 8 characters.”
  • Given a name field, when the user leaves it blank, then the form shows “Name is required.”

Each concrete example can become its own story or task. The scope is clear, the acceptance criteria are testable, and the work is small.

Given-When-Then Format

Structure acceptance criteria in Given-When-Then format to make them executable:

Feature: User login

  Scenario: Successful login with valid credentials
    Given a registered user with email "user@example.com"
    When they enter their correct password and click "Log in"
    Then they are redirected to the dashboard

  Scenario: Failed login with wrong password
    Given a registered user with email "user@example.com"
    When they enter an incorrect password and click "Log in"
    Then they see the message "Invalid email or password"
    And they remain on the login page

Each scenario is a natural unit of work. Implement one scenario at a time, integrate to trunk after each one.
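
To make those scenarios executable, each step maps to a small step definition. A sketch assuming the behave library; context.app stands in for whatever driver your team uses (API client, browser automation, and so on).

from behave import given, then, when

@given('a registered user with email "{email}"')
def step_registered_user(context, email):
    context.user = context.app.register(email, password="correct-password")

@when('they enter their correct password and click "Log in"')
def step_login_with_correct_password(context):
    context.response = context.app.login(context.user.email, "correct-password")

@then("they are redirected to the dashboard")
def step_redirected_to_dashboard(context):
    assert context.response.redirect_target == "/dashboard"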

Task Decomposition Within Stories

Even well-sliced stories may contain multiple tasks. Decompose stories into tasks that can be completed and integrated independently.

Example story: “User can update their name”

Tasks:

  1. Add the name field to the profile API endpoint (backend change, integration test)
  2. Add the name field to the profile form (frontend change, unit test)
  3. Connect the form to the API endpoint (integration, E2E test)

Each task results in a commit to trunk. The story is completed through a series of small integrations, not one large merge.

Guidelines for task decomposition:

  • Each task should take hours, not days
  • Each task should leave trunk in a working state after integration
  • Tasks should be ordered so that the simplest changes come first
  • If a task requires a feature flag or stub to be integrated safely, that is fine (a minimal flag sketch follows this list)
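
A feature flag does not need to be sophisticated to make a half-finished task safe to integrate. The sketch below assumes a hypothetical flag named profile_name_edit controlled by an environment variable; most teams eventually move to a config file or a flag service (LaunchDarkly, Unleash, or similar), but the principle is the same: the code ships to trunk, and the new behavior stays dark until the flag is turned on.

import os

def is_enabled(flag_name):
    # Simplistic flag check driven by an environment variable; a real team
    # would typically read from a flag service or a config file instead.
    return os.environ.get(f"FEATURE_{flag_name.upper()}", "false") == "true"

def render_profile_page(user):
    page = {"name": user["name"]}
    if is_enabled("profile_name_edit"):
        # New, still-incomplete behavior is integrated to trunk but hidden
        # behind the flag until the whole story is done.
        page["name_edit_form"] = True
    return page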

Common Anti-Patterns

Horizontal Slicing

Symptom: Stories are organized by architectural layer: “build the database schema,” “build the API,” “build the UI.”

Problem: No individual slice is deployable or testable end-to-end. Integration happens at the end, which is where bugs are found and schedules slip.

Fix: Slice vertically. Every story should touch all the layers needed to deliver a thin slice of complete functionality.

Technical Stories

Symptom: The backlog contains stories like “refactor the database access layer” or “upgrade to React 18” that do not deliver user-visible value.

Problem: Technical work is important, but when it is separated from feature work, it becomes hard to prioritize and easy to defer. It also creates large, risky changes.

Fix: Embed technical improvements in feature stories. Refactor as you go. If a technical change is necessary, tie it to a specific business outcome and keep it small enough to complete in 2 days.

Stories That Are Really Epics

Symptom: A story has 10+ acceptance criteria, or the estimate is “8 points” or “2 weeks.”

Problem: Large stories hide unknowns, resist estimation, and cannot be integrated daily.

Fix: If a story has more than 3-5 acceptance criteria, it is an epic. Break it into smaller stories using the slicing strategies above.

Splitting by Role Instead of by Behavior

Symptom: Separate stories for “frontend developer builds the UI” and “backend developer builds the API.”

Problem: This creates handoff dependencies and delays integration. The feature is not testable until both stories are complete.

Fix: Write stories from the user’s perspective. The same developer (or pair) implements the full vertical slice.

Deferring “Edge Cases” Indefinitely

Symptom: The team builds the happy path and creates a backlog of “handle error case X” stories that never get prioritized.

Problem: Error handling is not optional. Unhandled edge cases become production incidents.

Fix: Include the most important error cases in the initial story decomposition. Use the “happy path first” slicing strategy, but schedule edge case stories immediately after, not “someday.”

Measuring Success

| Metric | Target | Why It Matters |
| --- | --- | --- |
| Story cycle time | < 2 days from start to trunk | Confirms stories are small enough |
| Development cycle time | Decreasing | Shows improved flow from smaller work |
| Stories completed per week | Increasing (with same team size) | Indicates better decomposition and less rework |
| Work in progress | Decreasing | Fewer large stories blocking the pipeline |

Next Step

Small, well-decomposed work flows through the system quickly - but only if code review does not become a bottleneck. Continue to Code Review to learn how to keep review fast and effective.


This content is adapted from the Dojo Consortium, licensed under CC BY 4.0.

5 - Code Review

Streamline code review to provide fast feedback without blocking flow.

Phase 1 - Foundations | Adapted from Dojo Consortium

Code review is essential for quality, but it is also the most common bottleneck in teams adopting trunk-based development. If reviews take days, daily integration is impossible. This page covers review techniques that maintain quality while enabling the flow that CD requires.

Why Code Review Matters for CD

Code review serves multiple purposes:

  • Defect detection: A second pair of eyes catches bugs that the author missed.
  • Knowledge sharing: Reviews spread understanding of the codebase across the team.
  • Consistency: Reviews enforce coding standards and architectural patterns.
  • Mentoring: Junior developers learn by having their code reviewed and by reviewing others’ code.

These are real benefits. The challenge is that traditional code review - open a pull request, wait for someone to review it, address comments, wait again - is too slow for CD.

In a CD workflow, code review must happen within minutes or hours, not days. The review is still rigorous, but the process is designed for speed.

The Core Tension: Quality vs. Flow

Traditional teams optimize review for thoroughness: detailed comments, multiple reviewers, extensive back-and-forth. This produces high-quality reviews but blocks flow.

CD teams optimize review for speed without sacrificing the quality that matters. The key insight is that most of the quality benefit of code review comes from small, focused reviews done quickly, not from exhaustive reviews done slowly.

| Traditional Review | CD-Compatible Review |
| --- | --- |
| Review happens after the feature is complete | Review happens continuously throughout development |
| Large diffs (hundreds or thousands of lines) | Small diffs (< 200 lines, ideally < 50) |
| Multiple rounds of feedback and revision | One round, or real-time feedback during pairing |
| Review takes 1-3 days | Review takes minutes to a few hours |
| Review is asynchronous by default | Review is synchronous by preference |
| 2+ reviewers required | 1 reviewer (or pairing as the review) |

Synchronous vs. Asynchronous Review

Synchronous Review (Preferred for CD)

In synchronous review, the reviewer and author are engaged at the same time. Feedback is immediate. Questions are answered in real time. The review is done when the conversation ends.

Methods:

  • Pair programming: Two developers work on the same code at the same time. Review is continuous. There is no separate review step because the code was reviewed as it was written.
  • Mob programming: The entire team (or a subset) works on the same code together. Everyone reviews in real time.
  • Over-the-shoulder review: The author walks the reviewer through the change in person or on a video call. The reviewer asks questions and provides feedback immediately.

Advantages for CD:

  • Zero wait time between “ready for review” and “review complete”
  • Higher bandwidth communication (tone, context, visual cues) catches more issues
  • Immediate resolution of questions - no async back-and-forth
  • Knowledge transfer happens naturally through the shared work

Asynchronous Review (When Necessary)

Sometimes synchronous review is not possible - time zones, schedules, or team preferences may require asynchronous review. This is fine, but it must be fast.

Rules for async review in a CD workflow:

  • Review within 2 hours. If a pull request sits for a day, it blocks integration. Set a team working agreement: “pull requests are reviewed within 2 hours during working hours.”
  • Keep changes small. A 50-line change can be reviewed in 5 minutes. A 500-line change takes an hour and reviewers procrastinate on it.
  • Use draft PRs for early feedback. If you want feedback on an approach before the code is complete, open a draft PR. Do not wait until the change is “perfect.”
  • Avoid back-and-forth. If a comment requires discussion, move to a synchronous channel (call, chat). Async comment threads that go 5 rounds deep are a sign the change is too large or the design was not discussed upfront.

Review Techniques Compatible with TBD

Pair Programming as Review

When two developers pair on a change, the code is reviewed as it is written. There is no separate review step, no pull request waiting for approval, and no delay to integration.

How it works with TBD:

  1. Two developers sit together (physically or via screen share)
  2. They discuss the approach, write the code, and review each other’s decisions in real time
  3. When the change is ready, they commit to trunk together
  4. Both developers are accountable for the quality of the code

When to pair:

  • New or unfamiliar areas of the codebase
  • Changes that affect critical paths
  • When a junior developer is working on a change (pairing doubles as mentoring)
  • Any time the change involves design decisions that benefit from discussion

Pair programming satisfies most organizations’ code review requirements because two developers have actively reviewed and approved the code.

Mob Programming as Review

Mob programming extends pairing to the whole team. One person drives (types), one person navigates (directs), and the rest observe and contribute.

When to mob:

  • Establishing new patterns or architectural decisions
  • Complex changes that benefit from multiple perspectives
  • Onboarding new team members to the codebase
  • Working through particularly difficult problems

Mob programming is intensive but highly effective. Every team member understands the code, the design decisions, and the trade-offs.

Rapid Async Review

For teams that use pull requests, rapid async review adapts the pull request workflow for CD speed.

Practices:

  • Auto-assign reviewers. Do not wait for someone to volunteer. Use tools to automatically assign a reviewer when a PR is opened.
  • Keep PRs small. Target < 200 lines of changed code. Smaller PRs get reviewed faster and more thoroughly.
  • Provide context. Write a clear PR description that explains what the change does, why it is needed, and how to verify it. A good description reduces review time dramatically.
  • Use automated checks. Run linting, formatting, and tests before the human review. The reviewer should focus on logic and design, not style.
  • Approve and merge quickly. If the change looks correct, approve it. Do not hold it for nitpicks. Nitpicks can be addressed in a follow-up commit.

What to Review

Not everything in a code change deserves the same level of scrutiny. Focus reviewer attention where it matters most.

High Priority (Reviewer Should Focus Here)

  • Behavior correctness: Does the code do what it is supposed to do? Are edge cases handled?
  • Security: Does the change introduce vulnerabilities? Are inputs validated? Are secrets handled properly?
  • Clarity: Can another developer understand this code in 6 months? Are names clear? Is the logic straightforward?
  • Test coverage: Are the new behaviors tested? Do the tests verify the right things?
  • API contracts: Do changes to public interfaces maintain backward compatibility? Are they documented?
  • Error handling: What happens when things go wrong? Are errors caught, logged, and surfaced appropriately?

Low Priority (Automate Instead of Reviewing)

  • Code style and formatting: Use automated formatters (Prettier, Black, gofmt). Do not waste reviewer time on indentation and bracket placement.
  • Import ordering: Automate with linting rules.
  • Naming conventions: Enforce with lint rules where possible. Only flag naming in review if it genuinely harms readability.
  • Unused variables or imports: Static analysis tools catch these instantly.
  • Consistent patterns: Where possible, encode patterns in architecture decision records and lint rules rather than relying on reviewers to catch deviations.

Rule of thumb: If a style or convention issue can be caught by a machine, do not ask a human to catch it. Reserve human attention for the things machines cannot evaluate: correctness, design, clarity, and security.

Review Scope for Small Changes

In a CD workflow, most changes are small - tens of lines, not hundreds. This changes the economics of review.

| Change Size | Expected Review Time | Review Depth |
| --- | --- | --- |
| < 20 lines | 2-5 minutes | Quick scan: is it correct? Any security issues? |
| 20-100 lines | 5-15 minutes | Full review: behavior, tests, clarity |
| 100-200 lines | 15-30 minutes | Detailed review: design, contracts, edge cases |
| > 200 lines | Consider splitting the change | Large changes get superficial reviews |

Research consistently shows that reviewer effectiveness drops sharply after 200-400 lines. If you are regularly reviewing changes larger than 200 lines, the problem is not the review process - it is the work decomposition.

Working Agreements for Review SLAs

Establish clear team agreements about review expectations. Without explicit agreements, review latency will drift based on individual habits.

| Agreement | Target |
| --- | --- |
| Response time | Review within 2 hours during working hours |
| Reviewer count | 1 reviewer (or pairing as the review) |
| PR size | < 200 lines of changed code |
| Blocking issues only | Only block a merge for correctness, security, or significant design issues |
| Nitpicks | Use a “nit:” prefix. Nitpicks are suggestions, not merge blockers |
| Stale PRs | PRs open for > 24 hours are escalated to the team |
| Self-review | Author reviews their own diff before requesting review |

How to Enforce Review SLAs

  • Track review turnaround time. If it consistently exceeds 2 hours, discuss it in retrospectives (a rough sketch of an automated PR-age check follows this list).
  • Make review a first-class responsibility, not something developers do “when they have time.”
  • If a reviewer is unavailable, any other team member can review. Do not create single-reviewer dependencies.
  • Consider pairing as the default and async review as the exception. This eliminates the review bottleneck entirely.
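
The PR-age check mentioned in the first item can run as a small scheduled job in the pipeline or behind a chat bot. Below is a rough sketch using the GitHub REST API via the requests library; the owner, repository, 24-hour threshold, and GITHUB_TOKEN environment variable are placeholders for your own setup.

import os
from datetime import datetime, timezone

import requests

OWNER, REPO = "your-org", "your-repo"   # placeholders
MAX_AGE_HOURS = 24

def stale_pull_requests():
    response = requests.get(
        f"https://api.github.com/repos/{OWNER}/{REPO}/pulls",
        params={"state": "open"},
        headers={"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}"},
        timeout=10,
    )
    response.raise_for_status()
    now = datetime.now(timezone.utc)
    for pr in response.json():
        opened = datetime.strptime(
            pr["created_at"], "%Y-%m-%dT%H:%M:%SZ"
        ).replace(tzinfo=timezone.utc)
        age_hours = (now - opened).total_seconds() / 3600
        if age_hours > MAX_AGE_HOURS:
            yield pr["number"], pr["title"], round(age_hours, 1)

if __name__ == "__main__":
    # Print PRs that have exceeded the team's agreed maximum age.
    for number, title, age in stale_pull_requests():
        print(f"PR #{number} open for {age}h: {title}")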

Code Review and Trunk-Based Development

Code review and TBD work together, but only if review does not block integration. Here is how to reconcile them:

| TBD Requirement | How Review Adapts |
| --- | --- |
| Integrate to trunk at least daily | Reviews must complete within hours, not days |
| Branches live < 24 hours | PRs are opened and merged within the same day |
| Trunk is always releasable | Reviewers focus on correctness, not perfection |
| Small, frequent changes | Small changes are reviewed quickly and thoroughly |

If your team finds that review is the bottleneck preventing daily integration, the most effective solution is to adopt pair programming. It eliminates the review step entirely by making review continuous.

Measuring Success

| Metric | Target | Why It Matters |
| --- | --- | --- |
| Review turnaround time | < 2 hours | Prevents review from blocking integration |
| PR size (lines changed) | < 200 lines | Smaller PRs get faster, more thorough reviews |
| PR age at merge | < 24 hours | Aligns with TBD branch age constraint |
| Review rework cycles | < 2 rounds | Multiple rounds indicate the change is too large or design was not discussed upfront |

Next Step

Code review practices need to be codified in team agreements alongside other shared commitments. Continue to Working Agreements to establish your team’s definitions of done, ready, and CI practice.


This content is adapted from the Dojo Consortium, licensed under CC BY 4.0.

6 - Working Agreements

Establish shared definitions of done and ready to align the team on quality and process.

Phase 1 - Foundations | Adapted from Dojo Consortium

The practices in Phase 1 - trunk-based development, testing, small work, and fast review - only work when the whole team commits to them. Working agreements make that commitment explicit. This page covers the key agreements a team needs before moving to pipeline automation in Phase 2.

Why Working Agreements Matter

A working agreement is a shared commitment that the team creates, owns, and enforces together. It is not a policy imposed from outside. It is the team’s own answer to the question: “How do we work together?”

Without working agreements, CD practices drift. One developer integrates daily; another keeps a branch for a week. One developer fixes a broken build immediately; another waits until after lunch. These inconsistencies compound. Within weeks, the team is no longer practicing CD - they are practicing individual preferences.

Working agreements prevent this drift by making expectations explicit. When everyone agrees on what “done” means, what “ready” means, and how CI works, the team can hold each other accountable without conflict.

Definition of Done

The Definition of Done (DoD) is the team’s shared standard for when a work item is complete. For CD, the Definition of Done must include deployment.

Minimum Definition of Done for CD

A work item is done when all of the following are true:

  • Code is integrated to trunk
  • All automated tests pass
  • Code has been reviewed (via pairing, mob, or pull request)
  • The change is deployable to production
  • No known defects are introduced
  • Relevant documentation is updated (API docs, runbooks, etc.)
  • Feature flags are in place for incomplete user-facing features

Why “Deployed to Production” Matters

Many teams define “done” as “code is merged.” This creates a gap between “done” and “delivered.” Work accumulates in a staging environment, waiting for a release. Risk grows with each unreleased change.

In a CD organization, “done” means the change is in production (or ready to be deployed to production at any time). This is the ultimate test of completeness: the change works in the real environment, with real data, under real load.

In Phase 1, you may not yet have the pipeline to deploy every change to production automatically. That is fine - your DoD should still include “deployable to production” as the standard, even if the deployment step is not yet automated. The pipeline work in Phase 2 will close that gap.

Extending Your Definition of Done

As your CD maturity grows, extend the DoD:

| Phase | Addition to DoD |
| --- | --- |
| Phase 1 (Foundations) | Code integrated to trunk, tests pass, reviewed, deployable |
| Phase 2 (Pipeline) | Artifact built and validated by the pipeline |
| Phase 3 (Optimize) | Change deployed to production behind a feature flag |
| Phase 4 (Deliver on Demand) | Change deployed to production and monitored |

Definition of Ready

The Definition of Ready (DoR) answers: “When is a work item ready to be worked on?” Pulling unready work into development creates waste - unclear requirements lead to rework, missing acceptance criteria lead to untestable changes, and oversized stories lead to long-lived branches.

Minimum Definition of Ready for CD

A work item is ready when all of the following are true:

  • Acceptance criteria are defined and specific (using Given-When-Then or equivalent)
  • The work item is small enough to complete in 2 days or less
  • The work item is testable - the team knows how to verify it works
  • Dependencies are identified and resolved (or the work item is independent)
  • The team has discussed the work item (Three Amigos or equivalent)
  • The work item is estimated (or the team has agreed estimation is unnecessary for items this small)

Common Mistakes with Definition of Ready

  • Making it too rigid. The DoR is a guideline, not a gate. If the team agrees a work item is understood well enough, it is ready. Do not use the DoR to avoid starting work.
  • Requiring design documents. For small work items (< 2 days), a conversation and acceptance criteria are sufficient. Formal design documents are for larger initiatives.
  • Skipping the conversation. The DoR is most valuable as a prompt for discussion, not as a checklist. The Three Amigos conversation matters more than the checkboxes.

CI Working Agreement

The CI working agreement codifies how the team practices continuous integration. This is the most operationally critical working agreement for CD.

The CI Agreement

The team agrees to the following practices:

Integration:

  • Every developer integrates to trunk at least once per day
  • Branches (if used) live for less than 24 hours
  • No long-lived feature, development, or release branches

Build:

  • All tests must pass before merging to trunk
  • The build runs on every commit to trunk
  • Build results are visible to the entire team

Broken builds:

  • A broken build is the team’s top priority - it is fixed before any new work begins
  • The developer(s) who broke the build are responsible for fixing it immediately
  • If the fix will take more than 10 minutes, revert the change and fix it on a short-lived branch
  • No one commits to a broken trunk (except to fix the break)

Work in progress:

  • Finishing existing work takes priority over starting new work
  • The team limits work in progress to maintain flow
  • If a developer is blocked, they help a teammate before starting a new story

Why “Broken Build = Top Priority”

This is the single most important CI agreement. When the build is broken:

  • No one can integrate safely. Changes are stacking up.
  • Trunk is not releasable. The team has lost its safety net.
  • Every minute the build stays broken, the team accumulates risk.

“Fix the build” is not a suggestion. It is an agreement that the team enforces collectively. If the build is broken and someone starts a new feature instead of fixing it, the team should call that out. This is not punitive - it is the team protecting its own ability to deliver.

The Revert Rule

If a broken build cannot be fixed within 10 minutes, revert the offending commit and fix the issue on a branch. This keeps trunk green and unblocks the rest of the team. The developer who made the change is not being punished - they are protecting the team’s flow.

Reverting feels uncomfortable at first. Teams worry about “losing work.” But a reverted commit is not lost - the code is still in the Git history. The developer can re-apply their change after fixing the issue. The alternative - a broken trunk for hours while someone debugs - is far more costly.

How Working Agreements Support the CD Migration

Each working agreement maps directly to a Phase 1 practice:

| Practice | Supporting Agreement |
| --- | --- |
| Trunk-based development | CI agreement: daily integration, branch age < 24h |
| Testing fundamentals | DoD: all tests pass. CI: tests pass before merge |
| Build automation | CI: build runs on every commit. Broken build = top priority |
| Work decomposition | DoR: work items < 2 days. WIP limits |
| Code review | CI: review within 2 hours. DoD: code reviewed |

Without these agreements, individual practices exist in isolation. Working agreements connect them into a coherent way of working.

Template: Create Your Own Working Agreements

Use this template as a starting point. Customize it for your team’s context. The specific targets may differ, but the structure should remain.

Team Working Agreement Template

# [Team Name] Working Agreement
Date: [Date]
Participants: [All team members]

## Definition of Done
A work item is done when:
- [ ] Code is integrated to trunk
- [ ] All automated tests pass
- [ ] Code has been reviewed (method: [pair / mob / PR])
- [ ] The change is deployable to production
- [ ] No known defects are introduced
- [ ] [Add team-specific criteria]

## Definition of Ready
A work item is ready when:
- [ ] Acceptance criteria are defined (Given-When-Then)
- [ ] The item can be completed in [X] days or less
- [ ] The item is testable
- [ ] Dependencies are identified
- [ ] The team has discussed the item
- [ ] [Add team-specific criteria]

## CI Practices
- Integration frequency: at least [X] per developer per day
- Maximum branch age: [X] hours
- Review turnaround: within [X] hours
- Broken build response: fix within [X] minutes or revert
- WIP limit: [X] items per developer

## Review Practices
- Default review method: [pair / mob / async PR]
- PR size limit: [X] lines
- Review focus: [correctness, security, clarity]
- Style enforcement: [automated via linting]

## Meeting Cadence
- Standup: [time, frequency]
- Retrospective: [frequency]
- Working agreement review: [frequency, e.g., monthly]

## Agreement Review
This agreement is reviewed and updated [monthly / quarterly].
Any team member can propose changes at any time.
All changes require team consensus.

Tips for Creating Working Agreements

  1. Include everyone. Every team member should participate in creating the agreement. Agreements imposed by a manager or tech lead are policies, not agreements.
  2. Start simple. Do not try to cover every scenario. Start with the essentials (DoD, DoR, CI) and add specifics as the team identifies gaps.
  3. Make them visible. Post the agreements where the team sees them daily - on a team wiki, in the team channel, or on a physical board.
  4. Review regularly. Agreements should evolve as the team matures. Review them monthly. Remove agreements that are second nature. Add agreements for new challenges.
  5. Enforce collectively. Working agreements are only effective if the team holds each other accountable. This is a team responsibility, not a manager responsibility.
  6. Start with agreements you can keep. If the team is currently integrating once a week, do not agree to integrate three times daily. Agree to integrate daily, practice for a month, then tighten.

Measuring Success

| Metric | Target | Why It Matters |
| --- | --- | --- |
| Agreement adherence | Team self-reports > 80% adherence | Indicates agreements are realistic and followed |
| Agreement review frequency | Monthly | Ensures agreements stay relevant |
| Integration frequency | Meets CI agreement target | Validates the CI working agreement |
| Broken build fix time | Meets CI agreement target | Validates the broken build response agreement |

Next Step

With working agreements in place, your team has established the foundations for continuous delivery: daily integration, reliable testing, automated builds, small work, fast review, and shared commitments.

You are ready to move to Phase 2: Pipeline, where you will build the automated path from commit to production.


This content is adapted from the Dojo Consortium, licensed under CC BY 4.0.

7 - Everything as Code

Every artifact that defines your system - infrastructure, pipelines, configuration, database schemas, monitoring - belongs in version control and is delivered through pipelines.

Phase 1 - Foundations

If it is not in version control, it does not exist. If it is not delivered through a pipeline, it is a manual step. Manual steps block continuous delivery. This page establishes the principle that everything required to build, deploy, and operate your system is defined as code, version controlled, reviewed, and delivered through the same automated pipelines as your application.

The Principle

Continuous delivery requires that any change to your system - application code, infrastructure, pipeline configuration, database schema, monitoring rules, security policies - can be made through a single, consistent process: change the code, commit, let the pipeline deliver it.

When something is defined as code:

  • It is version controlled. You can see who changed what, when, and why. You can revert any change. You can trace any production state to a specific commit.
  • It is reviewed. Changes go through the same review process as application code. A second pair of eyes catches mistakes before they reach production.
  • It is tested. Automated validation catches errors before deployment. Linting, dry-runs, and policy checks apply to infrastructure the same way unit tests apply to application code.
  • It is reproducible. You can recreate any environment from scratch. Disaster recovery is “re-run the pipeline,” not “find the person who knows how to configure the server.”
  • It is delivered through a pipeline. No SSH, no clicking through UIs, no manual steps. The pipeline is the only path to production for everything, not just application code.

When something is not defined as code, it is a liability. It cannot be reviewed, tested, or reproduced. It exists only in someone’s head, a wiki page that is already outdated, or a configuration that was applied manually and has drifted from any documented state.

What “Everything” Means

Application code

This is where most teams start, and it is the least controversial. Your application source code is in version control, built and tested by a pipeline, and deployed as an immutable artifact.

If your application code is not in version control, start here. Nothing else in this page matters until this is in place.

Infrastructure

Every server, network, database instance, load balancer, DNS record, and cloud resource should be defined in code and provisioned through automation.

What this looks like:

  • Cloud resources defined in Terraform, Pulumi, CloudFormation, or similar tools
  • Server configuration managed by Ansible, Chef, Puppet, or container images
  • Network topology, firewall rules, and security groups defined declaratively
  • Environment creation is a pipeline run, not a ticket to another team

What this replaces:

  • Clicking through cloud provider consoles to create resources
  • SSH-ing into servers to install packages or change configuration
  • Filing tickets for another team to provision an environment
  • “Snowflake” servers that were configured by hand and nobody knows how to recreate

Why it matters for CD: If creating or modifying an environment requires manual steps, your deployment frequency is limited by the availability and speed of the person who performs those steps. If a production server fails and you cannot recreate it from code, your mean time to recovery is measured in hours or days instead of minutes.
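
As a concrete illustration, here is a minimal sketch using Pulumi's Python SDK, one of the tools listed above. The resource and tag names are placeholders; the workflow is identical with Terraform or CloudFormation: change the code, get it reviewed, and let the pipeline plan and apply it.

import pulumi
import pulumi_aws as aws

# An S3 bucket for build artifacts, declared in code instead of clicked
# together in a cloud console.
artifact_bucket = aws.s3.Bucket(
    "deploy-artifacts",
    tags={"team": "checkout", "managed-by": "pulumi"},
)

# Exported outputs become inputs for other stacks or pipeline steps.
pulumi.export("artifact_bucket_name", artifact_bucket.id)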

Pipeline definitions

Your CI/CD pipeline configuration belongs in the same repository as the code it builds and deploys. The pipeline is code, not a configuration applied through a UI.

What this looks like:

  • Pipeline definitions in .github/workflows/, .gitlab-ci.yml, Jenkinsfile, or equivalent
  • Pipeline changes go through the same review process as application code
  • Pipeline behavior is deterministic - the same commit always produces the same pipeline behavior
  • Teams can modify their own pipelines without filing tickets

What this replaces:

  • Pipeline configuration maintained through a Jenkins UI that nobody is allowed to touch
  • A “platform team” that owns all pipeline definitions and queues change requests
  • Pipeline behavior that varies depending on server state or installed plugins

Why it matters for CD: The pipeline is the path to production. If the pipeline itself cannot be changed through a reviewed, automated process, it becomes a bottleneck and a risk. Pipeline changes should flow with the same speed and safety as application changes.

Database schemas and migrations

Database schema changes should be defined as versioned migration scripts, stored in version control, and applied through the pipeline.

What this looks like:

  • Migration scripts in the repository (using tools like Flyway, Liquibase, Alembic, or ActiveRecord migrations)
  • Every schema change is a numbered, ordered migration that can be applied and rolled back
  • Migrations run as part of the deployment pipeline, not as a manual step
  • Schema changes follow the expand-then-contract pattern: add the new column, deploy code that uses it, then remove the old column in a later migration

What this replaces:

  • A DBA manually applying SQL scripts during a maintenance window
  • Schema changes that are “just done in production” and not tracked anywhere
  • Database state that has drifted from what is defined in any migration script

Why it matters for CD: Database changes are one of the most common reasons teams cannot deploy continuously. If schema changes require manual intervention, coordinated downtime, or a separate approval process, they become a bottleneck that forces batching. Treating schemas as code with automated migrations removes this bottleneck.
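
For example, the expand step of an expand-then-contract change might look like the following Alembic migration (one of the tools named above). The revision identifiers, table, and column names are illustrative.

"""Add display_name to users (expand step)."""
from alembic import op
import sqlalchemy as sa

# Revision identifiers used by Alembic; values here are illustrative.
revision = "20240601_add_display_name"
down_revision = "20240520_previous_migration"
branch_labels = None
depends_on = None

def upgrade():
    # Expand: add the new column as nullable so existing code keeps working.
    # Code that writes to it deploys next; the old column is removed in a
    # later "contract" migration once nothing reads it.
    op.add_column(
        "users",
        sa.Column("display_name", sa.String(length=255), nullable=True),
    )

def downgrade():
    op.drop_column("users", "display_name")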

Application configuration

Environment-specific configuration - database connection strings, API endpoints, feature flag states, logging levels - should be defined as code and managed through version control.

What this looks like:

  • Configuration values stored in a config management system (Consul, AWS Parameter Store, environment variable definitions in infrastructure code)
  • Configuration changes are committed, reviewed, and deployed through a pipeline
  • The same application artifact is deployed to every environment; only the configuration differs

What this replaces:

  • Configuration files edited manually on servers
  • Environment variables set by hand and forgotten
  • Configuration that exists only in a deployment runbook

See Application Config for detailed guidance on externalizing configuration.
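
On the application side, the "same artifact, different configuration" rule usually means reading environment-specific values at startup rather than baking them into the build. A minimal sketch; the variable names are illustrative.

import os
from dataclasses import dataclass

@dataclass(frozen=True)
class Settings:
    database_url: str
    payment_api_base: str
    log_level: str

def load_settings() -> Settings:
    # Required values fail fast if missing; only genuinely safe defaults
    # are defaulted.
    return Settings(
        database_url=os.environ["DATABASE_URL"],
        payment_api_base=os.environ["PAYMENT_API_BASE"],
        log_level=os.environ.get("LOG_LEVEL", "INFO"),
    )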

Monitoring, alerting, and observability

Dashboards, alert rules, SLO definitions, and logging configuration should be defined as code.

What this looks like:

  • Alert rules defined in Terraform, Prometheus rules files, or Datadog monitors-as-code
  • Dashboards defined as JSON or YAML, not built by hand in a UI
  • SLO definitions tracked in version control alongside the services they measure
  • Logging configuration (what to log, where to send it, retention policies) in code

What this replaces:

  • Dashboards built manually in a monitoring UI that nobody knows how to recreate
  • Alert rules that were configured by hand during an incident and never documented
  • Monitoring configuration that exists only on the monitoring server

Why it matters for CD: If you deploy ten times a day, you need to know instantly whether each deployment is healthy. If your monitoring and alerting configuration is manual, it will drift, break, or be incomplete. Monitoring-as-code ensures that every service has consistent, reviewed, reproducible observability.
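
One way to give monitoring configuration the same rigor as application code is to test it in the pipeline. The sketch below assumes a Prometheus rules file committed at monitoring/alert-rules.yml and a team convention that every alert carries a severity label plus summary and runbook_url annotations; the path and required fields are assumptions, not Prometheus requirements.

import yaml

REQUIRED_LABELS = {"severity"}
REQUIRED_ANNOTATIONS = {"summary", "runbook_url"}

def test_alert_rules_have_required_metadata():
    # Load the committed Prometheus rules file and check team conventions.
    with open("monitoring/alert-rules.yml") as f:
        config = yaml.safe_load(f)

    for group in config["groups"]:
        for rule in group["rules"]:
            if "alert" not in rule:
                continue  # skip recording rules
            missing_labels = REQUIRED_LABELS - set(rule.get("labels", {}))
            missing_annotations = REQUIRED_ANNOTATIONS - set(rule.get("annotations", {}))
            assert not missing_labels, f"{rule['alert']} missing labels: {missing_labels}"
            assert not missing_annotations, f"{rule['alert']} missing annotations: {missing_annotations}"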

Security policies

Security controls - access policies, network rules, secret rotation schedules, compliance checks - should be defined as code and enforced automatically.

What this looks like:

  • IAM policies and RBAC rules defined in Terraform or policy-as-code tools (OPA, Sentinel)
  • Security scanning integrated into the pipeline (SAST, dependency scanning, container image scanning)
  • Secret rotation automated and defined in code
  • Compliance checks that run on every commit, not once a quarter

What this replaces:

  • Security reviews that happen at the end of the development cycle
  • Access policies configured through UIs and never audited
  • Compliance as a manual checklist performed before each release

Why it matters for CD: Security and compliance requirements are the most common organizational blockers for CD. When security controls are defined as code and enforced by the pipeline, you can prove to auditors that every change passed security checks automatically. This is stronger evidence than a manual review, and it does not slow down delivery.
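
As a deliberately simple illustration of "compliance checks that run on every commit," the sketch below scans tracked file types for strings that look like AWS access key IDs and fails the pipeline if any are found. Dedicated scanners (gitleaks, trufflehog) and the SAST and dependency-scanning tools mentioned above do this far better; the point is only that the check is code, versioned, and automatic.

import pathlib
import re
import sys

# Well-known prefix pattern for AWS access key IDs.
AWS_ACCESS_KEY_PATTERN = re.compile(r"\bAKIA[0-9A-Z]{16}\b")
SCAN_SUFFIXES = {".py", ".yml", ".yaml", ".json", ".env", ".tf"}

def find_leaked_keys(root="."):
    findings = []
    for path in pathlib.Path(root).rglob("*"):
        if not path.is_file() or path.suffix not in SCAN_SUFFIXES:
            continue
        for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), 1):
            if AWS_ACCESS_KEY_PATTERN.search(line):
                findings.append(f"{path}:{lineno}")
    return findings

if __name__ == "__main__":
    leaks = find_leaked_keys()
    if leaks:
        print("Possible AWS access keys committed:", *leaks, sep="\n  ")
        sys.exit(1)  # non-zero exit fails the pipeline step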

The “One Change, One Process” Test

For every type of artifact in your system, ask:

If I need to change this, do I commit a code change and let the pipeline deliver it?

If the answer is yes, the artifact is managed as code. If the answer involves SSH, a UI, a ticket to another team, or a manual step, it is not.

| Artifact | Managed as code? | If not, the risk is… |
| --- | --- | --- |
| Application source code | Usually yes | - |
| Infrastructure (servers, networks, cloud resources) | Often no | Snowflake environments, slow provisioning, unreproducible disasters |
| Pipeline definitions | Sometimes | Pipeline changes are slow, unreviewed, and risky |
| Database schemas | Sometimes | Schema changes require manual coordination and downtime |
| Application configuration | Sometimes | Config drift between environments, “works in staging” failures |
| Monitoring and alerting | Rarely | Monitoring gaps, unreproducible dashboards, alert fatigue |
| Security policies | Rarely | Security as a gate instead of a guardrail, audit failures |

The goal is for every row in this table to be “yes.” You will not get there overnight, but every artifact you move from manual to code-managed removes a bottleneck and a risk.

How to Get There

Start with what blocks you most

Do not try to move everything to code at once. Identify the artifact type that causes the most pain or blocks deployments most frequently:

  • If environment provisioning takes days, start with infrastructure as code.
  • If database changes are the reason you cannot deploy more than once a week, start with schema migrations as code.
  • If pipeline changes require tickets to a platform team, start with pipeline as code.
  • If configuration drift causes production incidents, start with configuration as code.

Apply the same practices as application code

Once an artifact is defined as code, treat it with the same rigor as application code:

  • Store it in version control (ideally in the same repository as the application it supports)
  • Review changes before they are applied
  • Test changes automatically (linting, dry-runs, policy checks)
  • Deliver changes through a pipeline
  • Never modify the artifact outside of this process

Eliminate manual pathways

The hardest part is closing the manual back doors. As long as someone can SSH into a server and make a change, or click through a UI to modify infrastructure, the code-defined state will drift from reality.

The principle is the same as Single Path to Production for application code: the pipeline is the only way any change reaches production. This applies to infrastructure, configuration, schemas, monitoring, and policies just as much as it applies to application code.

Measuring Progress

| Metric | What to look for |
| --- | --- |
| Artifact types managed as code | Track how many of the categories above are fully code-managed. The number should increase over time. |
| Manual changes to production | Count any change made outside of a pipeline (SSH, UI clicks, manual scripts). Target: zero. |
| Environment recreation time | How long does it take to recreate a production-like environment from scratch? Should decrease as more infrastructure moves to code. |
| Mean time to recovery | When infrastructure-as-code is in place, recovery from failures is “re-run the pipeline.” MTTR drops dramatically. |