Phase 1: Foundations
Establish the essential practices for daily integration, testing, and small work decomposition.
Key question: “Can we integrate safely every day?”
This phase establishes the development practices that make continuous delivery possible.
Without these foundations, pipeline automation just speeds up a broken process.
What You’ll Do
- Adopt trunk-based development - Integrate to trunk at least daily
- Build testing fundamentals - Create a fast, reliable test suite
- Automate your build - One command to build, test, and package
- Decompose work - Break features into small, deliverable increments
- Streamline code review - Fast, effective review that doesn’t block flow
- Establish working agreements - Shared definitions of done and ready
- Everything as code - Infrastructure, pipelines, schemas, monitoring, and security policies in version control, delivered through pipelines
Why This Phase Matters
These practices are the prerequisites for everything that follows. Trunk-based development
eliminates merge hell. Testing fundamentals give you the confidence to deploy frequently.
Small work decomposition reduces risk per change. Together, they create the feedback loops
that drive continuous improvement.
When You’re Ready to Move On
You’re ready for Phase 2: Pipeline when:
- All developers integrate to trunk at least once per day
- Your test suite catches real defects and runs in under 10 minutes
- You can build and package your application with a single command
- Most work items are completable within 2 days
1 - Trunk-Based Development
Integrate all work to the trunk at least once per day to enable continuous integration.
Phase 1 - Foundations | Adapted from MinimumCD.org
Trunk-based development is the first foundation to establish. Without daily integration to a shared trunk, the rest of the CD migration cannot succeed. This page covers the core practice, two migration paths, and a tactical guide for getting started.
What Is Trunk-Based Development?
Trunk-based development (TBD) is a branching strategy where all developers integrate their work into a single shared branch - the trunk - at least once per day. The trunk is always kept in a releasable state.
This is a non-negotiable prerequisite for continuous delivery. If your team is not integrating to trunk daily, you are not doing CI, and you cannot do CD. There is no workaround.
“If it hurts, do it more often, and bring the pain forward.”
- Jez Humble, Continuous Delivery
What TBD Is Not
- It is not “everyone commits directly to main with no guardrails.” You still test, review, and validate work - you just do it in small increments.
- It is not incompatible with code review. It requires review to happen quickly.
- It is not reckless. It is the opposite: small, frequent integrations are far safer than large, infrequent merges.
What Trunk-Based Development Improves
| Problem | How TBD Helps |
|---|---|
| Merge conflicts | Small changes integrated frequently rarely conflict |
| Integration risk | Bugs are caught within hours, not weeks |
| Long-lived branches diverge from reality | The trunk always reflects the current state of the codebase |
| “Works on my branch” syndrome | Everyone shares the same integration point |
| Slow feedback | CI runs on every integration, giving immediate signal |
| Large batch deployments | Small changes are individually deployable |
| Fear of deployment | Each change is small enough to reason about |
Two Migration Paths
There are two valid approaches to trunk-based development. Both satisfy the minimum CD requirement of daily integration. Choose the one that fits your team’s current maturity and constraints.
Path 1: Short-Lived Branches
Developers create branches that live for less than 24 hours. Work is done on the branch, reviewed quickly, and merged to trunk within a single day.
How it works:
- Pull the latest trunk
- Create a short-lived branch
- Make small, focused changes
- Open a pull request (or use pair programming as the review)
- Merge to trunk before end of day
- The branch is deleted after merge
Best for teams that:
- Currently use long-lived feature branches and need a stepping stone
- Have regulatory requirements for traceable review records
- Use pull request workflows they want to keep (but make faster)
- Are new to TBD and want a gradual transition
Key constraint: The branch must merge to trunk within 24 hours. If it does not, you have a long-lived branch and you have lost the benefit of TBD.
Path 2: Direct Trunk Commits
Developers commit directly to trunk. Quality is ensured through pre-commit checks, pair programming, and strong automated testing.
How it works:
- Pull the latest trunk
- Make a small, tested change locally
- Run the local build and test suite
- Push directly to trunk
- CI validates the commit immediately
Best for teams that:
- Have strong automated test coverage
- Practice pair or mob programming (which provides real-time review)
- Want maximum integration frequency
- Have high trust and shared code ownership
Key constraint: This requires excellent test coverage and a culture where the team owns quality collectively. Without these, direct trunk commits become reckless.
How to Choose Your Path
Ask these questions:
- Do you have automated tests that catch real defects? If no, start with Path 1 and invest in testing fundamentals in parallel.
- Does your organization require documented review approvals? If yes, use Path 1 with rapid pull requests.
- Does your team practice pair programming? If yes, Path 2 may work immediately - pairing is a continuous review process.
- How large is your team? Teams of 2-4 can adopt Path 2 more easily. Larger teams may start with Path 1 and transition later.
Both paths are valid. The important thing is daily integration to trunk. Do not spend weeks debating which path to use. Pick one, start today, and adjust.
Essential Supporting Practices
Trunk-based development does not work in isolation. These supporting practices make daily integration safe and sustainable.
Feature Flags
When you integrate to trunk daily, incomplete features will exist on trunk. Feature flags let you merge code that is not yet ready for users.
Rules for feature flags in TBD:
- Use flags to decouple deployment from release
- Remove flags within days or weeks - they are temporary by design
- Keep flag logic simple; avoid nested or dependent flags
- Test both flag states in your automated test suite
Feature flags are covered in more depth in Phase 3: Optimize.
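To make the first and last rules concrete, here is a minimal sketch in Python of a flag read from an environment variable, with both flag states exercised in tests. The flag name and checkout logic are hypothetical illustrations, not part of the source material.

```python
import os


def new_checkout_enabled() -> bool:
    """Read a simple boolean feature flag from the environment.

    Hypothetical flag name; many teams use a flag service instead.
    """
    return os.getenv("NEW_CHECKOUT_FLOW", "false").lower() == "true"


def checkout(cart: list[float]) -> float:
    """Route between the old and new checkout paths behind the flag."""
    if new_checkout_enabled():
        return sum(cart) * 0.9  # new flow: example launch discount logic
    return sum(cart)            # old flow, still the default for users


# Test both flag states, as the rules above recommend.
def test_checkout_with_flag_off(monkeypatch):
    monkeypatch.setenv("NEW_CHECKOUT_FLOW", "false")
    assert checkout([10.0, 5.0]) == 15.0


def test_checkout_with_flag_on(monkeypatch):
    monkeypatch.setenv("NEW_CHECKOUT_FLOW", "true")
    assert checkout([10.0, 5.0]) == 13.5
```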
Commit Small, Commit Often
Each commit should be a small, coherent change that leaves trunk in a working state. If you are committing once a day in a large batch, you are not getting the benefit of TBD.
Guidelines:
- Each commit should be independently deployable
- A commit should represent a single logical change
- If you cannot describe the change in one sentence, it is too big
- Target multiple commits per day, not one large commit at end of day
Test-Driven Development (TDD) and ATDD
TDD provides the safety net that makes frequent integration sustainable. When every change is accompanied by tests, you can integrate confidently.
- TDD: Write the test before the code. Red, green, refactor.
- ATDD (Acceptance Test-Driven Development): Write acceptance criteria as executable tests before implementation.
Both practices ensure that your test suite grows with your code and that trunk remains releasable.
Getting Started: A Tactical Guide
Step 1: Shorten Your Branches (Week 1)
If your team currently uses long-lived feature branches, start by shortening their lifespan.
| Current State | Target |
|---|---|
| Branches live for weeks | Branches live for < 1 week |
| Merge once per sprint | Merge multiple times per week |
| Large merge conflicts are normal | Conflicts are rare and small |
Action: Set a team agreement that no branch lives longer than 2 days. Track branch age as a metric.
Step 2: Integrate Daily (Week 2-3)
Tighten the window from 2 days to 1 day.
Action:
- Every developer merges to trunk at least once per day, every day they write code
- If work is not complete, use a feature flag or other technique to merge safely
- Track integration frequency as your primary metric
Step 3: Ensure Trunk Stays Green (Week 2-3)
Daily integration is only useful if trunk remains in a releasable state.
Action:
- Run your test suite on every merge to trunk
- If the build breaks, fixing it becomes the team’s top priority
- Establish a working agreement: “broken build = stop the line” (see Working Agreements)
Step 4: Remove the Safety Net of Long Branches (Week 4+)
Once the team is integrating daily with a green trunk, eliminate the option of long-lived branches.
Action:
- Configure branch protection rules to warn or block branches older than 24 hours
- Remove any workflow that depends on long-lived branches (e.g., “dev” or “release” branches)
- Celebrate the transition - this is a significant shift in how the team works
Key Pitfalls
1. “We integrate daily, but we also keep our feature branches”
If you are merging to trunk daily but also maintaining a long-lived feature branch, you are not doing TBD. The feature branch will diverge, and merging it later will be painful. The integration to trunk must be the only integration point.
2. “Our builds are too slow for frequent integration”
If your CI pipeline takes 30 minutes, integrating multiple times a day feels impractical. This is a real constraint - address it by investing in build automation and parallelizing your test suite. Target a build time under 10 minutes.
3. “We can’t integrate incomplete features to trunk”
Yes, you can. Use feature flags to hide incomplete work from users. The code exists on trunk, but the feature is not active. This is a standard practice at every company that practices CD.
4. “Code review takes too long for daily integration”
If pull request reviews take 2 days, daily integration is impossible. The solution is to change how you review: pair programming provides continuous review, mob programming reviews in real time, and small changes can be reviewed asynchronously in minutes. See Code Review for specific techniques.
5. “What if someone pushes a bad commit to trunk?”
This is why you have automated tests, CI, and the “broken build = top priority” agreement. Bad commits will happen. The question is how fast you detect and fix them. With TBD and CI, the answer is minutes, not days.
Measuring Success
Track these metrics to verify your TBD adoption:
| Metric | Target | Why It Matters |
|---|---|---|
| Integration frequency | At least 1 per developer per day | Confirms daily integration is happening |
| Branch age | < 24 hours | Catches long-lived branches |
| Build duration | < 10 minutes | Enables frequent integration without frustration |
| Merge conflict frequency | Decreasing over time | Confirms small changes reduce conflicts |
Further Reading
This page covers the essentials for Phase 1 of your migration. For detailed guidance on specific scenarios, see the full source material at MinimumCD.org.
Next Step
Once your team is integrating to trunk daily, build the test suite that makes that integration trustworthy. Continue to Testing Fundamentals.
This content is adapted from MinimumCD.org,
licensed under CC BY 4.0.
2 - Testing Fundamentals
Build a test architecture that gives your pipeline the confidence to deploy any change, even when dependencies outside your control are unavailable.
Phase 1 - Foundations | Adapted from Dojo Consortium
Before you can trust your pipeline, you need a test suite that is fast, deterministic, and catches
real defects. But a collection of tests is not enough. You need a test architecture - a
deliberate structure where different types of tests work together to give you the confidence to
deploy every change, regardless of whether external systems are up, slow, or behaving
unexpectedly.
Why Testing Is a Foundation
Continuous delivery requires that trunk always be releasable. The only way to know trunk is
releasable is to test it - automatically, on every change. Without a reliable test suite, daily
integration is just daily risk.
In many organizations, testing is the single biggest obstacle to CD adoption. Not because teams
lack tests, but because the tests they have are slow, flaky, poorly structured, and - most
critically - unable to give the pipeline a reliable answer to the question: is this change safe
to deploy?
Testing Goals for CD
Your test suite must meet these criteria before it can support continuous delivery:
| Goal | Target | Why |
|---|---|---|
| Fast | Full suite completes in under 10 minutes | Developers need feedback before context-switching |
| Deterministic | Same code always produces the same test result | Flaky tests destroy trust and get ignored |
| Catches real bugs | Tests fail when behavior is wrong, not when implementation changes | Brittle tests create noise, not signal |
| Independent of external systems | Pipeline can determine deployability without any dependency being available | Your ability to deploy cannot be held hostage by someone else’s outage |
If your test suite does not meet these criteria today, improving it is your highest-priority
foundation work.
Beyond the Test Pyramid
The test pyramid - many unit tests at the base, fewer integration tests in the middle, a handful
of end-to-end tests at the top - has been the dominant mental model for test strategy since Mike
Cohn introduced it. The core insight is sound: push testing as low as possible. Lower-level
tests are faster, more deterministic, and cheaper to maintain. Higher-level tests are slower,
more brittle, and more expensive.
But as a prescriptive model, the pyramid is overly simplistic. Teams that treat it as a rigid
ratio end up in unproductive debates about whether they have “too many” integration tests or “not
enough” unit tests. The shape of your test distribution matters far less than whether your tests,
taken together, give you the confidence to deploy.
What actually matters
The pyramid’s principle - write tests with different granularity - remains correct. But for
CD, the question is not “do we have the right pyramid shape?” The question is:
Can our pipeline determine that a change is safe to deploy without depending on any system we
do not control?
This reframes the testing conversation. Instead of counting tests by type and trying to match a
diagram, you design a test architecture where:
- Fast, deterministic tests catch the vast majority of defects and run on every commit. These tests use test doubles for anything outside the team’s control. They give you a reliable go/no-go signal in minutes.
- Contract tests verify that your test doubles still match reality. They run asynchronously and catch drift between your assumptions and the real world - without blocking your pipeline.
- A small number of non-deterministic tests validate that the fully integrated system works. These run post-deployment and provide monitoring, not gating.
This structure means your pipeline can confidently say “yes, deploy this” even if a downstream
API is having an outage, a third-party service is slow, or a partner team hasn’t deployed their
latest changes yet. Your ability to deliver is decoupled from the reliability of systems you do
not own.
The anti-pattern: the ice cream cone
Most teams that struggle with CD have an inverted test distribution - too many slow, expensive
end-to-end tests and too few fast, focused tests.
┌─────────────────────────┐
│ Manual Testing │ ← Most testing happens here
├─────────────────────────┤
│ End-to-End Tests │ ← Slow, flaky, expensive
├─────────────────────────┤
│ Integration Tests │ ← Some, but not enough
├───────────┤
│Unit Tests │ ← Too few
└───────────┘
The ice cream cone makes CD impossible. Manual testing gates block every release. End-to-end tests
take hours, fail randomly, and depend on external systems being healthy. The pipeline cannot give
a fast, reliable answer about deployability, so deployments become high-ceremony events.
Test Architecture for the CD Pipeline
A test architecture is the deliberate structure of how different test types work together across
your pipeline to give you deployment confidence. Each layer has a specific role, and the layers
reinforce each other.
Layer 1: Unit tests - verify logic in isolation
Unit tests exercise individual functions, methods, or components with all external dependencies
replaced by test doubles. They are the fastest and most
deterministic tests you have.
Role in CD: Catch logic errors, regressions, and edge cases instantly. Provide the tightest
feedback loop - developers should see results in seconds while coding.
What they cannot do: Verify that components work together, that your code correctly calls
external services, or that the system behaves correctly as a whole.
See Unit Tests for detailed guidance.
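As a brief illustration, here is a minimal pytest sketch of a unit test that replaces an external exchange-rate service with a test double. The function and client interface are hypothetical.

```python
from unittest.mock import Mock


def convert_price(amount: float, rate_client) -> float:
    """Convert a price using an injected exchange-rate client (hypothetical API)."""
    rate = rate_client.get_rate("USD", "EUR")
    return round(amount * rate, 2)


def test_convert_price_uses_current_rate():
    # The external rate service is replaced by a test double,
    # so the test runs in milliseconds and is fully deterministic.
    fake_rates = Mock()
    fake_rates.get_rate.return_value = 0.5

    assert convert_price(10.0, fake_rates) == 5.0
    fake_rates.get_rate.assert_called_once_with("USD", "EUR")
```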
Layer 2: Integration tests - verify boundaries
Integration tests verify that components interact correctly at their boundaries: database queries
return the expected data, HTTP clients serialize requests correctly, message producers format
messages as expected. External systems are replaced with test doubles, but internal collaborators
are real.
Role in CD: Catch the bugs that unit tests miss - mismatched interfaces, serialization errors,
query bugs. These tests are fast enough to run on every commit but realistic enough to catch
real integration failures.
What they cannot do: Verify that the system works end-to-end from a user’s perspective, or
that your assumptions about external services are still correct.
The line between unit tests and integration tests is often debated. As Ham Vocke notes in The Practical Test Pyramid, the naming matters less than the discipline. The key question is whether the test is fast, deterministic, and tests something your unit tests cannot. If yes, it belongs here.
See Integration Tests for detailed guidance.
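A minimal sketch of an integration test at a persistence boundary, using Python’s built-in sqlite3 module with an in-memory database standing in for the real database. The repository functions are hypothetical.

```python
import sqlite3


def save_order(conn: sqlite3.Connection, customer: str, total: float) -> int:
    """Persist an order and return its generated id (hypothetical repository code)."""
    cur = conn.execute(
        "INSERT INTO orders (customer, total) VALUES (?, ?)", (customer, total)
    )
    conn.commit()
    return cur.lastrowid


def find_order_total(conn: sqlite3.Connection, order_id: int) -> float:
    row = conn.execute("SELECT total FROM orders WHERE id = ?", (order_id,)).fetchone()
    return row[0]


def test_order_round_trips_through_the_database():
    # An in-memory database keeps the test fast and deterministic
    # while still exercising real SQL at the boundary.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)")

    order_id = save_order(conn, "ada", 42.5)

    assert find_order_total(conn, order_id) == 42.5
```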
Layer 3: Functional tests - verify your system works in isolation
Functional tests (also called component tests) exercise your entire sub-system - your service,
your application - from the outside, as a user or consumer would interact with it. All external
dependencies are replaced with test doubles. The test boots your application, sends real HTTP
requests or simulates real user interactions, and verifies the responses.
Role in CD: This is the layer that proves your system works as a complete unit, independent
of everything else. Functional tests answer: “if we deploy this service right now, will it
behave correctly for every interaction that is within our control?” Because all external
dependencies are stubbed, these tests are deterministic and fast. They can run on every commit.
Why this layer is critical for CD: Functional tests are what allow you to deploy with
confidence even when dependencies outside your control are unavailable. Your test doubles
simulate the expected behavior of those dependencies. As long as your doubles are accurate (which
is what contract tests verify), your functional tests prove your system handles those interactions
correctly.
See Functional Tests for detailed guidance.
Layer 4: Contract tests - verify your assumptions about others
Contract tests validate that the test doubles you use in layers 1-3 still accurately represent
the real external systems. They run against live dependencies and check contract format - response
structures, field names, types, and status codes - not specific data values.
Role in CD: Contract tests are the bridge between your fast, deterministic test suite and the
real world. Without them, your test doubles can silently drift from reality, and your functional
tests provide false confidence. With them, you know that the assumptions baked into your test
doubles are still correct.
Consumer-driven contracts take this further: the consumer of an API publishes expectations
(using tools like Pact), and the provider runs those expectations as part of
their build. Both teams know immediately when a change would break the contract.
Contract tests are non-deterministic because they hit live systems. They should not block
your pipeline. Instead, failures trigger a review: has the contract changed, or was it a transient
network issue? If the contract has changed, update your test doubles and re-verify.
See Contract Tests for detailed guidance.
Layer 5: End-to-end tests - verify the integrated system post-deployment
End-to-end tests validate complete user journeys through the fully integrated system with no
test doubles. They run against real services, real databases, and real third-party integrations.
Role in CD: E2E tests are monitoring, not gating. They run after deployment to verify that
the integrated system works. A small suite of smoke tests can run immediately post-deployment
to catch gross integration failures. Broader E2E suites run on a schedule.
Why E2E tests should not gate your pipeline: E2E tests are non-deterministic. They fail for
reasons unrelated to your change - network blips, third-party outages, shared environment
instability. If your pipeline depends on E2E tests passing before you can deploy, your deployment
frequency is limited by the reliability of every system in the chain. This is the opposite of the
independence CD requires.
See End-to-End Tests for detailed guidance.
How the layers work together
| Pipeline stage | Test layer | Deterministic? | Blocks deploy? |
|---|---|---|---|
| On every commit | Unit tests | Yes | Yes |
| On every commit | Integration tests | Yes | Yes |
| On every commit | Functional tests | Yes | Yes |
| Asynchronous | Contract tests | No | No (triggers review) |
| Post-deployment | E2E smoke tests | No | Triggers rollback if critical |
| Post-deployment | Synthetic monitoring | No | Triggers alerts |
The critical insight: everything that blocks deployment is deterministic and under your
control. Everything that involves external systems runs asynchronously or post-deployment. This
is what gives you the independence to deploy any time, regardless of the state of the world
around you.
Week 1 Action Plan
If your test suite is not yet ready to support CD, use this focused action plan to make immediate
progress.
Day 1-2: Audit your current test suite
Assess where you stand before making changes.
Actions:
- Run your full test suite 3 times. Note total duration and any tests that pass intermittently
(flaky tests).
- Count tests by type: unit, integration, functional, end-to-end.
- Identify tests that require external dependencies (databases, APIs, file systems) to run.
- Record your baseline: total test count, pass rate, duration, flaky test count.
- Map each test type to a pipeline stage. Which tests gate deployment? Which run asynchronously?
Which tests couple your deployment to external systems?
Output: A clear picture of your test distribution and the specific problems to address.
Day 2-3: Fix or remove flaky tests
Flaky tests are worse than no tests. They train developers to ignore failures, which means real
failures also get ignored.
Actions:
- Quarantine all flaky tests immediately. Move them to a separate suite that does not block the
build.
- For each quarantined test, decide: fix it (if the behavior it tests matters) or delete it (if
it does not).
- Common causes of flakiness: timing dependencies, shared mutable state, reliance on external
services, test order dependencies.
- Target: zero flaky tests in your main test suite by end of week.
Day 3-4: Decouple your pipeline from external dependencies
This is the highest-leverage change for CD. Identify every test that calls a real external service
and replace that dependency with a test double.
Actions:
- List every external service your tests depend on: databases, APIs, message queues, file
storage, third-party services.
- For each dependency, decide the right test double approach:
- In-memory fakes for databases (e.g., SQLite, H2, testcontainers with local instances).
- HTTP stubs for external APIs (e.g., WireMock, nock, MSW).
- Fakes for message queues, email services, and other infrastructure.
- Replace the dependencies in your unit, integration, and functional tests.
- Move the original tests that hit real services into a separate suite - these become your
starting contract tests or E2E smoke tests.
Output: A test suite where everything that blocks the build is deterministic and runs without
network access to external systems.
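As one possible shape for such a test double, here is a minimal Python sketch of an in-memory fake standing in for a third-party payment API, injected into the code under test. The gateway interface is hypothetical; tools like WireMock, nock, or MSW achieve the same isolation at the HTTP layer instead.

```python
class FakePaymentGateway:
    """In-memory stand-in for a third-party payment API (hypothetical interface)."""

    def __init__(self, succeed: bool = True):
        self.succeed = succeed
        self.charges = []

    def charge(self, amount_cents: int, token: str) -> dict:
        self.charges.append((amount_cents, token))
        return {"status": "approved" if self.succeed else "declined"}


def place_order(gateway, amount_cents: int, token: str) -> str:
    """Application code depends on the gateway interface, not a concrete vendor."""
    result = gateway.charge(amount_cents, token)
    return "confirmed" if result["status"] == "approved" else "rejected"


def test_order_confirmed_when_payment_approved():
    assert place_order(FakePaymentGateway(succeed=True), 1999, "tok_test") == "confirmed"


def test_order_rejected_when_payment_declined():
    assert place_order(FakePaymentGateway(succeed=False), 1999, "tok_test") == "rejected"
```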
Day 4-5: Add functional tests for critical paths
If you don’t have functional tests (component tests) that exercise your whole service in
isolation, start with the most critical paths.
Actions:
- Identify the 3-5 most critical user journeys or API endpoints in your application.
- Write a functional test for each: boot the application, stub external dependencies, send a
real request or simulate a real user action, verify the response.
- Each functional test should prove that the feature works correctly assuming external
dependencies behave as expected (which your test doubles encode).
- Run these in CI on every commit.
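A minimal sketch of a functional test, assuming a FastAPI service purely for illustration: the test boots the whole application in-process, overrides the external inventory dependency with a stub, and sends a real HTTP request. The endpoint and names are hypothetical.

```python
from fastapi import Depends, FastAPI
from fastapi.testclient import TestClient

app = FastAPI()


def get_inventory_client():
    """Production dependency that would call an external inventory service."""
    raise NotImplementedError("real client is wired up in production configuration")


@app.get("/products/{sku}/availability")
def availability(sku: str, inventory=Depends(get_inventory_client)):
    return {"sku": sku, "in_stock": inventory.stock_level(sku) > 0}


class StubInventory:
    """Test double encoding the behavior we expect from the external service."""

    def stock_level(self, sku: str) -> int:
        return 3 if sku == "ABC-123" else 0


def test_availability_endpoint_with_stubbed_inventory():
    app.dependency_overrides[get_inventory_client] = lambda: StubInventory()
    client = TestClient(app)  # boots the whole application in-process

    response = client.get("/products/ABC-123/availability")

    assert response.status_code == 200
    assert response.json() == {"sku": "ABC-123", "in_stock": True}
```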
Day 5: Set up contract tests for your most important dependency
Pick the external dependency that changes most frequently or has caused the most production
issues. Set up a contract test for it.
Actions:
- Write a contract test that validates the response structure (types, required fields, status
codes) of the dependency’s API.
- Run it on a schedule (e.g., every hour or daily), not on every commit.
- When it fails, update your test doubles to match the new reality and re-verify your
functional tests.
- If the dependency is owned by another team in your organization, explore consumer-driven
contracts with a tool like Pact.
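A minimal sketch of such a contract test in Python using the requests library. The URL and field names are placeholders for your real dependency; the assertions check structure and types, not values.

```python
import requests


def test_user_api_contract():
    """Verify response structure (fields and types), not specific data values.

    Runs on a schedule against the live dependency; failures trigger a review
    of our test doubles rather than blocking the pipeline. The URL and fields
    below are placeholders for a real dependency.
    """
    response = requests.get("https://partner-api.example.com/v1/users/42", timeout=10)

    assert response.status_code == 200
    body = response.json()

    # Contract: required fields exist and have the expected types.
    assert isinstance(body["id"], int)
    assert isinstance(body["email"], str)
    assert isinstance(body.get("roles", []), list)
```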
Test-Driven Development (TDD)
TDD is the practice of writing the test before the code. It is the most effective way to build a
reliable test suite because it ensures every piece of behavior has a corresponding test.
The TDD cycle:
- Red: Write a failing test that describes the behavior you want.
- Green: Write the minimum code to make the test pass.
- Refactor: Improve the code without changing the behavior. The test ensures you do not
break anything.
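As a small illustration of the cycle, here is a hypothetical slugify function driven by a test written first:

```python
# Red: write the failing test first. It describes the behavior we want.
def test_slug_replaces_spaces_and_lowercases():
    assert slugify("Hello World") == "hello-world"


# Green: the simplest implementation that makes the test pass.
def slugify(title: str) -> str:
    return title.strip().lower().replace(" ", "-")


# Refactor: with the test in place, the implementation can be reworked
# (for example, to strip punctuation) without fear of changing the behavior.
```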
Why TDD supports CD:
- Every change is automatically covered by a test
- The test suite grows proportionally with the codebase
- Tests describe behavior, not implementation, making them more resilient to refactoring
- Developers get immediate feedback on whether their change works
TDD is not mandatory for CD, but teams that practice TDD consistently have significantly faster
and more reliable test suites.
Getting started with TDD
If your team is new to TDD, start small:
- Pick one new feature or bug fix this week.
- Write the test first, watch it fail.
- Write the code to make it pass.
- Refactor.
- Repeat for the next change.
Do not try to retroactively TDD your entire codebase. Apply TDD to new code and to any code you
modify.
Testing Matrix
Use this reference to decide what type of test to write and where it runs in your pipeline.
| What You Need to Verify | Test Type | Speed | Deterministic? | Blocks Deploy? |
|---|---|---|---|---|
| A function or method behaves correctly | Unit | Milliseconds | Yes | Yes |
| Components interact correctly at a boundary | Integration | Milliseconds to seconds | Yes | Yes |
| Your whole service works in isolation | Functional | Seconds | Yes | Yes |
| Your test doubles match reality | Contract | Seconds | No | No |
| A critical user journey works end-to-end | E2E | Minutes | No | No |
| Code quality, security, and style compliance | Static Analysis | Seconds | Yes | Yes |
Best Practices Summary
Do
- Run tests on every commit. If tests do not run automatically, they will be skipped.
- Keep the deterministic suite under 10 minutes. If it is slower, developers will stop
running it locally.
- Fix broken tests immediately. A broken test is equivalent to a broken build.
- Delete tests that do not provide value. A test that never fails and tests trivial behavior
is maintenance cost with no benefit.
- Test behavior, not implementation. Tests should verify what the code does, not how it
does it. As Ham Vocke advises: “if I enter values
x and y, will the result be z?” - not
the sequence of internal calls that produce z.
- Use test doubles for external dependencies. Your deterministic tests should run without
network access to external systems.
- Validate test doubles with contract tests. Test doubles that drift from reality give false
confidence.
- Treat test code as production code. Give it the same care, review, and refactoring
attention.
Do Not
- Do not tolerate flaky tests. Quarantine or delete them immediately.
- Do not gate your pipeline on non-deterministic tests. E2E and contract test failures
should trigger review or alerts, not block deployment.
- Do not couple your deployment to external system availability. If a third-party API being
down prevents you from deploying, your test architecture has a critical gap.
- Do not write tests after the fact as a checkbox exercise. Tests written without
understanding the behavior they verify add noise, not value.
- Do not test private methods directly. Test the public interface; private methods are tested
indirectly.
- Do not share mutable state between tests. Each test should set up and tear down its own
state.
- Do not use sleep/wait for timing-dependent tests. Use explicit waits, polling, or
event-driven assertions.
- Do not require a running database or external service for unit tests. That makes them
integration tests - which is fine, but categorize them correctly.
Using Tests to Find and Eliminate Defect Sources
A test suite that catches bugs is good. A test suite that helps you stop producing those bugs
is transformational. Every test failure is evidence of a defect, and every defect has a source. If
you treat test failures only as things to fix, you are doing rework. If you treat them as
diagnostic data about where your process breaks down, you can make systemic changes that prevent
entire categories of defects from occurring.
This is the difference between a team that writes more tests to catch more bugs and a team that
changes how it works so that fewer bugs are created in the first place.
Trace every defect to its origin
When a test catches a defect - or worse, when a defect escapes to production - ask: where was
this defect introduced, and what would have prevented it from being created?
Defects do not originate randomly. They cluster around specific causes, and each cause has a
systemic fix:
| Where Defects Originate | Example Defects | Detection Method | Systemic Fix |
|---|---|---|---|
| Requirements | Building the right thing wrong, or the wrong thing right | UX analytics, task completion tracking, A/B testing | Acceptance criteria as user outcomes, not implementation tasks. Three Amigos sessions before work starts. Example mapping to surface edge cases before coding begins. |
| Missing domain knowledge | Business rules encoded incorrectly, implicit assumptions | Magic number detection, knowledge-concentration metrics | Embed domain rules in code using ubiquitous language (DDD). Pair programming to spread knowledge. Living documentation generated from code. |
| Integration boundaries | Interface mismatches, wrong assumptions about upstream behavior | Consumer-driven contract tests, schema validation | Contract tests mandatory per boundary. API-first design. Document behavioral contracts, not just data schemas. |
| Untested edge cases | Null handling, boundary values, error paths | Mutation testing, branch coverage thresholds, property-based testing | Require a test for every bug fix. Adopt property-based testing for logic with many input permutations. Boundary value analysis as a standard practice. |
| Unintended side effects | Change to module A breaks module B | Mutation testing, change impact analysis | Small focused commits. Trunk-based development (integrate daily so side effects surface immediately). Modular design with clear boundaries. |
| Accumulated complexity | Defects cluster in the most complex, most-changed files | Complexity trends, duplication scoring, dependency cycle detection | Refactoring as part of every story, not deferred to a “tech debt sprint.” Dedicated complexity budget. |
| Long-lived branches | Merge conflicts, integration failures, stale code | Branch age alerts, merge conflict frequency | Trunk-based development. Merge at least daily. CI rejects stale branches. |
| Configuration drift | Works in staging, fails in production | IaC drift detection, environment comparison, smoke tests | All infrastructure as code. Same provisioning for every environment. Immutable infrastructure. |
| Data assumptions | Null pointer exceptions, schema migration failures | Null safety static analysis, schema compatibility checks, migration dry-runs | Enforce null-safe types. Expand-then-contract for all schema changes. |
Build a defect feedback loop
Knowing the categories is not enough. You need a process that systematically connects test
failures to root causes and root causes to systemic fixes.
Step 1: Classify every defect. When a test fails or a bug is reported, tag it with its origin
category from the table above. This takes seconds and builds a dataset over time.
Step 2: Look for patterns. Monthly (or during retrospectives), review the defect
classifications. Which categories appear most often? That is where your process is weakest.
Step 3: Apply the systemic fix, not just the local fix. When you fix a bug, also ask: what
systemic change would prevent this entire category of bug? If most defects come from integration
boundaries, the fix is not “write more integration tests” - it is “make contract tests mandatory
for every new boundary.” If most defects come from untested edge cases, the fix is not “increase
code coverage” - it is “adopt property-based testing as a standard practice.”
Step 4: Measure whether the fix works. Track defect counts by category over time. If you
applied a systemic fix for integration boundary defects and the count does not drop, the fix is
not working and you need a different approach.
The test-for-every-bug-fix rule
One of the most effective systemic practices: every bug fix must include a test that
reproduces the bug before the fix and passes after. This is non-negotiable for CD because:
- It proves the fix actually addresses the defect (not just the symptom).
- It prevents the same defect from recurring.
- It builds test coverage exactly where the codebase is weakest - the places where bugs actually
occur.
- Over time, it shifts your test suite from “tests we thought to write” to “tests that cover
real failure modes.”
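A minimal sketch of the pattern, using a hypothetical discount bug: the regression test fails against the old code and passes with the fix.

```python
def apply_discount(subtotal: float, percent: float) -> float:
    """Fixed implementation: discounts are clamped to the 0-100% range.

    The original (hypothetical) bug allowed percent > 100, which produced
    negative totals in production.
    """
    percent = max(0.0, min(percent, 100.0))
    return round(subtotal * (1 - percent / 100), 2)


def test_discount_over_100_percent_never_goes_negative():
    """Regression test added with the fix: reproduces the bug on the old code."""
    assert apply_discount(50.0, 150.0) == 0.0
```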
Advanced detection techniques
As your test architecture matures, add techniques that find defects humans overlook:
| Technique | What It Finds | When to Adopt |
|---|---|---|
| Mutation testing (Stryker, PIT) | Tests that pass but do not actually verify behavior - your test suite’s blind spots | When basic coverage is in place but defect escape rate is not dropping |
| Property-based testing | Edge cases and boundary conditions across large input spaces that example-based tests miss | When defects cluster around unexpected input combinations |
| Chaos engineering | Failure modes in distributed systems - what happens when a dependency is slow, returns errors, or disappears | When you have functional tests and contract tests in place and need confidence in failure handling |
| Static analysis and linting | Null safety violations, type errors, security vulnerabilities, dead code | From day one - these are cheap and fast |
For more examples of mapping defect origins to detection methods and systemic corrections, see
the CD Defect Detection and Remediation Patterns.
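As a concrete illustration of the property-based testing row above, here is a minimal sketch using the Hypothesis library: instead of hand-picked examples, properties are asserted over generated inputs. The function under test is hypothetical.

```python
from hypothesis import given, strategies as st


def normalize_whitespace(text: str) -> str:
    """Collapse runs of whitespace into single spaces (example code under test)."""
    return " ".join(text.split())


@given(st.text())
def test_normalization_is_idempotent(text):
    # A property that must hold for any input, not just hand-picked examples:
    # normalizing twice gives the same result as normalizing once.
    once = normalize_whitespace(text)
    assert normalize_whitespace(once) == once


@given(st.text())
def test_normalized_text_has_no_double_spaces(text):
    assert "  " not in normalize_whitespace(text)
```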
Measuring Success
| Metric | Target | Why It Matters |
|---|---|---|
| Deterministic suite duration | < 10 minutes | Enables fast feedback loops |
| Flaky test count | 0 in pipeline-gating suite | Maintains trust in test results |
| External dependencies in gating tests | 0 | Ensures deployment independence |
| Test coverage trend | Increasing | Confirms new code is being tested |
| Defect escape rate | Decreasing | Confirms tests catch real bugs |
| Contract test freshness | All passing within last 24 hours | Confirms test doubles are current |
Next Step
With a reliable test suite in place, automate your build process so that building, testing, and
packaging happens with a single command. Continue to Build Automation.
This content is adapted from the Dojo Consortium,
licensed under CC BY 4.0. Additional concepts
drawn from Ham Vocke,
The Practical Test Pyramid,
and Toby Clemson,
Testing Strategies in a Microservice Architecture.
3 - Build Automation
Automate your build process so a single command builds, tests, and packages your application.
Phase 1 - Foundations | Adapted from Dojo Consortium
Build automation is the mechanism that turns trunk-based development and testing into a continuous integration loop. If you cannot build, test, and package your application with a single command, you cannot automate your pipeline. This page covers the practices that make your build reproducible, fast, and trustworthy.
What Build Automation Means
Build automation is the practice of scripting every step required to go from source code to a deployable artifact. A single command - or a single CI trigger - should execute the entire sequence:
- Compile the source code (if applicable)
- Run all automated tests
- Package the application into a deployable artifact (container image, binary, archive)
- Report the result (pass or fail, with details)
No manual steps. No “run this script, then do that.” No tribal knowledge about which flags to set or which order to run things. One command, every time, same result.
The Litmus Test
Ask yourself: “Can a new team member clone the repository and produce a deployable artifact with a single command within 15 minutes?”
If the answer is no, your build is not fully automated.
Why Build Automation Matters for CD
| CD Requirement | How Build Automation Supports It |
|---|---|
| Reproducibility | The same commit always produces the same artifact, on any machine |
| Speed | Automated builds can be optimized, cached, and parallelized |
| Confidence | If the build passes, the artifact is trustworthy |
| Developer experience | Developers run the same build locally that CI runs, eliminating “works on my machine” |
| Pipeline foundation | The CI/CD pipeline is just the build running automatically on every commit |
Without build automation, every other practice in this guide breaks down. You cannot have continuous integration if the build requires manual intervention. You cannot have a deterministic pipeline if the build produces different results depending on who runs it.
Key Practices
1. Version-Controlled Build Scripts
Your build configuration lives in the same repository as your code. It is versioned, reviewed, and tested alongside the application.
What belongs in version control:
- Build scripts (Makefile, build.gradle, package.json scripts, Dockerfile)
- Dependency manifests (requirements.txt, go.mod, pom.xml, package-lock.json)
- CI/CD pipeline definitions (.github/workflows, .gitlab-ci.yml, Jenkinsfile)
- Environment setup scripts (docker-compose.yml for local development)
What does not belong in version control:
- Secrets and credentials (use secret management tools)
- Environment-specific configuration values (use environment variables or config management)
- Generated artifacts (build outputs, compiled binaries)
Anti-pattern: Build instructions that exist only in a wiki, a Confluence page, or one developer’s head. If the build steps are not in the repository, they will drift from reality.
2. Dependency Management
All dependencies must be declared explicitly and resolved deterministically.
Practices:
- Lock files: Use lock files (package-lock.json, Pipfile.lock, go.sum) to pin exact dependency versions. Check lock files into version control.
- Reproducible resolution: Running the dependency install twice should produce identical results.
- No undeclared dependencies: Your build should not rely on tools or libraries that happen to be installed on the build machine. If you need it, declare it.
- Dependency scanning: Automate vulnerability scanning of dependencies as part of the build. Do not wait for a separate security review.
Anti-pattern: “It builds on Jenkins because Jenkins has Java 11 installed, but the Dockerfile uses Java 17.” The build must declare and control its own runtime.
3. Build Caching
Fast builds keep developers in flow. Caching is the primary mechanism for build speed.
What to cache:
- Dependencies: Download once, reuse across builds. Most build tools (npm, Maven, Gradle, pip) support a local cache.
- Compilation outputs: Incremental compilation avoids rebuilding unchanged modules.
- Docker layers: Structure your Dockerfile so that rarely-changing layers (OS, dependencies) are cached and only the application code layer is rebuilt.
- Test fixtures: Prebuilt test data or container images used by tests.
Guidelines:
- Cache aggressively for local development and CI
- Invalidate caches when dependencies or build configuration change
- Do not cache test results - tests must always run
4. Single Build Script Entry Point
Developers, CI, and CD should all use the same entry point.
The CI server runs make all. A developer runs make all. The result is the same. There is no separate “CI build script” that diverges from what developers run locally.
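One possible shape for that entry point, sketched here as a Python script; a Makefile or shell script works just as well. The individual commands are placeholders for your stack’s install, test, and package steps.

```python
#!/usr/bin/env python3
"""Single build entry point: the same script runs locally and in CI.

The commands below are placeholders; substitute your own compile,
test, and package steps.
"""
import subprocess
import sys

STEPS = [
    ["npm", "ci"],                                  # install pinned dependencies
    ["npm", "test", "--", "--ci"],                  # run the automated test suite
    ["docker", "build", "-t", "myapp:local", "."],  # package a deployable artifact
]


def main() -> int:
    for step in STEPS:
        print("--> " + " ".join(step))
        result = subprocess.run(step)
        if result.returncode != 0:
            print("Build failed at: " + " ".join(step), file=sys.stderr)
            return result.returncode
    print("Build succeeded.")
    return 0


if __name__ == "__main__":
    sys.exit(main())
```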
5. Artifact Versioning
Every build artifact must be traceable to the exact commit that produced it.
Practices:
- Tag artifacts with the Git commit SHA or a build number derived from it
- Store build metadata (commit, branch, timestamp, builder) in the artifact or alongside it
- Never overwrite an existing artifact - if the version exists, the artifact is immutable
This becomes critical in Phase 2 when you establish immutable artifact practices.
CI Server Setup Basics
The CI server is the mechanism that runs your build automatically. In Phase 1, the setup is straightforward:
What the CI Server Does
- Watches the trunk for new commits
- Runs the build (the same command a developer would run locally)
- Reports the result (pass/fail, test results, build duration)
- Notifies the team if the build fails
Minimum CI Configuration
Regardless of which CI tool you use (GitHub Actions, GitLab CI, Jenkins, CircleCI), the configuration follows the same pattern: trigger on every commit to trunk, run the single build entry point, and report the result to the team.
CI Principles for Phase 1
- Run on every commit. Not nightly, not weekly, not “when someone remembers.” Every commit to trunk triggers a build.
- Keep the build green. A failing build is the team’s top priority. Work stops until trunk is green again. (See Working Agreements.)
- Run the same build everywhere. The CI server runs the same script as local development. No CI-only steps that developers cannot reproduce.
- Fail fast. Run the fastest checks first (compilation, unit tests) before the slower ones (integration tests, packaging).
Build Time Targets
Build speed directly affects developer productivity and integration frequency. If the build takes 30 minutes, developers will not integrate multiple times per day.
| Build Phase | Target | Rationale |
|---|---|---|
| Compilation | < 1 minute | Developers need instant feedback on syntax and type errors |
| Unit tests | < 3 minutes | Fast enough to run before every commit |
| Integration tests | < 5 minutes | Must complete before the developer context-switches |
| Full build (compile + test + package) | < 10 minutes | The outer bound for fast feedback |
If Your Build Is Too Slow
Slow builds are a common constraint that blocks CD adoption. Address them systematically:
- Profile the build. Identify which steps take the most time. Optimize the bottleneck, not everything.
- Parallelize tests. Most test frameworks support parallel execution. Run independent test suites concurrently.
- Use build caching. Avoid recompiling or re-downloading unchanged dependencies.
- Split the build. Run fast checks (lint, compile, unit tests) as a “fast feedback” stage. Run slower checks (integration tests, security scans) as a second stage.
- Upgrade build hardware. Sometimes the fastest optimization is more CPU and RAM.
The target is under 10 minutes for the feedback loop that developers use on every commit. Longer-running validation (E2E tests, performance tests) can run in a separate stage.
Common Anti-Patterns
Manual Build Steps
Symptom: The build process includes steps like “open this tool and click Run” or “SSH into the build server and execute this script.”
Problem: Manual steps are error-prone, slow, and cannot be parallelized or cached. They are the single biggest obstacle to build automation.
Fix: Script every step. If a human must perform the step today, write a script that performs it tomorrow.
Environment-Specific Builds
Symptom: The build produces different artifacts for different environments (dev, staging, production). Or the build only works on specific machines because of pre-installed tools.
Problem: Environment-specific builds mean you are not testing the same artifact you deploy. Bugs that appear in production but not in staging become impossible to diagnose.
Fix: Build one artifact and configure it per environment at deployment time. The artifact is immutable; the configuration is external. (See Application Config in Phase 2.)
Build Scripts That Only Run in CI
Symptom: The CI pipeline has build steps that developers cannot run locally. Local development uses a different build process.
Problem: Developers cannot reproduce CI failures locally, leading to slow debugging cycles and “push and pray” development.
Fix: Use a single build entry point (Makefile, build script) that both CI and developers use. CI configuration should only add triggers and notifications, not build logic.
Missing Dependency Pinning
Symptom: Builds break randomly because a dependency released a new version overnight.
Problem: Without pinned dependencies, the build is non-deterministic. The same code can produce different results on different days.
Fix: Use lock files. Pin all dependency versions. Update dependencies intentionally, not accidentally.
Long Build Queues
Symptom: Developers commit to trunk, but the build does not run for 20 minutes because the CI server is processing a queue.
Problem: Delayed feedback defeats the purpose of CI. If developers do not see the result of their commit for 30 minutes, they have already moved on.
Fix: Ensure your CI infrastructure can handle your team’s commit frequency. Use parallel build agents. Prioritize builds on the main branch.
Measuring Success
| Metric | Target | Why It Matters |
|---|---|---|
| Build duration | < 10 minutes | Enables fast feedback and frequent integration |
| Build success rate | > 95% | Indicates reliable, reproducible builds |
| Time from commit to build result | < 15 minutes (including queue time) | Measures the full feedback loop |
| Developer ability to build locally | 100% of team | Confirms the build is portable and documented |
Next Step
With build automation in place, you can build, test, and package your application reliably. The next foundation is ensuring that the work you integrate daily is small enough to be safe. Continue to Work Decomposition.
This content is adapted from the Dojo Consortium,
licensed under CC BY 4.0.
4 - Work Decomposition
Break features into small, deliverable increments that can be completed in 2 days or less.
Phase 1 - Foundations | Adapted from Dojo Consortium
Trunk-based development requires daily integration, and daily integration requires small work. If a feature takes two weeks to build, you cannot integrate it daily without decomposing it first. This page covers the techniques for breaking work into small, deliverable increments that flow through your pipeline continuously.
Why Small Work Matters for CD
Continuous delivery depends on a simple equation: small changes, integrated frequently, are safer than large changes integrated rarely.
Every practice in Phase 1 reinforces this:
- Trunk-based development requires that you integrate at least daily. You cannot integrate a two-week feature daily unless you decompose it.
- Testing fundamentals work best when each change is small enough to test thoroughly.
- Code review is fast when the change is small. A 50-line change can be reviewed in minutes. A 2,000-line change takes hours - if it gets reviewed at all.
The data supports this. The DORA research consistently shows that smaller batch sizes correlate with higher delivery performance. Small changes have:
- Lower risk: If a small change breaks something, the blast radius is limited, and the cause is obvious.
- Faster feedback: A small change gets through the pipeline quickly. You learn whether it works today, not next week.
- Easier rollback: Rolling back a 50-line change is straightforward. Rolling back a 2,000-line change often requires a new deployment.
- Better flow: Small work items move through the system predictably. Large work items block queues and create bottlenecks.
The 2-Day Rule
If a work item takes longer than 2 days to complete, it is too big.
This is not arbitrary. Two days gives you at least one integration to trunk per day (the minimum for TBD) and allows for the natural rhythm of development: plan, implement, test, integrate, move on.
When a developer says “this will take a week,” the answer is not “go faster.” The answer is “break it into smaller pieces.”
What “Complete” Means
A work item is complete when it is:
- Integrated to trunk
- All tests pass
- The change is deployable (even if the feature is not yet user-visible)
- It meets the Definition of Done
If a story requires a feature flag to hide incomplete user-facing behavior, that is fine. The code is still integrated, tested, and deployable.
Story Slicing Techniques
Story slicing is the practice of breaking user stories into the smallest possible increments that still deliver value or make progress toward delivering value.
The INVEST Criteria
Good stories follow INVEST:
| Criterion | Meaning | Why It Matters for CD |
|---|---|---|
| Independent | Can be developed and deployed without waiting for other stories | Enables parallel work and avoids blocking |
| Negotiable | Details can be discussed and adjusted | Allows the team to find the smallest valuable slice |
| Valuable | Delivers something meaningful to the user or the system | Prevents “technical stories” that do not move the product forward |
| Estimable | Small enough that the team can reasonably estimate it | Large stories are unestimable because they hide unknowns |
| Small | Completable within 2 days | Enables daily integration and fast feedback |
| Testable | Has clear acceptance criteria that can be automated | Supports the testing foundation |
Vertical Slicing
The most important slicing technique for CD is vertical slicing: cutting through all layers of the application to deliver a thin but complete slice of functionality.
Vertical slice (correct):
“As a user, I can log in with my email and password.”
This slice touches the UI (login form), the API (authentication endpoint), and the database (user lookup). It is deployable and testable end-to-end.
Horizontal slice (anti-pattern):
“Build the database schema for user accounts.”
“Build the authentication API.”
“Build the login form UI.”
Each horizontal slice is incomplete on its own. None is deployable. None is testable end-to-end. They create dependencies between work items and block flow.
Slicing Strategies
When a story feels too big, apply one of these strategies:
| Strategy | How It Works | Example |
|---|---|---|
| By workflow step | Implement one step of a multi-step process | “User can add items to cart” (before “user can checkout”) |
| By business rule | Implement one rule at a time | “Orders over $100 get free shipping” (before “orders ship to international addresses”) |
| By data variation | Handle one data type first | “Support credit card payments” (before “support PayPal”) |
| By operation | Implement CRUD operations separately | “Create a new customer” (before “edit customer” or “delete customer”) |
| By performance | Get it working first, optimize later | “Search returns results” (before “search returns results in under 200ms”) |
| By platform | Support one platform first | “Works on desktop web” (before “works on mobile”) |
| Happy path first | Implement the success case first | “User completes checkout” (before “user sees error when payment fails”) |
Example: Decomposing a Feature
Original story (too big):
“As a user, I can manage my profile including name, email, avatar, password, notification preferences, and two-factor authentication.”
Decomposed into vertical slices:
- “User can view their current profile information” (read-only display)
- “User can update their name” (simplest edit)
- “User can update their email with verification” (adds email flow)
- “User can upload an avatar image” (adds file handling)
- “User can change their password” (adds security validation)
- “User can configure notification preferences” (adds preferences)
- “User can enable two-factor authentication” (adds 2FA flow)
Each slice is independently deployable, testable, and completable within 2 days. Each delivers incremental value. The feature is built up over a series of small deliveries rather than one large batch.
Using BDD to Slice Work
Behavior-Driven Development (BDD) is not just a testing practice - it is a powerful tool for decomposing work into small, clear increments.
Three Amigos
Before work begins, hold a brief “Three Amigos” session with three perspectives:
- Business/Product: What should this feature do? What is the expected behavior?
- Development: How will we build it? What are the technical considerations?
- Testing: How will we verify it? What are the edge cases?
This 15-30 minute conversation accomplishes two things:
- Shared understanding: Everyone agrees on what “done” looks like before work begins.
- Natural decomposition: Discussing specific scenarios reveals natural slice boundaries.
Specification by Example
Write acceptance criteria as concrete examples, not abstract requirements.
Abstract (hard to slice):
“The system should validate user input.”
Concrete (easy to slice):
- Given an email field, when the user enters “not-an-email”, then the form shows “Please enter a valid email address.”
- Given a password field, when the user enters fewer than 8 characters, then the form shows “Password must be at least 8 characters.”
- Given a name field, when the user leaves it blank, then the form shows “Name is required.”
Each concrete example can become its own story or task. The scope is clear, the acceptance criteria are testable, and the work is small.
Structure acceptance criteria in Given-When-Then format to make them executable:
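Teams often write these scenarios in Gherkin with tools like Cucumber or behave. As a minimal, tool-agnostic sketch, the same structure can live directly in a pytest test; the validation function and messages below are hypothetical.

```python
def validate_email(value: str):
    """Return an error message for invalid input, or None (example code under test)."""
    if "@" not in value:
        return "Please enter a valid email address."
    return None


def test_invalid_email_shows_validation_message():
    # Given an email field
    field_value = "not-an-email"
    # When the user enters an invalid address
    error = validate_email(field_value)
    # Then the form shows the validation message
    assert error == "Please enter a valid email address."
```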
Each scenario is a natural unit of work. Implement one scenario at a time, integrate to trunk after each one.
Task Decomposition Within Stories
Even well-sliced stories may contain multiple tasks. Decompose stories into tasks that can be completed and integrated independently.
Example story: “User can update their name”
Tasks:
- Add the name field to the profile API endpoint (backend change, integration test)
- Add the name field to the profile form (frontend change, unit test)
- Connect the form to the API endpoint (integration, E2E test)
Each task results in a commit to trunk. The story is completed through a series of small integrations, not one large merge.
Guidelines for task decomposition:
- Each task should take hours, not days
- Each task should leave trunk in a working state after integration
- Tasks should be ordered so that the simplest changes come first
- If a task requires a feature flag or stub to be integrated safely, that is fine
Common Anti-Patterns
Horizontal Slicing
Symptom: Stories are organized by architectural layer: “build the database schema,” “build the API,” “build the UI.”
Problem: No individual slice is deployable or testable end-to-end. Integration happens at the end, which is where bugs are found and schedules slip.
Fix: Slice vertically. Every story should touch all the layers needed to deliver a thin slice of complete functionality.
Technical Stories
Symptom: The backlog contains stories like “refactor the database access layer” or “upgrade to React 18” that do not deliver user-visible value.
Problem: Technical work is important, but when it is separated from feature work, it becomes hard to prioritize and easy to defer. It also creates large, risky changes.
Fix: Embed technical improvements in feature stories. Refactor as you go. If a technical change is necessary, tie it to a specific business outcome and keep it small enough to complete in 2 days.
Stories That Are Really Epics
Symptom: A story has 10+ acceptance criteria, or the estimate is “8 points” or “2 weeks.”
Problem: Large stories hide unknowns, resist estimation, and cannot be integrated daily.
Fix: If a story has more than 3-5 acceptance criteria, it is an epic. Break it into smaller stories using the slicing strategies above.
Splitting by Role Instead of by Behavior
Symptom: Separate stories for “frontend developer builds the UI” and “backend developer builds the API.”
Problem: This creates handoff dependencies and delays integration. The feature is not testable until both stories are complete.
Fix: Write stories from the user’s perspective. The same developer (or pair) implements the full vertical slice.
Deferring “Edge Cases” Indefinitely
Symptom: The team builds the happy path and creates a backlog of “handle error case X” stories that never get prioritized.
Problem: Error handling is not optional. Unhandled edge cases become production incidents.
Fix: Include the most important error cases in the initial story decomposition. Use the “happy path first” slicing strategy, but schedule edge case stories immediately after, not “someday.”
Measuring Success
| Metric | Target | Why It Matters |
|--------|--------|----------------|
| Story cycle time | < 2 days from start to trunk | Confirms stories are small enough |
| Development cycle time | Decreasing | Shows improved flow from smaller work |
| Stories completed per week | Increasing (with same team size) | Indicates better decomposition and less rework |
| Work in progress | Decreasing | Fewer large stories blocking the pipeline |
Next Step
Small, well-decomposed work flows through the system quickly - but only if code review does not become a bottleneck. Continue to Code Review to learn how to keep review fast and effective.
This content is adapted from the Dojo Consortium,
licensed under CC BY 4.0.
5 - Code Review
Streamline code review to provide fast feedback without blocking flow.
Phase 1 - Foundations | Adapted from Dojo Consortium
Code review is essential for quality, but it is also the most common bottleneck in teams adopting trunk-based development. If reviews take days, daily integration is impossible. This page covers review techniques that maintain quality while enabling the flow that CD requires.
Why Code Review Matters for CD
Code review serves multiple purposes:
- Defect detection: A second pair of eyes catches bugs that the author missed.
- Knowledge sharing: Reviews spread understanding of the codebase across the team.
- Consistency: Reviews enforce coding standards and architectural patterns.
- Mentoring: Junior developers learn by having their code reviewed and by reviewing others’ code.
These are real benefits. The challenge is that traditional code review - open a pull request, wait for someone to review it, address comments, wait again - is too slow for CD.
In a CD workflow, code review must happen within minutes or hours, not days. The review is still rigorous, but the process is designed for speed.
The Core Tension: Quality vs. Flow
Traditional teams optimize review for thoroughness: detailed comments, multiple reviewers, extensive back-and-forth. This produces high-quality reviews but blocks flow.
CD teams optimize review for speed without sacrificing the quality that matters. The key insight is that most of the quality benefit of code review comes from small, focused reviews done quickly, not from exhaustive reviews done slowly.
| Traditional Review | CD-Compatible Review |
|--------------------|----------------------|
| Review happens after the feature is complete | Review happens continuously throughout development |
| Large diffs (hundreds or thousands of lines) | Small diffs (< 200 lines, ideally < 50) |
| Multiple rounds of feedback and revision | One round, or real-time feedback during pairing |
| Review takes 1-3 days | Review takes minutes to a few hours |
| Review is asynchronous by default | Review is synchronous by preference |
| 2+ reviewers required | 1 reviewer (or pairing as the review) |
Synchronous vs. Asynchronous Review
Synchronous Review (Preferred for CD)
In synchronous review, the reviewer and author are engaged at the same time. Feedback is immediate. Questions are answered in real time. The review is done when the conversation ends.
Methods:
- Pair programming: Two developers work on the same code at the same time. Review is continuous. There is no separate review step because the code was reviewed as it was written.
- Mob programming: The entire team (or a subset) works on the same code together. Everyone reviews in real time.
- Over-the-shoulder review: The author walks the reviewer through the change in person or on a video call. The reviewer asks questions and provides feedback immediately.
Advantages for CD:
- Zero wait time between “ready for review” and “review complete”
- Higher bandwidth communication (tone, context, visual cues) catches more issues
- Immediate resolution of questions - no async back-and-forth
- Knowledge transfer happens naturally through the shared work
Asynchronous Review (When Necessary)
Sometimes synchronous review is not possible - time zones, schedules, or team preferences may require asynchronous review. This is fine, but it must be fast.
Rules for async review in a CD workflow:
- Review within 2 hours. If a pull request sits for a day, it blocks integration. Set a team working agreement: “pull requests are reviewed within 2 hours during working hours.”
- Keep changes small. A 50-line change can be reviewed in 5 minutes. A 500-line change takes an hour and reviewers procrastinate on it.
- Use draft PRs for early feedback. If you want feedback on an approach before the code is complete, open a draft PR. Do not wait until the change is “perfect.”
- Avoid back-and-forth. If a comment requires discussion, move to a synchronous channel (call, chat). Async comment threads that go 5 rounds deep are a sign the change is too large or the design was not discussed upfront.
Review Techniques Compatible with TBD
Pair Programming as Review
When two developers pair on a change, the code is reviewed as it is written. There is no separate review step, no pull request waiting for approval, and no delay to integration.
How it works with TBD:
- Two developers sit together (physically or via screen share)
- They discuss the approach, write the code, and review each other’s decisions in real time
- When the change is ready, they commit to trunk together
- Both developers are accountable for the quality of the code
When to pair:
- New or unfamiliar areas of the codebase
- Changes that affect critical paths
- When a junior developer is working on a change (pairing doubles as mentoring)
- Any time the change involves design decisions that benefit from discussion
Pair programming satisfies most organizations’ code review requirements because two developers have actively reviewed and approved the code.
Mob Programming as Review
Mob programming extends pairing to the whole team. One person drives (types), one person navigates (directs), and the rest observe and contribute.
When to mob:
- Establishing new patterns or architectural decisions
- Complex changes that benefit from multiple perspectives
- Onboarding new team members to the codebase
- Working through particularly difficult problems
Mob programming is intensive but highly effective. Every team member understands the code, the design decisions, and the trade-offs.
Rapid Async Review
For teams that use pull requests, rapid async review adapts the pull request workflow for CD speed.
Practices:
- Auto-assign reviewers. Do not wait for someone to volunteer. Use tools to automatically assign a reviewer when a PR is opened.
- Keep PRs small. Target < 200 lines of changed code. Smaller PRs get reviewed faster and more thoroughly.
- Provide context. Write a clear PR description that explains what the change does, why it is needed, and how to verify it. A good description reduces review time dramatically.
- Use automated checks. Run linting, formatting, and tests before the human review. The reviewer should focus on logic and design, not style (a sketch of one such check follows this list).
- Approve and merge quickly. If the change looks correct, approve it. Do not hold it for nitpicks. Nitpicks can be addressed in a follow-up commit.
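As one concrete example of an automated check, the sketch below fails a pipeline step when a change exceeds the 200-line target, so reviewers never have to police size by hand. It assumes the trunk branch is origin/main; adjust for your repository.

```python
# check_diff_size.py - fail the pre-review check when a change is too large.
import subprocess
import sys

MAX_CHANGED_LINES = 200  # team working agreement target


def changed_lines(base: str = "origin/main") -> int:
    # `git diff --numstat` prints "added<TAB>deleted<TAB>path" for each file.
    output = subprocess.run(
        ["git", "diff", "--numstat", f"{base}...HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout
    total = 0
    for line in output.splitlines():
        added, deleted, _path = line.split("\t", 2)
        if added == "-" or deleted == "-":  # binary files report "-"
            continue
        total += int(added) + int(deleted)
    return total


if __name__ == "__main__":
    lines = changed_lines()
    if lines > MAX_CHANGED_LINES:
        print(f"{lines} changed lines exceeds the {MAX_CHANGED_LINES}-line agreement; consider splitting the change.")
        sys.exit(1)
    print(f"{lines} changed lines is within the {MAX_CHANGED_LINES}-line agreement.")
```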
What to Review
Not everything in a code change deserves the same level of scrutiny. Focus reviewer attention where it matters most.
High Priority (Reviewer Should Focus Here)
- Behavior correctness: Does the code do what it is supposed to do? Are edge cases handled?
- Security: Does the change introduce vulnerabilities? Are inputs validated? Are secrets handled properly?
- Clarity: Can another developer understand this code in 6 months? Are names clear? Is the logic straightforward?
- Test coverage: Are the new behaviors tested? Do the tests verify the right things?
- API contracts: Do changes to public interfaces maintain backward compatibility? Are they documented?
- Error handling: What happens when things go wrong? Are errors caught, logged, and surfaced appropriately?
Low Priority (Automate Instead of Reviewing)
- Code style and formatting: Use automated formatters (Prettier, Black, gofmt). Do not waste reviewer time on indentation and bracket placement.
- Import ordering: Automate with linting rules.
- Naming conventions: Enforce with lint rules where possible. Only flag naming in review if it genuinely harms readability.
- Unused variables or imports: Static analysis tools catch these instantly.
- Consistent patterns: Where possible, encode patterns in architecture decision records and lint rules rather than relying on reviewers to catch deviations.
Rule of thumb: If a style or convention issue can be caught by a machine, do not ask a human to catch it. Reserve human attention for the things machines cannot evaluate: correctness, design, clarity, and security.
Review Scope for Small Changes
In a CD workflow, most changes are small - tens of lines, not hundreds. This changes the economics of review.
| Change Size | Expected Review Time | Review Depth |
|-------------|----------------------|--------------|
| < 20 lines | 2-5 minutes | Quick scan: is it correct? Any security issues? |
| 20-100 lines | 5-15 minutes | Full review: behavior, tests, clarity |
| 100-200 lines | 15-30 minutes | Detailed review: design, contracts, edge cases |
| > 200 lines | Consider splitting the change | Large changes get superficial reviews |
Research consistently shows that reviewer effectiveness drops sharply after 200-400 lines. If you are regularly reviewing changes larger than 200 lines, the problem is not the review process - it is the work decomposition.
Working Agreements for Review SLAs
Establish clear team agreements about review expectations. Without explicit agreements, review latency will drift based on individual habits.
Recommended Review Agreements
| Agreement | Target |
|-----------|--------|
| Response time | Review within 2 hours during working hours |
| Reviewer count | 1 reviewer (or pairing as the review) |
| PR size | < 200 lines of changed code |
| Blocking issues only | Only block a merge for correctness, security, or significant design issues |
| Nitpicks | Use a “nit:” prefix. Nitpicks are suggestions, not merge blockers |
| Stale PRs | PRs open for > 24 hours are escalated to the team |
| Self-review | Author reviews their own diff before requesting review |
How to Enforce Review SLAs
- Track review turnaround time. If it consistently exceeds 2 hours, discuss it in retrospectives.
- Make review a first-class responsibility, not something developers do “when they have time.”
- If a reviewer is unavailable, any other team member can review. Do not create single-reviewer dependencies.
- Consider pairing as the default and async review as the exception. This eliminates the review bottleneck entirely.
Code Review and Trunk-Based Development
Code review and TBD work together, but only if review does not block integration. Here is how to reconcile them:
| TBD Requirement | How Review Adapts |
|-----------------|-------------------|
| Integrate to trunk at least daily | Reviews must complete within hours, not days |
| Branches live < 24 hours | PRs are opened and merged within the same day |
| Trunk is always releasable | Reviewers focus on correctness, not perfection |
| Small, frequent changes | Small changes are reviewed quickly and thoroughly |
If your team finds that review is the bottleneck preventing daily integration, the most effective solution is to adopt pair programming. It eliminates the review step entirely by making review continuous.
Measuring Success
| Metric | Target | Why It Matters |
|--------|--------|----------------|
| Review turnaround time | < 2 hours | Prevents review from blocking integration |
| PR size (lines changed) | < 200 lines | Smaller PRs get faster, more thorough reviews |
| PR age at merge | < 24 hours | Aligns with TBD branch age constraint |
| Review rework cycles | < 2 rounds | Multiple rounds indicate the change is too large or design was not discussed upfront |
Next Step
Code review practices need to be codified in team agreements alongside other shared commitments. Continue to Working Agreements to establish your team’s definitions of done, ready, and CI practice.
This content is adapted from the Dojo Consortium,
licensed under CC BY 4.0.
6 - Working Agreements
Establish shared definitions of done and ready to align the team on quality and process.
Phase 1 - Foundations | Adapted from Dojo Consortium
The practices in Phase 1 - trunk-based development, testing, small work, and fast review - only work when the whole team commits to them. Working agreements make that commitment explicit. This page covers the key agreements a team needs before moving to pipeline automation in Phase 2.
Why Working Agreements Matter
A working agreement is a shared commitment that the team creates, owns, and enforces together. It is not a policy imposed from outside. It is the team’s own answer to the question: “How do we work together?”
Without working agreements, CD practices drift. One developer integrates daily; another keeps a branch for a week. One developer fixes a broken build immediately; another waits until after lunch. These inconsistencies compound. Within weeks, the team is no longer practicing CD - they are practicing individual preferences.
Working agreements prevent this drift by making expectations explicit. When everyone agrees on what “done” means, what “ready” means, and how CI works, the team can hold each other accountable without conflict.
Definition of Done
The Definition of Done (DoD) is the team’s shared standard for when a work item is complete. For CD, the Definition of Done must include deployment.
Minimum Definition of Done for CD
A work item is done when all of the following are true:
- The code is integrated to trunk
- All automated tests pass
- The change has been reviewed (by a pair or a reviewer)
- The change is deployable to production
Why “Deployed to Production” Matters
Many teams define “done” as “code is merged.” This creates a gap between “done” and “delivered.” Work accumulates in a staging environment, waiting for a release. Risk grows with each unreleased change.
In a CD organization, “done” means the change is in production (or ready to be deployed to production at any time). This is the ultimate test of completeness: the change works in the real environment, with real data, under real load.
In Phase 1, you may not yet have the pipeline to deploy every change to production automatically. That is fine - your DoD should still include “deployable to production” as the standard, even if the deployment step is not yet automated. The pipeline work in Phase 2 will close that gap.
Extending Your Definition of Done
As your CD maturity grows, extend the DoD:
| Phase | Addition to DoD |
|-------|-----------------|
| Phase 1 (Foundations) | Code integrated to trunk, tests pass, reviewed, deployable |
| Phase 2 (Pipeline) | Artifact built and validated by the pipeline |
| Phase 3 (Optimize) | Change deployed to production behind a feature flag |
| Phase 4 (Deliver on Demand) | Change deployed to production and monitored |
Definition of Ready
The Definition of Ready (DoR) answers: “When is a work item ready to be worked on?” Pulling unready work into development creates waste - unclear requirements lead to rework, missing acceptance criteria lead to untestable changes, and oversized stories lead to long-lived branches.
Minimum Definition of Ready for CD
A work item is ready when all of the following are true:
- Acceptance criteria are written as concrete, testable examples
- The item is small enough to complete within 2 days
- The team has discussed the item (for example, in a Three Amigos session) and shares an understanding of what “done” looks like
Common Mistakes with Definition of Ready
- Making it too rigid. The DoR is a guideline, not a gate. If the team agrees a work item is understood well enough, it is ready. Do not use the DoR to avoid starting work.
- Requiring design documents. For small work items (< 2 days), a conversation and acceptance criteria are sufficient. Formal design documents are for larger initiatives.
- Skipping the conversation. The DoR is most valuable as a prompt for discussion, not as a checklist. The Three Amigos conversation matters more than the checkboxes.
CI Working Agreement
The CI working agreement codifies how the team practices continuous integration. This is the most operationally critical working agreement for CD.
The CI Agreement
The team agrees to the following practices:
Integration:
- Every developer integrates their work to trunk at least once per day
- Any branch that is used lives for less than 24 hours
Build:
- The build runs with a single command and completes in under 10 minutes
- Every integration triggers the automated build and test suite
Broken builds:
- Fixing a broken build is the team’s top priority
- If the build cannot be fixed within 10 minutes, the offending commit is reverted
Work in progress:
- Work in progress is limited; finish and integrate the current change before starting a new one
Why “Broken Build = Top Priority”
This is the single most important CI agreement. When the build is broken:
- No one can integrate safely. Changes are stacking up.
- Trunk is not releasable. The team has lost its safety net.
- Every minute the build stays broken, the team accumulates risk.
“Fix the build” is not a suggestion. It is an agreement that the team enforces collectively. If the build is broken and someone starts a new feature instead of fixing it, the team should call that out. This is not punitive - it is the team protecting its own ability to deliver.
The Revert Rule
If a broken build cannot be fixed within 10 minutes, revert the offending commit and fix the issue on a branch. This keeps trunk green and unblocks the rest of the team. The developer who made the change is not being punished - they are protecting the team’s flow.
Reverting feels uncomfortable at first. Teams worry about “losing work.” But a reverted commit is not lost - the code is still in the Git history. The developer can re-apply their change after fixing the issue. The alternative - a broken trunk for hours while someone debugs - is far more costly.
How Working Agreements Support the CD Migration
Each working agreement maps directly to a Phase 1 practice:
- The CI agreement codifies trunk-based development and daily integration
- The Definition of Done codifies testing fundamentals and build automation
- The Definition of Ready codifies small work decomposition
- The review agreement codifies fast, non-blocking code review
Without these agreements, individual practices exist in isolation. Working agreements connect them into a coherent way of working.
Template: Create Your Own Working Agreements
Use this template as a starting point. Customize it for your team’s context. The specific targets may differ, but the structure should remain.
Team Working Agreement Template
- Definition of Done: integrated to trunk, tests pass, reviewed, deployable to production
- Definition of Ready: concrete acceptance criteria, sized to complete within 2 days, discussed by the team
- Continuous integration: integrate to trunk at least daily; a broken build is fixed or reverted within 10 minutes
- Code review: respond within 2 hours; keep changes under 200 lines; one reviewer or pairing as the review
Tips for Creating Working Agreements
- Include everyone. Every team member should participate in creating the agreement. Agreements imposed by a manager or tech lead are policies, not agreements.
- Start simple. Do not try to cover every scenario. Start with the essentials (DoD, DoR, CI) and add specifics as the team identifies gaps.
- Make them visible. Post the agreements where the team sees them daily - on a team wiki, in the team channel, or on a physical board.
- Review regularly. Agreements should evolve as the team matures. Review them monthly. Remove agreements that are second nature. Add agreements for new challenges.
- Enforce collectively. Working agreements are only effective if the team holds each other accountable. This is a team responsibility, not a manager responsibility.
- Start with agreements you can keep. If the team is currently integrating once a week, do not agree to integrate three times daily. Agree to integrate daily, practice for a month, then tighten.
Measuring Success
| Metric | Target | Why It Matters |
|--------|--------|----------------|
| Agreement adherence | Team self-reports > 80% adherence | Indicates agreements are realistic and followed |
| Agreement review frequency | Monthly | Ensures agreements stay relevant |
| Integration frequency | Meets CI agreement target | Validates the CI working agreement |
| Broken build fix time | Meets CI agreement target | Validates the broken build response agreement |
Next Step
With working agreements in place, your team has established the foundations for continuous delivery: daily integration, reliable testing, automated builds, small work, fast review, and shared commitments.
You are ready to move to Phase 2: Pipeline, where you will build the automated path from commit to production.
This content is adapted from the Dojo Consortium,
licensed under CC BY 4.0.
7 - Everything as Code
Every artifact that defines your system - infrastructure, pipelines, configuration, database schemas, monitoring - belongs in version control and is delivered through pipelines.
Phase 1 - Foundations
If it is not in version control, it does not exist. If it is not delivered through a pipeline, it
is a manual step. Manual steps block continuous delivery. This page establishes the principle that
everything required to build, deploy, and operate your system is defined as code, version
controlled, reviewed, and delivered through the same automated pipelines as your application.
The Principle
Continuous delivery requires that any change to your system - application code, infrastructure,
pipeline configuration, database schema, monitoring rules, security policies - can be made through
a single, consistent process: change the code, commit, let the pipeline deliver it.
When something is defined as code:
- It is version controlled. You can see who changed what, when, and why. You can revert any
change. You can trace any production state to a specific commit.
- It is reviewed. Changes go through the same review process as application code. A second
pair of eyes catches mistakes before they reach production.
- It is tested. Automated validation catches errors before deployment. Linting, dry-runs,
and policy checks apply to infrastructure the same way unit tests apply to application code.
- It is reproducible. You can recreate any environment from scratch. Disaster recovery is
“re-run the pipeline,” not “find the person who knows how to configure the server.”
- It is delivered through a pipeline. No SSH, no clicking through UIs, no manual steps. The
pipeline is the only path to production for everything, not just application code.
When something is not defined as code, it is a liability. It cannot be reviewed, tested, or
reproduced. It exists only in someone’s head, a wiki page that is already outdated, or a
configuration that was applied manually and has drifted from any documented state.
What “Everything” Means
Application code
This is where most teams start, and it is the least controversial. Your application source code
is in version control, built and tested by a pipeline, and deployed as an immutable artifact.
If your application code is not in version control, start here. Nothing else in this page matters
until this is in place.
Infrastructure
Every server, network, database instance, load balancer, DNS record, and cloud resource should be
defined in code and provisioned through automation.
What this looks like:
- Cloud resources defined in Terraform, Pulumi, CloudFormation, or similar tools
- Server configuration managed by Ansible, Chef, Puppet, or container images
- Network topology, firewall rules, and security groups defined declaratively
- Environment creation is a pipeline run, not a ticket to another team
What this replaces:
- Clicking through cloud provider consoles to create resources
- SSH-ing into servers to install packages or change configuration
- Filing tickets for another team to provision an environment
- “Snowflake” servers that were configured by hand and nobody knows how to recreate
Why it matters for CD: If creating or modifying an environment requires manual steps, your
deployment frequency is limited by the availability and speed of the person who performs those
steps. If a production server fails and you cannot recreate it from code, your mean time to
recovery is measured in hours or days instead of minutes.
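For illustration, the sketch below defines a small piece of cloud infrastructure with Pulumi's Python SDK; Terraform, CloudFormation, and similar tools express the same idea. The bucket name and tags are placeholder values.

```python
# __main__.py - a minimal Pulumi program: one S3 bucket defined as code.
# Placeholder names; any Pulumi-supported provider works the same way.
import pulumi
import pulumi_aws as aws

artifacts = aws.s3.Bucket(
    "build-artifacts",
    tags={"team": "checkout", "managed-by": "pulumi"},
)

# Exported outputs make the resource discoverable to other stacks and pipelines.
pulumi.export("artifacts_bucket_name", artifacts.id)
```

Because this file lives in version control, creating or changing the bucket is a reviewed commit followed by a pipeline run of pulumi up, not a console session.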
Pipeline definitions
Your CI/CD pipeline configuration belongs in the same repository as the code it builds and
deploys. The pipeline is code, not a configuration applied through a UI.
What this looks like:
- Pipeline definitions in .github/workflows/, .gitlab-ci.yml, Jenkinsfile, or equivalent
- Pipeline changes go through the same review process as application code
- Pipeline behavior is deterministic - the same commit always produces the same pipeline behavior
- Teams can modify their own pipelines without filing tickets
What this replaces:
- Pipeline configuration maintained through a Jenkins UI that nobody is allowed to touch
- A “platform team” that owns all pipeline definitions and queues change requests
- Pipeline behavior that varies depending on server state or installed plugins
Why it matters for CD: The pipeline is the path to production. If the pipeline itself cannot
be changed through a reviewed, automated process, it becomes a bottleneck and a risk. Pipeline
changes should flow with the same speed and safety as application changes.
Database schemas and migrations
Database schema changes should be defined as versioned migration scripts, stored in version
control, and applied through the pipeline.
What this looks like:
- Migration scripts in the repository (using tools like Flyway, Liquibase, Alembic, or
ActiveRecord migrations)
- Every schema change is a numbered, ordered migration that can be applied and rolled back
- Migrations run as part of the deployment pipeline, not as a manual step
- Schema changes follow the expand-then-contract pattern: add the new column, deploy code that
uses it, then remove the old column in a later migration
What this replaces:
- A DBA manually applying SQL scripts during a maintenance window
- Schema changes that are “just done in production” and not tracked anywhere
- Database state that has drifted from what is defined in any migration script
Why it matters for CD: Database changes are one of the most common reasons teams cannot deploy
continuously. If schema changes require manual intervention, coordinated downtime, or a separate
approval process, they become a bottleneck that forces batching. Treating schemas as code with
automated migrations removes this bottleneck.
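As an example of the expand step in expand-then-contract, the migration below is a minimal sketch using Alembic; the table, column, and revision identifiers are hypothetical. It adds a nullable column so old and new application versions can run side by side; dropping the old column would be a later migration.

```python
# versions/20240101_add_display_name.py - an Alembic migration applied by the pipeline.
# Hypothetical table and column names; revision identifiers are placeholders.
from alembic import op
import sqlalchemy as sa

revision = "a1b2c3d4e5f6"
down_revision = "f6e5d4c3b2a1"


def upgrade():
    # Expand: add the new column as nullable so existing code keeps working.
    op.add_column("users", sa.Column("display_name", sa.String(length=255), nullable=True))


def downgrade():
    # Rolling back is also a code change, not a manual fix in production.
    op.drop_column("users", "display_name")
```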
Application configuration
Environment-specific configuration - database connection strings, API endpoints, feature flag
states, logging levels - should be defined as code and managed through version control.
What this looks like:
- Configuration values stored in a config management system (Consul, AWS Parameter Store,
environment variable definitions in infrastructure code)
- Configuration changes are committed, reviewed, and deployed through a pipeline
- The same application artifact is deployed to every environment; only the configuration differs
What this replaces:
- Configuration files edited manually on servers
- Environment variables set by hand and forgotten
- Configuration that exists only in a deployment runbook
See Application Config for detailed guidance on
externalizing configuration.
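As a minimal sketch of “same artifact, different configuration,” the module below reads its settings from environment variables at startup. The variable names and defaults are illustrative; the values themselves come from infrastructure code or a parameter store, never from hand edits on a server.

```python
# config.py - the deployed artifact loads environment-specific values at startup.
# Illustrative names; supply real values through infrastructure code or a parameter store.
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    database_url: str
    payments_api_base: str
    log_level: str


def load_settings() -> Settings:
    return Settings(
        database_url=os.environ["DATABASE_URL"],
        payments_api_base=os.environ.get("PAYMENTS_API_BASE", "https://payments.example.internal"),
        log_level=os.environ.get("LOG_LEVEL", "INFO"),
    )
```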
Monitoring, alerting, and observability
Dashboards, alert rules, SLO definitions, and logging configuration should be defined as code.
What this looks like:
- Alert rules defined in Terraform, Prometheus rules files, or Datadog monitors-as-code
- Dashboards defined as JSON or YAML, not built by hand in a UI
- SLO definitions tracked in version control alongside the services they measure
- Logging configuration (what to log, where to send it, retention policies) in code
What this replaces:
- Dashboards built manually in a monitoring UI that nobody knows how to recreate
- Alert rules that were configured by hand during an incident and never documented
- Monitoring configuration that exists only on the monitoring server
Why it matters for CD: If you deploy ten times a day, you need to know instantly whether each
deployment is healthy. If your monitoring and alerting configuration is manual, it will drift,
break, or be incomplete. Monitoring-as-code ensures that every service has consistent, reviewed,
reproducible observability.
Security policies
Security controls - access policies, network rules, secret rotation schedules, compliance
checks - should be defined as code and enforced automatically.
What this looks like:
- IAM policies and RBAC rules defined in Terraform or policy-as-code tools (OPA, Sentinel)
- Security scanning integrated into the pipeline (SAST, dependency scanning, container image
scanning)
- Secret rotation automated and defined in code
- Compliance checks that run on every commit, not once a quarter
What this replaces:
- Security reviews that happen at the end of the development cycle
- Access policies configured through UIs and never audited
- Compliance as a manual checklist performed before each release
Why it matters for CD: Security and compliance requirements are the most common organizational
blockers for CD. When security controls are defined as code and enforced by the pipeline, you can
prove to auditors that every change passed security checks automatically. This is stronger
evidence than a manual review, and it does not slow down delivery.
The “One Change, One Process” Test
For every type of artifact in your system, ask:
If I need to change this, do I commit a code change and let the pipeline deliver it?
If the answer is yes, the artifact is managed as code. If the answer involves SSH, a UI, a
ticket to another team, or a manual step, it is not.
| Artifact | Managed as code? | If not, the risk is… |
|----------|------------------|----------------------|
| Application source code | Usually yes | - |
| Infrastructure (servers, networks, cloud resources) | Often no | Snowflake environments, slow provisioning, unreproducible disasters |
| Pipeline definitions | Sometimes | Pipeline changes are slow, unreviewed, and risky |
| Database schemas | Sometimes | Schema changes require manual coordination and downtime |
| Application configuration | Sometimes | Config drift between environments, “works in staging” failures |
| Monitoring and alerting | Rarely | Monitoring gaps, unreproducible dashboards, alert fatigue |
| Security policies | Rarely | Security as a gate instead of a guardrail, audit failures |
The goal is for every row in this table to be “yes.” You will not get there overnight, but every
artifact you move from manual to code-managed removes a bottleneck and a risk.
How to Get There
Start with what blocks you most
Do not try to move everything to code at once. Identify the artifact type that causes the most
pain or blocks deployments most frequently:
- If environment provisioning takes days, start with infrastructure as code.
- If database changes are the reason you cannot deploy more than once a week, start with
schema migrations as code.
- If pipeline changes require tickets to a platform team, start with pipeline as code.
- If configuration drift causes production incidents, start with configuration as code.
Apply the same practices as application code
Once an artifact is defined as code, treat it with the same rigor as application code:
- Store it in version control (ideally in the same repository as the application it supports)
- Review changes before they are applied
- Test changes automatically (linting, dry-runs, policy checks)
- Deliver changes through a pipeline
- Never modify the artifact outside of this process
Eliminate manual pathways
The hardest part is closing the manual back doors. As long as someone can SSH into a server and
make a change, or click through a UI to modify infrastructure, the code-defined state will drift
from reality.
The principle is the same as Single Path to Production
for application code: the pipeline is the only way any change reaches production. This applies to
infrastructure, configuration, schemas, monitoring, and policies just as much as it applies to
application code.
Measuring Progress
| Metric | What to look for |
|--------|------------------|
| Artifact types managed as code | Track how many of the categories above are fully code-managed. The number should increase over time. |
| Manual changes to production | Count any change made outside of a pipeline (SSH, UI clicks, manual scripts). Target: zero. |
| Environment recreation time | How long does it take to recreate a production-like environment from scratch? Should decrease as more infrastructure moves to code. |
| Mean time to recovery | When infrastructure-as-code is in place, recovery from failures is “re-run the pipeline.” MTTR drops dramatically. |
Related Content