Phase 1: Foundations

Establish the essential practices for daily integration, testing, and small work decomposition.

Key question: “Can we integrate safely every day?”

This phase establishes the development practices that make continuous delivery possible. Without these foundations, pipeline automation just speeds up a broken process.

What You’ll Do

  1. Adopt trunk-based development - Integrate to trunk at least daily
  2. Build testing fundamentals - Create a fast, reliable test suite
  3. Automate your build - One command to build, test, and package
  4. Decompose work - Break features into small, deliverable increments
  5. Streamline code review - Fast, effective review that doesn’t block flow
  6. Establish working agreements - Shared definitions of done and ready
  7. Everything as code - Infrastructure, pipelines, schemas, monitoring, and security policies in version control, delivered through pipelines

Why This Phase Matters

These practices are the prerequisites for everything that follows. Trunk-based development eliminates merge hell. Testing fundamentals give you the confidence to deploy frequently. Small work decomposition reduces risk per change. Together, they create the feedback loops that drive continuous improvement.

When You’re Ready to Move On

You’re ready for Phase 2: Pipeline when:

  • All developers integrate to trunk at least once per day
  • Your test suite catches real defects and runs in under 10 minutes
  • You can build and package your application with a single command
  • Most work items are completable within 2 days

1 - Trunk-Based Development

Integrate all work to the trunk at least once per day to enable continuous integration.

Phase 1 - Foundations

Trunk-based development is the first foundation to establish. Without daily integration to a shared trunk, the rest of the CD migration cannot succeed. This page covers the core practice, two migration paths, and a tactical guide for getting started.

What Is Trunk-Based Development?

Trunk-based development (TBD) is a branching strategy where all developers integrate their work into a single shared branch - the trunk - at least once per day. The trunk is always kept in a releasable state.

This is a non-negotiable prerequisite for continuous delivery. If your team is not integrating to trunk daily, you are not doing CI, and you cannot do CD. There is no workaround.

“If it hurts, do it more often, and bring the pain forward.”

-- Jez Humble, Continuous Delivery

What TBD Is Not

  • It is not “everyone commits directly to main with no guardrails.” You still test, review, and validate work - you just do it in small increments.
  • It is not incompatible with code review. It requires review to happen quickly.
  • It is not reckless. It is the opposite: small, frequent integrations are far safer than large, infrequent merges.

What Trunk-Based Development Improves

| Problem | How TBD Helps |
| --- | --- |
| Merge conflicts | Small changes integrated frequently rarely conflict |
| Integration risk | Bugs are caught within hours, not weeks |
| Long-lived branches diverge from reality | The trunk always reflects the current state of the codebase |
| “Works on my branch” syndrome | Everyone shares the same integration point |
| Slow feedback | CI runs on every integration, giving immediate signal |
| Large batch deployments | Small changes are individually deployable |
| Fear of deployment | Each change is small enough to reason about |

Two Migration Paths

There are two valid approaches to trunk-based development. Both satisfy the minimum CD requirement of daily integration. Choose the one that fits your team’s current maturity and constraints.

Path 1: Short-Lived Branches

Developers create branches that live for less than 24 hours. Work is done on the branch, reviewed quickly, and merged to trunk within a single day.

How it works:

  1. Pull the latest trunk
  2. Create a short-lived branch
  3. Make small, focused changes
  4. Open a pull request (or use pair programming as the review)
  5. Merge to trunk before end of day
  6. The branch is deleted after merge

Best for teams that:

  • Currently use long-lived feature branches and need a stepping stone
  • Have regulatory requirements for traceable review records
  • Use pull request workflows they want to keep (but make faster)
  • Are new to TBD and want a gradual transition

Key constraint: The branch must merge to trunk within 24 hours. If it does not, you have a long-lived branch and you have lost the benefit of TBD.

Path 2: Direct Trunk Commits

Developers commit directly to trunk. Quality is ensured through pre-commit checks, pair programming, and strong automated testing.

How it works:

  1. Pull the latest trunk
  2. Make a small, tested change locally
  3. Run the local build and test suite
  4. Push directly to trunk
  5. CI validates the commit immediately

Best for teams that:

  • Have strong automated test coverage
  • Practice pair or mob programming (which provides real-time review)
  • Want maximum integration frequency
  • Have high trust and shared code ownership

Key constraint: This requires excellent test coverage and a culture where the team owns quality collectively. Without these, direct trunk commits become reckless.

How to Choose Your Path

Ask these questions:

  1. Do you have automated tests that catch real defects? If no, start with Path 1 and invest in testing fundamentals in parallel.
  2. Does your organization require documented review approvals? If yes, use Path 1 with rapid pull requests.
  3. Does your team practice pair programming? If yes, Path 2 may work immediately - pairing is a continuous review process.
  4. How large is your team? Teams of 2-4 can adopt Path 2 more easily. Larger teams may start with Path 1 and transition later.

Both paths are valid. The important thing is daily integration to trunk. Do not spend weeks debating which path to use. Pick one, start today, and adjust.

Essential Supporting Practices

Trunk-based development does not work in isolation. These supporting practices make daily integration safe and sustainable.

Feature Flags

When you integrate to trunk daily, incomplete features will exist on trunk. Feature flags let you merge code that is not yet ready for users.

# Simple feature flag example
if feature_flags.is_enabled("new-checkout-flow", user):
    return new_checkout(cart)
else:
    return legacy_checkout(cart)

Rules for feature flags in TBD:

  • Use flags to decouple deployment from release
  • Remove flags within days or weeks - they are temporary by design
  • Keep flag logic simple; avoid nested or dependent flags
  • Test both flag states in your automated test suite

When NOT to use feature flags:

  • New features that can be built and connected in a final commit - use Connect Last instead
  • Behavior changes that replace existing logic - use Branch by Abstraction instead
  • New API routes - build the route, expose it as the last change
  • Bug fixes or hotfixes - deploy immediately without a flag
  • Simple changes where standard deployment is sufficient

Feature flags are covered in more depth in Phase 3: Optimize.

Evolutionary Coding Practices

The ability to make code changes that are not complete features and integrate them to trunk without breaking existing behavior is a core skill for trunk-based development. You never make big-bang changes. You make small changes that limit risk. Feature flags are one approach, but two other patterns are equally important.

Branch by Abstraction

Branch by abstraction lets you gradually replace existing behavior while continuously integrating to trunk. It works in four steps:

// Step 1: Create abstraction (integrate to trunk)
class PaymentProcessor {
  process(payment) {
    return this.implementation.process(payment)
  }
}

// Step 2: Add new implementation alongside old (integrate to trunk)
class StripePaymentProcessor {
  process(payment) {
    // New Stripe implementation
  }
}

// Step 3: Switch implementations (integrate to trunk)
const processor = useNewStripe
  ? new StripePaymentProcessor()
  : new LegacyProcessor()

// Step 4: Remove old implementation (integrate to trunk)

Each step is a separate commit that keeps trunk working. The old behavior runs until you explicitly switch, and you can remove the abstraction layer once the migration is complete.

Connect Last

Connect Last means you build all the components of a feature, each individually tested and integrated to trunk, and wire them into the user-visible path only in the final commit.

// Commits 1-10: Build new checkout components (all tested, all integrated)
function CheckoutStep1() { /* tested, working */ }
function CheckoutStep2() { /* tested, working */ }
function CheckoutStep3() { /* tested, working */ }

// Commit 11: Wire up to UI (final integration)
<Route path="/checkout" component={CheckoutStep1} />

Because nothing references the new code until the last commit, there is no risk of breaking existing behavior during development.

Which Pattern Should I Use?

| Pattern | Best for | Example |
| --- | --- | --- |
| Connect Last | New features that do not affect existing code | Building a new checkout flow, adding a new report page |
| Branch by Abstraction | Replacing or modifying existing behavior | Swapping a payment processor, migrating a data layer |
| Feature Flags | Gradual rollout, testing in production, or customer-specific features | Dark launches, A/B tests, beta programs |

If your change does not touch existing code paths, Connect Last is the simplest option. If you are replacing something that already exists, Branch by Abstraction gives you a safe migration path. Reserve feature flags for cases where you need runtime control over who sees the change.

Commit Small, Commit Often

Each commit should be a small, coherent change that leaves trunk in a working state. If you are committing once a day in a large batch, you are not getting the benefit of TBD.

Guidelines:

  • Each commit should be independently deployable
  • A commit should represent a single logical change
  • If you cannot describe the change in one sentence, it is too big
  • Target multiple commits per day, not one large commit at end of day
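
To make this concrete, here is a sketch of what one developer's day might look like under these guidelines, assuming direct trunk commits (Path 2) and a JavaScript project with an npm test script; the ticket numbers are hypothetical:

# Morning: finish a small refactoring slice
git pull --rebase origin main
npm test
git commit -am "JIRA-1234: extract price calculation into PricingService"
git push origin main

# Midday: next slice of the same story, hidden behind a flag
git pull --rebase origin main
npm test
git commit -am "JIRA-1234: add discount rules to PricingService (flag off)"
git push origin main

# Afternoon: unrelated one-line fix
git pull --rebase origin main
npm test
git commit -am "JIRA-1301: correct rounding in cart total display"
git push origin main

Each commit is small, describable in one sentence, and leaves trunk releasable.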

Test-Driven Development (TDD) and ATDD

TDD provides the safety net that makes frequent integration sustainable. When every change is accompanied by tests, you can integrate confidently.

  • TDD: Write the test before the code. Red, green, refactor.
  • ATDD (Acceptance Test-Driven Development): Write acceptance criteria as executable tests before implementation.

Both practices ensure that your test suite grows with your code and that trunk remains releasable.
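
A minimal sketch of one red-green-refactor cycle, using Jest (the cartTotal function and discount rule are hypothetical):

// Red: write the failing test first
test('applies a 10% discount to orders over $100', () => {
  expect(cartTotal({ subtotal: 200, discountEligible: true })).toBe(180);
});

// Green: write just enough code to make it pass
function cartTotal({ subtotal, discountEligible }) {
  return discountEligible && subtotal > 100 ? subtotal * 0.9 : subtotal;
}

// Refactor: clean up names and structure with the test as a safety net,
// then commit the test and the code to trunk together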

Getting Started: A Tactical Guide

Step 1: Shorten Your Branches (Week 1)

If your team currently uses long-lived feature branches, start by shortening their lifespan.

| Current State | Target |
| --- | --- |
| Branches live for weeks | Branches live for < 1 week |
| Merge once per sprint | Merge multiple times per week |
| Large merge conflicts are normal | Conflicts are rare and small |

Action: Set a team agreement that no branch lives longer than 2 days. Track branch age as a metric.
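
Branch age does not need a dashboard to start; a minimal sketch using plain git against your shared remote:

# List remote branches with the date of their last commit, oldest first
git fetch --prune
git for-each-ref --sort=committerdate refs/remotes/origin \
  --format='%(committerdate:short) %(refname:short)'

Anything near the top of that list that is older than your agreed limit is a conversation for standup.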

Step 2: Integrate Daily (Week 2-3)

Tighten the window from 2 days to 1 day.

Action:

  • Every developer merges to trunk at least once per day, every day they write code
  • If work is not complete, use a feature flag or other technique to merge safely
  • Track integration frequency as your primary metric

Step 3: Ensure Trunk Stays Green (Week 2-3)

Daily integration is only useful if trunk remains in a releasable state.

Action:

  • Run your test suite on every merge to trunk
  • If the build breaks, fixing it becomes the team’s top priority
  • Establish a working agreement: “broken build = stop the line” (see Working Agreements)
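
What this looks like depends on your CI tooling; a minimal sketch assuming GitHub Actions and an npm test script:

# .github/workflows/trunk-ci.yml
name: trunk-ci
on:
  push:
    branches: [main]
  pull_request:
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm ci
      - run: npm test

The important part is not the tool but the trigger: every merge to trunk runs the full suite, and a red result stops the line.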

Step 4: Remove the Safety Net of Long Branches (Week 4+)

Once the team is integrating daily with a green trunk, eliminate the option of long-lived branches.

Action:

  • Configure branch protection rules to warn or block branches older than 24 hours
  • Remove any workflow that depends on long-lived branches (e.g., “dev” or “release” branches)
  • Celebrate the transition - this is a significant shift in how the team works

Key Pitfalls

1. “We integrate daily, but we also keep our feature branches”

If you are merging to trunk daily but also maintaining a long-lived feature branch, you are not doing TBD. The feature branch will diverge, and merging it later will be painful. The integration to trunk must be the only integration point.

2. “Our builds are too slow for frequent integration”

If your CI pipeline takes 30 minutes, integrating multiple times a day feels impractical. This is a real constraint - address it by investing in build automation and parallelizing your test suite. Target a build time under 10 minutes.

3. “We can’t integrate incomplete features to trunk”

Yes, you can. Use feature flags to hide incomplete work from users. The code exists on trunk, but the feature is not active. This is a standard practice at every company that practices CD.

4. “Code review takes too long for daily integration”

If pull request reviews take 2 days, daily integration is impossible. The solution is to change how you review: pair programming provides continuous review, mob programming reviews in real time, and small changes can be reviewed asynchronously in minutes. See Code Review for specific techniques.

5. “What if someone pushes a bad commit to trunk?”

This is why you have automated tests, CI, and the “broken build = top priority” agreement. Bad commits will happen. The question is how fast you detect and fix them. With TBD and CI, the answer is minutes, not days.

Measuring Success

Track these metrics to verify your TBD adoption:

| Metric | Target | Why It Matters |
| --- | --- | --- |
| Integration frequency | At least 1 per developer per day | Confirms daily integration is happening |
| Branch age | < 24 hours | Catches long-lived branches |
| Build duration | < 10 minutes | Enables frequent integration without frustration |
| Merge conflict frequency | Decreasing over time | Confirms small changes reduce conflicts |

Further Reading

This page covers the essentials for Phase 1 of your migration. For detailed guidance on specific scenarios (regulated environments, multi-team coordination, and common pitfalls), see the TBD Migration Guide below.

Next Step

Once your team is integrating to trunk daily, build the test suite that makes that integration trustworthy. Continue to Testing Fundamentals.

1.1 - TBD Migration Guide

A tactical guide for migrating from GitFlow or long-lived branches to trunk-based development, covering regulated environments, multi-team coordination, and common pitfalls.

Phase 1 - Foundations

This is a detailed companion to the Trunk-Based Development overview. It covers specific migration paths, regulated environment guidance, multi-team strategies, and concrete scenarios.

Continuous delivery requires continuous integration, and CI requires very frequent integration to the trunk, at least daily. You can achieve that either with trunk-based development or with worthless process overhead spent shuffling merges between branches. So if you want CI, you are not getting there without trunk-based development. However, standing up TBD is not as simple as “collapse all the branches.” CD is a quality process, not just automated code delivery. Trunk-based development is the first step in establishing that quality process and in uncovering the problems in the current one.

GitFlow, and other branching models that use long-lived branches, optimize for isolation to protect working code from untested or poorly tested code. They create the illusion of safety while silently increasing risk through long feedback delays. The result is predictable: painful merges, stale assumptions, and feedback that arrives too late to matter.

TBD reverses that. It optimizes for rapid feedback, smaller changes, and collaborative discovery, the ingredients required for CI and continuous delivery.

This article explains how to move from GitFlow (or any long-lived branch pattern) toward TBD, and what “good” actually looks like along the way.


Why Move to Trunk-Based Development?

Long-lived branches hide problems. TBD exposes them early, when they are cheap to fix.

Think of long-lived branches like storing food in a bunker: it feels safe until you open the door and discover half of it rotting. With TBD, teams check freshness every day.

To do CI, teams need:

  • Small changes integrated at least daily
  • Automated tests giving fast, deterministic feedback
  • A single source of truth: the trunk

If your branches live for more than a day or two, you aren’t doing continuous integration. You’re doing periodic integration at best. True CI requires at least daily integration to the trunk.


The First Step: Stop Letting Work Age

The biggest barrier isn’t tooling. It’s habits.

The first meaningful change is simple:

Stop letting branches live long enough to become problems.

Your first goal isn’t true TBD. It’s shorter-lived branches: changes that live for hours or a couple of days, not weeks.

That alone exposes dependency issues, unclear requirements, and missing tests, which is exactly the point. The pain tells you where improvement is needed.


Before You Start: What to Measure

You cannot improve what you don’t measure. Before changing anything, establish baseline metrics, so you can track actual progress.

Essential Metrics to Track Weekly

Branch Lifetime

  • Average time from branch creation to merge
  • Maximum branch age currently open
  • Target: Reduce average from weeks to days, then to hours

Integration Health

  • Number of merge conflicts per week
  • Time spent resolving conflicts
  • Target: Conflicts should decrease as integration frequency increases

Delivery Speed

  • Time from commit to production deployment
  • Number of commits per day reaching production
  • Target: Decrease time to production, increase deployment frequency

Quality Indicators

  • Build/test execution time
  • Test failure rate
  • Production incidents per deployment
  • Target: Fast, reliable tests; stable deployments

Work Decomposition

  • Average pull request size (lines changed)
  • Number of files changed per commit
  • Target: Smaller, more focused changes

Start with just two or three of these. Don’t let measurement become its own project.

The goal isn’t perfect data. It’s visibility into whether you’re actually moving in the right direction.
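
Two of these baselines fall straight out of git history; a rough sketch (the point is trend lines, not precise numbers):

# Change size: insertions/deletions per commit over the last two weeks
git log --since="2 weeks ago" --oneline --shortstat

# Integration frequency: merges into trunk per day over the last two weeks
git log origin/main --merges --since="2 weeks ago" --date=short --pretty='%ad' | sort | uniq -c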


Path 1: Moving from Long-Lived Branches to Short-Lived Branches

When GitFlow habits are deeply ingrained, this is usually the least-threatening first step.

1. Collapse the Branching Model

Stop using:

  • develop
  • release branches that sit around for weeks
  • feature branches lasting a sprint or more

Move toward:

  • A single main (or trunk)
  • Temporary branches measured in hours or days

2. Integrate Every Few Days, Then Every Day

Set an explicit working agreement:

“Nothing lives longer than 48 hours.”

Once this feels normal, shorten it:

“Integrate at least once per day.”

If a change is too large to merge within a day or two, the problem isn’t the branching model. The problem is the decomposition of work.

3. Test Before You Code

Branch lifetime shortens when you stop guessing about expected behavior. Bring product, QA, and developers together before coding:

  • Write acceptance criteria collaboratively
  • Turn them into executable tests
  • Then write code to make those tests pass

You’ll discover misunderstandings upfront instead of after a week of coding.

This approach is called Behavior-Driven Development (BDD), a collaborative practice where teams define expected behavior in plain language before writing code. BDD bridges the gap between business requirements and technical implementation by using concrete examples that become executable tests.

Key BDD resources:

How to Run a Three Amigos Session

Participants: Product Owner, Developer, Tester (15-30 minutes per story)

Process:

  1. Product describes the user need and expected outcome
  2. Developer asks questions about edge cases and dependencies
  3. Tester identifies scenarios that could fail
  4. Together, write acceptance criteria as examples

Example:

Feature: User password reset

Scenario: Valid reset request
  Given a user with email "user@example.com" exists
  When they request a password reset
  Then they receive an email with a reset link
  And the link expires after 1 hour

Scenario: Invalid email
  Given no user with email "nobody@example.com" exists
  When they request a password reset
  Then they see "If the email exists, a reset link was sent"
  And no email is sent

Scenario: Expired link
  Given a user has a reset link older than 1 hour
  When they click the link
  Then they see "This reset link has expired"
  And they are prompted to request a new one

These scenarios become your automated acceptance tests before you write any implementation code.

From Acceptance Criteria to Tests

Turn those scenarios into executable tests in your framework of choice:

// Example using Jest and Supertest
describe('Password Reset', () => {
  it('sends reset email for valid user', async () => {
    await createUser({ email: 'user@example.com' });

    const response = await request(app)
      .post('/password-reset')
      .send({ email: 'user@example.com' });

    expect(response.status).toBe(200);
    expect(emailService.sentEmails).toHaveLength(1);
    expect(emailService.sentEmails[0].to).toBe('user@example.com');
  });

  it('does not reveal whether email exists', async () => {
    const response = await request(app)
      .post('/password-reset')
      .send({ email: 'nobody@example.com' });

    expect(response.status).toBe(200);
    expect(response.body.message).toBe('If the email exists, a reset link was sent');
    expect(emailService.sentEmails).toHaveLength(0);
  });
});

Now you can write the minimum code to make these tests pass. This drives smaller, more focused changes.

4. Invest in Contract Tests

Most merge pain isn’t from your code. It’s from the interfaces between services. Define interface changes early and codify them with provider/consumer contract tests.

This lets teams integrate frequently without surprises.


Path 2: Committing Directly to the Trunk

This is the cleanest and most powerful version of TBD. It requires discipline, but it produces the most stable delivery pipeline and the least drama.

If the idea of committing straight to main makes people panic, that’s a signal about your current testing process, not a problem with TBD.


How to Choose Your Path

Use this rule of thumb:

  • If your team fears “breaking everything,” start with short-lived branches.
  • If your team collaborates well and writes tests first, go straight to trunk commits.

Both paths require the same skills:

  • Smaller work
  • Better requirements
  • Shared understanding
  • Automated tests
  • A reliable pipeline

The difference is pace.


Essential TBD Practices

These practices apply to both paths, whether you’re using short-lived branches or committing directly to trunk.

Use Feature Flags the Right Way

Feature flags are one of several evolutionary coding practices that allow you to integrate incomplete work safely. Other methods include branch by abstraction and connect-last patterns.

Feature flags are not a testing strategy. They are a release strategy.

Every commit to trunk must:

  • Build
  • Test
  • Deploy safely

Flags let you deploy incomplete work without exposing it prematurely. They don’t excuse poor test discipline.

Start Simple: Boolean Flags

You don’t need a sophisticated feature flag system to start. Begin with environment variables or simple config files.

Simple boolean flag example:

// config/features.js
module.exports = {
  newCheckoutFlow: process.env.FEATURE_NEW_CHECKOUT === 'true',
  enhancedSearch: process.env.FEATURE_ENHANCED_SEARCH === 'true',
};

// In your code
const features = require('./config/features');

app.get('/checkout', (req, res) => {
  if (features.newCheckoutFlow) {
    return renderNewCheckout(req, res);
  }
  return renderOldCheckout(req, res);
});

This is enough for most TBD use cases.

Testing Code Behind Flags

Critical: You must test both code paths, flag on and flag off.

describe('Checkout flow', () => {
  describe('with new checkout flow enabled', () => {
    beforeEach(() => {
      features.newCheckoutFlow = true;
    });

    it('shows new checkout UI', () => {
      // Test new flow
    });
  });

  describe('with new checkout flow disabled', () => {
    beforeEach(() => {
      features.newCheckoutFlow = false;
    });

    it('shows legacy checkout UI', () => {
      // Test old flow
    });
  });
});

If you only test with the flag on, you’ll break production when the flag is off.

Two Types of Feature Flags

Feature flags serve two fundamentally different purposes:

Temporary Release Flags (should be removed):

  • Control rollout of new features
  • Enable gradual deployment
  • Allow quick rollback of changes
  • Test in production before full release
  • Lifecycle: Created for a release, removed once stable (typically 1-4 weeks)

Permanent Configuration Flags (designed to stay):

  • User preferences and settings (dark mode, email notifications, etc.)
  • Customer-specific features (enterprise vs. free tier)
  • A/B testing and experimentation
  • Regional or regulatory variations
  • Operational controls (read-only mode, maintenance mode)
  • Lifecycle: Part of your product’s configuration system

The distinction matters: Temporary release flags create technical debt if not removed. Permanent configuration flags are part of your feature set and belong in your configuration management system.

Most of the feature flags you create for TBD migration will be temporary release flags that must be removed.

Release Flag Lifecycle Management

Temporary release flags are scaffolding, not permanent architecture.

Every temporary release flag should have:

  1. A creation date
  2. A purpose
  3. An expected removal date
  4. An owner responsible for removal

Track your flags:

// flags.config.js
module.exports = {
  flags: [
    {
      name: 'newCheckoutFlow',
      created: '2024-01-15',
      owner: 'checkout-team',
      jiraTicket: 'SHOP-1234',
      removalTarget: '2024-02-15',
      purpose: 'Progressive rollout of redesigned checkout'
    }
  ]
};

Set reminders to remove flags. Permanent flags multiply complexity and slow you down.

When to Remove a Flag

Remove a flag when:

  • The feature is 100% rolled out and stable
  • You’re confident you won’t need to roll back
  • Usually 1-2 weeks after full deployment

Removal process:

  1. Set flag to always-on in code
  2. Deploy and monitor
  3. If stable for 48 hours, delete the conditional logic entirely
  4. Remove the flag from configuration
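
Step 3 is mostly deletion; a sketch of the before and after, reusing the hypothetical newCheckoutFlow flag from earlier:

// Before: conditional guarded by the release flag
app.get('/checkout', (req, res) => {
  if (features.newCheckoutFlow) {
    return renderNewCheckout(req, res);
  }
  return renderOldCheckout(req, res);
});

// After: flag, conditional, and legacy path all removed
app.get('/checkout', (req, res) => {
  return renderNewCheckout(req, res);
});

// Also delete renderOldCheckout and the flag entry in config/features.js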

Common Anti-Patterns to Avoid

Don’t:

  • Let temporary release flags become permanent (if it’s truly permanent, it should be a configuration option)
  • Let release flags accumulate without removal
  • Skip testing both flag states
  • Use flags to hide broken code
  • Create flags for every tiny change

Do:

  • Use release flags for large or risky changes
  • Remove release flags as soon as the feature is stable
  • Clearly document whether each flag is temporary (release) or permanent (configuration)
  • Test both enabled and disabled states
  • Move permanent feature toggles to your configuration management system

Commit Small and Commit Often

If a change is too large to commit today, split it.

Large commits are failed design upstream, not failed integration downstream.

Use TDD and ATDD to Keep Refactors Safe

Refactoring must not break tests. If it does, you’re testing implementation, not behavior. Behavioral tests are what keep trunk commits safe.
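
The distinction shows up clearly in the tests themselves; a sketch using Jest (the pricingService names are hypothetical):

// Implementation-coupled: breaks as soon as you restructure the internals
it('calls applyDiscount exactly once', () => {
  const spy = jest.spyOn(pricingService, 'applyDiscount');
  pricingService.total(order);
  expect(spy).toHaveBeenCalledTimes(1);
});

// Behavioral: survives any refactor that preserves the outcome
it('charges $180 for a $200 order with a valid discount', () => {
  expect(pricingService.total(order)).toBe(180);
});

If most of your tests look like the first example, refactoring on trunk will feel dangerous. Favor the second style.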

Prioritize Interfaces First

Always start by defining and codifying the contract:

  • What is the shape of the request?
  • What is the response?
  • What error states must be handled?

Interfaces are the highest-risk area. Drive them with tests first. Then work inward.
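
In practice, that means pinning the shape of the interface with a test before the handler logic exists; a sketch using Jest and Supertest (the route and fields are hypothetical):

it('returns id, status, and total for an order', async () => {
  const response = await request(app).get('/api/orders/42');

  expect(response.status).toBe(200);
  expect(response.body).toEqual(
    expect.objectContaining({
      id: expect.any(Number),
      status: expect.any(String),
      total: expect.any(Number),
    })
  );
});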


Getting Started: A Tactical Guide

The initial phase sets the tone. Focus on establishing new habits, not perfection.

Step 1: Team Agreement and Baseline

  • Hold a team meeting to discuss the migration
  • Agree on initial branch lifetime limit (start with 48 hours if unsure)
  • Document current baseline metrics (branch age, merge frequency, build time)
  • Identify your slowest-running tests
  • Create a list of known integration pain points
  • Set up a visible tracker (physical board or digital dashboard) for metrics

Step 2: Test Infrastructure Audit

Focus: Find and fix what will slow you down.

  • Run your test suite and time each major section
  • Identify slow tests
  • Look for:
    • Tests with sleeps or arbitrary waits
    • Tests hitting external services unnecessarily
    • Integration tests that could be contract tests
    • Flaky tests masking real issues

Fix or isolate the worst offenders. You don’t need a perfect test suite to start, just one fast enough to not punish frequent integration.

Step 3: First Integrated Change

Pick the smallest possible change:

  • A bug fix
  • A refactoring with existing test coverage
  • A configuration update
  • Documentation improvement

The goal is to validate your process, not to deliver a feature.

Execute:

  1. Create a branch (if using Path 1) or commit directly (if using Path 2)
  2. Make the change
  3. Run tests locally
  4. Integrate to trunk
  5. Deploy through your pipeline
  6. Observe what breaks or slows you down

Step 4: Retrospective

Gather the team:

What went well:

  • Did anyone integrate faster than before?
  • Did you discover useful information about your tests or pipeline?

What hurt:

  • What took longer than expected?
  • What manual steps could be automated?
  • What dependencies blocked integration?

Ongoing commitment:

  • Adjust branch lifetime limit if needed
  • Assign owners to top 3 blockers
  • Commit to integrating at least one change per person

The initial phase won’t feel smooth. That’s expected. You’re learning what needs fixing.


Getting Your Team On Board

Technical changes are easy compared to changing habits and mindsets. Here’s how to build buy-in.

Acknowledge the Fear

When you propose TBD, you’ll hear:

  • “We’ll break production constantly”
  • “Our code isn’t good enough for that”
  • “We need code review on branches”
  • “This won’t work with our compliance requirements”

These concerns are valid signals about your current system. Don’t dismiss them.

Instead: “You’re right that committing directly to trunk with our current test coverage would be risky. That’s why we need to improve our tests first.”

Start with an Experiment

Don’t mandate TBD for the whole team immediately. Propose a time-boxed experiment:

The Proposal:

“Let’s try this for two weeks with a single small feature. We’ll track what goes well and what hurts. After two weeks, we’ll decide whether to continue, adjust, or stop.”

What to measure during the experiment:

  • How many times did we integrate?
  • How long did merges take?
  • Did we catch issues earlier or later than usual?
  • How did it feel compared to our normal process?

After two weeks: Hold a retrospective. Let the data and experience guide the decision.

Pair on the First Changes

Don’t expect everyone to adopt TBD simultaneously. Instead:

  1. Identify one advocate who wants to try it
  2. Pair with them on the first trunk-based changes
  3. Let them experience the process firsthand
  4. Have them pair with the next person

Knowledge transfer through pairing works better than documentation.

Address Code Review Concerns

“But we need code review!” Yes. TBD doesn’t eliminate code review.

Options that work:

  • Pair or mob programming (review happens in real-time)
  • Commit to trunk, review immediately after, fix forward if issues found
  • Very short-lived branches (hours, not days) with rapid review SLA
  • Pairing on the code review itself, walking through the change together with the author

The goal is fast feedback, not zero review.

Handle Skeptics and Blockers

You’ll encounter people who don’t want to change. Don’t force it.

Instead:

  • Let them observe the experiment from the outside
  • Share metrics and outcomes transparently
  • Invite them to pair for one change
  • Let success speak louder than arguments

Some people need to see it working before they believe it.

Get Management Support

Managers often worry about:

  • Reduced control
  • Quality risks
  • Slower delivery (ironically)

Address these with data:

  • Show branch age metrics before/after
  • Track cycle time improvements
  • Demonstrate faster feedback on defects
  • Highlight reduced merge conflicts

Frame TBD as a risk reduction strategy, not a risky experiment.


Working in a Multi-Team Environment

Migrating to TBD gets complicated when you depend on teams still using long-lived branches. Here’s how to handle it.

The Core Problem

You want to integrate daily. Your dependency team integrates weekly or monthly. Their API changes surprise you during their big-bang merge.

You can’t force other teams to change. But you can protect yourself.

Strategy 1: Consumer-Driven Contract Tests

Define the contract you need from the upstream service and codify it in tests that run in your pipeline.

Example using Pact:

// Your consumer test
const { pact } = require('@pact-foundation/pact');

describe('User Service Contract', () => {
  it('returns user profile by ID', async () => {
    await provider.addInteraction({
      state: 'user 123 exists',
      uponReceiving: 'a request for user 123',
      withRequest: {
        method: 'GET',
        path: '/users/123',
      },
      willRespondWith: {
        status: 200,
        body: {
          id: 123,
          name: 'Jane Doe',
          email: 'jane@example.com',
        },
      },
    });

    const user = await userService.getUser(123);
    expect(user.name).toBe('Jane Doe');
  });
});

This test runs against your expectations of the API, not the actual service. When the upstream team changes their API, your contract test fails before you integrate their changes.

Share the contract:

  • Publish your contract to a shared repository
  • Upstream team runs provider verification against your contract
  • If they break your contract, they know before merging

Strategy 2: API Versioning with Backwards Compatibility

If you control the shared service:

// Support both old and new API versions
app.get('/api/v1/users/:id', handleV1Users);
app.get('/api/v2/users/:id', handleV2Users);

// Or use content negotiation
app.get('/api/users/:id', (req, res) => {
  const version = req.headers['api-version'] || 'v1';
  if (version === 'v2') {
    return handleV2Users(req, res);
  }
  return handleV1Users(req, res);
});

Migration path:

  1. Deploy new version alongside old version
  2. Update consumers one by one
  3. After all consumers migrated, deprecate old version
  4. Remove old version after deprecation period

Strategy 3: Strangler Fig Pattern

When you depend on a team that won’t change:

  1. Create an anti-corruption layer between your code and theirs
  2. Define your ideal interface in the adapter
  3. Let the adapter handle their messy API

// Your ideal interface
class UserRepository {
  async getUser(id) {
    // Your clean, typed interface
  }
}

// Adapter that deals with their mess
class LegacyUserServiceAdapter extends UserRepository {
  async getUser(id) {
    const response = await fetch(`https://legacy-service/users/${id}`);
    const messyData = await response.json();

    // Transform their format to yours
    return {
      id: messyData.user_id,
      name: `${messyData.first_name} ${messyData.last_name}`,
      email: messyData.email_address,
    };
  }
}

Now your code depends on your interface, not theirs. When they change, you only update the adapter.

Strategy 4: Feature Toggles for Cross-Team Coordination

When multiple teams need to coordinate a release:

  1. Each team develops behind feature flags
  2. Each team integrates to trunk continuously
  3. Features remain disabled until coordination point
  4. Enable flags in coordinated sequence

This decouples development velocity from release coordination.
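
A sketch of what the coordination point can look like in configuration (team and flag names are hypothetical):

// flags.config.js
// Both teams integrate to trunk daily; nothing is user-visible until the
// flags are enabled in the agreed sequence at the coordination point.
module.exports = {
  ordersTeamNewInvoicing: false,   // enable first: backend only, no UI change
  billingTeamNewStatements: false, // enable second, after invoicing is verified
};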

When You Can’t Integrate with Dependencies

If upstream dependencies block you from integrating daily:

Short term:

  • Use contract tests to detect breaking changes early
  • Create adapters to isolate their changes
  • Document the integration pain as a business cost

Long term:

  • Advocate for those teams to adopt TBD
  • Share your success metrics
  • Offer to help them migrate

You can’t force other teams to change. But you can demonstrate a better way and make it easier for them to follow.


TBD in Regulated Environments

Regulated industries face legitimate compliance requirements: audit trails, change traceability, separation of duties, and documented approval processes. These requirements often lead teams to believe trunk-based development is incompatible with compliance. This is a misconception.

TBD is about integration frequency, not about eliminating controls. You can meet compliance requirements while still integrating at least daily.

The Compliance Concerns

Common regulatory requirements that seem to conflict with TBD:

Audit Trail and Traceability

  • Every change must be traceable to a requirement, ticket, or change request
  • Changes must be attributable to specific individuals
  • History of what changed, when, and why must be preserved

Separation of Duties

  • The person who writes code shouldn’t be the person who approves it
  • Changes must be reviewed before reaching production
  • No single person should have unchecked commit access

Change Control Process

  • Changes must follow a documented approval workflow
  • Risk assessment before deployment
  • Rollback capability for failed changes

Documentation Requirements

  • Changes must be documented before implementation
  • Testing evidence must be retained
  • Deployment procedures must be repeatable and auditable

Short-Lived Branches: The Compliant Path to TBD

Path 1 from this guide (short-lived branches) directly addresses compliance concerns while maintaining the benefits of TBD.

Short-lived branches mean:

  • Branches live for hours to 2 days maximum, not weeks or months
  • Integration happens at least daily
  • Pull requests are small, focused, and fast to review
  • Review and approval happen within the branch lifetime

This approach satisfies both regulatory requirements and continuous integration principles.

How Short-Lived Branches Meet Compliance Requirements

Audit Trail:

Every commit references the change ticket:

git commit -m "JIRA-1234: Add validation for SSN input

Implements requirement REQ-445 from Q4 compliance review.
Changes limited to user input validation layer."

Modern Git hosting platforms (GitHub, GitLab, Bitbucket) automatically track:

  • Who created the branch
  • Who committed each change
  • Who reviewed and approved
  • When it merged
  • Complete diff history

Separation of Duties:

Use pull request workflows:

  1. Developer creates branch from trunk
  2. Developer commits changes (same day)
  3. Second person reviews and approves (within 24 hours)
  4. Automated checks validate (tests, security scans, compliance checks)
  5. Merge to trunk after approval
  6. Automated deployment with gates

This provides stronger separation of duties than long-lived branches because:

  • Reviews happen while context is fresh
  • Reviewers can actually understand the small changeset
  • Automated checks enforce policies consistently

Change Control Process:

Branch protection rules enforce your process:

# Example GitHub branch protection for trunk
required_reviews: 1
required_checks:
  - unit-tests
  - security-scan
  - compliance-validation
dismiss_stale_reviews: true
require_code_owner_review: true

This ensures:

  • No direct commits to trunk (except in documented break-glass scenarios)
  • Required approvals before merge
  • Automated validation gates
  • Audit log of every merge decision

Documentation Requirements:

Pull request templates enforce documentation:

## Change Description
[Link to Jira ticket]

## Risk Assessment
- [ ] Low risk: Configuration only
- [ ] Medium risk: New functionality, backward compatible
- [ ] High risk: Database migration, breaking change

## Testing Evidence
- [ ] Unit tests added/updated
- [ ] Integration tests pass
- [ ] Manual testing completed (attach screenshots if UI change)
- [ ] Security scan passed

## Rollback Plan
[How to rollback if this causes issues in production]

What “Short-Lived” Means in Practice

Hours, not days:

  • Simple bug fixes: 2-4 hours
  • Small feature additions: 4-8 hours
  • Refactoring: 1-2 days

Maximum 2 days: If a branch can’t merge within 2 days, the work is too large. Decompose it further or use feature flags to integrate incomplete work safely.

Daily integration requirement: Even if the feature isn’t complete, integrate what you have:

  • Behind a feature flag if needed
  • As internal APIs not yet exposed
  • As tests and interfaces before implementation

Compliance-Friendly Tooling

Modern platforms provide compliance features built-in:

Git Hosting (GitHub, GitLab, Bitbucket):

  • Immutable audit logs
  • Branch protection rules
  • Required approvals
  • Status check enforcement
  • Signed commits for authenticity

CI/CD Platforms:

  • Deployment approval gates
  • Audit trails of every deployment
  • Environment-specific controls
  • Automated compliance checks

Feature Flag Systems:

  • Change deployment without code deployment
  • Gradual rollout controls
  • Instant rollback capability
  • Audit log of flag changes

Secrets Management:

  • Vault, AWS Secrets Manager, Azure Key Vault
  • Audit log of secret access
  • Rotation policies
  • Environment isolation

Example: Compliant Short-Lived Branch Workflow

Monday 9 AM: Developer creates branch feature/JIRA-1234-add-audit-logging from trunk.

Monday 9 AM - 2 PM: Developer implements audit logging for user authentication events. Commits reference JIRA-1234. Automated tests run on each commit.

Monday 2 PM: Developer opens pull request:

  • Title: “JIRA-1234: Add audit logging for authentication events”
  • Description includes risk assessment, testing evidence, rollback plan
  • Automated checks run: tests, security scan, compliance validation
  • Code owner automatically assigned for review

Monday 3 PM: Code owner reviews (5-10 minutes; change is small and focused). Suggests minor improvement.

Monday 3:30 PM: Developer addresses feedback, pushes update.

Monday 4 PM: Code owner approves. All automated checks pass. Developer merges to trunk.

Monday 4:05 PM: CI/CD pipeline deploys to staging automatically. Automated smoke tests pass.

Monday 4:30 PM: Deployment gate requires manual approval for production. Tech lead approves based on risk assessment.

Monday 4:35 PM: Automated deployment to production. Audit log captures: what deployed, who approved, when, what checks passed.

Total time: 7.5 hours from branch creation to production.

Full compliance maintained. Full audit trail captured. Daily integration achieved.

When Long-Lived Branches Hide Compliance Problems

Ironically, long-lived branches often create compliance risks:

Stale Reviews: Reviewing a 3-week-old, 2000-line pull request is performative, not effective. Reviewers rubber-stamp because they can’t actually understand the changes.

Integration Risk: Big-bang merges after weeks introduce unexpected behavior. The change that was reviewed isn’t the change that actually deployed (due to merge conflicts and integration issues).

Delayed Feedback: Problems discovered weeks after code was written are expensive to fix and hard to trace to requirements.

Audit Trail Gaps: Long-lived branches often have messy commit history, force pushes, and unclear attribution. The audit trail is polluted.

Regulatory Examples Where Short-Lived Branches Work

Financial Services (SOX, PCI-DSS):

  • Short-lived branches with required approvals
  • Automated security scanning on every PR
  • Separation of duties via required reviewers
  • Immutable audit logs in Git hosting platform
  • Feature flags for gradual rollout and instant rollback

Healthcare (HIPAA):

  • Pull request templates documenting PHI handling
  • Automated compliance checks for data access patterns
  • Required security review for any PHI-touching code
  • Audit logs of deployments
  • Environment isolation enforced by CI/CD

Government (FedRAMP, FISMA):

  • Branch protection requiring government code owner approval
  • Automated STIG compliance validation
  • Signed commits for authenticity
  • Deployment gates requiring authority to operate
  • Complete audit trail from commit to production

The Real Choice

The question isn’t “TBD or compliance.”

The real choice is: compliance theater with long-lived branches and risky big-bang merges, or actual compliance with short-lived branches and safe daily integration.

Short-lived branches provide:

  • Better audit trails (small, traceable changes)
  • Better separation of duties (reviewable changes)
  • Better change control (automated enforcement)
  • Lower risk (small, reversible changes)
  • Faster feedback (problems caught early)

That’s not just compatible with compliance. That’s better compliance.


What Will Hurt (At First)

When you migrate to TBD, you’ll expose every weakness you’ve been avoiding:

  • Slow tests
  • Unclear requirements
  • Fragile integration points
  • Architecture that resists small changes
  • Gaps in automated validation
  • Long manual processes in the value stream

This is not a regression. This is the point.

Problems you discover early are problems you can fix cheaply.


Common Pitfalls to Avoid

Teams migrating to TBD often make predictable mistakes. Here’s how to avoid them.

Pitfall 1: Treating TBD as Just a Branch Renaming Exercise

The mistake: Renaming develop to main and calling it TBD.

Why it fails: You’re still doing long-lived feature branches, just with different names. The fundamental integration problems remain.

What to do instead: Focus on integration frequency, not branch names. Measure time-to-merge, not what you call your branches.

Pitfall 2: Merging Daily Without Actually Integrating

The mistake: Committing to trunk every day, but your code doesn’t interact with anyone else’s work. Your tests don’t cover integration points.

Why it fails: You’re batching integration for later. When you finally connect your component to the rest of the system, you discover incompatibilities.

What to do instead: Ensure your tests exercise the boundaries between components. Use contract tests for service interfaces. Integrate at the interface level, not just at the source control level.

Pitfall 3: Skipping Test Investment

The mistake: “We’ll adopt TBD first, then improve our tests later.”

Why it fails: Without fast, reliable tests, frequent integration is terrifying. You’ll revert to long-lived branches because trunk feels unsafe.

What to do instead: Invest in test infrastructure first. Make your slowest tests faster. Fix flaky tests. Only then increase integration frequency.

Pitfall 4: Using Feature Flags as a Testing Escape Hatch

The mistake: “It’s fine to commit broken code as long as it’s behind a flag.”

Why it fails: Untested code is still untested, flag or no flag. When you enable the flag, you’ll discover the bugs you should have caught earlier.

What to do instead: Test both flag states. Flags hide features from users, not from your test suite.

Pitfall 5: Keeping Flags Forever

The mistake: Creating feature flags and never removing them. Your codebase becomes a maze of conditionals.

Why it fails: Every permanent flag doubles your testing surface area and increases complexity. Eventually, no one knows which flags do what.

What to do instead: Set a removal date when creating each flag. Track flags like technical debt. Remove them aggressively once features are stable.

Pitfall 6: Forcing TBD on an Unprepared Team

The mistake: Mandating TBD before the team understands why or how it works.

Why it fails: People resist changes they don’t understand or didn’t choose. They’ll find ways to work around it or sabotage it.

What to do instead: Start with volunteers. Run experiments. Share results. Let success create pull, not push.

Pitfall 7: Ignoring the Need for Small Changes

The mistake: Trying to do TBD while still working on features that take weeks to complete.

Why it fails: If your work naturally takes weeks, you can’t integrate daily. You’ll create work-in-progress commits that don’t add value.

What to do instead: Learn to decompose work into smaller, independently valuable increments. This is a skill that must be developed.

Pitfall 8: No Clear Definition of “Done”

The mistake: Integrating code that “works on my machine” without validating it in a production-like environment.

Why it fails: Integration bugs don’t surface until deployment. By then, you’ve integrated many other changes, making root cause analysis harder.

What to do instead: Define “integrated” as “deployed to a staging environment and validated.” Your pipeline should do this automatically.

Pitfall 9: Treating Trunk as Unstable

The mistake: “Trunk is where we experiment. Stable code goes in release branches.”

Why it fails: If trunk can’t be released at any time, you don’t have CI. You’ve just moved your integration problems to a different branch.

What to do instead: Trunk must always be production-ready. Use feature flags for incomplete work. Fix broken builds immediately.

Pitfall 10: Forgetting That TBD is a Means, Not an End

The mistake: Optimizing for trunk commits without improving cycle time, quality, or delivery speed.

Why it fails: TBD is valuable because it enables fast feedback and low-cost changes. If those aren’t improving, TBD isn’t working.

What to do instead: Measure outcomes, not activities. Track cycle time, defect rates, deployment frequency, and time to restore service.


When to Pause or Pivot

Sometimes TBD migration stalls or causes more problems than it solves. Here’s how to tell if you need to pause and what to do about it.

Signs You’re Not Ready Yet

Red flag 1: Your test suite takes hours to run If developers can’t get feedback in minutes, they can’t integrate frequently. Forcing TBD now will just slow everyone down.

What to do: Pause the TBD migration. Invest 2-4 weeks in making tests faster. Parallelize test execution. Remove or optimize the slowest tests. Resume TBD when feedback takes less than 10 minutes.

Red flag 2: More than half your tests are flaky If tests fail randomly, developers will ignore failures. You’ll integrate broken code without realizing it.

What to do: Stop adding new features. Spend one sprint fixing or deleting flaky tests. Track flakiness metrics. Only resume TBD when you trust your test results.

Red flag 3: Production incidents increased significantly If TBD caused a spike in production issues, something is wrong with your safety net.

What to do: Revert to short-lived branches (48-72 hours) temporarily. Analyze what’s escaping to production. Add tests or checks to catch those issues. Resume direct-to-trunk when the safety net is stronger.

Red flag 4: The team is in constant conflict If people are fighting about the process, frustrated daily, or actively working around it, you’ve lost the team.

What to do: Hold a retrospective. Listen to concerns without defending TBD. Identify the top 3 pain points. Address those first. Resume TBD migration when the team agrees to try again.

Signs You’re Doing It Wrong (But Can Fix It)

Yellow flag 1: Daily commits, but monthly integration You’re committing to trunk, but your code doesn’t connect to the rest of the system until the end.

What to fix: Focus on interface-level integration. Ensure your tests exercise boundaries between components. Use contract tests.

Yellow flag 2: Trunk is broken often If trunk is red more than 5% of the time, something’s wrong with your testing or commit discipline.

What to fix: Make “fix trunk immediately” the top priority. Consider requiring local tests to pass before pushing. Add pre-commit hooks if needed.
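
A minimal sketch of such a hook, assuming an npm test script (save as .git/hooks/pre-push and mark it executable; the same pattern works as a pre-commit hook):

#!/bin/sh
# Refuse to push if the local suite fails; keep trunk green
npm test || {
  echo "Tests failed - push aborted. Fix the failure before sharing the change."
  exit 1
}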

Yellow flag 3: Feature flags piling up If you have more than 5 active flags, you’re not cleaning up after yourself.

What to fix: Set a team rule: “For every new flag created, remove an old one.” Dedicate time each sprint to flag cleanup.

How to Pause Gracefully

If you need to pause:

  1. Communicate clearly: “We’re pausing TBD migration for two weeks to fix our test infrastructure. This isn’t abandoning the goal.”

  2. Set a specific resumption date: Don’t let “pause” become “quit.” Schedule a date to revisit.

  3. Fix the blockers: Use the pause to address the specific problems preventing success.

  4. Retrospect and adjust: When you resume, what will you do differently?

Pausing isn’t failure. Pausing to fix the foundation is smart.


What “Good” Looks Like

You know TBD is working when:

  • Branches live for hours, not days
  • Developers collaborate early instead of merging late
  • Product participates in defining behaviors, not just writing stories
  • Tests run fast enough to integrate frequently
  • Deployments are boring
  • You can fix production issues with the same process you use for normal work

When your deployment process enables emergency fixes without special exceptions, you’ve reached the real payoff: lower cost of change, which makes everything else faster, safer, and more sustainable.


Concrete Examples and Scenarios

Theory is useful. Examples make it real. Here are practical scenarios showing how to apply TBD principles.

Scenario 1: Breaking Down a Large Feature

Problem: You need to build a user notification system with email, SMS, and in-app notifications. Estimated: 3 weeks of work.

Old approach (GitFlow): Create a feature/notifications branch. Work for three weeks. Submit a massive pull request. Spend days in code review and merge conflicts.

TBD approach:

Week 1:

  • Day 1: Define notification interface, commit to trunk

    // notifications/NotificationService.ts
    interface NotificationService {
      send(userId: string, message: NotificationMessage): Promise<void>;
    }
    
    interface NotificationMessage {
      title: string;
      body: string;
      priority: 'low' | 'normal' | 'high';
    }
    

    This compiles but doesn’t do anything yet. That’s fine.

  • Day 2: Add in-memory implementation for testing

    class InMemoryNotificationService implements NotificationService {
      private notifications: NotificationMessage[] = [];
    
      async send(userId: string, message: NotificationMessage) {
        this.notifications.push(message);
      }
    }
    

    Now other teams can use the interface in their code and tests.

  • Day 3-5: Implement email notifications behind a feature flag

    class EmailNotificationService implements NotificationService {
      async send(userId: string, message: NotificationMessage) {
        if (!features.emailNotifications) {
          return; // No-op when disabled
        }
        // Real implementation
      }
    }
    

    Commit daily. Deploy. Flag is off in production.

Week 2:

  • Add SMS notifications (same pattern: interface, implementation, feature flag)
  • Enable email notifications for internal users only
  • Iterate based on feedback

Week 3:

  • Add in-app notifications
  • Roll out email and SMS to all users
  • Remove flags for email once stable

Result: Integrated 12-15 times instead of once. Each integration was small and low-risk.

Scenario 2: Database Schema Change

Problem: You need to split the users.name column into first_name and last_name.

Old approach: Update schema, update all code, deploy everything at once. Hope nothing breaks.

TBD approach (expand-contract pattern):

Step 1: Expand (Day 1) Add new columns without removing the old one:

ALTER TABLE users ADD COLUMN first_name VARCHAR(255);
ALTER TABLE users ADD COLUMN last_name VARCHAR(255);

Commit and deploy. Application still uses name column. No breaking change.

Step 2: Dual writes (Day 2-3) Update write path to populate both old and new columns:

async function createUser(name) {
  const [firstName, lastName] = name.split(' ');
  await db.query(
    'INSERT INTO users (name, first_name, last_name) VALUES (?, ?, ?)',
    [name, firstName, lastName]
  );
}

Commit and deploy. Now new data populates both formats.

Step 3: Backfill (Day 4) Migrate existing data in the background:

async function backfillNames() {
  // For large tables, process in batches to keep transactions short
  const users = await db.query('SELECT id, name FROM users WHERE first_name IS NULL');
  for (const user of users) {
    const [firstName, ...rest] = user.name.split(' ');
    const lastName = rest.join(' ');
    await db.query(
      'UPDATE users SET first_name = ?, last_name = ? WHERE id = ?',
      [firstName, lastName, user.id]
    );
  }
}

Run this as a background job. Commit and deploy.

Step 4: Read from new columns (Day 5)

Update the read path behind a feature flag:

async function getUser(id) {
  const user = await db.query('SELECT * FROM users WHERE id = ?', [id]);
  if (features.useNewNameColumns) {
    return {
      firstName: user.first_name,
      lastName: user.last_name,
    };
  }
  return { name: user.name };
}

Deploy and gradually enable the flag.

Step 5: Contract (Week 2)

Once all reads use the new columns and the flag has been removed:

ALTER TABLE users DROP COLUMN name;

Result: Five deployments instead of one big-bang change. Each step was reversible. Zero downtime.

Scenario 3: Refactoring Without Breaking the World

Problem: Your authentication code is a mess. You want to refactor it without breaking production.

TBD approach:

Day 1: Characterization tests

Write tests that capture current behavior (warts and all):

describe('Current auth behavior', () => {
  it('accepts password with special characters', () => {
    // Document what currently happens
  });

  it('handles malformed tokens by returning 401', () => {
    // Capture edge case behavior
  });
});

These tests document how the system actually works. Commit.
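
A filled-in characterization test might look like the sketch below; the authService object and its response shape are hypothetical stand-ins for whatever your legacy code actually exposes:

it('rejects a malformed token with a 401', async () => {
  // Capture today's behavior exactly, even if it looks odd
  const result = await authService.authenticate({ token: 'not-a-real-token' });

  expect(result.status).toBe(401);
  expect(result.user).toBeNull();
});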

Day 2-3: Strangler fig pattern

Create the new implementation alongside the old one:

class LegacyAuthService {
  // Existing messy code (don't touch it)
}

class ModernAuthService {
  // Clean implementation
}

class AuthServiceRouter {
  constructor(private legacy: LegacyAuthService, private modern: ModernAuthService) {}

  async authenticate(credentials) {
    if (features.modernAuth) {
      return this.modern.authenticate(credentials);
    }
    return this.legacy.authenticate(credentials);
  }
}

Commit with flag off. Old behavior unchanged.

Day 4-7: Migrate piece by piece

Enable modern auth for one endpoint at a time:

if (features.modernAuth && endpoint === '/api/users') {
  return modernAuth.authenticate(credentials);
}

Commit daily. Monitor each endpoint.

Week 2: Remove old code

Once all endpoints use modern auth and it’s been stable for a week:

class AuthService {
  async authenticate(credentials) {
    // Just the modern implementation
  }
}

Delete the legacy code entirely.

Result: Continuous refactoring without a “big rewrite” branch. Production was never at risk.

Scenario 4: Working with External API Changes

Problem: A third-party API you depend on is changing their response format next month.

TBD approach:

Week 1: Adapter pattern

Create an adapter that normalizes both old and new formats:

class PaymentAPIAdapter {
  async getPaymentStatus(orderId) {
    const response = await fetch(`https://api.payments.com/orders/${orderId}`);
    const data = await response.json();

    // Handle both old and new format
    if (data.payment_status) {
      // Old format
      return {
        status: data.payment_status,
        amount: data.total_amount,
      };
    } else {
      // New format
      return {
        status: data.status.payment,
        amount: data.amounts.total,
      };
    }
  }
}

Commit. Your code now works with both formats.

Week 2-3: Wait for the third-party API to migrate. Your code keeps working.

Week 4 (after API migration): Simplify the adapter to handle only the new format:

async getPaymentStatus(orderId) {
  const response = await fetch(`https://api.payments.com/orders/${orderId}`);
  const data = await response.json();
  return {
    status: data.status.payment,
    amount: data.amounts.total,
  };
}

Result: No coupling between your deployment schedule and the external API migration. Zero downtime.


References and Further Reading

Testing Practices

Test-Driven Development:

  • “Test-Driven Development: By Example” by Kent Beck - TDD fundamentals
  • “Growing Object-Oriented Software, Guided by Tests” by Steve Freeman and Nat Pryce - TDD at scale

Patterns for Incremental Change

Legacy Code:

  • “Working Effectively with Legacy Code” by Michael Feathers - Characterization tests and strangler patterns
  • Strangler Fig Application - Incremental rewrites

Team Dynamics and Change Management

  • “Accelerate” by Nicole Forsgren, Jez Humble, and Gene Kim - Data on what drives software delivery performance
  • “Team Topologies” by Matthew Skelton and Manuel Pais - Organizing teams for fast flow
  • State of DevOps Reports - Annual research on delivery practices


Final Thought

Migrating from GitFlow to TBD isn’t just a matter of changing your branching strategy. It’s a matter of changing your thinking.

Stop optimizing for isolation. Start optimizing for feedback.

Small, tested, integrated changes, delivered continuously, will always outperform big batches delivered occasionally.

That’s why teams migrate to TBD. Not because it’s trendy, but because it’s the only path to real continuous integration and continuous delivery.

2 - Testing Fundamentals

Build a test architecture that gives your pipeline the confidence to deploy any change, even when dependencies outside your control are unavailable.

Phase 1 - Foundations | Adapted from Dojo Consortium

Before you can trust your pipeline, you need a test suite that is fast, deterministic, and catches real defects. But a collection of tests is not enough. You need a test architecture - a deliberate structure where different types of tests work together to give you the confidence to deploy every change, regardless of whether external systems are up, slow, or behaving unexpectedly.

Why Testing Is a Foundation

Continuous delivery requires that trunk always be releasable. The only way to know trunk is releasable is to test it - automatically, on every change. Without a reliable test suite, daily integration is just daily risk.

In many organizations, testing is the single biggest obstacle to CD adoption. Not because teams lack tests, but because the tests they have are slow, flaky, poorly structured, and - most critically - unable to give the pipeline a reliable answer to the question: is this change safe to deploy?

Testing Goals for CD

Your test suite must meet these criteria before it can support continuous delivery:

Goal Target Why
Fast Full suite completes in under 10 minutes Developers need feedback before context-switching
Deterministic Same code always produces the same test result Flaky tests destroy trust and get ignored
Catches real bugs Tests fail when behavior is wrong, not when implementation changes Brittle tests create noise, not signal
Independent of external systems Pipeline can determine deployability without any dependency being available Your ability to deploy cannot be held hostage by someone else’s outage

If your test suite does not meet these criteria today, improving it is your highest-priority foundation work.

Beyond the Test Pyramid

The test pyramid - many unit tests at the base, fewer integration tests in the middle, a handful of end-to-end tests at the top - has been the dominant mental model for test strategy since Mike Cohn introduced it. The core insight is sound: push testing as low as possible. Lower-level tests are faster, more deterministic, and cheaper to maintain. Higher-level tests are slower, more brittle, and more expensive.

But as a prescriptive model, the pyramid is overly simplistic. Teams that treat it as a rigid ratio end up in unproductive debates about whether they have “too many” integration tests or “not enough” unit tests. The shape of your test distribution matters far less than whether your tests, taken together, give you the confidence to deploy.

What actually matters

The pyramid’s principle - write tests with different granularity - remains correct. But for CD, the question is not “do we have the right pyramid shape?” The question is:

Can our pipeline determine that a change is safe to deploy without depending on any system we do not control?

This reframes the testing conversation. Instead of counting tests by type and trying to match a diagram, you design a test architecture where:

  1. Fast, deterministic tests catch the vast majority of defects and run on every commit. These tests use test doubles for anything outside the team’s control. They give you a reliable go/no-go signal in minutes.

  2. Contract tests verify that your test doubles still match reality. They run asynchronously and catch drift between your assumptions and the real world - without blocking your pipeline.

  3. A small number of non-deterministic tests validate that the fully integrated system works. These run post-deployment and provide monitoring, not gating.

This structure means your pipeline can confidently say “yes, deploy this” even if a downstream API is having an outage, a third-party service is slow, or a partner team hasn’t deployed their latest changes yet. Your ability to deliver is decoupled from the reliability of systems you do not own.

The anti-pattern: the ice cream cone

Most teams that struggle with CD have an inverted test distribution - too many slow, expensive end-to-end tests and too few fast, focused tests.

        ┌─────────────────────────┐
        │    Manual Testing       │  ← Most testing happens here
        ├─────────────────────────┤
        │   End-to-End Tests      │  ← Slow, flaky, expensive
        ├─────────────────────────┤
        │  Integration Tests      │  ← Some, but not enough
        ├───────────┬─────────────┘
        │Unit Tests │              ← Too few
        └───────────┘

The ice cream cone makes CD impossible. Manual testing gates block every release. End-to-end tests take hours, fail randomly, and depend on external systems being healthy. The pipeline cannot give a fast, reliable answer about deployability, so deployments become high-ceremony events.

What to Test - and What Not To

Before diving into the architecture, internalize the mindset that makes it work. The test architecture below is not just a structure to follow - it flows from a few principles about what testing should focus on and what it should ignore.

Interfaces are the most important thing to test

Most integration failures originate at interfaces - the boundaries where your system talks to other systems. These boundaries are the highest-risk areas in your codebase, and they deserve the most testing attention. But testing interfaces does not require integrating with the real system on the other side.

When you test an interface you consume, the question is: “Can I understand the response and act accordingly?” If you send a request for a user’s information, you do not test that you get that specific user back. You test that you receive and understand the properties you need - that your code can parse the response structure and make correct decisions based on it. This distinction matters because it keeps your tests deterministic and focused on what you control.

Use contract mocks, virtual services, or any test double that faithfully represents the interface contract. The test validates your side of the conversation, not theirs.

Frontend and backend follow the same pattern

Both frontend and backend applications provide interfaces to consumers and consume interfaces from providers. The only difference is the consumer: a frontend provides an interface for humans, while a backend provides one for machines. The testing strategy is the same.

For a frontend:

  • Validate the interface you provide. The UI contains the components it should and they appear correctly. This is the equivalent of verifying your API returns the right response structure.
  • Test behavior isolated from presentation. Use your unit test framework to test the logic that UI controls trigger, separated from the rendering layer. This gives you the same speed and control you get from testing backend logic in isolation.
  • Verify that controls trigger the right logic. Confirm that user actions invoke the correct behavior, without needing a running backend or browser-based E2E test.

This approach gives you targeted testing with far more control. Testing exception flows - what happens when a service returns an error, when a network request times out, when data is malformed - becomes straightforward instead of requiring elaborate E2E setups that are hard to make fail on demand.

If you cannot fix it, do not test for it

This is the principle that most teams get wrong. You should never test the behavior of services you consume. Testing their behavior is the responsibility of the team that builds them. If their service returns incorrect data, you cannot fix that - so testing for it is waste.

What you should test is how your system responds when a consumed service is unstable or unavailable. Can you degrade gracefully? Do you return a meaningful error? Do you retry appropriately? These are behaviors you own and can fix, so they belong in your test suite.

This principle directly enables the test architecture below. When you stop testing things you cannot fix, you stop depending on external systems in your pipeline. Your tests become faster, more deterministic, and more focused on the code your team actually ships.

Test Architecture for the CD Pipeline

A test architecture is the deliberate structure of how different test types work together across your pipeline to give you deployment confidence. Each layer has a specific role, and the layers reinforce each other.

Layer 1: Unit tests - verify behavior in isolation

Unit tests exercise a unit of behavior - a single meaningful action or decision your code makes - with all external dependencies replaced by test doubles. They use a black box approach: assert on what the code produces, not on how it works internally. They are the fastest and most deterministic tests you have.

Role in CD: Catch logic errors, regressions, and edge cases instantly. Provide the tightest feedback loop - developers should see results in seconds while coding. Because they test behavior rather than implementation, they survive refactoring without breaking.

What they cannot do: Verify that components work together, that your code correctly calls external services, or that the system behaves correctly as a whole.

See Unit Tests for detailed guidance.

Sociable vs solitary unit tests

Unit tests fall into two styles. Solitary unit tests replace every collaborator with a test double so the class under test runs completely alone. Sociable unit tests allow the code to use its real collaborators, only substituting test doubles for external dependencies (databases, network calls, file systems).

Prefer sociable unit tests as your default. Solitary tests can over-specify internal structure, tying your tests to implementation details that break during refactoring. Sociable tests exercise the real interactions between objects, catching integration issues earlier without sacrificing speed. Reserve solitary tests for cases where a collaborator is expensive, non-deterministic, or not yet built.
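
As a sketch of the difference (the PriceCalculator and TaxRules names are illustrative), a sociable test lets the real collaborator run and only doubles external dependencies:

class TaxRules {
  rateFor(region: string): number {
    return region === 'EU' ? 0.25 : 0.1;
  }
}

class PriceCalculator {
  constructor(private taxRules: TaxRules) {}

  total(net: number, region: string): number {
    return net * (1 + this.taxRules.rateFor(region));
  }
}

// Sociable unit test: no mocks for in-process collaborators
it('adds EU tax to the net price', () => {
  const calculator = new PriceCalculator(new TaxRules());
  expect(calculator.total(100, 'EU')).toBe(125);
});

A solitary version of the same test would replace TaxRules with a stub and assert on the interaction instead.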

Layer 2: Integration tests - verify boundaries

Integration tests verify that components interact correctly at their boundaries: database queries return the expected data, HTTP clients serialize requests correctly, message producers format messages as expected. External systems are replaced with test doubles, but internal collaborators are real.

Role in CD: Catch the bugs that unit tests miss - mismatched interfaces, serialization errors, query bugs. These tests are fast enough to run on every commit but realistic enough to catch real integration failures.

What they cannot do: Verify that the system works end-to-end from a user’s perspective, or that your assumptions about external services are still correct.

The line between unit tests and integration tests is often debated. As Ham Vocke writes in The Practical Test Pyramid: the naming matters less than the discipline. The key question is whether the test is fast, deterministic, and tests something your unit tests cannot. If yes, it belongs here.
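
As a sketch, the PaymentAPIAdapter from Scenario 4 in the trunk-based development section could be covered at this layer by stubbing its external API with a tool such as MSW; the URL and payload values below are illustrative:

import { setupServer } from 'msw/node';
import { http, HttpResponse } from 'msw';

// Stub the external payments API so the test stays fast and deterministic
const server = setupServer(
  http.get('https://api.payments.com/orders/42', () =>
    HttpResponse.json({ status: { payment: 'settled' }, amounts: { total: 1999 } })
  )
);

beforeAll(() => server.listen());
afterAll(() => server.close());

it('parses the new payment API response format', async () => {
  const adapter = new PaymentAPIAdapter();
  const result = await adapter.getPaymentStatus('42');
  expect(result).toEqual({ status: 'settled', amount: 1999 });
});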

See Integration Tests for detailed guidance.

Layer 3: Functional tests - verify your system works in isolation

Functional tests (also called component tests) exercise your entire sub-system - your service, your application - from the outside, as a user or consumer would interact with it. All external dependencies are replaced with test doubles. The test boots your application, sends real HTTP requests or simulates real user interactions, and verifies the responses.

Role in CD: This is the layer that proves your system works as a complete unit, independent of everything else. Functional tests answer: “if we deploy this service right now, will it behave correctly for every interaction that is within our control?” Because all external dependencies are stubbed, these tests are deterministic and fast. They can run on every commit.

Why this layer is critical for CD: Functional tests are what allow you to deploy with confidence even when dependencies outside your control are unavailable. Your test doubles simulate the expected behavior of those dependencies. As long as your doubles are accurate (which is what contract tests verify), your functional tests prove your system handles those interactions correctly.
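
A minimal sketch, assuming an Express-style app factory that accepts test doubles and supertest to drive it over HTTP - createApp, FakePaymentService, and the route are illustrative names, not part of the original:

import request from 'supertest';
import { createApp } from '../src/app'; // hypothetical factory that wires in test doubles

// Hand-rolled test double standing in for the real payment dependency
class FakePaymentService {
  constructor(private result: { status: string }) {}
  async getStatus(): Promise<{ status: string }> {
    return this.result;
  }
}

// Functional (component) test: the whole service boots in-process,
// external dependencies are stubbed, the HTTP surface is real
it('returns the order when the stubbed payment service reports success', async () => {
  const app = createApp({ paymentService: new FakePaymentService({ status: 'settled' }) });

  const response = await request(app).get('/api/orders/42');

  expect(response.status).toBe(200);
  expect(response.body.paymentStatus).toBe('settled');
});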

See Functional Tests for detailed guidance.

Layer 4: Contract tests - verify your assumptions about others

Contract tests validate that the test doubles you use in layers 1-3 still accurately represent the real external systems. They run against live dependencies and check contract format - response structures, field names, types, and status codes - not specific data values.

Role in CD: Contract tests are the bridge between your fast, deterministic test suite and the real world. Without them, your test doubles can silently drift from reality, and your functional tests provide false confidence. With them, you know that the assumptions baked into your test doubles are still correct.

Consumer-driven contracts take this further: the consumer of an API publishes expectations (using tools like Pact), and the provider runs those expectations as part of their build. Both teams know immediately when a change would break the contract.

Contract tests are non-deterministic because they hit live systems. They should not block your pipeline. Instead, failures trigger a review: has the contract changed, or was it a transient network issue? If the contract has changed, update your test doubles and re-verify.
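
Without a dedicated tool, a lightweight structure-only check can serve as a starting point; the endpoint and field names below are assumptions:

// Contract test: runs on a schedule against the live dependency,
// checks shape and types only - never specific data values
it('payments API still returns the fields our test doubles assume', async () => {
  const response = await fetch('https://api.payments.com/orders/sample-order-id');
  expect(response.status).toBe(200);

  const data = await response.json();
  expect(typeof data.status.payment).toBe('string');
  expect(typeof data.amounts.total).toBe('number');
});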

See Contract Tests for detailed guidance.

Layer 5: End-to-end tests - verify the integrated system post-deployment

End-to-end tests validate complete user journeys through the fully integrated system with no test doubles. They run against real services, real databases, and real third-party integrations.

Role in CD: E2E tests are monitoring, not gating. They run after deployment to verify that the integrated system works. A small suite of smoke tests can run immediately post-deployment to catch gross integration failures. Broader E2E suites run on a schedule.

Why E2E tests should not gate your pipeline: E2E tests are non-deterministic. They fail for reasons unrelated to your change - network blips, third-party outages, shared environment instability. If your pipeline depends on E2E tests passing before you can deploy, your deployment frequency is limited by the reliability of every system in the chain. This is the opposite of the independence CD requires.

See End-to-End Tests for detailed guidance.

How the layers work together

Pipeline stage    Test layer              Deterministic?   Blocks deploy?
─────────────────────────────────────────────────────────────────────────
On every commit   Unit tests              Yes              Yes
                  Integration tests       Yes              Yes
                  Functional tests        Yes              Yes

Asynchronous      Contract tests          No               No (triggers review)

Post-deployment   E2E smoke tests         No               Triggers rollback if critical
                  Synthetic monitoring    No               Triggers alerts

The critical insight: everything that blocks deployment is deterministic and under your control. Everything that involves external systems runs asynchronously or post-deployment. This is what gives you the independence to deploy any time, regardless of the state of the world around you.

Pre-merge vs post-merge

The table above maps to two distinct phases of your pipeline, each with different goals and constraints.

Pre-merge (before code lands on trunk): Run unit, integration, and functional tests. These must all be deterministic and fast. Target: under 10 minutes total. This is the quality gate that every change must pass. If pre-merge tests are slow, developers batch up changes or skip local runs, both of which undermine continuous integration.

Post-merge (after code lands on trunk, before or after deployment): Re-run the full deterministic suite against the integrated trunk to catch merge-order interactions. Run contract tests, E2E smoke tests, and synthetic monitoring. Target: under 30 minutes for the full post-merge cycle.

Why re-run pre-merge tests post-merge? Two changes can each pass pre-merge independently but conflict when combined on trunk. The post-merge run catches these integration effects. If a post-merge failure occurs, the team fixes it immediately - trunk must always be releasable.

Starting Without Full Coverage

Teams often delay adopting CI because their existing code lacks tests. This is backwards. You do not need tests for existing code to begin. You need one rule applied without exception:

Every new change gets a test. We will not go lower than the current level of code coverage.

Record your current coverage percentage as a baseline. Configure CI to fail if coverage drops below that number. This does not mean the baseline is good enough - it means the trend only moves in one direction. Every bug fix, every new feature, and every refactoring adds tests. Over time, coverage grows organically in the areas that matter most: the code that is actively changing.
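
With Jest, for example, the floor can be enforced directly in the config; the numbers below represent a hypothetical measured baseline, not a target:

// jest.config.js - fail the build if coverage drops below the recorded baseline
module.exports = {
  collectCoverage: true,
  coverageThreshold: {
    global: {
      statements: 62, // today's measured numbers, ratcheted upward over time
      branches: 55,
      functions: 60,
      lines: 62,
    },
  },
};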

Do not attempt to retrofit tests across the entire codebase before starting CI. That approach takes months, delivers no incremental value, and often produces low-quality tests written by developers who are testing code they did not write and do not fully understand.

Test Quality Over Coverage Percentage

Code coverage tells you which lines executed during tests. It does not tell you whether the tests verified anything meaningful. A test suite with 90% coverage and no assertions has high coverage and zero value.

Better questions than “what is our coverage percentage?”:

  • When a test fails, does it point directly to the defect?
  • When we refactor, do tests break because behavior changed or because implementation details shifted?
  • Do our tests catch the bugs that actually reach production?
  • Can a developer trust a green build enough to deploy immediately?

Why coverage mandates are harmful. When teams are required to hit a coverage target, they write tests to satisfy the metric rather than to verify behavior. This produces tests that exercise code paths without asserting outcomes, tests that mirror implementation rather than specify behavior, and tests that inflate the number without improving confidence. The metric goes up while the defect escape rate stays the same. Worse, meaningless tests add maintenance cost and slow down the suite.

Instead of mandating a coverage number, set a floor (as described above) and focus team attention on test quality: mutation testing scores, defect escape rates, and whether developers actually trust the suite enough to deploy on green.

Week 1 Action Plan

If your test suite is not yet ready to support CD, use this focused action plan to make immediate progress.

Day 1-2: Audit your current test suite

Assess where you stand before making changes.

Actions:

  • Run your full test suite 3 times. Note total duration and any tests that pass intermittently (flaky tests).
  • Count tests by type: unit, integration, functional, end-to-end.
  • Identify tests that require external dependencies (databases, APIs, file systems) to run.
  • Record your baseline: total test count, pass rate, duration, flaky test count.
  • Map each test type to a pipeline stage. Which tests gate deployment? Which run asynchronously? Which tests couple your deployment to external systems?

Output: A clear picture of your test distribution and the specific problems to address.

Day 2-3: Fix or remove flaky tests

Flaky tests are worse than no tests. They train developers to ignore failures, which means real failures also get ignored.

Actions:

  • Quarantine all flaky tests immediately. Move them to a separate suite that does not block the build.
  • For each quarantined test, decide: fix it (if the behavior it tests matters) or delete it (if it does not).
  • Common causes of flakiness: timing dependencies, shared mutable state, reliance on external services, test order dependencies.
  • Target: zero flaky tests in your main test suite by end of week.

Day 3-4: Decouple your pipeline from external dependencies

This is the highest-leverage change for CD. Identify every test that calls a real external service and replace that dependency with a test double.

Actions:

  • List every external service your tests depend on: databases, APIs, message queues, file storage, third-party services.
  • For each dependency, decide the right test double approach:
    • In-memory fakes for databases (e.g., SQLite, H2, testcontainers with local instances).
    • HTTP stubs for external APIs (e.g., WireMock, nock, MSW).
    • Fakes for message queues, email services, and other infrastructure.
  • Replace the dependencies in your unit, integration, and functional tests.
  • Move the original tests that hit real services into a separate suite - these become your starting contract tests or E2E smoke tests.

Output: A test suite where everything that blocks the build is deterministic and runs without network access to external systems.

Day 4-5: Add functional tests for critical paths

If you don’t have functional tests (component tests) that exercise your whole service in isolation, start with the most critical paths.

Actions:

  • Identify the 3-5 most critical user journeys or API endpoints in your application.
  • Write a functional test for each: boot the application, stub external dependencies, send a real request or simulate a real user action, verify the response.
  • Each functional test should prove that the feature works correctly assuming external dependencies behave as expected (which your test doubles encode).
  • Run these in CI on every commit.

Day 5: Set up contract tests for your most important dependency

Pick the external dependency that changes most frequently or has caused the most production issues. Set up a contract test for it.

Actions:

  • Write a contract test that validates the response structure (types, required fields, status codes) of the dependency’s API.
  • Run it on a schedule (e.g., every hour or daily), not on every commit.
  • When it fails, update your test doubles to match the new reality and re-verify your functional tests.
  • If the dependency is owned by another team in your organization, explore consumer-driven contracts with a tool like Pact.

Test-Driven Development (TDD)

TDD is the practice of writing the test before the code. It is the most effective way to build a reliable test suite because it ensures every piece of behavior has a corresponding test.

The TDD cycle:

  1. Red: Write a failing test that describes the behavior you want.
  2. Green: Write the minimum code to make the test pass.
  3. Refactor: Improve the code without changing the behavior. The test ensures you do not break anything.
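
A single pass through the cycle, sketched with a deliberately trivial example (the rule and names are illustrative):

// Red: write the failing test first
it('applies free shipping to orders over $100', () => {
  expect(shippingCost({ subtotal: 120 })).toBe(0);
});

// Green: the minimum code that makes it pass
function shippingCost(order: { subtotal: number }): number {
  return order.subtotal > 100 ? 0 : 5;
}

// Refactor: rename, extract constants, simplify - the passing test keeps you honest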

Why TDD supports CD:

  • Every change is automatically covered by a test
  • The test suite grows proportionally with the codebase
  • Tests describe behavior, not implementation, making them more resilient to refactoring
  • Developers get immediate feedback on whether their change works

TDD is not mandatory for CD, but teams that practice TDD consistently have significantly faster and more reliable test suites.

Getting started with TDD

If your team is new to TDD, start small:

  1. Pick one new feature or bug fix this week.
  2. Write the test first, watch it fail.
  3. Write the code to make it pass.
  4. Refactor.
  5. Repeat for the next change.

Do not try to retroactively TDD your entire codebase. Apply TDD to new code and to any code you modify.

Testing Matrix

Use this reference to decide what type of test to write and where it runs in your pipeline.

What You Need to Verify Test Type Speed Deterministic? Blocks Deploy?
A function or method behaves correctly Unit Milliseconds Yes Yes
Components interact correctly at a boundary Integration Milliseconds to seconds Yes Yes
Your whole service works in isolation Functional Seconds Yes Yes
Your test doubles match reality Contract Seconds No No
A critical user journey works end-to-end E2E Minutes No No
Code quality, security, and style compliance Static Analysis Seconds Yes Yes

Best Practices Summary

Do

  • Run tests on every commit. If tests do not run automatically, they will be skipped.
  • Keep the deterministic suite under 10 minutes. If it is slower, developers will stop running it locally.
  • Fix broken tests immediately. A broken test is equivalent to a broken build.
  • Delete tests that do not provide value. A test that never fails and tests trivial behavior is maintenance cost with no benefit.
  • Test behavior, not implementation. Use a black box approach - verify what the code does, not how it does it. As Ham Vocke advises: “if I enter values x and y, will the result be z?” - not the sequence of internal calls that produce z. Avoid white box testing that asserts on internals.
  • Use test doubles for external dependencies. Your deterministic tests should run without network access to external systems.
  • Validate test doubles with contract tests. Test doubles that drift from reality give false confidence.
  • Treat test code as production code. Give it the same care, review, and refactoring attention.

Do Not

  • Do not tolerate flaky tests. Quarantine or delete them immediately.
  • Do not gate your pipeline on non-deterministic tests. E2E and contract test failures should trigger review or alerts, not block deployment.
  • Do not couple your deployment to external system availability. If a third-party API being down prevents you from deploying, your test architecture has a critical gap.
  • Do not write tests after the fact as a checkbox exercise. Tests written without understanding the behavior they verify add noise, not value.
  • Do not test private methods directly. Test the public interface; private methods are tested indirectly.
  • Do not share mutable state between tests. Each test should set up and tear down its own state.
  • Do not use sleep/wait for timing-dependent tests. Use explicit waits, polling, or event-driven assertions.
  • Do not require a running database or external service for unit tests. That makes them integration tests - which is fine, but categorize them correctly.

Using Tests to Find and Eliminate Defect Sources

A test suite that catches bugs is good. A test suite that helps you stop producing those bugs is transformational. Every test failure is evidence of a defect, and every defect has a source. If you treat test failures only as things to fix, you are doing rework. If you treat them as diagnostic data about where your process breaks down, you can make systemic changes that prevent entire categories of defects from occurring.

This is the difference between a team that writes more tests to catch more bugs and a team that changes how it works so that fewer bugs are created in the first place.

Two questions sharpen this thinking:

  1. What is the earliest point we can detect this defect? The later a defect is found, the more expensive it is to fix. A requirements defect caught during example mapping costs minutes. The same defect caught in production costs days of incident response, rollback, and rework.
  2. Can AI help us detect it earlier? AI-assisted tools can now surface defects at stages where only human review was previously possible, shifting detection left without adding manual effort.

Trace every defect to its origin

When a test catches a defect - or worse, when a defect escapes to production - ask: where was this defect introduced, and what would have prevented it from being created?

Defects do not originate randomly. They cluster around specific causes. The CD Defect Detection and Remediation Catalog documents over 30 defect types across eight categories, with detection methods, AI opportunities, and systemic fixes for each. The examples below illustrate the pattern for the defect sources most commonly encountered during a CD migration.

Requirements

Example defects Building the right thing wrong, or the wrong thing right
Earliest detection Discovery - before coding begins, during story refinement or example mapping
Traditional detection UX analytics, task completion tracking, A/B testing (all post-deployment)
AI-assisted detection LLM review of acceptance criteria to flag ambiguity, missing edge cases, or contradictions before development begins. AI-generated test scenarios from user stories to validate completeness.
Systemic fix Acceptance criteria as user outcomes, not implementation tasks. Three Amigos sessions before work starts. Example mapping to surface edge cases before coding begins.

Missing domain knowledge

Example defects Business rules encoded incorrectly, implicit assumptions, tribal knowledge loss
Earliest detection During coding - when the developer writes the logic
Traditional detection Magic number detection, knowledge-concentration metrics, bus factor analysis from git history
AI-assisted detection Identify undocumented business rules, missing context that a new developer would hit, and knowledge gaps. Compare implementation against domain documentation or specification files.
Systemic fix Embed domain rules in code using ubiquitous language (DDD). Pair programming to spread knowledge. Living documentation generated from code. Rotate ownership regularly.

Integration boundaries

Example defects Interface mismatches, wrong assumptions about upstream behavior, race conditions at service boundaries
Earliest detection During design - when defining the interface contract
Traditional detection Consumer-driven contract tests, schema validation, chaos engineering, fault injection
AI-assisted detection Review code and documentation to identify undocumented behavioral assumptions (timeouts, retries, error semantics). Predict which consumers break from API changes based on usage patterns when formal contracts do not exist.
Systemic fix Contract tests mandatory per boundary. API-first design. Document behavioral contracts, not just data schemas. Circuit breakers as default at every external boundary.

Untested edge cases

Example defects Null handling, boundary values, error paths
Earliest detection Pre-commit - through null-safe type systems and static analysis in the IDE
Traditional detection Mutation testing, branch coverage thresholds, property-based testing
AI-assisted detection Analyze code paths and generate tests for untested boundaries, null paths, and error conditions the developer did not consider. Triage surviving mutants by risk.
Systemic fix Require a test for every bug fix. Adopt property-based testing for logic with many input permutations. Boundary value analysis as a standard practice. Enforce null-safe type systems.

Unintended side effects

Example defects Change to module A breaks module B, unexpected feature interactions
Earliest detection At commit time - when CI runs the full test suite
Traditional detection Mutation testing, change impact analysis, feature flag interaction matrix
AI-assisted detection Reason about semantic change impact beyond syntactic dependencies. Map a diff to affected modules and flag untested downstream paths before the commit reaches CI.
Systemic fix Small focused commits. Trunk-based development (integrate daily so side effects surface immediately). Feature flags with controlled rollout. Modular design with clear boundaries.

Accumulated complexity

Example defects Defects cluster in the most complex, most-changed files
Earliest detection Continuously - through static analysis in the IDE and CI
Traditional detection Complexity trends, duplication scoring, dependency cycle detection
AI-assisted detection Identify architectural drift, abstraction decay, and calcified workarounds that static analysis misses. Cross-reference change frequency with defect history to prioritize refactoring.
Systemic fix Refactoring as part of every story, not deferred to a “tech debt sprint.” Dedicated complexity budget. Treat rising complexity as a leading indicator.

Process and deployment

Example defects Long-lived branches causing merge conflicts, manual pipeline steps introducing human error, excessive batching increasing blast radius, weak rollback causing extended outages
Earliest detection Pre-commit for branch age; CI for pipeline and batching issues
Traditional detection Branch age alerts, merge conflict frequency, pipeline audit for manual gates, changes-per-deploy metrics, rollback testing
AI-assisted detection Automated risk scoring from change diffs and deployment history. Blast radius analysis. Auto-approve low-risk changes and flag high-risk with evidence, replacing manual change advisory boards.
Systemic fix Trunk-based development. Automate every step from commit to production. Single-piece flow with feature flags. Blue/green or canary as default deployment strategy.

Data and state

Example defects Null pointer exceptions, schema migration failures, cache invalidation errors, concurrency issues
Earliest detection Pre-commit for null safety; CI for schema compatibility
Traditional detection Null safety static analysis, schema compatibility checks, migration dry-runs, thread sanitizers
AI-assisted detection Predict downstream impact of schema changes by understanding how consumers actually use data. Flag code where optional fields are used without null checks, even in non-strict languages.
Systemic fix Enforce null-safe types. Expand-then-contract for all schema changes. Design for idempotency. Short TTLs over complex cache invalidation.

For the complete catalog covering all defect categories - including product and discovery, dependency and infrastructure, testing and observability gaps, and more - see the CD Defect Detection and Remediation Catalog.

Build a defect feedback loop

Knowing the categories is not enough. You need a process that systematically connects test failures to root causes and root causes to systemic fixes.

Step 1: Classify every defect. When a test fails or a bug is reported, tag it with its origin category from the table above. This takes seconds and builds a dataset over time.

Step 2: Look for patterns. Monthly (or during retrospectives), review the defect classifications. Which categories appear most often? That is where your process is weakest.

Step 3: Apply the systemic fix, not just the local fix. When you fix a bug, also ask: what systemic change would prevent this entire category of bug? If most defects come from integration boundaries, the fix is not “write more integration tests” - it is “make contract tests mandatory for every new boundary.” If most defects come from untested edge cases, the fix is not “increase code coverage” - it is “adopt property-based testing as a standard practice.”

Step 4: Measure whether the fix works. Track defect counts by category over time. If you applied a systemic fix for integration boundary defects and the count does not drop, the fix is not working and you need a different approach.

The test-for-every-bug-fix rule

One of the most effective systemic practices: every bug fix must include a test that reproduces the bug before the fix and passes after. This is non-negotiable for CD because:

  • It proves the fix actually addresses the defect (not just the symptom).
  • It prevents the same defect from recurring.
  • It builds test coverage exactly where the codebase is weakest - the places where bugs actually occur.
  • Over time, it shifts your test suite from “tests we thought to write” to “tests that cover real failure modes.”
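
For example, if the name-splitting logic from the schema-change scenario had broken on single-word names in production, the fix would ship with a test like this sketch (splitName is a hypothetical helper extracted during the fix):

// Regression test committed with the bug fix: fails before the fix, passes after
it('handles a single-word name without losing data', () => {
  expect(splitName('Prince')).toEqual({ firstName: 'Prince', lastName: '' });
});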

Advanced detection techniques

As your test architecture matures, add techniques that find defects humans overlook:

Technique What It Finds When to Adopt
Mutation testing (Stryker, PIT) Tests that pass but do not actually verify behavior - your test suite’s blind spots When basic coverage is in place but defect escape rate is not dropping
Property-based testing Edge cases and boundary conditions across large input spaces that example-based tests miss When defects cluster around unexpected input combinations
Chaos engineering Failure modes in distributed systems - what happens when a dependency is slow, returns errors, or disappears When you have functional tests and contract tests in place and need confidence in failure handling
Static analysis and linting Null safety violations, type errors, security vulnerabilities, dead code From day one - these are cheap and fast
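
As an illustration of the property-based row above, a sketch using the fast-check library (the library choice and the function under test are assumptions):

import fc from 'fast-check';

// Hypothetical helper under test: clamps a requested quantity into an allowed range
function clampQuantity(requested: number, max: number): number {
  return Math.min(Math.max(requested, 0), Math.max(max, 0));
}

// Property: for any pair of integers, the result stays inside [0, max]
it('clampQuantity never returns a value outside the allowed range', () => {
  fc.assert(
    fc.property(fc.integer(), fc.integer(), (requested, max) => {
      const result = clampQuantity(requested, max);
      return result >= 0 && result <= Math.max(max, 0);
    })
  );
});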

For more examples of mapping defect origins to detection methods and systemic corrections, see the CD Defect Detection and Remediation Catalog.

Measuring Success

Metric Target Why It Matters
Deterministic suite duration < 10 minutes Enables fast feedback loops
Flaky test count 0 in pipeline-gating suite Maintains trust in test results
External dependencies in gating tests 0 Ensures deployment independence
Test coverage trend Increasing Confirms new code is being tested
Defect escape rate Decreasing Confirms tests catch real bugs
Contract test freshness All passing within last 24 hours Confirms test doubles are current

Next Step

With a reliable test suite in place, automate your build process so that building, testing, and packaging happens with a single command. Continue to Build Automation.


This content is adapted from the Dojo Consortium, licensed under CC BY 4.0. Additional concepts drawn from Ham Vocke, The Practical Test Pyramid, and Toby Clemson, Testing Strategies in a Microservice Architecture.


3 - Build Automation

Automate your build process so a single command builds, tests, and packages your application.

Phase 1 - Foundations | Adapted from Dojo Consortium

Build automation is the mechanism that turns trunk-based development and testing into a continuous integration loop. If you cannot build, test, and package your application with a single command, you cannot automate your pipeline. This page covers the practices that make your build reproducible, fast, and trustworthy.

What Build Automation Means

Build automation is the practice of scripting every step required to go from source code to a deployable artifact. A single command - or a single CI trigger - should execute the entire sequence:

  1. Compile the source code (if applicable)
  2. Run all automated tests
  3. Package the application into a deployable artifact (container image, binary, archive)
  4. Report the result (pass or fail, with details)

No manual steps. No “run this script, then do that.” No tribal knowledge about which flags to set or which order to run things. One command, every time, same result.

The Litmus Test

Ask yourself: “Can a new team member clone the repository and produce a deployable artifact with a single command within 15 minutes?”

If the answer is no, your build is not fully automated.

Why Build Automation Matters for CD

CD Requirement How Build Automation Supports It
Reproducibility The same commit always produces the same artifact, on any machine
Speed Automated builds can be optimized, cached, and parallelized
Confidence If the build passes, the artifact is trustworthy
Developer experience Developers run the same build locally that CI runs, eliminating “works on my machine”
Pipeline foundation The CI/CD pipeline is just the build running automatically on every commit

Without build automation, every other practice in this guide breaks down. You cannot have continuous integration if the build requires manual intervention. You cannot have a deterministic pipeline if the build produces different results depending on who runs it.

Key Practices

1. Version-Controlled Build Scripts

Your build configuration lives in the same repository as your code. It is versioned, reviewed, and tested alongside the application.

What belongs in version control:

  • Build scripts (Makefile, build.gradle, package.json scripts, Dockerfile)
  • Dependency manifests (requirements.txt, go.mod, pom.xml, package-lock.json)
  • CI/CD pipeline definitions (.github/workflows, .gitlab-ci.yml, Jenkinsfile)
  • Environment setup scripts (docker-compose.yml for local development)

What does not belong in version control:

  • Secrets and credentials (use secret management tools)
  • Environment-specific configuration values (use environment variables or config management)
  • Generated artifacts (build outputs, compiled binaries)

Anti-pattern: Build instructions that exist only in a wiki, a Confluence page, or one developer’s head. If the build steps are not in the repository, they will drift from reality.

2. Dependency Management

All dependencies must be declared explicitly and resolved deterministically.

Practices:

  • Lock files: Use lock files (package-lock.json, Pipfile.lock, go.sum) to pin exact dependency versions. Check lock files into version control.
  • Reproducible resolution: Running the dependency install twice should produce identical results.
  • No undeclared dependencies: Your build should not rely on tools or libraries that happen to be installed on the build machine. If you need it, declare it.
  • Dependency scanning: Automate vulnerability scanning of dependencies as part of the build. Do not wait for a separate security review.

Anti-pattern: “It builds on Jenkins because Jenkins has Java 11 installed, but the Dockerfile uses Java 17.” The build must declare and control its own runtime.

3. Build Caching

Fast builds keep developers in flow. Caching is the primary mechanism for build speed.

What to cache:

  • Dependencies: Download once, reuse across builds. Most build tools (npm, Maven, Gradle, pip) support a local cache.
  • Compilation outputs: Incremental compilation avoids rebuilding unchanged modules.
  • Docker layers: Structure your Dockerfile so that rarely-changing layers (OS, dependencies) are cached and only the application code layer is rebuilt (see the sketch at the end of this subsection).
  • Test fixtures: Prebuilt test data or container images used by tests.

Guidelines:

  • Cache aggressively for local development and CI
  • Invalidate caches when dependencies or build configuration change
  • Do not cache test results - tests must always run
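
As a sketch of the Docker-layer point above, assuming a typical Node.js layout (the file names and commands are illustrative): copy and install dependencies before copying application code, so the dependency layer is reused as long as the lock file is unchanged.

# Dockerfile structured for layer caching
FROM node:20-slim
WORKDIR /app

# Dependency layer: rebuilt only when the manifests change
COPY package.json package-lock.json ./
RUN npm ci

# Application layer: rebuilt on every code change; dependency layers stay cached
COPY . .
RUN npm run build

CMD ["node", "dist/server.js"]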

4. Single Build Script Entry Point

Developers, CI, and CD should all use the same entry point.

# Example: Makefile as the single entry point

# Derive the artifact version from the commit that produced it
GIT_SHA := $(shell git rev-parse --short HEAD)

.PHONY: all build test package clean

all: build test package

build:
	./gradlew compileJava

test:
	./gradlew test

package:
	docker build -t myapp:$(GIT_SHA) .

clean:
	./gradlew clean
	docker rmi myapp:$(GIT_SHA) || true

The CI server runs make all. A developer runs make all. The result is the same. There is no separate “CI build script” that diverges from what developers run locally.

5. Artifact Versioning

Every build artifact must be traceable to the exact commit that produced it.

Practices:

  • Tag artifacts with the Git commit SHA or a build number derived from it
  • Store build metadata (commit, branch, timestamp, builder) in the artifact or alongside it
  • Never overwrite an existing artifact - if the version exists, the artifact is immutable

This becomes critical in Phase 2 when you establish immutable artifact practices.

CI Server Setup Basics

The CI server is the mechanism that runs your build automatically. In Phase 1, the setup is straightforward:

What the CI Server Does

  1. Watches the trunk for new commits
  2. Runs the build (the same command a developer would run locally)
  3. Reports the result (pass/fail, test results, build duration)
  4. Notifies the team if the build fails

Minimum CI Configuration

Regardless of which CI tool you use (GitHub Actions, GitLab CI, Jenkins, CircleCI), the configuration follows the same pattern:

# Conceptual CI configuration (adapt to your tool)
trigger:
  branch: main  # Run on every commit to trunk

steps:
  - checkout: source code
  - install: dependencies
  - run: build
  - run: tests
  - run: package
  - report: test results and build status
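
Rendered concretely for one tool - GitHub Actions is assumed here - the same pattern might look like the sketch below, reusing the single make all entry point from the section above:

# .github/workflows/ci.yml (illustrative)
name: ci
on:
  push:
    branches: [main]  # every commit to trunk triggers a build
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: make all  # the same command developers run locally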

CI Principles for Phase 1

  • Run on every commit. Not nightly, not weekly, not “when someone remembers.” Every commit to trunk triggers a build.
  • Keep the build green. A failing build is the team’s top priority. Work stops until trunk is green again. (See Working Agreements.)
  • Run the same build everywhere. The CI server runs the same script as local development. No CI-only steps that developers cannot reproduce.
  • Fail fast. Run the fastest checks first (compilation, unit tests) before the slower ones (integration tests, packaging).

Build Time Targets

Build speed directly affects developer productivity and integration frequency. If the build takes 30 minutes, developers will not integrate multiple times per day.

Build Phase Target Rationale
Compilation < 1 minute Developers need instant feedback on syntax and type errors
Unit tests < 3 minutes Fast enough to run before every commit
Integration tests < 5 minutes Must complete before the developer context-switches
Full build (compile + test + package) < 10 minutes The outer bound for fast feedback

If Your Build Is Too Slow

Slow builds are a common constraint that blocks CD adoption. Address them systematically:

  1. Profile the build. Identify which steps take the most time. Optimize the bottleneck, not everything.
  2. Parallelize tests. Most test frameworks support parallel execution. Run independent test suites concurrently.
  3. Use build caching. Avoid recompiling or re-downloading unchanged dependencies.
  4. Split the build. Run fast checks (lint, compile, unit tests) as a “fast feedback” stage. Run slower checks (integration tests, security scans) as a second stage.
  5. Upgrade build hardware. Sometimes the fastest optimization is more CPU and RAM.

The target is under 10 minutes for the feedback loop that developers use on every commit. Longer-running validation (E2E tests, performance tests) can run in a separate stage.

Common Anti-Patterns

Manual Build Steps

Symptom: The build process includes steps like “open this tool and click Run” or “SSH into the build server and execute this script.”

Problem: Manual steps are error-prone, slow, and cannot be parallelized or cached. They are the single biggest obstacle to build automation.

Fix: Script every step. If a human must perform the step today, write a script that performs it tomorrow.

Environment-Specific Builds

Symptom: The build produces different artifacts for different environments (dev, staging, production). Or the build only works on specific machines because of pre-installed tools.

Problem: Environment-specific builds mean you are not testing the same artifact you deploy. Bugs that appear in production but not in staging become impossible to diagnose.

Fix: Build one artifact and configure it per environment at deployment time. The artifact is immutable; the configuration is external. (See Application Config in Phase 2.)

Build Scripts That Only Run in CI

Symptom: The CI pipeline has build steps that developers cannot run locally. Local development uses a different build process.

Problem: Developers cannot reproduce CI failures locally, leading to slow debugging cycles and “push and pray” development.

Fix: Use a single build entry point (Makefile, build script) that both CI and developers use. CI configuration should only add triggers and notifications, not build logic.

Missing Dependency Pinning

Symptom: Builds break randomly because a dependency released a new version overnight.

Problem: Without pinned dependencies, the build is non-deterministic. The same code can produce different results on different days.

Fix: Use lock files. Pin all dependency versions. Update dependencies intentionally, not accidentally.

Long Build Queues

Symptom: Developers commit to trunk, but the build does not run for 20 minutes because the CI server is processing a queue.

Problem: Delayed feedback defeats the purpose of CI. If developers do not see the result of their commit for 30 minutes, they have already moved on.

Fix: Ensure your CI infrastructure can handle your team’s commit frequency. Use parallel build agents. Prioritize builds on the main branch.

Measuring Success

Metric Target Why It Matters
Build duration < 10 minutes Enables fast feedback and frequent integration
Build success rate > 95% Indicates reliable, reproducible builds
Time from commit to build result < 15 minutes (including queue time) Measures the full feedback loop
Developer ability to build locally 100% of team Confirms the build is portable and documented

Next Step

With build automation in place, you can build, test, and package your application reliably. The next foundation is ensuring that the work you integrate daily is small enough to be safe. Continue to Work Decomposition.


This content is adapted from the Dojo Consortium, licensed under CC BY 4.0.


4 - Work Decomposition

Break features into small, deliverable increments that can be completed in 2 days or less.

Phase 1 - Foundations | Adapted from Dojo Consortium

Trunk-based development requires daily integration, and daily integration requires small work. If a feature takes two weeks to build, you cannot integrate it daily without decomposing it first. This page covers the techniques for breaking work into small, deliverable increments that flow through your pipeline continuously.

Why Small Work Matters for CD

Continuous delivery depends on a simple equation: small changes, integrated frequently, are safer than large changes integrated rarely.

Every practice in Phase 1 reinforces this:

  • Trunk-based development requires that you integrate at least daily. You cannot integrate a two-week feature daily unless you decompose it.
  • Testing fundamentals work best when each change is small enough to test thoroughly.
  • Code review is fast when the change is small. A 50-line change can be reviewed in minutes. A 2,000-line change takes hours - if it gets reviewed at all.

The data supports this. The DORA research consistently shows that smaller batch sizes correlate with higher delivery performance. Small changes have:

  • Lower risk: If a small change breaks something, the blast radius is limited, and the cause is obvious.
  • Faster feedback: A small change gets through the pipeline quickly. You learn whether it works today, not next week.
  • Easier rollback: Rolling back a 50-line change is straightforward. Rolling back a 2,000-line change often requires a new deployment.
  • Better flow: Small work items move through the system predictably. Large work items block queues and create bottlenecks.

The 2-Day Rule

If a work item takes longer than 2 days to complete, it is too big.

This is not arbitrary. Two days gives you at least one integration to trunk per day (the minimum for TBD) and allows for the natural rhythm of development: plan, implement, test, integrate, move on.

When a developer says “this will take a week,” the answer is not “go faster.” The answer is “break it into smaller pieces.”

What “Complete” Means

A work item is complete when it is:

  • Integrated to trunk
  • All tests pass
  • The change is deployable (even if the feature is not yet user-visible)
  • It meets the Definition of Done

If a story requires a feature flag to hide incomplete user-facing behavior, that is fine. The code is still integrated, tested, and deployable.

Story Slicing Techniques

Story slicing is the practice of breaking user stories into the smallest possible increments that still deliver value or make progress toward delivering value.

The INVEST Criteria

Good stories follow INVEST:

| Criterion | Meaning | Why It Matters for CD |
|---|---|---|
| Independent | Can be developed and deployed without waiting for other stories | Enables parallel work and avoids blocking |
| Negotiable | Details can be discussed and adjusted | Allows the team to find the smallest valuable slice |
| Valuable | Delivers something meaningful to the user or the system | Prevents “technical stories” that do not move the product forward |
| Estimable | Small enough that the team can reasonably estimate it | Large stories are unestimable because they hide unknowns |
| Small | Completable within 2 days | Enables daily integration and fast feedback |
| Testable | Has clear acceptance criteria that can be automated | Supports the testing foundation |

Vertical Slicing

The most important slicing technique for CD is vertical slicing: cutting through all layers of the application to deliver a thin but complete slice of functionality.

Vertical slice (correct):

“As a user, I can log in with my email and password.”

This slice touches the UI (login form), the API (authentication endpoint), and the database (user lookup). It is deployable and testable end-to-end.

Horizontal slice (anti-pattern):

“Build the database schema for user accounts.” “Build the authentication API.” “Build the login form UI.”

Each horizontal slice is incomplete on its own. None is deployable. None is testable end-to-end. They create dependencies between work items and block flow.

Slicing Strategies

When a story feels too big, apply one of these strategies:

| Strategy | How It Works | Example |
|---|---|---|
| By workflow step | Implement one step of a multi-step process | “User can add items to cart” (before “user can checkout”) |
| By business rule | Implement one rule at a time | “Orders over $100 get free shipping” (before “orders ship to international addresses”) |
| By data variation | Handle one data type first | “Support credit card payments” (before “support PayPal”) |
| By operation | Implement CRUD operations separately | “Create a new customer” (before “edit customer” or “delete customer”) |
| By performance | Get it working first, optimize later | “Search returns results” (before “search returns results in under 200ms”) |
| By platform | Support one platform first | “Works on desktop web” (before “works on mobile”) |
| Happy path first | Implement the success case first | “User completes checkout” (before “user sees error when payment fails”) |

Example: Decomposing a Feature

Original story (too big):

“As a user, I can manage my profile including name, email, avatar, password, notification preferences, and two-factor authentication.”

Decomposed into vertical slices:

  1. “User can view their current profile information” (read-only display)
  2. “User can update their name” (simplest edit)
  3. “User can update their email with verification” (adds email flow)
  4. “User can upload an avatar image” (adds file handling)
  5. “User can change their password” (adds security validation)
  6. “User can configure notification preferences” (adds preferences)
  7. “User can enable two-factor authentication” (adds 2FA flow)

Each slice is independently deployable, testable, and completable within 2 days. Each delivers incremental value. The feature is built up over a series of small deliveries rather than one large batch.

BDD as a Decomposition Tool

Behavior-Driven Development (BDD) is not just a testing practice - it is a powerful tool for decomposing work into small, clear increments.

Three Amigos

Before work begins, hold a brief “Three Amigos” session with three perspectives:

  • Business/Product: What should this feature do? What is the expected behavior?
  • Development: How will we build it? What are the technical considerations?
  • Testing: How will we verify it? What are the edge cases?

This 15-30 minute conversation accomplishes two things:

  1. Shared understanding: Everyone agrees on what “done” looks like before work begins.
  2. Natural decomposition: Discussing specific scenarios reveals natural slice boundaries.

Specification by Example

Write acceptance criteria as concrete examples, not abstract requirements.

Abstract (hard to slice):

“The system should validate user input.”

Concrete (easy to slice):

  • Given an email field, when the user enters “not-an-email”, then the form shows “Please enter a valid email address.”
  • Given a password field, when the user enters fewer than 8 characters, then the form shows “Password must be at least 8 characters.”
  • Given a name field, when the user leaves it blank, then the form shows “Name is required.”

Each concrete example can become its own story or task. The scope is clear, the acceptance criteria are testable, and the work is small.

Given-When-Then Format

Structure acceptance criteria in Given-When-Then format to make them executable:

Feature: User login

  Scenario: Successful login with valid credentials
    Given a registered user with email "user@example.com"
    When they enter their correct password and click "Log in"
    Then they are redirected to the dashboard

  Scenario: Failed login with wrong password
    Given a registered user with email "user@example.com"
    When they enter an incorrect password and click "Log in"
    Then they see the message "Invalid email or password"
    And they remain on the login page

Each scenario is a natural unit of work. Implement one scenario at a time, integrate to trunk after each one.
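Scenarios written this way translate almost directly into automated acceptance tests. Here is a minimal pytest-style sketch of the second scenario; the `LoginPage` helper is hypothetical and simply simulates the behavior the scenario describes, standing in for whatever drives your real UI or API.

```python
class LoginPage:
    """Hypothetical test helper; in a real suite this would drive the UI or call the API."""

    def __init__(self):
        self.current_page = "login"
        self.message = ""

    def log_in(self, email: str, password: str) -> None:
        # Simulated behavior for the sketch: only the known credentials succeed.
        if email == "user@example.com" and password == "correct-password":
            self.current_page = "dashboard"
        else:
            self.message = "Invalid email or password"


def test_failed_login_with_wrong_password():
    # Given a registered user with email "user@example.com"
    page = LoginPage()

    # When they enter an incorrect password and click "Log in"
    page.log_in("user@example.com", "wrong-password")

    # Then they see the message "Invalid email or password"
    assert page.message == "Invalid email or password"
    # And they remain on the login page
    assert page.current_page == "login"
```

The Given-When-Then lines map one-to-one onto the test body, which keeps the scope of each work item visible in the code.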

Task Decomposition Within Stories

Even well-sliced stories may contain multiple tasks. Decompose stories into tasks that can be completed and integrated independently.

Example story: “User can update their name”

Tasks:

  1. Add the name field to the profile API endpoint (backend change, integration test)
  2. Add the name field to the profile form (frontend change, unit test)
  3. Connect the form to the API endpoint (integration, E2E test)

Each task results in a commit to trunk. The story is completed through a series of small integrations, not one large merge.

Guidelines for task decomposition:

  • Each task should take hours, not days
  • Each task should leave trunk in a working state after integration
  • Tasks should be ordered so that the simplest changes come first
  • If a task requires a feature flag or stub to be integrated safely, that is fine
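To make task-level integration concrete, here is a hedged sketch of what task 1 from the example above might look like as an independently integrable change: an API endpoint plus its test, committed to trunk before the frontend tasks exist. FastAPI is used only for illustration, and the route, field, and data names are assumptions.

```python
from fastapi import FastAPI
from fastapi.testclient import TestClient
from pydantic import BaseModel

app = FastAPI()

# In-memory store stands in for the real database in this sketch.
PROFILES = {1: {"name": "Ada Lovelace"}}


class ProfileUpdate(BaseModel):
    name: str


@app.patch("/profile/{user_id}")
def update_profile(user_id: int, update: ProfileUpdate):
    # Backend change only: the frontend (tasks 2 and 3) does not exist yet.
    PROFILES[user_id]["name"] = update.name
    return PROFILES[user_id]


def test_update_name():
    # This test runs in the pipeline when task 1 is integrated,
    # keeping trunk in a working state before the other tasks land.
    client = TestClient(app)
    response = client.patch("/profile/1", json={"name": "Ada King"})
    assert response.status_code == 200
    assert response.json()["name"] == "Ada King"
```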

Common Anti-Patterns

Horizontal Slicing

Symptom: Stories are organized by architectural layer: “build the database schema,” “build the API,” “build the UI.”

Problem: No individual slice is deployable or testable end-to-end. Integration happens at the end, which is where bugs are found and schedules slip.

Fix: Slice vertically. Every story should touch all the layers needed to deliver a thin slice of complete functionality.

Technical Stories

Symptom: The backlog contains stories like “refactor the database access layer” or “upgrade to React 18” that do not deliver user-visible value.

Problem: Technical work is important, but when it is separated from feature work, it becomes hard to prioritize and easy to defer. It also creates large, risky changes.

Fix: Embed technical improvements in feature stories. Refactor as you go. If a technical change is necessary, tie it to a specific business outcome and keep it small enough to complete in 2 days.

Stories That Are Really Epics

Symptom: A story has 10+ acceptance criteria, or the estimate is “8 points” or “2 weeks.”

Problem: Large stories hide unknowns, resist estimation, and cannot be integrated daily.

Fix: If a story has more than 3-5 acceptance criteria, it is an epic. Break it into smaller stories using the slicing strategies above.

Splitting by Role Instead of by Behavior

Symptom: Separate stories for “frontend developer builds the UI” and “backend developer builds the API.”

Problem: This creates handoff dependencies and delays integration. The feature is not testable until both stories are complete.

Fix: Write stories from the user’s perspective. The same developer (or pair) implements the full vertical slice.

Deferring “Edge Cases” Indefinitely

Symptom: The team builds the happy path and creates a backlog of “handle error case X” stories that never get prioritized.

Problem: Error handling is not optional. Unhandled edge cases become production incidents.

Fix: Include the most important error cases in the initial story decomposition. Use the “happy path first” slicing strategy, but schedule edge case stories immediately after, not “someday.”

Measuring Success

| Metric | Target | Why It Matters |
|---|---|---|
| Story cycle time | < 2 days from start to trunk | Confirms stories are small enough |
| Development cycle time | Decreasing | Shows improved flow from smaller work |
| Stories completed per week | Increasing (with same team size) | Indicates better decomposition and less rework |
| Work in progress | Decreasing | Fewer large stories blocking the pipeline |

Next Step

Small, well-decomposed work flows through the system quickly - but only if code review does not become a bottleneck. Continue to Code Review to learn how to keep review fast and effective.


This content is adapted from the Dojo Consortium, licensed under CC BY 4.0.


5 - Code Review

Streamline code review to provide fast feedback without blocking flow.

Phase 1 - Foundations | Adapted from Dojo Consortium

Code review is essential for quality, but it is also the most common bottleneck in teams adopting trunk-based development. If reviews take days, daily integration is impossible. This page covers review techniques that maintain quality while enabling the flow that CD requires.

Why Code Review Matters for CD

Code review serves multiple purposes:

  • Defect detection: A second pair of eyes catches bugs that the author missed.
  • Knowledge sharing: Reviews spread understanding of the codebase across the team.
  • Consistency: Reviews enforce coding standards and architectural patterns.
  • Mentoring: Junior developers learn by having their code reviewed and by reviewing others’ code.

These are real benefits. The challenge is that traditional code review - open a pull request, wait for someone to review it, address comments, wait again - is too slow for CD.

In a CD workflow, code review must happen within minutes or hours, not days. The review is still rigorous, but the process is designed for speed.

The Core Tension: Quality vs. Flow

Traditional teams optimize review for thoroughness: detailed comments, multiple reviewers, extensive back-and-forth. This produces high-quality reviews but blocks flow.

CD teams optimize review for speed without sacrificing the quality that matters. The key insight is that most of the quality benefit of code review comes from small, focused reviews done quickly, not from exhaustive reviews done slowly.

| Traditional Review | CD-Compatible Review |
|---|---|
| Review happens after the feature is complete | Review happens continuously throughout development |
| Large diffs (hundreds or thousands of lines) | Small diffs (< 200 lines, ideally < 50) |
| Multiple rounds of feedback and revision | One round, or real-time feedback during pairing |
| Review takes 1-3 days | Review takes minutes to a few hours |
| Review is asynchronous by default | Review is synchronous by preference |
| 2+ reviewers required | 1 reviewer (or pairing as the review) |

Synchronous vs. Asynchronous Review

Synchronous Review (Preferred for CD)

In synchronous review, the reviewer and author are engaged at the same time. Feedback is immediate. Questions are answered in real time. The review is done when the conversation ends.

Methods:

  • Pair programming: Two developers work on the same code at the same time. Review is continuous. There is no separate review step because the code was reviewed as it was written.
  • Mob programming: The entire team (or a subset) works on the same code together. Everyone reviews in real time.
  • Over-the-shoulder review: The author walks the reviewer through the change in person or on a video call. The reviewer asks questions and provides feedback immediately.

Advantages for CD:

  • Zero wait time between “ready for review” and “review complete”
  • Higher bandwidth communication (tone, context, visual cues) catches more issues
  • Immediate resolution of questions - no async back-and-forth
  • Knowledge transfer happens naturally through the shared work

Asynchronous Review (When Necessary)

Sometimes synchronous review is not possible - time zones, schedules, or team preferences may require asynchronous review. This is fine, but it must be fast.

Rules for async review in a CD workflow:

  • Review within 2 hours. If a pull request sits for a day, it blocks integration. Set a team working agreement: “pull requests are reviewed within 2 hours during working hours.”
  • Keep changes small. A 50-line change can be reviewed in 5 minutes. A 500-line change takes an hour and reviewers procrastinate on it.
  • Use draft PRs for early feedback. If you want feedback on an approach before the code is complete, open a draft PR. Do not wait until the change is “perfect.”
  • Avoid back-and-forth. If a comment requires discussion, move to a synchronous channel (call, chat). Async comment threads that go 5 rounds deep are a sign the change is too large or the design was not discussed upfront.

Review Techniques Compatible with TBD

Pair Programming as Review

When two developers pair on a change, the code is reviewed as it is written. There is no separate review step, no pull request waiting for approval, and no delay to integration.

How it works with TBD:

  1. Two developers sit together (physically or via screen share)
  2. They discuss the approach, write the code, and review each other’s decisions in real time
  3. When the change is ready, they commit to trunk together
  4. Both developers are accountable for the quality of the code

When to pair:

  • New or unfamiliar areas of the codebase
  • Changes that affect critical paths
  • When a junior developer is working on a change (pairing doubles as mentoring)
  • Any time the change involves design decisions that benefit from discussion

Pair programming satisfies most organizations’ code review requirements because two developers have actively reviewed and approved the code.

Mob Programming as Review

Mob programming extends pairing to the whole team. One person drives (types), one person navigates (directs), and the rest observe and contribute.

When to mob:

  • Establishing new patterns or architectural decisions
  • Complex changes that benefit from multiple perspectives
  • Onboarding new team members to the codebase
  • Working through particularly difficult problems

Mob programming is intensive but highly effective. Every team member understands the code, the design decisions, and the trade-offs.

Rapid Async Review

For teams that use pull requests, rapid async review adapts the pull request workflow for CD speed.

Practices:

  • Auto-assign reviewers. Do not wait for someone to volunteer. Use tools to automatically assign a reviewer when a PR is opened.
  • Keep PRs small. Target < 200 lines of changed code. Smaller PRs get reviewed faster and more thoroughly.
  • Provide context. Write a clear PR description that explains what the change does, why it is needed, and how to verify it. A good description reduces review time dramatically.
  • Use automated checks. Run linting, formatting, and tests before the human review. The reviewer should focus on logic and design, not style.
  • Approve and merge quickly. If the change looks correct, approve it. Do not hold it for nitpicks. Nitpicks can be addressed in a follow-up commit.

What to Review

Not everything in a code change deserves the same level of scrutiny. Focus reviewer attention where it matters most.

High Priority (Reviewer Should Focus Here)

  • Behavior correctness: Does the code do what it is supposed to do? Are edge cases handled?
  • Security: Does the change introduce vulnerabilities? Are inputs validated? Are secrets handled properly?
  • Clarity: Can another developer understand this code in 6 months? Are names clear? Is the logic straightforward?
  • Test coverage: Are the new behaviors tested? Do the tests verify the right things?
  • API contracts: Do changes to public interfaces maintain backward compatibility? Are they documented?
  • Error handling: What happens when things go wrong? Are errors caught, logged, and surfaced appropriately?

Low Priority (Automate Instead of Reviewing)

  • Code style and formatting: Use automated formatters (Prettier, Black, gofmt). Do not waste reviewer time on indentation and bracket placement.
  • Import ordering: Automate with linting rules.
  • Naming conventions: Enforce with lint rules where possible. Only flag naming in review if it genuinely harms readability.
  • Unused variables or imports: Static analysis tools catch these instantly.
  • Consistent patterns: Where possible, encode patterns in architecture decision records and lint rules rather than relying on reviewers to catch deviations.

Rule of thumb: If a style or convention issue can be caught by a machine, do not ask a human to catch it. Reserve human attention for the things machines cannot evaluate: correctness, design, clarity, and security.

Review Scope for Small Changes

In a CD workflow, most changes are small - tens of lines, not hundreds. This changes the economics of review.

| Change Size | Expected Review Time | Review Depth |
|---|---|---|
| < 20 lines | 2-5 minutes | Quick scan: is it correct? Any security issues? |
| 20-100 lines | 5-15 minutes | Full review: behavior, tests, clarity |
| 100-200 lines | 15-30 minutes | Detailed review: design, contracts, edge cases |
| > 200 lines | Consider splitting the change | Large changes get superficial reviews |

Research consistently shows that reviewer effectiveness drops sharply after 200-400 lines. If you are regularly reviewing changes larger than 200 lines, the problem is not the review process - it is the work decomposition.

Working Agreements for Review SLAs

Establish clear team agreements about review expectations. Without explicit agreements, review latency will drift based on individual habits.

| Agreement | Target |
|---|---|
| Response time | Review within 2 hours during working hours |
| Reviewer count | 1 reviewer (or pairing as the review) |
| PR size | < 200 lines of changed code |
| Blocking issues only | Only block a merge for correctness, security, or significant design issues |
| Nitpicks | Use a “nit:” prefix. Nitpicks are suggestions, not merge blockers |
| Stale PRs | PRs open for > 24 hours are escalated to the team |
| Self-review | Author reviews their own diff before requesting review |

How to Enforce Review SLAs

  • Track review turnaround time. If it consistently exceeds 2 hours, discuss it in retrospectives.
  • Make review a first-class responsibility, not something developers do “when they have time.”
  • If a reviewer is unavailable, any other team member can review. Do not create single-reviewer dependencies.
  • Consider pairing as the default and async review as the exception. This eliminates the review bottleneck entirely.
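Some of these agreements can be enforced mechanically in the pipeline. Below is a hedged sketch of a check that fails a build when a change exceeds the agreed 200-line limit; it assumes the trunk branch is named `main` and that `git` is available on the build agent.

```python
import subprocess
import sys

MAX_CHANGED_LINES = 200  # from the team working agreement


def changed_lines(base: str = "origin/main") -> int:
    """Sum added + deleted lines between the base branch and the current HEAD."""
    output = subprocess.run(
        ["git", "diff", "--numstat", f"{base}...HEAD"],
        check=True, capture_output=True, text=True,
    ).stdout
    total = 0
    for line in output.splitlines():
        added, deleted, _path = line.split("\t", 2)
        # Binary files report "-" for line counts; skip them.
        if added != "-":
            total += int(added) + int(deleted)
    return total


if __name__ == "__main__":
    lines = changed_lines()
    if lines > MAX_CHANGED_LINES:
        print(f"Change is {lines} lines; the agreed limit is {MAX_CHANGED_LINES}. Consider splitting it.")
        sys.exit(1)
    print(f"Change size OK: {lines} lines.")
```

A check like this turns the PR-size agreement from a request into a guardrail, while leaving the team free to adjust the limit in one place.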

Code Review and Trunk-Based Development

Code review and TBD work together, but only if review does not block integration. Here is how to reconcile them:

| TBD Requirement | How Review Adapts |
|---|---|
| Integrate to trunk at least daily | Reviews must complete within hours, not days |
| Branches live < 24 hours | PRs are opened and merged within the same day |
| Trunk is always releasable | Reviewers focus on correctness, not perfection |
| Small, frequent changes | Small changes are reviewed quickly and thoroughly |

If your team finds that review is the bottleneck preventing daily integration, the most effective solution is to adopt pair programming. It eliminates the review step entirely by making review continuous.

Measuring Success

| Metric | Target | Why It Matters |
|---|---|---|
| Review turnaround time | < 2 hours | Prevents review from blocking integration |
| PR size (lines changed) | < 200 lines | Smaller PRs get faster, more thorough reviews |
| PR age at merge | < 24 hours | Aligns with TBD branch age constraint |
| Review rework cycles | < 2 rounds | Multiple rounds indicate the change is too large or design was not discussed upfront |

Next Step

Code review practices need to be codified in team agreements alongside other shared commitments. Continue to Working Agreements to establish your team’s definitions of done, ready, and CI practice.


This content is adapted from the Dojo Consortium, licensed under CC BY 4.0.


6 - Working Agreements

Establish shared definitions of done and ready to align the team on quality and process.

Phase 1 - Foundations | Adapted from Dojo Consortium

The practices in Phase 1 - trunk-based development, testing, small work, and fast review - only work when the whole team commits to them. Working agreements make that commitment explicit. This page covers the key agreements a team needs before moving to pipeline automation in Phase 2.

Why Working Agreements Matter

A working agreement is a shared commitment that the team creates, owns, and enforces together. It is not a policy imposed from outside. It is the team’s own answer to the question: “How do we work together?”

Without working agreements, CD practices drift. One developer integrates daily; another keeps a branch for a week. One developer fixes a broken build immediately; another waits until after lunch. These inconsistencies compound. Within weeks, the team is no longer practicing CD - they are practicing individual preferences.

Working agreements prevent this drift by making expectations explicit. When everyone agrees on what “done” means, what “ready” means, and how CI works, the team can hold each other accountable without conflict.

Definition of Done

The Definition of Done (DoD) is the team’s shared standard for when a work item is complete. For CD, the Definition of Done must include deployment.

Minimum Definition of Done for CD

A work item is done when all of the following are true:

  • Code is integrated to trunk
  • All automated tests pass
  • Code has been reviewed (via pairing, mob, or pull request)
  • The change is deployable to production
  • No known defects are introduced
  • Relevant documentation is updated (API docs, runbooks, etc.)
  • Feature flags are in place for incomplete user-facing features

Why “Deployed to Production” Matters

Many teams define “done” as “code is merged.” This creates a gap between “done” and “delivered.” Work accumulates in a staging environment, waiting for a release. Risk grows with each unreleased change.

In a CD organization, “done” means the change is in production (or ready to be deployed to production at any time). This is the ultimate test of completeness: the change works in the real environment, with real data, under real load.

In Phase 1, you may not yet have the pipeline to deploy every change to production automatically. That is fine - your DoD should still include “deployable to production” as the standard, even if the deployment step is not yet automated. The pipeline work in Phase 2 will close that gap.

Extending Your Definition of Done

As your CD maturity grows, extend the DoD:

| Phase | Addition to DoD |
|---|---|
| Phase 1 (Foundations) | Code integrated to trunk, tests pass, reviewed, deployable |
| Phase 2 (Pipeline) | Artifact built and validated by the pipeline |
| Phase 3 (Optimize) | Change deployed to production behind a feature flag |
| Phase 4 (Deliver on Demand) | Change deployed to production and monitored |

Definition of Ready

The Definition of Ready (DoR) answers: “When is a work item ready to be worked on?” Pulling unready work into development creates waste - unclear requirements lead to rework, missing acceptance criteria lead to untestable changes, and oversized stories lead to long-lived branches.

Minimum Definition of Ready for CD

A work item is ready when all of the following are true:

  • Acceptance criteria are defined and specific (using Given-When-Then or equivalent)
  • The work item is small enough to complete in 2 days or less
  • The work item is testable - the team knows how to verify it works
  • Dependencies are identified and resolved (or the work item is independent)
  • The team has discussed the work item (Three Amigos or equivalent)
  • The work item is estimated (or the team has agreed estimation is unnecessary for items this small)

Common Mistakes with Definition of Ready

  • Making it too rigid. The DoR is a guideline, not a gate. If the team agrees a work item is understood well enough, it is ready. Do not use the DoR to avoid starting work.
  • Requiring design documents. For small work items (< 2 days), a conversation and acceptance criteria are sufficient. Formal design documents are for larger initiatives.
  • Skipping the conversation. The DoR is most valuable as a prompt for discussion, not as a checklist. The Three Amigos conversation matters more than the checkboxes.

CI Working Agreement

The CI working agreement codifies how the team practices continuous integration. This is the most operationally critical working agreement for CD.

The CI Agreement

The team agrees to the following practices:

Integration:

  • Every developer integrates to trunk at least once per day
  • Branches (if used) live for less than 24 hours
  • No long-lived feature, development, or release branches

Build:

  • All tests must pass before merging to trunk
  • The build runs on every commit to trunk
  • Build results are visible to the entire team

Broken builds:

  • A broken build is the team’s top priority - it is fixed before any new work begins
  • The developer(s) who broke the build are responsible for fixing it immediately
  • If the fix will take more than 10 minutes, revert the change and fix it offline
  • No one commits to a broken trunk (except to fix the break)

Work in progress:

  • Finishing existing work takes priority over starting new work
  • The team limits work in progress to maintain flow
  • If a developer is blocked, they help a teammate before starting a new story

Why “Broken Build = Top Priority”

This is the single most important CI agreement. When the build is broken:

  • No one can integrate safely. Changes are stacking up.
  • Trunk is not releasable. The team has lost its safety net.
  • Every minute the build stays broken, the team accumulates risk.

“Fix the build” is not a suggestion. It is an agreement that the team enforces collectively. If the build is broken and someone starts a new feature instead of fixing it, the team should call that out. This is not punitive - it is the team protecting its own ability to deliver.

Stop the Line - Why All Work Stops

Some teams interpret “fix the build” as “stop merging until it is green.” That is not enough. When the build is red, all feature work stops - not just merges. Every developer on the team shifts attention to restoring green.

This sounds extreme, but the reasoning is straightforward:

  • Work closer to production is more valuable than work further away. A broken trunk means nothing in progress can ship. Fixing the build is the highest-leverage activity anyone on the team can do.
  • Continuing feature work creates a false sense of progress. Code written against a broken trunk is untested against the real baseline. It may compile, but it has not been validated. That is not progress - it is inventory.
  • The team mindset matters more than the individual fix. When everyone stops, the message is clear: the build belongs to the whole team, not just the person who broke it. This shared ownership is what separates teams that practice CI from teams that merely have a CI server.

Two Timelines: Stop vs. Do Not Stop

Consider two teams that encounter the same broken build at 10:00 AM.

Team A stops all feature work:

  • 10:00 - Build breaks. The team sees the alert and stops.
  • 10:05 - Two developers pair on the fix while a third reviews the failing test.
  • 10:20 - Fix is pushed. Build goes green.
  • 10:25 - The team resumes feature work. Total disruption: roughly 30 minutes.

Team B treats it as one person’s problem:

  • 10:00 - Build breaks. The developer who caused it starts investigating alone.
  • 10:30 - Other developers commit new changes on top of the broken trunk. Some changes conflict with the fix in progress.
  • 11:30 - The original developer’s fix does not work because the codebase has shifted underneath them.
  • 14:00 - After multiple failed attempts, the team reverts three commits (the original break plus two that depended on the broken state).
  • 15:00 - Trunk is finally green. The team has lost most of the day, and three developers need to redo work. Total disruption: 5+ hours.

The team that stops immediately pays a small, predictable cost. The team that does not stop pays a large, unpredictable one.

The Revert Rule

If a broken build cannot be fixed within 10 minutes, revert the offending commit and fix the issue on a branch. This keeps trunk green and unblocks the rest of the team. The developer who made the change is not being punished - they are protecting the team’s flow.

Reverting feels uncomfortable at first. Teams worry about “losing work.” But a reverted commit is not lost - the code is still in the Git history. The developer can re-apply their change after fixing the issue. The alternative - a broken trunk for hours while someone debugs - is far more costly.

When to Forward Fix vs. Revert

Not every broken build requires a revert. If the developer who broke it can identify the cause quickly, a forward fix is faster and simpler. The key is a strict time limit - the same 10-minute window defined in the CI agreement and the revert rule above:

  1. Start a 10-minute timer the moment the build goes red.
  2. If the developer has a fix ready and pushed within the window, ship the forward fix.
  3. If the timer expires and the fix is not in trunk, revert immediately - no extensions, no “I’m almost done.”

The timer prevents the most common failure mode: a developer who is “five minutes away” from a fix for an hour. Once the window expires without a fix, the probability of a quick resolution drops sharply, and the cost to the rest of the team climbs. Revert, restore green, and fix the problem offline without time pressure.

Common Objections to Stop-the-Line

Teams adopting stop-the-line discipline encounter predictable pushback. These responses can help.

| Objection | Response |
|---|---|
| “We can’t afford to stop - we have a deadline.” | You cannot afford not to stop. Every minute the build is red, you accumulate changes that are untested against the real baseline. Stopping for 20 minutes now prevents losing half a day later. The fastest path to your deadline runs through a green build. |
| “Stopping kills our velocity.” | Velocity that includes work built on a broken trunk is an illusion. Those story points will come back as rework, failed deployments, or production incidents. Real velocity requires a releasable trunk. |
| “We already stop all the time - it’s not working.” | Frequent stops indicate a different problem: the team is merging changes that break the build too often. Address that root cause with better pre-merge testing, smaller commits, and pair programming on risky changes. Stop-the-line is the safety net, not the solution for chronic build instability. |
| “It’s a known flaky test - we can ignore it.” | A flaky test you ignore trains the team to ignore all red builds. Fix the flaky test or remove it. There is no middle ground. A red build must always mean “something is wrong” or the signal loses all value. |
| “Management won’t support stopping feature work.” | Frame it in terms management cares about: lead time and rework cost. Show the two-timeline comparison above. Teams that stop immediately have shorter cycle times and less unplanned rework. This is not about being cautious - it is about being fast. |

How Working Agreements Support the CD Migration

Each working agreement maps directly to a Phase 1 practice:

| Practice | Supporting Agreement |
|---|---|
| Trunk-based development | CI agreement: daily integration, branch age < 24h |
| Testing fundamentals | DoD: all tests pass. CI: tests pass before merge |
| Build automation | CI: build runs on every commit. Broken build = top priority |
| Work decomposition | DoR: work items < 2 days. WIP limits |
| Code review | CI: review within 2 hours. DoD: code reviewed |

Without these agreements, individual practices exist in isolation. Working agreements connect them into a coherent way of working.

Template: Create Your Own Working Agreements

Use this template as a starting point. Customize it for your team’s context. The specific targets may differ, but the structure should remain.

Team Working Agreement Template

# [Team Name] Working Agreement
Date: [Date]
Participants: [All team members]

## Definition of Done
A work item is done when:
- [ ] Code is integrated to trunk
- [ ] All automated tests pass
- [ ] Code has been reviewed (method: [pair / mob / PR])
- [ ] The change is deployable to production
- [ ] No known defects are introduced
- [ ] [Add team-specific criteria]

## Definition of Ready
A work item is ready when:
- [ ] Acceptance criteria are defined (Given-When-Then)
- [ ] The item can be completed in [X] days or less
- [ ] The item is testable
- [ ] Dependencies are identified
- [ ] The team has discussed the item
- [ ] [Add team-specific criteria]

## CI Practices
- Integration frequency: at least [X] per developer per day
- Maximum branch age: [X] hours
- Review turnaround: within [X] hours
- Broken build response: fix within [X] minutes or revert
- WIP limit: [X] items per developer

## Review Practices
- Default review method: [pair / mob / async PR]
- PR size limit: [X] lines
- Review focus: [correctness, security, clarity]
- Style enforcement: [automated via linting]

## Meeting Cadence
- Standup: [time, frequency]
- Retrospective: [frequency]
- Working agreement review: [frequency, e.g., monthly]

## Agreement Review
This agreement is reviewed and updated [monthly / quarterly].
Any team member can propose changes at any time.
All changes require team consensus.

Tips for Creating Working Agreements

  1. Include everyone. Every team member should participate in creating the agreement. Agreements imposed by a manager or tech lead are policies, not agreements.
  2. Start simple. Do not try to cover every scenario. Start with the essentials (DoD, DoR, CI) and add specifics as the team identifies gaps.
  3. Make them visible. Post the agreements where the team sees them daily - on a team wiki, in the team channel, or on a physical board.
  4. Review regularly. Agreements should evolve as the team matures. Review them monthly. Remove agreements that are second nature. Add agreements for new challenges.
  5. Enforce collectively. Working agreements are only effective if the team holds each other accountable. This is a team responsibility, not a manager responsibility.
  6. Start with agreements you can keep. If the team is currently integrating once a week, do not agree to integrate three times daily. Agree to integrate daily, practice for a month, then tighten.

Measuring Success

| Metric | Target | Why It Matters |
|---|---|---|
| Agreement adherence | Team self-reports > 80% adherence | Indicates agreements are realistic and followed |
| Agreement review frequency | Monthly | Ensures agreements stay relevant |
| Integration frequency | Meets CI agreement target | Validates the CI working agreement |
| Broken build fix time | Meets CI agreement target | Validates the broken build response agreement |

Next Step

With working agreements in place, your team has established the foundations for continuous delivery: daily integration, reliable testing, automated builds, small work, fast review, and shared commitments.

You are ready to move to Phase 2: Pipeline, where you will build the automated path from commit to production.


This content is adapted from the Dojo Consortium, licensed under CC BY 4.0.


7 - Everything as Code

Every artifact that defines your system - infrastructure, pipelines, configuration, database schemas, monitoring - belongs in version control and is delivered through pipelines.

Phase 1 - Foundations

If it is not in version control, it does not exist. If it is not delivered through a pipeline, it is a manual step. Manual steps block continuous delivery. This page establishes the principle that everything required to build, deploy, and operate your system is defined as code, version controlled, reviewed, and delivered through the same automated pipelines as your application.

The Principle

Continuous delivery requires that any change to your system - application code, infrastructure, pipeline configuration, database schema, monitoring rules, security policies - can be made through a single, consistent process: change the code, commit, let the pipeline deliver it.

When something is defined as code:

  • It is version controlled. You can see who changed what, when, and why. You can revert any change. You can trace any production state to a specific commit.
  • It is reviewed. Changes go through the same review process as application code. A second pair of eyes catches mistakes before they reach production.
  • It is tested. Automated validation catches errors before deployment. Linting, dry-runs, and policy checks apply to infrastructure the same way unit tests apply to application code.
  • It is reproducible. You can recreate any environment from scratch. Disaster recovery is “re-run the pipeline,” not “find the person who knows how to configure the server.”
  • It is delivered through a pipeline. No SSH, no clicking through UIs, no manual steps. The pipeline is the only path to production for everything, not just application code.

When something is not defined as code, it is a liability. It cannot be reviewed, tested, or reproduced. It exists only in someone’s head, a wiki page that is already outdated, or a configuration that was applied manually and has drifted from any documented state.

What “Everything” Means

Application code

This is where most teams start, and it is the least controversial. Your application source code is in version control, built and tested by a pipeline, and deployed as an immutable artifact.

If your application code is not in version control, start here. Nothing else in this page matters until this is in place.

Infrastructure

Every server, network, database instance, load balancer, DNS record, and cloud resource should be defined in code and provisioned through automation.

What this looks like:

  • Cloud resources defined in Terraform, Pulumi, CloudFormation, or similar tools
  • Server configuration managed by Ansible, Chef, Puppet, or container images
  • Network topology, firewall rules, and security groups defined declaratively
  • Environment creation is a pipeline run, not a ticket to another team

What this replaces:

  • Clicking through cloud provider consoles to create resources
  • SSH-ing into servers to install packages or change configuration
  • Filing tickets for another team to provision an environment
  • “Snowflake” servers that were configured by hand and nobody knows how to recreate

Why it matters for CD: If creating or modifying an environment requires manual steps, your deployment frequency is limited by the availability and speed of the person who performs those steps. If a production server fails and you cannot recreate it from code, your mean time to recovery is measured in hours or days instead of minutes.
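As one illustration, here is a minimal infrastructure-as-code sketch using Pulumi’s Python SDK (one of the tools named above). The resource and tag names are placeholders; Terraform or CloudFormation would express the same idea declaratively, and the change reaches the cloud only when the pipeline runs `pulumi up`.

```python
"""Infrastructure defined as code: versioned, reviewed, and applied by the pipeline,
never by clicking through a cloud console."""
import pulumi
import pulumi_aws as aws

# An S3 bucket for build artifacts - a placeholder resource for this sketch.
artifact_bucket = aws.s3.Bucket(
    "build-artifacts",
    versioning=aws.s3.BucketVersioningArgs(enabled=True),
    tags={"managed-by": "pulumi", "team": "example-team"},
)

# Exported values become stack outputs that other stacks or pipeline jobs can consume.
pulumi.export("artifact_bucket_name", artifact_bucket.id)
```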

Pipeline definitions

Your CI/CD pipeline configuration belongs in the same repository as the code it builds and deploys. The pipeline is code, not a configuration applied through a UI.

What this looks like:

  • Pipeline definitions in .github/workflows/, .gitlab-ci.yml, Jenkinsfile, or equivalent
  • Pipeline changes go through the same review process as application code
  • Pipeline behavior is deterministic - the same commit always produces the same pipeline behavior
  • Teams can modify their own pipelines without filing tickets

What this replaces:

  • Pipeline configuration maintained through a Jenkins UI that nobody is allowed to touch
  • A “platform team” that owns all pipeline definitions and queues change requests
  • Pipeline behavior that varies depending on server state or installed plugins

Why it matters for CD: The pipeline is the path to production. If the pipeline itself cannot be changed through a reviewed, automated process, it becomes a bottleneck and a risk. Pipeline changes should flow with the same speed and safety as application changes.

Database schemas and migrations

Database schema changes should be defined as versioned migration scripts, stored in version control, and applied through the pipeline.

What this looks like:

  • Migration scripts in the repository (using tools like Flyway, Liquibase, Alembic, or ActiveRecord migrations)
  • Every schema change is a numbered, ordered migration that can be applied and rolled back
  • Migrations run as part of the deployment pipeline, not as a manual step
  • Schema changes follow the expand-then-contract pattern: add the new column, deploy code that uses it, then remove the old column in a later migration

What this replaces:

  • A DBA manually applying SQL scripts during a maintenance window
  • Schema changes that are “just done in production” and not tracked anywhere
  • Database state that has drifted from what is defined in any migration script

Why it matters for CD: Database changes are one of the most common reasons teams cannot deploy continuously. If schema changes require manual intervention, coordinated downtime, or a separate approval process, they become a bottleneck that forces batching. Treating schemas as code with automated migrations removes this bottleneck.
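For example, the “expand” half of an expand-then-contract change might look like the following Alembic migration (one of the tools named above). The table, column, and revision names are illustrative.

```python
"""Add nullable display_name column to users (expand step).

The existing column stays in place until all code paths read display_name;
a later migration performs the contract step and removes what is no longer used.
"""
from alembic import op
import sqlalchemy as sa

# Revision identifiers used by Alembic (illustrative values).
revision = "20240101_add_display_name"
down_revision = "20231215_previous"
branch_labels = None
depends_on = None


def upgrade():
    # Additive and backward compatible: code deployed today keeps working.
    op.add_column("users", sa.Column("display_name", sa.String(length=255), nullable=True))


def downgrade():
    op.drop_column("users", "display_name")
```

Because the migration is additive and reversible, it can run automatically as part of the deployment without a maintenance window.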

Application configuration

Environment-specific configuration - database connection strings, API endpoints, feature flag states, logging levels - should be defined as code and managed through version control.

What this looks like:

  • Configuration values stored in a config management system (Consul, AWS Parameter Store, environment variable definitions in infrastructure code)
  • Configuration changes are committed, reviewed, and deployed through a pipeline
  • The same application artifact is deployed to every environment; only the configuration differs

What this replaces:

  • Configuration files edited manually on servers
  • Environment variables set by hand and forgotten
  • Configuration that exists only in a deployment runbook

See Application Config for detailed guidance on externalizing configuration.
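A minimal sketch of the application side of this, assuming configuration is injected as environment variables by your infrastructure or platform code: the same artifact reads different values in each environment, and nothing environment-specific is baked into the build. The variable names are illustrative.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    """All environment-specific values in one place; nothing hard-coded per environment."""
    database_url: str
    api_base_url: str
    log_level: str
    new_profile_page: bool  # feature flag state, also injected


def load_config() -> Config:
    # The pipeline or platform sets these from version-controlled definitions
    # (Parameter Store, Consul, environment blocks in infrastructure code, etc.).
    return Config(
        database_url=os.environ["DATABASE_URL"],
        api_base_url=os.environ["API_BASE_URL"],
        log_level=os.environ.get("LOG_LEVEL", "INFO"),
        new_profile_page=os.environ.get("FLAG_NEW_PROFILE_PAGE", "false").lower() == "true",
    )
```

Missing required values fail fast at startup, which surfaces configuration drift immediately instead of at the first unlucky request.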

Monitoring, alerting, and observability

Dashboards, alert rules, SLO definitions, and logging configuration should be defined as code.

What this looks like:

  • Alert rules defined in Terraform, Prometheus rules files, or Datadog monitors-as-code
  • Dashboards defined as JSON or YAML, not built by hand in a UI
  • SLO definitions tracked in version control alongside the services they measure
  • Logging configuration (what to log, where to send it, retention policies) in code

What this replaces:

  • Dashboards built manually in a monitoring UI that nobody knows how to recreate
  • Alert rules that were configured by hand during an incident and never documented
  • Monitoring configuration that exists only on the monitoring server

Why it matters for CD: If you deploy ten times a day, you need to know instantly whether each deployment is healthy. If your monitoring and alerting configuration is manual, it will drift, break, or be incomplete. Monitoring-as-code ensures that every service has consistent, reviewed, reproducible observability.
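A simple hedged illustration: a dashboard described as data in version control and rendered to the JSON your monitoring tool imports, rather than assembled by hand in a UI. The panel structure and queries below are generic placeholders, not any specific vendor’s schema.

```python
import json

# Dashboard described as data; the structure is a generic placeholder,
# not the schema of any particular monitoring product.
DEPLOYMENT_HEALTH_DASHBOARD = {
    "title": "Checkout Service - Deployment Health",
    "panels": [
        {"title": "Error rate", "query": "sum(rate(http_errors_total[5m]))"},
        {"title": "p95 latency", "query": "histogram_quantile(0.95, http_request_seconds_bucket)"},
        {"title": "Deployments", "query": "changes(build_info[1d])"},
    ],
}

if __name__ == "__main__":
    # The pipeline renders this file and pushes it to the monitoring system;
    # recreating the dashboard is a pipeline run, not a manual rebuild.
    with open("deployment-health-dashboard.json", "w") as f:
        json.dump(DEPLOYMENT_HEALTH_DASHBOARD, f, indent=2)
```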

Security policies

Security controls - access policies, network rules, secret rotation schedules, compliance checks - should be defined as code and enforced automatically.

What this looks like:

  • IAM policies and RBAC rules defined in Terraform or policy-as-code tools (OPA, Sentinel)
  • Security scanning integrated into the pipeline (SAST, dependency scanning, container image scanning)
  • Secret rotation automated and defined in code
  • Compliance checks that run on every commit, not once a quarter

What this replaces:

  • Security reviews that happen at the end of the development cycle
  • Access policies configured through UIs and never audited
  • Compliance as a manual checklist performed before each release

Why it matters for CD: Security and compliance requirements are the most common organizational blockers for CD. When security controls are defined as code and enforced by the pipeline, you can prove to auditors that every change passed security checks automatically. This is stronger evidence than a manual review, and it does not slow down delivery.
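As one hedged example of a policy check in the pipeline: a small script that inspects a Terraform plan (exported with `terraform show -json`) and fails the build if any security group rule opens SSH to the world. The JSON field names follow Terraform’s plan output, but treat the sketch as illustrative rather than production-ready; policy-as-code tools like OPA cover the same ground declaratively.

```python
import json
import sys


def open_ssh_violations(plan: dict) -> list[str]:
    """Return addresses of security groups whose planned state opens port 22 to 0.0.0.0/0."""
    violations = []
    for change in plan.get("resource_changes", []):
        if change.get("type") != "aws_security_group":
            continue
        after = (change.get("change") or {}).get("after") or {}
        for rule in after.get("ingress") or []:
            world_open = "0.0.0.0/0" in (rule.get("cidr_blocks") or [])
            covers_ssh = rule.get("from_port", 0) <= 22 <= rule.get("to_port", 0)
            if world_open and covers_ssh:
                violations.append(change.get("address", "<unknown>"))
    return violations


if __name__ == "__main__":
    # Usage: terraform show -json plan.out > plan.json && python check_policy.py plan.json
    with open(sys.argv[1]) as f:
        plan = json.load(f)
    bad = open_ssh_violations(plan)
    if bad:
        print("Policy violation - SSH open to the world in:", ", ".join(bad))
        sys.exit(1)
    print("Security group policy check passed.")
```

Because the check runs on every commit, the evidence that the policy held is the pipeline history itself.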

The “One Change, One Process” Test

For every type of artifact in your system, ask:

If I need to change this, do I commit a code change and let the pipeline deliver it?

If the answer is yes, the artifact is managed as code. If the answer involves SSH, a UI, a ticket to another team, or a manual step, it is not.

| Artifact | Managed as code? | If not, the risk is… |
|---|---|---|
| Application source code | Usually yes | - |
| Infrastructure (servers, networks, cloud resources) | Often no | Snowflake environments, slow provisioning, unreproducible disasters |
| Pipeline definitions | Sometimes | Pipeline changes are slow, unreviewed, and risky |
| Database schemas | Sometimes | Schema changes require manual coordination and downtime |
| Application configuration | Sometimes | Config drift between environments, “works in staging” failures |
| Monitoring and alerting | Rarely | Monitoring gaps, unreproducible dashboards, alert fatigue |
| Security policies | Rarely | Security as a gate instead of a guardrail, audit failures |

The goal is for every row in this table to be “yes.” You will not get there overnight, but every artifact you move from manual to code-managed removes a bottleneck and a risk.

How to Get There

Start with what blocks you most

Do not try to move everything to code at once. Identify the artifact type that causes the most pain or blocks deployments most frequently:

  • If environment provisioning takes days, start with infrastructure as code.
  • If database changes are the reason you cannot deploy more than once a week, start with schema migrations as code.
  • If pipeline changes require tickets to a platform team, start with pipeline as code.
  • If configuration drift causes production incidents, start with configuration as code.

Apply the same practices as application code

Once an artifact is defined as code, treat it with the same rigor as application code:

  • Store it in version control (ideally in the same repository as the application it supports)
  • Review changes before they are applied
  • Test changes automatically (linting, dry-runs, policy checks)
  • Deliver changes through a pipeline
  • Never modify the artifact outside of this process

Eliminate manual pathways

The hardest part is closing the manual back doors. As long as someone can SSH into a server and make a change, or click through a UI to modify infrastructure, the code-defined state will drift from reality.

The principle is the same as Single Path to Production for application code: the pipeline is the only way any change reaches production. This applies to infrastructure, configuration, schemas, monitoring, and policies just as much as it applies to application code.

Measuring Progress

| Metric | What to look for |
|---|---|
| Artifact types managed as code | Track how many of the categories above are fully code-managed. The number should increase over time. |
| Manual changes to production | Count any change made outside of a pipeline (SSH, UI clicks, manual scripts). Target: zero. |
| Environment recreation time | How long does it take to recreate a production-like environment from scratch? Should decrease as more infrastructure moves to code. |
| Mean time to recovery | When infrastructure-as-code is in place, recovery from failures is “re-run the pipeline.” MTTR drops dramatically. |