
Phase 1: Foundations

Establish the essential practices for daily integration, testing, and small work decomposition.

Key question: “Can we integrate safely every day?”

This phase establishes the development practices that make continuous delivery possible. Without these foundations, pipeline automation just speeds up a broken process.

What You’ll Do

  1. Adopt trunk-based development - Integrate to trunk at least daily
  2. Build testing fundamentals - Create a fast, reliable test suite
  3. Automate your build - One command to build, test, and package
  4. Decompose work - Break features into small, deliverable increments
  5. Streamline code review - Fast, effective review that doesn’t block flow
  6. Establish working agreements - Shared definitions of done and ready
  7. Everything as code - Version-control everything that defines your system: infrastructure, pipelines, schemas, monitoring, and security policies
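For item 3 above, "one command to build, test, and package" can be a single entry point in your build tooling. A minimal sketch for a Node project (the script contents and tool choices are illustrative, not prescriptive):

```json
{
  "scripts": {
    "build": "npm run lint && npm test && npm pack",
    "lint": "eslint .",
    "test": "jest"
  }
}
```

The point is that `npm run build` (or `make`, or `./gradlew build`) does everything; no one should need a wiki page of manual steps.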

Why This Phase Matters

Teams that skip these foundations end up automating a broken process. A pipeline that deploys untested code from long-lived branches does not improve delivery. It amplifies risk. These practices ensure that what enters the pipeline is already safe to ship.

When You’re Ready to Move On

You’re ready for Phase 2: Pipeline when:

  • All developers integrate to trunk at least once per day
  • Your test suite catches real defects and runs in under 10 minutes
  • You can build and package your application with a single command
  • Most work items can be completed within 2 days

Next: Phase 2 - Pipeline - build a single automated path from commit to production.


1 - Trunk-Based Development

Integrate all work to the trunk at least once per day to enable continuous integration.

Phase 1 - Foundations

Trunk-based development is the first foundation to establish. Without daily integration to a shared trunk, the rest of the CD migration cannot succeed. This page covers the core practice, two migration paths, and a tactical guide for getting started.

What Is Trunk-Based Development?

Trunk-based development (TBD) is a branching strategy where all developers integrate their work into a single shared branch - the trunk - at least once per day. The trunk is always kept in a releasable state.

This is a non-negotiable prerequisite for continuous delivery. If your team is not integrating to trunk daily, you are not doing CI, and you cannot do CD. There is no workaround.

“If it hurts, do it more often, and bring the pain forward.”

Jez Humble, Continuous Delivery

What TBD Is Not

  • It is not “everyone commits directly to main with no guardrails.” You still test, review, and validate work - you just do it in small increments.
  • It is not incompatible with code review. It requires review to happen quickly.
  • It is not reckless. It is the opposite: small, frequent integrations are far safer than large, infrequent merges.

What Trunk-Based Development Improves

| Problem | How TBD Helps |
| --- | --- |
| Merge conflicts | Small changes integrated frequently rarely conflict |
| Integration risk | Bugs are caught within hours, not weeks |
| Long-lived branches diverge from reality | The trunk always reflects the current state of the codebase |
| “Works on my branch” syndrome | Everyone shares the same integration point |
| Slow feedback | CI runs on every integration, giving immediate signal |
| Large batch deployments | Small changes are individually deployable |
| Fear of deployment | Each change is small enough to reason about |

Two Migration Paths

There are two valid approaches to trunk-based development. Both satisfy the minimum CD requirement of daily integration. Choose the one that fits your team’s current maturity and constraints.

Path 1: Short-Lived Branches

Developers create branches that live for less than 24 hours. Work is done on the branch, reviewed quickly, and merged to trunk within a single day.

How it works:

  1. Pull the latest trunk
  2. Create a short-lived branch
  3. Make small, focused changes
  4. Open a pull request (or use pair programming as the review)
  5. Merge to trunk before end of day
  6. The branch is deleted after merge

Best for teams that:

  • Currently use long-lived feature branches and need a stepping stone
  • Have regulatory requirements for traceable review records
  • Use pull request workflows they want to keep (but make faster)
  • Are new to TBD and want a gradual transition

Key constraint: The branch must merge to trunk within 24 hours. If it does not, you have a long-lived branch and you have lost the benefit of TBD.

Path 2: Direct Trunk Commits

Developers commit directly to trunk. Quality is ensured through pre-commit checks, pair programming, and strong automated testing.

How it works:

  1. Pull the latest trunk
  2. Make a small, tested change locally
  3. Run the local build and test suite
  4. Push directly to trunk
  5. CI validates the commit immediately

Best for teams that:

  • Have strong automated test coverage
  • Practice pair or mob programming (which provides real-time review)
  • Want maximum integration frequency
  • Have high trust and shared code ownership

Key constraint: This requires excellent test coverage and a culture where the team owns quality collectively. Without these, direct trunk commits become reckless.

How to Choose Your Path

Ask these questions:

  1. Do you have automated tests that catch real defects? If no, start with Path 1 and invest in testing fundamentals in parallel.
  2. Does your organization require documented review approvals? If yes, use Path 1 with rapid pull requests.
  3. Does your team practice pair programming? If yes, Path 2 may work immediately - pairing is a continuous review process.
  4. How large is your team? Teams of 2-4 can adopt Path 2 more easily. Larger teams may start with Path 1 and transition later.

Both paths are valid. The important thing is daily integration to trunk. Do not spend weeks debating which path to use. Pick one, start today, and adjust.

Essential Supporting Practices

Trunk-based development does not work in isolation. These practices make daily integration safe:

  • Feature flags: Merge incomplete work without exposing it to users.
  • Branch by abstraction: Replace implementations behind stable interfaces without long-lived branches.
  • Connect last: Build new code paths without wiring them in until they are complete.
  • Small, atomic commits: Each commit is a single logical change that leaves trunk releasable.
  • TDD/ATDD: Tests written before code provide the safety net for frequent integration.
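As an illustration of branch by abstraction (all names here are hypothetical): introduce a stable interface, keep the old implementation behind it, and build the replacement behind the same interface, entirely on trunk.

```javascript
// Branch by abstraction: callers depend on a stable interface,
// so old and new implementations can coexist on trunk.
// All names are illustrative.
function createPriceCalculator(useNewEngine) {
  // Old implementation stays in place until the new one is proven.
  const legacy = {
    total: (items) => items.reduce((sum, i) => sum + i.price, 0),
  };

  // New implementation is built behind the same interface.
  const rewritten = {
    total: (items) =>
      items.reduce((sum, i) => sum + i.price * (1 - (i.discount || 0)), 0),
  };

  return useNewEngine ? rewritten : legacy;
}

module.exports = { createPriceCalculator };
```

Flipping `useNewEngine` (typically via a feature flag) swaps implementations without any branch ever diverging from trunk.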

The TBD Migration Guide covers each practice in detail with code examples.

Getting Started

Start by shortening branch lifetimes, then tighten to daily integration. The TBD Migration Guide walks through each step with team agreements, metrics, and retrospective checkpoints.

Common Pitfalls

Teams migrating to TBD commonly stumble on slow CI builds, incomplete feature flags, and treating branch renaming as real integration. See Common Pitfalls to Avoid for detailed guidance and fixes.

Measuring Success

Track these metrics to verify your TBD adoption:

| Metric | Target | Why It Matters |
| --- | --- | --- |
| Integration frequency | At least 1 per developer per day | Confirms daily integration is happening |
| Branch age | < 24 hours | Catches long-lived branches |
| Build duration | < 10 minutes | Enables frequent integration without frustration |
| Merge conflict frequency | Decreasing over time | Confirms small changes reduce conflicts |
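Branch age can be measured straight from git, for example by parsing the output of `git for-each-ref --format='%(refname:short) %(committerdate:unix)' refs/heads`. A sketch of the parsing side (the function and threshold are illustrative):

```javascript
// Flag branches older than the agreed limit. Takes the raw
// `git for-each-ref` output as a string so the logic is testable
// without a repository.
function staleBranches(forEachRefOutput, nowUnix, maxAgeHours = 24) {
  return forEachRefOutput
    .trim()
    .split('\n')
    .filter(Boolean)
    .map((line) => {
      const [name, date] = line.split(' ');
      return { name, ageHours: (nowUnix - Number(date)) / 3600 };
    })
    .filter((b) => b.name !== 'main' && b.ageHours > maxAgeHours);
}

module.exports = { staleBranches };
```

Run something like this in a daily CI job and post the offenders to the team channel; visibility alone shortens branch lifetimes.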

Next Step

Once your team is integrating to trunk daily, build the test suite that makes that integration trustworthy. Continue to Testing Fundamentals.

1.1 - TBD Migration Guide

A tactical guide for migrating from GitFlow or long-lived branches to trunk-based development, covering regulated environments, multi-team coordination, and common pitfalls.

Phase 1 - Foundations

This guide is a detailed companion to the Trunk-Based Development overview. It walks you through migrating from GitFlow or long-lived branches to trunk-based development, covering both migration paths (short-lived branches and direct trunk commits), essential practices, regulated-environment compliance, multi-team strategies, and common pitfalls.


Why Move to Trunk-Based Development?

Long-lived branches hide problems. TBD exposes them early, when they are cheap to fix.

Think of long-lived branches like storing food in a bunker: it feels safe until you open the door and discover half of it rotting. With TBD, teams check freshness every day.

To do CI, teams need:

  • Small changes integrated at least daily
  • Automated tests giving fast, deterministic feedback
  • A single source of truth: the trunk

If your branches live for more than a day or two, you aren’t doing continuous integration. You’re doing periodic integration at best. True CI requires at least daily integration to the trunk.


The First Step: Stop Letting Work Age

The biggest barrier isn’t tooling. It’s habits.

The first meaningful change is simple:

Stop letting branches live long enough to become problems.

Your first goal isn’t true TBD. It’s shorter-lived branches: changes that live for hours or a couple of days, not weeks.

That alone exposes dependency issues, unclear requirements, and missing tests, which is exactly the point. The pain tells you where improvement is needed.


Before You Start: What to Measure

You cannot improve what you don’t measure. Before changing anything, establish baseline metrics so you can track actual progress.

Essential Metrics to Track Weekly

| Metric | What to Track | Target |
| --- | --- | --- |
| Branch Lifetime | Average time from branch creation to merge | Reduce from weeks to days, then hours |
| Integration Health | Merge conflicts per week and time resolving them | Conflicts decrease as integration frequency increases |
| Delivery Speed | Time from commit to production deployment | Decrease time to production, increase deployment frequency |
| Quality Indicators | Build/test execution time, test failure rate, incidents per deployment | Fast, reliable tests and stable deployments |
| Work Decomposition | Average pull request size (lines changed) | Smaller, more focused changes |

Start with just two or three of these. Don’t let measurement become its own project.


Path 1: Moving from Long-Lived Branches to Short-Lived Branches

When GitFlow habits are deeply ingrained, this is usually the least-threatening first step.

1. Collapse the Branching Model

Stop using:

  • develop
  • release branches that sit around for weeks
  • feature branches lasting a sprint or more

Move toward:

  • A single main (or trunk)
  • Temporary branches measured in hours or days

2. Integrate Every Few Days, Then Every Day

Set an explicit working agreement:

“Nothing lives longer than 48 hours.”

Once this feels normal, shorten it:

“Integrate at least once per day.”

If a change is too large to merge within a day or two, the problem isn’t the branching model. The problem is the decomposition of work.

3. Test Before You Code

Branch lifetime shortens when you stop guessing about expected behavior. Bring product, QA, and developers together before coding:

  • Write acceptance criteria collaboratively
  • Turn them into executable tests
  • Then write code to make those tests pass

You’ll discover misunderstandings upfront instead of after a week of coding.

This approach is called Behavior-Driven Development (BDD), a collaborative practice where teams define expected behavior in plain language before writing code. BDD bridges the gap between business requirements and technical implementation by using concrete examples that become executable tests.


How to Run a Three Amigos Session

Participants: Product Owner, Developer, Tester (15-30 minutes per story)

Process:

  1. Product describes the user need and expected outcome
  2. Developer asks questions about edge cases and dependencies
  3. Tester identifies scenarios that could fail
  4. Together, write acceptance criteria as examples

Example:

BDD scenarios for password reset:

```gherkin
Feature: User password reset

Scenario: Valid reset request
  Given a user with email "user@example.com" exists
  When they request a password reset
  Then they receive an email with a reset link
  And the link expires after 1 hour

Scenario: Invalid email
  Given no user with email "nobody@example.com" exists
  When they request a password reset
  Then they see "If the email exists, a reset link was sent"
  And no email is sent

Scenario: Expired link
  Given a user has a reset link older than 1 hour
  When they click the link
  Then they see "This reset link has expired"
  And they are prompted to request a new one
```

These scenarios become your automated acceptance tests before you write any implementation code.

From Acceptance Criteria to Tests

Turn those scenarios into executable tests in your framework of choice:

Acceptance tests for password reset scenarios:

```javascript
// Example using Jest and Supertest
const request = require('supertest');
const app = require('../app');                     // your application (illustrative path)
const { createUser } = require('./helpers/users'); // test helper (illustrative)
const emailService = require('../services/email'); // stubbed in tests (illustrative)

describe('Password Reset', () => {
  it('sends reset email for valid user', async () => {
    await createUser({ email: 'user@example.com' });

    const response = await request(app)
      .post('/password-reset')
      .send({ email: 'user@example.com' });

    expect(response.status).toBe(200);
    expect(emailService.sentEmails).toHaveLength(1);
    expect(emailService.sentEmails[0].to).toBe('user@example.com');
  });

  it('does not reveal whether email exists', async () => {
    const response = await request(app)
      .post('/password-reset')
      .send({ email: 'nobody@example.com' });

    expect(response.status).toBe(200);
    expect(response.body.message).toBe('If the email exists, a reset link was sent');
    expect(emailService.sentEmails).toHaveLength(0);
  });
});
```

Now you can write the minimum code to make these tests pass. This drives smaller, more focused changes.

4. Invest in Contract Tests

Most merge pain isn’t from your code. It’s from the interfaces between services. Define interface changes early and codify them with provider/consumer contract tests.

This lets teams integrate frequently without surprises.


Path 2: Committing Directly to the Trunk

This is the cleanest and most powerful version of TBD. It requires discipline, but it produces the most stable delivery pipeline and the least drama.

If the idea of committing straight to main makes people panic, that’s a signal about your current testing process, not a problem with TBD.


How to Choose Your Path

Use this rule of thumb:

  • If your team fears “breaking everything,” start with short-lived branches.
  • If your team collaborates well and writes tests first, go straight to trunk commits.

Both paths require the same skills:

  • Smaller work
  • Better requirements
  • Shared understanding
  • Automated tests
  • A reliable pipeline

The difference is pace.


Essential TBD Practices

These practices apply to both paths, whether you’re using short-lived branches or committing directly to trunk.

Use Feature Flags the Right Way

Feature flags are one of several evolutionary coding practices that allow you to integrate incomplete work safely. Other methods include branch by abstraction and connect-last patterns.

Feature flags are not a testing strategy. They are a release strategy.

Every commit to trunk must:

  • Build
  • Test
  • Deploy safely

Flags let you deploy incomplete work without exposing it prematurely. They don’t excuse poor test discipline.

Start Simple: Boolean Flags

You don’t need a sophisticated feature flag system to start. Begin with environment variables or simple config files.

Simple boolean feature flags via environment variables:

```javascript
// config/features.js
module.exports = {
  newCheckoutFlow: process.env.FEATURE_NEW_CHECKOUT === 'true',
  enhancedSearch: process.env.FEATURE_ENHANCED_SEARCH === 'true',
};

// In your code
const features = require('./config/features');

app.get('/checkout', (req, res) => {
  if (features.newCheckoutFlow) {
    return renderNewCheckout(req, res);
  }
  return renderOldCheckout(req, res);
});
```

This is enough for most TBD use cases.

Testing Code Behind Flags

Critical: You must test both code paths, flag on and flag off.

Testing both flag states, enabled and disabled:

```javascript
const features = require('./config/features');

describe('Checkout flow', () => {
  describe('with new checkout flow enabled', () => {
    beforeEach(() => {
      features.newCheckoutFlow = true;
    });

    it('shows new checkout UI', () => {
      // Test new flow
    });
  });

  describe('with new checkout flow disabled', () => {
    beforeEach(() => {
      features.newCheckoutFlow = false;
    });

    it('shows legacy checkout UI', () => {
      // Test old flow
    });
  });
});
```

If you only test with the flag on, you’ll break production when the flag is off.

Keep Flags Short-Lived

For TBD, most flags are temporary release flags: they hide incomplete work during integration and get removed once the feature is stable (typically 1-4 weeks). Set a removal date when you create each flag, assign an owner, and treat unremoved flags as technical debt.

For a deeper taxonomy of flag types (release flags vs. permanent configuration flags) and lifecycle management practices, see the feature flag glossary entry.
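One way to make "unremoved flags are technical debt" enforceable is a flag registry with owners and removal dates that CI checks. A minimal sketch (the registry shape and names are assumptions, not a standard format):

```javascript
// Flag registry: every release flag records an owner and a removal date.
// Names and dates are illustrative.
const flagRegistry = [
  { name: 'newCheckoutFlow', owner: 'team-payments', removeBy: '2025-03-01' },
  { name: 'enhancedSearch', owner: 'team-search', removeBy: '2025-06-01' },
];

// Returns flags past their removal date. Wire this into CI so the
// build warns (or fails) when a release flag overstays its welcome.
function expiredFlags(registry, today) {
  return registry.filter((f) => new Date(f.removeBy) <= new Date(today));
}

module.exports = { flagRegistry, expiredFlags };
```

Even a warning in the build log keeps flag cleanup visible instead of forgotten.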

Commit Small and Commit Often

If a change is too large to commit today, split it.

Large commits are failed design upstream, not failed integration downstream.

Use TDD and ATDD to Keep Refactors Safe

Refactoring must not break tests. If it does, you’re testing implementation, not behavior. Behavioral tests are what keep trunk commits safe.
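The distinction shows up in a small sketch (all names are made up): a behavioral test asserts only on inputs and outputs, so it passes for both implementations below, while a test that inspected the cache internals would break the moment you refactored.

```javascript
// Two implementations of the same behavior: a naive version and a
// refactored, memoized version.
function slugifyNaive(title) {
  return title.toLowerCase().trim().replace(/\s+/g, '-');
}

const cache = new Map();
function slugifyMemoized(title) {
  if (!cache.has(title)) cache.set(title, slugifyNaive(title));
  return cache.get(title);
}

// Behavioral check: the same assertion holds for either implementation,
// so the refactor from naive to memoized cannot break it.
function behavesCorrectly(slugify) {
  return slugify('  Hello  World ') === 'hello-world';
}
```

If your tests keep breaking during refactors, they are coupled to implementation details; fix the tests before attempting trunk commits.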

Prioritize Interfaces First

Always start by defining and codifying the contract:

  • What is the shape of the request?
  • What is the response?
  • What error states must be handled?

Interfaces are the highest-risk area. Drive them with tests first. Then work inward.
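Driving the interface with a test first might look like this sketch: the response shape is pinned down before any endpoint exists (the endpoint and field names are hypothetical).

```javascript
// Codify the contract first: what fields the response must have and
// what counts as valid. This validator is written before the endpoint
// it describes, and the implementation is then driven to satisfy it.
function validateUserResponse(body) {
  const errors = [];
  if (typeof body.id !== 'number') errors.push('id must be a number');
  if (typeof body.name !== 'string') errors.push('name must be a string');
  if (!/^[^@\s]+@[^@\s]+$/.test(body.email || '')) errors.push('email must be valid');
  return errors;
}

module.exports = { validateUserResponse };
```

The same checks can later back a consumer-driven contract test, so interface drift is caught in the pipeline rather than in integration.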


Getting Started: A Tactical Guide

The initial phase sets the tone. Focus on establishing new habits, not perfection.

Step 1: Team Agreement and Baseline

  • Hold a team meeting to discuss the migration
  • Agree on initial branch lifetime limit (start with 48 hours if unsure)
  • Document current baseline metrics (branch age, merge frequency, build time)
  • Identify your slowest-running tests
  • Create a list of known integration pain points
  • Set up a visible tracker (physical board or digital dashboard) for metrics

Step 2: Test Infrastructure Audit

Focus: Find and fix what will slow you down.

  • Run your test suite and time each major section
  • Identify slow tests
  • Look for:
    • Tests with sleeps or arbitrary waits
    • Tests hitting external services unnecessarily
    • Integration tests that could be contract tests
    • Flaky tests masking real issues

Fix or isolate the worst offenders. You don’t need a perfect test suite to start, just one fast enough to not punish frequent integration.
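Fixed sleeps are the most common offender: `await sleep(5000)` makes every run pay the worst case. A bounded poll returns as soon as the condition holds. A sketch (the helper name is made up):

```javascript
// Replace fixed sleeps with a bounded poll: resolves as soon as the
// condition holds, fails fast once the timeout is exceeded.
async function waitFor(condition, { timeoutMs = 2000, intervalMs = 20 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (await condition()) return;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error('waitFor: condition not met within timeout');
}

module.exports = { waitFor };
```

A test that previously slept five seconds "to be safe" now finishes in tens of milliseconds when the system is ready, and fails with a clear error when it is not.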

Step 3: First Integrated Change

Pick the smallest possible change:

  • A bug fix
  • A refactoring with existing test coverage
  • A configuration update
  • Documentation improvement

The goal is to validate your process, not to deliver a feature.

Execute:

  1. Create a branch (if using Path 1) or commit directly (if using Path 2)
  2. Make the change
  3. Run tests locally
  4. Integrate to trunk
  5. Deploy through your pipeline
  6. Observe what breaks or slows you down

Step 4: Retrospective

Gather the team:

What went well:

  • Did anyone integrate faster than before?
  • Did you discover useful information about your tests or pipeline?

What hurt:

  • What took longer than expected?
  • What manual steps could be automated?
  • What dependencies blocked integration?

Ongoing commitment:

  • Adjust branch lifetime limit if needed
  • Assign owners to top 3 blockers
  • Commit to integrating at least one change per person

The initial phase won’t feel smooth. That’s expected. You’re learning what needs fixing.


Getting Your Team On Board

Technical changes are easy compared to changing habits and mindsets. Here’s how to build buy-in.

Acknowledge the Fear

When you propose TBD, you’ll hear:

  • “We’ll break production constantly”
  • “Our code isn’t good enough for that”
  • “We need code review on branches”
  • “This won’t work with our compliance requirements”

These concerns are valid signals about your current system. Don’t dismiss them.

Instead: “You’re right that committing directly to trunk with our current test coverage would be risky. That’s why we need to improve our tests first.”

Start with an Experiment

Don’t mandate TBD for the whole team immediately. Propose a time-boxed experiment:

The Proposal:

“Let’s try this for two weeks with a single small feature. We’ll track what goes well and what hurts. After two weeks, we’ll decide whether to continue, adjust, or stop.”

What to measure during the experiment:

  • How many times did we integrate?
  • How long did merges take?
  • Did we catch issues earlier or later than usual?
  • How did it feel compared to our normal process?

After two weeks: Hold a retrospective. Let the data and experience guide the decision.

Pair on the First Changes

Don’t expect everyone to adopt TBD simultaneously. Instead:

  1. Identify one advocate who wants to try it
  2. Pair with them on the first trunk-based changes
  3. Let them experience the process firsthand
  4. Have them pair with the next person

Knowledge transfer through pairing works better than documentation.

Address Code Review Concerns

“But we need code review!” Yes. TBD doesn’t eliminate code review.

Options that work:

  • Pair or mob programming (review happens in real-time)
  • Commit to trunk, review immediately after, fix forward if issues found
  • Very short-lived branches (hours, not days) with rapid review SLA
  • Pairing on the review itself: author and reviewer walk through the change together

The goal is fast feedback, not zero review.

Handle Skeptics and Blockers

You’ll encounter people who don’t want to change. Don’t force it.

Instead:

  • Let them observe the experiment from the outside
  • Share metrics and outcomes transparently
  • Invite them to pair for one change
  • Let success speak louder than arguments

Some people need to see it working before they believe it.

Get Management Support

Managers often worry about:

  • Reduced control
  • Quality risks
  • Slower delivery (ironically)

Address these with data:

  • Show branch age metrics before/after
  • Track cycle time improvements
  • Demonstrate faster feedback on defects
  • Highlight reduced merge conflicts

Frame TBD as a risk reduction strategy, not a risky experiment.


Working in a Multi-Team Environment

Migrating to TBD gets complicated when you depend on teams still using long-lived branches. Here’s how to handle it.

The Core Problem

You want to integrate daily. Your dependency team integrates weekly or monthly. Their API changes surprise you during their big-bang merge.

You can’t force other teams to change. But you can protect yourself.

Strategy 1: Consumer-Driven Contract Tests

Define the contract you need from the upstream service and codify it in tests that run in your pipeline.

Example using Pact:

Consumer-driven contract test using Pact:

```javascript
// Your consumer test (pact-js; provider setup abbreviated,
// consumer/provider names are illustrative)
const { Pact } = require('@pact-foundation/pact');

const provider = new Pact({
  consumer: 'CheckoutService',
  provider: 'UserService',
});

describe('User Service Contract', () => {
  it('returns user profile by ID', async () => {
    await provider.addInteraction({
      state: 'user 123 exists',
      uponReceiving: 'a request for user 123',
      withRequest: {
        method: 'GET',
        path: '/users/123',
      },
      willRespondWith: {
        status: 200,
        body: {
          id: 123,
          name: 'Jane Doe',
          email: 'jane@example.com',
        },
      },
    });

    // userService is your client, pointed at the Pact mock server
    const user = await userService.getUser(123);
    expect(user.name).toBe('Jane Doe');
  });
});
```

This test runs against your expectations of the API, not the actual service. When the upstream team changes their API, your contract test fails before you integrate their changes.

Share the contract:

  • Publish your contract to a shared repository
  • Upstream team runs provider verification against your contract
  • If they break your contract, they know before merging

Strategy 2: API Versioning with Backwards Compatibility

If you control the shared service:

API versioning for backwards-compatible multi-team integration:

```javascript
// Support both old and new API versions
app.get('/api/v1/users/:id', handleV1Users);
app.get('/api/v2/users/:id', handleV2Users);

// Or use content negotiation
app.get('/api/users/:id', (req, res) => {
  const version = req.headers['api-version'] || 'v1';
  if (version === 'v2') {
    return handleV2Users(req, res);
  }
  return handleV1Users(req, res);
});
```

Migration path:

  1. Deploy new version alongside old version
  2. Update consumers one by one
  3. After all consumers have migrated, deprecate the old version
  4. Remove the old version after the deprecation period

Strategy 3: Strangler Fig Pattern

When you depend on a team that won’t change:

  1. Create an anti-corruption layer between your code and theirs
  2. Define your ideal interface in the adapter
  3. Let the adapter handle their messy API
Strangler fig adapter to isolate a legacy dependency:

```javascript
// Your ideal interface
class UserRepository {
  async getUser(id) {
    // Your clean interface
  }
}

// Adapter that deals with their mess
class LegacyUserServiceAdapter extends UserRepository {
  async getUser(id) {
    const response = await fetch(`https://legacy-service/users/${id}`);
    const messyData = await response.json();

    // Transform their format to yours
    return {
      id: messyData.user_id,
      name: `${messyData.first_name} ${messyData.last_name}`,
      email: messyData.email_address,
    };
  }
}
```

Now your code depends on your interface, not theirs. When they change, you only update the adapter.

Strategy 4: Feature Toggles for Cross-Team Coordination

When multiple teams need to coordinate a release:

  1. Each team develops behind feature flags
  2. Each team integrates to trunk continuously
  3. Features remain disabled until coordination point
  4. Enable flags in coordinated sequence

This decouples development velocity from release coordination.

When You Can’t Integrate with Dependencies

If upstream dependencies block you from integrating daily:

Short term:

  • Use contract tests to detect breaking changes early
  • Create adapters to isolate their changes
  • Document the integration pain as a business cost

Long term:

  • Advocate for those teams to adopt TBD
  • Share your success metrics
  • Offer to help them migrate

You can’t force other teams to change. But you can demonstrate a better way and make it easier for them to follow.


TBD in Regulated Environments

Regulated industries face legitimate compliance requirements: audit trails, change traceability, separation of duties, and documented approval processes. These requirements often lead teams to believe trunk-based development is incompatible with compliance. This is a misconception.

TBD is about integration frequency, not about eliminating controls. You can meet compliance requirements while still integrating at least daily.

The Compliance Concerns

Common regulatory requirements that seem to conflict with TBD:

Audit Trail and Traceability

  • Every change must be traceable to a requirement, ticket, or change request
  • Changes must be attributable to specific individuals
  • History of what changed, when, and why must be preserved

Separation of Duties

  • The person who writes code shouldn’t be the person who approves it
  • Changes must be reviewed before reaching production
  • No single person should have unchecked commit access

Change Control Process

  • Changes must follow a documented approval workflow
  • Risk assessment before deployment
  • Rollback capability for failed changes

Documentation Requirements

  • Changes must be documented before implementation
  • Testing evidence must be retained
  • Deployment procedures must be repeatable and auditable

Short-Lived Branches: The Compliant Path to TBD

Path 1 from this guide (short-lived branches) directly addresses compliance concerns while maintaining the benefits of TBD.

Short-lived branches mean:

  • Branches live for hours to 2 days maximum, not weeks or months
  • Integration happens at least daily
  • Pull requests are small, focused, and fast to review
  • Review and approval happen within the branch lifetime

This approach satisfies both regulatory requirements and continuous integration principles.

How Short-Lived Branches Meet Compliance Requirements

Audit Trail:

Every commit references the change ticket:

Commit message referencing compliance ticket:

```shell
git commit -m "JIRA-1234: Add validation for SSN input

Implements requirement REQ-445 from Q4 compliance review.
Changes limited to user input validation layer."
```

Modern Git hosting platforms (GitHub, GitLab, Bitbucket) automatically track:

  • Who created the branch
  • Who committed each change
  • Who reviewed and approved
  • When it merged
  • Complete diff history

Separation of Duties:

Use pull request workflows:

  1. Developer creates branch from trunk
  2. Developer commits changes (same day)
  3. Second person reviews and approves (within 24 hours)
  4. Automated checks validate (tests, security scans, compliance checks)
  5. Merge to trunk after approval
  6. Automated deployment with gates

This provides stronger separation of duties than long-lived branches because:

  • Reviews happen while context is fresh
  • Reviewers can actually understand the small changeset
  • Automated checks enforce policies consistently

Change Control Process:

Branch protection rules enforce your process:

Example GitHub branch protection rules for trunk:

```yaml
# Example GitHub branch protection for trunk
required_reviews: 1
required_checks:
  - unit-tests
  - security-scan
  - compliance-validation
dismiss_stale_reviews: true
require_code_owner_review: true
```

This ensures:

  • No direct commits to trunk (except in documented break-glass scenarios)
  • Required approvals before merge
  • Automated validation gates
  • Audit log of every merge decision

Documentation Requirements:

Pull request templates enforce documentation:

Pull request template for compliance documentation:

```markdown
## Change Description
[Link to Jira ticket]

## Risk Assessment
- [ ] Low risk: Configuration only
- [ ] Medium risk: New functionality, backward compatible
- [ ] High risk: Database migration, breaking change

## Testing Evidence
- [ ] Unit tests added/updated
- [ ] Integration tests pass
- [ ] Manual testing completed (attach screenshots if UI change)
- [ ] Security scan passed

## Rollback Plan
[How to rollback if this causes issues in production]
```

What “Short-Lived” Means in Practice

Hours, not days:

  • Simple bug fixes: 2-4 hours
  • Small feature additions: 4-8 hours
  • Refactoring: 1-2 days

Maximum 2 days: If a branch can’t merge within 2 days, the work is too large. Decompose it further or use feature flags to integrate incomplete work safely.

Daily integration requirement: Even if the feature isn’t complete, integrate what you have:

  • Behind a feature flag if needed
  • As internal APIs not yet exposed
  • As tests and interfaces before implementation
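Integrating "tests and interfaces before implementation" can be as simple as committing a stub that fixes the contract while staying unwired; nothing routes to it yet, so it is safe on trunk. A sketch (the class and event shape are illustrative):

```javascript
// Connect last: this module lives on trunk and is under test, but no
// route or caller is wired to it yet, so integrating it daily is safe
// even though the feature is incomplete.
class AuditLogWriter {
  write(event) {
    // The contract is fixed now: events must carry a type and a userId.
    if (!event || !event.type || !event.userId) {
      throw new Error('audit events require type and userId');
    }
    // Real persistence lands in later commits behind this same contract.
    return { accepted: true, type: event.type };
  }
}

module.exports = { AuditLogWriter };
```

Later commits fill in the behavior; the final commit wires the module into the application.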

Compliance-Friendly Tooling

Modern platforms provide compliance features built-in:

Git Hosting (GitHub, GitLab, Bitbucket):

  • Immutable audit logs
  • Branch protection rules
  • Required approvals
  • Status check enforcement
  • Signed commits for authenticity

Pipeline Platforms:

  • Deployment approval gates
  • Audit trails of every deployment
  • Environment-specific controls
  • Automated compliance checks

Feature Flag Systems:

  • Change deployment without code deployment
  • Gradual rollout controls
  • Instant rollback capability
  • Audit log of flag changes

Secrets Management:

  • Vault, AWS Secrets Manager, Azure Key Vault
  • Audit log of secret access
  • Rotation policies
  • Environment isolation

Example: Compliant Short-Lived Branch Workflow

Monday 9 AM: Developer creates branch feature/JIRA-1234-add-audit-logging from trunk.

Monday 9 AM to 2 PM: Developer implements audit logging for user authentication events. Commits reference JIRA-1234. Automated tests run on each commit.

Monday 2 PM: Developer opens pull request:

  • Title: “JIRA-1234: Add audit logging for authentication events”
  • Description includes risk assessment, testing evidence, rollback plan
  • Automated checks run: tests, security scan, compliance validation
  • Code owner automatically assigned for review

Monday 3 PM: Code owner reviews (5-10 minutes; change is small and focused). Suggests minor improvement.

Monday 3:30 PM: Developer addresses feedback, pushes update.

Monday 4 PM: Code owner approves. All automated checks pass. Developer merges to trunk.

Monday 4:05 PM: Pipeline deploys to staging automatically. Automated smoke tests pass.

Monday 4:30 PM: Deployment gate requires manual approval for production. Tech lead approves based on risk assessment.

Monday 4:35 PM: Automated deployment to production. Audit log captures: what deployed, who approved, when, what checks passed.

Total time: 7.5 hours from branch creation to production.

Full compliance maintained. Full audit trail captured. Daily integration achieved.

When Long-Lived Branches Hide Compliance Problems

Ironically, long-lived branches often create compliance risks:

Stale Reviews: Reviewing a 3-week-old, 2000-line pull request is performative, not effective. Reviewers rubber-stamp because they can’t actually understand the changes.

Integration Risk: Big-bang merges after weeks introduce unexpected behavior. The change that was reviewed isn’t the change that actually deployed (due to merge conflicts and integration issues).

Delayed Feedback: Problems discovered weeks after code was written are expensive to fix and hard to trace to requirements.

Audit Trail Gaps: Long-lived branches often have messy commit history, force pushes, and unclear attribution. The audit trail is polluted.

Regulatory Examples Where Short-Lived Branches Work

Financial Services (SOX, PCI-DSS):

  • Short-lived branches with required approvals
  • Automated security scanning on every PR
  • Separation of duties via required reviewers
  • Immutable audit logs in Git hosting platform
  • Feature flags for gradual rollout and instant rollback

Healthcare (HIPAA):

  • Pull request templates documenting PHI handling
  • Automated compliance checks for data access patterns
  • Required security review for any PHI-touching code
  • Audit logs of deployments
  • Environment isolation enforced by the pipeline

Government (FedRAMP, FISMA):

  • Branch protection requiring government code owner approval
  • Automated STIG compliance validation
  • Signed commits for authenticity
  • Deployment gates requiring authority to operate
  • Complete audit trail from commit to production

What Will Hurt (At First)

When you migrate to TBD, you’ll expose every weakness you’ve been avoiding:

  • Slow tests
  • Unclear requirements
  • Fragile integration points
  • Architecture that resists small changes
  • Gaps in automated validation
  • Long manual processes in the value stream

This is not a regression. This is the point.

Problems you discover early are problems you can fix cheaply.


Common Pitfalls to Avoid

Teams migrating to TBD often make predictable mistakes. The table below summarizes all ten; the three most critical are expanded afterward.

| Pitfall | Category | What to Do Instead |
|---|---|---|
| Renaming branches without changing habits | Process | Focus on integration frequency, not branch names |
| Merging daily without testing integration points | Testing | Use contract tests; integrate at the interface level, not just source control |
| Skipping test investment | Testing | Invest in test infrastructure before increasing integration frequency |
| Using flags as a testing escape hatch | Feature Flags | Test both flag states; flags hide features from users, not from your test suite |
| Keeping flags forever | Feature Flags | Set a removal date at creation; track flags like technical debt |
| Forcing TBD on an unprepared team | Change Management | Start with volunteers, run experiments, let success create pull |
| Ignoring work decomposition | Process | Decompose work into smaller, independently valuable increments |
| No clear definition of “done” | Process | Define “integrated” as deployed to a production-like environment and validated |
| Treating trunk as unstable | Process | Trunk must always be production-ready; fix broken builds immediately |
| Forgetting TBD is a means, not an end | Outcomes | Measure cycle time, defect rates, and deployment frequency, not just commit counts |

Pitfall 1: Treating TBD as Just a Branch Renaming Exercise

The mistake: Renaming develop to main and calling it TBD.

Why it fails: You’re still doing long-lived feature branches, just with different names. The fundamental integration problems remain.

What to do instead: Focus on integration frequency, not branch names. Measure time-to-merge, not what you call your branches.

Pitfall 2: Merging Daily Without Actually Integrating

The mistake: Committing to trunk every day, but your code doesn’t interact with anyone else’s work. Your tests don’t cover integration points.

Why it fails: You’re batching integration for later. When you finally connect your component to the rest of the system, you discover incompatibilities.

What to do instead: Ensure your tests exercise the boundaries between components. Use contract tests for service interfaces. Integrate at the interface level, not just at the source control level.
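One lightweight form of interface-level integration is a shared shape that both sides assert against from their own fast tests. A sketch with a hypothetical order boundary - the field list is invented for illustration:

```javascript
// Hypothetical contract for the boundary between an order provider and consumer.
const orderContract = {
  requiredFields: ['id', 'status', 'total'],
};

// Consumer side: can we act on any response that honors the contract?
function satisfiesContract(response) {
  return orderContract.requiredFields.every((field) => field in response);
}

// Provider side: does our handler still produce the agreed shape?
function buildOrderResponse(order) {
  return { id: order.id, status: 'pending', total: order.total };
}

const sample = buildOrderResponse({ id: 42, total: 9.99 });
console.log(satisfiesContract(sample)); // true
```

Because both components test against the same contract, drift at the boundary fails a fast unit test instead of surfacing weeks later at merge time. Tools like Pact formalize this pattern.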

Pitfall 5: Keeping Flags Forever

The mistake: Creating feature flags and never removing them. Your codebase becomes a maze of conditionals.

Why it fails: Every permanent flag doubles your testing surface area and increases complexity. Eventually, no one knows which flags do what.

What to do instead: Set a removal date when creating each flag. Track flags like technical debt. Remove them aggressively once features are stable.
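Tracking flags like debt is easier with an explicit registry and a check that can run in CI. A sketch - the registry format and dates below are invented:

```javascript
// Illustrative flag registry: every flag records its agreed removal date.
const flagRegistry = [
  { name: 'modernAuth', created: '2024-01-10', removeBy: '2024-03-01' },
  { name: 'csvExport', created: '2024-02-02', removeBy: '2024-04-15' },
];

// List flags past their removal date, e.g. to print a CI warning.
function overdueFlags(registry, today) {
  return registry
    .filter((flag) => new Date(flag.removeBy) < new Date(today))
    .map((flag) => flag.name);
}

console.log(overdueFlags(flagRegistry, '2024-03-10')); // ['modernAuth']
```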


When to Pause or Pivot

Sometimes TBD migration stalls or causes more problems than it solves. Here’s how to tell if you need to pause and what to do about it.

Signs You’re Not Ready Yet

Red flag 1: Your test suite takes hours to run

If developers can’t get feedback in minutes, they can’t integrate frequently. Forcing TBD now will just slow everyone down.

What to do: Pause the TBD migration. Invest 2-4 weeks in making tests faster. Parallelize test execution. Remove or optimize the slowest tests. Resume TBD when feedback takes less than 10 minutes.

Red flag 2: More than half your tests are flaky

If tests fail randomly, developers will ignore failures. You’ll integrate broken code without realizing it.

What to do: Stop adding new features. Spend one sprint fixing or deleting flaky tests. Track flakiness metrics. Only resume TBD when you trust your test results.

Red flag 3: Production incidents increased significantly

If TBD caused a spike in production issues, something is wrong with your safety net.

What to do: Revert to short-lived branches (48-72 hours) temporarily. Analyze what’s escaping to production. Add tests or checks to catch those issues. Resume direct-to-trunk when the safety net is stronger.

Red flag 4: The team is in constant conflict

If people are fighting about the process, frustrated daily, or actively working around it, you’ve lost the team.

What to do: Hold a retrospective. Listen to concerns without defending TBD. Identify the top 3 pain points. Address those first. Resume TBD migration when the team agrees to try again.

Signs You’re Doing It Wrong (But Can Fix It)

Yellow flag 1: Daily commits, but monthly integration

You’re committing to trunk, but your code doesn’t connect to the rest of the system until the end.

What to fix: Focus on interface-level integration. Ensure your tests exercise boundaries between components. Use contract tests.

Yellow flag 2: Trunk is broken often

If trunk is red more than 5% of the time, something’s wrong with your testing or commit discipline.

What to fix: Make “fix trunk immediately” the top priority. Consider requiring local tests to pass before pushing. Add pre-commit hooks if needed.

Yellow flag 3: Feature flags piling up

If you have more than 5 active flags, you’re not cleaning up after yourself.

What to fix: Set a team rule: “For every new flag created, remove an old one.” Dedicate time each sprint to flag cleanup.

How to Pause Gracefully

If you need to pause:

  1. Communicate clearly: “We’re pausing TBD migration for two weeks to fix our test infrastructure. This isn’t abandoning the goal.”

  2. Set a specific resumption date: Don’t let “pause” become “quit.” Schedule a date to revisit.

  3. Fix the blockers: Use the pause to address the specific problems preventing success.

  4. Retrospect and adjust: When you resume, what will you do differently?

Pausing isn’t failure. Pausing to fix the foundation is smart.


What “Good” Looks Like

You know TBD is working when:

  • Branches live for hours, not days
  • Developers collaborate early instead of merging late
  • Product participates in defining behaviors, not just writing stories
  • Tests run fast enough to integrate frequently
  • Deployments are boring
  • You can fix production issues with the same process you use for normal work

When your deployment process enables emergency fixes without special exceptions, you’ve reached the real payoff: lower cost of change, which makes everything else faster, safer, and more sustainable.


Concrete Examples and Scenarios

Theory is useful. Examples make it real. Here are practical scenarios showing how to apply TBD principles.

Scenario 1: Breaking Down a Large Feature

Problem: You need to build a user notification system with email, SMS, and in-app notifications. Estimated: 3 weeks of work.

Old approach (GitFlow): Create a feature/notifications branch. Work for three weeks. Submit a massive pull request. Spend days in code review and merge conflicts.

TBD approach:

First commit: Define notification interface, commit to trunk

Day 1: NotificationService contract

```javascript
// notifications/NotificationService.js
// Contract: all implementations must provide send(userId, message)
// message shape: { title, body, priority } where priority is 'low', 'normal', or 'high'

class NotificationService {
  async send(userId, message) {
    throw new Error('Not implemented');
  }
}
```

This is valid code that doesn’t do anything yet. That’s fine.

Next commit: Add in-memory implementation for testing

Day 2: InMemoryNotificationService

```javascript
class InMemoryNotificationService extends NotificationService {
  constructor() {
    super();
    this.notifications = [];
  }

  async send(userId, message) {
    this.notifications.push(message);
  }
}
```

Now other teams can use the interface in their code and tests.
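For instance, a hypothetical order module can be tested against the in-memory double before email or SMS exist. The double is repeated below so the snippet stands alone:

```javascript
// Repeats the in-memory double from above so this example is self-contained.
class InMemoryNotificationService {
  constructor() {
    this.notifications = [];
  }

  async send(userId, message) {
    this.notifications.push({ userId, ...message });
  }
}

// Hypothetical consumer that depends only on the notification interface.
async function shipOrder(order, notifier) {
  // ...shipping logic would go here...
  await notifier.send(order.userId, {
    title: 'Order shipped',
    body: `Order ${order.id} is on its way`,
    priority: 'normal',
  });
}

const notifier = new InMemoryNotificationService();
shipOrder({ id: 7, userId: 'u1' }, notifier).then(() => {
  console.log(notifier.notifications.length); // 1
});
```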

Then: Implement email notifications behind a feature flag

Days 3-5: EmailNotificationService behind a flag

```javascript
class EmailNotificationService extends NotificationService {
  async send(userId, message) {
    if (!features.emailNotifications) {
      return; // No-op when disabled
    }
    // Real email sending implementation
  }
}
```

Commit daily. Deploy. Flag is off in production.

Continue iterating:

  • Add SMS notifications (same pattern: interface, implementation, feature flag)
  • Enable email notifications for internal users only
  • Add in-app notifications
  • Roll out email and SMS to all users
  • Remove flags for email once stable

Result: Integrated 12-15 times instead of once. Each integration was small and low-risk.

Scenario 2: Database Schema Change

Problem: You need to split the users.name column into first_name and last_name.

Old approach: Update schema, update all code, deploy everything at once. Hope nothing breaks.

TBD approach (expand-contract pattern):

Step 1: Expand. Add new columns without removing the old one:

Step 1: add new columns alongside the old one

```sql
ALTER TABLE users ADD COLUMN first_name VARCHAR(255);
ALTER TABLE users ADD COLUMN last_name VARCHAR(255);
```

Commit and deploy. Application still uses name column. No breaking change.

Step 2: Dual writes. Update the write path to populate both old and new columns:

Step 2: write to both old and new columns

```javascript
async function createUser(name) {
  const [firstName, lastName] = name.split(' ');
  await db.query(
    'INSERT INTO users (name, first_name, last_name) VALUES (?, ?, ?)',
    [name, firstName, lastName]
  );
}
```

Commit and deploy. Now new data populates both formats.

Step 3: Backfill. Migrate existing data in the background:

Step 3: backfill existing rows

```javascript
async function backfillNames() {
  const users = await db.query('SELECT id, name FROM users WHERE first_name IS NULL');
  for (const user of users) {
    const [firstName, lastName] = user.name.split(' ');
    await db.query(
      'UPDATE users SET first_name = ?, last_name = ? WHERE id = ?',
      [firstName, lastName, user.id]
    );
  }
}
```

Run this as a background job. Commit and deploy.
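One caveat before running a backfill at scale: `name.split(' ')` mishandles single-word and multi-part names. Centralizing that edge handling in one helper shared by the dual-write and backfill steps avoids divergent results. A sketch - the splitting policy (everything after the first space becomes the last name) is an assumption, not a rule:

```javascript
// Shared name-splitting helper covering the edge cases a naive
// split(' ') misses. The policy is an assumption: everything after
// the first space goes to lastName.
function splitName(name) {
  const trimmed = name.trim();
  const spaceAt = trimmed.indexOf(' ');
  if (spaceAt === -1) {
    // Single-word names get an empty lastName rather than undefined.
    return { firstName: trimmed, lastName: '' };
  }
  return {
    firstName: trimmed.slice(0, spaceAt),
    lastName: trimmed.slice(spaceAt + 1),
  };
}

console.log(splitName('Ada Lovelace')); // { firstName: 'Ada', lastName: 'Lovelace' }
console.log(splitName('Prince')); // { firstName: 'Prince', lastName: '' }
```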

Step 4: Read from new columns. Update the read path behind a feature flag:

Step 4: read from new columns behind a flag

```javascript
async function getUser(id) {
  const user = await db.query('SELECT * FROM users WHERE id = ?', [id]);
  if (features.useNewNameColumns) {
    return {
      firstName: user.first_name,
      lastName: user.last_name,
    };
  }
  return { name: user.name };
}
```

Deploy and gradually enable the flag.

Step 5: Contract. Once all reads use the new columns and the flag is removed:

Step 5: drop the old column

```sql
ALTER TABLE users DROP COLUMN name;
```

Result: Five deployments instead of one big-bang change. Each step was reversible. Zero downtime.

Scenario 3: Refactoring Without Breaking the World

Problem: Your authentication code is a mess. You want to refactor it without breaking production.

TBD approach:

Characterization tests. Write tests that capture current behavior (warts and all):

Characterization tests for existing auth behavior

```javascript
describe('Current auth behavior', () => {
  it('accepts password with special characters', () => {
    // Document what currently happens
  });

  it('handles malformed tokens by returning 401', () => {
    // Capture edge case behavior
  });
});
```

These tests document how the system actually works. Commit.

Strangler fig pattern. Create a new implementation alongside the old one:

Strangler fig - new implementation alongside old

```javascript
class LegacyAuthService {
  // Existing messy code (don't touch it)
}

class ModernAuthService {
  // Clean implementation
}

class AuthServiceRouter {
  constructor(legacy, modern) {
    this.legacy = legacy;
    this.modern = modern;
  }

  async authenticate(credentials) {
    if (features.modernAuth) {
      return this.modern.authenticate(credentials);
    }
    return this.legacy.authenticate(credentials);
  }
}
```

Commit with flag off. Old behavior unchanged.

Migrate piece by piece. Enable modern auth for one endpoint at a time:

Enable modern auth per endpoint

```javascript
if (features.modernAuth && endpoint === '/api/users') {
  return modernAuth.authenticate(credentials);
}
```

Commit daily. Monitor each endpoint.

Remove old code. Once all endpoints use modern auth and it has been stable:

Remove the legacy implementation

```javascript
class AuthService {
  async authenticate(credentials) {
    // Just the modern implementation
  }
}
```

Delete the legacy code entirely.

Result: Continuous refactoring without a “big rewrite” branch. Production was never at risk.

Scenario 4: Working with External API Changes

Problem: A third-party API you depend on is changing their response format next month.

TBD approach:

Adapter pattern. Create an adapter that normalizes both old and new formats:

Adapter handling both old and new API formats

```javascript
class PaymentAPIAdapter {
  async getPaymentStatus(orderId) {
    const response = await fetch(`https://api.payments.com/orders/${orderId}`);
    const data = await response.json();

    // Handle both old and new format
    if (data.payment_status) {
      // Old format
      return {
        status: data.payment_status,
        amount: data.total_amount,
      };
    } else {
      // New format
      return {
        status: data.status.payment,
        amount: data.amounts.total,
      };
    }
  }
}
```

Commit. Your code now works with both formats.

After the API migration: Simplify adapter to only handle new format:

Simplified adapter for new format only

```javascript
class PaymentAPIAdapter {
  async getPaymentStatus(orderId) {
    const response = await fetch(`https://api.payments.com/orders/${orderId}`);
    const data = await response.json();
    return {
      status: data.status.payment,
      amount: data.amounts.total,
    };
  }
}
```

Result: No coupling between your deployment schedule and the external API migration. Zero downtime.



Final Thought

Migrating from GitFlow to TBD isn’t a matter of changing your branching strategy. It’s a matter of changing your thinking.

Stop optimizing for isolation. Start optimizing for feedback.

Small, tested, integrated changes, delivered continuously, will always outperform big batches delivered occasionally.

That’s why teams migrate to TBD. Not because it’s trendy, but because it’s the only path to real continuous integration and continuous delivery.

2 - Testing Fundamentals

Build a test architecture that gives your pipeline the confidence to deploy any change, even when dependencies outside your control are unavailable.

Phase 1 - Foundations

Continuous delivery requires that trunk always be releasable, which means testing it automatically on every change. A collection of tests is not enough. You need a test architecture: different test types working together so the pipeline can confidently deploy any change, even when external systems are unavailable.

Testing Goals for CD

Your test suite must meet these goals before it can support continuous delivery.

| Goal | Target | How to Measure |
|---|---|---|
| Fast | CI gating tests < 10 minutes; full acceptance suite < 1 hour | CI gating suite duration; full acceptance suite duration |
| Deterministic | Same code always produces the same result | Flaky test count: 0 in the gating suite |
| Catches real bugs | Tests fail when behavior is wrong, not when implementation changes | Defect escape rate trending down |
| Independent of external systems | Pipeline can determine deployability without any dependency being available | External dependencies in gating tests: 0 |
| Test doubles stay current | Contract tests confirm test doubles match reality | All contract tests passing within last 24 hours |
| Coverage trends up | Every new change gets a test | Coverage percentage increasing over time |

In This Section

| Page | What You’ll Learn |
|---|---|
| What to Test | Which boundaries matter and how to eliminate external dependencies from your pipeline |
| Pipeline Test Strategy | What tests run where in a CD pipeline and how contract tests validate test doubles |
| Getting Started | Audit your current suite, fix flaky tests, and decouple from external systems |
| Defect Feedback Loop | Trace defects to their origin and prevent entire categories of bugs |

The Ice Cream Cone: What to Avoid

An inverted test distribution, with too many slow end-to-end tests and too few fast unit tests, is the most common testing barrier to CD.

The ice cream cone anti-pattern: an inverted test distribution where most testing effort goes to manual and end-to-end tests at the top, with too few fast unit tests at the bottom

The ice cream cone makes CD impossible. Manual testing gates block every release. End-to-end tests take hours, fail randomly, and depend on external systems being healthy. For the test architecture that replaces this, see Pipeline Test Strategy and the Testing reference.

Next Step

Automate your build process so that building, testing, and packaging happen with a single command. Continue to Build Automation.


Content contributed by Dojo Consortium, licensed under CC BY 4.0. Additional concepts drawn from Ham Vocke, The Practical Test Pyramid, and Toby Clemson, Testing Strategies in a Microservice Architecture.


2.1 - What to Test - and What Not To

The principles that determine what belongs in your test suite and what does not - focusing on interfaces, isolating what you control, and applying the same pattern to frontend and backend.

Three principles determine what belongs in your test suite and what does not.

If you cannot fix it, do not test for it

You should never test the behavior of services you consume. Testing their behavior is the responsibility of the team that builds them. If their service returns incorrect data, you cannot fix that, so testing for it is waste.

What you should test is how your system responds when a consumed service is unstable or unavailable. Can you degrade gracefully? Do you return a meaningful error? Do you retry appropriately? These are behaviors you own and can fix, so they belong in your test suite.

This principle directly enables the pipeline test strategy. When you stop testing things you cannot fix, you stop depending on external systems in your pipeline. Your tests become faster, more deterministic, and more focused on the code your team actually ships.
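For example, graceful degradation can be verified with a failing test double standing in for the consumed service. A sketch - the recommendations client and the fallback policy are hypothetical:

```javascript
// Hypothetical consumer: recommendations are optional, so a failing
// provider should degrade to an empty list instead of an error page.
async function getHomepageModel(recoClient) {
  try {
    const recommendations = await recoClient.fetchRecommendations();
    return { recommendations, degraded: false };
  } catch (err) {
    return { recommendations: [], degraded: true };
  }
}

// Test double simulating the provider being down. We are not testing
// the provider's behavior - only how our code responds to its failure.
const downService = {
  fetchRecommendations: async () => {
    throw new Error('503 Service Unavailable');
  },
};

getHomepageModel(downService).then((model) => {
  console.log(model.degraded); // true
});
```

Making the provider "fail" is a one-line double here, where forcing a real dependency to return a 503 on demand is often impossible.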

Test interfaces first

Most integration failures originate at interfaces, the boundaries where your system talks to other systems. These boundaries are the highest-risk areas in your codebase, and they deserve the most testing attention. But testing interfaces does not require integrating with the real system on the other side.

When you test an interface you consume, the question is: “Can I understand the response and act accordingly?” If you send a request for a user’s information, you do not test that you get that specific user back. You test that you receive and understand the properties you need - that your code can parse the response structure and make correct decisions based on it. This distinction matters because it keeps your tests deterministic and focused on what you control.

Use contract mocks, virtual services, or any test double that faithfully represents the interface contract. The test validates your side of the conversation, not theirs.

Frontend and backend follow the same pattern

Both frontend and backend applications provide interfaces to consumers and consume interfaces from providers. The only difference is the consumer: a frontend provides an interface for humans, while a backend provides one for machines. The testing strategy is the same.

Test frontend code the same way you test backend code: validate the interface you provide, test logic in isolation, and verify that user actions trigger the correct behavior.

For a frontend:

  • Validate the interface you provide. The UI contains the components it should and they appear correctly. This is the equivalent of verifying your API returns the right response structure.
  • Test behavior isolated from presentation. Use your unit test framework to test the logic that UI controls trigger, separated from the rendering layer. This gives you the same speed and control you get from testing backend logic in isolation.
  • Verify that controls trigger the right logic. Confirm that user actions invoke the correct behavior, without needing a running backend or browser-based E2E test.

This approach gives you targeted testing with far more control. Testing exception flows - what happens when a service returns an error, when a network request times out, when data is malformed - becomes straightforward instead of requiring elaborate E2E setups that are hard to make fail on demand.
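As a sketch, the logic a submit button triggers can live in a plain function that runs without a browser or backend - the form and validation rules below are invented for illustration:

```javascript
// Presentation-free form logic: a pure function with no DOM access,
// so both happy paths and exception flows are plain unit tests.
function validateSignup({ email, password }) {
  const errors = [];
  if (!email.includes('@')) errors.push('invalid email');
  if (password.length < 8) errors.push('password too short');
  return { valid: errors.length === 0, errors };
}

console.log(validateSignup({ email: 'a@b.com', password: 'hunter2222' }).valid); // true
console.log(validateSignup({ email: 'nope', password: 'x' }).errors.length); // 2
```

The component layer only wires the button to this function, so the remaining UI test shrinks to "does clicking call it with the form values."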

Test Quality Over Coverage Percentage

Code coverage tells you which lines executed during tests. It does not tell you whether the tests verified anything meaningful. A test suite with 90% coverage and no assertions has high coverage and zero value.

Better questions than “what is our coverage percentage?”:

  • When a test fails, does it point directly to the defect?
  • When we refactor, do tests break because behavior changed or because implementation details shifted?
  • Do our tests catch the bugs that actually reach production?
  • Can a developer trust a green build enough to deploy immediately?

Why coverage mandates are harmful

When teams are required to hit a coverage target, they write tests to satisfy the metric rather than to verify behavior. This produces:

  • Tests that exercise code paths without asserting outcomes
  • Tests that mirror implementation rather than specify behavior
  • Tests that inflate the number without improving confidence

The metric goes up while the defect escape rate stays the same. Worse, meaningless tests add maintenance cost and slow down the suite.

Instead of mandating a coverage number, set a coverage floor (see Getting Started) and focus team attention on test quality: mutation testing scores, defect escape rates, and whether developers actually trust the suite enough to deploy on green.


2.2 - Pipeline Test Strategy

What tests run where in a CD pipeline, how contract tests validate the test doubles used inside the pipeline, and why everything that blocks deployment must be deterministic.

Everything that blocks deployment must be deterministic and under your control. Everything that involves external systems runs asynchronously or post-deployment. This gives you the independence to deploy any time, regardless of the state of the world around you.

Tests Inside the Pipeline

These tests run on every commit and block deployment if they fail. They must be fast, deterministic, and free of external dependencies.

Tests inside the pipeline: pre-merge stage runs static analysis, unit tests, integration tests, and component tests in under 10 minutes. Post-merge re-runs the full deterministic suite. All external dependencies are replaced by test doubles.

Every test in this pipeline uses test doubles for external dependencies. No test calls a real external API, database, or third-party service. This means:

  • A downstream outage cannot block your deployment. Your pipeline runs the same whether external systems are healthy or down.
  • Tests are deterministic. The same code always produces the same result.
  • The suite is fast. No network latency, no waiting for external systems to respond.

Why re-run tests post-merge?

Two changes can each pass pre-merge independently but conflict when combined on trunk. The post-merge run catches these integration effects. If a post-merge failure occurs, the team fixes it immediately. Trunk must always be releasable.

Tests Outside the Pipeline

These tests involve real external systems and are therefore non-deterministic. They never block deployment. Instead, they validate assumptions and monitor production health.

Tests outside the pipeline: contract tests run on a schedule to validate test doubles against real APIs. Post-deployment runs E2E smoke tests and synthetic monitoring. Failures trigger test double updates, rollback, or alerts - never block deployment.
| Test Type | When It Runs | What It Does on Failure |
|---|---|---|
| Contract tests | On a schedule (hourly or daily) | Triggers review; team updates test doubles to match new reality |
| E2E smoke tests | After each deployment | Triggers rollback if critical path is broken |
| Synthetic monitoring | Continuously in production | Triggers alerts for operations |

How Contract Tests Validate Test Doubles

The pipeline’s deterministic tests depend on test doubles to represent external systems. But test doubles can drift from reality. An API adds a required field, changes a response format, or deprecates an endpoint. Contract tests close this gap.

How contract tests validate test doubles: inside the pipeline, your code calls test doubles that return canned responses. Outside the pipeline, contract tests send real requests to external APIs and compare the response schema against test double definitions. A match confirms accuracy; a mismatch triggers an alert to update test doubles and re-verify.
  1. Pipeline tests use test doubles that encode your assumptions about external APIs - response schemas, status codes, error formats.
  2. Contract tests run on a schedule and send real requests to the actual external APIs.
  3. Contract tests compare the real response against what your test doubles return. They check structure and types, not specific data values.
  4. When a contract test passes, your test doubles are confirmed accurate. The pipeline’s deterministic tests are trustworthy.
  5. When a contract test fails, the team is alerted. They update the test doubles to match the new reality, then re-run component tests to verify nothing breaks.

This design means your pipeline never touches external systems, but you still catch when external systems change. You get both speed and accuracy.

Consumer-driven contracts

When the external API is owned by another team in your organization, you can go further with consumer-driven contracts. Instead of your team polling their API on a schedule, both teams share a contract specification (using a tool like Pact):

  • You (the consumer) define the requests you send and the responses you expect.
  • They (the provider) run your contract as part of their build. If a change would break your expectations, their build fails before they deploy.
  • Your test doubles are generated from the contract, guaranteeing they match what the provider actually delivers.

This shifts contract validation from “detect and react” to “prevent.” See Contract Tests for implementation details.

Summary: All Stages at a Glance

| Stage | Blocks Deployment? | Uses Test Doubles? | Deterministic? |
|---|---|---|---|
| Every Commit | Yes | Yes - all external deps | Yes |
| Post-Merge | Yes | Yes - all external deps | Yes |
| Scheduled (Contract) | No - triggers review | No - hits real APIs | No |
| Post-Deploy (E2E) | No - triggers rollback | No - real system | No |
| Production (Monitoring) | No - triggers alerts | No - real system | No |

The Testing reference provides detailed documentation for each test type, including code examples and anti-patterns.


2.3 - Getting Started

Practical steps to audit your test suite, fix flaky tests, decouple from external dependencies, and adopt test-driven development.

Starting Without Full Coverage

Teams often delay adopting CI because their existing code lacks tests. This is backwards. You do not need tests for existing code to begin. You need one rule applied without exception:

Every new change gets a test. We will not go lower than the current level of code coverage.

Record your current coverage percentage as a baseline. Configure CI to fail if coverage drops below that number. This does not mean the baseline is good enough. It means the trend only moves in one direction. Every bug fix, every new feature, and every refactoring adds tests. Over time, coverage grows organically in the areas that matter most: the code that is actively changing.
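The ratchet itself is a few lines: compare the current run against a committed baseline and fail the build only on regression. A sketch - where the percentages come from (e.g. your coverage tool's summary output) depends on your setup:

```javascript
// Coverage ratchet sketch: fail on any drop below the baseline,
// and raise the baseline whenever coverage improves.
function checkRatchet(baselinePct, currentPct) {
  if (currentPct < baselinePct) {
    return {
      ok: false,
      message: `coverage ${currentPct}% fell below baseline ${baselinePct}%`,
    };
  }
  // The trend only moves in one direction.
  return { ok: true, newBaseline: Math.max(baselinePct, currentPct) };
}

console.log(checkRatchet(62.4, 61.0).ok); // false - the build should fail
console.log(checkRatchet(62.4, 63.1).newBaseline); // 63.1
```

Many coverage tools can enforce a threshold directly; the point is that the threshold is the recorded baseline, not an aspirational target.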

Do not attempt to retrofit tests across the entire codebase before starting CI. That approach takes months and delivers no incremental value. It also produces low-quality tests written by developers who are testing code they did not write and do not fully understand.

Quick-Start Action Plan

If your test suite is not yet ready to support CD, use this focused action plan to make immediate progress.

1. Audit your current test suite

Assess where you stand before making changes.

Actions:

  • Run your full test suite 3 times. Note total duration and any tests that pass intermittently (flaky tests).
  • Count tests by type: unit, integration, functional, end-to-end.
  • Identify tests that require external dependencies (databases, APIs, file systems) to run.
  • Record your baseline: total test count, pass rate, duration, flaky test count.
  • Map each test type to a pipeline stage. Which tests gate deployment? Which run asynchronously? Which tests couple your deployment to external systems?

Output: A clear picture of your test distribution and the specific problems to address.

2. Fix or remove flaky tests

Flaky tests are worse than no tests. They train developers to ignore failures, which means real failures also get ignored.

Actions:

  • Quarantine all flaky tests immediately. Move them to a separate suite that does not block the build.
  • For each quarantined test, decide: fix it (if the behavior it tests matters) or delete it (if it does not).
  • Common causes of flakiness: timing dependencies, shared mutable state, reliance on external services, test order dependencies.
  • Target: zero flaky tests in your main test suite.

3. Decouple your pipeline from external dependencies

This is the highest-leverage change for CD. Identify every test that calls a real external service and replace that dependency with a test double.

Actions:

  • List every external service your tests depend on: databases, APIs, message queues, file storage, third-party services.
  • For each dependency, decide the right test double approach:
    • In-memory fakes for databases (e.g., SQLite, H2, testcontainers with local instances).
    • HTTP stubs for external APIs (e.g., WireMock, nock, MSW).
    • Fakes for message queues, email services, and other infrastructure.
  • Replace the dependencies in your unit and component tests.
  • Move the original tests that hit real services into a separate suite. These become your starting contract tests or E2E smoke tests.

Output: A test suite where everything that blocks the build is deterministic and runs without network access to external systems.
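A hand-rolled fake is often all you need. The sketch below replaces a hypothetical payment gateway client with an in-memory fake; the names (`FakePaymentGateway`, `checkout`) are illustrative, not from any real library:

```python
class FakePaymentGateway:
    """In-memory stand-in for the real HTTP client: deterministic, network-free,
    and able to record calls for assertions."""
    def __init__(self):
        self.charges = []

    def charge(self, customer_id: str, amount_cents: int) -> dict:
        self.charges.append((customer_id, amount_cents))
        return {"status": "succeeded", "amount": amount_cents}

def checkout(gateway, customer_id: str, amount_cents: int) -> str:
    """Application code depends on the gateway interface, not the concrete client,
    so tests can inject the fake."""
    result = gateway.charge(customer_id, amount_cents)
    return "confirmed" if result["status"] == "succeeded" else "failed"
```

The production code receives the real client; tests receive the fake. Because the fake encodes your assumptions about the external service, contract tests (step 5) exist to verify those assumptions stay true.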

4. Add component tests for critical paths

If you do not have component tests that exercise your whole service in isolation, start with the most critical paths.

Actions:

  • Identify the 3-5 most critical user journeys or API endpoints in your application.
  • Write a component test for each: boot the application, stub external dependencies, send a real request or simulate a real user action, verify the response.
  • Each component test should prove that the feature works correctly assuming external dependencies behave as expected (which your test doubles encode).
  • Run these in CI on every commit.

Output: Component tests covering your critical paths, running in CI on every commit.
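The shape of a component test, reduced to its essentials: assemble the service with stubbed dependencies, send a realistic request, assert on the response. The app factory and request format here are deliberately simplified sketches, not a real framework:

```python
def make_app(user_store):
    """Assemble the service with its dependencies injected (a stub dict in tests,
    a real database client in production)."""
    def handle(request: dict) -> dict:
        if request["path"] == "/users/me":
            user = user_store.get(request["user_id"])
            if user is None:
                return {"status": 404}
            return {"status": 200, "body": {"name": user["name"]}}
        return {"status": 404}
    return handle

def test_get_current_user():
    # Boot the whole app with a stubbed store, then exercise it through its public interface.
    app = make_app({"u1": {"name": "Ada"}})
    response = app({"path": "/users/me", "user_id": "u1"})
    assert response == {"status": 200, "body": {"name": "Ada"}}
```

In a real codebase the same pattern applies with your web framework's test client: boot the app, inject test doubles, make HTTP requests, assert on responses.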

5. Set up contract tests for your most important dependency

Pick the external dependency that changes most frequently or has caused the most production issues. Set up a contract test for it.

Actions:

  • Write a contract test that validates the response structure (types, required fields, status codes) of the dependency’s API.
  • Run it on a schedule (e.g., every hour or daily), not on every commit.
  • When it fails, update your test doubles to match the new reality and re-verify your component tests.
  • If the dependency is owned by another team in your organization, explore consumer-driven contracts with a tool like Pact.

Output: One contract test running on a schedule, with a process to update test doubles when it fails.
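At its simplest, a contract test checks the dependency's live response against the shape your test doubles assume. A minimal sketch, assuming your code relies on three fields (the expected schema here is illustrative; tools like Pact formalize this):

```python
def validate_contract(response: dict) -> list:
    """Return a list of violations of the shape our test doubles assume."""
    expected = {"id": str, "status": str, "amount": int}  # fields our code relies on
    errors = []
    for name, expected_type in expected.items():
        if name not in response:
            errors.append(f"missing field: {name}")
        elif not isinstance(response[name], expected_type):
            errors.append(f"wrong type for {name}: {type(response[name]).__name__}")
    return errors
```

The scheduled job fetches a real response from the dependency, runs it through this check, and alerts when the list is non-empty.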

6. Adopt TDD for new code

Once your pipeline tests are reliable, adopt TDD for all new work. TDD is the practice of writing the test before the code. It ensures every piece of behavior has a corresponding test.

The TDD cycle

  1. Red: Write a failing test that describes the behavior you want.
  2. Green: Write the minimum code to make the test pass.
  3. Refactor: Improve the code without changing the behavior. The test ensures you do not break anything.
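One complete red-green cycle, using a hypothetical discount calculation as the example (names and rules are illustrative):

```python
# Red: write the test first and watch it fail (order_total does not exist yet).
def test_discount_code_reduces_total():
    assert order_total(10_000, "SAVE10") == 9_000  # 10% off
    assert order_total(10_000, None) == 10_000     # no code, no discount

# Green: the minimum code that makes the test pass.
def order_total(subtotal_cents: int, discount_code) -> int:
    if discount_code == "SAVE10":
        return subtotal_cents - subtotal_cents // 10
    return subtotal_cents

# Refactor: with the test green, you can now restructure (e.g. extract a
# discount lookup table) knowing the test will catch any regression.
```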

Why TDD matters for CD

  • Every change is automatically covered by a test
  • The test suite grows proportionally with the codebase
  • Tests describe behavior, not implementation, making them more resilient to refactoring
  • Developers get immediate feedback on whether their change works

TDD is not mandatory for CD, but teams that practice TDD consistently have significantly faster and more reliable test suites.

How to start: Pick one new feature or bug fix this week. Write the test first, watch it fail, write the code to make it pass, then refactor. Do not try to retroactively TDD your entire codebase. Apply TDD to new code and to any code you modify.

Output: Team members practicing TDD on new work, with at least one completed red-green-refactor cycle.


2.4 - Defect Feedback Loop

How to trace defects to their origin and make systemic changes that prevent entire categories of bugs from recurring.

Treat every test failure as diagnostic data about where your process breaks down, not just as something to fix. When you identify the systemic source of defects, you can prevent entire categories from recurring.

Two questions sharpen this thinking:

  1. What is the earliest point we can detect this defect? The later a defect is found, the more expensive it is to fix. A requirements defect caught during example mapping costs minutes. The same defect caught in production costs days of incident response, rollback, and rework.
  2. Can AI help us detect it earlier? AI-assisted tools can now surface defects at stages where only human review was previously possible, shifting detection left without adding manual effort.

Trace Every Defect to Its Origin

When a test catches a defect (or worse, when a defect escapes to production) ask: where was this defect introduced, and what would have prevented it from being created?

Defects do not originate randomly. They cluster around specific causes. The CD Defect Detection and Remediation Catalog documents over 30 defect types across eight categories, with detection methods, AI opportunities, and systemic fixes for each.

| Category | Example Defects | Earliest Detection | Systemic Fix |
|---|---|---|---|
| Requirements | Building the right thing wrong, or the wrong thing right | Discovery, during story refinement or example mapping | Acceptance criteria as user outcomes, Three Amigos sessions, example mapping |
| Missing domain knowledge | Business rules encoded incorrectly, tribal knowledge loss | During coding, when the developer writes the logic | Ubiquitous language (DDD), pair programming, rotate ownership |
| Integration boundaries | Interface mismatches, wrong assumptions about upstream behavior | During design, when defining the interface contract | Contract tests per boundary, API-first design, circuit breakers |
| Untested edge cases | Null handling, boundary values, error paths | Pre-commit, through null-safe type systems and static analysis | Property-based testing, boundary value analysis, test for every bug fix |
| Unintended side effects | Change to module A breaks module B | At commit time, when CI runs the full test suite | Small commits, trunk-based development, feature flags, modular design |
| Accumulated complexity | Defects cluster in the most complex, most-changed files | Continuously, through static analysis in the IDE and CI | Refactoring as part of every story, dedicated complexity budget |
| Process and deployment | Long-lived branches, manual pipeline steps, excessive batching | Pre-commit for branch age; CI for pipeline and batching issues | Trunk-based development, automate every step, blue/green or canary deploys |
| Data and state | Null pointer exceptions, schema migration failures, concurrency issues | Pre-commit for null safety; CI for schema compatibility | Null-safe types, expand-then-contract for schema changes, design for idempotency |

For the complete catalog covering all defect categories (including product and discovery, dependency and infrastructure, testing and observability gaps, and more) see the CD Defect Detection and Remediation Catalog.

Build a Defect Feedback Loop

You need a process that systematically connects test failures to root causes and root causes to systemic fixes.

  1. Classify every defect. When a test fails or a bug is reported, tag it with its origin category from the tables above. This takes seconds and builds a dataset over time.
  2. Look for patterns. Monthly (or during retrospectives), review the defect classifications. Which categories appear most often? That is where your process is weakest.
  3. Apply the systemic fix, not just the local fix. When you fix a bug, also ask: what systemic change would prevent this entire category of bug? If most defects come from integration boundaries, the fix is not “write more integration tests.” It is “make contract tests mandatory for every new boundary.” If most defects come from untested edge cases, the fix is not “increase code coverage.” It is “adopt property-based testing as a standard practice.”
  4. Measure whether the fix works. Track defect counts by category over time. If you applied a systemic fix for integration boundary defects and the count does not drop, the fix is not working and you need a different approach.
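Step 2 needs nothing more elaborate than a frequency count over your tagged defect log. A minimal sketch (the log format is illustrative; the categories come from the table below):

```python
from collections import Counter

def weakest_categories(defect_log, top_n: int = 3):
    """Rank defect origin categories by frequency: the most common
    categories mark where the process is weakest."""
    counts = Counter(d["category"] for d in defect_log)
    return counts.most_common(top_n)
```

Run this monthly over the defects tagged since the last review; a category whose count does not drop after a systemic fix signals the fix is not working.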

The Test-for-Every-Bug-Fix Rule

Every bug fix must include a test that reproduces the bug before the fix and passes after. This is non-negotiable for CD because:

  • It proves the fix actually addresses the defect (not just the symptom).
  • It prevents the same defect from recurring.
  • It builds test coverage exactly where the codebase is weakest: the places where bugs actually occur.
  • Over time, it shifts your test suite from “tests we thought to write” to “tests that cover real failure modes.”

Advanced Detection Techniques

As your test architecture matures, add techniques that catch defects before manual review:

| Technique | What It Finds | When to Adopt |
|---|---|---|
| Mutation testing (Stryker, PIT) | Tests that pass but do not actually verify behavior (your test suite's blind spots) | When basic coverage is in place but defect escape rate is not dropping |
| Property-based testing | Edge cases and boundary conditions across large input spaces that example-based tests miss | When defects cluster around unexpected input combinations |
| Chaos engineering | Failure modes in distributed systems: what happens when a dependency is slow, returns errors, or disappears | When you have component tests and contract tests in place and need confidence in failure handling |
| Static analysis and linting | Null safety violations, type errors, security vulnerabilities, dead code | From day one. These are cheap and fast |
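Property-based testing can be sketched without a framework (tools like Hypothesis generate and shrink the inputs for you). The function under test and the character pool below are illustrative:

```python
import random

def normalize_email(raw: str) -> str:
    """Example system under test: trim whitespace and lowercase the address."""
    return raw.strip().lower()

def test_normalize_is_idempotent(trials: int = 200) -> None:
    """Property: normalizing an already-normalized value changes nothing.

    Instead of a handful of hand-picked examples, assert the property
    over many randomly generated inputs.
    """
    rng = random.Random(42)  # seeded so failures are reproducible
    chars = "AbC @.\t"
    for _ in range(trials):
        raw = "".join(rng.choice(chars) for _ in range(rng.randint(0, 20)))
        once = normalize_email(raw)
        assert normalize_email(once) == once
```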

For more examples of mapping defect origins to detection methods and systemic corrections, see the CD Defect Detection and Remediation Catalog.


3 - Build Automation

Automate your build process so a single command builds, tests, and packages your application.

Build automation is the single-command loop that makes CI possible. If you cannot build, test, and package with one command, you cannot automate your pipeline.

What Build Automation Means

A single command (or CI trigger) executes the entire sequence from source code to deployable artifact:

  1. Compile the source code (if applicable)
  2. Run all automated tests
  3. Package the application into a deployable artifact (container image, binary, archive)
  4. Report the result (pass or fail, with details)

No manual steps. No “run this script, then do that.” No tribal knowledge about which flags to set or which order to run things. One command, every time, same result.

The Litmus Test

Ask yourself: “Can a new team member clone the repository and produce a deployable artifact with a single command within 15 minutes?”

If the answer is no, your build is not fully automated.

Why Build Automation Matters for CD

Without build automation, every other practice in this guide breaks down. You cannot have continuous integration if the build requires manual intervention. You cannot have a deterministic pipeline if the build produces different results depending on who runs it.

| CD Requirement | How Build Automation Supports It |
|---|---|
| Reproducibility | The same commit always produces the same artifact, on any machine |
| Speed | Automated builds can be optimized, cached, and parallelized |
| Confidence | If the build passes, the artifact is trustworthy |
| Developer experience | Developers run the same build locally that CI runs, eliminating "works on my machine" |
| Pipeline foundation | The CD pipeline is just the build running automatically on every commit |

Key Practices

1. Version-Controlled Build Scripts

Your build configuration lives in the same repository as your code. It is versioned, reviewed, and tested alongside the application.

What belongs in version control:

  • Build scripts (Makefile, build.gradle, package.json scripts, Dockerfile)
  • Dependency manifests (requirements.txt, go.mod, pom.xml, package-lock.json)
  • Pipeline definitions (.github/workflows, .gitlab-ci.yml, Jenkinsfile)
  • Environment setup scripts (docker-compose.yml for local development)

What does not belong in version control:

  • Secrets and credentials (use secret management tools)
  • Environment-specific configuration values (use environment variables or config management)
  • Generated artifacts (build outputs, compiled binaries)

Anti-pattern: Build instructions that exist only in a wiki, a Confluence page, or one developer’s head. If the build steps are not in the repository, they will drift from reality.

2. Dependency Management

All dependencies must be declared explicitly and resolved deterministically.

Practices:

  • Lock files: Use lock files (package-lock.json, Pipfile.lock, go.sum) to pin exact dependency versions. Check lock files into version control.
  • Reproducible resolution: Running the dependency install twice should produce identical results.
  • No undeclared dependencies: Your build should not rely on tools or libraries that happen to be installed on the build machine. If you need it, declare it.
  • Dependency scanning: Automate vulnerability scanning of dependencies as part of the build. Do not wait for a separate security review.

Anti-pattern: “It builds on Jenkins because Jenkins has Java 11 installed, but the Dockerfile uses Java 17.” The build must declare and control its own runtime.

3. Build Caching

Fast builds keep developers in flow. Caching is the primary mechanism for build speed.

What to cache:

  • Dependencies: Download once, reuse across builds. Most build tools (npm, Maven, Gradle, pip) support a local cache.
  • Compilation outputs: Incremental compilation avoids rebuilding unchanged modules.
  • Docker layers: Structure your Dockerfile so that rarely-changing layers (OS, dependencies) are cached and only the application code layer is rebuilt.
  • Test fixtures: Prebuilt test data or container images used by tests.

Guidelines:

  • Cache aggressively for local development and CI
  • Invalidate caches when dependencies or build configuration change
  • Never cache test results. Tests must always run
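The Docker layer point deserves a concrete illustration. In this hypothetical Node service Dockerfile, the dependency layers come first so a typical code-only commit reuses the cached `npm ci` layer:

```dockerfile
# Illustrative layer ordering for cache efficiency (base image and commands are examples).
FROM node:20-slim
WORKDIR /app

# 1. Copy only the dependency manifests first...
COPY package.json package-lock.json ./
# 2. ...so this expensive install step stays cached until the manifests change.
RUN npm ci

# 3. Application code changes on every commit; only these layers rebuild.
COPY . .
RUN npm run build
```

The same principle applies to any build tool: order the steps from least-frequently-changing to most-frequently-changing inputs.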

4. Single Build Script Entry Point

Developers, CI, and CD should all use the same entry point.

Makefile as single build entry point
# Example: Makefile as the single entry point

.PHONY: all build test package clean

all: build test package

build:
	./gradlew compileJava

test:
	./gradlew test

package:
	docker build -t myapp:$(GIT_SHA) .

clean:
	./gradlew clean
	docker rmi myapp:$(GIT_SHA) || true

The CI server runs make all. A developer runs make all. The result is the same. There is no separate “CI build script” that diverges from what developers run locally.

5. Artifact Versioning

Every build artifact must be traceable to the exact commit that produced it.

Practices:

  • Tag artifacts with the Git commit SHA or a build number derived from it
  • Store build metadata (commit, branch, timestamp, builder) in the artifact or alongside it
  • Never overwrite an existing artifact. If the version exists, the artifact is immutable

This becomes critical in Phase 2 when you establish immutable artifact practices.

CI Server Setup Basics

The CI server is the mechanism that runs your build automatically.

What the CI Server Does

  1. Watches the trunk for new commits
  2. Runs the build (the same command a developer would run locally)
  3. Reports the result (pass/fail, test results, build duration)
  4. Notifies the team if the build fails

Minimum CI Configuration

Regardless of which CI tool you use (GitHub Actions, GitLab CI, Jenkins, CircleCI), the configuration follows the same pattern:

Conceptual minimum CI configuration
# Conceptual CI configuration (adapt to your tool)
trigger:
  branch: main  # Run on every commit to trunk

steps:
  - checkout: source code
  - install: dependencies
  - run: build
  - run: tests
  - run: package
  - report: test results and build status

CI Principles for Phase 1

  • Run on every commit. Not nightly, not weekly, not “when someone remembers.” Every commit to trunk triggers a build.
  • Treat a failing build as the team’s top priority. Stop work until trunk is green again. (See Working Agreements.)
  • Run the same build everywhere. Use the same script in CI and local development. No CI-only steps that developers cannot reproduce.
  • Fail fast. Run the fastest checks first (compilation, unit tests) before the slower ones (integration tests, packaging).

Build Time Targets

Build speed directly affects developer productivity and integration frequency. If the build takes 30 minutes, developers will not integrate multiple times per day.

| Build Phase | Target | Rationale |
|---|---|---|
| Compilation | < 1 minute | Developers need instant feedback on syntax and type errors |
| Unit tests | < 3 minutes | Fast enough to run before every commit |
| Integration tests | < 5 minutes | Must complete before the developer context-switches |
| Full build (compile + test + package) | < 10 minutes | The outer bound for fast feedback |

If Your Build Is Too Slow

Slow builds are a common constraint that blocks CD adoption. Address them systematically:

  1. Profile the build. Identify which steps take the most time. Optimize the bottleneck, not everything.
  2. Parallelize tests. Most test frameworks support parallel execution. Run independent test suites concurrently.
  3. Use build caching. Avoid recompiling or re-downloading unchanged dependencies.
  4. Split the build. Run fast checks (lint, compile, unit tests) as a “fast feedback” stage. Run slower checks (integration tests, security scans) as a second stage.
  5. Upgrade build hardware. Sometimes the fastest optimization is more CPU and RAM.

Common Anti-Patterns

| Anti-pattern | Impact | Fix |
|---|---|---|
| Manual build steps | Error-prone, slow, and impossible to parallelize or cache. | Script every step so no human intervention is required. |
| Environment-specific builds | You are not testing the same artifact you deploy, making production bugs impossible to diagnose. | Build one artifact and configure it per environment at deployment time. (See Application Config.) |
| Build scripts that only run in CI | Developers cannot reproduce CI failures locally, leading to slow debugging cycles. | Use a single build entry point that both CI and developers use. |
| Missing dependency pinning | The build is non-deterministic; the same code can produce different results on different days. | Use lock files and pin all dependency versions. |
| Long build queues | Delayed feedback defeats the purpose of CI because developers context-switch before seeing results. | Ensure CI infrastructure can handle your commit frequency with parallel build agents. |

Measuring Success

| Metric | Target | Why It Matters |
|---|---|---|
| Build duration | < 10 minutes | Enables fast feedback and frequent integration |
| Build success rate | > 95% | Indicates reliable, reproducible builds |
| Time from commit to build result | < 15 minutes (including queue time) | Measures the full feedback loop |
| Developer ability to build locally | 100% of team | Confirms the build is portable and documented |

Next Step

With build automation in place, you can build, test, and package your application reliably. The next foundation is ensuring that the work you integrate daily is small enough to be safe. Continue to Work Decomposition.


4 - Work Decomposition

Break features into small, deliverable increments that can be completed in 2 days or less.

Phase 1 - Foundations

Trunk-based development requires daily integration, and daily integration requires small work. This page covers the techniques for breaking work into small, deliverable increments that flow through your pipeline continuously.

Why Small Work Matters for CD

Continuous delivery depends on a core principle: small changes, integrated frequently, are safer than large changes integrated rarely.

Every practice in Phase 1 reinforces this:

  • Trunk-based development requires that you integrate at least daily. You cannot integrate a two-week feature daily unless you decompose it.
  • Testing fundamentals work best when each change is small enough to test thoroughly.
  • Code review is fast when the change is small. A 50-line change can be reviewed in minutes. A 2,000-line change takes hours - if it gets reviewed at all.

The DORA research consistently shows that smaller batch sizes correlate with higher delivery performance. Small changes have:

  • Lower risk: If a small change breaks something, the blast radius is limited, and the cause is obvious.
  • Faster feedback: A small change gets through the pipeline quickly. You learn whether it works today, not next week.
  • Easier rollback: Rolling back a 50-line change is straightforward. Rolling back a 2,000-line change often requires a new deployment.
  • Better flow: Small work items move through the system predictably. Large work items block queues and create bottlenecks.

The 2-Day Rule

If a work item takes longer than 2 days to complete, it is too big.

Two days gives you at least one integration to trunk per day (the minimum for TBD) and allows for the natural rhythm of development: plan, implement, test, integrate, move on.

When a developer says “this will take a week,” the answer is not “go faster.” The answer is “break it into smaller pieces.”

What “Complete” Means

A work item is complete when it is:

  • Integrated to trunk
  • Covered by automated tests
  • Deployable to production

If a story requires a feature flag to hide incomplete user-facing behavior, that is fine. The code is still integrated, tested, and deployable.

Story Slicing Techniques

The INVEST Criteria

Good stories follow INVEST:

| Criterion | Meaning | Why It Matters for CD |
|---|---|---|
| Independent | Can be developed and deployed without waiting for other stories | Enables parallel work |
| Negotiable | Details can be discussed and adjusted | Helps find the smallest valuable slice |
| Valuable | Delivers something meaningful to the user or the system | Prevents technical stories that stall the product |
| Estimable | Small enough that the team can reasonably estimate it | Large stories hide unknowns |
| Small | Completable within 2 days | Enables daily integration |
| Testable | Has clear acceptance criteria that can be automated | Supports the testing foundation |

Vertical Slicing

The most important slicing technique for CD is vertical slicing: cutting through all layers of the application to deliver a thin but complete slice of functionality.

Vertical slice (correct):

“As a user, I can log in with my email and password.”

This slice touches the UI (login form), the API (authentication endpoint), and the database (user lookup). It is deployable and testable end-to-end.

Horizontal slice (anti-pattern):

“Build the database schema for user accounts.” “Build the authentication API.” “Build the login form UI.”

Each horizontal slice is incomplete on its own. None is deployable. None is testable end-to-end. They create dependencies between work items and block flow.

Vertical slicing in distributed systems

Not every team owns the full stack from UI to database. A subdomain product team may own a service whose consumers are other services, not humans. The principle still applies: a vertical slice cuts through all layers your team owns and delivers complete, observable behavior through your team’s public interface.

Does this change deliver complete behavior through the interface your team owns? For a full-stack product team, that interface is a UI. For a subdomain team, it is an API contract. If the change only touches one layer beneath that interface, it is a horizontal slice regardless of how you label it.

See Horizontal Slicing for how layer-by-layer splitting fails in distributed systems.

Slicing Strategies

When a story feels too big, apply one of these strategies:

| Strategy | How It Works | Example |
|---|---|---|
| By workflow step | Implement one step of a multi-step process | "User can add items to cart" (before "user can checkout") |
| By business rule | Implement one rule at a time | "Orders over $100 get free shipping" (before "orders ship to international addresses") |
| By data variation | Handle one data type first | "Support credit card payments" (before "support PayPal") |
| By operation | Implement CRUD operations separately | "Create a new customer" (before "edit customer" or "delete customer") |
| By performance | Get it working first, optimize later | "Search returns results" (before "search returns results in under 200ms") |
| By platform | Support one platform first | "Works on desktop web" (before "works on mobile") |
| Happy path first | Implement the success case first | "User completes checkout" (before "user sees error when payment fails") |

Example: Decomposing a Feature

Original story (too big):

“As a user, I can manage my profile including name, email, avatar, password, notification preferences, and two-factor authentication.”

Decomposed into vertical slices:

  1. “User can view their current profile information” (read-only display)
  2. “User can update their name” (simplest edit)
  3. “User can update their email with verification” (adds email flow)
  4. “User can upload an avatar image” (adds file handling)
  5. “User can change their password” (adds security validation)
  6. “User can configure notification preferences” (adds preferences)
  7. “User can enable two-factor authentication” (adds 2FA flow)

Each slice is independently deployable, testable, and completable within 2 days.

Use BDD scenarios to find slice boundaries

BDD scenarios are the most reliable way to find slice boundaries. Each Given-When-Then scenario becomes a candidate work item with clear scope and testable acceptance criteria. A brief “Three Amigos” conversation (business, development, testing perspectives) before work begins surfaces these scenarios naturally.

Given-When-Then: user login scenarios
Feature: User login

  Scenario: Successful login with valid credentials
    Given a registered user with email "user@example.com"
    When they enter their correct password and click "Log in"
    Then they are redirected to the dashboard

  Scenario: Failed login with wrong password
    Given a registered user with email "user@example.com"
    When they enter an incorrect password and click "Log in"
    Then they see the message "Invalid email or password"
    And they remain on the login page

Each scenario is a natural unit of work. Implement one scenario at a time, integrate to trunk after each one.

Task Decomposition Within Stories

Even well-sliced stories may contain multiple tasks. Decompose stories into tasks that can be completed and integrated independently.

Example story: “User can update their name”

Tasks:

  1. Display the current name on the profile page (read-only, end-to-end through UI and API, integration test)
  2. Add an editable name field that saves successfully (UI, API, and persistence in one pass, E2E test)
  3. Show a validation error when the name is blank (adds one business rule across all layers, unit and E2E test)

Each task delivers a thin vertical slice of behavior and results in a commit to trunk. The story is completed through a series of small integrations, not one large merge.

Guidelines for task decomposition:

  • Each task should take hours, not days
  • Each task should leave trunk in a working state after integration
  • Tasks should be ordered so that the simplest changes come first
  • If a task requires a feature flag or stub to be integrated safely, that is fine

Common Anti-Patterns

  • Horizontal Slicing: Stories organized by layer (“build the schema,” “build the API,” “build the UI”). No individual slice is deployable.
  • Monolithic Work Items: Stories with 10+ acceptance criteria or multi-week estimates. Break them into smaller stories using the slicing strategies above.
  • Technical stories without business context: Backlog items like “refactor the database access layer” that do not tie to a business outcome. Embed technical improvements in feature stories and keep them under 2 days.
  • Splitting by role instead of by behavior: Separate stories for “frontend developer builds the UI” and “backend developer builds the API” create handoff dependencies and delay integration. Write stories from the user’s perspective so the same developer (or pair) implements the full vertical slice.
  • Deferring edge cases indefinitely: Building the happy path and creating a backlog of “handle error case X” stories that never get prioritized. Error handling is not optional. Include the most important error cases in the initial decomposition and schedule them immediately after the happy path, not “someday.”

Measuring Success

| Metric | Target | Why It Matters |
|---|---|---|
| Story cycle time | < 2 days from start to trunk | Confirms stories are small enough |
| Development cycle time | Decreasing | Shows improved flow from smaller work |
| Stories completed per week | Increasing (with same team size) | Indicates better decomposition and less rework |
| Work in progress | Decreasing | Fewer large stories blocking the pipeline |

Next Step

Continue to Code Review to learn how to keep review fast and effective without becoming a bottleneck.


5 - Code Review

Streamline code review to provide fast feedback without blocking flow.

Phase 1 - Foundations

Code review is essential for quality, but it is also the most common bottleneck in teams adopting trunk-based development. If reviews take days, daily integration is impossible. This page covers review techniques that maintain quality while enabling the flow that CD requires.

Why Code Review Matters for CD

Automated tools catch syntax errors, style violations, and known vulnerability patterns. Code review exists for the things automation cannot evaluate.

  • Cognitive load and maintainability: Tools can count complexity points, but they cannot judge whether the logic is intuitive. A human reviewer catches over-engineered abstractions and code that will confuse a teammate maintaining it at 3:00 AM.
  • Systemic context: Static analysis sees the code but does not remember the past. A peer reviewer remembers that Service X handles retries poorly and can spot an implementation that is technically correct but will trigger a known systemic weakness. Reviewers also verify that the solution aligns with the platform’s long-term architectural direction.
  • Knowledge distribution: If the author is the only person who understands a critical path, the team is at risk. Review ensures at least one other person shares that context. It is also the primary mechanism for cross-pollinating new patterns and domain knowledge across the team.
  • Novel security and logic bypasses: Automation catches known patterns like SQL injection. It often misses logical security flaws - for example, a change to a discount calculation that accidentally allows a negative total. Human reviewers also verify that the developer did not take a dangerous shortcut that bypasses a policy not yet codified in the pipeline.

These are real benefits. The challenge is that traditional code review - open a pull request, wait for someone to review it, address comments, wait again - is too slow for CD.

In a CD workflow, code review must happen within minutes or hours, not days. The review is still rigorous, but the process is designed for speed.

The Core Tension: Quality vs. Flow

Traditional teams optimize review for thoroughness: detailed comments, multiple reviewers, extensive back-and-forth. This produces high-quality reviews but blocks flow.

CD teams optimize review for speed without sacrificing the quality that matters. The key insight is that most of the quality benefit of code review comes from small, focused reviews done quickly, not from exhaustive reviews done slowly.

| Traditional Review | CD-Compatible Review |
|---|---|
| Review happens after the feature is complete | Review happens continuously throughout development |
| Large diffs (hundreds or thousands of lines) | Small diffs (< 200 lines, ideally < 50) |
| Multiple rounds of feedback and revision | One round, or real-time feedback during pairing |
| Review takes 1-3 days | Review takes minutes to a few hours |
| Review is asynchronous by default | Review is synchronous by preference |
| 2+ reviewers required | 1 reviewer (or pairing as the review) |

Synchronous vs. Asynchronous Review

Synchronous Review (Preferred for CD)

In synchronous review, the reviewer and author are engaged at the same time. Feedback is immediate. Questions are answered in real time. The review is done when the conversation ends.

Methods:

  • Pair programming: Two developers work on the same code at the same time. Review is continuous. There is no separate review step because the code was reviewed as it was written.
  • Mob programming: The entire team (or a subset) works on the same code together. Everyone reviews in real time.
  • Over-the-shoulder review: The author walks the reviewer through the change in person or on a video call. The reviewer asks questions and provides feedback immediately.

Advantages for CD:

  • Zero wait time between “ready for review” and “review complete”
  • Higher bandwidth communication (tone, context, visual cues) catches more issues
  • Immediate resolution of questions - no async back-and-forth
  • Knowledge transfer happens naturally through the shared work

Asynchronous Review (When Necessary)

Sometimes synchronous review is not possible - time zones, schedules, or team preferences may require asynchronous review. This is fine, but it must be fast.

Rules for async review in a CD workflow:

  • Review within 2 hours. If a pull request sits for a day, it blocks integration. Set a team working agreement: “pull requests are reviewed within 2 hours during working hours.”
  • Keep changes small. A 50-line change can be reviewed in 5 minutes. A 500-line change takes an hour and reviewers procrastinate on it.
  • Use draft PRs for early feedback. If you want feedback on an approach before the code is complete, open a draft PR. Do not wait until the change is “perfect.”
  • Avoid back-and-forth. If a comment requires discussion, move to a synchronous channel (call, chat). Async comment threads that go 5 rounds deep are a sign the change is too large or the design was not discussed upfront.

Review Techniques Compatible with TBD

Pair Programming as Review

When two developers pair on a change, the code is reviewed as it is written. There is no separate review step, no pull request waiting for approval, and no delay to integration.

How it works with TBD:

  1. Two developers sit together (physically or via screen share)
  2. They discuss the approach, write the code, and review each other’s decisions in real time
  3. When the change is ready, they commit to trunk together
  4. Both developers are accountable for the quality of the code

When to pair:

  • New or unfamiliar areas of the codebase
  • Changes that affect critical paths
  • When a junior developer is working on a change (pairing doubles as mentoring)
  • Any time the change involves design decisions that benefit from discussion

Pair programming satisfies most organizations’ code review requirements because two developers have actively reviewed and approved the code.

Mob Programming as Review

Mob programming extends pairing to the whole team. One person drives (types), one person navigates (directs), and the rest observe and contribute.

When to mob:

  • Establishing new patterns or architectural decisions
  • Complex changes that benefit from multiple perspectives
  • Onboarding new team members to the codebase
  • Working through particularly difficult problems

Mob programming is intensive but highly effective. Every team member understands the code, the design decisions, and the trade-offs.

Rapid Async Review

For teams that use pull requests, rapid async review adapts the pull request workflow for CD speed.

Practices:

  • Auto-assign reviewers. Do not wait for someone to volunteer. Use tools to automatically assign a reviewer when a PR is opened.
  • Keep PRs small. Target < 200 lines of changed code. Smaller PRs get reviewed faster and more thoroughly.
  • Provide context. Write a clear PR description that explains what the change does, why it is needed, and how to verify it. A good description reduces review time dramatically.
  • Use automated checks. Run linting, formatting, and tests before the human review. The reviewer should focus on logic and design, not style.
  • Approve and merge quickly. If the change looks correct, approve it. Do not hold it for nitpicks. Nitpicks can be addressed in a follow-up commit.
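
The small-PR practice can be enforced mechanically as one of the automated checks. A minimal sketch of a size gate a pipeline could run against a unified diff; the 200-line default follows the target above, and the function names are illustrative:

```python
def diff_line_count(diff_text: str) -> int:
    """Count added and removed lines in a unified diff, ignoring
    file headers (---/+++) and hunk markers (@@)."""
    count = 0
    for line in diff_text.splitlines():
        if line.startswith(("+++", "---", "@@")):
            continue
        if line.startswith(("+", "-")):
            count += 1
    return count


def check_pr_size(diff_text: str, limit: int = 200) -> bool:
    """Return True when the change is within the team's size limit."""
    return diff_line_count(diff_text) <= limit
```

A check like this runs before human review, so reviewers only ever see changes small enough to review well.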

What to Review

Not everything in a code change deserves the same level of scrutiny. Focus reviewer attention where it matters most.

High Priority (Reviewer Should Focus Here)

  • Behavior correctness: Does the code do what it is supposed to do? Are edge cases handled?
  • Security: Does the change introduce vulnerabilities? Are inputs validated? Are secrets handled properly?
  • Clarity: Can another developer understand this code in 6 months? Are names clear? Is the logic straightforward?
  • Test coverage: Are the new behaviors tested? Do the tests verify the right things?
  • API contracts: Do changes to public interfaces maintain backward compatibility? Are they documented?
  • Error handling: What happens when things go wrong? Are errors caught, logged, and surfaced appropriately?

Low Priority (Automate Instead of Reviewing)

  • Code style and formatting: Use automated formatters (Prettier, Black, gofmt). Do not waste reviewer time on indentation and bracket placement.
  • Import ordering: Automate with linting rules.
  • Naming conventions: Enforce with lint rules where possible. Only flag naming in review if it genuinely harms readability.
  • Unused variables or imports: Static analysis tools catch these instantly.
  • Consistent patterns: Where possible, encode patterns in architecture decision records and lint rules rather than relying on reviewers to catch deviations.

Rule of thumb: If a style or convention issue can be caught by a machine, do not ask a human to catch it. Reserve human attention for the things machines cannot evaluate: correctness, design, clarity, and security.

Review Scope for Small Changes

In a CD workflow, most changes are small - tens of lines, not hundreds. This changes the economics of review.

| Change Size | Expected Review Time | Review Depth |
| --- | --- | --- |
| < 20 lines | 2-5 minutes | Quick scan: is it correct? Any security issues? |
| 20-100 lines | 5-15 minutes | Full review: behavior, tests, clarity |
| 100-200 lines | 15-30 minutes | Detailed review: design, contracts, edge cases |
| > 200 lines | Consider splitting the change | Large changes get superficial reviews |

Research consistently shows that reviewer effectiveness drops sharply after 200-400 lines. If you are regularly reviewing changes larger than 200 lines, the problem is not the review process - it is the work decomposition.

Working Agreements for Review SLAs

Establish clear team agreements about review expectations. Without explicit agreements, review latency will drift based on individual habits.

| Agreement | Target |
| --- | --- |
| Response time | Review within 2 hours during working hours |
| Reviewer count | 1 reviewer (or pairing as the review) |
| PR size | < 200 lines of changed code |
| Blocking issues only | Only block a merge for correctness, security, or significant design issues |
| Nitpicks | Use a “nit:” prefix. Nitpicks are suggestions, not merge blockers |
| Stale PRs | PRs open for > 24 hours are escalated to the team |
| Self-review | Author reviews their own diff before requesting review |
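
The stale-PR agreement is easy to check with a scheduled script. A sketch, assuming open PRs are available from your platform's API as (id, opened-at) pairs; the 24-hour limit mirrors the agreement above:

```python
from datetime import datetime, timedelta

STALE_AFTER = timedelta(hours=24)


def stale_prs(open_prs, now):
    """Return the ids of PRs older than the team's 24-hour limit.

    open_prs: iterable of (pr_id, opened_at) pairs.
    """
    return [pr_id for pr_id, opened_at in open_prs
            if now - opened_at > STALE_AFTER]
```

Anything this returns gets raised with the whole team, not just the author.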

How to Enforce Review SLAs

  • Track review turnaround time. If it consistently exceeds 2 hours, discuss it in retrospectives.
  • Make review a first-class responsibility, not something developers do “when they have time.”
  • If a reviewer is unavailable, any other team member can review. Do not create single-reviewer dependencies.
  • Consider pairing as the default and async review as the exception. This eliminates the review bottleneck entirely.
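
Tracking turnaround only requires two timestamps per review: when it was requested and when the first review landed. A sketch with the 2-hour SLA as the default threshold; the data shape is an assumption:

```python
from datetime import datetime
from statistics import median


def turnaround_minutes(events):
    """Minutes from review request to first review for each
    (requested_at, first_review_at) pair."""
    return [(reviewed - requested).total_seconds() / 60
            for requested, reviewed in events]


def meets_review_sla(events, limit_minutes=120):
    """True when the median turnaround is within the 2-hour agreement."""
    return median(turnaround_minutes(events)) <= limit_minutes
```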

Code Review and Trunk-Based Development

Code review and TBD work together, but only if review does not block integration. Here is how to reconcile them:

| TBD Requirement | How Review Adapts |
| --- | --- |
| Integrate to trunk at least daily | Reviews must complete within hours, not days |
| Branches live < 24 hours | PRs are opened and merged within the same day |
| Trunk is always releasable | Reviewers focus on correctness, not perfection |
| Small, frequent changes | Small changes are reviewed quickly and thoroughly |

If your team finds that review is the bottleneck preventing daily integration, the most effective solution is to adopt pair programming. It eliminates the review step entirely by making review continuous.

Measuring Success

| Metric | Target | Why It Matters |
| --- | --- | --- |
| Review turnaround time | < 2 hours | Prevents review from blocking integration |
| PR size (lines changed) | < 200 lines | Smaller PRs get faster, more thorough reviews |
| PR age at merge | < 24 hours | Aligns with TBD branch age constraint |
| Review rework cycles | < 2 rounds | Multiple rounds indicate the change is too large or design was not discussed upfront |

Next Step

Code review practices need to be codified in team agreements alongside other shared commitments. Continue to Working Agreements to establish your team’s definitions of done, ready, and CI practice.


6 - Working Agreements

Establish shared definitions of done and ready to align the team on quality and process.

Phase 1 - Foundations

The practices in Phase 1 (trunk-based development, testing, small work, and fast review) only work when the whole team commits to them. Working agreements make that commitment explicit. This page covers the key agreements a team needs before moving to pipeline automation in Phase 2.

Why Working Agreements Matter

A working agreement is a shared commitment that the team creates, owns, and enforces together. No one imposes it from outside. The team answers one question for itself: “How do we work together?”

Without working agreements, CD practices drift. One developer integrates daily; another keeps a branch for a week. One developer fixes a broken build immediately; another waits until after lunch. These inconsistencies compound. Within weeks, the team is no longer practicing CD. They are practicing individual preferences.

Working agreements prevent this drift by making expectations explicit. When everyone agrees on what “done” means, what “ready” means, and how CI works, the team can hold each other accountable without conflict.

Definition of Done

The Definition of Done (DoD) is the team’s shared standard for when a work item is complete. For CD, done means delivered to the end user.

Minimum Definition of Done for CD

A work item is done when all of the following are true:

  • Code is integrated to trunk
  • All automated tests pass
  • Code has been reviewed (via pairing, mob, or pull request)
  • The change is delivered to the end user (or deployable to production at any time)
  • No known defects are introduced
  • Relevant documentation is updated (API docs, runbooks, etc.)
  • Feature flags are in place for incomplete user-facing features
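
The feature-flag item can be as lightweight as a guard around the incomplete code path, which lets unfinished work integrate to trunk without reaching users. A minimal sketch; the in-memory flag store and checkout functions are illustrative, and real teams typically use a flag service or config system:

```python
# Illustrative in-memory flag store (a real team would use a flag
# service or configuration system).
FLAGS = {"new-checkout": False}  # off until the feature is complete


def is_enabled(flag: str) -> bool:
    return FLAGS.get(flag, False)


def legacy_checkout(cart):
    return {"total": sum(cart), "flow": "legacy"}


def new_checkout(cart):
    return {"total": sum(cart), "flow": "new"}


def checkout(cart):
    # Incomplete work ships dark behind the flag.
    if is_enabled("new-checkout"):
        return new_checkout(cart)
    return legacy_checkout(cart)
```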

Why “Delivered to the End User” Matters

Many teams define “done” as “code is merged.” This creates a gap between “done” and “delivered.” Work accumulates in a staging environment, waiting for a release. Risk grows with each unreleased change.

In a CD organization, “done” means the change has reached the end user (or is ready to reach them at any time). This is the ultimate test of completeness: the change works in the real environment, with real data, under real load.

In Phase 1, you may not yet have the pipeline to deliver every change automatically. That is fine. Your DoD should still include “delivered to the end user” as the standard, even if the delivery step is not yet automated. The pipeline work in Phase 2 will close that gap.

Extending Your Definition of Done

As your CD maturity grows, extend the DoD:

| Phase | Addition to DoD |
| --- | --- |
| Phase 1 (Foundations) | Code integrated to trunk, tests pass, reviewed, deployable |
| Phase 2 (Pipeline) | Artifact built and validated by the pipeline |
| Phase 3 (Optimize) | Change delivered to users behind a feature flag |
| Phase 4 (Deliver on Demand) | Change delivered to users and monitored |

Definition of Ready

The Definition of Ready (DoR) answers: “When is a work item ready to be worked on?”

Pulling unready work into development creates waste. Unclear requirements lead to rework. Missing acceptance criteria lead to untestable changes. Oversized stories lead to long-lived branches.

Minimum Definition of Ready for CD

A work item is ready when all of the following are true:

  • Acceptance criteria are defined and specific (using Given-When-Then or equivalent)
  • The work item is small enough to complete in 2 days or less
  • The work item is testable (the team knows how to verify it works)
  • Dependencies are identified and resolved (or the work item is independent)
  • The team has discussed the work item (Three Amigos or equivalent)
  • The work item is estimated (or the team has agreed estimation is unnecessary for items this small)
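
Given-When-Then criteria translate directly into automated tests, which is a quick way to confirm an item is testable before pulling it in. A hypothetical example; the discount rule and function are invented for illustration:

```python
def apply_discount(total: float, code: str) -> float:
    """Hypothetical function under test: 10% off with SAVE10,
    any other code leaves the total unchanged."""
    if code == "SAVE10":
        return total * 0.9
    return total


def test_valid_code_discounts_total():
    # Given a cart totaling 100 and a valid discount code
    total, code = 100.0, "SAVE10"
    # When the discount is applied
    result = apply_discount(total, code)
    # Then the total is reduced by 10%
    assert abs(result - 90.0) < 1e-9


def test_unknown_code_leaves_total_unchanged():
    # Given an unknown code, when the discount is applied,
    # then the total is unchanged
    assert apply_discount(50.0, "NOPE") == 50.0
```

If the team cannot write the Then clause, the item is not ready.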

Common Mistakes with Definition of Ready

  • Making it too rigid. The DoR is a guideline, not a gate. If the team agrees a work item is understood well enough, it is ready. Do not use the DoR to avoid starting work.
  • Requiring design documents. For small work items (< 2 days), a conversation and acceptance criteria are sufficient. Formal design documents are for larger initiatives.
  • Skipping the conversation. The DoR is most valuable as a prompt for discussion, not as a checklist. The Three Amigos conversation matters more than the checkboxes.

CI Working Agreement

The CI working agreement codifies how the team practices continuous integration. Every other agreement depends on a working CI process, making this the foundation the rest builds on.

The CI Agreement

The team agrees to the following practices:

Integration:

  • Every developer integrates to trunk at least once per day
  • Branches (if used) live for less than 24 hours
  • No long-lived feature, development, or release branches

Build:

  • All tests must pass before merging to trunk
  • The build runs on every commit to trunk
  • Build results are visible to the entire team

Broken builds:

  • A broken build is the team’s top priority. It is fixed before any new work begins
  • The developer(s) who broke the build are responsible for fixing it immediately
  • If the fix will take more than 10 minutes, revert the change and fix it offline
  • No one commits to a broken trunk (except to fix the break)

Work in progress:

  • Finishing existing work takes priority over starting new work
  • The team limits work in progress to maintain flow
  • If a developer is blocked, they help a teammate before starting a new story

Why “Broken Build = Top Priority”

This is the single most important CI agreement. When the build is broken:

  • No one can integrate safely. Changes are stacking up.
  • Trunk is not releasable. The team has lost its safety net.
  • Every minute the build stays broken, the team accumulates risk.

“Fix the build” is not a suggestion. It is an agreement that the team enforces collectively. If the build is broken and someone starts a new feature instead of fixing it, the team should call that out. This is not punitive. It is the team protecting its own ability to deliver.

Stop the Line: Why All Work Stops

Some teams interpret “fix the build” as “stop merging until it is green.” That is not enough. When the build is red, all feature work stops, not just merges. Every developer on the team shifts attention to restoring green.

This sounds extreme, but the reasoning is straightforward:

  • Work closer to production is more valuable than work further away. A broken trunk means nothing in progress can ship. Fixing the build is the highest-leverage activity anyone on the team can do.
  • Continuing feature work creates a false sense of progress. Code written against a broken trunk is untested against the real baseline. It may compile, but it has not been validated. That is not progress. It is inventory.
  • The team mindset matters more than the individual fix. When everyone stops, the message is clear: the build belongs to the whole team, not just the person who broke it. This shared ownership is what separates teams that practice CI from teams that merely have a CI server.

Two Timelines: Stop vs. Do Not Stop

Consider two teams that encounter the same broken build at 10:00 AM.

Team A stops all feature work:

  • 10:00 - Build breaks. The team sees the alert and stops.
  • 10:05 - Two developers pair on the fix while a third reviews the failing test.
  • 10:20 - Fix is pushed. Build goes green.
  • 10:25 - The team resumes feature work. Total disruption: roughly 30 minutes.

Team B treats it as one person’s problem:

  • 10:00 - Build breaks. The developer who caused it starts investigating alone.
  • 10:30 - Other developers commit new changes on top of the broken trunk. Some changes conflict with the fix in progress.
  • 11:30 - The original developer’s fix does not work because the codebase has shifted underneath them.
  • 14:00 - After multiple failed attempts, the team reverts three commits (the original break plus two that depended on the broken state).
  • 15:00 - Trunk is finally green. The team has lost most of the day, and three developers need to redo work. Total disruption: 5+ hours.

The team that stops immediately pays a small, predictable cost. The team that does not stop pays a large, unpredictable one.

The Revert Rule

If a broken build cannot be fixed within 10 minutes, revert the offending commit and fix the issue on a branch. This keeps trunk green and unblocks the rest of the team. The developer who made the change is not being punished. They are protecting the team’s flow.

Reverting feels uncomfortable at first. Teams worry about “losing work.” But a reverted commit is not lost. The code is still in the Git history. The developer can re-apply their change after fixing the issue. The alternative, a broken trunk for hours while someone debugs, is far more costly.

When to Forward Fix vs. Revert

Not every broken build requires a revert. If the developer who broke it can identify the cause quickly, a forward fix is faster and simpler. The key is a strict time limit:

  1. Start a 10-minute timer the moment the build goes red.
  2. If the developer has a fix ready and pushed within 10 minutes, ship the forward fix.
  3. If the timer expires and the fix is not in trunk, revert immediately. No extensions, no “I’m almost done.”

The timer prevents the most common failure mode: a developer who is “five minutes away” from a fix for an hour. After 10 minutes without a fix, the probability of a quick resolution drops sharply, and the cost to the rest of the team climbs. Revert, restore green, and fix the problem offline without time pressure.
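
The decision rule is simple enough to state as code. A sketch; the time limit is a parameter so it can match whatever number your working agreement sets, and nothing here is a real CI API:

```python
def build_break_action(minutes_red: float, fix_pushed: bool,
                       limit_minutes: float = 10) -> str:
    """Decide the team's next action while trunk is red."""
    if fix_pushed:
        return "forward-fix"      # the fix landed before the timer expired
    if minutes_red < limit_minutes:
        return "keep-fixing"      # timer still running
    return "revert"               # timer expired: revert, no extensions
```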

Common Objections to Stop-the-Line

Teams adopting stop-the-line discipline encounter predictable pushback. These responses can help.

| Objection | Response |
| --- | --- |
| “We can’t afford to stop. We have a deadline.” | Stopping for 20 minutes now prevents losing half a day later. The fastest path to your deadline runs through a green build. |
| “Stopping kills our velocity.” | Velocity built on a broken trunk is an illusion. Those story points will come back as rework or production incidents. |
| “We already stop all the time. It’s not working.” | Frequent stops mean the team is merging changes that break the build too often. Fix that root cause with better pre-merge testing and smaller commits. |
| “It’s a known flaky test. We can ignore it.” | Ignoring a flaky test trains the team to ignore all red builds. Fix it or remove it. |
| “Management won’t support stopping feature work.” | Show the two-timeline comparison above. Teams that stop immediately have shorter lead times and less unplanned rework. |

How Working Agreements Support the CD Migration

Each working agreement maps directly to a Phase 1 practice:

| Practice | Supporting Agreement |
| --- | --- |
| Trunk-based development | CI agreement: daily integration, branch age < 24h |
| Testing fundamentals | DoD: all tests pass. CI: tests pass before merge |
| Build automation | CI: build runs on every commit. Broken build = top priority |
| Work decomposition | DoR: work items < 2 days. WIP limits |
| Code review | CI: review within 2 hours. DoD: code reviewed |

Template: Create Your Own Working Agreements

Use this template as a starting point. Customize it for your team’s context.

Team Working Agreement Template

```markdown
# [Team Name] Working Agreement
Date: [Date]
Participants: [All team members]

## Definition of Done
A work item is done when:
- [ ] Code is integrated to trunk
- [ ] All automated tests pass
- [ ] Code has been reviewed (method: [pair / mob / PR])
- [ ] The change is delivered to the end user (or deployable at any time)
- [ ] No known defects are introduced
- [ ] [Add team-specific criteria]

## Definition of Ready
A work item is ready when:
- [ ] Acceptance criteria are defined (Given-When-Then)
- [ ] The item can be completed in [X] days or less
- [ ] The item is testable
- [ ] Dependencies are identified
- [ ] The team has discussed the item
- [ ] [Add team-specific criteria]

## CI Practices
- Integration frequency: at least [X] per developer per day
- Maximum branch age: [X] hours
- Review turnaround: within [X] hours
- Broken build response: fix within [X] minutes or revert
- WIP limit: [X] items per developer

## Review Practices
- Default review method: [pair / mob / async PR]
- PR size limit: [X] lines
- Review focus: [correctness, security, clarity]
- Style enforcement: [automated via linting]

## Meeting Cadence
- Standup: [time, frequency]
- Retrospective: [frequency]
- Working agreement review: [frequency, e.g., monthly]

## Agreement Review
This agreement is reviewed and updated [monthly / quarterly].
Any team member can propose changes at any time.
All changes require team consensus.
```

Tips for Creating Working Agreements

  1. Include everyone. Every team member should participate in creating the agreement. Agreements imposed by a manager or tech lead are policies, not agreements.
  2. Start simple. Do not try to cover every scenario. Start with the essentials (DoD, DoR, CI) and add specifics as the team identifies gaps.
  3. Make them visible. Post the agreements where the team sees them daily: on a team wiki, in the team channel, or on a physical board.
  4. Review regularly. Agreements should evolve as the team matures. Review them monthly. Remove agreements that are second nature. Add agreements for new challenges.
  5. Enforce collectively. Working agreements are only effective if the team holds each other accountable. This is a team responsibility, not a manager responsibility.
  6. Start with agreements you can keep. If the team is currently integrating once a week, do not agree to integrate three times daily. Agree to integrate daily, practice for a month, then tighten.

Measuring Success

| Metric | Target | Why It Matters |
| --- | --- | --- |
| Agreement adherence | Team self-reports > 80% adherence | Indicates agreements are realistic and followed |
| Agreement review frequency | Monthly | Ensures agreements stay relevant |
| Integration frequency | Meets CI agreement target | Validates the CI working agreement |
| Broken build fix time | Meets CI agreement target | Validates the broken build response agreement |

Next Step

With working agreements in place, your team has established the foundations for continuous delivery: daily integration, reliable testing, automated builds, small work, fast review, and shared commitments.

You are ready to move to Phase 2: Pipeline, where you will build the automated path from commit to production.


7 - Everything as Code

Every artifact that defines your system (infrastructure, pipelines, configuration, database schemas, monitoring) belongs in version control and is delivered through pipelines.

Phase 1 - Foundations

If it is not in version control, it does not exist. If it is not delivered through a pipeline, it is a manual step. Manual steps block continuous delivery. This page establishes the principle that everything required to build, deploy, and operate your system is defined as code, version controlled, reviewed, and delivered through the same automated pipelines as your application.

One process for every change

When something is defined as code:

  • It is version controlled. You can see who changed what, when, and why. You can revert any change. You can trace any production state to a specific commit.
  • It is reviewed. Changes go through the same review process as application code. A second pair of eyes catches mistakes before they reach production.
  • It is tested. Automated validation catches errors before deployment. Linting, dry-runs, and policy checks apply to infrastructure the same way unit tests apply to application code.
  • It is reproducible. You can recreate any environment from scratch. Disaster recovery is “re-run the pipeline,” not “find the person who knows how to configure the server.”
  • It is delivered through a pipeline. No SSH, no clicking through UIs, no manual steps. The pipeline is the only path to production for everything, not just application code.

When something is not defined as code, it is a liability. It cannot be reviewed, tested, or reproduced. It exists only in someone’s head, a wiki page that is already outdated, or a configuration that was applied manually and has drifted from any documented state.

What belongs in version control

Application code

Application code in version control is the baseline. If your team is not there yet, start here before reading further.

Infrastructure

Every server, network, database instance, load balancer, DNS record, and cloud resource should be defined in code and provisioned through automation.

What this looks like:

  • Cloud resources defined in Terraform, Pulumi, CloudFormation, or similar tools
  • Server configuration managed by Ansible, Chef, Puppet, or container images
  • Network topology, firewall rules, and security groups defined declaratively
  • Environment creation is a pipeline run, not a ticket to another team

What this replaces:

  • Clicking through cloud provider consoles to create resources
  • SSH-ing into servers to install packages or change configuration
  • Filing tickets for another team to provision an environment
  • “Snowflake” servers that were configured by hand and nobody knows how to recreate

Why it matters for CD: If creating or modifying an environment requires manual steps, your deployment frequency is limited by the availability and speed of the person who performs those steps. If a production server fails and you cannot recreate it from code, your mean time to recovery is measured in hours or days instead of minutes.

Pipeline definitions

Pipeline configuration (.github/workflows/, .gitlab-ci.yml, Jenkinsfile, or equivalent) belongs in the same repository as the code it builds. When pipeline changes go through the same review and automation as application code, teams can modify their own delivery process without tickets or UI-only bottlenecks.

Database schemas and migrations

Database schema changes should be defined as versioned migration scripts, stored in version control, and applied through the pipeline.

What this looks like:

  • Migration scripts in the repository (using tools like Flyway, Liquibase, Alembic, or ActiveRecord migrations)
  • Every schema change is a numbered, ordered migration that can be applied and rolled back
  • Migrations run as part of the deployment pipeline, not as a manual step
  • Schema changes follow the expand-then-contract pattern: add the new column, deploy code that uses it, then remove the old column in a later migration
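
The numbered-migration pattern can be sketched in a few lines with SQLite. This is illustrative, not a replacement for tools like Flyway or Liquibase; the table and column names are invented, and the contract step is commented out because it ships in a later release:

```python
import sqlite3

# Numbered, ordered migrations. The expand-then-contract steps for
# replacing a column are split across separate releases.
MIGRATIONS = [
    (1, "CREATE TABLE users (id INTEGER PRIMARY KEY, fullname TEXT)"),
    (2, "ALTER TABLE users ADD COLUMN display_name TEXT"),   # expand
    # ...deploy code that writes both columns, backfill, then later:
    # (3, "ALTER TABLE users DROP COLUMN fullname"),         # contract
]


def migrate(conn):
    """Apply pending migrations in order, recording each version."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_migrations "
                 "(version INTEGER PRIMARY KEY)")
    applied = {v for (v,) in conn.execute("SELECT version FROM schema_migrations")}
    for version, sql in sorted(MIGRATIONS):
        if version not in applied:
            conn.execute(sql)
            conn.execute("INSERT INTO schema_migrations VALUES (?)", (version,))
    conn.commit()
```

Because applied versions are recorded, the pipeline can run the migration step on every deployment and it only applies what is pending.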

What this replaces:

  • A DBA manually applying SQL scripts during a maintenance window
  • Schema changes that are “just done in production” and not tracked anywhere
  • Database state that has drifted from what is defined in any migration script

Why it matters for CD: Database changes are one of the most common reasons teams cannot deploy continuously. If schema changes require manual intervention, coordinated downtime, or a separate approval process, they become a bottleneck that forces batching. Treating schemas as code with automated migrations removes this bottleneck.

Application configuration

Environment-specific values (connection strings, API endpoints, feature flag states, logging levels) should live in a config management system and flow through a pipeline so the same artifact is deployed to every environment. When configuration is committed and reviewed like code, you eliminate drift between environments and “works in staging” surprises. See Application Config for detailed guidance.

Monitoring, alerting, and observability

Dashboards, alert rules, SLO definitions, and logging configuration should be defined as code (Terraform, Prometheus rules, Datadog monitors-as-code, or equivalent). When you deploy frequently, you need to know instantly whether each deployment is healthy. Monitoring defined as code ensures every service has consistent, reviewed, reproducible observability instead of hand-built dashboards and undocumented alert rules.
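
One way to keep monitoring definitions reviewable and testable is to lint them in the pipeline like any other code. A sketch; the required fields and rule shape are assumptions, loosely modeled on Prometheus-style alert rules:

```python
REQUIRED_FIELDS = {"expr", "severity", "runbook"}


def lint_alert_rules(rules):
    """Return a problem description for each alert rule that is
    missing a required field.

    rules: list of dicts, e.g. loaded from a reviewed rules file.
    """
    problems = []
    for rule in rules:
        missing = REQUIRED_FIELDS - rule.keys()
        if missing:
            name = rule.get("name", "<unnamed>")
            problems.append(f"{name}: missing {sorted(missing)}")
    return problems
```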

Security policies

Security controls (access policies, network rules, secret rotation schedules, compliance checks) should be defined as code and enforced automatically.

What this looks like:

  • IAM policies and RBAC rules defined in Terraform or policy-as-code tools (OPA, Sentinel)
  • Security scanning integrated into the pipeline (SAST, dependency scanning, container image scanning)
  • Secret rotation automated and defined in code
  • Compliance checks that run on every commit, not once a quarter
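
A policy check can start as a small pipeline function before the team adopts a full engine like OPA. A hedged sketch that flags wildcard grants in an AWS-style policy document; the wildcard rule is one common check, not a complete policy suite:

```python
def policy_violations(policy: dict) -> list[str]:
    """Flag Allow statements that grant wildcard actions or resources.

    Expects an AWS-style document: {"Statement": [{"Effect": ...,
    "Action": ..., "Resource": ...}, ...]}.
    """
    violations = []
    for stmt in policy.get("Statement", []):
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        if "*" in actions:
            violations.append("statement allows all actions (*)")
        if stmt.get("Resource") == "*":
            violations.append("statement applies to all resources (*)")
    return violations
```

Run on every commit, a check like this turns a quarterly audit finding into a failed build.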

What this replaces:

  • Security reviews that happen at the end of the development cycle
  • Access policies configured through UIs and never audited
  • Compliance as a manual checklist performed before each release

Why it matters for CD: Security and compliance requirements are the most common organizational blockers for CD. When security controls are defined as code and enforced by the pipeline, you can prove to auditors that every change passed security checks automatically. This is stronger evidence than a manual review, and it does not slow down delivery.

The “One Change, One Process” Test

For every type of artifact in your system, ask:

If I need to change this, do I commit a code change and let the pipeline deliver it?

If the answer is yes, the artifact is managed as code. If the answer involves SSH, a UI, a ticket to another team, or a manual step, it is not.

| Artifact | Managed as code? | If not, the risk is… |
| --- | --- | --- |
| Application source code | Usually yes | - |
| Infrastructure (servers, networks, cloud resources) | Often no | Snowflake environments, slow provisioning, unreproducible disasters |
| Pipeline definitions | Sometimes | Pipeline changes are slow, unreviewed, and risky |
| Database schemas | Sometimes | Schema changes require manual coordination and downtime |
| Application configuration | Sometimes | Config drift between environments, “works in staging” failures |
| Monitoring and alerting | Rarely | Monitoring gaps, unreproducible dashboards, alert fatigue |
| Security policies | Rarely | Security as a gate instead of a guardrail, audit failures |

The goal is for every row in this table to be “yes.” You will not get there overnight, but every artifact you move from manual to code-managed removes a bottleneck and a risk.

How to Get There

Start with what blocks you most

Do not try to move everything to code at once. Identify the artifact type that causes the most pain or blocks deployments most frequently:

  • If environment provisioning takes days, start with infrastructure as code.
  • If database changes are the reason you cannot deploy more than once a week, start with schema migrations as code.
  • If pipeline changes require tickets to a platform team, start with pipeline as code.
  • If configuration drift causes production incidents, start with configuration as code.

Apply the same practices as application code

Once an artifact is defined as code, treat it with the same rigor as application code:

  • Store it in version control (ideally in the same repository as the application it supports)
  • Review changes before they are applied
  • Test changes automatically (linting, dry-runs, policy checks)
  • Deliver changes through a pipeline
  • Never modify the artifact outside of this process

Eliminate manual pathways

The hardest part is closing the manual back doors. As long as someone can SSH into a server and make a change, or click through a UI to modify infrastructure, the code-defined state will drift from reality.

The principle is the same as Single Path to Production for application code: the pipeline is the only way any change reaches production. This applies to infrastructure, configuration, schemas, monitoring, and policies just as much as it applies to application code.
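
Drift detection is the mechanical counterpart to closing the back doors: compare the state defined in code with the state observed live and surface every difference. A minimal sketch over plain dictionaries; real tools such as `terraform plan` do this against provider APIs:

```python
def detect_drift(desired: dict, actual: dict) -> dict:
    """Return each setting whose live value differs from the
    code-defined value, plus any unmanaged settings."""
    drift = {}
    for key, want in desired.items():
        have = actual.get(key)
        if have != want:
            drift[key] = {"defined": want, "actual": have}
    for key in actual.keys() - desired.keys():
        drift[key] = {"defined": None, "actual": actual[key]}  # manual change
    return drift
```

A non-empty result means someone used a back door; the fix is to codify the change or revert it, not to leave the drift in place.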

Measuring Progress

| Metric | What to look for |
| --- | --- |
| Artifact types managed as code | Count of categories fully code-managed; should increase over time |
| Manual changes to production | Changes made outside a pipeline (SSH, UI, manual scripts); target zero |
| Environment recreation time | Time to recreate a production-like environment from scratch; should shrink steadily |
| Mean time to recovery | MTTR drops when recovery means “re-run the pipeline” |