Continuous delivery requires that trunk always be releasable, which means testing it automatically on every change. A collection of tests is not enough. You need a test architecture: different test types working together so the pipeline can confidently deploy any change, even when external systems are unavailable.
Testing Goals for CD
Your test suite must meet these goals before it can support continuous delivery.
| Goal | Target | How to Measure |
|------|--------|----------------|
| Fast | CI gating tests < 10 minutes; full acceptance suite < 1 hour | CI gating suite duration; full acceptance suite duration |
| Deterministic | Same code always produces the same result | Flaky test count: 0 in the gating suite |
| Catches real bugs | Tests fail when behavior is wrong, not when implementation changes | Defect escape rate trending down |
| Independent of external systems | Pipeline can determine deployability without any dependency being available | Trace defects to their origin and prevent entire categories of bugs |
The Ice Cream Cone: What to Avoid
An inverted test distribution, with too many slow end-to-end tests and too few fast unit tests, is the most common testing barrier to CD.
The ice cream cone makes CD impossible. Manual testing gates block every release. End-to-end tests
take hours, fail randomly, and depend on external systems being healthy. For the test architecture
that replaces this, see Pipeline Test Strategy
and the Testing reference.
Next Step
Automate your build process so that building, testing, and packaging happen with a single command. Continue to Build Automation.
Inverted Test Pyramid - Anti-pattern where too many slow E2E tests replace fast unit tests
Pressure to Skip Testing - Anti-pattern where testing is treated as optional under deadline pressure
1 - What to Test - and What Not To
The principles that determine what belongs in your test suite and what does not - focusing on interfaces, isolating what you control, and applying the same pattern to frontend and backend.
Three principles determine what belongs in your test suite and what does not.
If you cannot fix it, do not test for it
You should never test the behavior of
services you consume. Testing their behavior is the responsibility of the team that builds
them. If their service returns incorrect data, you cannot fix that, so testing for it is
waste.
What you should test is how your system responds when a consumed service is unstable or
unavailable. Can you degrade gracefully? Do you return a meaningful error? Do you retry
appropriately? These are behaviors you own and can fix, so they belong in your test suite.
This principle directly enables the pipeline test strategy. When you stop testing things you
cannot fix, you stop depending on external systems in your pipeline. Your tests become faster,
more deterministic, and more focused on the code your team actually ships.
Test interfaces first
Most integration failures originate at interfaces, the boundaries where your system talks to
other systems. These boundaries are the highest-risk areas in your codebase, and they deserve
the most testing attention. But testing interfaces does not require integrating with the real
system on the other side.
When you test an interface you consume, the question is: “Can I understand the response and
act accordingly?” If you send a request for a user’s information, you do not test that you
get that specific user back. You test that you receive and understand the properties you need -
that your code can parse the response structure and make correct decisions based on it. This
distinction matters because it keeps your tests deterministic and focused on what you control.
Use contract mocks, virtual services, or any
test double that faithfully represents the interface contract. The test validates your side of
the conversation, not theirs.
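As a sketch of this distinction, consider a consumed user-profile service. Everything below - the service, its method, and the response fields - is an illustrative assumption, not a real API; the point is that the test asserts on the properties we need and the logic we own, not on getting a specific user back:

```python
# Illustrative sketch: testing our side of a consumed interface.
# The service, method, and field names are assumptions for this example.
from unittest.mock import Mock

def display_name(profile: dict) -> str:
    # Logic we own: derive a display name from the provider's response.
    name = profile.get("name") or profile.get("email", "unknown")
    return name.strip()

# A test double standing in for the real user service.
user_service = Mock()
user_service.get_profile.return_value = {
    "id": "u1", "name": " Ada ", "email": "ada@example.com",
}

profile = user_service.get_profile("u1")
# Assert that we can parse the response structure and act on it,
# not that the provider returns any particular user.
assert display_name(profile) == "Ada"
assert display_name({"email": "ops@example.com"}) == "ops@example.com"
```

The test stays green regardless of whether the real service is up, slow, or returning different users - it validates only the conversation we control.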
Frontend and backend follow the same pattern
Both frontend and backend applications provide interfaces to consumers and consume interfaces
from providers. The only difference is the consumer: a frontend provides an interface for
humans, while a backend provides one for machines. The testing strategy is the same.
Test frontend code the same way you test backend code: validate the interface you provide,
test logic in isolation, and verify that user actions trigger the correct behavior. The only
difference is the consumer (a human instead of a machine).
For a frontend:
Validate the interface you provide. Verify that the UI contains the components it should and that they appear correctly. This is the equivalent of verifying your API returns the right response structure.
Test behavior isolated from presentation. Use your unit test framework to test the
logic that UI controls trigger, separated from the rendering layer. This gives you the same
speed and control you get from testing backend logic in isolation.
Verify that controls trigger the right logic. Confirm that user actions invoke the
correct behavior, without needing a running backend or browser-based E2E test.
This approach gives you targeted testing with far more control. Testing exception flows -
what happens when a service returns an error, when a network request times out, when data is
malformed - becomes straightforward instead of requiring elaborate E2E setups that are hard
to make fail on demand.
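A minimal sketch of the pattern (Python stands in for your frontend language here; the handler, service, and messages are hypothetical): the logic a button triggers is an ordinary function, so exception flows are one stub away instead of one E2E environment away.

```python
# Sketch: UI behavior tested in isolation from rendering. The handler and
# messages are hypothetical; Python stands in for your frontend language.
def submit_order(cart, place_order, notify):
    # The logic a "Submit" button would trigger, free of any rendering code.
    if not cart:
        notify("Your cart is empty")
        return None
    try:
        return place_order(cart)
    except TimeoutError:
        notify("The order service timed out - please retry")
        return None

messages = []

def timing_out_service(cart):
    # Exception flow on demand - no elaborate E2E setup required.
    raise TimeoutError

assert submit_order([], timing_out_service, messages.append) is None
assert submit_order(["book"], timing_out_service, messages.append) is None
assert messages == [
    "Your cart is empty",
    "The order service timed out - please retry",
]
```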
Test Quality Over Coverage Percentage
Code coverage tells you which lines executed during tests. It does not tell you whether the tests
verified anything meaningful. A test suite with 90% coverage and no assertions has high coverage
and zero value.
Better questions than “what is our coverage percentage?”:
When a test fails, does it point directly to the defect?
When we refactor, do tests break because behavior changed or because implementation details
shifted?
Do our tests catch the bugs that actually reach production?
Can a developer trust a green build enough to deploy immediately?
Why coverage mandates are harmful
When teams are required to hit a coverage target, they
write tests to satisfy the metric rather than to verify behavior. This produces:
Tests that exercise code paths without asserting outcomes
Tests that mirror implementation rather than specify behavior
Tests that inflate the number without improving confidence
The metric goes up while the defect escape rate stays the same. Worse, meaningless tests add
maintenance cost and slow down the suite.
Instead of mandating a coverage number, set a coverage floor (see
Getting Started)
and focus team attention on test quality: mutation testing scores, defect escape rates, and
whether developers actually trust the suite enough to deploy on green.
Test Doubles - Patterns for isolating dependencies in tests
Contract Tests - Verifying that test doubles match reality
2 - Pipeline Test Strategy
What tests run where in a CD pipeline, how contract tests validate the test doubles used inside the pipeline, and why everything that blocks deployment must be deterministic.
Everything that blocks deployment must be deterministic and under your control. Everything
that involves external systems runs asynchronously or post-deployment. This gives you the
independence to deploy any time, regardless of the state of the world around you.
Tests Inside the Pipeline
These tests run on every commit and block deployment if they fail. They must be fast,
deterministic, and free of external dependencies.
Every test in this pipeline uses test doubles for
external dependencies. No test calls a real external API, database, or third-party service. This
means:
A downstream outage cannot block your deployment. Your pipeline runs the same whether
external systems are healthy or down.
Tests are deterministic. The same code always produces the same result.
The suite is fast. No network latency, no waiting for external systems to respond.
Why re-run tests post-merge?
Two changes can each pass pre-merge independently but conflict when combined on trunk. The
post-merge run catches these integration effects. If a post-merge failure occurs, the team
fixes it immediately. Trunk must always be releasable.
Tests Outside the Pipeline
These tests involve real external systems and are therefore non-deterministic. They never
block deployment. Instead, they validate assumptions and monitor production health.
| Test Type | When It Runs | What It Does on Failure |
|-----------|--------------|--------------------------|
| Contract tests | On a schedule (hourly or daily) | Triggers review; team updates test doubles to match new reality |
The pipeline’s deterministic tests depend on test doubles to represent external systems. But
test doubles can drift from reality. An API adds a required field, changes a response format,
or deprecates an endpoint. Contract tests close this gap.
Pipeline tests use test doubles that encode your assumptions about external APIs -
response schemas, status codes, error formats.
Contract tests run on a schedule and send real requests to the actual external APIs.
Contract tests compare the real response against what your test doubles return. They
check structure and types, not specific data values.
When a contract test passes, your test doubles are confirmed accurate. The pipeline’s
deterministic tests are trustworthy.
When a contract test fails, the team is alerted. They update the test doubles to match
the new reality, then re-run component tests to verify nothing breaks.
This design means your pipeline never touches external systems, but you still catch when
external systems change. You get both speed and accuracy.
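A scheduled contract check can be sketched as a shape comparison. The endpoint and field names below are illustrative assumptions; in the real test, the response would come from an HTTP call to the actual provider:

```python
# Minimal sketch of a scheduled contract check: compare the shape of a real
# provider response against the shape our test doubles encode.
EXPECTED_SHAPE = {"id": str, "name": str, "active": bool}  # what our doubles return

def matches_shape(payload: dict, shape: dict) -> bool:
    # Check structure and types, never specific data values.
    return all(
        key in payload and isinstance(payload[key], expected_type)
        for key, expected_type in shape.items()
    )

# In the scheduled run, `real_response` would be fetched from the actual
# external API; a literal stands in here.
real_response = {"id": "42", "name": "Ada", "active": True, "plan": "pro"}
assert matches_shape(real_response, EXPECTED_SHAPE)   # doubles still accurate

drifted = {"id": 42, "name": "Ada", "active": True}   # id became a number
assert not matches_shape(drifted, EXPECTED_SHAPE)     # alert: update the doubles
```

Extra fields in the real response are ignored deliberately: the contract covers only the properties your code consumes.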
Consumer-driven contracts
When the external API is owned by another team in your organization, you can go further with
consumer-driven contracts. Instead of your team polling their API on a schedule, both teams
share a contract specification (using a tool like Pact):
You (the consumer) define the requests you send and the responses you expect.
They (the provider) run your contract as part of their build. If a change would break
your expectations, their build fails before they deploy.
Your test doubles are generated from the contract, guaranteeing they match what the
provider actually delivers.
This shifts contract validation from “detect and react” to “prevent.” See
Contract Tests for implementation details.
Summary: All Stages at a Glance
| Stage | Blocks Deployment? | Uses Test Doubles? | Deterministic? |
|-------|--------------------|--------------------|----------------|
| Every Commit | Yes | Yes - all external deps | Yes |
| Post-Merge | Yes | Yes - all external deps | Yes |
| Scheduled (Contract) | No - triggers review | No - hits real APIs | No |
| Post-Deploy (E2E) | No - triggers rollback | No - real system | No |
| Production (Monitoring) | No - triggers alerts | No - real system | No |
The Testing reference provides detailed documentation
for each test type, including code examples and anti-patterns.
3 - Getting Started
Practical steps to audit your test suite, fix flaky tests, decouple from external dependencies, and adopt test-driven development.
Starting Without Full Coverage
Teams often delay adopting CI because their existing code lacks tests. This is backwards. You do
not need tests for existing code to begin. You need one rule applied without exception:
Every new change gets a test. We will not go lower than the current level of code coverage.
Record your current coverage percentage as a baseline. Configure CI to fail if coverage drops
below that number. This does not mean the baseline is good enough. It means the trend only moves
in one direction. Every bug fix, every new feature, and every refactoring adds tests. Over time,
coverage grows organically in the areas that matter most: the code that is actively changing.
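The ratchet can be a small CI gate. A hedged sketch follows - the baseline file name and JSON shape are assumptions, and most coverage tools (coverage.py, Istanbul, JaCoCo) can emit an equivalent report or enforce the floor natively:

```python
# Sketch of a coverage-floor gate: fail CI when coverage drops below the
# recorded baseline. Baseline file name and JSON shape are assumptions.
import json
import os
import tempfile

def check_coverage_floor(current: float, baseline_file: str) -> bool:
    # True when current line coverage meets or exceeds the recorded baseline.
    with open(baseline_file) as f:
        baseline = json.load(f)["line_coverage"]
    return current >= baseline

# Record today's coverage once as the baseline...
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump({"line_coverage": 72.5}, f)
    baseline_path = f.name

# ...then gate every CI run against it: the trend only moves in one direction.
assert check_coverage_floor(73.1, baseline_path)      # coverage grew: build passes
assert not check_coverage_floor(70.0, baseline_path)  # coverage dropped: build fails
os.unlink(baseline_path)
```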
Do not attempt to retrofit tests across the entire codebase before starting CI. That approach
takes months and delivers no incremental value. It also produces low-quality tests written by
developers who are testing code they did not write and do not fully understand.
Quick-Start Action Plan
If your test suite is not yet ready to support CD, use this focused action plan to make immediate
progress.
1. Audit your current test suite
Assess where you stand before making changes.
Actions:
Run your full test suite 3 times. Note total duration and any tests that pass intermittently
(flaky tests).
Count tests by type: unit, integration, functional, end-to-end.
Identify tests that require external dependencies (databases, APIs, file systems) to run.
Record your baseline: total test count, pass rate, duration, flaky test count.
Map each test type to a pipeline stage. Which tests gate deployment? Which run asynchronously?
Which tests couple your deployment to external systems?
Output: A clear picture of your test distribution and the specific problems to address.
2. Fix or remove flaky tests
Flaky tests are worse than no tests. They train developers to ignore failures, which means real
failures also get ignored.
Actions:
Quarantine all flaky tests immediately. Move them to a separate suite that does not block the
build.
For each quarantined test, decide: fix it (if the behavior it tests matters) or delete it (if
it does not).
Common causes of flakiness: timing dependencies, shared mutable state, reliance on external
services, test order dependencies.
Target: zero flaky tests in your main test suite.
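Timing dependencies are often the easiest of these causes to eliminate. A sketch (the token-expiry function is hypothetical): inject the clock so the test controls time instead of sleeping and hoping.

```python
# Fixing a timing-dependent flaky test: replace real time with an
# injected clock. The token-expiry function is a hypothetical example.
import time

def is_token_expired(issued_at: float, ttl_seconds: float, now=time.time) -> bool:
    # `now` is injectable so tests control time instead of sleeping.
    return now() - issued_at >= ttl_seconds

# Deterministic tests: fixed clocks, no sleeps, no flakiness.
assert is_token_expired(issued_at=100.0, ttl_seconds=60.0, now=lambda: 200.0)
assert not is_token_expired(issued_at=100.0, ttl_seconds=60.0, now=lambda: 120.0)
```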
3. Decouple your pipeline from external dependencies
This is the highest-leverage change for CD. Identify every test that calls a real external service
and replace that dependency with a test double.
Actions:
List every external service your tests depend on: databases, APIs, message queues, file
storage, third-party services.
For each dependency, decide the right test double approach:
In-memory fakes for databases (e.g., SQLite, H2, testcontainers with local instances).
HTTP stubs for external APIs (e.g., WireMock, nock, MSW).
Fakes for message queues, email services, and other infrastructure.
Replace the dependencies in your unit and component tests.
Move the original tests that hit real services into a separate suite. These become your
starting contract tests or E2E smoke tests.
Output: A test suite where everything that blocks the build is deterministic and runs without
network access to external systems.
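As a sketch of the in-memory-fake approach (the repository interface and user fields are illustrative assumptions), the logic under test stays identical whether the dependency is real or fake:

```python
# Sketch: replacing a real database repository with an in-memory fake.
# The repository interface and user fields are illustrative assumptions.
class InMemoryUserRepo:
    # Fake with the same interface as the real repository; no network needed.
    def __init__(self):
        self._users = {}

    def save(self, user_id, data):
        self._users[user_id] = data

    def get(self, user_id):
        return self._users.get(user_id)

def deactivate_user(repo, user_id):
    # Logic under test: unchanged whether the repo is real or fake.
    user = repo.get(user_id)
    if user is None:
        return False
    user["active"] = False
    repo.save(user_id, user)
    return True

repo = InMemoryUserRepo()
repo.save("u1", {"active": True})
assert deactivate_user(repo, "u1")
assert repo.get("u1")["active"] is False
assert not deactivate_user(repo, "missing")
```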
4. Add component tests for critical paths
If you do not have component tests that exercise your whole service in
isolation, start with the most critical paths.
Actions:
Identify the 3-5 most critical user journeys or API endpoints in your application.
Write a component test for each: boot the application, stub external dependencies, send a
real request or simulate a real user action, verify the response.
Each component test should prove that the feature works correctly assuming external
dependencies behave as expected (which your test doubles encode).
Run these in CI on every commit.
Output: Component tests covering your critical paths, running in CI on every commit.
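A component test can be sketched as follows. Everything here - the route, the handler, and the payment gateway - is hypothetical; the shape to notice is that the whole service is driven through its real entry point with the external dependency injected as a stub:

```python
# Component-test sketch: drive the service through its entry point with the
# external payment gateway stubbed. Route, handler, and gateway are hypothetical.
def make_app(payment_gateway):
    # "Boot the application" with its external dependencies injected.
    def handle(method, path, body=None):
        if (method, path) == ("POST", "/checkout"):
            charged = payment_gateway.charge(body["amount"])
            return (200, {"status": "paid"}) if charged else (402, {"status": "declined"})
        return (404, {})
    return handle

class StubGateway:
    # Test double encoding our assumption: the gateway succeeds or fails.
    def __init__(self, succeed):
        self.succeed = succeed

    def charge(self, amount):
        return self.succeed

# Critical path, happy flow: the feature works assuming the gateway
# behaves as our double encodes.
status, body = make_app(StubGateway(succeed=True))("POST", "/checkout", {"amount": 10})
assert (status, body["status"]) == (200, "paid")

# Exception flow: a declined payment, produced on demand.
status, body = make_app(StubGateway(succeed=False))("POST", "/checkout", {"amount": 10})
assert status == 402
```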
5. Set up contract tests for your most important dependency
Pick the external dependency that changes most frequently or has caused the most production
issues. Set up a contract test for it.
Actions:
Write a contract test that validates the response structure (types, required fields, status
codes) of the dependency’s API.
Run it on a schedule (e.g., every hour or daily), not on every commit.
When it fails, update your test doubles to match the new reality and re-verify your
component tests.
If the dependency is owned by another team in your organization, explore consumer-driven
contracts with a tool like Pact.
Output: One contract test running on a schedule, with a process to update test doubles when it fails.
6. Adopt TDD for new code
Once your pipeline tests are reliable, adopt TDD for all new work. TDD is the practice of writing the test before the code. It ensures every
piece of behavior has a corresponding test.
The TDD cycle
Red: Write a failing test that describes the behavior you want.
Green: Write the minimum code to make the test pass.
Refactor: Improve the code without changing the behavior. The test ensures you do not
break anything.
Why TDD matters for CD
Every change is automatically covered by a test
The test suite grows proportionally with the codebase
Tests describe behavior, not implementation, making them more resilient to refactoring
Developers get immediate feedback on whether their change works
TDD is not mandatory for CD, but teams that practice TDD consistently have significantly faster
and more reliable test suites.
How to start: Pick one new feature or bug fix this week. Write the test first, watch it
fail, write the code to make it pass, then refactor. Do not try to retroactively TDD your
entire codebase. Apply TDD to new code and to any code you modify.
Output: Team members practicing TDD on new work, with at least one completed red-green-refactor cycle.
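One full cycle can be sketched with a hypothetical feature (the `slugify` function below is an illustration, not from any codebase):

```python
# One red-green-refactor cycle, sketched.
# Red: the assertions below fail before `slugify` exists.
# Green: this minimal implementation makes them pass.
# Refactor: now improve the code freely - the tests guard the behavior.
def slugify(title: str) -> str:
    return "-".join(title.lower().split())

assert slugify("Continuous Delivery") == "continuous-delivery"
assert slugify("  Trunk   Based  Development ") == "trunk-based-development"
```

Note that the tests describe behavior (input and expected output), not implementation, so a later refactor - say, switching to a regex - leaves them untouched.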
How to trace defects to their origin and make systemic changes that prevent entire categories of bugs from recurring.
Treat every test failure as diagnostic data about where your process breaks down, not just as
something to fix. When you identify the systemic source of defects, you can prevent entire
categories from recurring.
Two questions sharpen this thinking:
What is the earliest point we can detect this defect? The later a defect is found, the
more expensive it is to fix. A requirements defect caught during example mapping costs
minutes. The same defect caught in production costs days of incident response, rollback,
and rework.
Can AI help us detect it earlier? AI-assisted tools can now surface defects at stages
where only human review was previously possible, shifting detection left without adding
manual effort.
Trace Every Defect to Its Origin
When a test catches a defect (or worse, when a defect escapes to production), ask: where was
this defect introduced, and what would have prevented it from being created?
Defects do not originate randomly. They cluster around specific causes. The
CD Defect Detection and Remediation Catalog
documents over 30 defect types across eight categories, with detection methods, AI
opportunities, and systemic fixes for each.
| Category | Example Defects | Earliest Detection | Systemic Fix |
|----------|-----------------|--------------------|--------------|
| Requirements | Building the right thing wrong, or the wrong thing right | Discovery, during story refinement or example mapping | Acceptance criteria as user outcomes, Three Amigos sessions, example mapping |
| Missing domain knowledge | Business rules encoded incorrectly, tribal knowledge loss | During coding, when the developer writes the logic | Ubiquitous language (DDD), pair programming, rotate ownership |
| Integration boundaries | Interface mismatches, wrong assumptions about upstream behavior | During design, when defining the interface contract | Contract tests per boundary, API-first design, circuit breakers |
| Untested edge cases | Null handling, boundary values, error paths | Pre-commit, through null-safe type systems and static analysis | Property-based testing, boundary value analysis, test for every bug fix |
| | | Pre-commit for null safety; CI for schema compatibility | Null-safe types, expand-then-contract for schema changes, design for idempotency |
For the complete catalog covering all defect categories (including product and discovery,
dependency and infrastructure, testing and observability gaps, and more) see the
CD Defect Detection and Remediation Catalog.
Build a Defect Feedback Loop
You need a process that systematically connects test
failures to root causes and root causes to systemic fixes.
Classify every defect. When a test fails or a bug is reported, tag it with its origin
category from the tables above. This takes seconds and builds a dataset over time.
Look for patterns. Monthly (or during retrospectives), review the defect
classifications. Which categories appear most often? That is where your process is weakest.
Apply the systemic fix, not just the local fix. When you fix a bug, also ask: what
systemic change would prevent this entire category of bug? If most defects come from
integration boundaries, the fix is not “write more integration tests.” It is “make contract
tests mandatory for every new boundary.” If most defects come from untested edge cases, the
fix is not “increase code coverage.” It is “adopt property-based testing as a standard
practice.”
Measure whether the fix works. Track defect counts by category over time. If you
applied a systemic fix for integration boundary defects and the count does not drop, the fix
is not working and you need a different approach.
The Test-for-Every-Bug-Fix Rule
Every bug fix must include a test that reproduces the bug before the fix and passes after.
This is non-negotiable for CD because:
It proves the fix actually addresses the defect (not just the symptom).
It prevents the same defect from recurring.
It builds test coverage exactly where the codebase is weakest: the places where bugs actually
occur.
Over time, it shifts your test suite from “tests we thought to write” to “tests that cover
real failure modes.”
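The rule in miniature (a hypothetical bug, for illustration): the regression test reproduces the crash before the fix and pins the behavior after it.

```python
# Sketch of the test-for-every-bug-fix rule. Hypothetical bug: the discount
# lookup crashed when the code was None. The guard below is the fix; the first
# assertion reproduced the bug (AttributeError) before the fix existed.
DISCOUNTS = {"SAVE10": 10}

def discount_percent(code):
    if not code:              # the fix: guard the None/empty-string case
        return 0
    return DISCOUNTS.get(code.upper(), 0)

# Regression tests: fail on the pre-fix version, pass after, and prevent
# the same defect from recurring.
assert discount_percent(None) == 0
assert discount_percent("save10") == 10
assert discount_percent("unknown") == 0
```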
Advanced Detection Techniques
As your test architecture matures, add techniques that catch defects before manual review:
| Technique | What It Finds | When to Adopt |
|-----------|---------------|---------------|
| Mutation testing (Stryker, PIT) | Tests that pass but do not actually verify behavior (your test suite's blind spots) | When basic coverage is in place but defect escape rate is not dropping |
| Property-based testing | Edge cases and boundary conditions across large input spaces that example-based tests miss | When defects cluster around unexpected input combinations |
| Chaos engineering | Failure modes in distributed systems: what happens when a dependency is slow, returns errors, or disappears | When you have component tests and contract tests in place and need confidence in failure handling |
| Static analysis and linting | Null safety violations, type errors, security vulnerabilities, dead code | |