Testing Fundamentals

Build a test architecture that gives your pipeline the confidence to deploy any change, even when dependencies outside your control are unavailable.

Phase 1 - Foundations

Continuous delivery requires that trunk always be releasable, which means testing it automatically on every change. A collection of tests is not enough. You need a test architecture: different test types working together so the pipeline can confidently deploy any change, even when external systems are unavailable.

Testing Goals for CD

Your test suite must meet these goals before it can support continuous delivery.

| Goal | Target | How to Measure |
|---|---|---|
| Fast | CI gating tests < 10 minutes; full acceptance suite < 1 hour | CI gating suite duration; full acceptance suite duration |
| Deterministic | Same code always produces the same result | Flaky test count: 0 in the gating suite |
| Catches real bugs | Tests fail when behavior is wrong, not when implementation changes | Defect escape rate trending down |
| Independent of external systems | Pipeline can determine deployability without any dependency being available | External dependencies in gating tests: 0 |
| Test doubles stay current | Contract tests confirm test doubles match reality | All contract tests passing within last 24 hours |
| Coverage trends up | Every new change gets a test | Coverage percentage increasing over time |

In This Section

| Page | What You’ll Learn |
|---|---|
| What to Test | Which boundaries matter and how to eliminate external dependencies from your pipeline |
| Pipeline Test Strategy | What tests run where in a CD pipeline and how contract tests validate test doubles |
| Getting Started | Audit your current suite, fix flaky tests, and decouple from external systems |
| Defect Feedback Loop | Trace defects to their origin and prevent entire categories of bugs |

The Ice Cream Cone: What to Avoid

An inverted test distribution, with too many slow end-to-end tests and too few fast unit tests, is the most common testing barrier to CD.

Figure: The ice cream cone anti-pattern. An inverted test distribution where most testing effort goes to manual and end-to-end tests at the top, with too few fast unit tests at the bottom.

The ice cream cone makes CD impossible. Manual testing gates block every release. End-to-end tests take hours, fail randomly, and depend on external systems being healthy. For the test architecture that replaces this, see Pipeline Test Strategy and the Testing reference.

Next Step

Automate your build process so that building, testing, and packaging happen with a single command. Continue to Build Automation.


Content contributed by Dojo Consortium, licensed under CC BY 4.0. Additional concepts drawn from Ham Vocke, The Practical Test Pyramid, and Toby Clemson, Testing Strategies in a Microservice Architecture.


1 - What to Test - and What Not To

The principles that determine what belongs in your test suite and what does not - focusing on interfaces, isolating what you control, and applying the same pattern to frontend and backend.

Three principles determine what belongs in your test suite and what does not.

If you cannot fix it, do not test for it

You should never test the behavior of services you consume. Testing their behavior is the responsibility of the team that builds them. If their service returns incorrect data, you cannot fix that, so testing for it is waste.

What you should test is how your system responds when a consumed service is unstable or unavailable. Can you degrade gracefully? Do you return a meaningful error? Do you retry appropriately? These are behaviors you own and can fix, so they belong in your test suite.

This principle directly enables the pipeline test strategy. When you stop testing things you cannot fix, you stop depending on external systems in your pipeline. Your tests become faster, more deterministic, and more focused on the code your team actually ships.
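As a minimal sketch of this principle (the `RecommendationClient` names are illustrative, not from a specific codebase): the test exercises your fallback path with a double that simulates the dependency being down, so it never depends on the real service.

```python
# Sketch: test how *your* code behaves when a consumed service fails.
# Names here are hypothetical, for illustration only.

class ServiceUnavailableError(Exception):
    pass

class FailingRecommendationClient:
    """Test double that simulates the dependency being down."""
    def fetch(self, user_id):
        raise ServiceUnavailableError("503 Service Unavailable")

def get_recommendations(client, user_id, default=()):
    """Our code under test: degrade gracefully to a default list."""
    try:
        return client.fetch(user_id)
    except ServiceUnavailableError:
        return list(default)  # behavior we own and can fix

def test_degrades_gracefully_when_dependency_is_down():
    result = get_recommendations(
        FailingRecommendationClient(), user_id=42, default=["bestsellers"]
    )
    assert result == ["bestsellers"]
```

The test asserts on the degraded behavior you own, not on anything the remote team owns.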

Test interfaces first

Most integration failures originate at interfaces, the boundaries where your system talks to other systems. These boundaries are the highest-risk areas in your codebase, and they deserve the most testing attention. But testing interfaces does not require integrating with the real system on the other side.

When you test an interface you consume, the question is: “Can I understand the response and act accordingly?” If you send a request for a user’s information, you do not test that you get that specific user back. You test that you receive and understand the properties you need - that your code can parse the response structure and make correct decisions based on it. This distinction matters because it keeps your tests deterministic and focused on what you control.

Use contract mocks, virtual services, or any test double that faithfully represents the interface contract. The test validates your side of the conversation, not theirs.
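A small sketch of that distinction, with an illustrative canned response (the field names are hypothetical): the test checks that your code can read the properties it needs, not that the provider returns a specific user.

```python
# Canned response representing the interface contract, not real data.
CANNED_USER_RESPONSE = {
    "id": "u-123",
    "email": "test@example.com",
    "status": "active",
    "field_we_ignore": True,  # providers may add fields; we only read ours
}

def can_send_newsletter(user_response):
    """Our side of the conversation: parse the properties we need
    and make a decision based on them."""
    return user_response["status"] == "active" and "email" in user_response

def test_understands_user_response():
    assert can_send_newsletter(CANNED_USER_RESPONSE)
    assert not can_send_newsletter({"id": "u-9", "status": "suspended"})
```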

Frontend and backend follow the same pattern

Both frontend and backend applications provide interfaces to consumers and consume interfaces from providers. The only difference is the consumer: a frontend provides an interface for humans, while a backend provides one for machines. The testing strategy is the same.

Test frontend code the same way you test backend code: validate the interface you provide, test logic in isolation, and verify that user actions trigger the correct behavior.

For a frontend:

  • Validate the interface you provide. The UI contains the components it should and they appear correctly. This is the equivalent of verifying your API returns the right response structure.
  • Test behavior isolated from presentation. Use your unit test framework to test the logic that UI controls trigger, separated from the rendering layer. This gives you the same speed and control you get from testing backend logic in isolation.
  • Verify that controls trigger the right logic. Confirm that user actions invoke the correct behavior, without needing a running backend or browser-based E2E test.

This approach gives you targeted testing with far more control. Testing exception flows (what happens when a service returns an error, when a network request times out, when data is malformed) becomes straightforward instead of requiring elaborate E2E setups that are hard to make fail on demand.

Test Quality Over Coverage Percentage

Code coverage tells you which lines executed during tests. It does not tell you whether the tests verified anything meaningful. A test suite with 90% coverage and no assertions has high coverage and zero value.
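A contrived sketch makes the gap concrete: both tests below produce identical coverage, but only one can ever fail.

```python
def apply_discount(price, percent):
    return price - price * percent / 100

# Executes every line (counts as "covered") but verifies nothing.
# It passes even if apply_discount is completely wrong.
def test_no_assertion():
    apply_discount(100, 10)

# Same coverage, real value: fails the moment the behavior is wrong.
def test_with_assertion():
    assert apply_discount(100, 10) == 90.0
```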

Better questions than “what is our coverage percentage?”:

  • When a test fails, does it point directly to the defect?
  • When we refactor, do tests break because behavior changed or because implementation details shifted?
  • Do our tests catch the bugs that actually reach production?
  • Can a developer trust a green build enough to deploy immediately?

Why coverage mandates are harmful

When teams are required to hit a coverage target, they write tests to satisfy the metric rather than to verify behavior. This produces:

  • Tests that exercise code paths without asserting outcomes
  • Tests that mirror implementation rather than specify behavior
  • Tests that inflate the number without improving confidence

The metric goes up while the defect escape rate stays the same. Worse, meaningless tests add maintenance cost and slow down the suite.

Instead of mandating a coverage number, set a coverage floor (see Getting Started) and focus team attention on test quality: mutation testing scores, defect escape rates, and whether developers actually trust the suite enough to deploy on green.


2 - Pipeline Test Strategy

What tests run where in a CD pipeline, how contract tests validate the test doubles used inside the pipeline, and why everything that blocks deployment must be deterministic.

Everything that blocks deployment must be deterministic and under your control. Everything that involves external systems runs asynchronously or post-deployment. This gives you the independence to deploy any time, regardless of the state of the world around you.

Tests Inside the Pipeline

These tests run on every commit and block deployment if they fail. They must be fast, deterministic, and free of external dependencies.

Figure: Tests inside the pipeline. The pre-merge stage runs static analysis, unit tests, integration tests, and component tests in under 10 minutes; post-merge re-runs the full deterministic suite. All external dependencies are replaced by test doubles.

Every test in this pipeline uses test doubles for external dependencies. No test calls a real external API, database, or third-party service. This means:

  • A downstream outage cannot block your deployment. Your pipeline runs the same whether external systems are healthy or down.
  • Tests are deterministic. The same code always produces the same result.
  • The suite is fast. No network latency, no waiting for external systems to respond.

Why re-run tests post-merge?

Two changes can each pass pre-merge independently but conflict when combined on trunk. The post-merge run catches these integration effects. If a post-merge failure occurs, the team fixes it immediately. Trunk must always be releasable.

Tests Outside the Pipeline

These tests involve real external systems and are therefore non-deterministic. They never block deployment. Instead, they validate assumptions and monitor production health.

Figure: Tests outside the pipeline. Contract tests run on a schedule to validate test doubles against real APIs; post-deployment runs E2E smoke tests and synthetic monitoring. Failures trigger test double updates, rollback, or alerts, never a deployment block.


| Test Type | When It Runs | What It Does on Failure |
|---|---|---|
| Contract tests | On a schedule (hourly or daily) | Triggers review; team updates test doubles to match new reality |
| E2E smoke tests | After each deployment | Triggers rollback if critical path is broken |
| Synthetic monitoring | Continuously in production | Triggers alerts for operations |

How Contract Tests Validate Test Doubles

The pipeline’s deterministic tests depend on test doubles to represent external systems. But test doubles can drift from reality. An API adds a required field, changes a response format, or deprecates an endpoint. Contract tests close this gap.

Figure: How contract tests validate test doubles. Inside the pipeline, your code calls test doubles that return canned responses. Outside the pipeline, contract tests send real requests to external APIs and compare the response schema against the test double definitions. A match confirms accuracy; a mismatch triggers an alert to update the test doubles and re-verify.

  1. Pipeline tests use test doubles that encode your assumptions about external APIs - response schemas, status codes, error formats.
  2. Contract tests run on a schedule and send real requests to the actual external APIs.
  3. Contract tests compare the real response against what your test doubles return. They check structure and types, not specific data values.
  4. When a contract test passes, your test doubles are confirmed accurate. The pipeline’s deterministic tests are trustworthy.
  5. When a contract test fails, the team is alerted. They update the test doubles to match the new reality, then re-run component tests to verify nothing breaks.

This design means your pipeline never touches external systems, but you still catch when external systems change. You get both speed and accuracy.
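The comparison step (point 3 above) can be sketched as a structural check. This is a simplified stand-in: here the "real" response is a literal, where a scheduled contract test would fetch it from the live API.

```python
# Sketch: compare structure and types, not specific data values.

DOUBLE_RESPONSE = {"id": "u-1", "email": "a@example.com", "active": True}

def schemas_match(double, real):
    """Same keys and same value types; the concrete data may differ."""
    if double.keys() != real.keys():
        return False
    return all(type(double[k]) is type(real[k]) for k in double)

# Matching structure with different values: the double is still accurate.
real_response = {"id": "u-999", "email": "b@example.com", "active": False}
assert schemas_match(DOUBLE_RESPONSE, real_response)

# The provider dropped a field: the contract test flags the drift.
drifted = {"id": "u-999", "email": "b@example.com"}
assert not schemas_match(DOUBLE_RESPONSE, drifted)
```

Real contract-testing tools perform richer checks (optional fields, nested schemas, status codes), but the principle is the same.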

Consumer-driven contracts

When the external API is owned by another team in your organization, you can go further with consumer-driven contracts. Instead of your team polling their API on a schedule, both teams share a contract specification (using a tool like Pact):

  • You (the consumer) define the requests you send and the responses you expect.
  • They (the provider) run your contract as part of their build. If a change would break your expectations, their build fails before they deploy.
  • Your test doubles are generated from the contract, guaranteeing they match what the provider actually delivers.

This shifts contract validation from “detect and react” to “prevent.” See Contract Tests for implementation details.

Summary: All Stages at a Glance

| Stage | Blocks Deployment? | Uses Test Doubles? | Deterministic? |
|---|---|---|---|
| Every Commit | Yes | Yes - all external deps | Yes |
| Post-Merge | Yes | Yes - all external deps | Yes |
| Scheduled (Contract) | No - triggers review | No - hits real APIs | No |
| Post-Deploy (E2E) | No - triggers rollback | No - real system | No |
| Production (Monitoring) | No - triggers alerts | No - real system | No |

The Testing reference provides detailed documentation for each test type, including code examples and anti-patterns.


3 - Getting Started

Practical steps to audit your test suite, fix flaky tests, decouple from external dependencies, and adopt test-driven development.

Starting Without Full Coverage

Teams often delay adopting CI because their existing code lacks tests. This is backwards. You do not need tests for existing code to begin. You need one rule applied without exception:

Every new change gets a test. We will not go lower than the current level of code coverage.

Record your current coverage percentage as a baseline. Configure CI to fail if coverage drops below that number. This does not mean the baseline is good enough. It means the trend only moves in one direction. Every bug fix, every new feature, and every refactoring adds tests. Over time, coverage grows organically in the areas that matter most: the code that is actively changing.

Do not attempt to retrofit tests across the entire codebase before starting CI. That approach takes months and delivers no incremental value. It also produces low-quality tests written by developers who are testing code they did not write and do not fully understand.

Quick-Start Action Plan

If your test suite is not yet ready to support CD, use this focused action plan to make immediate progress.

1. Audit your current test suite

Assess where you stand before making changes.

Actions:

  • Run your full test suite 3 times. Note total duration and any tests that pass intermittently (flaky tests).
  • Count tests by type: unit, integration, functional, end-to-end.
  • Identify tests that require external dependencies (databases, APIs, file systems) to run.
  • Record your baseline: total test count, pass rate, duration, flaky test count.
  • Map each test type to a pipeline stage. Which tests gate deployment? Which run asynchronously? Which tests couple your deployment to external systems?

Output: A clear picture of your test distribution and the specific problems to address.

2. Fix or remove flaky tests

Flaky tests are worse than no tests. They train developers to ignore failures, which means real failures also get ignored.

Actions:

  • Quarantine all flaky tests immediately. Move them to a separate suite that does not block the build.
  • For each quarantined test, decide: fix it (if the behavior it tests matters) or delete it (if it does not).
  • Common causes of flakiness: timing dependencies, shared mutable state, reliance on external services, test order dependencies.
  • Target: zero flaky tests in your main test suite.
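A common fix for the timing-dependency cause above is to inject the clock instead of reading it. A minimal sketch (the happy-hour rule is illustrative):

```python
import datetime

# Flaky: depends on the real clock, so the result changes with
# the time of day the suite happens to run.
def is_happy_hour_flaky():
    return datetime.datetime.now().hour == 17

# Deterministic: the caller supplies the time, so tests control it.
def is_happy_hour(now):
    return now.hour == 17

def test_happy_hour_deterministic():
    assert is_happy_hour(datetime.datetime(2024, 1, 1, 17, 30))
    assert not is_happy_hour(datetime.datetime(2024, 1, 1, 9, 0))
```

The same move (inject the dependency, control it in the test) also resolves shared state and external-service flakiness.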

3. Decouple your pipeline from external dependencies

This is the highest-leverage change for CD. Identify every test that calls a real external service and replace that dependency with a test double.

Actions:

  • List every external service your tests depend on: databases, APIs, message queues, file storage, third-party services.
  • For each dependency, decide the right test double approach:
    • In-memory or local fakes for databases (e.g., SQLite, H2, or Testcontainers running throwaway local instances).
    • HTTP stubs for external APIs (e.g., WireMock, nock, MSW).
    • Fakes for message queues, email services, and other infrastructure.
  • Replace the dependencies in your unit and component tests.
  • Move the original tests that hit real services into a separate suite. These become your starting contract tests or E2E smoke tests.

Output: A test suite where everything that blocks the build is deterministic and runs without network access to external systems.
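A minimal sketch of the swap itself (the email-gateway names are illustrative): the fake implements the same interface as the real adapter, so tests that block the build never touch the network.

```python
class RealEmailGateway:
    """Production adapter; never exercised by pipeline tests."""
    def send(self, to, subject):
        raise RuntimeError("would make a network call")

class FakeEmailGateway:
    """In-memory fake: records sends instead of calling out."""
    def __init__(self):
        self.sent = []
    def send(self, to, subject):
        self.sent.append((to, subject))

def notify_signup(gateway, email):
    """Code under test, written against the shared interface."""
    gateway.send(email, "Welcome!")

def test_signup_sends_welcome_email():
    fake = FakeEmailGateway()
    notify_signup(fake, "new@example.com")
    assert fake.sent == [("new@example.com", "Welcome!")]
```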

4. Add component tests for critical paths

If you do not have component tests that exercise your whole service in isolation, start with the most critical paths.

Actions:

  • Identify the 3-5 most critical user journeys or API endpoints in your application.
  • Write a component test for each: boot the application, stub external dependencies, send a real request or simulate a real user action, verify the response.
  • Each component test should prove that the feature works correctly assuming external dependencies behave as expected (which your test doubles encode).
  • Run these in CI on every commit.

Output: Component tests covering your critical paths, running in CI on every commit.
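The shape of such a test can be sketched as follows. This is framework-agnostic pseudocode made runnable; the service, endpoint, and stub names are all illustrative.

```python
# Sketch of a component test: assemble the service in-process with its
# external dependency stubbed, then exercise one critical path end to end.

class StubInventoryApi:
    def stock_for(self, sku):
        return 3  # canned answer encoding our assumption about the real API

class ShopService:
    """Stand-in for booting the whole application."""
    def __init__(self, inventory_api):
        self.inventory = inventory_api

    def handle_request(self, path):
        # Simplified handling for one critical endpoint.
        sku = path.rsplit("/", 1)[-1]
        in_stock = self.inventory.stock_for(sku) > 0
        return {"status": 200, "body": {"sku": sku, "in_stock": in_stock}}

def test_availability_endpoint():
    app = ShopService(inventory_api=StubInventoryApi())
    response = app.handle_request("/availability/ABC-1")
    assert response["status"] == 200
    assert response["body"] == {"sku": "ABC-1", "in_stock": True}
```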

5. Set up contract tests for your most important dependency

Pick the external dependency that changes most frequently or has caused the most production issues. Set up a contract test for it.

Actions:

  • Write a contract test that validates the response structure (types, required fields, status codes) of the dependency’s API.
  • Run it on a schedule (e.g., every hour or daily), not on every commit.
  • When it fails, update your test doubles to match the new reality and re-verify your component tests.
  • If the dependency is owned by another team in your organization, explore consumer-driven contracts with a tool like Pact.

Output: One contract test running on a schedule, with a process to update test doubles when it fails.

6. Adopt TDD for new code

Once your pipeline tests are reliable, adopt TDD for all new work. TDD is the practice of writing the test before the code. It ensures every piece of behavior has a corresponding test.

The TDD cycle

  1. Red: Write a failing test that describes the behavior you want.
  2. Green: Write the minimum code to make the test pass.
  3. Refactor: Improve the code without changing the behavior. The test ensures you do not break anything.
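The cycle above, compressed into one illustrative file (a hypothetical `slugify` helper):

```python
# 1. Red: write the failing test first. It fails until slugify exists
#    and behaves as described.
def test_slugify_lowercases_and_hyphenates():
    assert slugify("Hello World") == "hello-world"

# 2. Green: the minimum code to make the test pass.
def slugify(title):
    return title.lower().replace(" ", "-")

# 3. Refactor: also collapse repeated whitespace, without changing the
#    specified behavior. The test stays green and proves nothing broke.
def slugify(title):
    return "-".join(title.lower().split())
```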

Why TDD matters for CD

  • Every change is automatically covered by a test
  • The test suite grows proportionally with the codebase
  • Tests describe behavior, not implementation, making them more resilient to refactoring
  • Developers get immediate feedback on whether their change works

TDD is not mandatory for CD, but teams that practice TDD consistently have significantly faster and more reliable test suites.

How to start: Pick one new feature or bug fix this week. Write the test first, watch it fail, write the code to make it pass, then refactor. Do not try to retroactively TDD your entire codebase. Apply TDD to new code and to any code you modify.

Output: Team members practicing TDD on new work, with at least one completed red-green-refactor cycle.


4 - Defect Feedback Loop

How to trace defects to their origin and make systemic changes that prevent entire categories of bugs from recurring.

Treat every test failure as diagnostic data about where your process breaks down, not just as something to fix. When you identify the systemic source of defects, you can prevent entire categories from recurring.

Two questions sharpen this thinking:

  1. What is the earliest point we can detect this defect? The later a defect is found, the more expensive it is to fix. A requirements defect caught during example mapping costs minutes. The same defect caught in production costs days of incident response, rollback, and rework.
  2. Can AI help us detect it earlier? AI-assisted tools can now surface defects at stages where only human review was previously possible, shifting detection left without adding manual effort.

Trace Every Defect to Its Origin

When a test catches a defect (or worse, when a defect escapes to production) ask: where was this defect introduced, and what would have prevented it from being created?

Defects do not originate randomly. They cluster around specific causes. The CD Defect Detection and Remediation Catalog documents over 30 defect types across eight categories, with detection methods, AI opportunities, and systemic fixes for each.

| Category | Example Defects | Earliest Detection | Systemic Fix |
|---|---|---|---|
| Requirements | Building the right thing wrong, or the wrong thing right | Discovery, during story refinement or example mapping | Acceptance criteria as user outcomes, Three Amigos sessions, example mapping |
| Missing domain knowledge | Business rules encoded incorrectly, tribal knowledge loss | During coding, when the developer writes the logic | Ubiquitous language (DDD), pair programming, rotate ownership |
| Integration boundaries | Interface mismatches, wrong assumptions about upstream behavior | During design, when defining the interface contract | Contract tests per boundary, API-first design, circuit breakers |
| Untested edge cases | Null handling, boundary values, error paths | Pre-commit, through null-safe type systems and static analysis | Property-based testing, boundary value analysis, test for every bug fix |
| Unintended side effects | Change to module A breaks module B | At commit time, when CI runs the full test suite | Small commits, trunk-based development, feature flags, modular design |
| Accumulated complexity | Defects cluster in the most complex, most-changed files | Continuously, through static analysis in the IDE and CI | Refactoring as part of every story, dedicated complexity budget |
| Process and deployment | Long-lived branches, manual pipeline steps, excessive batching | Pre-commit for branch age; CI for pipeline and batching issues | Trunk-based development, automate every step, blue/green or canary deploys |
| Data and state | Null pointer exceptions, schema migration failures, concurrency issues | Pre-commit for null safety; CI for schema compatibility | Null-safe types, expand-then-contract for schema changes, design for idempotency |

For the complete catalog covering all defect categories (including product and discovery, dependency and infrastructure, testing and observability gaps, and more) see the CD Defect Detection and Remediation Catalog.

Build a Defect Feedback Loop

You need a process that systematically connects test failures to root causes and root causes to systemic fixes.

  1. Classify every defect. When a test fails or a bug is reported, tag it with its origin category from the tables above. This takes seconds and builds a dataset over time.
  2. Look for patterns. Monthly (or during retrospectives), review the defect classifications. Which categories appear most often? That is where your process is weakest.
  3. Apply the systemic fix, not just the local fix. When you fix a bug, also ask: what systemic change would prevent this entire category of bug? If most defects come from integration boundaries, the fix is not “write more integration tests.” It is “make contract tests mandatory for every new boundary.” If most defects come from untested edge cases, the fix is not “increase code coverage.” It is “adopt property-based testing as a standard practice.”
  4. Measure whether the fix works. Track defect counts by category over time. If you applied a systemic fix for integration boundary defects and the count does not drop, the fix is not working and you need a different approach.
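Step 2 requires nothing more sophisticated than a tally over the tagged defects. A minimal sketch (the tags and counts are invented for illustration):

```python
from collections import Counter

# Each entry is one defect tagged with its origin category at triage time.
defect_log = [
    "integration-boundary", "untested-edge-case", "integration-boundary",
    "requirements", "integration-boundary", "untested-edge-case",
]

by_category = Counter(defect_log)
worst_category, count = by_category.most_common(1)[0]
assert worst_category == "integration-boundary"
assert count == 3
```

Re-running the tally after a systemic fix tells you whether the category's count is actually dropping (step 4).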

The Test-for-Every-Bug-Fix Rule

Every bug fix must include a test that reproduces the bug before the fix and passes after. This is non-negotiable for CD because:

  • It proves the fix actually addresses the defect (not just the symptom).
  • It prevents the same defect from recurring.
  • It builds test coverage exactly where the codebase is weakest: the places where bugs actually occur.
  • Over time, it shifts your test suite from “tests we thought to write” to “tests that cover real failure modes.”

Advanced Detection Techniques

As your test architecture matures, add techniques that catch defects before manual review:

| Technique | What It Finds | When to Adopt |
|---|---|---|
| Mutation testing (Stryker, PIT) | Tests that pass but do not actually verify behavior (your test suite’s blind spots) | When basic coverage is in place but defect escape rate is not dropping |
| Property-based testing | Edge cases and boundary conditions across large input spaces that example-based tests miss | When defects cluster around unexpected input combinations |
| Chaos engineering | Failure modes in distributed systems: what happens when a dependency is slow, returns errors, or disappears | When you have component tests and contract tests in place and need confidence in failure handling |
| Static analysis and linting | Null safety violations, type errors, security vulnerabilities, dead code | From day one. These are cheap and fast |

For more examples of mapping defect origins to detection methods and systemic corrections, see the CD Defect Detection and Remediation Catalog.