Testing
Testing types, patterns, and best practices for building confidence in your delivery pipeline.
A reliable test suite is essential for continuous delivery. These pages cover the
different types of tests, when to use each, and best practices for test architecture.
Test Types
This content is adapted from the Dojo Consortium,
licensed under CC BY 4.0.
1 - Unit Tests
Fast, deterministic tests that verify individual functions, methods, or components in isolation with test doubles for dependencies.
Definition
A unit test is a deterministic test that exercises a discrete unit of the application – such as
a function, method, or UI component – in isolation to determine whether it behaves as expected.
All external dependencies are replaced with test doubles so the test runs
quickly and produces the same result every time.
When testing the behavior of functions, prefer testing public APIs (methods, interfaces,
exported functions) over private internals. Testing private implementation details creates
change-detector tests that break during routine refactoring without adding safety.
The purpose of unit tests is to:
- Verify the functionality of a single unit (method, class, function) in isolation.
- Cover high-complexity logic where many input permutations exist, such as business rules, calculations, and state transitions.
- Keep cyclomatic complexity visible and manageable through good separation of concerns.
When to Use
- During development – run the relevant subset of unit tests continuously while writing
code. TDD (Red-Green-Refactor) is the most effective workflow.
- On every commit – use pre-commit hooks or watch-mode test runners so broken tests never
reach the remote repository.
- In CI – execute the full unit test suite on every pull request and on the trunk after
merge to verify nothing was missed locally.
Unit tests are the right choice when the behavior under test can be exercised without network
access, file system access, or database connections. If you need any of those, you likely need
an integration test or a functional test instead.
Characteristics
| Property | Value |
|----------|-------|
| Speed | Milliseconds per test |
| Determinism | Always deterministic |
| Scope | Single function, method, or component |
| Dependencies | All replaced with test doubles |
| Network | None |
| Database | None |
| Breaks build | Yes |
Examples
A JavaScript unit test verifying a pure utility function:
A Java unit test using Mockito to isolate the system under test:
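A sketch of the same idea; `OrderService` and `PaymentGateway` are hypothetical. To stay dependency-free it hand-rolls the double with a lambda, with the equivalent Mockito calls shown in comments:

```java
// With Mockito the double would be created and programmed like this:
//   PaymentGateway gateway = mock(PaymentGateway.class);
//   when(gateway.charge(anyInt())).thenReturn(true);

interface PaymentGateway {
    boolean charge(int amountCents);
}

class OrderService {
    private final PaymentGateway gateway;
    OrderService(PaymentGateway gateway) { this.gateway = gateway; }

    String placeOrder(int amountCents) {
        return gateway.charge(amountCents) ? "CONFIRMED" : "DECLINED";
    }
}

public class OrderServiceTest {
    public static void main(String[] args) {
        // Stub the dependency: always approve the charge.
        PaymentGateway approving = amount -> true;
        check(new OrderService(approving).placeOrder(100).equals("CONFIRMED"),
              "approved charge confirms the order");

        // And the unhappy path, without any real payment provider involved.
        PaymentGateway declining = amount -> false;
        check(new OrderService(declining).placeOrder(100).equals("DECLINED"),
              "declined charge declines the order");

        System.out.println("all checks passed");
    }

    static void check(boolean ok, String message) {
        if (!ok) throw new AssertionError(message);
    }
}
```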
Anti-Patterns
- Testing private methods – private implementations are meant to change. Test the public
interface that calls them instead.
- No assertions – a test that runs code without asserting anything provides false
confidence. Lint rules like `jest/expect-expect` can catch this.
- Disabling or skipping tests – skipped tests erode confidence over time. Fix or remove
them.
- Testing implementation details – asserting on internal state or call order rather than
observable output creates brittle tests that break during refactoring.
- Ice cream cone testing – relying primarily on slow E2E tests while neglecting fast unit
tests inverts the test pyramid and slows feedback.
- Chasing coverage numbers – gaming coverage metrics (e.g., running code paths without
meaningful assertions) creates a false sense of confidence. Focus on use-case coverage
instead.
Connection to CD Pipeline
Unit tests occupy the base of the test pyramid. They run in the earliest stages of the
CI/CD pipeline and provide the fastest feedback loop:
- Local development – watch mode reruns tests on every save.
- Pre-commit – hooks run the suite before code reaches version control.
- PR verification – CI runs the full suite and blocks merge on failure.
- Trunk verification – CI reruns tests on the merged HEAD to catch integration issues.
Because unit tests are fast and deterministic, they should always break the build on failure.
A healthy CD pipeline depends on a large, reliable unit test suite that gives developers
confidence to ship small changes frequently.
2 - Integration Tests
Deterministic tests that verify how units interact together or with external system boundaries using test doubles for non-deterministic dependencies.
Definition
An integration test is a deterministic test that verifies how the unit under test interacts
with other units without directly accessing external sub-systems. It may validate multiple
units working together (sometimes called a “sociable unit test”) or the portion of the code
that interfaces with an external network dependency while using a test double to represent
that dependency.
For clarity: an “integration test” is not a test that broadly integrates multiple
sub-systems. That is an end-to-end test.
When to Use
Integration tests provide the best balance of speed, confidence, and cost. Use them when:
- You need to verify that multiple units collaborate correctly – for example, a service
calling a repository that calls a data mapper.
- You need to validate the interface layer to an external system (HTTP client, message
producer, database query) while keeping the external system replaced by a test double.
- You want to confirm that a refactoring did not break behavior. Integration tests that
avoid testing implementation details survive refactors without modification.
- You are building a front-end component that composes child components and needs to verify
the assembled behavior from the user’s perspective.
If the test requires a live network call to a system outside localhost, it is either a
contract test or an E2E test.
Characteristics
| Property | Value |
|----------|-------|
| Speed | Milliseconds to low seconds |
| Determinism | Always deterministic |
| Scope | Multiple units or a unit plus its boundary |
| Dependencies | External systems replaced with test doubles |
| Network | Localhost only |
| Database | Localhost / in-memory only |
| Breaks build | Yes |
Examples
A JavaScript integration test verifying that a connector returns structured data:
Subcategories
Service integration tests – Validate how the system under test responds to information
from an external service. Use virtual services or static mocks; pair with
contract tests to keep the doubles current.
Database integration tests – Validate query logic against a controlled data store. Prefer
in-memory databases, isolated DB instances, or personalized datasets over shared live data.
Front-end integration tests – Render the component tree and interact with it the way a
user would. Follow the accessibility order of operations for element selection: visible text
and labels first, ARIA roles second, test IDs only as a last resort.
Anti-Patterns
- Peeking behind the curtain – using tools that expose component internals (e.g.,
Enzyme’s `instance()` or `state()`) instead of testing from the user’s perspective.
- Mocking too aggressively – replacing every collaborator turns an integration test into a
unit test and removes the value of testing real interactions. Only mock what is necessary to
maintain determinism.
- Testing implementation details – asserting on internal state, private methods, or call
counts rather than observable output.
- Introducing a test user – creating an artificial actor that would never exist in
production. Write tests from the perspective of a real end-user or API consumer.
- Tolerating flaky tests – non-deterministic integration tests erode trust. Fix or remove
them immediately.
- Duplicating E2E scope – if the test integrates multiple deployed sub-systems with live
network calls, it belongs in the E2E category, not here.
Connection to CD Pipeline
Integration tests form the largest portion of a healthy test suite (the “trophy” or the
middle of the pyramid). They run alongside unit tests in the earliest CI stages:
- Local development – run in watch mode or before committing.
- PR verification – CI executes the full suite; failures block merge.
- Trunk verification – CI reruns on the merged HEAD.
Because they are deterministic and fast, integration tests should always break the build.
A team whose refactors break many tests likely has too few integration tests and too many
fine-grained unit tests. As Kent C. Dodds advises: “Write tests, not too many, mostly
integration.”
3 - Functional Tests
Deterministic tests that verify all modules of a sub-system work together from the actor’s perspective, using test doubles for external dependencies.
Definition
A functional test is a deterministic test that verifies all modules of a sub-system are
working together. It introduces an actor – typically a user interacting with the UI or a
consumer calling an API – and validates the ingress and egress of that actor within the
system boundary. External sub-systems are replaced with test doubles to
keep the test deterministic.
Functional tests cover broad-spectrum behavior: UI interactions, presentation logic, and
business logic flowing through the full sub-system. They differ from
end-to-end tests in that side effects are mocked and never cross boundaries
outside the system’s control.
Functional tests are sometimes called component tests.
When to Use
- You need to verify a complete user-facing feature from input to output within a single
deployable unit (e.g., a service or a front-end application).
- You want to test how the UI, business logic, and data layers interact without depending
on live external services.
- You need to simulate realistic user workflows – filling in forms, navigating pages,
submitting API requests – while keeping the test fast and repeatable.
- You are validating acceptance criteria for a user story and want a test that maps
directly to the specified behavior.
If the test needs to reach a live external dependency, it is an E2E test. If it
tests a single unit in isolation, it is a unit test.
Characteristics
| Property | Value |
|----------|-------|
| Speed | Seconds (slower than unit, faster than E2E) |
| Determinism | Always deterministic |
| Scope | All modules within a single sub-system |
| Dependencies | External systems replaced with test doubles |
| Network | Localhost only |
| Database | Localhost / in-memory only |
| Breaks build | Yes |
Examples
A functional test for a REST API using an in-process server and mocked downstream services:
A front-end functional test exercising a login flow with a mocked auth service:
Anti-Patterns
- Using live external services – this makes the test non-deterministic and slow. Use test
doubles for anything outside the sub-system boundary.
- Testing through the database – sharing a live database between tests introduces ordering
dependencies and flakiness. Use in-memory databases or mocked data layers.
- Ignoring the actor’s perspective – functional tests should interact with the system the
way a user or consumer would. Reaching into internal APIs or bypassing the UI defeats the
purpose.
- Duplicating unit test coverage – functional tests should focus on feature-level behavior
and happy/critical paths, not every edge case. Leave permutation testing to unit tests.
- Slow test setup – if spinning up the sub-system takes too long, invest in faster
bootstrapping (in-memory stores, lazy initialization) rather than skipping functional tests.
Connection to CD Pipeline
Functional tests run after unit and integration tests in the pipeline, typically as part of
the same CI stage:
- PR verification – functional tests run against the sub-system in isolation, giving
confidence that the feature works before merge.
- Trunk verification – the same tests run on the merged HEAD to catch conflicts.
- Pre-deployment gate – functional tests can serve as the final deterministic gate before
a build artifact is promoted to a staging environment.
Because functional tests are deterministic, they should break the build on failure.
They are more expensive than unit and integration tests, so teams should focus on
happy-path and critical-path scenarios while keeping the total count manageable.
4 - End-to-End Tests
Non-deterministic tests that validate the entire software system along with its integration with external interfaces and production-like scenarios.
Definition
End-to-end (E2E) tests validate the entire software system, including its integration with
external interfaces. They exercise complete production-like scenarios using real (or
production-like) data and environments to simulate real-time settings. No test doubles are
used – the test hits live services, databases, and third-party integrations just as a real
user would.
Because they depend on external systems, E2E tests are typically non-deterministic: they
can fail for reasons unrelated to code correctness, such as network instability or
third-party outages.
When to Use
E2E tests should be the least-used test type due to their high cost in execution time and
maintenance. Use them for:
- Happy-path validation of critical business flows (e.g., user signup, checkout, payment
processing).
- Smoke testing a deployed environment to verify that key integrations are functioning.
- Cross-team workflows that span multiple sub-systems and cannot be tested any other way.
Do not use E2E tests to cover edge cases, error handling, or input validation – those
scenarios belong in unit, integration, or
functional tests.
Vertical vs. Horizontal E2E Tests
Vertical E2E tests target features under the control of a single team:
- Favoriting an item and verifying it persists across refresh.
- Creating a saved list and adding items to it.
Horizontal E2E tests span multiple teams:
- Navigating from the homepage through search, item detail, cart, and checkout.
Horizontal tests are significantly more complex and fragile. Due to their large failure
surface area, they are not suitable for blocking release pipelines.
Characteristics
| Property | Value |
|----------|-------|
| Speed | Seconds to minutes per test |
| Determinism | Typically non-deterministic |
| Scope | Full system including external integrations |
| Dependencies | Real services, databases, third-party APIs |
| Network | Full network access |
| Database | Live databases |
| Breaks build | Generally no (see guidance below) |
Examples
A vertical E2E test verifying user lookup through a live web interface:
A browser-based E2E test using a tool like Playwright:
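An illustrative Playwright sketch (URLs, labels, and credentials are invented; a real suite would load them from configuration):

```typescript
import { test, expect } from '@playwright/test';

// Hypothetical sign-in smoke test against a deployed environment.
test('user can sign in and reach the dashboard', async ({ page }) => {
  await page.goto('https://staging.example.com/login');

  // Prefer accessible selectors: visible labels and roles first.
  await page.getByLabel('Email').fill('e2e-user@example.com');
  await page.getByLabel('Password').fill(process.env.E2E_PASSWORD ?? '');
  await page.getByRole('button', { name: 'Sign in' }).click();

  await expect(page).toHaveURL(/\/dashboard/);
  await expect(page.getByRole('heading', { name: 'Dashboard' })).toBeVisible();
});
```

On failure, Playwright can capture traces, screenshots, and video automatically, which addresses the failure-context concern below.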
Anti-Patterns
- Using E2E tests as the primary safety net – this is the “ice cream cone” anti-pattern.
E2E tests are slow and fragile; the majority of your confidence should come from unit and
integration tests.
- Blocking the pipeline with horizontal E2E tests – these tests span too many teams and
failure surfaces. Run them asynchronously and review failures out of band.
- Ignoring flaky failures – E2E tests often fail for environmental reasons. Track the
frequency and root cause of failures. If a test is not providing signal, fix it or remove
it.
- Testing edge cases in E2E – exhaustive input validation and error-path testing should
happen in cheaper, faster test types.
- Not capturing failure context – E2E failures are expensive to debug. Capture
screenshots, network logs, and video recordings automatically on failure.
Connection to CD Pipeline
E2E tests run in the later stages of the delivery pipeline, after the build artifact has
passed all deterministic tests and has been deployed to a staging or pre-production
environment:
- Post-deployment smoke tests – a small, fast suite of vertical E2E tests verifies that
the deployment succeeded and critical paths work.
- Scheduled regression suites – broader E2E suites (including horizontal tests) run on a
schedule rather than on every commit.
- Production monitoring – customer experience alarms (synthetic monitoring) are a form of
continuous E2E testing that runs in production.
Because E2E tests are non-deterministic, they should not break the build in most cases. A
team may choose to gate on a small set of highly reliable vertical E2E tests, but must invest
in reducing false positives to make this valuable. CD pipelines should be optimized for rapid
recovery of production issues rather than attempting to prevent all defects with slow,
fragile E2E gates.
5 - Contract Tests
Non-deterministic tests that validate test doubles by verifying API contract format against live external systems.
Definition
A contract test validates that the test doubles used in
integration tests still accurately represent the real external system.
Contract tests run against the live external sub-system and exercise the portion of the
code that interfaces with it. Because they depend on live services, contract tests are
non-deterministic and should not break the build. Instead, failures should trigger a
review to determine whether the contract has changed and the test doubles need updating.
A contract test validates contract format, not specific data. It verifies that response
structures, field names, types, and status codes match expectations – not that particular
values are returned.
Contract tests have two perspectives:
- Provider – the team that owns the API verifies that all changes are backwards compatible
(unless a new API version is introduced). Every build should validate the provider contract.
- Consumer – the team that depends on the API verifies that they can still consume the
properties they need, following
Postel’s Law: “Be conservative in
what you do, be liberal in what you accept from others.”
When to Use
- You have integration tests that use test doubles (mocks, stubs, recorded
responses) to represent external services, and you need assurance those doubles remain
accurate.
- You consume a third-party or cross-team API that may change without notice.
- You provide an API to other teams and want to ensure that your changes do not break their
expectations (consumer-driven contracts).
- You are adopting contract-driven development, where contracts are defined during design
so that provider and consumer teams can work in parallel using shared mocks and fakes.
Characteristics
| Property | Value |
|----------|-------|
| Speed | Seconds (depends on network latency) |
| Determinism | Non-deterministic (hits live services) |
| Scope | Interface boundary between two systems |
| Dependencies | Live external sub-system |
| Network | Yes – calls the real dependency |
| Database | Depends on the provider |
| Breaks build | No – failures trigger review, not build failure |
Examples
A provider contract test verifying that an API response matches the expected schema:
A consumer-driven contract test using Pact:
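An illustrative sketch with pact-js (`PactV3` API); the consumer/provider names and endpoint are invented, and `test`/`expect` are assumed to come from the consumer's test framework:

```typescript
import { PactV3, MatchersV3 } from '@pact-foundation/pact';
const { like, integer, string } = MatchersV3;

// Hypothetical consumer/provider pair.
const provider = new PactV3({ consumer: 'web-app', provider: 'user-service' });

test('consumer can read the fields it depends on', () => {
  provider
    .given('a user with id 42 exists')
    .uponReceiving('a request for a user')
    .withRequest({ method: 'GET', path: '/users/42' })
    .willRespondWith({
      status: 200,
      // Matchers describe types and shape, not specific values.
      body: like({ id: integer(42), name: string('Ada') }),
    });

  return provider.executeTest(async (mockServer) => {
    const res = await fetch(`${mockServer.url}/users/42`);
    const user = await res.json();
    expect(typeof user.id).toBe('number'); // assert only what the consumer uses
  });
});
```

The generated pact file is then published so the provider can replay these expectations in its own CI pipeline.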
Anti-Patterns
- Using contract tests to validate business logic – contract tests verify structure and
format, not behavior. Business logic belongs in functional tests.
- Breaking the build on contract test failure – because these tests hit live systems,
failures may be caused by network issues or temporary outages, not actual contract changes.
Treat failures as signals to investigate.
- Neglecting to update test doubles – when a contract test fails because the upstream API
changed, the test doubles in your integration tests must be updated to match. Ignoring
failures defeats the purpose.
- Running contract tests too infrequently – the frequency should be proportional to the
volatility of the interface. Highly active APIs need more frequent contract validation.
- Testing specific data values – asserting that `name` equals `"Alice"` makes the test
brittle. Assert on types, required fields, and response codes instead.
Connection to CD Pipeline
Contract tests run asynchronously from the main CI build, typically on a schedule:
- Provider side – provider contract tests (schema validation, response code checks) are
often implemented as deterministic unit tests and run on every commit as part of the
provider’s CI pipeline.
- Consumer side – consumer contract tests run on a schedule (e.g., hourly or daily)
against the live provider. Failures are reviewed and may trigger updates to test doubles
or conversations between teams.
- Consumer-driven contracts – when using tools like Pact, the consumer publishes
contract expectations and the provider runs them continuously. Both teams communicate when
contracts break.
Contract tests are the bridge that keeps your fast, deterministic integration test suite
honest. Without them, test doubles can silently drift from reality, and your integration
tests provide false confidence.
6 - Static Analysis
Code analysis tools that evaluate non-running code for security vulnerabilities, complexity, and best practice violations.
Definition
Static analysis (also called static testing) evaluates non-running code against rules for
known good practices. Unlike other test types that execute code and observe behavior, static
analysis inspects source code, configuration files, and dependency manifests to detect
problems before the code ever runs.
Static analysis serves several key purposes:
- Catches errors that would otherwise surface at runtime.
- Warns of excessive complexity that degrades the ability to change code safely.
- Identifies security vulnerabilities and coding patterns that provide attack vectors.
- Enforces coding standards by removing subjective style debates from code reviews.
- Alerts to dependency issues – outdated packages, known CVEs, license incompatibilities,
or supply-chain compromises.
When to Use
Static analysis should run continuously, at every stage where feedback is possible:
- In the IDE – real-time feedback as developers type, via editor plugins and language
server integrations.
- On save – format-on-save and lint-on-save catch issues immediately.
- Pre-commit – hooks prevent problematic code from entering version control.
- In CI – the full suite of static checks runs on every PR and on the trunk after merge,
verifying that earlier local checks were not bypassed.
Static analysis is always applicable. Every project, regardless of language or platform,
benefits from linting, formatting, and dependency scanning.
Characteristics
| Property | Value |
|----------|-------|
| Speed | Seconds (typically the fastest test category) |
| Determinism | Always deterministic |
| Scope | Entire codebase (source, config, dependencies) |
| Dependencies | None (analyzes code at rest) |
| Network | None (except dependency scanners) |
| Database | None |
| Breaks build | Yes |
Examples
Linting
A `.eslintrc.json` configuration enforcing test quality rules:
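A sketch of such a configuration using rules from `eslint-plugin-jest` plus ESLint's built-in complexity check (the threshold of 10 is an illustrative choice, not a standard):

```json
{
  "plugins": ["jest"],
  "rules": {
    "jest/expect-expect": "error",
    "jest/no-disabled-tests": "warn",
    "jest/no-focused-tests": "error",
    "complexity": ["error", { "max": 10 }]
  }
}
```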
Type Checking
TypeScript catches type mismatches at compile time, eliminating entire classes of runtime
errors:
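A minimal sketch; `User` and `greet` are invented for illustration:

```typescript
interface User {
  id: number;
  name: string;
}

function greet(user: User): string {
  return `Hello, ${user.name}`;
}

// greet({ id: "42", name: "Ada" });
// ^ Compile-time error: Type 'string' is not assignable to type 'number'.
//   The bug is caught before any test runs.

console.log(greet({ id: 42, name: "Ada" })); // Hello, Ada
```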
Dependency Scanning
Tools like `npm audit`, Snyk, or Dependabot scan for known vulnerabilities:
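For example, typical CI invocations that gate on severity (the thresholds are illustrative):

```shell
# Fail the build on high-severity or worse advisories.
npm audit --audit-level=high

# Snyk scan with a severity gate (requires authentication).
npx snyk test --severity-threshold=high
```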
Types of Static Analysis
| Type | Purpose |
|------|---------|
| Linting | Catches common errors and enforces best practices |
| Formatting | Enforces consistent code style, removing subjective debates |
| Complexity analysis | Flags overly deep or long code blocks that breed defects |
| Type checking | Prevents type-related bugs, replacing some unit tests |
| Security scanning | Detects known vulnerabilities and dangerous coding patterns |
| Dependency scanning | Checks for outdated, hijacked, or insecurely licensed deps |
Anti-Patterns
- Disabling rules instead of fixing code – suppressing linter warnings or ignoring
security findings erodes the value of static analysis over time.
- Not customizing rules – default rulesets are a starting point. Write custom rules for
patterns that come up repeatedly in code reviews.
- Running static analysis only in CI – by the time CI reports a formatting error, the
developer has context-switched. IDE plugins and pre-commit hooks provide immediate feedback.
- Ignoring dependency vulnerabilities – known CVEs in dependencies are a direct attack
vector. Treat high-severity findings as build-breaking.
- Treating static analysis as optional – static checks should be mandatory and enforced.
If developers can bypass them, they will.
Connection to CD Pipeline
Static analysis is the first gate in the CD pipeline, providing the fastest feedback:
- IDE / local development – plugins run in real time as code is written.
- Pre-commit – hooks run linters and formatters, blocking commits that violate rules.
- PR verification – CI runs the full static analysis suite (linting, type checking,
security scanning, dependency auditing) and blocks merge on failure.
- Trunk verification – the same checks re-run on the merged HEAD to catch anything
missed.
- Scheduled scans – dependency and security scanners run on a schedule to catch newly
disclosed vulnerabilities in existing dependencies.
Because static analysis requires no running code, no test environment, and no external
dependencies, it is the cheapest and fastest form of quality verification. A mature CD
pipeline treats static analysis failures the same as test failures: they break the build.
7 - Test Doubles
Patterns for isolating dependencies in tests: stubs, mocks, fakes, spies, and dummies.
Definition
Test doubles are stand-in objects that replace real production dependencies during testing.
The term comes from the film industry’s “stunt double” – just as a stunt double replaces an
actor for dangerous scenes, a test double replaces a costly or non-deterministic dependency
to make tests fast, isolated, and reliable.
Test doubles allow you to:
- Remove non-determinism by replacing network calls, databases, and file systems with
predictable substitutes.
- Control test conditions by forcing specific states, error conditions, or edge cases that
would be difficult to reproduce with real dependencies.
- Increase speed by eliminating slow I/O operations.
- Isolate the system under test so that failures point directly to the code being tested,
not to an external dependency.
Types of Test Doubles
| Type | Description | Example Use Case |
|------|-------------|------------------|
| Dummy | Passed around but never actually used. Fills parameter lists. | A required logger parameter in a constructor. |
| Stub | Provides canned answers to calls made during the test. Does not respond to anything outside what is programmed. | Returning a fixed user object from a repository. |
| Spy | A stub that also records information about how it was called (arguments, call count, order). | Verifying that an analytics event was sent once. |
| Mock | Pre-programmed with expectations about which calls will be made. Verification happens on the mock itself. | Asserting that `sendEmail()` was called with specific arguments. |
| Fake | Has a working implementation, but takes shortcuts not suitable for production. | An in-memory database replacing PostgreSQL. |
Choosing the Right Double
- Use stubs when you need to supply data but do not care how it was requested.
- Use spies when you need to verify call arguments or call count.
- Use mocks when the interaction itself is the primary thing being verified.
- Use fakes when you need realistic behavior but cannot use the real system.
- Use dummies when a parameter is required by the interface but irrelevant to the test.
When to Use
Test doubles are used in every layer of deterministic testing:
- Unit tests – nearly all dependencies are replaced with test doubles to
achieve full isolation.
- Integration tests – external sub-systems (APIs, databases, message
queues) are replaced, but internal collaborators remain real.
- Functional tests – dependencies that cross the sub-system boundary
are replaced to maintain determinism.
Test doubles should be used less in later pipeline stages.
End-to-end tests use no test doubles by design.
Examples
A JavaScript stub providing a canned response:
A Java spy verifying interaction:
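A hand-rolled sketch; `Analytics` and `completeCheckout` are hypothetical. With Mockito, `verify(analytics, times(1)).track("checkout-completed")` would replace the manual bookkeeping:

```java
import java.util.ArrayList;
import java.util.List;

// A spy is a stub that also records how it was called.
public class AnalyticsSpyExample {

    interface Analytics {
        void track(String event);
    }

    static class AnalyticsSpy implements Analytics {
        final List<String> events = new ArrayList<>();
        public void track(String event) { events.add(event); }
    }

    // System under test: should emit exactly one tracking event.
    static void completeCheckout(Analytics analytics) {
        // ... real checkout logic would live here ...
        analytics.track("checkout-completed");
    }

    public static void main(String[] args) {
        AnalyticsSpy spy = new AnalyticsSpy();
        completeCheckout(spy);

        // Verify the interaction: one call, with the expected event name.
        if (spy.events.size() != 1 || !spy.events.get(0).equals("checkout-completed")) {
            throw new AssertionError("expected a single checkout-completed event");
        }
        System.out.println("spy verified: " + spy.events);
    }
}
```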
A fake in-memory repository:
Anti-Patterns
- Mocking what you do not own – wrapping a third-party API in a thin adapter and mocking
the adapter is safer than mocking the third-party API directly. Direct mocks couple your
tests to the library’s implementation.
- Over-mocking – replacing every collaborator with a mock turns the test into a mirror of
the implementation. Tests become brittle and break on every refactor. Only mock what is
necessary to maintain determinism.
- Not validating test doubles – if the real dependency changes its contract, your test
doubles silently drift. Use contract tests to keep doubles honest.
- Complex mock setup – if setting up mocks requires dozens of lines, the system under test
may have too many dependencies. Consider refactoring the production code rather than adding
more mocks.
- Using mocks to test implementation details – asserting on the exact sequence and count
of internal method calls creates change-detector tests. Prefer asserting on observable
output.
Connection to CD Pipeline
Test doubles are a foundational technique that enables the fast, deterministic tests required
for continuous delivery:
- Early pipeline stages (static analysis, unit tests, integration tests) rely heavily on
test doubles to stay fast and deterministic. This is where the majority of defects are
caught.
- Later pipeline stages (E2E tests, production monitoring) use fewer or no test doubles,
trading speed for realism.
- Contract tests run asynchronously to validate that test doubles still match reality,
closing the gap between the deterministic and non-deterministic stages of the pipeline.
The guiding principle from Justin Searls applies: “Don’t poke too many holes in reality.”
Use test doubles when you must, but prefer real implementations when they are fast and
deterministic.