
Applied Testing Strategies

Practical guidance for fully testing eight common component patterns: API providers, API consumers, scheduled jobs, user interfaces, event consumers, event producers, CLI tools and libraries, and stateful services.

This guide builds on the test-type definitions in Architecting Tests for CD and the deterministic-pipeline model used throughout this site.

This is a set of recommended patterns to consider when designing a test suite, not a prescriptive checklist. The patterns describe shapes of components teams commonly build; the lists of positive cases, negative cases, and pipeline placements are common things to consider for that shape, not an all-inclusive set. Use them as a starting point for the conversation about what your component actually needs.

That said, three goals apply to every pattern:

  1. Cover the positive paths - the component does what it should under expected inputs.
  2. Cover the negative paths - the component fails safely, predictably, and observably under bad inputs, broken dependencies, and adverse conditions.
  3. Validate the test doubles - every double used to keep deterministic tests fast must be backed by a non-deterministic check that the double still matches reality.

If the third point is missing, the first two lie to you over time.

How to use this section

Terminology

Two phrases that look similar but mean different things:

  • Adapter integration test (Toby Clemson’s “integration test”): a narrow test of a single boundary adapter (HTTP client, DB query layer, message-broker client) exercised against the real external dependency or a high-fidelity stand-in. Pins the adapter’s protocol behavior - serialization, deserialization, headers, error mapping - not the behavior of the dependency itself. Runs in-band only when the team has full control over the dependency (typically a per-test testcontainer) and the test is fully deterministic; otherwise runs out-of-band on a schedule.
  • Out-of-band integration check (this site’s Integration Tests): runs out-of-band on a schedule or post-deploy against real external systems. Confirms that doubles used by in-band tests still match reality. Failures trigger review, not a build break.

When this section says a bare “integration test,” it means an adapter integration test unless qualified otherwise.

Cross-cutting principles

Six principles apply to every pattern. The first three are short pointers to pages that own the topic; the last three are unique to this section.

1. In-band tests are deterministic; out-of-band checks confirm reality

In-band tests run in the commit-to-deploy pipeline and gate the build. They must be deterministic, which means test doubles replace anything that crosses the component boundary - downstream services, message brokers, schedulers, browsers talking to real backends. Out-of-band checks run on a schedule or post-deploy against the real systems those doubles stand in for. They confirm the doubles still match reality. Failures trigger review or rollback, not a build break. See the architecture in Architecting Tests for CD.

2. Test doubles need their own tests

Every double is traceable to a contract test pinning its claims and an out-of-band check confirming the claims still hold. The mechanics live in Test Doubles.

3. Test through the public interface

Public methods for classes; HTTP routing for services; rendered DOM for UIs; the entrypoint the scheduler invokes for jobs. See Component Tests. Reflection, package-private back doors, and asserting on private state are all testing-the-wrong-thing in disguise.

4. Sociable unit tests dominate; solitary unit tests are the narrow exception

Domain logic in a real system lives in how behaviors collaborate, not in any single class. A sociable unit test drives the actual collaborators that implement a domain operation - validators, domain services, repositories backed by an in-memory or testcontainer double - and asserts on the observable outcome of that operation: the response, the persisted state, the event emitted. That is the bulk of the suite. Solitary unit tests are reserved for genuinely complex pure logic with no collaborators worth wiring up - pricing math, parsers, scheduling arithmetic.

Organize the suite around domain operations (“place an order,” “cancel a subscription within the grace period”), not around the classes or methods that happen to implement them. Tests written this way survive refactoring, catch bugs that live in the interactions between collaborators, and document what the component does to a stakeholder who can’t read the code. Tests written one-class-at-a-time with mocks for every collaborator do none of that.
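
A minimal sketch of the difference in practice, with all names hypothetical: the test wires a real validator and an in-memory repository behind a real OrderService, drives the “place an order” operation, and asserts on the persisted outcome rather than on which collaborator methods were called.

```javascript
// Hypothetical collaborators for a "place an order" domain operation.
class InMemoryOrderRepo {
  constructor() { this.orders = new Map(); }
  save(order) { this.orders.set(order.id, order); return order; }
  findById(id) { return this.orders.get(id); }
}

class OrderValidator {
  validate(items) {
    if (!Array.isArray(items) || items.length === 0) {
      throw new Error("order must contain at least one item");
    }
  }
}

class OrderService {
  constructor({ repo, validator }) { this.repo = repo; this.validator = validator; }
  placeOrder(items) {
    this.validator.validate(items);               // real collaborator, not a mock
    const order = { id: `ord-${this.repo.orders.size + 1}`, items, status: "PLACED" };
    return this.repo.save(order);                 // outcome observable through the repo
  }
}

// Sociable unit test body: real validator, real in-memory repo, assert on outcome.
const repo = new InMemoryOrderRepo();
const service = new OrderService({ repo, validator: new OrderValidator() });
const placed = service.placeOrder([{ sku: "A1", qty: 2 }]);
```

Nothing here asserts that `validate` was called or how; if the validator and repository are later merged or split, the test still passes as long as placing an order persists a placed order.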

5. Negative paths get equal weight

For every “it works” test, ask: malformed input, dependency timeout, dependency 500, dependency 200-with-malformed-body, slow response, partial write, duplicate request, missing or wrong authn/authz. Negative paths are where production incidents come from.

6. Name tests in domain terms, not implementation terms

A test name is documentation. places_order_with_valid_payment_creates_order_and_emits_OrderPlaced survives refactoring; OrderService.processPayment_returns_PaymentResult does not. The translation rule: if the name only makes sense to someone who has read the code, rewrite it. Highest-ROI change a team can make to an existing suite without any new infrastructure. For more on what to avoid, see Testing Antipatterns.

The layered approach (unit, integration, component, contract, end-to-end) this section builds on comes from Toby Clemson, Testing Strategies in a Microservice Architecture.

1 - Pre-Ship Checklist

Quick audit for any component before it ships. Walk back to the section that needs attention for any item that fails.

Use this as a set of prompts for a quick self-audit, not a list of gates that must all pass. Items that don’t apply to a component can be ignored; items the list doesn’t mention but your component clearly needs should be added. Walk back to the pattern or cross-cutting concern that needs attention for any item that prompts a “we should fix that.”

  • The bulk of the suite is sociable unit tests that exercise how behaviors collaborate to deliver a domain operation. Solitary unit tests are reserved for genuinely complex pure logic.
  • Tests are organized around domain operations, not around classes or methods. Test names read as something a stakeholder would recognize.
  • Every public-interface contract (inbound and outbound) has a contract test running in the pipeline.
  • Classes are tested through their public methods only. No reflection, no test-only visibility relaxations, no asserting on private state.
  • Every consumed external dependency is wrapped in a gateway the team owns; doubles are of the gateway, not of the third-party library.
  • Every boundary adapter has an adapter integration test against the real dependency or a high-fidelity stand-in (testcontainer, WireMock with provider fixtures).
  • The bulk of testing runs in-band in the pipeline and gates the build; out-of-band checks against real systems run on a schedule and trigger review on failure, never a build break.
  • Every test double has a corresponding non-deterministic check that exercises the real dependency on a schedule or post-deploy.
  • Every documented failure mode has a negative test.
  • Every error response has a test that verifies the error envelope, status code, and any side effects (or absence thereof).
  • Time, randomness, and the network are injected, not called directly. No sleep in tests. Use bounded polling or a fake clock.
  • All deterministic tests run pre-commit and in CI Stage 1, and fail the build on failure.
  • All post-deploy integration checks run out of pipeline and trigger review on failure, never blocking a commit.
  • Pipeline gates map to defect sources from the Systemic Defect Fixes catalog. If a defect category has no automated check, that’s a known risk.
  • Authn and authz are tested across every protected endpoint, not as one-offs per feature.
  • Database migrations are tested forward, backward (where supported), and on representative data volume against the production engine.
  • Fixtures are generated from the schema or built through Object Mother / builder helpers, not inline literals.
  • Failure-path tests assert on observability (metric incremented, structured log emitted with correlation ID), not just the response.
  • Per-endpoint perf budgets exist for hot paths; load tests gate production promotion; soak tests run out of pipeline.
  • Flaky tests are quarantined with a dated owner and time-boxed remediation. No permanent quarantine list.
  • The deterministic suite respects the pattern’s time budget (roughly 5 to 8 minutes per component, under 10 minutes total).
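
The injected-clock item above can be sketched like this (all names hypothetical): a TTL cache takes a clock as a dependency instead of calling the system clock, so the expiry test advances time explicitly rather than sleeping.

```javascript
// A fake clock the test controls, and a cache that depends on it
// instead of calling Date.now() directly. All names hypothetical.
class FakeClock {
  constructor(startMs = 0) { this.ms = startMs; }
  now() { return this.ms; }
  advance(deltaMs) { this.ms += deltaMs; }
}

class TtlCache {
  constructor({ clock, ttlMs }) {
    this.clock = clock; this.ttlMs = ttlMs; this.entries = new Map();
  }
  set(key, value) { this.entries.set(key, { value, storedAt: this.clock.now() }); }
  get(key) {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.clock.now() - entry.storedAt >= this.ttlMs) {  // expired
      this.entries.delete(key);
      return undefined;
    }
    return entry.value;
  }
}

// Deterministic expiry test: no sleep, the test advances the clock itself.
const clock = new FakeClock();
const cache = new TtlCache({ clock, ttlMs: 1000 });
cache.set("rate", 42);
const beforeExpiry = cache.get("rate"); // 42
clock.advance(1000);
const afterExpiry = cache.get("rate");  // undefined
```

The same test against a real clock would need a one-second sleep and would still be racy; against the fake clock it runs in microseconds and never flakes.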

2 - Patterns

Eight common component patterns and how to test each fully. Each page covers what to verify, positive and negative cases, double validation, pipeline placement, and a small code example.

Each page in this subsection covers one component pattern. The structure is the same on every page so you can scan-compare:

  1. What needs covering - the layers of testing the pattern typically benefits from.
  2. Positive test cases - common success behaviors worth testing.
  3. Negative test cases - common failure modes that produce production incidents.
  4. Test double validation - how the doubles in pipeline tests stay honest.
  5. Pipeline placement - where each test type tends to run.
  6. Example - a short code sample illustrating one of the harder cases for that pattern.

These are recommended starting points, not exhaustive lists or required gates. Real components have details these pages don’t capture; ignore items that don’t apply, and add items the pattern doesn’t mention but your component clearly needs. The goal is to prompt the conversation, not to constrain it.

API provider, API consumer, scheduled job, and user interface are covered in depth. Event consumer, event producer, CLI/library, and stateful service are deliberately briefer sketches: the same six principles apply, the same checklist still prompts useful questions, and the test double validation model is the same. Use the briefer sketches as a starting point and expand the depth in your own runbooks for the patterns your services actually use.

The patterns

  • API provider - a backend service exposing an HTTP/gRPC/GraphQL API and owning its own data.
  • API consumer - the above, plus outbound calls to other services. The most failure-prone pattern.
  • Scheduled job - a service triggered on a cron, queue, or external scheduler.
  • User interface - a UI that renders data and accepts user interaction.
  • Event consumer - a service that consumes messages from a broker.
  • Event producer - a service that produces messages to a broker.
  • CLI tool or library - a binary or package consumed by other developers.
  • Stateful service - a service that maintains long-lived in-memory state.

2.1 - API Provider

A backend service that exposes an HTTP/gRPC/GraphQL API and owns its own data. No outbound calls to other services in your control.


What needs covering

Layer, concern, and the test type that covers it:

  • Domain logic - business rules, invariants, state transitions. Solitary unit tests.
  • Module collaboration - validators + repositories + domain working together. Sociable unit tests.
  • Persistence adapter - query correctness, transaction boundaries, migrations against the real DB engine. Adapter integration tests (testcontainers running the production engine and version).
  • Assembled component - routing, validation, business logic, and persistence wired together through the controller layer. Component tests with persistence either real (testcontainers) or doubled (in-memory repository).
  • Served API - what downstream consumers depend on. Provider-side contract tests.
[Figure: API Provider - layers and the tests that cover each.]
Layered diagram of an API provider showing four architectural layers stacked top to bottom. The first three are inside the component boundary: HTTP and API surface (covered by component tests and provider contract tests), domain logic (covered by solitary unit, sociable unit, and component tests), and persistence adapter (covered by sociable unit, adapter integration, and component tests). Below the dashed component boundary, the external database is doubled in component tests (in-memory or testcontainer) and used real in adapter integration tests against the production engine.

Positive test cases

Common cases to consider, not an exhaustive list. Drop items that don’t apply and add ones the pattern doesn’t mention but your component needs.

  • Documented endpoints: return the expected shape and status for valid input.
  • Auth: succeeds for valid credentials and tokens.
  • Pagination, filtering, sorting: all return the documented results.
  • Idempotency: repeated calls to idempotent operations produce the same result; non-idempotent operations create exactly one record.
  • Success-path side effects: events emitted and audit log entries happen on the success path.

Negative test cases

Common cases to consider, not an exhaustive list. Drop items that don’t apply and add ones the pattern doesn’t mention but your component needs.

  • Malformed body: bad JSON, missing required fields, wrong types, extra fields handled per the documented policy (reject vs. ignore).
  • Out-of-range values: negatives where positives are expected, oversize strings, unicode edge cases.
  • Auth failures: missing token, expired token, valid token with insufficient scope, valid token for a different tenant.
  • Authorization boundaries: user A cannot read or modify user B’s resources.
  • Resource not found: when referenced IDs don’t exist, the API returns 404, not 500.
  • Concurrency: two writes to the same resource at once, optimistic-lock conflict handled with the documented status code.
  • Persistence failure: DB unavailable, deadlock, constraint violation. The error envelope is correct and no partial state is committed.
  • Rate limiting and request size limits: both are enforced as documented.
  • Idempotency under retry: same idempotency key within the window returns the original result, not a duplicate write.
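
The idempotency-under-retry case can be sketched like this (hypothetical handler, simplified to in-process calls): a retry carrying the same idempotency key replays the original result instead of writing a second record.

```javascript
// Hypothetical idempotency-key handling for an order-creation endpoint.
class OrderEndpoint {
  constructor() { this.byKey = new Map(); this.created = []; }
  post(idempotencyKey, payload) {
    if (this.byKey.has(idempotencyKey)) {
      // Replay: same key within the window returns the original result.
      return { status: 200, body: this.byKey.get(idempotencyKey) };
    }
    const order = { id: `ord-${this.created.length + 1}`, items: payload.items };
    this.created.push(order);                    // exactly one write per key
    this.byKey.set(idempotencyKey, order);
    return { status: 201, body: order };
  }
}

const endpoint = new OrderEndpoint();
const first = endpoint.post("key-123", { items: ["A1"] });
const retry = endpoint.post("key-123", { items: ["A1"] }); // e.g. after a network retry
// first.status === 201, retry.status === 200, same order id, one record created
```

The assertions worth writing are on the retry: same order id, replay status code, and exactly one record in the store.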

Test double validation

Doubles in this pattern are mostly around persistence. Two layers keep them honest:

  1. Adapter integration tests run against a real instance of your production database engine (the same major version, same extensions). If component tests use an in-memory SQLite shim while production runs Postgres, the shim is the lie. The adapter integration test exercises every query and migration against a Postgres testcontainer in CI.
  2. Provider-side contract tests verify the API still satisfies every published consumer expectation. See Consumer and Provider Perspectives. Provider verification is where you discover that a “harmless” field rename broke a consumer before that consumer deploys.

Pipeline placement

  • Unit + sociable unit tests: pre-commit and CI Stage 1.
  • Adapter integration tests against testcontainers: CI Stage 1 if fast, Stage 2 otherwise.
  • Component tests: CI Stage 1.
  • Provider-side contract verification: CD Stage 1 (Contract and Boundary Validation).

Example: component test

A flow-oriented component test for an order-placement endpoint. The full app is assembled with an in-memory order repository and an in-memory event bus. The test drives the assembled component through its HTTP handlers and asserts on observable outcomes (status, persisted state, emitted event):

Java (Spring Boot, MockMvc):

@SpringBootTest
@AutoConfigureMockMvc
class OrderPlacementTest {

  @Autowired MockMvc mvc;
  @Autowired InMemoryOrderRepo orderRepo;
  @Autowired InMemoryEventBus events;

  @Test
  void places_order_with_valid_payment_creates_order_and_emits_OrderPlaced() throws Exception {
    var body = """
      { "items": [{"sku": "A1", "qty": 2}], "paymentToken": "pm_ok" }
      """;

    var result = mvc.perform(post("/orders")
        .header("Authorization", "Bearer tok_valid")
        .contentType(APPLICATION_JSON)
        .content(body))
      .andExpect(status().isCreated())
      .andReturn();

    var orderId = JsonPath.<String>read(result.getResponse().getContentAsString(), "$.id");
    assertThat(orderRepo.findById(orderId)).isPresent();
    assertThat(events.published()).anyMatch(e ->
        e.type().equals("OrderPlaced") && e.orderId().equals(orderId));
  }
}

C# (ASP.NET Core, xUnit):

public class OrderPlacementTests : IClassFixture<WebApplicationFactory<Program>>
{
    private readonly HttpClient client;
    private readonly InMemoryOrderRepo orderRepo = new();
    private readonly InMemoryEventBus events = new();

    public OrderPlacementTests(WebApplicationFactory<Program> factory)
    {
        client = factory.WithWebHostBuilder(b => b.ConfigureServices(s =>
        {
            s.AddSingleton<IOrderRepo>(orderRepo);
            s.AddSingleton<IEventBus>(events);
        })).CreateClient();
    }

    [Fact]
    public async Task Places_order_with_valid_payment_creates_order_and_emits_OrderPlaced()
    {
        client.DefaultRequestHeaders.Authorization = new("Bearer", "tok_valid");
        var body = new { items = new[] { new { sku = "A1", qty = 2 } }, paymentToken = "pm_ok" };

        var response = await client.PostAsJsonAsync("/orders", body);

        response.StatusCode.Should().Be(HttpStatusCode.Created);
        var created = await response.Content.ReadFromJsonAsync<OrderCreated>();
        orderRepo.FindById(created!.Id).Should().NotBeNull();
        events.Published.Should().Contain(e =>
            e.Type == "OrderPlaced" && e.OrderId == created.Id);
    }
}

JavaScript (Jest, Supertest):

import request from "supertest";
import { buildApp } from "./app.js";
import { InMemoryOrderRepo } from "./test/in-memory-order-repo.js";
import { InMemoryEventBus } from "./test/in-memory-event-bus.js";

test("places order with valid payment creates order and emits OrderPlaced", async () => {
  const orderRepo = new InMemoryOrderRepo();
  const events = new InMemoryEventBus();
  const app = buildApp({ orderRepo, events });

  const res = await request(app)
    .post("/orders")
    .set("Authorization", "Bearer tok_valid")
    .send({ items: [{ sku: "A1", qty: 2 }], paymentToken: "pm_ok" });

  expect(res.status).toBe(201);
  expect(orderRepo.findById(res.body.id)).toBeDefined();
  expect(events.published).toContainEqual(
    expect.objectContaining({ type: "OrderPlaced", orderId: res.body.id })
  );
});

The test asserts on what a real caller can observe, not on private methods or call sequences inside the controller.

2.2 - API Consumer

An API provider that also consumes one or more upstream APIs. The most failure-prone pattern in distributed systems and the one that gets the most testing attention.

Same as the API provider pattern, plus outbound HTTP/gRPC calls to services the team does not own (or does own but deploys independently).

What needs covering

Everything from the API provider pattern, plus:

Layer, concern, and the test type that covers it:

  • Outbound HTTP client - request shape, response parsing, status code handling, header propagation, timeout enforcement. Adapter integration tests (against WireMock or, periodically, the real downstream).
  • Consumed API contract - the fields and status codes the consumer depends on. Consumer-side contract tests.
  • Resilience under degraded dependencies - retries, circuit breaking, backoff, fallback, partial-failure compensation. Component tests with fault-injecting client doubles.
  • Composite behavior - the service still returns useful responses when downstreams misbehave. Component tests.
[Figure: API Consumer - layers and the tests that cover each.]
Layered diagram of an API consumer with seven architectural layers. The first five (HTTP and API surface, domain logic and orchestration, resilience policy, outbound HTTP client, persistence adapter) are inside the component boundary. Below the dashed boundary, the external database and the external downstream service are drawn with dashed borders. Component tests cover every internal layer including resilience, with both database and downstream service doubled. Adapter integration tests pin the outbound and persistence protocols against real containers. Consumer contract tests pin the outbound boundary. Out-of-band integration tests exercise the real downstream service to confirm doubles still match reality.

Positive test cases

Common cases to consider, not an exhaustive list. Drop items that don’t apply and add ones the pattern doesn’t mention but your component needs.

  • Outbound call: constructs the right URL, headers, body, auth, and timeout.
  • Success response: parsed correctly, including optional fields and unknown fields per Postel’s Law.
  • Multi-call composition: multiple downstream calls in sequence or parallel produce the documented composite response.
  • Caching: returns the cached value within TTL and refreshes after.
  • Trace context: propagates downstream.

Negative test cases

Common cases to consider, not an exhaustive list. The bulk of the negative testing happens here, and it’s where most production incidents originate. Drive each failure mode through a client double that simulates it.

  • Timeout (downstream exceeds the configured deadline): the deadline is enforced; the upstream caller gets the documented response (e.g., 504); no partial state is committed. Use a client double that delays past the deadline.
  • Connection refused: the retry policy executes the documented count and backoff, then fails over to the fallback or returns an error. Use a client double that rejects the connection.
  • 5xx responses (500, 502, 503): retry only on retryable codes. Use a client double that returns 5xx.
  • 4xx responses (400, 401, 403, 404, 409, 422, 429): each maps to documented behavior; 4xx generally not retried; 429 respects Retry-After. Use a client double that returns each code.
  • Slow response within timeout: performance-budget assertions hold if the service has SLO commitments. Use a client double that delays within the deadline.
  • Malformed response body: the response is rejected, not silently coerced. Use a client double that returns a truncated or wrong-type body.
  • Schema drift (extra or missing fields): extra fields tolerated; missing required fields detected with a clear error. Use a client double that returns a drifted body.
  • Wrong status code (200 with error body, 500 with success body): the client trusts the status code, not the body. Use a client double that returns mismatched status and body.
  • Circuit open: the circuit opens under sustained failure; fast-fails subsequent calls; recovers on a half-open probe. Use a client double that sustains failures.
  • Partial multi-call failure: compensation, rollback, or documented partial-success behavior. First client double succeeds, second fails.
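
A sketch of driving a retry policy through a fault-injecting client double (all names hypothetical): the double fails with a 503 a fixed number of times before succeeding, and a second run shows a 404 failing fast with no retry.

```javascript
// Fault-injecting client double: fails with the given status a fixed number
// of times, then succeeds. All names hypothetical.
function makeFlakyClient(failures, status) {
  const state = { calls: 0 };
  return {
    state,
    fetchUser(id) {
      state.calls += 1;
      if (state.calls <= failures) {
        const err = new Error(`HTTP ${status}`);
        err.status = status;
        throw err;
      }
      return { id, name: "ok" };
    },
  };
}

// Hypothetical retry policy under test: retry 5xx only, up to maxAttempts.
function withRetry(fn, { maxAttempts }) {
  let lastErr;
  for (let attempt = 1; attempt <= maxAttempts; attempt += 1) {
    try { return fn(); }
    catch (err) {
      lastErr = err;
      if (!(err.status >= 500)) throw err;  // 4xx is not retryable: fail fast
    }
  }
  throw lastErr;
}

const flaky = makeFlakyClient(2, 503);
const user = withRetry(() => flaky.fetchUser("u1"), { maxAttempts: 3 });
// two 503s are retried; the third attempt succeeds
```

The call counter on the double is what makes the assertion precise: not just “it eventually succeeded,” but “it retried exactly the documented number of times, and only on retryable codes.”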

Test double validation

This is where the “doubles need tests” rule lives or dies. Four layers:

  1. Consumer-side contract tests run in the pipeline on every commit using doubles. They pin the request the consumer sends and the response shape the consumer depends on. Contract artifacts are published to a broker. Fast, deterministic, blocks the build.
  2. Adapter integration tests exercise the outbound HTTP client against the real dependency in a controlled state - typically a testcontainer running an in-house service the team owns. They verify the adapter code correctly speaks the protocol: serialization, deserialization, header handling, timeout behavior, error mapping. The test asserts the adapter’s correctness, not the dependency’s behavior: if the test asks for a user, it validates that the response parses into a valid User, not which user was returned. For third-party dependencies the team can’t run in a controlled state, run these tests out-of-band on a schedule. WireMock loaded with provider-supplied fixtures is a useful complement but functions more like a contract test against recorded shapes than an integration test against the live protocol.
  3. Provider-side contract verification runs in the provider’s pipeline. The provider executes every consumer’s published contract against the real provider implementation. Breaking changes are caught at the source before the provider deploys.
  4. Post-deploy integration check runs periodically against the real downstream in a non-production environment. Same fixtures used in contract tests. Catches drift in fields the contract didn’t pin, version skew, environment differences. Failures trigger review, not a build break. See Out-of-Pipeline Verification.

For third-party APIs you do not control, there is no provider verification step. The post-deploy check against the live (or sandbox) API is the only mechanism keeping doubles honest. Run it more often than for in-house dependencies. Daily at minimum.

The anti-pattern to avoid: stubbing the third-party SDK directly. Always wrap third-party clients in a thin adapter the team owns, then double the adapter. This is called out explicitly as Mocking what you don’t own and is the single most common source of “but it worked in tests” incidents.
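
A sketch of the thin-adapter shape (hypothetical SDK and names throughout): the gateway translates the SDK’s request shapes and errors into the team’s own vocabulary, and test doubles implement the gateway’s narrow interface, never the SDK’s.

```javascript
// Hypothetical thin adapter around a third-party payments SDK. Tests double
// PaymentsGateway, which the team owns; the SDK is never stubbed directly.
class PaymentDeclinedError extends Error {}

class PaymentsGateway {
  constructor(sdkClient) { this.sdk = sdkClient; }
  charge({ token, amountCents }) {
    try {
      // Translate the SDK's types and errors into the team's vocabulary.
      const result = this.sdk.createCharge({ source: token, amount: amountCents });
      return { chargeId: result.id, status: "CHARGED" };
    } catch (err) {
      throw new PaymentDeclinedError(err.message);
    }
  }
}

// In component tests, the double implements the same narrow interface:
const fakeGateway = {
  charge({ token }) {
    if (token !== "pm_ok") throw new PaymentDeclinedError("bad token");
    return { chargeId: "ch_fake_1", status: "CHARGED" };
  },
};
```

Because the double mirrors an interface the team controls, the adapter integration test (and the out-of-band check against the real SDK) only has to validate `PaymentsGateway` itself, not every place the SDK is used.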

Pipeline placement

Same as the API provider pattern, plus:

  • Consumer-side contract tests: pre-commit and CI Stage 1.
  • Adapter integration tests for the outbound HTTP client against an in-house dependency the team controls (a testcontainer running the team’s own service in a known state): CI Stage 1 or Stage 2.
  • Adapter integration tests against a third-party API or a service owned by another team: out-of-band on a schedule, never in-band. The risk of a flaky external service blocking deploys outweighs any in-band coverage benefit, and adapter tests with WireMock fixtures already cover the team’s adapter code.
  • Resilience component tests with fault injection: CI Stage 1.
  • Post-deploy integration checks against real downstreams: out of pipeline, on a schedule.

Example: fault injection at the client double

A negative-path test for a downstream timeout. The payment client double simulates a slow response; the test asserts that the deadline is enforced and the upstream caller gets the documented error envelope:

Java (Spring Boot, MockMvc):

@SpringBootTest
@AutoConfigureMockMvc
class PaymentTimeoutTest {

  @Autowired MockMvc mvc;
  @Autowired InMemoryOrderRepo orderRepo;
  @MockBean PaymentsGateway payments;

  @Test
  void returns_504_when_payment_service_exceeds_deadline() throws Exception {
    when(payments.charge(any())).thenAnswer(inv -> {
      Thread.sleep(50);
      throw new UpstreamTimeoutException("payments");
    });

    var body = """
      { "items": [{"sku": "A1", "qty": 1}], "paymentToken": "pm_ok" }
      """;

    mvc.perform(post("/orders")
        .header("Authorization", "Bearer tok_valid")
        .contentType(APPLICATION_JSON)
        .content(body))
      .andExpect(status().isGatewayTimeout())
      .andExpect(jsonPath("$.error.code").value("UPSTREAM_TIMEOUT"));

    assertThat(orderRepo.all()).isEmpty();
  }
}

C# (ASP.NET Core, xUnit):

public class PaymentTimeoutTests : IClassFixture<WebApplicationFactory<Program>>
{
    private readonly HttpClient client;
    private readonly InMemoryOrderRepo orderRepo = new();
    private readonly Mock<IPaymentsGateway> payments = new();

    public PaymentTimeoutTests(WebApplicationFactory<Program> factory)
    {
        payments.Setup(p => p.ChargeAsync(It.IsAny<ChargeRequest>()))
            .Returns(async () =>
            {
                await Task.Delay(50);
                throw new UpstreamTimeoutException("payments");
            });

        client = factory.WithWebHostBuilder(b => b.ConfigureServices(s =>
        {
            s.AddSingleton<IOrderRepo>(orderRepo);
            s.AddSingleton(payments.Object);
        })).CreateClient();
    }

    [Fact]
    public async Task Returns_504_when_payment_service_exceeds_deadline()
    {
        client.DefaultRequestHeaders.Authorization = new("Bearer", "tok_valid");
        var body = new { items = new[] { new { sku = "A1", qty = 1 } }, paymentToken = "pm_ok" };

        var response = await client.PostAsJsonAsync("/orders", body);

        response.StatusCode.Should().Be(HttpStatusCode.GatewayTimeout);
        var error = await response.Content.ReadFromJsonAsync<ErrorEnvelope>();
        error!.Error.Code.Should().Be("UPSTREAM_TIMEOUT");
        orderRepo.All().Should().BeEmpty();
    }
}

JavaScript (Jest, Supertest):

test("returns 504 when payment service exceeds deadline", async () => {
  const slowPayments = {
    charge: () => new Promise((_, reject) => {
      setTimeout(() => reject(new TimeoutError("payments")), 50);
    })
  };
  const orderRepo = new InMemoryOrderRepo();
  const app = buildApp({ orderRepo, payments: slowPayments, deadlineMs: 30 });

  const res = await request(app)
    .post("/orders")
    .set("Authorization", "Bearer tok_valid")
    .send({ items: [{ sku: "A1", qty: 1 }], paymentToken: "pm_ok" });

  expect(res.status).toBe(504);
  expect(res.body.error.code).toBe("UPSTREAM_TIMEOUT");
  expect(orderRepo.all()).toHaveLength(0);
});

The test verifies three things at once: the documented status code, the structured error body the API contract promises, and that no partial state was committed.

2.3 - Scheduled Job

A service triggered on a cron, queue, or external scheduler. Reads from data sources, writes reports or updates state.

A job that runs on a cron, queue, or external scheduler. Reads from data sources, writes reports or updates state. Often has no inbound API surface. The entrypoint is the scheduler.

This pattern has two test design challenges that the API provider and API consumer patterns don’t have: time and data volume.

What needs covered

Layer | Concern | Test type
Pure transformation logic | The data calculation itself, with no I/O | Solitary unit tests
Source and sink adapters | Reading from sources, writing to sinks: protocol correctness, error mapping | Adapter integration tests against real source/sink containers or WireMock
Job orchestration | Idempotency, partial failure recovery, checkpointing, locking, time-window logic | Component tests through the job’s invocation entrypoint, with client doubles, source/sink doubles, and an injected clock
Process startup | Exit codes, signal handling, configuration loading, real environment wiring | Deployed-binary tests that invoke the real artifact
Scheduling integration | The scheduler triggers the right entrypoint with the right arguments, environment, secrets, and concurrency settings | Out-of-band integration check against the real scheduler in a non-prod environment
Observability | Job ran, succeeded/failed, duration, records processed, error count | Assertions in component tests
[Figure] Scheduled Job: Layers and the Tests That Cover Each
Layered diagram of a scheduled job with six architectural layers. The first four (pure transformation logic, job orchestration, source and sink gateways, process startup) are inside the component boundary. Below the dashed boundary, the external source and sink and the external scheduler and system clock are drawn with dashed borders. Solitary unit tests cover pure transformation. Component tests cover orchestration with the clock and gateways doubled. Adapter integration tests pin source and sink protocols against real containers. Deployed-binary tests cover process startup on the actual artifact the scheduler will invoke. Out-of-band integration uses the real scheduler and clock on a schedule.

Process startup matters more here than for an API service, because scheduled jobs typically have non-trivial startup behavior (config loading, secret resolution, lock acquisition) that a component test with the SUT in-memory can bypass. The right shape is many component tests for behavior, plus one or two deployed-binary tests that run the actual artifact the scheduler will invoke.

Positive test cases

Common cases to consider, not an exhaustive list. Drop items that don’t apply and add ones the pattern doesn’t mention but your component needs.

  • End-to-end run: with representative input, produces the expected output (report file, database update, message published).
  • Idempotency: running the job twice for the same logical period produces the same result, not duplicates.
  • Checkpointing: a job that processes a stream resumes from the last checkpoint, not from scratch.
  • Time windows: “yesterday’s data” computes correctly for various reference times, especially around DST, month boundaries, and year boundaries.
  • Empty input: zero records produces a valid empty report, not an error.
  • Output format: the report or message conforms to the documented schema.
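The idempotency case above can be sketched in a few lines. This assumes one common design - the sink is keyed by the logical period, so a rerun overwrites rather than appends; InMemorySink and runDailyReport are hypothetical stand-ins, not names from the document.

```javascript
// Sketch: idempotency by logical period. The report sink is keyed by the
// period a run covers, so running the job twice produces one report.
class InMemorySink {
  constructor() { this.reports = new Map(); }
  upsert(period, report) { this.reports.set(period, report); } // keyed write
  count() { return this.reports.size; }
  get(period) { return this.reports.get(period); }
}

function runDailyReport(period, records, sink) {
  const total = records.reduce((sum, r) => sum + r.amount, 0);
  sink.upsert(period, { period, total, recordCount: records.length });
}

const sink = new InMemorySink();
const records = [{ amount: 10 }, { amount: 32 }];

runDailyReport("2026-03-08", records, sink); // first run
runDailyReport("2026-03-08", records, sink); // rerun of the same period
```

Keying writes by the period is one of several ways to get this property; delete-then-insert within a transaction, or an idempotency key per output row, pin the same behavior.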

Negative test cases

Common cases to consider, not an exhaustive list. Drop items that don’t apply and add ones the pattern doesn’t mention but your component needs.

  • Source unavailable: DB down, source API returning 5xx. Verify the job fails cleanly with a documented exit code/status, doesn’t write partial output, and is safely re-runnable.
  • Sink unavailable: destination DB or message broker rejects writes. Verify no source state changes (e.g., “marked as processed”) happen if the sink fails.
  • Partial-write failure: half the batch writes successfully, then the connection drops. Verify the next run reprocesses the failed half without duplicating the successful half. This is where idempotency keys, transactional outboxes, or compensating reads earn their keep.
  • Slow job: job exceeds its expected runtime. Verify it surfaces as alertable, doesn’t silently overlap with the next scheduled run, and that the lock prevents concurrent execution.
  • Malformed source data: null where non-null was expected, wrong type, encoding issues. Verify the bad record is logged with enough context to investigate, and the job decides per its policy: skip, dead-letter, or fail the whole run. The choice is design; the test pins it.
  • Time-zone bugs: the job runs at 02:30 UTC for a “daily” report. What does it do on the day clocks shift? Test it. Use the injected clock so the test deterministically simulates the boundary.
  • Concurrent run: the previous run hadn’t finished when the next was triggered. Verify the lock prevents overlap or, if overlap is acceptable, that the work is partitioned correctly.
  • Crash mid-run: kill -9 in the middle of processing. Verify on restart the job resumes from a consistent state.
  • Schema drift on source: a new field appears or a field changes type. Verify per the contract policy.
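The partial-write case deserves a concrete shape. A minimal sketch, assuming per-record idempotency keys as the recovery mechanism; FlakySink and the record shapes are illustrative stand-ins, not from the document.

```javascript
// Sketch: rerun after a partial-write failure. The sink remembers committed
// idempotency keys, so the rerun skips the half that succeeded and writes
// only the half that failed - no duplicates, no lost records.
class FlakySink {
  constructor(failAfter) { this.failAfter = failAfter; this.committed = new Map(); }
  write(record) {
    if (this.failAfter !== null && this.committed.size >= this.failAfter) {
      throw new Error("connection dropped"); // simulated mid-batch failure
    }
    this.committed.set(record.key, record);
  }
}

function runBatch(records, sink) {
  for (const record of records) {
    if (sink.committed.has(record.key)) continue; // already durable: skip
    sink.write(record);
  }
}

const batch = [
  { key: "r1", value: 1 }, { key: "r2", value: 2 },
  { key: "r3", value: 3 }, { key: "r4", value: 4 },
];
const sink = new FlakySink(2);

let firstRunError = null;
try { runBatch(batch, sink); } catch (e) { firstRunError = e; } // dies after two writes

sink.failAfter = null; // "connection restored"
runBatch(batch, sink); // rerun writes only the failed half
```

In production the committed-key check is a database constraint or a read on the idempotency-key column, not an in-memory map, but the behavior the test pins is the same.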

Test double validation

Three classes of doubles need validation, each through a different mechanism:

  1. The injected clock. Every in-band test that depends on “now” uses an injected clock. Validate it with one out-of-band check that runs against the real system clock, exercises a known time-window calculation, and confirms the production wiring of the clock dependency is correct. This catches the “tests use UTC, prod uses container local time” class of bug.
  2. Source and sink gateways. Same model as the API consumer pattern. Adapter integration tests in the pipeline exercise each gateway against a real source/sink container or WireMock. Contract tests pin the shape. Post-deploy integration checks confirm the doubles still match the real systems on a schedule.
  3. The scheduler trigger. The doubled trigger in component tests must match what the real scheduler invokes. Verify with a post-deploy integration check that runs the real scheduler against a deployed instance in a non-prod environment and confirms the entrypoint is found, the cron expression fires at the expected times, environment variables and secrets resolve, and the concurrency policy holds. This is the test that catches “passed in CI, didn’t run in prod because the cron expression had a typo.”
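For the first item, the real-clock check cannot pin exact values because the clock moves; it asserts invariants of the window calculation instead. A minimal sketch, where computeDailyWindow is a hypothetical stand-in for the job's real window logic wired to the real system clock:

```javascript
// Sketch: out-of-band check of clock wiring. Runs the window calculation
// against the real system clock and asserts invariants, not exact values.
function computeDailyWindow(clock) {
  const now = clock.now();
  // "Yesterday" as a fixed UTC day - illustrative policy, not the document's.
  const end = new Date(Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate()));
  const start = new Date(end.getTime() - 24 * 60 * 60 * 1000);
  return { start, end };
}

const systemClock = { now: () => new Date() }; // the real clock, not a double
const { start, end } = computeDailyWindow(systemClock);

const durationHours = (end.getTime() - start.getTime()) / 3_600_000;
```

If the production wiring resolved "now" in container local time instead of UTC, the invariant checks (window ends in the past, window has the documented length) are the kind that catch it.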

Pipeline placement

  • Unit and component tests: CI Stage 1.
  • Adapter integration tests for the source and sink adapters: CI Stage 1 or Stage 2.
  • Contract tests for each source and sink: CI Stage 1.
  • Deployed-binary tests of the actual artifact (small set): CI Stage 1 or Stage 2.
  • Real-clock and real-scheduler integration check: out of pipeline, scheduled, against a non-prod environment.
  • Post-deploy: a synthetic invocation of the job in production that verifies it ran, processed records, and met its SLO.

Example: time-window logic with an injected clock

A test that pins the daily-report window calculation around a DST boundary. The clock is injected so the test deterministically simulates the moment of interest. The source and sink are field-level fakes set up in the test class, seeded with data for 2026-03-08 and 2026-03-09.

@Test
void daily_report_run_after_DST_spring_forward_uses_correct_window() {
  Clock fixedClock = Clock.fixed(
      Instant.parse("2026-03-09T07:30:00Z"),
      ZoneOffset.UTC);
  ReportJob job = new ReportJob(fixedClock, source, sink);

  job.run();

  Report emitted = sink.lastReport();
  assertThat(emitted.windowStart())
      .isEqualTo(Instant.parse("2026-03-08T05:00:00Z"));
  assertThat(emitted.windowEnd())
      .isEqualTo(Instant.parse("2026-03-09T05:00:00Z"));
  assertThat(emitted.recordsProcessed())
      .isEqualTo(source.recordsForDay("2026-03-08"));
}
[Fact]
public void Daily_report_run_after_DST_spring_forward_uses_correct_window()
{
    var fixedClock = new FakeClock(DateTimeOffset.Parse("2026-03-09T07:30:00Z"));
    var job = new ReportJob(fixedClock, source, sink);

    job.Run();

    var emitted = sink.LastReport();
    emitted.WindowStart.Should().Be(DateTimeOffset.Parse("2026-03-08T05:00:00Z"));
    emitted.WindowEnd.Should().Be(DateTimeOffset.Parse("2026-03-09T05:00:00Z"));
    emitted.RecordsProcessed.Should().Be(source.RecordsForDay("2026-03-08"));
}
test("daily report run after DST spring forward uses correct window", () => {
  const fixedClock = { now: () => new Date("2026-03-09T07:30:00Z") };
  const job = new ReportJob({ clock: fixedClock, source, sink });

  job.run();

  const emitted = sink.lastReport();
  expect(emitted.windowStart).toEqual(new Date("2026-03-08T05:00:00Z"));
  expect(emitted.windowEnd).toEqual(new Date("2026-03-09T05:00:00Z"));
  expect(emitted.recordsProcessed).toBe(source.recordsForDay("2026-03-08"));
});

A separate out-of-band check runs the deployed binary against the real system clock once, to verify the production wiring of the clock dependency matches the doubled clock used here.

2.4 - User Interface

A UI that renders data and accepts user interaction. Talks to one or more backend APIs.


What needs covered

Layer | Concern | Test type
Pure rendering | Component renders given props/state | Solitary unit tests
Component composition | Composed components wire correctly | Sociable unit tests
Feature behavior | A flow (login, checkout, search) works through the rendered DOM with the backend stubbed at the network layer | Component tests driven by Playwright with the team’s unit-testing framework as the runner
Backend contract | What the UI sends and expects from each backend endpoint | Consumer-side contract tests
End-to-end happy paths | A small number of critical journeys against real backends | E2E tests, post-deploy
Visual regression | The UI looks right | Snapshot or visual diff tests
Accessibility | The UI works for assistive tech and keyboard users | Assertions in component tests + automated WCAG scanning
[Figure] User Interface: Layers and the Tests That Cover Each
Layered diagram of a user interface with five architectural layers. The first four (pure rendering, component composition, feature behavior in the rendered DOM, backend HTTP client) are inside the component boundary. Below the dashed boundary, the external backend API is drawn with a dashed border. Solitary unit tests cover pure rendering. Sociable unit tests cover composition. Component tests driven by Playwright cover feature behavior with the backend doubled at the network layer. Consumer contract tests pin each backend boundary. End-to-end tests run post-deploy against the real backend.

UI component tests run in a real browser engine (Chromium, Firefox, WebKit) driven by Playwright, with the team’s existing unit-testing framework (Vitest, Jest, or whatever is already in the project) as the runner. In-memory renderer shortcuts like JSDOM are rejected: they trade accuracy for speed and produce false greens around layout, focus, event timing, Intersection Observer, and animations - exactly the surface where UI bugs live. Playwright’s headless Chromium starts in milliseconds and runs the suite fast enough to use as the default. Backends are stubbed at the network layer with page.route so the same fixtures drive component tests today and end-to-end smoke tests later.

Positive test cases

Common cases to consider, not an exhaustive list. Drop items that don’t apply and add ones the pattern doesn’t mention but your component needs.

  • Critical flows: a user can complete each documented critical flow via keyboard and via mouse.
  • Forms: accept valid input, submit, and show success.
  • Loading states: render while the backend is in flight.
  • Empty, populated, and overflow states: all render correctly.
  • Internationalization: the UI renders with longer translations and right-to-left scripts.
  • Responsive layouts: render at the documented breakpoints.

Negative test cases

Common cases to consider, not an exhaustive list. Drop items that don’t apply and add ones the pattern doesn’t mention but your component needs.

  • Backend errors: for every API call the UI makes, what does the user see for 4xx, 5xx, network failure, timeout? Test each. The most common UI bug is “spins forever on error.”
  • Form validation: required fields, format errors, length limits, cross-field rules. Each shows a specific, actionable message that’s announced to screen readers.
  • Authentication expiry: token expires mid-session. Verify the user is sent through the documented re-auth flow, not silently dropped.
  • Permission denied: the user navigates to a page they cannot access. Verify the documented response (redirect, “not authorized,” etc.).
  • Stale data: a list rendered, then a delete on another tab, then the user clicks the deleted item. Verify the documented refresh or error behavior.
  • Slow network: every interaction has a documented behavior at 3G speeds. Verify with throttled fixtures.
  • Concurrent edit: two users editing the same record. Verify the optimistic-lock UX behaves as documented.
  • Browser back button: the back button is a public interface. Test it.
  • Accessibility violations: automated WCAG scan in component tests catches missing labels, contrast failures, ARIA misuse on every commit. Don’t defer to quarterly audits.
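The "spins forever on error" failure from the backend-errors bullet can also be pinned below the browser, in the state layer that drives the spinner. A minimal sketch, assuming a hypothetical view model and withDeadline helper (neither is from the document): every backend call races a deadline, and both the error path and the timeout path must clear the loading flag and set the documented copy.

```javascript
// Sketch: "the spinner never spins forever". Every backend call is raced
// against a deadline; error and timeout both clear loading state.
function withDeadline(promise, ms) {
  return Promise.race([
    promise,
    new Promise((_, reject) => setTimeout(() => reject(new Error("deadline")), ms)),
  ]);
}

class CheckoutViewModel {
  constructor(api) { this.api = api; this.loading = false; this.error = null; }
  async placeOrder(order) {
    this.loading = true;
    this.error = null;
    try {
      await withDeadline(this.api.placeOrder(order), 100);
    } catch (e) {
      this.error = "Something went wrong, please try again"; // documented copy
    } finally {
      this.loading = false; // the spinner clears on every path
    }
  }
}

// Simulated dead-slow backend: the call never resolves within the deadline.
const slowApi = { placeOrder: () => new Promise(() => {}) };
const vm = new CheckoutViewModel(slowApi);
const done = vm.placeOrder({ sku: "A1" });
```

This complements, rather than replaces, the Playwright test of the rendered alert and spinner: the view-model test is fast and exhaustive over failure modes; the browser test proves the DOM actually reflects the state.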

Test double validation

Backend doubles in component tests must match the real backends. Same mechanism as the API consumer pattern: the UI is a consumer, every backend it talks to is a provider. Consumer-driven contracts run on every commit; provider verification runs in the backend’s pipeline. Post-deploy E2E smoke tests against the real backend close the loop on drift the contract didn’t pin.

Because UI component tests run in a real browser engine, there is no renderer-level double to validate. The browser is the production renderer, just headless. The remaining gap is between the stubbed backend and the real backend, which the out-of-band E2E suite covers. Out-of-band failures trigger review, not a build break.

Pipeline placement

  • Unit tests (rendering, composition): CI Stage 1.
  • Component tests in headless browser (including a11y assertions): CI Stage 1.
  • Visual regression: CI Stage 1 if fast, CI Stage 2 if slow.
  • Consumer-side contract tests for each backend: CI Stage 1.
  • E2E happy-path smoke tests against real backends: post-deploy, in a production-like environment, blocking the rollout but not the build.
  • Real user monitoring + synthetic transactions: continuously in production.

Example: UI component test for an error path

A flow-oriented test for the checkout error path. Playwright drives a headless browser; the backend is stubbed at the network layer with page.route; the team’s existing unit-testing framework (Vitest, JUnit, xUnit) runs the test. The assertion: the user sees a documented error message and the spinner does not get stuck.

@Test
void shows_error_and_clears_spinner_when_checkout_fails_with_500() {
  try (Playwright playwright = Playwright.create();
       Browser browser = playwright.chromium().launch()) {
    Page page = browser.newPage();

    page.route("**/api/checkout", route ->
        route.fulfill(new Route.FulfillOptions()
            .setStatus(500)
            .setContentType("application/json")
            .setBody("{\"error\":{\"code\":\"INTERNAL\"}}")));

    page.navigate("http://localhost:3000/checkout");
    page.getByRole(AriaRole.BUTTON,
        new Page.GetByRoleOptions().setName("Place order")).click();

    assertThat(page.getByRole(AriaRole.ALERT))
        .containsText("Something went wrong, please try again");
    assertThat(page.getByRole(AriaRole.STATUS)).not().isVisible();
  }
}
[Fact]
public async Task Shows_error_and_clears_spinner_when_checkout_fails_with_500()
{
    using var playwright = await Playwright.CreateAsync();
    await using var browser = await playwright.Chromium.LaunchAsync();
    var page = await browser.NewPageAsync();

    await page.RouteAsync("**/api/checkout", route => route.FulfillAsync(new()
    {
        Status = 500,
        ContentType = "application/json",
        Body = "{\"error\":{\"code\":\"INTERNAL\"}}"
    }));

    await page.GotoAsync("http://localhost:3000/checkout");
    await page.GetByRole(AriaRole.Button, new() { Name = "Place order" })
        .ClickAsync();

    await Expect(page.GetByRole(AriaRole.Alert))
        .ToContainTextAsync("Something went wrong, please try again");
    await Expect(page.GetByRole(AriaRole.Status)).Not.ToBeVisibleAsync();
}
import { test, expect, beforeAll, afterAll } from "vitest";
import { chromium } from "playwright";

let browser;

beforeAll(async () => { browser = await chromium.launch(); });
afterAll(async () => { await browser.close(); });

test("shows error and clears spinner when checkout fails with 500", async () => {
  const page = await browser.newPage();

  await page.route("**/api/checkout", route =>
    route.fulfill({
      status: 500,
      contentType: "application/json",
      body: JSON.stringify({ error: { code: "INTERNAL" } }),
    })
  );

  await page.goto("http://localhost:3000/checkout");
  await page.getByRole("button", { name: /place order/i }).click();

  await expect(page.getByRole("alert"))
    .toContainText(/something went wrong, please try again/i);
  await expect(page.getByRole("status")).not.toBeVisible();
});

The test exercises the rendered DOM the way a real user would. Intercepting at the network layer with page.route keeps the same fixtures reusable when the component test gets promoted to an end-to-end smoke test against the real backend.

2.5 - Event Consumer

A service that consumes messages from a broker (Kafka, SQS, RabbitMQ, Pub/Sub).

A consumer of messages from Kafka, SQS, RabbitMQ, Pub/Sub, or similar. Reads messages, processes them, often updates state and produces downstream messages. The “public interface” is the topic or queue and the schema of messages on it.

This pattern has problems the API provider and API consumer patterns don’t have: ordering, replay, poison messages, dead-letter queues, and delivery semantics (at-most-once, at-least-once, exactly-once-with-effort).

What needs covered

Layer | Concern | Test type
Message handler | Pure transformation per message | Solitary unit tests
Idempotency | Same message twice produces the same effect | In-process component tests
Poison message handling | Malformed message goes to DLQ, doesn’t crash the consumer | In-process component tests
Ordering | Out-of-order messages produce documented outcomes | In-process component tests
Backpressure | Consumer slows when downstream is slow | Resilience component tests
Broker contract | Topic, schema, headers | Contract tests
Broker client | Real protocol behavior, offset commits, consumer group rebalancing | Adapter integration tests against a real broker container
[Figure] Event Consumer: Layers and the Tests That Cover Each
Layered diagram of an event consumer with six architectural layers. The first five (message handler logic, idempotency and ordering, dead-letter and poison-message handling, backpressure, broker client) are inside the component boundary. Below the dashed boundary, the external broker and schema registry are drawn with a dashed border. Solitary unit tests cover handler logic. Component tests cover idempotency, dead-letter handling, ordering, and backpressure with the broker doubled. Adapter integration tests pin the broker protocol against a real broker container. Broker contract tests pin the topic, schema, and headers. Out-of-band synthetic publish confirms the doubles still match the real broker.

Positive test cases

Common cases to consider, not an exhaustive list. Drop items that don’t apply and add ones the pattern doesn’t mention but your component needs.

  • Well-formed message: produces the expected state change and the documented downstream events.
  • Batch processing: processes per documented policy.
  • Replay from offset: reproduces the same end state.
  • Documented schema versions: are accepted.

Negative test cases

Common cases to consider, not an exhaustive list. Drop items that don’t apply and add ones the pattern doesn’t mention but your component needs.

  • Malformed message: routes to the DLQ with a correlation ID; the consumer survives.
  • Duplicate delivery: absorbed by idempotency.
  • Out-of-order delivery: follows the documented behavior.
  • Mid-batch downstream failure: the offset is left uncommitted.
  • Schema-version skew: handled per the documented policy.
  • Slow downstream: applies backpressure rather than OOM.
  • Consumer-group rebalance during processing: no in-flight messages are stranded.
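The poison-message case is worth seeing in miniature. A sketch under assumptions - InMemoryDlq, the message shapes, and the skip-to-DLQ policy are illustrative, not from the document; the real policy (skip, dead-letter, or fail the run) is a design decision the test pins.

```javascript
// Sketch: poison-message handling. A malformed message is routed to the
// DLQ with its correlation ID, and the consumer keeps processing.
class InMemoryDlq {
  constructor() { this.entries = []; }
  send(message, reason) {
    this.entries.push({ correlationId: message.correlationId, reason, raw: message.raw });
  }
}

function consume(messages, dlq, store) {
  for (const message of messages) {
    let payload;
    try {
      payload = JSON.parse(message.raw);
      if (typeof payload.orderId !== "string") throw new Error("orderId missing");
    } catch (e) {
      dlq.send(message, e.message); // dead-letter with context, then move on
      continue;
    }
    store.push(payload.orderId);
  }
}

const dlq = new InMemoryDlq();
const store = [];
consume(
  [
    { correlationId: "c-1", raw: '{"orderId":"ord-1"}' },
    { correlationId: "c-2", raw: "{not json" },            // poison message
    { correlationId: "c-3", raw: '{"orderId":"ord-3"}' },
  ],
  dlq,
  store
);
```

The two assertions that matter: the good messages around the poison one were still processed, and the DLQ entry carries enough context (correlation ID, reason, raw payload) to investigate.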

Test double validation

The broker double in component tests is validated by adapter integration tests against a real broker container the team controls (Kafka in Docker, ElasticMQ for SQS, Redpanda in Docker). The test exercises the broker client adapter against that controlled instance and asserts the adapter speaks the protocol correctly - it does not assert anything about which messages the broker returns or in what order; that is the broker’s behavior, not the adapter’s. The schema registry double is validated by contract tests pinning each version, plus a post-deploy check against the real registry. A post-deploy synthetic check publishes a known message to the real topic in a non-prod environment.

Pipeline placement

Handler unit tests and component tests run in CI Stage 1; adapter integration tests against a team-controlled broker container in CI Stage 1 or Stage 2; adapter integration tests against a managed broker the team can’t pin to a known state run out-of-band on a schedule, alongside the post-deploy synthetic.

Example: idempotency under duplicate delivery

Money.usd takes minor units (cents); 4250 represents $42.50.

@Test
void same_message_processed_twice_creates_one_payment_record() {
  PaymentEvent event = new PaymentEvent(
      "evt-9f12", OrderId.of("ord-001"), Money.usd(4250));
  PaymentRepo repo = new InMemoryPaymentRepo();
  PaymentEventHandler handler = new PaymentEventHandler(repo);

  handler.handle(event);
  handler.handle(event);

  assertThat(repo.findByEventId("evt-9f12")).hasSize(1);
  assertThat(repo.totalForOrder(OrderId.of("ord-001"))).isEqualTo(Money.usd(4250));
}
[Fact]
public void Same_message_processed_twice_creates_one_payment_record()
{
    var evt = new PaymentEvent("evt-9f12", OrderId.Of("ord-001"), Money.Usd(4250));
    var repo = new InMemoryPaymentRepo();
    var handler = new PaymentEventHandler(repo);

    handler.Handle(evt);
    handler.Handle(evt);

    repo.FindByEventId("evt-9f12").Should().HaveCount(1);
    repo.TotalForOrder(OrderId.Of("ord-001")).Should().Be(Money.Usd(4250));
}
test("same message processed twice creates one payment record", () => {
  const event = new PaymentEvent(
    "evt-9f12", OrderId.of("ord-001"), Money.usd(4250));
  const repo = new InMemoryPaymentRepo();
  const handler = new PaymentEventHandler(repo);

  handler.handle(event);
  handler.handle(event);

  expect(repo.findByEventId("evt-9f12")).toHaveLength(1);
  expect(repo.totalForOrder(OrderId.of("ord-001"))).toEqual(Money.usd(4250));
});
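One way the handler under test could absorb the duplicate - a sketch, not the document's implementation: dedup on the event ID before writing. In production the same uniqueness would be enforced by a database constraint on the event ID, since an in-memory check-then-insert is racy under concurrent consumers.

```javascript
// Sketch: event-ID dedup that makes the idempotency test above pass.
// Amounts are in minor units (cents), matching Money.usd in the tests.
class InMemoryPaymentRepo {
  constructor() { this.records = []; }
  existsByEventId(eventId) { return this.records.some((r) => r.eventId === eventId); }
  save(record) { this.records.push(record); }
  findByEventId(eventId) { return this.records.filter((r) => r.eventId === eventId); }
  totalForOrder(orderId) {
    return this.records
      .filter((r) => r.orderId === orderId)
      .reduce((sum, r) => sum + r.amountMinor, 0);
  }
}

class PaymentEventHandler {
  constructor(repo) { this.repo = repo; }
  handle(event) {
    if (this.repo.existsByEventId(event.eventId)) return; // duplicate: no-op
    this.repo.save({
      eventId: event.eventId,
      orderId: event.orderId,
      amountMinor: event.amountMinor,
    });
  }
}

const repo = new InMemoryPaymentRepo();
const handler = new PaymentEventHandler(repo);
const event = { eventId: "evt-9f12", orderId: "ord-001", amountMinor: 4250 };

handler.handle(event);
handler.handle(event); // duplicate delivery, absorbed
```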

2.6 - Event Producer

A service that produces messages to a broker. Often paired with the event consumer pattern in the same service.

The producer side, often paired with the event consumer pattern in the same service. After a state change, the service publishes a message that downstream consumers depend on.

The hard problems differ from the consumer side: atomicity with persistence (did both the DB row commit and the message publish, or only one?), exactly-once semantics that require an outbox or two-phase commit, and downstream consumers’ dependence on schema, routing key, and headers.
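The transactional outbox in miniature, as a sketch: the domain row and the outbox entry commit in one unit or not at all, and a separate drain publishes from the outbox. InMemoryDb's all-or-nothing transaction is an illustrative stand-in for a real database transaction.

```javascript
// Sketch: transactional outbox. The order row and its outbox entry commit
// together; a failure after the writes rolls both back, so no phantom event.
class InMemoryDb {
  constructor() { this.orders = []; this.outbox = []; }
  transaction(work) {
    const snapshot = { orders: [...this.orders], outbox: [...this.outbox] };
    try {
      work();
    } catch (e) {
      this.orders = snapshot.orders; // roll back both tables together
      this.outbox = snapshot.outbox;
      throw e;
    }
  }
}

function placeOrder(db, order, failAfterWrite = false) {
  db.transaction(() => {
    db.orders.push(order);
    db.outbox.push({ topic: "orders.placed", key: order.id, payload: order });
    if (failAfterWrite) throw new Error("constraint violation"); // forces rollback
  });
}

const db = new InMemoryDb();
placeOrder(db, { id: "ord-1", total: 4250 }); // commits row and outbox entry

let rolledBack = false;
try {
  placeOrder(db, { id: "ord-2", total: 10 }, true); // rolls back both
} catch {
  rolledBack = true;
}
```

Because the broker publish happens from the outbox after commit, "broker accepts but DB rolls back" cannot emit a phantom event, and "DB commits but broker fails" leaves the entry waiting for the next drain.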

What needs covered

Layer | Concern | Test type
Outbox / transactional emit | DB write and message emit happen as a unit | Component tests with real DB + broker double
Produced message contract | Schema, headers, routing | Provider-side contract tests
Routing | Right topic and key per event type | Component tests
Retry on broker unavailable | Outbox drains once broker recovers | Component tests with fault-injecting broker client double
Trace propagation | Trace context in headers matches the inbound request | Component tests
[Figure] Event Producer: Layers and the Tests That Cover Each
Layered diagram of an event producer with five architectural layers. The first three (domain emit decision, outbox or transactional emit, broker client) are inside the component boundary. Below the dashed boundary, the external broker and the database used by the outbox are drawn with dashed borders. Solitary unit tests cover the emit decision logic. Component tests cover outbox atomicity, retry on broker unavailable, and trace propagation, run with a real database and a doubled broker. Adapter integration pins the broker protocol against a real broker container. Provider contract verification runs against every consumer's published expectations. Out-of-band synthetic state change confirms the message arrives in the real broker.

Positive test cases

Common cases to consider, not an exhaustive list. Drop items that don’t apply and add ones the pattern doesn’t mention but your component needs.

  • State change: produces the correct message on the correct topic with the correct routing key, headers, and schema version.
  • Outbox drain: messages drain in commit order.
  • Redelivery: a retried publish does not reorder events.

Negative test cases

Common cases to consider, not an exhaustive list. Drop items that don’t apply and add ones the pattern doesn’t mention but your component needs.

  • DB commits but broker fails: the message stays in the outbox and emits on the next drain. No event lost.
  • Broker accepts but DB rolls back: nothing is emitted. No phantom events.
  • Broker unavailable for an extended period: the outbox accumulates with bounded growth and alerts at a threshold.
  • Breaking schema change: fails provider-side contract verification before shipping.
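The first two cases above are the heart of the outbox guarantee. A minimal in-memory sketch of that shape (all names hypothetical; a real component test runs this against a real database and a doubled broker):

```python
class FlakyBroker:
    """Doubled broker whose availability the test controls."""
    def __init__(self):
        self.published = []
        self.available = True

    def publish(self, message):
        if not self.available:
            raise ConnectionError("broker unavailable")
        self.published.append(message)


class Outbox:
    def __init__(self, broker):
        self.rows = []  # stand-in for an outbox table in the same DB
        self.broker = broker

    def record(self, message):
        # called inside the business transaction, so DB write + outbox
        # row commit (or roll back) together
        self.rows.append(message)

    def drain(self):
        # called by a background drainer; a row is only removed after
        # the publish succeeds
        while self.rows:
            self.broker.publish(self.rows[0])  # may raise; row stays put
            self.rows.pop(0)


broker = FlakyBroker()
outbox = Outbox(broker)

# DB commits but broker fails: the message stays in the outbox
broker.available = False
outbox.record({"event": "order_placed", "order_id": 1})
try:
    outbox.drain()
except ConnectionError:
    pass
assert outbox.rows and not broker.published  # nothing lost, nothing sent

# next drain succeeds and preserves order
broker.available = True
outbox.record({"event": "order_shipped", "order_id": 1})
outbox.drain()
assert [m["event"] for m in broker.published] == ["order_placed", "order_shipped"]
assert not outbox.rows
```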

Test double validation

The broker double in component tests is validated against a real broker container the team controls in adapter integration tests. The test asserts the adapter publishes with the right routing key, headers, and serialization - it does not assert which messages downstream consumers happen to read or in what order; those are downstream concerns. Provider-side contract verification runs in this service’s pipeline against every consumer’s published expectations.

Pipeline placement

Outbox component tests and routing tests run in CI Stage 1; adapter integration tests against a team-controlled broker container in CI Stage 1 or Stage 2; adapter integration tests against a managed broker the team can’t pin run out-of-band on a schedule. Provider-side contract verification in CD Stage 1; post-deploy synthetic state change verifies the message arrives with the expected shape.

2.7 - CLI Tool or Library

A binary or package consumed by other developers. The public interface is the CLI invocation surface or the library’s exported API.

A binary (CLI) or package (library) consumed by other developers. The “public interface” is the CLI invocation surface (argv, stdin, stdout, stderr, exit code) or the library’s exported API.

The pattern is different because the consumer is a developer or another program, not a user clicking a button. Cross-platform behavior, semantic versioning, and backward compatibility matter more than they do for a service.

What needs covered

Layer               | Concern                                         | Test type
Pure logic          | Functions, classes, parsers                     | Solitary unit tests
CLI invocation      | Argument parsing, exit codes, output streams    | Component tests through the CLI entrypoint
Cross-platform      | Path separators, line endings, signal handling  | Cross-OS test matrix running the suite on every supported OS in CI
Public API surface  | Library’s exported types and functions          | API surface tests (snapshot of the public API; diff fails the build)
Documented examples | The README examples actually work               | Doctests / executable docs
Layered diagram of a CLI tool or library with five architectural layers. The first four (pure logic and parsing, CLI invocation surface or library API, file system and subprocess adapter, documented README examples) are inside the component boundary. Below the dashed boundary, the real OS, file system, and subprocess are drawn with a dashed border. Solitary unit tests cover pure logic and parsing. Component tests cover invocation through the entrypoint. Adapter integration tests cover the file system and subprocess against the real OS in a temp directory. The API surface diff catches removal or rename of any public symbol. Doctests verify README examples run against the real binary or library. The cross-OS CI matrix runs the suite on every supported OS to catch platform-specific bugs.

Positive test cases

Common cases to consider, not an exhaustive list. Drop items that don’t apply and add ones the pattern doesn’t mention but your component needs.

  • Valid arguments: produce documented stdout output, no stderr, and exit code 0.
  • Pipe-friendly mode: produces machine-readable output (JSON/NDJSON) when stdout is not a TTY.
  • Library API: returns documented values for valid input.

Negative test cases

Common cases to consider, not an exhaustive list. Drop items that don’t apply and add ones the pattern doesn’t mention but your component needs.

  • Bad arguments: exit with the documented non-zero code and structured stderr.
  • Help text: reachable via --help.
  • Large input: does not OOM.
  • Interrupt (Ctrl-C, SIGTERM): runs cleanup and flushes or rolls back partial output.
  • Invalid arguments to the library: throws the documented error type.
  • Public symbol removed or renamed: the API-surface test fails the build.
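The argument-handling cases above test through the real invocation surface: argv in, stdout/stderr/exit code out. A self-contained sketch that writes a toy CLI to a temp file and invokes it as a subprocess (all behavior hypothetical):

```python
import os
import subprocess
import sys
import tempfile
import textwrap

# A toy CLI whose contract is: valid args -> JSON on stdout, exit 0;
# bad args -> message on stderr, exit 2; --help -> usage text, exit 0.
TOOL = textwrap.dedent("""
    import json, sys

    def main():
        args = sys.argv[1:]
        if args == ["--help"]:
            print("usage: tool --count N")
            return 0
        if len(args) == 2 and args[0] == "--count" and args[1].isdigit():
            print(json.dumps({"count": int(args[1])}))  # pipe-friendly output
            return 0
        print("error: bad arguments", file=sys.stderr)
        return 2  # documented non-zero exit code

    sys.exit(main())
""")

with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
    f.write(TOOL)
    tool_path = f.name

def invoke(*args):
    return subprocess.run([sys.executable, tool_path, *args],
                          capture_output=True, text=True)

ok = invoke("--count", "3")
assert ok.returncode == 0 and ok.stdout.strip() == '{"count": 3}' and ok.stderr == ""

bad = invoke("--oops")
assert bad.returncode == 2 and "error:" in bad.stderr

help_ = invoke("--help")
assert help_.returncode == 0 and help_.stdout.startswith("usage:")

os.unlink(tool_path)
```

Because the test drives the real binary through argv and streams, the same suite runs unchanged on every OS in the cross-platform matrix.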

Test double validation

File system doubles validated by integration tests against the real FS in a temp directory. Subprocess doubles validated by tests that actually spawn the subprocess on each supported OS. Doctests validate README examples against the real binary or library on every build.

Pipeline placement

Unit and component tests run in CI Stage 1 on every supported OS; API surface diff and doctests in CI Stage 1; cross-platform integration tests in CI Stage 2 if slow.
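The API-surface test mentioned above can be sketched with the stdlib: snapshot the exported names and signatures, then diff against a committed baseline. Here the stdlib json module stands in for the package under test:

```python
import inspect
import json as module_under_test  # stand-in: point this at your own package

def public_surface(mod):
    """Snapshot of exported names and call signatures."""
    surface = {}
    for name in sorted(n for n in dir(mod) if not n.startswith("_")):
        obj = getattr(mod, name)
        try:
            surface[name] = str(inspect.signature(obj))
        except (TypeError, ValueError):
            surface[name] = "<attribute>"  # constants, submodules, etc.
    return surface

snapshot = public_surface(module_under_test)
# A real suite compares this against a committed snapshot file and fails
# the build on any diff; here we only assert two symbols exist.
assert "dumps" in snapshot and "loads" in snapshot
```

Removing or renaming any public symbol changes the snapshot, so the diff fails the build before a breaking release ships.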

2.8 - Stateful Service

A service that maintains long-lived in-memory state: caches, in-memory aggregates, leader-elected coordinators, websocket gateways, real-time engines.

A service that maintains long-lived in-memory state: caches, in-memory aggregates, leader-elected coordinators, websocket gateways, real-time engines, sticky-session servers.

The hard problems are concurrency, recovery, and unbounded growth. Stateful services fail in ways stateless services do not.

What needs covered

Layer                         | Concern                                                            | Test type
State machine logic           | Pure transitions                                                   | Solitary unit tests
Persistence and checkpointing | State survives restart or rebuilds correctly                       | Component tests with real persistence
Recovery from crash           | Restart converges to a consistent state                            | Component tests that simulate crash mid-write
Leader election               | Only one leader; transitions are observable; split-brain is impossible | Cluster tests with real consensus library
Replication                   | Followers stay in sync; backpressure is documented                 | Cluster tests
Memory bounds                 | State doesn’t grow unbounded; eviction policy holds                | Long-running soak tests
Connection lifecycle          | Sessions clean up on disconnect; reconnect is documented           | Component tests
Layered diagram of a stateful service with six architectural layers. The first five (state machine logic, persistence and recovery, single-node concurrency, replication and leader election, memory bounds and long-run behavior) are inside the component boundary. Below the dashed boundary, the persistence engine is drawn with a dashed border. Solitary unit tests cover state transitions. Component tests cover persistence, recovery, and single-node concurrency. Cluster tests exercise replication and leader election against a multi-node testcontainer setup. Out-of-band soak and chaos tests catch unbounded growth, slow leaks, and replication-lag drift against a deployed instance.

Positive test cases

Common cases to consider, not an exhaustive list. Drop items that don’t apply and add ones the pattern doesn’t mention but your component needs.

  • State transitions: follow the documented machine.
  • Restart: state rebuilds and behavior matches pre-restart.
  • Replication lag under expected load: stays within budget.

Negative test cases

Common cases to consider, not an exhaustive list. Drop items that don’t apply and add ones the pattern doesn’t mention but your component needs.

  • Crash mid-write: consistent state on restart. No torn writes.
  • Network partition: minority replicas step down with documented reconciliation on heal.
  • Slow replication: applies backpressure rather than silent divergence.
  • Memory pressure: evicts oldest entries per policy without OOM.
  • Idle long-running connections: close cleanly with documented reconnect behavior.
  • Concurrent state mutations: serialize without lost updates.
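The crash-mid-write case usually comes down to how checkpoints are written. A minimal sketch of the write-then-rename discipline and the test that simulates a torn write (names hypothetical; a real component test kills the process mid-write rather than faking it):

```python
import json
import os
import tempfile

def save_checkpoint(path, state):
    """Write-then-rename so a crash mid-write never leaves a torn file."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path))
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)  # atomic rename: readers see old or new, never partial

def load_checkpoint(path, default=None):
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return default

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "state.json")
    save_checkpoint(path, {"offset": 10})

    # simulate a crash mid-write: a torn temp file is abandoned before rename
    with open(os.path.join(d, "state.json.partial"), "w") as f:
        f.write('{"offset": 1')  # truncated JSON

    # "restart": the last complete checkpoint is still intact
    recovered = load_checkpoint(path)
    assert recovered == {"offset": 10}
```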

Test double validation

Persistence doubles validated by adapter integration tests against the real production engine. Consensus library doubles validated by cluster tests against a multi-node testcontainer setup. Soak tests run out of pipeline against a deployed instance to catch slow leaks and unbounded growth.

Pipeline placement

State machine unit tests, recovery component tests, and single-node concurrency tests run in CI Stage 1; cluster tests with real consensus library in CI Stage 2; soak and chaos tests out of pipeline.

3 - Cross-Cutting Concerns

Concerns that cut across every pattern: authn/authz, database migrations, fixtures, observability, performance, mutation testing, flake handling, and time budgets.

The patterns describe testing organized by component shape. The concerns below cut across all patterns and deserve dedicated coverage in any non-trivial system.

Authn and authz testing

Authentication and authorization deserve dedicated, exhaustive coverage. They are a major source of high-impact incidents and the failure modes are predictable:

  • Tenant isolation: tenant A’s queries never return tenant B’s data. Test every read path. Multi-tenant SaaS bugs are almost always missing isolation tests.
  • Scope or role escalation: a token with read:orders cannot perform write:orders. Test the matrix of scope and endpoint.
  • Expired tokens: rejected even if cached locally. Clock-skew tolerance is a property of the verifier, not a license to skip the test.
  • Forged tokens: signature validation actually validates. The classic JWT alg: none bug still ships periodically.
  • Missing auth: every protected endpoint returns 401, never 500 (information leak) and never 200 (catastrophic).
  • Service-to-service auth: machine identities respected, mTLS validated, token-swapping attacks detected.

The pattern: a parameterized test that takes (endpoint, method, expected-status-when-no-token, expected-status-when-wrong-scope) and runs across every endpoint in the OpenAPI or schema definition. New endpoints are covered automatically.
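A minimal sketch of that parameterized matrix (the endpoint table is a literal here; in practice derive it from the OpenAPI document so new endpoints are covered automatically — all names are hypothetical):

```python
# (path, method, required scope) for every protected endpoint
ENDPOINTS = [
    ("/orders", "GET", "read:orders"),
    ("/orders", "POST", "write:orders"),
]

def handle(path, method, token):
    """Stand-in for the service under test."""
    required = {(p, m): scope for p, m, scope in ENDPOINTS}
    if token is None:
        return 401  # missing auth, never 500 and never 200
    if required[(path, method)] not in token["scopes"]:
        return 403  # scope escalation blocked
    return 200

# expand every endpoint into no-token, wrong-scope, and correct-scope cases
cases = []
for path, method, scope in ENDPOINTS:
    cases.append((path, method, None, 401))                   # no token
    cases.append((path, method, {"scopes": ["other"]}, 403))  # wrong scope
    cases.append((path, method, {"scopes": [scope]}, 200))    # correct scope

for path, method, token, expected in cases:
    assert handle(path, method, token) == expected
print(f"{len(cases)} auth-matrix cases passed")
```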

Database migrations

Migrations have their own discipline. For every migration:

  • Forward on representative data: produces the expected schema and data.
  • Backward (where supported): returns to the previous schema with no data loss. Expand-contract migrations may not roll back; that’s a design choice the test pins.
  • Forward + backward + forward: idempotent.
  • Time on production-scale data: budget assertion. A 30-minute migration on a 50M-row table needs a different deploy strategy than a 30-second one.
  • Under traffic: the expand-contract pattern doesn’t break in-flight transactions.

Test against the real production database engine and version using testcontainers. SQLite-against-Postgres is a frequent source of “passed in CI, broke at 02:00 in prod” incidents.
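The forward/backward/forward roundtrip has a simple shape. A sketch using in-memory SQLite purely to stay self-contained — as noted above, a real suite runs this against the production engine in a testcontainer, and the up/down statements come from migration files, not literals:

```python
import sqlite3

# hypothetical up/down pair for an expand-style migration
UP = "CREATE TABLE user_emails (user_id INTEGER, email TEXT)"
DOWN = "DROP TABLE user_emails"

def tables(conn):
    return sorted(r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table'"))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('a')")

before = tables(conn)
conn.execute(UP)                      # forward: expected schema
assert "user_emails" in tables(conn)
conn.execute(DOWN)                    # backward: previous schema restored
assert tables(conn) == before
assert conn.execute("SELECT count(*) FROM users").fetchone()[0] == 1  # no data loss
conn.execute(UP)                      # forward again: the roundtrip is repeatable
assert "user_emails" in tables(conn)
```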

Test data and fixtures

Fixtures rot faster than the code that uses them. Two principles keep them honest:

  1. Generate fixtures from the schema, not by hand. When the schema is the source of truth (Avro, OpenAPI, SQL DDL, Protobuf), generate fixture builders from it. A type change breaks the build, not production.
  2. Use Object Mother or builder patterns, not raw inline literals. A test that says placeOrder(buildValidOrder().withItem("A1", 2).build()) survives a schema change because the builder updates centrally. A test with 30 lines of raw JSON inline does not.

Avoid shared global fixtures that tests mutate. Each test creates the state it needs, names what is essential about that state, and discards the rest.
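A minimal Object Mother sketch (the Order shape is hypothetical; when a schema exists, generate the dataclass from it rather than writing it by hand):

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Order:  # hypothetical schema
    customer_id: str
    items: tuple
    currency: str

def build_valid_order(**overrides):
    """Object Mother: the one place that knows what a valid Order looks like."""
    base = Order(customer_id="c-1", items=(("A1", 1),), currency="USD")
    return replace(base, **overrides)

# the test names only what is essential to it; defaults live in the mother,
# so a schema change updates centrally instead of breaking 30 inline literals
order = build_valid_order(items=(("A1", 2),))
assert order.currency == "USD"
assert order.items == (("A1", 2),)
```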

Observability as a tested artifact

Logs, metrics, and traces are part of a service’s contract with operators. If an alert depends on a metric, the test for the failure path should assert the metric is emitted. If a runbook depends on a structured log line, the test should assert the line is produced with the right fields and correlation ID.

The pattern: in component tests, attach a metrics collector and a log capture to the assembled component. Failure-path tests assert three things at once:

  1. The response status is correct.
  2. The error metric is incremented with the right labels.
  3. The structured log line is emitted with correlation ID, error code, and any fields the runbook depends on.

This prevents silent regressions where the code “works” but the operator can’t see what’s happening when it doesn’t.
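A sketch of the three-assertion pattern with a stub handler, an in-memory metrics counter, and stdlib log capture (all names hypothetical; in a component test the collector and capture attach to the assembled component):

```python
import io
import json
import logging
from collections import Counter

metrics = Counter()  # stand-in for a metrics collector

def handle_request(payload, log):
    """Hypothetical failure path: missing field -> 400 + metric + structured log."""
    if "order_id" not in payload:
        metrics["errors_total|code=missing_order_id"] += 1
        log.error(json.dumps({
            "event": "request_rejected",
            "error_code": "missing_order_id",
            "correlation_id": payload.get("correlation_id"),
        }))
        return 400
    return 200

log = logging.getLogger("component-under-test")
capture = io.StringIO()
log.addHandler(logging.StreamHandler(capture))

status = handle_request({"correlation_id": "abc-123"}, log)

# the three assertions, in one test
assert status == 400                                        # 1. response status
assert metrics["errors_total|code=missing_order_id"] == 1   # 2. error metric
line = json.loads(capture.getvalue())                       # 3. structured log
assert line["correlation_id"] == "abc-123"
assert line["error_code"] == "missing_order_id"
```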

Performance and load testing

Three classes of perf tests, each with a different home in the pipeline:

  1. Per-endpoint perf budgets in component tests. Simple latency assertion under no load (assertThat(p99).isLessThan(50ms)). Catches algorithmic regressions cheaply. Fits in CI Stage 1 if the assertions are tight and the runtime is stable.
  2. Load tests in acceptance. k6, Gatling, or Locust against a deployed instance. Validate p99 latency, throughput, and error rate at expected production load. Gates production promotion.
  3. Soak tests out of pipeline. Long-running load to catch memory leaks, file handle leaks, and slow drift. Scheduled, non-blocking.

A perf regression that breaches a documented budget should block deploy. A regression within budget but worse than baseline should generate a finding for review, not a build failure: noisy alerts get ignored.
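The per-endpoint budget assertion from class 1 is a few lines in any language. A sketch with the stdlib (the handler and the 50 ms budget are placeholders; use the documented budget for the real endpoint):

```python
import statistics
import time

def endpoint():
    """Stand-in handler; replace with a call through the component entrypoint."""
    time.sleep(0.001)

# sample the endpoint under no load
samples = []
for _ in range(50):
    t0 = time.perf_counter()
    endpoint()
    samples.append((time.perf_counter() - t0) * 1000)  # milliseconds

p99 = statistics.quantiles(samples, n=100)[98]  # 99th percentile
BUDGET_MS = 50  # the documented per-endpoint budget
assert p99 < BUDGET_MS, f"p99 {p99:.1f}ms breaches the {BUDGET_MS}ms budget"
```

A tight, stable assertion like this catches algorithmic regressions in CI Stage 1 without any load-generation tooling.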

Mutation testing

Coverage % tells you what code ran. Mutation testing tells you whether the tests would have failed if the code had been wrong. Tools (Stryker for JS, PIT for Java) systematically change operators, return values, and conditionals, then re-run the test suite. Surviving mutants are tests that didn’t catch the mutation.

Each surviving mutant is one of three things:

  • A real test gap. Add a flow-oriented test that would have failed when the mutation was applied.
  • An equivalent mutant, semantically identical to the original. Mark and move on.
  • A trivially equivalent mutant (logging change, assertion message tweak). Configure the tool to skip.
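A surviving mutant made concrete: a hand-written stand-in for the kind of operator flip a tool like Stryker or PIT generates automatically, and the difference between a test that lets it survive and one that kills it:

```python
def apply_discount(total, rate):
    return total - total * rate

# a mutation tool would generate this automatically ('-' flipped to '+')
def apply_discount_mutant(total, rate):
    return total + total * rate

def weak_test(fn):
    # only checks the type: both versions pass, so the mutant survives
    assert isinstance(fn(100, 0.1), float)

def strong_test(fn):
    # flow-oriented assertion on the actual value: the mutant is killed
    assert fn(100, 0.1) == 90

weak_test(apply_discount)
weak_test(apply_discount_mutant)  # passes -> surviving mutant, a real test gap

strong_test(apply_discount)
try:
    strong_test(apply_discount_mutant)
    survived = True
except AssertionError:
    survived = False
assert not survived  # the stronger test kills the mutant
```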

Mutation testing is too slow to run on every commit. Run it nightly or weekly on the highest-value modules. Treat it as a periodic audit of test quality, not a gating check.

Flake handling protocol

A flaky test is a known unknown. Three rules keep flakes from rotting the suite:

  1. Quarantine on detection. First flake gets the test moved to a quarantine lane that doesn’t block the build. Don’t ignore it; don’t keep failing builds for unrelated reasons.
  2. Time-boxed remediation. Quarantined tests have a deadline (e.g., five business days) and an owner. After the deadline, fix or delete. No silent quarantine.
  3. Track the cause. Most flakes share root causes: timing, shared state, network, ordering. The fix is usually structural (eliminate the timing dependency) rather than local (add a longer sleep).

A suite with a permanent quarantine list has lost its CD-ready quality. See also Tests Randomly Pass or Fail.

Cost and time budgets

Empirical starting points for in-band test budgets, based on typical service complexity. Adjust for your codebase, language, framework, and the size of the component under test.

Pattern              | In-band suite budget | Notes
1 (API provider)     | < 5 min              | Most logic in unit and component tests
2 (API consumer)     | < 5 min              | More gateway and resilience tests than 1
3 (scheduled job)    | < 3 min              | Plus a small set of tests that exercise the deployed binary
4 (UI)               | < 8 min              | Component tests in headless browser via Playwright + the team’s unit-testing framework
5 (event consumer)   | < 5 min              | Real broker container for gateway tests
6 (event producer)   | < 5 min              | Same
7 (CLI / library)    | < 3 min              | One pass per supported OS in CI matrix
8 (stateful service) | < 8 min              | Real persistence; cluster tests in Stage 2

The total CD pipeline in-band suite under 10 minutes is the gating constraint at the team level. The first lever for hitting that budget is parallel execution: the suite should fan out across cores or runners, not run serially. Parallelism only works when tests are independent of each other - no shared mutable state, no ordering dependencies, no global fixtures that one test mutates and another reads. Decoupling tests is a prerequisite for speed, not an optimization on top of it.

If a component’s tests still can’t fit the budget after the suite is running in parallel, the goal is to remediate the underlying cause - slow component startup, oversize fixtures, expensive setup duplicated per test, hidden serialization through a shared resource - not to declare the budget unreachable. While the remediation is underway, moving the offending tests out-of-band on a schedule is a reasonable stopgap so the in-band suite stays fast. Out-of-band placement here is a temporary mitigation, not the destination: those tests should come back in-band once the underlying speed issue is fixed.