Dysfunction Symptoms
Start from what you observe. Find the anti-patterns causing it.
Not sure which anti-pattern is hurting your team? Start here. Choose the path that fits how you
want to explore.
Find your symptom
Answer a few questions to narrow down which symptoms match your situation.
Start the triage questions
Browse by category
Jump directly to the area where you are experiencing problems.
- Test Suite Problems - Flaky tests, slow suites, high coverage that misses defects, environment-dependent failures
- Deployment and Release Problems - Fear of deploying, infrequent releases, coordinated deployments, merge freezes, hardening sprints
- Integration and Feedback Problems - Too much WIP, long cycle times, review bottlenecks, painful merges, slow feedback loops
- Production Visibility and Team Health - Customers finding bugs first, slow incident detection, environment drift, team burnout
Start from your role
Each role sees different symptoms first. Find the ones most relevant to your daily work.
- For Developers - Symptoms you hit while writing, testing, and shipping code - from flaky tests to painful merges
- For Managers - Symptoms that show up as unpredictable delivery, quality gaps, and team health problems
Explore by theme
Symptoms and anti-patterns share common themes. Browse by tag to see connections across categories.
View all tags
1 - Test Suite Problems
Symptoms related to test reliability, coverage effectiveness, speed, and environment consistency.
These symptoms indicate problems with your testing strategy. Unreliable or slow tests erode
confidence and slow delivery. Each page describes what you are seeing and links to the
anti-patterns most likely causing it.
How to use this section
Start with the symptom that matches what your team experiences. Each symptom page explains what
you are seeing, identifies the most likely root causes (anti-patterns), and provides diagnostic
questions to narrow down which cause applies to your situation. Follow the anti-pattern link to
find concrete fix steps.
Related anti-pattern categories: Testing Anti-Patterns,
Pipeline Anti-Patterns
Related guide: Testing Fundamentals
1.1 - Tests Pass in One Environment but Fail in Another
Tests pass locally but fail in CI, or pass in CI but fail in staging. Environment differences cause unpredictable failures.
What you are seeing
A developer runs the tests locally and they pass. They push to CI and the same tests fail. Or the
CI pipeline is green but the tests fail in the staging environment. The failures are not caused by
a code defect. They are caused by differences between environments: a different OS version, a
different database version, a different timezone setting, a missing environment variable, or a
service that is available locally but not in CI.
The developer spends time debugging the failure and discovers the root cause is environmental, not
logical. They add a workaround (skip the test in CI, add an environment check, adjust a timeout)
and move on. These workarounds accumulate over time. The test suite becomes littered with
environment-specific conditionals and skipped tests.
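A minimal pytest sketch of what that accumulation tends to look like. The render_pdf stand-in, the skip reason, and the assertions are invented for illustration:

```python
import os
import sys
import tempfile

import pytest


def render_pdf(name: str) -> str:
    """Stand-in for the real code under test: writes a file and returns its path."""
    path = os.path.join(tempfile.gettempdir(), f"{name}.pdf")
    with open(path, "wb") as handle:
        handle.write(b"%PDF-1.4")
    return path


# Workaround 1: skip in CI instead of removing the environment difference.
@pytest.mark.skipif(os.environ.get("CI") == "true",
                    reason="renderer dependencies are not installed on CI agents")
def test_render_pdf_produces_file():
    assert render_pdf("q3-summary").endswith(".pdf")


# Workaround 2: branch the assertion on the operating system.
def test_render_pdf_path_separator():
    path = render_pdf("q3-summary")
    expected_separator = "\\" if sys.platform == "win32" else "/"
    assert expected_separator in path
```

Each conditional papers over an environment difference instead of removing it, and each one makes the suite's verdict depend on where it runs.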
The team loses confidence in the test suite because results depend on where the tests run rather
than whether the code is correct.
Common causes
Snowflake Environments
When each environment is configured by hand and maintained independently, they drift apart over
time. The developer’s laptop has one version of a database driver. The CI server has another. The
staging environment has a third. These differences are invisible until a test exercises a code
path that behaves differently across versions. The fix is not to harmonize configurations manually
(they will drift again) but to provision all environments from the same infrastructure code.
Read more: Snowflake Environments
Manual Deployments
When deployment and environment setup are manual processes, subtle differences creep in. One
developer installed a dependency a particular way. The CI server was configured by a different
person with slightly different settings. The staging environment was set up months ago and has not
been updated. Manual processes are never identical twice, and the variance causes
environment-dependent behavior.
Read more: Manual Deployments
Tightly Coupled Monolith
When the application has hidden dependencies on external state (filesystem paths, network
services, system configuration), tests that work in one environment fail in another because the
external state differs. Well-isolated code with explicit dependencies is portable across
environments. Tightly coupled code that reaches into its environment for implicit dependencies is
fragile.
Read more: Tightly Coupled Monolith
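A hedged sketch of the difference, with invented names: the first function reaches into its environment for implicit state, while the second declares its dependency explicitly and behaves the same wherever it runs:

```python
import json
import os
from pathlib import Path


# Implicit dependency: the function reaches into its environment for a variable
# and a fixed file layout that exist on one machine but not necessarily on another.
def load_feature_flags_implicit() -> dict:
    config_dir = os.environ["APP_CONFIG_DIR"]              # KeyError where unset
    return json.loads(Path(config_dir, "flags.json").read_text())


# Explicit dependency: the caller supplies the location, so a test can hand in a
# temporary file and the function behaves identically in every environment.
def load_feature_flags(config_path: Path) -> dict:
    return json.loads(config_path.read_text())
```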
How to narrow it down
- Are all environments provisioned from the same infrastructure code? If not, environment
drift is the most likely cause. Start with
Snowflake Environments.
- Are environment setup and configuration manual? If different people configured different
environments, the variance is a direct result of manual processes. Start with
Manual Deployments.
- Do the failing tests depend on external services, filesystem paths, or system
configuration? If tests assume specific external state rather than declaring explicit
dependencies, the code’s coupling to its environment is the issue. Start with
Tightly Coupled Monolith.
1.2 - High Coverage but Tests Miss Defects
Test coverage numbers look healthy but defects still reach production.
What you are seeing
Your dashboard shows 80% or 90% code coverage, but bugs keep getting through. Defects show up
in production that feel like they should have been caught. The team points to the coverage
number as proof that testing is solid, yet the results tell a different story.
People start losing trust in the test suite. Some developers stop running tests locally because
they do not believe the tests will catch anything useful. Others add more tests, pushing
coverage higher, without the defect rate improving.
Common causes
Inverted Test Pyramid
When most of your tests are end-to-end or integration tests, they exercise many code paths in a
single run - which inflates coverage numbers. But these tests often verify that a workflow
completes without errors, not that each piece of logic produces the correct result. A test that
clicks through a form and checks for a success message covers dozens of functions without
validating any of them in detail.
Read more: Inverted Test Pyramid
Pressure to Skip Testing
When teams face pressure to hit a coverage target, testing becomes theater. Developers write
tests with trivial assertions - checking that a function returns without throwing, or that a
value is not null - just to get the number up. The coverage metric looks healthy, but the tests
do not actually verify behavior. They exist to satisfy a gate, not to catch defects.
Read more: Pressure to Skip Testing
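A small illustration of the difference, with an invented discount function: both tests execute the same lines and earn the same coverage, but only the second can fail when the logic is wrong:

```python
def apply_discount(price: float, percent: float) -> float:
    return round(price * (1 - percent / 100), 2)


# Testing theater: executes the code, raises coverage, verifies almost nothing.
def test_apply_discount_returns_something():
    assert apply_discount(100.0, 10.0) is not None


# Behavior test: pins the expected outcome, so a defect in the math actually fails.
def test_apply_discount_reduces_price_by_percent():
    assert apply_discount(100.0, 10.0) == 90.0
    assert apply_discount(19.99, 0.0) == 19.99
```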
Code Coverage Mandates
When the organization gates the pipeline on a coverage target, teams optimize for the number
rather than for defect detection. Developers write assertion-free tests, cover trivial code, or
add single integration tests that execute hundreds of lines without validating any of them. The
coverage metric rises while the tests remain unable to catch meaningful defects.
Read more: Code Coverage Mandates
Manual Testing Only
When test automation is absent or minimal, teams sometimes generate superficial tests or rely on
coverage from integration-level runs that touch many lines without asserting meaningful outcomes.
The coverage tool counts every line that executes, regardless of whether any test validates the
result.
Read more: Manual Testing Only
How to narrow it down
- Do most tests assert on behavior and expected outcomes, or do they just verify that code
runs without errors? If tests mostly check for no-exceptions or non-null returns, the
problem is testing theater - tests written to hit a number, not to catch defects. Start with
Pressure to Skip Testing.
- Are the majority of your tests end-to-end or integration tests? If most of the suite runs
through a browser, API, or multi-service flow rather than testing units of logic directly,
start with Inverted Test Pyramid.
- Does the pipeline gate on a specific coverage percentage? If the team writes tests
primarily to keep coverage above a mandated threshold, start with
Code Coverage Mandates.
- Were tests added retroactively to meet a coverage target? If the bulk of tests were
written after the code to satisfy a coverage gate rather than to verify design decisions,
start with
Pressure to Skip Testing.
1.3 - Refactoring Breaks Tests
Internal code changes that do not alter behavior cause widespread test failures.
What you are seeing
A developer renames a method, extracts a class, or reorganizes modules - changes that should not
affect external behavior. But dozens of tests fail. The failures are not catching real bugs.
They are breaking because the tests depend on implementation details that changed.
Developers start avoiding refactoring because the cost of updating tests is too high. Code
quality degrades over time because cleanup work is too expensive. When someone does refactor,
they spend more time fixing tests than improving the code.
Common causes
Inverted Test Pyramid
When the test suite is dominated by end-to-end and integration tests, those tests tend to be
tightly coupled to implementation details - CSS selectors, API response shapes, DOM structure,
or specific sequences of internal calls. A refactoring that changes none of the observable
behavior still breaks these tests because they assert on how the system works rather than what
it does.
Unit tests focused on behavior (“given this input, expect this output”) survive refactoring.
Tests coupled to implementation (“this method was called with these arguments”) do not.
Read more: Inverted Test Pyramid
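A hedged sketch of the contrast using Python's unittest.mock; the PriceService and its collaborator are invented for illustration:

```python
from unittest.mock import Mock


class PriceService:
    def __init__(self, tax_calculator):
        self._tax = tax_calculator

    def total(self, net: float) -> float:
        return net + self._tax.tax_for(net)


# Behavior-focused: given this input, expect this output. Reorganizing how
# PriceService computes the total does not break it as long as the total is right.
def test_total_includes_tax():
    tax = Mock()
    tax.tax_for.return_value = 2.0
    assert PriceService(tax).total(10.0) == 12.0


# Implementation-coupled: asserts how the result was produced. A refactoring that
# computes the same total differently fails this test without any real defect.
def test_total_calls_tax_calculator_exactly_once():
    tax = Mock()
    tax.tax_for.return_value = 2.0
    PriceService(tax).total(10.0)
    tax.tax_for.assert_called_once_with(10.0)
```

Both tests pass today; only the second breaks when the internals are reorganized without any change in observable behavior.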
Tightly Coupled Monolith
When components lack clear interfaces, tests reach into the internals of other modules. A
refactoring in module A breaks tests for module B - not because B’s behavior changed, but
because B’s tests were calling A’s internal methods directly. Without well-defined boundaries,
every internal change ripples across the test suite.
Read more: Tightly Coupled Monolith
How to narrow it down
- Do the broken tests assert on internal method calls, mock interactions, or DOM structure?
If yes, the tests are coupled to implementation rather than behavior. This is a test design
issue - start with Inverted Test Pyramid for guidance
on building a behavior-focused test suite.
- Are the broken tests end-to-end or UI tests that fail because of layout or selector
changes? If yes, you have too many tests at the wrong level of the pyramid. Start with
Inverted Test Pyramid.
- Do the broken tests span multiple modules - testing code in one area but breaking because
of changes in another? If yes, the problem is missing boundaries between components. Start
with Tightly Coupled Monolith.
1.4 - Test Suite Is Too Slow to Run
The test suite takes 30 minutes or more. Developers stop running it locally and push without verifying.
What you are seeing
The full test suite takes 30 minutes, an hour, or longer. Developers do not run it locally because
they cannot afford to wait. Instead, they push their changes and let CI run the tests. Feedback
arrives long after the developer has moved on. If a test fails, the developer must context-switch
back, recall what they were doing, and debug the failure.
Some developers run only a subset of tests locally (the ones for their module) and skip the rest.
This catches some issues but misses integration problems between modules. Others skip local testing
entirely and treat the CI pipeline as their test runner, which overloads the shared pipeline and
increases wait times for everyone.
The team has discussed parallelizing the tests, splitting the suite, or adding more CI capacity.
These discussions stall because the root cause is not infrastructure. It is the shape of the test
suite itself.
Common causes
Inverted Test Pyramid
When the majority of tests are end-to-end or integration tests, the suite is inherently slow. E2E
tests launch browsers, start services, make network calls, and wait for responses. Each test takes
seconds or minutes instead of milliseconds. A suite of 500 E2E tests will always be slower than a
suite of 5,000 unit tests that verify the same logic at a lower level. The fix is not faster
hardware. It is moving test coverage down the pyramid.
Read more: Inverted Test Pyramid
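A back-of-envelope comparison makes the gap concrete. The per-test times below are assumptions for illustration, not measurements:

```python
# Illustrative assumptions: ~6 s per E2E test (browser startup, service calls,
# network waits) versus ~5 ms per in-process unit test.
e2e_minutes = 500 * 6 / 60          # ≈ 50 minutes, before retries and queue time
unit_seconds = 5_000 * 0.005        # ≈ 25 seconds for the entire suite

print(f"500 E2E tests:    ~{e2e_minutes:.0f} min")
print(f"5,000 unit tests: ~{unit_seconds:.0f} s")
```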
Tightly Coupled Monolith
When the codebase has no clear module boundaries, tests cannot be scoped to individual components.
A test for one feature must set up the entire application because the feature depends on
everything. Test setup and teardown dominate execution time because there is no way to isolate the
system under test.
Read more: Tightly Coupled Monolith
Manual Testing Only
Sometimes the test suite is slow because the team added automated tests as an afterthought, using
E2E tests to backfill coverage for code that was not designed for unit testing. The resulting suite
is a collection of heavyweight tests that exercise the full stack for every scenario because the
code provides no lower-level testing seams.
Read more: Manual Testing Only
How to narrow it down
- What is the ratio of unit tests to E2E/integration tests? If E2E tests outnumber unit
tests, the test pyramid is inverted and the suite is slow by design. Start with
Inverted Test Pyramid.
- Can tests be run for a single module in isolation? If running one module’s tests requires
starting the entire application, the architecture prevents test isolation. Start with
Tightly Coupled Monolith.
- Were the automated tests added retroactively to a codebase with no testing seams? If tests
were bolted on after the fact using E2E tests because the code cannot be unit-tested, the
codebase needs refactoring for testability. Start with
Manual Testing Only.
1.5 - Tests Randomly Pass or Fail
The pipeline fails, the developer reruns it without changing anything, and it passes.
What you are seeing
A developer pushes a change. The pipeline fails on a test they did not touch, in a module they
did not change. They click rerun. It passes. They merge. This happens multiple times a day across
the team. Nobody investigates failures on the first occurrence because the odds favor flakiness
over a real problem.
The team has adapted: retry-until-green is a routine step, not an exception. Some pipelines are
configured to automatically rerun failed tests. Tests are tagged as “known flaky” and skipped.
Real regressions hide behind the noise because the team has been trained to ignore failures.
Common causes
Inverted Test Pyramid
When the test suite is dominated by end-to-end tests, flakiness is structural. E2E tests depend
on network connectivity, shared test environments, external service availability, and browser
rendering timing. Any of these can produce a different result on each run. A suite built mostly
on E2E tests will always be flaky because it is built on non-deterministic foundations.
Replacing E2E tests with functional tests that use test doubles for external dependencies makes
the suite deterministic by design. The test produces the same result every time because it
controls all its inputs.
Read more: Inverted Test Pyramid
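A minimal sketch of the substitution, with an invented weather dependency: the test double stands in for the network call, so the test controls every input and returns the same verdict on every run:

```python
from unittest.mock import Mock


class TripPlanner:
    def __init__(self, weather_client):
        self._weather = weather_client

    def advice(self, city: str) -> str:
        forecast = self._weather.forecast(city)     # in production: a network call
        return "pack an umbrella" if forecast == "rain" else "travel light"


# Deterministic functional test: the external service is replaced by a test double,
# so the test controls its inputs and produces the same result on every run.
def test_advice_for_a_rainy_forecast():
    weather = Mock()
    weather.forecast.return_value = "rain"
    assert TripPlanner(weather).advice("Bergen") == "pack an umbrella"
```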
Snowflake Environments
When the CI environment is configured differently from other environments - or drifts over time -
tests pass locally but fail in CI, or pass in CI on Tuesday but fail on Wednesday. The
inconsistency is not in the test or the code but in the environment the test runs in.
Tests that depend on specific environment configurations, installed packages, file system layout,
or network access are vulnerable to environment drift. Infrastructure-as-code eliminates this
class of flakiness by ensuring environments are identical and reproducible.
Read more: Snowflake Environments
Tightly Coupled Monolith
When components share mutable state - a database, a cache, a filesystem directory - tests that
run concurrently or in a specific order can interfere with each other. Test A writes to a shared
table. Test B reads from the same table and gets unexpected data. The tests pass individually
but fail together, or pass in one order but fail in another.
Without clear component boundaries, tests cannot be isolated. The flakiness is a symptom of
architectural coupling, not a testing problem.
Read more: Tightly Coupled Monolith
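Isolating test data is the quickest mitigation while the coupling is addressed (the diagnostic questions below make the same point). A hedged pytest sketch that gives each test its own in-memory database and unique rows:

```python
import sqlite3
import uuid

import pytest


@pytest.fixture
def db():
    # Each test gets its own in-memory database: no test can observe rows written
    # by another test, regardless of execution order or parallelism.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT)")
    yield conn
    conn.close()


def test_new_order_starts_as_pending(db):
    order_id = str(uuid.uuid4())      # unique data instead of shared fixture rows
    db.execute("INSERT INTO orders VALUES (?, ?)", (order_id, "pending"))
    row = db.execute("SELECT status FROM orders WHERE id = ?", (order_id,)).fetchone()
    assert row == ("pending",)
```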
How to narrow it down
- Do the flaky tests hit real external services or shared environments? If yes, the tests
are non-deterministic by design. Start with
Inverted Test Pyramid and replace them with
functional tests using test doubles.
- Do tests pass locally but fail in CI, or vice versa? If yes, the environments differ.
Start with Snowflake Environments.
- Do tests pass individually but fail when run together, or fail in a different order? If
yes, tests share mutable state. Start with
Tightly Coupled Monolith for the
architectural root cause, and isolate test data as an immediate fix.
2 - Deployment and Release Problems
Symptoms related to deployment frequency, release risk, coordination overhead, and environment parity.
These symptoms indicate problems with your deployment and release process. When deploying is
painful, teams deploy less often, which increases batch size and risk. Each page describes what
you are seeing and links to the anti-patterns most likely causing it.
How to use this section
Start with the symptom that matches what your team experiences. Each symptom page explains what
you are seeing, identifies the most likely root causes (anti-patterns), and provides diagnostic
questions to narrow down which cause applies to your situation. Follow the anti-pattern link to
find concrete fix steps.
Related anti-pattern categories: Pipeline Anti-Patterns,
Architecture Anti-Patterns
Related guides: Pipeline Architecture,
Rollback,
Small Batches
2.1 - Multiple Services Must Be Deployed Together
Changes cannot go to production until multiple services are deployed in a specific order during a coordinated release window.
What you are seeing
A developer finishes a change to one service. It is tested, reviewed, and ready to deploy. But it
cannot go out alone. The change depends on a schema migration in a shared database, a new endpoint
in another service, and a UI update in a third. All three teams coordinate a release window.
Someone writes a deployment runbook with numbered steps. If step four fails, steps one through
three need to be rolled back manually.
The team cannot deploy on a Tuesday afternoon because the other teams are not ready. The change
sits in a branch (or merged to main but feature-flagged off) waiting for the coordinated release
next Thursday. By then, more changes have accumulated, making the release larger and riskier.
Common causes
Tightly Coupled Monolith
When services share a database, call each other without versioned contracts, or depend on
deployment order, they cannot be deployed independently. A change to Service A’s data model breaks
Service B if Service B is not updated at the same time. The architecture forces coordination
because the boundaries between services are not real boundaries. They are implementation details
that leak across service lines.
Read more: Tightly Coupled Monolith
Distributed Monolith
The organization moved from a monolith to services, but the service boundaries are wrong. Services
were decomposed along technical lines (a “database service,” an “auth service,” a “notification
service”) rather than along domain lines. The result is services that cannot handle a business
request on their own. Every user-facing operation requires a synchronous chain of calls across
multiple services. If one service in the chain is unavailable or deploying, the entire operation
fails.
This is a monolith distributed across the network. It has all the operational complexity of
microservices (network latency, partial failures, distributed debugging) with none of the
benefits (independent deployment, team autonomy, fault isolation). Deploying one service still
requires deploying the others because the boundaries do not correspond to independent units of
business functionality.
Read more: Distributed Monolith
Horizontal Slicing
When work for a feature is decomposed by service (“Team A builds the API, Team B updates the UI,
Team C modifies the processor”), each team’s change is incomplete on its own. Nothing is
deployable until all teams finish their part. The decomposition created the coordination
requirement. Vertical slicing within each team’s domain, with stable contracts between services,
allows each team to deploy when their slice is ready.
Read more: Horizontal Slicing
Undone Work
Sometimes the coordination requirement is artificial. The service could technically be deployed
independently, but the team’s definition of done requires a cross-service integration test that
only runs during the release window. Or deployment is gated on a manual approval from another
team. The coordination is not forced by the architecture but by process decisions that bundle
independent changes into a single release event.
Read more: Undone Work
How to narrow it down
- Do services share a database or call each other without versioned contracts? If yes, the
architecture forces coordination. Changes to shared state or unversioned interfaces cannot be
deployed independently. Start with
Tightly Coupled Monolith.
- Does every user-facing request require a synchronous chain across multiple services? If a
single business operation touches three or more services in sequence, the service boundaries
were drawn in the wrong place. You have a distributed monolith. Start with
Distributed Monolith.
- Was the feature decomposed by service or team rather than by behavior? If each team built
their piece of the feature independently and now all pieces must go out together, the work was
sliced horizontally. Start with
Horizontal Slicing.
- Could each service technically be deployed on its own, but process or policy prevents it?
If the coupling is in the release process (shared release window, cross-team sign-off, manual
integration test gate) rather than in the code, the constraint is organizational. Start with
Undone Work and examine whether the definition
of done requires unnecessary coordination.
2.2 - The Team Is Afraid to Deploy
Production deployments cause anxiety because they frequently fail. The team delays deployments, which increases batch size, which increases risk.
What you are seeing
Nobody wants to deploy on a Friday. Or a Thursday. Ideally, deployments happen early in the week
when the team is available to respond to problems. The team has learned through experience that
deployments break things, so they treat each deployment as a high-risk event requiring maximum
staffing and attention.
Developers delay merging “risky” changes until after the next deploy so their code does not get
caught in the blast radius. Release managers add buffer time between deploys. The team informally
agrees on a deployment cadence (weekly, biweekly) that gives everyone time to recover between
releases.
The fear is rational. Deployments do break things. But the team’s response (deploy less often,
batch more changes, add more manual verification) makes each deployment larger, riskier, and more
likely to fail. The fear becomes self-reinforcing.
Common causes
Manual Deployments
When deployment requires human execution of steps, each deployment carries human error risk. The
team has experienced deployments where a step was missed, a script was run in the wrong order, or
a configuration was set incorrectly. The fear is not of the code but of the deployment process
itself. Automated deployments that execute the same steps identically every time eliminate the
process-level risk.
Read more: Manual Deployments
Missing Deployment Pipeline
When there is no automated path from commit to production, the team has no confidence that the
deployed artifact has been properly built and tested. Did someone run the tests? Are we deploying
the right version? Is this the same artifact that was tested in staging? Without a pipeline that
enforces these checks, every deployment requires the team to manually verify the prerequisites.
Read more: Missing Deployment Pipeline
Blind Operations
When the team cannot observe production health after a deployment, they have no way to know
quickly whether the deploy succeeded or failed. The fear is not just that something will break but
that they will not know it broke until a customer reports it. Monitoring and automated health
checks transform deployment from “deploy and hope” to “deploy and verify.”
Read more: Blind Operations
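A deploy-and-verify step does not require a full observability stack to start. A hedged sketch of a post-deployment smoke check, with a placeholder health endpoint, that a pipeline could run immediately after the deploy step:

```python
import sys
import time
import urllib.request

HEALTH_URL = "https://example.internal/healthz"     # placeholder endpoint

# Poll the service's health endpoint for up to two minutes after the deploy and
# fail the pipeline step if it never reports healthy.
deadline = time.time() + 120
while time.time() < deadline:
    try:
        with urllib.request.urlopen(HEALTH_URL, timeout=5) as response:
            if response.status == 200:
                print("service healthy; deployment verified")
                sys.exit(0)
    except OSError:
        pass                                         # not reachable yet; keep polling
    time.sleep(5)

print("service never reported healthy; treat the deployment as failed", file=sys.stderr)
sys.exit(1)
```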
Manual Testing Only
When the team has no automated tests, they have no confidence that the code works before
deploying it. Manual testing provides some coverage, but it is never exhaustive, and the team
knows it. Every deployment carries the risk that an untested code path will fail in production. A
comprehensive automated test suite gives the team evidence that the code works, replacing hope
with confidence.
Read more: Manual Testing Only
Monolithic Work Items
When changes are large, each deployment carries more risk simply because more code is changing at
once. A deployment with 200 lines changed across 3 files is easy to reason about and easy to roll
back. A deployment with 5,000 lines changed across 40 files is unpredictable. Small, frequent
deployments reduce risk per deployment rather than accumulating it.
Read more: Monolithic Work Items
How to narrow it down
- Is the deployment process automated? If a human runs the deployment, the fear may be of the
process, not the code. Start with
Manual Deployments.
- Does the team have an automated pipeline from commit to production? If not, there is no
systematic guarantee that the right artifact with the right tests reaches production. Start with
Missing Deployment Pipeline.
- Can the team verify production health within minutes of deploying? If not, the fear
includes not knowing whether the deploy worked. Start with
Blind Operations.
- Does the team have automated tests that provide confidence before deploying? If not, the
fear is that untested code will break. Start with
Manual Testing Only.
- How many changes are in a typical deployment? If deployments are large batches, the risk
per deployment is high by construction. Start with
Monolithic Work Items.
2.3 - Hardening Sprints Are Needed Before Every Release
The team dedicates one or more sprints after “feature complete” to stabilize code before it can be released.
What you are seeing
After the team finishes building features, nothing is ready to ship. A “hardening sprint” is
scheduled: one or more sprints dedicated to bug fixing, stabilization, and integration testing. No
new features are built during this period. The team knows from experience that the code is not
production-ready when development ends.
The hardening sprint finds bugs that were invisible during development. Integration issues surface
because components were built in isolation. Performance problems appear under realistic load. Edge
cases that nobody tested during development cause failures. The hardening sprint is not optional
because skipping it means shipping broken software.
The team treats this as normal. Planning includes hardening time by default. A project that takes
four sprints to build is planned as six: four for features, two for stabilization.
Common causes
Manual Testing Only
When the team has no automated test suite, quality verification happens manually at the end. The
hardening sprint is where manual testers find the defects that automated tests would have caught
during development. Without automated regression testing, every release requires a full manual
pass to verify nothing is broken.
Read more: Manual Testing Only
Inverted Test Pyramid
When most tests are slow end-to-end tests and few are unit tests, defects in business logic go
undetected until integration testing. The E2E tests are too slow to run continuously, so they run
at the end. The hardening sprint is when the team finally discovers what was broken all along.
Read more: Inverted Test Pyramid
Undone Work
When the team’s definition of done does not include deployment and verification, stories are
marked complete while hidden work remains. Testing, validation, and integration happen after the
story is “done.” The hardening sprint is where all that undone work gets finished.
Read more: Undone Work
Monolithic Work Items
When features are built as large, indivisible units, integration risk accumulates silently. Each
large feature is developed in relative isolation for weeks. The hardening sprint is the first time
all the pieces come together, and the integration pain is proportional to the batch size.
Read more: Monolithic Work Items
Pressure to Skip Testing
When management pressures the team to maximize feature output, testing is deferred to “later.”
The hardening sprint is that “later.” Testing was not skipped; it was moved to the end where it is
less effective, more expensive, and blocks the release.
Read more: Pressure to Skip Testing
How to narrow it down
- Does the team have automated tests that run on every commit? If not, the hardening sprint
is compensating for the lack of continuous quality verification. Start with
Manual Testing Only.
- Are most automated tests end-to-end or UI tests? If the test suite is slow and top-heavy,
defects are caught late because fast unit tests are missing. Start with
Inverted Test Pyramid.
- Does the team’s definition of done include deployment and verification? If stories are
“done” before they are tested and deployed, the hardening sprint finishes what “done” should
have included. Start with
Undone Work.
- How large are the typical work items? If features take weeks and integrate at the end, the
batch size creates the integration risk. Start with
Monolithic Work Items.
- Is there pressure to prioritize features over testing? If testing is consistently deferred
to hit deadlines, the hardening sprint absorbs the cost. Start with
Pressure to Skip Testing.
2.4 - Releases Are Infrequent and Painful
Deployments happen monthly, quarterly, or less often. Each release is a large, risky event that requires war rooms and weekend work.
What you are seeing
The team deploys once a month, once a quarter, or on some irregular cadence that nobody can
predict. Each release is a significant event. There is a release planning meeting, a deployment
runbook, a designated release manager, and often a war room during the actual deploy. People
cancel plans for release weekends.
Between releases, changes pile up. By the time the release goes out, it contains dozens or
hundreds of changes from multiple developers. Nobody can confidently say what is in the release
without checking a spreadsheet or release notes document. When something breaks in production, the
team spends hours narrowing down which of the many changes caused the problem.
The team wants to release more often but feels trapped. Each release is so painful that adding
more releases feels like adding more pain.
Common causes
Manual Deployments
When deployment requires a human to execute steps (SSH into servers, run scripts, click through a
console), the process is slow, error-prone, and dependent on specific people being available. The
cost of each deployment is high enough that the team batches changes to amortize it. The batch
grows, the risk grows, and the release becomes an event rather than a routine.
Read more: Manual Deployments
Missing Deployment Pipeline
When there is no automated path from commit to production, every release requires manual
coordination of builds, tests, and deployments. Without a pipeline, the team cannot deploy on
demand because the process itself does not exist in a repeatable form.
Read more: Missing Deployment Pipeline
CAB Gates
When every production change requires committee approval, the approval cadence sets the release
cadence. If the Change Advisory Board meets weekly, releases happen weekly at best. If the meeting
is biweekly, releases are biweekly. The team cannot deploy faster than the approval process
allows, regardless of technical capability.
Read more: CAB Gates
Monolithic Work Items
When work is not decomposed into small, independently deployable increments, each “feature” is a
large batch of changes that takes weeks to complete. The team cannot release until the feature is
done, and the feature is never done quickly because it was scoped too large. Small batches enable
frequent releases. Large batches force infrequent ones.
Read more: Monolithic Work Items
Manual Regression Testing Gates
When every release requires a manual test pass that takes days or weeks, the testing cadence
limits the release cadence. The team cannot release until QA finishes, and QA cannot finish faster
because the test suite is manual and grows with every feature.
Read more: Manual Regression Testing Gates
How to narrow it down
- Is the deployment process automated? If deploying requires human steps beyond pressing a
button, the process itself is the bottleneck. Start with
Manual Deployments.
- Does a pipeline exist that can take code from commit to production? If not, the team cannot
release on demand because the infrastructure does not exist. Start with
Missing Deployment Pipeline.
- Does a committee or approval board gate production changes? If releases wait for scheduled
approval meetings, the approval cadence is the constraint. Start with
CAB Gates.
- How large is the typical work item? If features take weeks and are delivered as single
units, the batch size is the constraint. Start with
Monolithic Work Items.
- Does a manual test pass gate every release? If QA takes days per release, the testing
process is the constraint. Start with
Manual Regression Testing Gates.
2.5 - Merge Freezes Before Deployments
Developers announce merge freezes because the integration process is fragile. Deploying requires coordination in chat.
What you are seeing
A message appears in the team chat: “Please don’t merge to main, I’m about to deploy.” The
deployment process requires the main branch to be stable and unchanged for the duration of the
deploy. Any merge during that window could invalidate the tested artifact, break the build, or
create an inconsistent state between what was tested and what ships.
Other developers queue up their PRs and wait. If the deployment hits a problem, the freeze
extends. Sometimes the freeze lasts hours. In the worst cases, the team informally agrees on
“deployment windows” where merging is allowed at certain times and deployments happen at others.
The merge freeze is a coordination tax. Every deployment interrupts the entire team’s workflow.
Developers learn to time their merges around deploy schedules, adding mental overhead to routine
work.
Common causes
Manual Deployments
When deployment is a manual process (running scripts, clicking through UIs, executing a runbook),
the person deploying needs the environment to hold still. Any change to main during the deployment
window could mean the deployed artifact does not match what was tested. Automated deployments that
build, test, and deploy atomically eliminate this window because the pipeline handles the full
sequence without requiring a stable pause.
Read more: Manual Deployments
Integration Deferred
When the team does not have a reliable CI process, merging to main is itself risky. If the build
breaks after a merge, the deployment is blocked. The team freezes merges not just to protect the
deployment but because they lack confidence that any given merge will keep main green. If CI were
reliable, merging and deploying could happen concurrently because main would always be deployable.
Read more: Integration Deferred
Missing Deployment Pipeline
When there is no pipeline that takes a specific commit through build, test, and deploy as a single
atomic operation, the team must manually coordinate which commit gets deployed. A pipeline pins
the deployment to a specific artifact built from a specific commit. Without it, the team must
freeze merges to prevent the target from moving while they deploy.
Read more: Missing Deployment Pipeline
How to narrow it down
- Is the deployment process automated end-to-end? If a human executes deployment steps, the
freeze protects against variance in the manual process. Start with
Manual Deployments.
- Does the team trust that main is always deployable? If merges to main sometimes break the
build, the freeze protects against unreliable integration. Start with
Integration Deferred.
- Does the pipeline deploy a specific artifact from a specific commit? If there is no
pipeline that pins the deployment to an immutable artifact, the team must manually ensure the
target does not move. Start with
Missing Deployment Pipeline.
2.6 - Staging Passes but Production Fails
Deployments pass every pre-production check but break when they reach production.
What you are seeing
Code passes tests, QA signs off, staging looks fine. Then the release
hits production and something breaks: a feature behaves differently, a dependent service times
out, or data that never appeared in staging triggers an unhandled edge case.
The team scrambles to roll back or hotfix. Confidence in the pipeline drops. People start adding
more manual verification steps, which slows delivery without actually preventing the next
surprise.
Common causes
Snowflake Environments
When each environment is configured by hand (or was set up once and has drifted since), staging
and production are never truly the same. Different library versions, different environment
variables, different network configurations. Code that works in one context silently fails in
another because the environments are only superficially similar.
Read more: Snowflake Environments
Blind Operations
Sometimes the problem is not that staging passes and production fails. It is that production
failures go undetected until a customer reports them. Without monitoring and alerting, the team
has no way to verify production health after a deploy. “It works in staging” becomes the only
signal, and production problems surface hours or days late.
Read more: Blind Operations
Tightly Coupled Monolith
Hidden dependencies between components mean that a change in one area affects behavior in
another. In staging, these interactions may behave differently because the data is smaller, the
load is lighter, or a dependent service is stubbed. In production, the full weight of real usage
exposes coupling the team did not know existed.
Read more: Tightly Coupled Monolith
Manual Deployments
When deployment involves human steps (running scripts by hand, clicking through a console,
copying files), the process is never identical twice. A step skipped in staging, an extra
configuration applied in production, a different order of operations. The deployment itself
becomes a source of variance between environments.
Read more: Manual Deployments
How to narrow it down
- Are your environments provisioned from the same infrastructure code? If not, or if you
are not sure, start with Snowflake Environments.
- How did you discover the production failure? If a customer or support team reported it
rather than an automated alert, start with
Blind Operations.
- Does the failure involve a different service or module than the one you changed? If yes,
the issue is likely hidden coupling. Start with
Tightly Coupled Monolith.
- Is the deployment process identical and automated across all environments? If not, start
with Manual Deployments.
3 - Integration and Feedback Problems
Symptoms related to work-in-progress, integration pain, review bottlenecks, and feedback speed.
These symptoms indicate problems with how work flows through your team. When integration is
deferred, feedback is slow, or work piles up, the team stays busy without finishing things.
Each page describes what you are seeing and links to the anti-patterns most likely causing it.
How to use this section
Start with the symptom that matches what your team experiences. Each symptom page explains what
you are seeing, identifies the most likely root causes (anti-patterns), and provides diagnostic
questions to narrow down which cause applies to your situation. Follow the anti-pattern link to
find concrete fix steps.
Related anti-pattern categories: Team Workflow Anti-Patterns,
Branching and Integration Anti-Patterns
Related guides: Trunk-Based Development,
Work Decomposition,
Limiting WIP
3.1 - Everything Started, Nothing Finished
The board shows many items in progress but few reaching done. The team is busy but not delivering.
What you are seeing
Open the team’s board on any given day. Count the items in progress. Count the team members. If
the first number is significantly higher than the second, the team has a WIP problem. Every
developer is working on a different story. Eight items in progress, zero done. Nothing gets the
focused attention needed to finish.
At the end of the sprint, there is a scramble to close anything. Stories that were “almost done”
for days finally get pushed through. Cycle time is long and unpredictable. The team is busy all
the time but finishes very little.
Common causes
Push-Based Work Assignment
When managers assign work to individuals rather than letting the team pull from a prioritized
backlog, each person ends up with their own queue of assigned items. WIP grows because work is
distributed across individuals rather than flowing through the team. Nobody swarms on blocked
items because everyone is busy with “their” assigned work.
Read more: Push-Based Work Assignment
Horizontal Slicing
When work is split by technical layer (“build the database schema,” “build the API,” “build the
UI”), each layer must be completed before anything is deployable. Multiple developers work on
different layers of the same feature simultaneously, all “in progress,” none independently done.
WIP is high because the decomposition prevents any single item from reaching completion quickly.
Read more: Horizontal Slicing
Unbounded WIP
When the team has no explicit constraint on how many items can be in progress simultaneously,
there is nothing to prevent WIP from growing. Developers start new work whenever they are
blocked, waiting for review, or between tasks. Without a limit, the natural tendency is to stay
busy by starting things rather than finishing them.
Read more: Unbounded WIP
How to narrow it down
- Does each developer have their own assigned backlog of work? If yes, the assignment model
prevents swarming and drives individual queues. Start with
Push-Based Work Assignment.
- Are work items split by technical layer rather than by user-visible behavior? If yes,
items cannot be completed independently. Start with
Horizontal Slicing.
- Is there any explicit limit on how many items can be in progress at once? If no, the team
has no mechanism to stop starting and start finishing. Start with
Unbounded WIP.
3.2 - Feedback Takes Hours Instead of Minutes
The time from making a change to knowing whether it works is measured in hours, not minutes. Developers batch changes to avoid waiting.
What you are seeing
A developer makes a change and wants to know if it works. They push to CI and wait 45 minutes for
the pipeline. Or they open a PR and wait two days for a review. Or they deploy to staging and wait
for a manual QA pass that happens next week. By the time feedback arrives, the developer has moved
on to something else.
The slow feedback changes developer behavior. They batch multiple changes into a single commit to
avoid waiting multiple times. They skip local verification and push larger, less certain changes.
They start new work before the previous change is validated, juggling multiple incomplete tasks.
When feedback finally arrives and something is wrong, the developer must context-switch back. The
mental model from the original change has faded. Debugging takes longer because the developer is
working from memory rather than from active context. If multiple changes were batched, the
developer must untangle which one caused the failure.
Common causes
Inverted Test Pyramid
When most tests are slow E2E tests, the test feedback loop is measured in tens of minutes rather
than seconds. Unit tests provide feedback in seconds. E2E tests take minutes or hours. A team with
a fast unit test suite can verify a change in under a minute. A team whose testing relies on E2E
tests cannot get feedback faster than those tests can run.
Read more: Inverted Test Pyramid
Integration Deferred
When the team does not integrate frequently (at least daily), the feedback loop for integration
problems is as long as the branch lifetime. A developer working on a two-week branch does not
discover integration conflicts until they merge. Daily integration catches conflicts within hours.
Continuous integration catches them within minutes.
Read more: Integration Deferred
Manual Testing Only
When there are no automated tests, the only feedback comes from manual verification. A developer
makes a change and must either test it manually themselves (slow) or wait for someone else to test
it (slower). Automated tests provide feedback in the pipeline without requiring human effort or
scheduling.
Read more: Manual Testing Only
Long-Lived Feature Branches
When pull requests wait days for review, the code review feedback loop dominates total cycle time.
A developer finishes a change in two hours, then waits two days for review. The review feedback
loop is 24 times longer than the development time. Long-lived branches produce large PRs, and
large PRs take longer to review. Fast feedback requires fast reviews, which requires small PRs,
which requires short-lived branches.
Read more: Long-Lived Feature Branches
Manual Regression Testing Gates
When every change must pass through a manual QA gate, the feedback loop includes human scheduling.
The QA team has a queue. The change waits in line. When the tester gets to it, days have passed.
Automated testing in the pipeline replaces this queue with instant feedback.
Read more: Manual Regression Testing Gates
How to narrow it down
- How fast can the developer verify a change locally? If the local test suite takes more than
a few minutes, the test strategy is the bottleneck. Start with
Inverted Test Pyramid.
- How frequently does the team integrate to main? If developers work on branches for days
before integrating, the integration feedback loop is the bottleneck. Start with
Integration Deferred.
- Are there automated tests at all? If the only feedback is manual testing, the lack of
automation is the bottleneck. Start with
Manual Testing Only.
- How long do PRs wait for review? If review turnaround is measured in days, the review
process is the bottleneck. Start with
Long-Lived Feature Branches.
- Is there a manual QA gate in the pipeline? If changes wait in a QA queue, the manual gate
is the bottleneck. Start with
Manual Regression Testing Gates.
3.3 - Merging Is Painful and Time-Consuming
Integration is a dreaded, multi-day event. Teams delay merging because it is painful, which makes the next merge even worse.
What you are seeing
A developer has been working on a feature branch for two weeks. They open a pull request and
discover dozens of conflicts across multiple files. Other developers have changed the same areas
of the codebase. Resolving the conflicts takes a full day. Some conflicts are straightforward
(two people edited adjacent lines), but others are semantic (two people changed the same
function’s behavior in different ways). The developer must understand both changes to merge
correctly.
After resolving conflicts, the tests fail. The merged code compiles but does not work because the
two changes are logically incompatible. The developer spends another half-day debugging the
interaction. By the time the branch is merged, the developer has spent more time integrating than
they spent building the feature.
The team knows merging is painful, so they delay it. The delay makes the next merge worse because
more code has diverged. The cycle repeats until someone declares a “merge day” and the team spends
an entire day resolving accumulated drift.
Common causes
Long-Lived Feature Branches
When branches live for weeks or months, they accumulate divergence from the main line. The longer
the branch lives, the more changes happen on main that the branch does not include. At merge time,
all of that divergence must be reconciled at once. A branch that is one day old has almost no
conflicts. A branch that is two weeks old may have dozens.
Read more: Long-Lived Feature Branches
Integration Deferred
When the team does not practice continuous integration (integrating to main at least daily), each
developer’s work diverges independently. The build may be green on each branch but broken when
branches combine. CI means integrating continuously, not running a build server. Without frequent
integration, merge pain is inevitable.
Read more: Integration Deferred
Monolithic Work Items
When work items are too large to complete in a day or two, developers must stay on a branch for
the duration. A story that takes a week forces a week-long branch. Breaking work into smaller
increments that can be integrated daily eliminates the divergence window that causes painful
merges.
Read more: Monolithic Work Items
How to narrow it down
- How long do branches typically live before merging? If branches live longer than two days,
the branch lifetime is the primary driver of merge pain. Start with
Long-Lived Feature Branches.
- Does the team integrate to main at least once per day? If developers work in isolation for
days before integrating, they are not practicing continuous integration regardless of whether a
CI server exists. Start with
Integration Deferred.
- How large are the typical work items? If stories take a week or more, the work
decomposition forces long branches. Start with
Monolithic Work Items.
3.4 - Pull Requests Sit for Days Waiting for Review
Pull requests queue up and wait. Authors have moved on by the time feedback arrives.
What you are seeing
A developer opens a pull request and waits. Hours pass. A day passes. They ping someone in chat.
Eventually, comments arrive, but the author has moved on to something else and has to reload
context to respond. Another round of comments. Another wait. The PR finally merges two or three
days after it was opened.
The team has five or more open PRs at any time. Some are days old. Developers start new work
while they wait, which creates more PRs, which creates more review load, which slows reviews
further.
Common causes
Long-Lived Feature Branches
When developers work on branches for days, the resulting PRs are large. Large PRs take longer to
review because reviewers need more time to understand the scope of the change. A 300-line PR is
daunting. A 50-line PR takes 10 minutes. The branch length drives the PR size, which drives the
review delay.
Read more: Long-Lived Feature Branches
Knowledge Silos
When only specific individuals can review certain areas of the codebase, those individuals become
bottlenecks. Their review queue grows while other team members who could review are not
considered qualified. The constraint is not review capacity in general but review capacity for
specific code areas concentrated in too few people.
Read more: Knowledge Silos
Push-Based Work Assignment
When work is assigned to individuals, reviewing someone else’s code feels like a distraction
from “my work.” Every developer has their own assigned stories to protect. Helping a teammate
finish their work by reviewing their PR competes with the developer’s own assignments. The
incentive structure deprioritizes collaboration.
Read more: Push-Based Work Assignment
How to narrow it down
- Are PRs larger than 200 lines on average? If yes, the reviews are slow because the
changes are too large to review quickly. Start with
Long-Lived Feature Branches
and the work decomposition that feeds them.
- Are reviews waiting on specific individuals? If most PRs are assigned to or waiting on
one or two people, the team has a knowledge bottleneck. Start with
Knowledge Silos.
- Do developers treat review as lower priority than their own coding work? If yes, the
team’s norms do not treat review as a first-class activity. Start with
Push-Based Work Assignment and
establish a team working agreement that reviews happen before starting new work.
3.5 - Pipelines Take Too Long
CI/CD pipelines take 30 minutes or more. Developers stop waiting and lose the feedback loop.
What you are seeing
A developer pushes a commit and waits. Thirty minutes pass. An hour. The pipeline is still
running. The developer context-switches to another task, and by the time the pipeline finishes
(or fails), they have moved on mentally. If the build fails, they must reload context, figure out
what went wrong, fix it, push again, and wait another 30 minutes.
Developers stop running the full test suite locally because it takes too long. They push and hope.
Some developers batch multiple changes into a single push to avoid waiting multiple times, which
makes failures harder to diagnose. Others skip the pipeline entirely for small changes and merge
with only local verification.
The pipeline was supposed to provide fast feedback. Instead, it provides slow feedback that
developers work around rather than rely on.
Common causes
Inverted Test Pyramid
When most of the test suite consists of end-to-end or integration tests rather than unit tests,
the pipeline is dominated by slow, resource-intensive test execution. E2E tests launch browsers,
spin up services, and wait for network responses. A test suite with thousands of unit tests (that
run in seconds) and a small number of targeted E2E tests is fast. A suite with hundreds of E2E
tests and few unit tests is slow by construction.
Read more: Inverted Test Pyramid
Snowflake Environments
When pipeline environments are not standardized or reproducible, builds include extra time for
environment setup, dependency installation, and configuration. Caching is unreliable because the
environment state is unpredictable. A pipeline that spends 15 minutes downloading dependencies
because there is no reliable cache layer is slow for infrastructure reasons, not test reasons.
Read more: Snowflake Environments
Tightly Coupled Monolith
When the codebase has no clear module boundaries, every change triggers a full rebuild and a full
test run. The pipeline cannot selectively build or test only the affected components because the
dependency graph is tangled. A change to one module might affect any other module, so the pipeline
must verify everything.
Read more: Tightly Coupled Monolith
Manual Regression Testing Gates
When the pipeline includes a manual testing phase, the wall-clock time from push to green
includes human wait time. A pipeline that takes 10 minutes to build and test but then waits two
days for manual sign-off is not a 10-minute pipeline. It is a two-day pipeline with a 10-minute
automated prefix.
Read more: Manual Regression Testing Gates
How to narrow it down
- What percentage of pipeline time is spent running tests? If test execution dominates and
most tests are E2E or integration tests, the test strategy is the bottleneck. Start with
Inverted Test Pyramid.
- How much time is spent on environment setup and dependency installation? If the pipeline
spends significant time on infrastructure before any tests run, the build environment is the
bottleneck. Start with
Snowflake Environments.
- Can the pipeline build and test only the changed components? If every change triggers a
full rebuild, the architecture prevents selective testing. Start with
Tightly Coupled Monolith.
- Does the pipeline include any manual steps? If a human must approve or act before the
pipeline completes, the human is the bottleneck. Start with
Manual Regression Testing Gates.
3.6 - Work Items Take Days or Weeks to Complete
Stories regularly take more than a week from start to done. Developers go days without integrating.
What you are seeing
A developer picks up a work item on Monday. By Wednesday, they are still working on it. By
Friday, it is “almost done.” The following Monday, they are fixing edge cases. The item finally
moves to review mid-week as a 300-line pull request that the reviewer does not have time to look
at carefully.
Cycle time is measured in weeks, not days. The team commits to work at the start of the sprint
and scrambles at the end. Estimates are off by a factor of two because large items hide unknowns
that only surface mid-implementation.
Common causes
Horizontal Slicing
When work is split by technical layer rather than by user-visible behavior, each item spans an
entire layer and takes days to complete. “Build the database schema,” “build the API,” “build the
UI” are each multi-day items. Nothing is deployable until all layers are done. Vertical slicing
(cutting thin slices through all layers to deliver complete functionality) produces items that
can be finished in one to two days.
Read more: Horizontal Slicing
Monolithic Work Items
When the team takes requirements as they arrive without breaking them into smaller pieces, work
items are as large as the feature they describe. A ticket titled “Add user profile page” hides
a login form, avatar upload, email verification, notification preferences, and password reset.
Without a decomposition practice during refinement, items arrive at planning already too large
to flow.
Read more: Monolithic Work Items
Long-Lived Feature Branches
When developers work on branches for days or weeks, the branch and the work item are the same
size: large. The branching model reinforces large items because there is no integration pressure
to finish quickly. Trunk-based development creates natural pressure to keep items small enough to
integrate daily.
Read more: Long-Lived Feature Branches
How to narrow it down
- Are work items split by technical layer? If the board shows items like “backend for
feature X” and “frontend for feature X,” the decomposition is horizontal. Start with
Horizontal Slicing.
- Do items arrive at planning without being broken down? If items go from “product owner
describes a feature” to “developer starts coding” without a decomposition step, start with
Monolithic Work Items.
- Do developers work on branches for more than a day? If yes, the branching model allows
and encourages large items. Start with
Long-Lived Feature Branches.
4 - Production Visibility and Team Health
Symptoms related to production observability, incident detection, environment parity, and team sustainability.
These symptoms indicate problems with how your team sees and responds to production issues.
When problems are invisible until customers report them, or when the team is burning out from
process overhead, the delivery system is working against the people in it. Each page describes
what you are seeing and links to the anti-patterns most likely causing it.
How to use this section
Start with the symptom that matches what your team experiences. Each symptom page explains what
you are seeing, identifies the most likely root causes (anti-patterns), and provides diagnostic
questions to narrow down which cause applies to your situation. Follow the anti-pattern link to
find concrete fix steps.
Related anti-pattern categories: Monitoring and Observability Anti-Patterns,
Organizational and Cultural Anti-Patterns
Related guides: Progressive Rollout,
Working Agreements,
Metrics-Driven Improvement
4.1 - Team Burnout and Unsustainable Pace
The team is exhausted. Every sprint is a crunch sprint. There is no time for learning, improvement, or recovery.
What you are seeing
The team is always behind. Sprint commitments are missed or met only through overtime. Developers
work evenings and weekends to hit deadlines, then start the next sprint already tired. There is no
buffer for unplanned work, so every production incident or stakeholder escalation blows up the
plan.
Nobody has time for learning, experimentation, or process improvement. Suggestions like “let’s
improve our test suite” or “let’s automate that deployment” are met with “we don’t have time.”
The irony is that the manual work those improvements would eliminate is part of what keeps the
team too busy.
Attrition risk is high. The most experienced developers leave first because they have options.
Their departure increases the load on whoever remains, accelerating the cycle.
Common causes
Thin-Spread Teams
When a small team owns too many products, every developer is stretched across multiple codebases.
Context switching consumes 20 to 40 percent of their capacity. The team looks fully utilized but
delivers less than a focused team half its size. The utilization trap (“keep everyone busy”) masks
the real problem: the team has more responsibilities than it can sustain.
Read more: Thin-Spread Teams
Deadline-Driven Development
When every sprint is driven by an arbitrary deadline, the team never operates at a sustainable
pace. There is no recovery period after a crunch because the next deadline starts immediately.
Quality is the first casualty, which creates rework, which consumes future capacity, which makes
the next deadline even harder to meet. The cycle accelerates until the team collapses.
Read more: Deadline-Driven Development
Unbounded WIP
When there is no limit on work in progress, the team starts many things and finishes few. Every
developer juggles multiple items, each getting fragmented attention. The sensation of being
constantly busy but never finishing anything is a direct contributor to burnout. The team is
working hard on everything and completing nothing.
Read more: Unbounded WIP
Velocity as Individual Metric
When individual story points are tracked, developers cannot afford to help each other, take time
to learn, or invest in quality. Every hour must produce measurable output. The pressure to perform
individually eliminates the slack that teams need to stay healthy. Helping a teammate, mentoring
a junior developer, or improving a build script all become career risks because they do not
produce points.
Read more: Velocity as Individual Metric
How to narrow it down
- Is the team responsible for more products than it can sustain? If developers are spread
across many products with constant context switching, the workload exceeds what the team
structure can handle. Start with
Thin-Spread Teams.
- Is every sprint driven by an external deadline? If the team has not had a sprint without
deadline pressure in months, the pace is unsustainable by design. Start with
Deadline-Driven Development.
- Does the team have more items in progress than team members? If WIP is unbounded and
developers juggle multiple items, the team is thrashing rather than delivering. Start with
Unbounded WIP.
- Are individuals measured by story points or velocity? If developers feel pressure to
maximize personal output at the expense of collaboration and sustainability, the measurement
system is contributing to burnout. Start with
Velocity as Individual Metric.
4.2 - Production Issues Discovered by Customers
The team finds out about production problems from support tickets, not alerts.
What you are seeing
The team deploys a change. Someone asks “is it working?” Nobody knows. There is no dashboard to
check. There are no metrics to compare before and after. The team waits. If nobody complains
within an hour, they assume the deployment was successful.
When something does go wrong, the team finds out from a customer support ticket, a Slack message
from another team, or an executive asking why the site is slow. The investigation starts with
SSH-ing into a server and reading raw log files. Hours pass before anyone understands what
happened, what caused it, or how many users were affected.
Common causes
Blind Operations
The team has no application-level metrics, no centralized logging, and no alerting. The
infrastructure may report that servers are running, but nobody can tell whether the application
is actually working correctly. Without instrumentation, the only way to discover a problem is to
wait for someone to experience it and report it.
Read more: Blind Operations
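Instrumentation does not have to start big. Here is a minimal sketch using the Prometheus Python client; the metric names and the simulated traffic are illustrative. With even a request counter and a latency histogram scraped into a dashboard, "is it working after the deploy?" has an answer.

```python
# Minimal application-level instrumentation with the Prometheus Python
# client (pip install prometheus-client). Metric names are illustrative.
import random
import time

from prometheus_client import Counter, Histogram, start_http_server

REQUESTS = Counter("http_requests_total", "Total HTTP requests", ["status"])
LATENCY = Histogram("http_request_duration_seconds", "Request latency")


def handle_request() -> None:
    with LATENCY.time():                        # records request duration
        time.sleep(random.uniform(0.01, 0.2))   # stand-in for real work
        status = "500" if random.random() < 0.02 else "200"
    REQUESTS.labels(status=status).inc()


if __name__ == "__main__":
    start_http_server(8000)   # exposes /metrics for the monitoring system to scrape
    while True:               # stand-in for real traffic
        handle_request()
```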
Manual Deployments
When deployments involve human steps (running scripts by hand, clicking through a console),
there is no automated verification step. The deployment process ends when the human finishes the
steps, not when the system confirms it is healthy. Without an automated pipeline that checks
health metrics after deploying, verification falls to manual spot-checking or waiting for
complaints.
Read more: Manual Deployments
Missing Deployment Pipeline
When there is no automated path from commit to production, there is nowhere to integrate
automated health checks. A deployment pipeline can include post-deploy verification that
compares metrics before and after. Without a pipeline, verification is entirely manual and
usually skipped under time pressure.
Read more: Missing Deployment Pipeline
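As an illustration, a verification stage can snapshot an error-rate metric before deploying and fail the stage if the rate worsens afterward, which gives the pipeline something concrete to roll back on. The metrics URL, query, deploy command, and thresholds below are hypothetical placeholders for whatever your monitoring and deployment tooling actually expose.

```python
"""Post-deploy verification sketch: compare an error-rate metric before and
after deploying. URL, query, deploy command, and thresholds are hypothetical."""
import subprocess
import sys
import time

import requests

METRICS_URL = "https://metrics.example.com/api/query"   # hypothetical endpoint
QUERY = "rate(http_requests_total{status=~'5..'}[5m])"  # Prometheus-style query


def error_rate() -> float:
    resp = requests.get(METRICS_URL, params={"query": QUERY}, timeout=10)
    resp.raise_for_status()
    return float(resp.json()["value"])   # response shape depends on your backend


def main() -> int:
    baseline = error_rate()
    subprocess.run(["./deploy.sh"], check=True)   # placeholder deploy step
    time.sleep(300)                               # let post-deploy traffic accumulate
    current = error_rate()
    # Fail if the error rate grew by 50% and is above a small absolute floor.
    if current > baseline * 1.5 and current > 0.01:
        print(f"Error rate rose from {baseline:.4f} to {current:.4f}; failing stage.")
        return 1   # non-zero exit lets the pipeline halt or roll back
    print(f"Error rate OK ({baseline:.4f} -> {current:.4f}).")
    return 0


if __name__ == "__main__":
    sys.exit(main())
```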
How to narrow it down
- Does the team have application-level metrics and alerts? If no, the team has no way to
detect problems automatically. Start with
Blind Operations.
- Is the deployment process automated with health checks? If deployments are manual or
automated without post-deploy verification, problems go undetected until users report them.
Start with Manual Deployments or
Missing Deployment Pipeline.
- Does the team check a dashboard after every deployment? If the answer is “sometimes” or
“we click through the app manually,” the verification step is unreliable. Start with
Blind Operations to build
automated verification.
4.3 - Production Problems Are Discovered Hours or Days Late
Issues in production are not discovered until users report them. There is no automated detection or alerting.
What you are seeing
A deployment goes out on Tuesday. On Thursday, a support ticket comes in: a feature is broken for
a subset of users. The team investigates and discovers the problem was introduced in Tuesday’s
deploy. For two days, users experienced the issue while the team had no idea.
Or a performance degradation appears gradually. Response times creep up over a week. Nobody
notices until a customer complains or a business metric drops. The team checks the dashboards and
sees the degradation started after a specific deploy, but the deploy was days ago and the trail is
cold.
The team deploys carefully and then “watches for a while.” Watching means checking a few URLs
manually or refreshing a dashboard for 15 minutes. If nothing obviously breaks in that window, the
deployment is declared successful. Problems that manifest slowly, affect a subset of users, or
appear under specific conditions go undetected.
Common causes
Blind Operations
When the team has no monitoring, no alerting, and no aggregated logging, production is a black
box. The only signal that something is wrong comes from users, support staff, or business reports.
The team cannot detect problems because they have no instruments to detect them with. Adding
observability (metrics, structured logging, distributed tracing, alerting) gives the team eyes on
production.
Read more: Blind Operations
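Structured logging is often the cheapest place to start, because it makes the log lines the team already writes searchable and aggregatable by field. A minimal sketch using only the Python standard library; the field names are illustrative.

```python
# Structured (JSON) logging with only the standard library, so log lines can
# be shipped to an aggregator and queried by field. Field names are illustrative.
import json
import logging
import sys
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "ts": datetime.now(timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Fields passed via `extra=` become queryable attributes.
        for key in ("request_id", "user_id", "route"):
            if hasattr(record, key):
                entry[key] = getattr(record, key)
        return json.dumps(entry)


handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(JsonFormatter())
logging.basicConfig(level=logging.INFO, handlers=[handler])

log = logging.getLogger("checkout")
log.info("payment captured", extra={"request_id": "req-123", "user_id": 42, "route": "/checkout"})
```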
Undone Work
When the team’s definition of done does not include post-deployment verification, nobody is
responsible for confirming that the deployment is healthy. The story is “done” when the code is
merged or deployed, not when it is verified in production. Health checks, smoke tests, and canary
analysis are not part of the workflow because the workflow ends before production.
Read more: Undone Work
Manual Deployments
When deployments are manual, there is no automated post-deploy verification step. An automated
pipeline can include health checks, smoke tests, and rollback triggers as part of the deployment
sequence. A manual deployment ends when the human finishes the runbook. Whether the deployment is
actually healthy is a separate question that may or may not get answered.
Read more: Manual Deployments
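A smoke test does not need to be elaborate to work as an automated gate; it only needs to fail loudly when a core flow breaks. A sketch with hypothetical URLs, written so a non-zero exit code can halt or roll back the pipeline:

```python
"""Post-deploy smoke test sketch: hit a few critical endpoints and exit
non-zero on failure so an automated pipeline can halt or roll back.
URLs are hypothetical."""
import sys

import requests

CHECKS = [
    ("health endpoint", "https://app.example.com/healthz"),
    ("login page", "https://app.example.com/login"),
    ("search API", "https://app.example.com/api/search?q=smoke"),
]


def check(name: str, url: str) -> bool:
    try:
        resp = requests.get(url, timeout=5)
        if resp.status_code == 200:
            print(f"ok   {name}")
            return True
        print(f"FAIL {name}: HTTP {resp.status_code}")
    except requests.RequestException as exc:
        print(f"FAIL {name}: {exc}")
    return False


def main() -> int:
    results = [check(name, url) for name, url in CHECKS]
    return 0 if all(results) else 1   # non-zero exit fails the pipeline stage


if __name__ == "__main__":
    sys.exit(main())
```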
How to narrow it down
- Does the team have production monitoring with alerting thresholds? If not, the team cannot
detect problems that users do not report. Start with
Blind Operations.
- Does the team’s definition of done include post-deploy verification? If stories are closed
before production health is confirmed, nobody owns the detection step. Start with
Undone Work.
- Does the deployment process include automated health checks? If deployments end when the
human finishes the script, there is no automated verification. Start with
Manual Deployments.
4.4 - It Works on My Machine
Code that works in one developer’s environment fails in another, in CI, or in production. Environment differences make results unreproducible.
What you are seeing
A developer runs the application locally and everything works. They push to CI and the build
fails. Or a teammate pulls the same branch and gets a different result. Or a bug report comes in
that nobody can reproduce locally.
The team spends hours debugging only to discover the issue is environmental: a different Node
version, a missing system library, a different database encoding, or a service running on the
developer’s machine that is not available in CI. The code is correct. The environments are
different.
New team members experience this acutely. Setting up a development environment takes days of
following an outdated wiki page, asking teammates for help, and discovering undocumented
dependencies. Every developer’s machine accumulates unique configuration over time, making “works
on my machine” a common refrain and a useless debugging signal.
Common causes
Snowflake Environments
When development environments are set up manually and maintained individually, each developer’s
machine becomes unique. One developer runs Python 3.9, another 3.11. One has PostgreSQL 14,
another 15. These differences are invisible until someone hits a version-specific behavior.
Reproducible, containerized development environments eliminate the variance by ensuring every
developer works in an identical setup.
Read more: Snowflake Environments
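Containerized, declaratively defined environments are the durable fix. As a lighter-weight stopgap rather than a substitute, a preflight script can at least make drift visible by comparing key versions against what the team has pinned. A sketch, with a hypothetical expected-versions list:

```python
"""Preflight drift check sketch: surface environment drift explicitly instead
of discovering it mid-debug. Pinned versions are hypothetical; a containerized
dev environment is the real fix, this only makes drift visible."""
import platform
import sys
from importlib.metadata import PackageNotFoundError, version

EXPECTED = {
    "python": "3.11",   # major.minor the team has agreed on
    "requests": "2.31",  # example pinned packages (hypothetical)
    "psycopg2-binary": "2.9",
}


def main() -> int:
    problems = []
    py = f"{sys.version_info.major}.{sys.version_info.minor}"
    if py != EXPECTED["python"]:
        problems.append(f"python {py} != expected {EXPECTED['python']}")
    for pkg, want in EXPECTED.items():
        if pkg == "python":
            continue
        try:
            got = version(pkg)
        except PackageNotFoundError:
            problems.append(f"{pkg} not installed (expected {want}.x)")
            continue
        if not got.startswith(want):
            problems.append(f"{pkg} {got} != expected {want}.x")
    for p in problems:
        print("DRIFT:", p)
    print(f"platform: {platform.platform()}")   # useful context in bug reports
    return 1 if problems else 0


if __name__ == "__main__":
    sys.exit(main())
```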
Manual Deployments
When environment setup is a manual process documented in a wiki or README, it is never followed
identically. Each developer interprets the instructions slightly differently, installs a slightly
different version, or skips a step that seems optional. The manual process guarantees divergence
over time. Infrastructure as code and automated setup scripts ensure consistency.
Read more: Manual Deployments
Tightly Coupled Monolith
When the application has implicit dependencies on its environment (specific file paths, locally
running services, system-level configuration), it is inherently sensitive to environmental
differences. Well-designed code with explicit, declared dependencies works the same way
everywhere. Code that reaches into its runtime environment for undeclared dependencies works only
where those dependencies happen to exist.
Read more: Tightly Coupled Monolith
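The difference shows up directly in code. In the sketch below (module and configuration names are hypothetical), the first pair of functions silently assumes things about the machine they run on, while the second pair declares those needs so every environment must supply them explicitly.

```python
# Implicit vs. explicit environmental dependencies. Paths, hosts, and
# addresses are hypothetical.
import json
import smtplib


def load_settings_implicit() -> dict:
    # Works only on the machine where this exact path exists.
    with open("/Users/alice/projects/app/config.json") as f:
        return json.load(f)


def send_alert_implicit(message: str) -> None:
    # Assumes a mail relay runs locally: true on one laptop, false in CI.
    smtplib.SMTP("localhost", 25).sendmail("app@localhost", ["ops@example.com"], message)


def load_settings_explicit(config_path: str) -> dict:
    # The environment must supply the path; a missing value fails fast and visibly.
    with open(config_path) as f:
        return json.load(f)


def send_alert_explicit(message: str, smtp_host: str, smtp_port: int) -> None:
    # The mail relay is a declared dependency, injected per environment.
    smtplib.SMTP(smtp_host, smtp_port).sendmail("app@example.com", ["ops@example.com"], message)
```

The explicit versions are also easier to test, because a test can pass a temporary file and a fake SMTP host instead of depending on whatever happens to be installed on the machine running it.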
How to narrow it down
- Do all developers use the same OS, runtime versions, and dependency versions? If not,
environment divergence is the most likely cause. Start with
Snowflake Environments.
- Is the development environment setup automated or manual? If it is a wiki page that takes
a day to follow, the manual process creates the divergence. Start with
Manual Deployments.
- Does the application depend on local services, file paths, or system configuration that is
not declared in the codebase? If the application has implicit environmental dependencies,
it will behave differently wherever those dependencies differ. Start with
Tightly Coupled Monolith.
5 - Find Your Symptom
Answer a few questions to narrow down which dysfunction symptoms match your situation.
Find the category below that best describes what your team is experiencing, then follow the
sub-questions to the most relevant symptom pages.
We have problems with our tests
Tests pass sometimes and fail sometimes without code changes
Your tests are non-deterministic. This is often caused by environment differences or test
architecture that depends on external systems.
We have good coverage numbers but bugs still reach production
Coverage measures which lines execute, not whether the tests verify correct behavior. High
coverage with low defect detection points to a test design problem.
Refactoring is risky because it breaks tests
When tests are coupled to implementation details rather than behavior, any internal change
causes test failures even when the behavior is correct.
The test suite takes too long to run
Slow tests delay feedback and encourage developers to skip running them locally.
Deploying and releasing is painful
The team avoids or dreads deployments
When deployments frequently cause incidents, the team learns to treat them as high-risk events.
We need to coordinate multiple services or teams to deploy
Deployment coordination signals architectural coupling or process constraints.
We need a stabilization period before each release
If you need dedicated time to “harden” before releasing, the normal development process is not
producing releasable code.
Work is slow and things pile up
Lots of things are in progress but few are finishing
High work-in-progress means the team is spread thin. Nothing gets the focus needed to finish.
Merging and integrating code is difficult
When integration is deferred, branches diverge and merging becomes painful.
Feedback on changes takes too long
Slow feedback loops mean developers context-switch away and problems grow before they are caught.
Production problems and team health
Customers find problems before we do
If your monitoring does not catch issues before users report them, you have an observability gap.
Code behaves differently in different environments
Environment inconsistency makes it impossible to reproduce problems reliably.
The team is exhausted from process overhead
When the delivery process creates friction at every step, the team burns out.
6 - Symptoms for Developers
Dysfunction symptoms grouped by the friction developers and tech leads experience - from daily coding pain to team-level delivery patterns.
These are the symptoms you experience while writing, testing, and shipping code. Some you feel
personally. Others you see as patterns across the team. If something on this list sounds
familiar, follow the link to find what is causing it and how to fix it.
Pushing code and getting feedback
Tests getting in the way
- Tests Randomly Pass or Fail - You click rerun without investigating because flaky failures are so common. The team ignores failures by default, which masks real regressions.
- Refactoring Breaks Tests - You rename a method or restructure a class and 15 tests fail, even though the behavior is correct. Technical debt accumulates because cleanup is too expensive.
- Test Suite Is Too Slow to Run - Running tests locally is so slow that you skip it and push to CI instead, trading fast feedback for a longer loop.
- High Coverage but Tests Miss Defects - Coverage is above 80% but bugs still make it to production. The tests check that code runs, not that it works correctly.
Integrating and merging
Deploying and releasing
Environment and production surprises
7 - Symptoms for Managers
Dysfunction symptoms grouped by business impact - unpredictable delivery, quality, and team health.
These are the symptoms that show up in sprint reviews, quarterly planning, and 1-on-1s. They
manifest as missed commitments, quality problems, and retention risk.
Unpredictable delivery
Quality reaching customers
Coordination overhead
Team health and retention
What to do next
If these symptoms sound familiar, these resources can help you build a case for change and
find a starting point: