Phase 3: Optimize
Improve flow by reducing batch size, limiting work in progress, and using metrics to drive improvement.
Key question: “Can we deliver small changes quickly?”
With a working pipeline in place, this phase focuses on optimizing the flow of changes
through it. Smaller batches, feature flags, and WIP limits reduce risk and increase
delivery frequency.
What You’ll Do
- Reduce batch size - Deliver smaller, more frequent changes
- Use feature flags - Decouple deployment from release
- Limit work in progress - Focus on finishing over starting
- Drive improvement with metrics - Use DORA metrics and improvement kata
- Run effective retrospectives - Continuously improve the delivery process
- Decouple architecture - Enable independent deployment of components
Why This Phase Matters
Having a pipeline isn’t enough - you need to optimize the flow through it. Teams that
deploy weekly with a CD pipeline are missing most of the benefits. Small batches reduce
risk, feature flags enable testing in production, and metrics-driven improvement creates
a virtuous cycle of getting better at getting better.
When You’re Ready to Move On
You’re ready for Phase 4: Deliver on Demand when:
- Most changes are small enough to deploy independently
- Feature flags let you deploy incomplete features safely
- Your WIP limits keep work flowing without bottlenecks
- You’re measuring and improving your DORA metrics regularly
1 - Small Batches
Deliver smaller, more frequent changes to reduce risk and increase feedback speed.
Phase 3 - Optimize | Adapted from MinimumCD.org
Batch size is the single biggest lever for improving delivery performance. This page covers what batch size means at every level - deploy frequency, commit size, and story size - and provides concrete techniques for reducing it.
Why Batch Size Matters
Large batches create large risks. When you deploy 50 changes at once, any failure could be caused by any of those 50 changes. When you deploy 1 change, the cause of any failure is obvious.
This is not a theory. The DORA research consistently shows that elite teams deploy more frequently, with smaller changes, and have both higher throughput and lower failure rates. Small batches are the mechanism that makes this possible.
“If it hurts, do it more often, and bring the pain forward.”
- Jez Humble, Continuous Delivery
Three Levels of Batch Size
Batch size is not just about deployments. It operates at three distinct levels, and optimizing only one while ignoring the others limits your improvement.
Level 1: Deploy Frequency
How often you push changes to production.
| State | Deploy Frequency | Risk Profile |
|-------|------------------|--------------|
| Starting | Monthly or quarterly | Each deploy is a high-stakes event |
| Improving | Weekly | Deploys are planned but routine |
| Optimizing | Daily | Deploys are unremarkable |
| Elite | Multiple times per day | Deploys are invisible |
How to reduce: Remove manual gates, automate approval workflows, build confidence through progressive rollout. If your pipeline is reliable (Phase 2), the only thing preventing more frequent deploys is organizational habit.
Level 2: Commit Size
How much code changes in each commit to trunk.
| Indicator | Too Large | Right-Sized |
|-----------|-----------|-------------|
| Files changed | 20+ files | 1-5 files |
| Lines changed | 500+ lines | Under 100 lines |
| Review time | Hours or days | Minutes |
| Merge conflicts | Frequent | Rare |
| Description length | Paragraph needed | One sentence suffices |
How to reduce: Practice TDD (write one test, make it pass, commit). Use feature flags to merge incomplete work. Pair program so review happens in real time.
Level 3: Story Size
How much scope each user story or work item contains.
A story that takes a week to complete is a large batch. It means a week of work piles up before integration, a week's worth of assumptions goes untested, and a week of inventory sits in progress.
Target: Every story should be completable - coded, tested, reviewed, and integrated - in two days or less. If it cannot be, it needs to be decomposed further.
Behavior-Driven Development for Decomposition
BDD provides a concrete technique for breaking stories into small, testable increments. The Given-When-Then format forces clarity about scope.
The Given-When-Then Pattern
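Scenarios are normally written as plain Given-When-Then text. As an illustration, here is one discount scenario sketched as an executable test, with the steps shown as comments; the apply_discount function is a hypothetical stand-in, not part of this guide.

```python
"""
Scenario: Apply a simple percentage discount
  Given a cart totalling $100 and a valid 10% discount code
  When the customer applies the code at checkout
  Then the order total is $90
"""

def apply_discount(total: float, rate: float) -> float:
    # Hypothetical implementation for this first slice only.
    return round(total * (1 - rate), 2)

def test_apply_simple_percentage_discount():
    # Given a cart totalling $100 and a valid 10% discount code
    total, rate = 100.0, 0.10
    # When the customer applies the code at checkout
    discounted = apply_discount(total, rate)
    # Then the order total is $90
    assert discounted == 90.0
```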
Each scenario becomes a deliverable increment. You can implement and deploy the first scenario before starting the second. This is how you turn a “discount feature” (large batch) into three independent, deployable changes (small batches).
Decomposing Stories Using Scenarios
When a story has too many scenarios, it is too large. Use this process:
- Write all the scenarios first. Before any code, enumerate every Given-When-Then for the story.
- Group scenarios into deliverable slices. Each slice should be independently valuable or at least independently deployable.
- Create one story per slice. Each story has 1-3 scenarios and can be completed in 1-2 days.
- Order the slices by value. Deliver the most important behavior first.
Example decomposition:
| Original Story | Scenarios | Sliced Into |
|----------------|-----------|-------------|
| “As a user, I can manage my profile” | 12 scenarios covering name, email, password, avatar, notifications, privacy, deactivation | 5 stories: basic info (2 scenarios), password (2), avatar (2), notifications (3), deactivation (3) |
Vertical Slicing
A vertical slice cuts through all layers of the system to deliver a thin piece of end-to-end functionality. This is the opposite of horizontal slicing, where you build all the database changes, then all the API changes, then all the UI changes.
Horizontal vs. Vertical Slicing
Horizontal (avoid):
Story 1: Build the database schema for discounts
Story 2: Build the API endpoints for discounts
Story 3: Build the UI for applying discounts
Problems: Stories 1 and 2 deliver no user value. You cannot test end-to-end until story 3 is done. Integration risk accumulates.
Vertical (prefer):
Story 1: Apply a simple percentage discount (DB + API + UI for one scenario)
Story 2: Reject expired discount codes (DB + API + UI for one scenario)
Story 3: Apply discounts only to eligible items (DB + API + UI for one scenario)
Benefits: Every story delivers testable, deployable functionality. Integration happens with each story, not at the end. You can ship story 1 and get feedback before building story 2.
How to Slice Vertically
Ask these questions about each proposed story:
- Can a user (or another system) observe the change? If not, slice differently.
- Can I write an end-to-end test for it? If not, the slice is incomplete.
- Does it require all other slices to be useful? If yes, find a thinner first slice.
- Can it be deployed independently? If not, check whether feature flags could help.
Practical Steps for Reducing Batch Size
Week 1-2: Measure Current State
Before changing anything, measure where you are:
- Average commit size (lines changed per commit)
- Average story cycle time (time from start to done)
- Deploy frequency (how often changes reach production)
- Average changes per deploy (how many commits per deployment)
Week 3-4: Introduce Story Decomposition
- Start writing BDD scenarios before implementation
- Split any story estimated at more than 2 days
- Track the number of stories completed per week (expect this to increase as stories get smaller)
Week 5-8: Tighten Commit Size
- Adopt the discipline of “one logical change per commit”
- Use TDD to create a natural commit rhythm: write test, make it pass, commit
- Track average commit size and set a team target (e.g., under 100 lines)
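One rough way to get that number is to parse git history. A minimal sketch, assuming Python and git are available (parsing details may need adjusting for your repository):

```python
import subprocess
from statistics import mean

def commit_sizes(last_n: int = 100) -> list[int]:
    """Total lines changed (added + deleted) for each of the last N commits on the current branch."""
    log = subprocess.run(
        ["git", "log", f"-{last_n}", "--numstat", "--pretty=format:@@commit"],
        capture_output=True, text=True, check=True,
    ).stdout
    sizes: list[int] = []
    for line in log.splitlines():
        if line == "@@commit":
            sizes.append(0)                           # start counting a new commit
        elif "\t" in line and sizes:                  # numstat rows: "added<TAB>deleted<TAB>path"
            added, deleted, _path = line.split("\t", 2)
            if added.isdigit() and deleted.isdigit():  # binary files report "-"
                sizes[-1] += int(added) + int(deleted)
    return sizes

sizes = commit_sizes()
print(f"average lines changed per commit: {mean(sizes):.0f}" if sizes else "no commits found")
```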
Ongoing: Increase Deploy Frequency
- Deploy at least once per day, then work toward multiple times per day
- Remove any batch-oriented processes (e.g., “we deploy on Tuesdays”)
- Make deployment a non-event
Key Pitfalls
1. “Small stories take more overhead to manage”
This is true only if your process adds overhead per story (e.g., heavyweight estimation ceremonies, multi-level approval). The solution is to simplify the process, not to keep stories large. Overhead per story should be near zero for a well-decomposed story.
2. “Some things can’t be done in small batches”
Almost anything can be decomposed further. Database migrations can be done in backward-compatible steps. API changes can use versioning. UI changes can be hidden behind feature flags. The skill is in finding the decomposition, not in deciding whether one exists.
3. “We tried small stories but our throughput dropped”
This usually means the team is still working sequentially. Small stories require limiting WIP and swarming - see Limiting WIP. If the team starts 10 small stories instead of 2 large ones, they have not actually reduced batch size; they have increased WIP.
Measuring Success

| Metric | Target | Why It Matters |
|--------|--------|----------------|
| Story cycle time | 2 days or less | Confirms stories are decomposed far enough |
| Average commit size | Under 100 lines | Confirms commits stay small and reviewable |
| Deploy frequency | At least daily | Confirms small changes reach production quickly |
| Changes per deploy | Decreasing | Confirms each deploy carries less risk |
Next Step
Small batches often require deploying incomplete features to production. Feature Flags provide the mechanism to do this safely.
This content is adapted from MinimumCD.org,
licensed under CC BY 4.0.
2 - Feature Flags
Decouple deployment from release by using feature flags to control feature visibility.
Phase 3 - Optimize | Adapted from MinimumCD.org
Feature flags are the mechanism that makes trunk-based development and small batches safe. They let you deploy code to production without exposing it to users, enabling dark launches, gradual rollouts, and instant rollback of features without redeploying.
Why Feature Flags?
In continuous delivery, deployment and release are two separate events:
- Deployment is pushing code to production.
- Release is making a feature available to users.
Feature flags are the bridge between these two events. They let you deploy frequently (even multiple times a day) without worrying about exposing incomplete or untested features. This separation is what makes continuous deployment possible for teams that ship real products to real users.
When You Need Feature Flags (and When You Don’t)
Not every change requires a feature flag. Flags add complexity, and unnecessary complexity slows you down. Use this decision tree to determine the right approach.
Decision Tree
Is the change user-visible?
├── No → Deploy without a flag
│        (refactoring, performance improvements, dependency updates)
│
└── Yes → Can it be completed and deployed in a single small batch?
    ├── Yes → Deploy without a flag
    │         (bug fixes, copy changes, small UI tweaks)
    │
    └── No → Is there a seam in the code where you can introduce the change?
        ├── Yes → Consider Branch by Abstraction
        │         (replacing a subsystem, swapping an implementation)
        │
        └── No → Is it a new feature with a clear entry point?
            ├── Yes → Use a Feature Flag
            │
            └── No → Consider Connect Tests Last
                     (build the internals first, wire them up last)
Alternatives to Feature Flags
| Technique | How It Works | When to Use |
|-----------|--------------|-------------|
| Branch by Abstraction | Introduce an abstraction layer, build the new implementation behind it, switch when ready | Replacing an existing subsystem or library |
| Connect Tests Last | Build internal components without connecting them to the UI or API | New backend functionality that has no user-facing impact until connected |
| Dark Launch | Deploy the code path but do not route any traffic to it | New infrastructure, new services, or new endpoints that are not yet referenced |
These alternatives avoid the lifecycle overhead of feature flags while still enabling trunk-based development with incomplete work.
Implementation Approaches
Feature flags can be implemented at different levels of sophistication. Start simple and add complexity only when needed.
Level 1: Static Code-Based Flags
The simplest approach: a boolean constant or configuration value checked in code.
Pros: Zero infrastructure. Easy to understand. Works everywhere.
Cons: Changing a flag requires a deployment. No per-user targeting. No gradual rollout.
Best for: Teams starting out. Internal tools. Changes that will be fully on or fully off.
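A minimal sketch of what this might look like; the checkout functions here are hypothetical:

```python
# Static flag: a plain constant checked in code. Flipping it requires a redeploy.
FEATURE_NEW_CHECKOUT = False

def legacy_checkout(cart: list[float]) -> float:
    return sum(cart)                      # current behavior, unchanged while the flag is off

def new_checkout(cart: list[float]) -> float:
    return round(sum(cart) * 0.95, 2)     # hypothetical new behavior being built behind the flag

def checkout(cart: list[float]) -> float:
    return new_checkout(cart) if FEATURE_NEW_CHECKOUT else legacy_checkout(cart)
```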
Level 2: Dynamic In-Process Flags
Flags stored in a configuration file, database, or environment variable that can be changed at runtime without redeploying.
Pros: No redeployment needed. Supports percentage rollout. Simple to implement.
Cons: Each instance reads its own config - no centralized view. Limited targeting capabilities.
Best for: Teams that need gradual rollout but do not want to adopt a third-party service yet.
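One possible shape for this, assuming the rollout percentage lives in an environment variable that can be changed at runtime (the names are illustrative):

```python
import hashlib
import os

def rollout_percent(flag: str) -> int:
    # Read the percentage on every check so a change takes effect without a redeploy,
    # e.g. NEW_CHECKOUT_PERCENT=25 enables the flag for roughly 25% of users.
    return int(os.environ.get(f"{flag.upper()}_PERCENT", "0"))

def is_enabled(flag: str, user_id: str) -> bool:
    # Deterministic bucketing: the same user always lands in the same bucket,
    # so their experience stays stable as the rollout percentage ramps up.
    bucket = int(hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest(), 16) % 100
    return bucket < rollout_percent(flag)

# Usage: if is_enabled("new_checkout", current_user_id): ...
```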
Level 3: Centralized Flag Service
A dedicated service (self-hosted or SaaS) that manages all flags, provides a dashboard, supports targeting rules, and tracks flag usage.
Examples: LaunchDarkly, Unleash, Flagsmith, Split, or a custom internal service.
Pros: Centralized management. Rich targeting (by user, plan, region, etc.). Audit trail. Real-time changes.
Cons: Added dependency. Cost (for SaaS). Network latency for flag evaluation (mitigated by local caching in most SDKs).
Best for: Teams at scale. Products with diverse user segments. Regulated environments needing audit trails.
Level 4: Infrastructure Routing
Instead of checking flags in application code, route traffic at the infrastructure level (load balancer, service mesh, API gateway).
Pros: No application code changes. Clean separation of routing from logic. Works across services.
Cons: Requires infrastructure investment. Less granular than application-level flags. Harder to target individual users.
Best for: Microservice architectures. Service-level rollouts. A/B testing at the infrastructure layer.
Feature Flag Lifecycle
Every feature flag has a lifecycle. Flags that are not actively managed become technical debt. Follow this lifecycle rigorously.
The Six Stages
1. CREATE → Define the flag, document its purpose and owner
2. DEPLOY OFF → Code ships to production with the flag disabled
3. BUILD → Incrementally add functionality behind the flag
4. DARK LAUNCH → Enable for internal users or a small test group
5. ROLLOUT → Gradually increase the percentage of users
6. REMOVE → Delete the flag and the old code path
Stage 1: Create
Before writing any code, define the flag:
- Name: Use a consistent naming convention (e.g., enable-new-checkout, feature.discount-engine)
- Owner: Who is responsible for this flag through its lifecycle?
- Purpose: One sentence describing what the flag controls
- Planned removal date: Set this at creation time. Flags without removal dates become permanent.
Stage 2: Deploy OFF
The first deployment includes the flag check but the flag is disabled. This verifies that:
- The flag infrastructure works
- The default (off) path is unaffected
- The flag check does not introduce performance issues
Stage 3: Build Incrementally
Continue building the feature behind the flag over multiple deploys. Each deploy adds more functionality, but the flag remains off for users. Test both paths in your automated suite with every change.
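A minimal pytest-style sketch of what that might look like; the checkout function and its use_new_flow parameter are hypothetical stand-ins:

```python
import pytest

def checkout(cart: list[float], use_new_flow: bool) -> float:
    # Hypothetical system under test: new path behind the flag, old path as default.
    return round(sum(cart) * 0.95, 2) if use_new_flow else sum(cart)

@pytest.mark.parametrize("flag_on", [False, True])
def test_checkout_in_both_flag_states(flag_on: bool):
    total = checkout([10.0, 20.0], use_new_flow=flag_on)
    if flag_on:
        assert total == 28.5   # new behavior still being built behind the flag
    else:
        assert total == 30.0   # default (flag off) path must stay unchanged
```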
Stage 4: Dark Launch
Enable the flag for internal users or a specific test group. This is your first validation with real production data and real traffic patterns. Monitor:
- Error rates for the flagged group vs. control
- Performance metrics (latency, throughput)
- Business metrics (conversion, engagement)
Stage 5: Gradual Rollout
Increase exposure systematically:
| Step | Audience | Duration | What to Watch |
|------|----------|----------|---------------|
| 1 | 1% of users | 1-2 hours | Error rates, latency |
| 2 | 5% of users | 4-8 hours | Performance at slightly higher load |
| 3 | 25% of users | 1 day | Business metrics begin to be meaningful |
| 4 | 50% of users | 1-2 days | Statistically significant business impact |
| 5 | 100% of users | - | Full rollout |
At any step, if metrics degrade, roll back by disabling the flag. No redeployment needed.
Stage 6: Remove
This is the most commonly skipped step, and skipping it creates significant technical debt.
Once the feature has been stable at 100% for an agreed period (e.g., 2 weeks):
- Remove the flag check from code
- Remove the old code path
- Remove the flag definition from the flag service
- Deploy the simplified code
Set a maximum flag lifetime. A common practice is 90 days. Any flag older than 90 days triggers an automatic review. Stale flags are a maintenance burden and a source of confusion.
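A monthly audit can be as simple as listing every flag past the agreed lifetime. A sketch, assuming flags are recorded with an owner and creation date (the Flag structure here is hypothetical):

```python
from dataclasses import dataclass
from datetime import date, timedelta

MAX_FLAG_AGE = timedelta(days=90)   # the maximum lifetime suggested above

@dataclass
class Flag:
    name: str
    owner: str
    created: date

def flags_needing_review(flags: list[Flag], today: date) -> list[Flag]:
    # Anything older than the maximum lifetime gets flagged for removal or explicit renewal.
    return [f for f in flags if today - f.created > MAX_FLAG_AGE]

# Usage: post flags_needing_review(all_flags, date.today()) to the team channel each month.
```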
Key Pitfalls
1. “We have 200 feature flags and nobody knows what they all do”
This is flag debt, and it is as damaging as any other technical debt. Prevent it by enforcing the lifecycle: every flag has an owner, a purpose, and a removal date. Run a monthly flag audit.
2. “We use flags for everything, including configuration”
Feature flags and configuration are different concerns. Flags are temporary (they control unreleased features). Configuration is permanent (it controls operational behavior like timeouts, connection pools, log levels). Mixing them leads to confusion about what can be safely removed.
3. “Testing both paths doubles our test burden”
It does increase test effort, but this is a temporary cost. When the flag is removed, the extra tests go away too. The alternative - deploying untested code paths - is far more expensive.
4. “Nested flags create combinatorial complexity”
Avoid nesting flags whenever possible. If feature B depends on feature A, do not create a separate flag for B. Instead, extend the behavior behind feature A’s flag. If you must nest, document the dependency and test the specific combinations that matter.
Measuring Success
| Metric | Target | Why It Matters |
|--------|--------|----------------|
| Active flag count | Stable or decreasing | Confirms flags are being removed, not accumulating |
| Average flag age | < 90 days | Catches stale flags before they become permanent |
| Flag-related incidents | Near zero | Confirms flag management is not causing problems |
| Time from deploy to release | Hours to days (not weeks) | Confirms flags enable fast, controlled releases |
Next Step
Small batches and feature flags let you deploy more frequently, but deploying more means more work in progress. Limiting WIP ensures that increased deploy frequency does not create chaos.
This content is adapted from MinimumCD.org,
licensed under CC BY 4.0.
3 - Limiting Work in Progress
Focus on finishing work over starting new work to improve flow and reduce cycle time.
Phase 3 - Optimize | Adapted from Dojo Consortium
Work in progress (WIP) is inventory. Like physical inventory, it loses value the longer it sits unfinished. Limiting WIP is the most counterintuitive and most impactful practice in this entire migration: doing less work at once makes you deliver more.
Why Limiting WIP Matters
Every item of work in progress has a cost:
- Context switching: Moving between tasks destroys focus. Research consistently shows that switching between two tasks reduces productive time by 20-40%.
- Delayed feedback: Work that is started but not finished cannot be validated by users. The longer it sits, the more assumptions go untested.
- Hidden dependencies: The more items in progress simultaneously, the more likely they are to conflict, block each other, or require coordination.
- Longer cycle time: Little’s Law states that cycle time = WIP / throughput. If throughput is constant, the only way to reduce cycle time is to reduce WIP.
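A quick illustration of that relationship, with assumed numbers:

```python
# Little's Law: cycle_time = WIP / throughput
wip = 6            # items currently in progress
throughput = 3.0   # items finished per week
print(wip / throughput)   # 2.0 weeks of cycle time; halving WIP halves cycle time at the same throughput
```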
“Stop starting, start finishing.”
How to Set Your WIP Limit
The N+2 Starting Point
A practical starting WIP limit for a team is N+2, where N is the number of team members actively working on delivery.
| Team Size | Starting WIP Limit | Rationale |
|-----------|--------------------|-----------|
| 3 developers | 5 items | Allows one item per person plus a small buffer |
| 5 developers | 7 items | Same principle at larger scale |
| 8 developers | 10 items | Buffer becomes proportionally smaller |
Why N+2 and not N? Because some items will be blocked waiting for review, testing, or external dependencies. A small buffer prevents team members from being idle when their primary task is blocked. But the buffer should be small - two items, not ten.
Continuously Lower the Limit
The N+2 formula is a starting point, not a destination. Once the team is comfortable with the initial limit, reduce it:
- Start at N+2. Run for 2-4 weeks. Observe where work gets stuck.
- Reduce to N+1. Tighten the limit. Some team members will occasionally be “idle” - this is a feature, not a bug. They should swarm on blocked items.
- Reduce to N. At this point, every team member is working on exactly one thing. Blocked work gets immediate attention because someone is always available to help.
- Consider going below N. Some teams find that pairing (two people, one item) further reduces cycle time. A team of 6 with a WIP limit of 3 means everyone is pairing.
Each reduction will feel uncomfortable. That discomfort is the point - it exposes problems in your workflow that were previously hidden by excess WIP.
What Happens When You Hit the Limit
When the team reaches its WIP limit and someone finishes a task, they have two options:
- Pull the next highest-priority item (if the WIP limit allows it).
- Swarm on an existing item that is blocked, stuck, or nearing its cycle time target.
When the WIP limit is reached and no items are complete:
- Do not start new work. This is the hardest part and the most important.
- Help unblock existing work. Pair with someone. Review a pull request. Write a missing test. Talk to the person who has the answer to the blocking question.
- Improve the process. If nothing is blocked but everything is slow, this is the time to work on automation, tooling, or documentation.
Swarming
Swarming is the practice of multiple team members working together on a single item to get it finished faster. It is the natural complement to WIP limits.
When to Swarm
- An item has been in progress for longer than the team’s cycle time target (e.g., more than 2 days)
- An item is blocked and the blocker can be resolved by another team member
- The WIP limit is reached and someone needs work to do
- A critical defect needs to be fixed immediately
How to Swarm Effectively
| Approach | How It Works | Best For |
|----------|--------------|----------|
| Pair programming | Two developers work on the same item at the same machine | Complex logic, knowledge transfer, code that needs review |
| Mob programming | The whole team works on one item together | Critical path items, complex architectural decisions |
| Divide and conquer | Break the item into sub-tasks and assign them | Items that can be parallelized (e.g., frontend + backend + tests) |
| Unblock and return | One person resolves the blocker, then hands back | External dependencies, environment issues, access requests |
Why Teams Resist Swarming
The most common objection: “It’s inefficient to have two people on one task.” This is only true if you measure efficiency as “percentage of time each person is writing new code.” If you measure efficiency as “how quickly value reaches production,” swarming is almost always faster because it reduces handoffs, wait time, and rework.
How Limiting WIP Exposes Workflow Issues
One of the most valuable effects of WIP limits is that they make hidden problems visible. When you cannot start new work, you are forced to confront the problems that slow existing work down.
| Symptom When WIP Is Limited | Root Cause Exposed |
|-----------------------------|--------------------|
| “I’m idle because my PR is waiting for review” | Code review process is too slow |
| “I’m idle because I’m waiting for the test environment” | Not enough environments, or environments are not self-service |
| “I’m idle because I’m waiting for the product owner to clarify requirements” | Stories are not refined before being pulled into the sprint |
| “I’m idle because my build is broken and I can’t figure out why” | Build is not deterministic, or test suite is flaky |
| “I’m idle because another team hasn’t finished the API I depend on” | Architecture is too tightly coupled (see Architecture Decoupling) |
Each of these is a bottleneck that was previously invisible because the team could always start something else. With WIP limits, these bottlenecks become obvious and demand attention.
Implementing WIP Limits
Step 1: Make WIP Visible (Week 1)
Before setting limits, make current WIP visible:
- Count the number of items currently “in progress” for the team
- Write this number on the board (physical or digital) every day
- Most teams are shocked by how high it is. A team of 5 often has 15-20 items in progress.
Step 2: Set the Initial Limit (Week 2)
- Calculate N+2 for your team
- Add the limit to your board (e.g., a column header that says “In Progress (limit: 7)”)
- Agree as a team that when the limit is reached, no new work starts
Step 3: Enforce the Limit (Week 3+)
- When someone tries to pull new work and the limit is reached, the team helps them find an existing item to work on
- Track violations: how often does the team exceed the limit? What causes it?
- Discuss in retrospectives: Is the limit too high? Too low? What bottlenecks are exposed?
Step 4: Reduce the Limit (Monthly)
- Every month, consider reducing the limit by 1
- Each reduction will expose new bottlenecks - this is the intended effect
- Stop reducing when the team reaches a sustainable flow where items move from start to done predictably
Key Pitfalls
1. “We set a WIP limit but nobody enforces it”
A WIP limit that is not enforced is not a WIP limit. Enforcement requires a team agreement and a visible mechanism. If the board shows 10 items in progress and the limit is 7, the team should stop and address it immediately. This is a working agreement, not a suggestion.
2. “Developers are idle and management is uncomfortable”
This is the most common failure mode. Management sees “idle” developers and concludes WIP limits are wasteful. In reality, those “idle” developers are either swarming on existing work (which is productive) or the team has hit a genuine bottleneck that needs to be addressed. The discomfort is a signal that the system needs improvement.
3. “We have WIP limits but we also have expedite lanes for everything”
If every urgent request bypasses the WIP limit, you do not have a WIP limit. Expedite lanes should be rare - one per week at most. If everything is urgent, nothing is.
4. “We limit WIP per person but not per team”
Per-person WIP limits miss the point. The goal is to limit team WIP so that team members are incentivized to help each other. A per-person limit of 1 with no team limit still allows the team to have 8 items in progress simultaneously with no swarming.
Measuring Success
| Metric | Target | Why It Matters |
|--------|--------|----------------|
| Work in progress | At or below team limit | Confirms the limit is being respected |
| Development cycle time | Decreasing | Confirms that less WIP leads to faster delivery |
| Items completed per week | Stable or increasing | Confirms that finishing more, starting less works |
| Time items spend blocked | Decreasing | Confirms bottlenecks are being addressed |
Next Step
WIP limits expose problems. Metrics-Driven Improvement provides the framework for systematically addressing them.
This content is adapted from the Dojo Consortium,
licensed under CC BY 4.0.
4 - Metrics-Driven Improvement
Use DORA metrics and improvement kata to drive systematic delivery improvement.
Phase 3 - Optimize | Original content combining DORA recommendations and improvement kata
Improvement without measurement is guesswork. This page combines the DORA four key metrics with the improvement kata pattern to create a systematic, repeatable approach to getting better at delivery.
The Problem with Ad Hoc Improvement
Most teams improve accidentally. Someone reads a blog post, suggests a change at standup, and the team tries it for a week before forgetting about it. This produces sporadic, unmeasurable progress that is impossible to sustain.
Metrics-driven improvement replaces this with a disciplined cycle: measure where you are, define where you want to be, run a small experiment, measure the result, and repeat. The improvement kata provides the structure. DORA metrics provide the measures.
The Four DORA Metrics
The DORA research program (now part of Google Cloud) has identified four key metrics that predict software delivery performance. These are the metrics you should track throughout your CD migration.
1. Deployment Frequency
How often your team deploys to production.
| Performance Level | Deployment Frequency |
|-------------------|----------------------|
| Elite | On-demand (multiple deploys per day) |
| High | Between once per day and once per week |
| Medium | Between once per week and once per month |
| Low | Between once per month and once every six months |
What it tells you: How comfortable your team and pipeline are with deploying. Low frequency usually indicates manual gates, fear of deployment, or large batch sizes.
How to measure: Count the number of successful deployments to production per unit of time. Automated deploys count. Hotfixes count. Rollbacks do not.
2. Lead Time for Changes
The time from a commit being pushed to trunk to that commit running in production.
| Performance Level | Lead Time |
|-------------------|-----------|
| Elite | Less than one hour |
| High | Between one day and one week |
| Medium | Between one week and one month |
| Low | Between one month and six months |
What it tells you: How efficient your pipeline is. Long lead times indicate slow builds, manual approval steps, or infrequent deployment windows.
How to measure: Record the timestamp when a commit merges to trunk and the timestamp when that commit is running in production. The difference is lead time. Track the median, not the mean (outliers distort the mean).
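A minimal sketch of the calculation, assuming you already collect (merged, deployed) timestamp pairs from your pipeline:

```python
from datetime import datetime
from statistics import median

def median_lead_time_hours(pairs: list[tuple[datetime, datetime]]) -> float:
    # Each pair is (commit merged to trunk, same commit running in production).
    return median((deployed - merged).total_seconds() / 3600 for merged, deployed in pairs)

pairs = [
    (datetime(2024, 5, 1, 9, 0), datetime(2024, 5, 1, 11, 30)),   # 2.5 h
    (datetime(2024, 5, 2, 14, 0), datetime(2024, 5, 3, 9, 0)),    # 19 h
    (datetime(2024, 5, 3, 10, 0), datetime(2024, 5, 3, 16, 0)),   # 6 h
]
print(f"median lead time: {median_lead_time_hours(pairs):.1f} h")   # 6.0 h
```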
3. Change Failure Rate
The percentage of deployments that cause a failure in production requiring remediation (rollback, hotfix, or patch).
| Performance Level | Change Failure Rate |
|-------------------|---------------------|
| Elite | 0-15% |
| High | 16-30% |
| Medium | 16-30% |
| Low | 46-60% |
What it tells you: How effective your testing and validation pipeline is. High failure rates indicate gaps in test coverage, insufficient pre-production validation, or overly large changes.
How to measure: Track deployments that result in a degraded service, require rollback, or need a hotfix. Divide by total deployments. A “failure” is defined by the team - typically any incident that requires immediate human intervention.
4. Mean Time to Restore (MTTR)
How long it takes to recover from a failure in production.
| Performance Level | Time to Restore |
|-------------------|-----------------|
| Elite | Less than one hour |
| High | Less than one day |
| Medium | Less than one day |
| Low | Between one week and one month |
What it tells you: How resilient your system and team are. Long recovery times indicate manual rollback processes, poor observability, or insufficient incident response practices.
How to measure: Record the timestamp when a production failure is detected and the timestamp when service is fully restored. Track the median.
The DORA Capabilities
Behind these four metrics are 24 capabilities that the DORA research has shown to drive performance. They organize into five categories. Use this as a diagnostic tool: when a metric is lagging, look at the related capabilities to identify what to improve.
Continuous Delivery Capabilities
These directly affect your pipeline and deployment practices:
- Version control for all production artifacts
- Automated deployment processes
- Continuous integration
- Trunk-based development
- Test automation
- Test data management
- Shift-left security
- Continuous delivery (the ability to deploy at any time)
Architecture Capabilities
These affect how easily your system can be changed and deployed:
- Loosely coupled architecture
- Empowered teams that can choose their own tools
- Teams that can test, deploy, and release independently
Product and Process Capabilities
These affect how work flows through the team:
- Customer feedback loops
- Value stream visibility
- Working in small batches
- Team experimentation
Lean Management Capabilities
These affect how the organization supports delivery:
- Lightweight change approval processes
- Monitoring and observability
- Proactive notification
- WIP limits
- Visual management of workflow
Cultural Capabilities
These affect the environment in which teams operate:
- Generative organizational culture (Westrum model)
- Encouraging and supporting learning
- Collaboration within and between teams
- Job satisfaction
- Transformational leadership
For a detailed breakdown, see the DORA Capabilities reference.
The Improvement Kata
The improvement kata is a four-step pattern from lean manufacturing adapted for software delivery. It provides the structure for turning DORA measurements into concrete improvements.
Step 1: Understand the Direction
Where does your CD migration need to go?
This is already defined by the phases of this migration guide. In Phase 3, your direction is: smaller batches, faster flow, and higher confidence in every deployment.
Step 2: Grasp the Current Condition
Measure your current DORA metrics. Be honest - the point is to understand reality, not to look good.
Practical approach:
- Collect two weeks of data for all four DORA metrics
- Plot the data - do not just calculate averages. Look at the distribution.
- Identify which metric is furthest from your target
- Investigate the related capabilities to understand why
Example current condition:
| Metric | Current | Target | Gap |
|--------|---------|--------|-----|
| Deployment frequency | Weekly | Daily | 5x improvement needed |
| Lead time | 3 days | < 1 day | Pipeline is slow or has manual gates |
| Change failure rate | 25% | < 15% | Test coverage or change size issue |
| MTTR | 4 hours | < 1 hour | Rollback is manual |
Step 3: Establish the Next Target Condition
Do not try to fix everything at once. Pick one metric and define a specific, measurable, time-bound target.
Good target: “Reduce lead time from 3 days to 1 day within the next 4 weeks.”
Bad target: “Improve our deployment pipeline.” (Too vague, no measure, no deadline.)
Step 4: Experiment Toward the Target
Design a small experiment that you believe will move the metric toward the target. Run it. Measure the result. Adjust.
The experiment format:
| Element | Description |
|---------|-------------|
| Hypothesis | “If we [action], then [metric] will [improve/decrease] because [reason].” |
| Action | What specifically will you change? |
| Duration | How long will you run the experiment? (Typically 1-2 weeks) |
| Measure | How will you know if it worked? |
| Decision criteria | What result would cause you to keep, modify, or abandon the change? |
Example experiment:
Hypothesis: If we parallelize our integration test suite, lead time will drop from 3 days to under 2 days because 60% of lead time is spent waiting for tests to complete.
Action: Split the integration test suite into 4 parallel runners.
Duration: 2 weeks.
Measure: Median lead time for commits merged during the experiment period.
Decision criteria: Keep if lead time drops below 2 days. Modify if it drops but not enough. Abandon if it has no effect or introduces flakiness.
The Cycle Repeats
After each experiment:
- Measure the result
- Update your understanding of the current condition
- If the target is met, pick the next metric to improve
- If the target is not met, design another experiment
This creates a continuous improvement loop. Each cycle takes 1-2 weeks. Over months, the cumulative effect is dramatic.
Connecting Metrics to Action
When a metric is lagging, use this guide to identify where to focus.
Low Deployment Frequency
| Possible Cause | Investigation | Action |
|----------------|---------------|--------|
| Manual approval gates | Map the approval chain | Automate or eliminate non-value-adding approvals |
| Fear of deployment | Ask the team what they fear | Address the specific fear (usually testing gaps) |
| Large batch size | Measure changes per deploy | Implement small-batch practices |
| Deploy process is manual | Time the deploy process | Automate the deployment pipeline |
Long Lead Time
| Possible Cause | Investigation | Action |
|----------------|---------------|--------|
| Slow builds | Time each pipeline stage | Optimize the slowest stage (often tests) |
| Waiting for environments | Track environment wait time | Implement self-service environments |
| Waiting for approval | Track approval wait time | Reduce approval scope or automate |
| Large changes | Measure commit size | Reduce batch size |
High Change Failure Rate
| Possible Cause | Investigation | Action |
|----------------|---------------|--------|
| Insufficient test coverage | Measure coverage by area | Add tests for the areas that fail most |
| Tests pass but production differs | Compare test and prod environments | Make environments more production-like |
| Large, risky changes | Measure change size | Reduce batch size, use feature flags |
| Configuration drift | Audit configuration differences | Externalize and version configuration |
Long MTTR
| Possible Cause | Investigation | Action |
|----------------|---------------|--------|
| Rollback is manual | Time the rollback process | Automate rollback |
| Hard to identify root cause | Review recent incidents | Improve observability and alerting |
| Hard to deploy fixes quickly | Measure fix lead time | Ensure pipeline supports rapid hotfix deployment |
| Dependencies fail in cascade | Map failure domains | Improve architecture decoupling |
Building a Metrics Dashboard
Make your DORA metrics visible to the team at all times. A dashboard on a wall monitor or a shared link is ideal.
Essential elements:
- Current values for all four DORA metrics
- Trend lines showing direction over the past 4-8 weeks
- Current target condition highlighted
- Active experiment description
Keep it simple. A spreadsheet updated weekly is better than a sophisticated dashboard that nobody maintains. The goal is visibility, not tooling sophistication.
Key Pitfalls
1. “We measure but don’t act”
Measurement without action is waste. If you collect metrics but never run experiments, you are creating overhead with no benefit. Every measurement should lead to a hypothesis. Every hypothesis should lead to an experiment.
2. “We use metrics to compare teams”
DORA metrics are for teams to improve themselves, not for management to rank teams. Using metrics for comparison creates incentives to game the numbers. Each team should own its own metrics and its own improvement targets.
3. “We try to improve all four metrics at once”
Focus on one metric at a time. Improving deployment frequency and change failure rate simultaneously often requires conflicting actions. Pick the biggest bottleneck, address it, then move to the next.
4. “We abandon experiments too quickly”
Most experiments need at least two weeks to show results. One bad day is not a reason to abandon an experiment. Set the duration up front and commit to it.
Measuring Success
| Indicator | Target | Why It Matters |
|-----------|--------|----------------|
| Experiments per month | 2-4 | Confirms the team is actively improving |
| Metrics trending in the right direction | Consistent improvement over 3+ months | Confirms experiments are having an effect |
| Team can articulate current condition and target | Everyone on the team knows | Confirms improvement is a shared concern |
| Improvement items in backlog | Always present | Confirms improvement is treated as a deliverable |
Next Step
Metrics tell you what to improve. Retrospectives provide the team forum for deciding how to improve it.
5 - Retrospectives
Continuously improve the delivery process through structured reflection.
Phase 3 - Optimize | Adapted from Dojo Consortium
A retrospective is the team’s primary mechanism for turning observations into improvements. Without effective retrospectives, WIP limits expose problems that nobody addresses, metrics trend in the wrong direction with no response, and the CD migration stalls.
Why Retrospectives Matter for CD Migration
Every practice in this guide - trunk-based development, small batches, WIP limits, metrics-driven improvement - generates signals about what is working and what is not. Retrospectives are where the team processes those signals and decides what to change.
Teams that skip retrospectives or treat them as a checkbox exercise consistently stall at whatever maturity level they first reach. Teams that run effective retrospectives continuously improve, week after week, month after month.
The Five-Part Structure
An effective retrospective follows a structured format that prevents it from devolving into a venting session or a status meeting. This five-part structure ensures the team moves from observation to action.
Part 1: Review the Mission (5 minutes)
Start by reminding the team of the larger goal. In the context of a CD migration, this might be:
- “Our mission this quarter is to deploy to production at least once per day.”
- “We are working toward eliminating manual gates in our pipeline.”
- “Our goal is to reduce lead time from 3 days to under 1 day.”
This grounding prevents the retrospective from focusing on minor irritations and keeps the conversation aligned with what matters.
Part 2: Review the KPIs (10 minutes)
Present the team’s current metrics. For a CD migration, these are typically the DORA metrics plus any team-specific measures from Metrics-Driven Improvement.
| Metric | Last Period | This Period | Trend |
|--------|-------------|-------------|-------|
| Deployment frequency | 3/week | 4/week | Improving |
| Lead time (median) | 2.5 days | 2.1 days | Improving |
| Change failure rate | 22% | 18% | Improving |
| MTTR | 3 hours | 3.5 hours | Worsening |
| WIP (average) | 8 items | 6 items | Improving |
Do not skip this step. Without data, the retrospective becomes a subjective debate where the loudest voice wins. With data, the conversation focuses on what the numbers show and what to do about them.
Part 3: Review Experiments (10 minutes)
Review the outcomes of any experiments the team ran since the last retrospective.
For each experiment:
- What was the hypothesis? Remind the team what you were testing.
- What happened? Present the data.
- What did you learn? Even failed experiments teach you something.
- What is the decision? Keep, modify, or abandon.
Example:
Experiment: Parallelize the integration test suite to reduce lead time.
Hypothesis: Lead time would drop from 2.5 days to under 2 days.
Result: Lead time dropped to 2.1 days. The parallelization worked, but environment setup time is now the bottleneck.
Decision: Keep the parallelization. New experiment: investigate self-service test environments.
Part 4: Check Goals (10 minutes)
Review any improvement goals or action items from the previous retrospective.
- Completed: Acknowledge and celebrate. This is important - it reinforces that improvement work matters.
- In progress: Check for blockers. Does the team need to adjust the approach?
- Not started: Why not? Was it deprioritized, blocked, or forgotten? If improvement work is consistently not started, the team is not treating improvement as a deliverable (see below).
Part 5: Open Conversation (25 minutes)
This is the core of the retrospective. The team discusses:
- What is working well that we should keep doing?
- What is not working that we should change?
- What new problems or opportunities have we noticed?
Facilitation techniques for this section:
| Technique | How It Works | Best For |
|-----------|--------------|----------|
| Start/Stop/Continue | Each person writes items in three categories | Quick, structured, works with any team |
| 4Ls (Liked, Learned, Lacked, Longed For) | Broader categories that capture emotional responses | Teams that need to process frustration or celebrate wins |
| Timeline | Plot events on a timeline and discuss turning points | After a particularly eventful sprint or incident |
| Dot voting | Everyone gets 3 votes to prioritize discussion topics | When there are many items and limited time |
From Conversation to Commitment
The open conversation must produce concrete action items. Vague commitments like “we should communicate better” are worthless. Good action items are:
- Specific: “Add a Slack notification when the build breaks” (not “improve communication”)
- Owned: “Alex will set this up by Wednesday” (not “someone should do this”)
- Measurable: “We will know this worked if build break response time drops below 10 minutes”
- Time-bound: “We will review the result at the next retrospective”
Limit action items to 1-3 per retrospective. More than three means nothing gets done. One well-executed improvement is worth more than five abandoned ones.
Psychological Safety Is a Prerequisite
A retrospective only works if team members feel safe to speak honestly about what is not working. Without psychological safety, retrospectives produce sanitized, non-actionable discussion.
Signs of Low Psychological Safety
- Only senior team members speak
- Nobody mentions problems - everything is “fine”
- Issues that everyone knows about are never raised
- Team members vent privately after the retrospective instead of during it
- Action items are always about tools or processes, never about behaviors
Building Psychological Safety
| Practice | Why It Helps |
|----------|--------------|
| Leader speaks last | Prevents the leader’s opinion from anchoring the discussion |
| Anonymous input | Use sticky notes or digital tools where input is anonymous initially |
| Blame-free language | “The deploy failed” not “You broke the deploy” |
| Follow through on raised issues | Nothing destroys safety faster than raising a concern and having it ignored |
| Acknowledge mistakes openly | Leaders who admit their own mistakes make it safe for others to do the same |
| Separate retrospective from performance review | If retro content affects reviews, people will not be honest |
Treat Improvement as a Deliverable
The most common failure mode for retrospectives is producing action items that never get done. This happens when improvement work is treated as something to do “when we have time” - which means never.
Make Improvement Visible
- Add improvement items to the same board as feature work
- Include improvement items in WIP limits
- Track improvement items through the same workflow as any other deliverable
Allocate Capacity
Reserve a percentage of team capacity for improvement work. Common allocations:
| Allocation | Approach |
|------------|----------|
| 20% continuous | One day per week (or equivalent) dedicated to improvement, tooling, and tech debt |
| Dedicated improvement sprint | Every 4th sprint is entirely improvement-focused |
| Improvement as first pull | When someone finishes work and the WIP limit allows, the first option is an improvement item |
The specific allocation matters less than having one. A team that explicitly budgets 10% for improvement will improve more than a team that aspires to 20% but never protects the time.
Retrospective Cadence
| Cadence | Best For | Caution |
|---------|----------|---------|
| Weekly | Teams in active CD migration, teams working through major changes | Can feel like too many meetings if not well-facilitated |
| Bi-weekly | Teams in steady state with ongoing improvement | Most common cadence |
| After incidents | Any team | Incident retrospectives (postmortems) are separate from regular retrospectives |
| Monthly | Mature teams with well-established improvement habits | Too infrequent for teams early in their migration |
During active phases of a CD migration (Phases 1-3), weekly retrospectives are recommended. Once the team reaches Phase 4, bi-weekly is usually sufficient.
Running Your First CD Migration Retrospective
If your team has not been running effective retrospectives, start here:
Before the Retrospective
- Collect your DORA metrics for the past two weeks
- Review any action items from the previous retrospective (if applicable)
- Prepare a shared document or board with the five-part structure
During the Retrospective (60 minutes)
- Review mission (5 min): State your CD migration goal for this phase
- Review KPIs (10 min): Present the DORA metrics. Ask: “What do you notice?”
- Review experiments (10 min): Discuss any experiments that were run
- Check goals (10 min): Review action items from last time
- Open conversation (25 min): Use Start/Stop/Continue for the first time - it is the simplest format
After the Retrospective
- Publish the action items where the team will see them daily
- Assign owners and due dates
- Add improvement items to the team board
- Schedule the next retrospective
Key Pitfalls
1. “Our retrospectives always produce the same complaints”
If the same issues surface repeatedly, the team is not executing on its action items. Check whether improvement work is being prioritized alongside feature work. If it is not, no amount of retrospective technique will help.
2. “People don’t want to attend because nothing changes”
This is a symptom of the same problem - action items are not executed. The fix is to start small: commit to one action item per retrospective, execute it completely, and demonstrate the result at the next retrospective. Success builds momentum.
3. “The retrospective turns into a blame session”
The facilitator must enforce blame-free language. Redirect “You did X wrong” to “When X happened, the impact was Y. How can we prevent Y?” If blame is persistent, the team has a psychological safety problem that needs to be addressed separately.
4. “We don’t have time for retrospectives”
A team that does not have time to improve will never improve. A 60-minute retrospective that produces one executed improvement is the highest-leverage hour of the entire sprint.
Measuring Success
| Indicator | Target | Why It Matters |
|-----------|--------|----------------|
| Retrospective attendance | 100% of team | Confirms the team values the practice |
| Action items completed | > 80% completion rate | Confirms improvement is treated as a deliverable |
| DORA metrics trend | Improving quarter over quarter | Confirms retrospectives lead to real improvement |
| Team engagement | Voluntary contributions increasing | Confirms psychological safety is present |
Next Step
With metrics-driven improvement and effective retrospectives, you have the engine for continuous improvement. The final optimization step is Architecture Decoupling - ensuring your system’s architecture does not prevent you from deploying independently.
This content is adapted from the Dojo Consortium,
licensed under CC BY 4.0.
6 - Architecture Decoupling
Enable independent deployment of components by decoupling architecture boundaries.
Phase 3 - Optimize | Original content based on Dojo Consortium delivery journey patterns
You cannot deploy independently if your architecture requires coordinated releases. This page describes the three architecture states teams encounter on the journey to continuous deployment and provides practical strategies for moving from entangled to loosely coupled.
Why Architecture Matters for CD
Every practice in this guide - small batches, feature flags, WIP limits - assumes that your team can deploy its changes independently. But if your application is a monolith where changing one module requires retesting everything, or a set of microservices with tightly coupled APIs, independent deployment is impossible regardless of how good your practices are.
Architecture is either an enabler or a blocker for continuous deployment. There is no neutral.
Three Architecture States
The Delivery System Improvement Journey describes three states that teams move through. Most teams start entangled. The goal is to reach loosely coupled.
State 1: Entangled
In an entangled architecture, everything is connected to everything. Changes in one area routinely break other areas. Teams cannot deploy independently.
Characteristics:
- Shared database schemas with no ownership boundaries
- Circular dependencies between modules or services
- Deploying one service requires deploying three others at the same time
- Integration testing requires the entire system to be running
- A single team’s change can block every other team’s release
- “Big bang” releases on a fixed schedule
Impact on delivery:
| Metric | Typical State |
|--------|---------------|
| Deployment frequency | Monthly or quarterly (because coordinating releases is hard) |
| Lead time | Weeks to months (because changes wait for the next release train) |
| Change failure rate | High (because big releases mean big risk) |
| MTTR | Long (because failures cascade across boundaries) |
How you got here: Entanglement is the natural result of building quickly without deliberate architectural boundaries. It is not a failure - it is a stage that almost every system passes through.
State 2: Tightly Coupled
In a tightly coupled architecture, there are identifiable boundaries between components, but those boundaries are leaky. Teams have some independence, but coordination is still required for many changes.
Characteristics:
- Services exist but share a database or use synchronous point-to-point calls
- API contracts exist but are not versioned - breaking changes require simultaneous updates
- Teams can deploy some changes independently, but cross-cutting changes require coordination
- Integration testing requires multiple services but not the entire system
- Release trains still exist but are smaller and more frequent
Impact on delivery:
| Metric | Typical State |
|--------|---------------|
| Deployment frequency | Weekly to bi-weekly |
| Lead time | Days to a week |
| Change failure rate | Moderate (improving but still affected by coupling) |
| MTTR | Hours (failures are more isolated but still cascade sometimes) |
State 3: Loosely Coupled
In a loosely coupled architecture, components communicate through well-defined interfaces, own their own data, and can be deployed independently without coordinating with other teams.
Characteristics:
- Each service owns its own data store - no shared databases
- APIs are versioned; consumers and producers can be updated independently
- Asynchronous communication (events, queues) is used where possible
- Each team can deploy without coordinating with any other team
- Services are designed to degrade gracefully if a dependency is unavailable
- No release trains - each team deploys when ready
Impact on delivery:
| Metric | Typical State |
|--------|---------------|
| Deployment frequency | On-demand (multiple times per day) |
| Lead time | Hours |
| Change failure rate | Low (small, isolated changes) |
| MTTR | Minutes (failures are contained within service boundaries) |
Moving from Entangled to Tightly Coupled
This is the first and most difficult transition. It requires establishing boundaries where none existed before.
Strategy 1: Identify Natural Seams
Look for places where the system already has natural boundaries, even if they are not enforced:
- Different business domains: Orders, payments, inventory, and user accounts are different domains even if they live in the same codebase.
- Different rates of change: Code that changes weekly and code that changes yearly should not be in the same deployment unit.
- Different scaling needs: Components with different load profiles benefit from separate deployment.
- Different team ownership: If different teams work on different parts of the codebase, those parts are candidates for separation.
Strategy 2: Strangler Fig Pattern
Instead of rewriting the system, incrementally extract components from the monolith.
Step 1: Route all traffic through a facade/proxy
Step 2: Build the new component alongside the old
Step 3: Route a small percentage of traffic to the new component
Step 4: Validate correctness and performance
Step 5: Route all traffic to the new component
Step 6: Remove the old code
Key rule: The strangler fig pattern must be done incrementally. If you try to extract everything at once, you are doing a rewrite, not a strangler fig.
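A sketch of step 3, the traffic split, assuming a simple in-process facade; the component functions are placeholders:

```python
import hashlib

NEW_COMPONENT_PERCENT = 5   # step 3: start by sending a small slice of traffic to the new component

def handle(request_id: str, payload: dict) -> dict:
    # Deterministic routing: the same request/user id always takes the same path,
    # which keeps behavior consistent while the percentage is ramped up.
    bucket = int(hashlib.sha256(request_id.encode()).hexdigest(), 16) % 100
    return new_component(payload) if bucket < NEW_COMPONENT_PERCENT else legacy_monolith(payload)

def new_component(payload: dict) -> dict:
    return {"handled_by": "new", **payload}      # extracted implementation (placeholder)

def legacy_monolith(payload: dict) -> dict:
    return {"handled_by": "legacy", **payload}   # existing implementation (placeholder)
```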
Strategy 3: Define Ownership Boundaries
Assign clear ownership of each module or component to a single team. Ownership means:
- The owning team decides the API contract
- The owning team deploys the component
- Other teams consume the API, not the internal implementation
- Changes to the API contract require agreement from consumers (but not simultaneous deployment)
What to Avoid
- The “big rewrite”: Rewriting a monolith from scratch almost always fails. Use the strangler fig pattern instead.
- Premature microservices: Do not split into microservices until you have clear domain boundaries and team ownership. Microservices with unclear boundaries are a distributed monolith - the worst of both worlds.
- Shared databases across services: This is the most common coupling mechanism. If two services share a database, they cannot be deployed independently because a schema change in one service can break the other.
Moving from Tightly Coupled to Loosely Coupled
This transition is about hardening the boundaries that were established in the previous step.
Strategy 1: Eliminate Shared Data Stores
If two services share a database, one of three things needs to happen:
- One service owns the data, the other calls its API. The dependent service no longer accesses the database directly.
- The data is duplicated. Each service maintains its own copy, synchronized via events.
- The shared data becomes a dedicated data service. Both services consume from a service that owns the data.
BEFORE (shared database):
Service A → [Shared DB] ← Service B
AFTER (option 1 - API ownership):
Service A → [DB A]
Service B → Service A API → [DB A]
AFTER (option 2 - event-driven duplication):
Service A → [DB A] → Events → Service B → [DB B]
AFTER (option 3 - data service):
Service A → Data Service → [DB]
Service B → Data Service → [DB]
Strategy 2: Version Your APIs
API versioning allows consumers and producers to evolve independently.
Rules for API versioning:
- Never make a breaking change without a new version. Adding fields is non-breaking. Removing fields is breaking. Changing field types is breaking.
- Support at least two versions simultaneously. This gives consumers time to migrate (see the sketch after this list).
- Deprecate old versions with a timeline. “Version 1 will be removed on date X.”
- Use consumer-driven contract tests to verify compatibility. See Contract Testing.
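As an illustration only (using FastAPI here as a stand-in for whatever framework you use), a v2 endpoint can add fields while v1 keeps serving its original shape:

```python
from fastapi import FastAPI

app = FastAPI()

@app.get("/v1/orders/{order_id}")
def get_order_v1(order_id: str) -> dict:
    # v1 keeps its original response shape while consumers migrate.
    return {"id": order_id, "total": 30.0}

@app.get("/v2/orders/{order_id}")
def get_order_v2(order_id: str) -> dict:
    # v2 adds fields (non-breaking for v2 consumers); removing or renaming
    # anything in v1 would have required this separate version.
    return {"id": order_id, "total": 30.0, "currency": "USD", "status": "paid"}
```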
Strategy 3: Prefer Asynchronous Communication
Synchronous calls (HTTP, gRPC) create temporal coupling: if the downstream service is slow or unavailable, the upstream service is also affected.
| Communication Style | Coupling | When to Use |
|---------------------|----------|-------------|
| Synchronous (HTTP/gRPC) | Temporal + behavioral | When the caller needs an immediate response |
| Asynchronous (events/queues) | Behavioral only | When the caller does not need an immediate response |
| Event-driven (publish/subscribe) | Minimal | When the producer does not need to know about consumers |
Prefer asynchronous communication wherever the business requirements allow it. Not every interaction needs to be synchronous.
Strategy 4: Design for Failure
In a loosely coupled system, dependencies will be unavailable sometimes. Design for this:
- Circuit breakers: Stop calling a failing dependency after N failures. Return a degraded response instead (see the sketch after this list).
- Timeouts: Set aggressive timeouts on all external calls. A 30-second timeout on a service that should respond in 100ms is not a timeout - it is a hang.
- Bulkheads: Isolate failures so that one failing dependency does not consume all resources.
- Graceful degradation: Define what the user experience should be when a dependency is down. “Recommendations unavailable” is better than a 500 error.
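A minimal circuit breaker sketch to make the first two bullets concrete; the thresholds and the fallback are illustrative choices, not recommendations:

```python
import time

class CircuitBreaker:
    """Stop calling a failing dependency after max_failures; try it again after reset_seconds."""

    def __init__(self, max_failures: int = 5, reset_seconds: float = 30.0):
        self.max_failures = max_failures
        self.reset_seconds = reset_seconds
        self.failures = 0
        self.opened_at = None   # monotonic timestamp when the circuit opened

    def call(self, fn, fallback):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_seconds:
                return fallback()           # circuit open: degrade gracefully, skip the call entirely
            self.opened_at = None           # cool-down elapsed: close the circuit and try again
            self.failures = 0
        try:
            result = fn()
            self.failures = 0
            return result
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            return fallback()

# Usage: breaker.call(fetch_recommendations, lambda: {"recommendations": []})
# returns an empty recommendation list instead of a 500 when the dependency is down.
```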
Practical Steps for Architecture Decoupling
Month 1: Map Dependencies
Before changing anything, understand what you have:
- Draw a dependency graph. Which components depend on which? Where are the shared databases?
- Identify deployment coupling. Which components must be deployed together? Why?
- Identify the highest-impact coupling. Which coupling most frequently blocks independent deployment?
Month 2-3: Establish the First Boundary
Pick one component to decouple. Choose the one with the highest impact and lowest risk:
- Apply the strangler fig pattern to extract it
- Define a clear API contract
- Move its data to its own data store
- Deploy it independently
Month 4+: Repeat
Take the next highest-impact coupling and address it. Each decoupling makes the next one easier because the team learns the patterns and the remaining system is simpler.
Key Pitfalls
1. “We need to rewrite everything before we can deploy independently”
No. Decoupling is incremental. Extract one component, deploy it independently, prove the pattern works, then continue. A partial decoupling that enables one team to deploy independently is infinitely more valuable than a planned rewrite that never finishes.
2. “We split into microservices but our lead time got worse”
Microservices add operational complexity (more services to deploy, monitor, and debug). If you split without investing in deployment automation, observability, and team autonomy, you will get worse, not better. Microservices are a tool for organizational scaling, not a silver bullet for delivery speed.
3. “Teams keep adding new dependencies that recouple the system”
Architecture decoupling requires governance. Establish architectural principles (e.g., “no shared databases”) and enforce them through automated checks (e.g., dependency analysis in CI) and architecture reviews for cross-boundary changes.
4. “We can’t afford the time to decouple”
You cannot afford not to. Every week spent doing coordinated releases is a week of delivery capacity lost to coordination overhead. The investment in decoupling pays for itself quickly through increased deployment frequency and reduced coordination cost.
Measuring Success
| Metric | Target | Why It Matters |
|--------|--------|----------------|
| Teams that can deploy independently | Increasing | The primary measure of decoupling |
| Coordinated releases per quarter | Decreasing toward zero | Confirms coupling is being eliminated |
| Deployment frequency per team | Increasing independently | Confirms teams are not blocked by each other |
| Cross-team dependencies per feature | Decreasing | Confirms architecture supports independent work |
Next Step
With optimized flow, small batches, metrics-driven improvement, and a decoupled architecture, your team is ready for the final phase. Continue to Phase 4: Deliver on Demand.