Metrics

Detailed definitions for key delivery metrics. Understand what to measure and why.

Adapted from Dojo Consortium

These metrics help you assess your current delivery performance and track improvement over time. Start with the metrics most relevant to your current phase.

Key Metrics

Metric                 | What It Measures
---------------------- | --------------------------------------------
Integration Frequency  | How often code is integrated to trunk
Build Duration         | Time from commit to artifact creation
Development Cycle Time | Time from starting work to delivery
Lead Time              | Time from commit to running in production
Change Fail Rate       | Percentage of changes requiring remediation
Mean Time to Repair    | Time to restore service after failure
Release Frequency      | How often releases reach production
Work in Progress       | Amount of started but unfinished work

This content is adapted from the Dojo Consortium, licensed under CC BY 4.0.

1 - Integration Frequency

How often developers integrate code changes to the trunk – a leading indicator of CI maturity and small batch delivery.

Adapted from Dojo Consortium

Definition

Integration Frequency measures the average number of production-ready pull requests a team merges to trunk per day, normalized by team size. On a team of five developers, healthy continuous integration practice produces at least five integrations per day – roughly one per developer.

This metric is a direct indicator of how well a team practices Continuous Integration. Teams that integrate frequently work in small batches, receive fast feedback, and reduce the risk associated with large, infrequent merges.

integrationFrequency = mergedPullRequests / day / numberOfDevelopers

A value of 1.0 or higher per developer per day indicates that work is being decomposed into small, independently deliverable increments.
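
For example, a five-developer team that merges 25 pull requests to trunk across a five-day week averages 25 ÷ 5 days ÷ 5 developers = 1.0 integrations per developer per day, meeting that baseline.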

How to Measure

  1. Count trunk merges. Track the number of pull requests (or direct commits) merged to main or trunk each day.
  2. Normalize by team size. Divide the daily count by the number of developers actively contributing that day.
  3. Calculate the rolling average. Use a 5-day or 10-day rolling window to smooth daily variation and surface meaningful trends.

Most source control platforms expose this data through their APIs:

  • GitHub – list merged pull requests via the REST or GraphQL API.
  • GitLab – query merged merge requests per project.
  • Bitbucket – use the pull request activity endpoint.

Alternatively, count commits to the default branch if pull requests are not used.
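
As a rough sketch of the calculation, assuming the merge dates have already been exported from one of these APIs and using a fixed team size rather than per-day contributor counts:

```python
from collections import Counter
from datetime import date

def integration_frequency(merge_dates, team_size, window=5):
    """Rolling per-developer integration frequency.

    merge_dates: one entry per merge to trunk, exported from the source
    control API. Days with zero merges are skipped for brevity; a real
    dashboard would fill those gaps with zeros.
    """
    merges_per_day = Counter(merge_dates)
    days = sorted(merges_per_day)
    rolling = []
    for i in range(len(days)):
        window_days = days[max(0, i - window + 1): i + 1]
        total = sum(merges_per_day[d] for d in window_days)
        rolling.append(round(total / len(window_days) / team_size, 2))
    return list(zip(days, rolling))

# Example: a two-person team's merges over one week (hypothetical data).
merges = [date(2024, 3, d) for d in (4, 4, 5, 5, 5, 6, 7, 7, 8)]
print(integration_frequency(merges, team_size=2))
```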

Targets

Level  | Integration Frequency (per developer)
------ | --------------------------------------
Low    | Less than 1 per week
Medium | A few times per week
High   | Once per day
Elite  | Multiple times per day

The elite target aligns with trunk-based development, where developers push small changes to the trunk multiple times daily and rely on automated testing and feature flags to manage risk.

Common Pitfalls

  • Meaningless commits. Teams may inflate the count by integrating trivial or empty changes. Pair this metric with code review quality and defect rate.
  • Breaking the trunk. Pushing faster without adequate test coverage leads to a red build and slows the entire team. Always pair Integration Frequency with build success rate and Change Fail Rate.
  • Counting the wrong thing. Merges to long-lived feature branches do not count. Only merges to the trunk or main integration branch reflect true CI practice.
  • Ignoring quality. If defect rates rise as integration frequency increases, the team is skipping quality steps. Use defect rate as a guardrail metric.

Connection to CD

Integration Frequency is the foundational metric for Continuous Delivery. Without frequent integration, every downstream metric suffers:

  • Smaller batches reduce risk. Each integration carries less change, making failures easier to diagnose and fix.
  • Faster feedback loops. Frequent integration means the CI pipeline runs more often, catching issues within minutes instead of days.
  • Enables trunk-based development. High integration frequency is incompatible with long-lived branches. Teams naturally move toward short-lived branches or direct trunk commits.
  • Reduces merge conflicts. The longer code stays on a branch, the more likely it diverges from trunk. Frequent integration keeps the delta small.
  • Prerequisite for deployment frequency. You cannot deploy more often than you integrate. Improving this metric directly unblocks improvements to Release Frequency.

To improve Integration Frequency:

  • Decompose stories into smaller increments using Behavior-Driven Development.
  • Use Test-Driven Development to produce modular, independently testable code.
  • Adopt feature flags or branch by abstraction to decouple integration from release.
  • Practice Trunk-Based Development with short-lived branches lasting less than one day.

This content is adapted from the Dojo Consortium, licensed under CC BY 4.0.

2 - Build Duration

Time from code commit to a deployable artifact – a critical constraint on feedback speed and mean time to repair.

Adapted from Dojo Consortium

Definition

Build Duration measures the elapsed time from when a developer pushes a commit until the CI pipeline produces a deployable artifact and all automated quality gates have passed. This includes compilation, unit tests, integration tests, static analysis, security scans, and artifact packaging.

Build Duration represents the minimum possible time between deciding to make a change and having that change ready for production. It sets a hard floor on Lead Time and directly constrains how quickly a team can respond to production incidents.

buildDuration = artifactReadyTimestamp - commitPushTimestamp

This metric is sometimes referred to as “pipeline cycle time” or “CI cycle time.” The book Accelerate references it as part of “hard lead time.”

How to Measure

  1. Record the commit timestamp. Capture when the commit arrives at the CI server (webhook receipt or pipeline trigger time).
  2. Record the artifact-ready timestamp. Capture when the final pipeline stage completes successfully and the deployable artifact is published.
  3. Calculate the difference. Subtract the commit timestamp from the artifact-ready timestamp.
  4. Track the median and p95. The median shows typical performance. The 95th percentile reveals worst-case builds that block developers.

Most CI platforms expose build duration natively:

  • GitHub Actions – createdAt and updatedAt on workflow runs.
  • GitLab CI – pipeline created_at and finished_at.
  • Jenkins – build start time and duration fields.
  • CircleCI – workflow duration in the Insights dashboard.

Set up alerts when builds exceed your target threshold so the team can investigate regressions immediately.
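
A minimal sketch of the calculation, assuming the push and artifact-ready timestamps have already been exported from the CI platform:

```python
from datetime import datetime, timedelta
from statistics import median, quantiles

def build_duration_stats(builds):
    """builds: (push timestamp, artifact-ready timestamp) pairs.
    Returns (median, p95) wall-clock build duration as timedeltas."""
    durations = [(ready - pushed).total_seconds() for pushed, ready in builds]
    p95 = quantiles(durations, n=20)[-1]  # last cut point = 95th percentile
    return timedelta(seconds=median(durations)), timedelta(seconds=p95)

# Example with two builds (hypothetical timestamps).
builds = [
    (datetime(2024, 3, 4, 9, 0), datetime(2024, 3, 4, 9, 7)),
    (datetime(2024, 3, 4, 11, 0), datetime(2024, 3, 4, 11, 16)),
]
print(build_duration_stats(builds))
```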

Targets

Level  | Build Duration
------ | ---------------------
Low    | More than 30 minutes
Medium | 10 – 30 minutes
High   | 5 – 10 minutes
Elite  | Less than 5 minutes

The ten-minute threshold is a widely recognized guideline. Builds longer than ten minutes break developer flow, discourage frequent integration, and increase the cost of fixing failures.

Common Pitfalls

  • Removing tests to hit targets. Reducing test count or skipping test types (integration, security) lowers build duration but degrades quality. Always pair this metric with Change Fail Rate and defect rate.
  • Ignoring queue time. If builds wait in a queue before execution, the developer experiences the queue time as part of the feedback delay even though it is not technically “build” time. Measure wall-clock time from commit to result.
  • Optimizing the wrong stage. Profile the pipeline before optimizing. Often a single slow test suite or a sequential step that could run in parallel dominates the total duration.
  • Flaky tests. Tests that intermittently fail cause retries, effectively doubling or tripling build duration. Track flake rate alongside build duration.

Connection to CD

Build Duration is a critical bottleneck in the Continuous Delivery pipeline:

  • Constrains Mean Time to Repair. When production is down, the build pipeline sets the minimum time to get a fix deployed. A 30-minute build means at least 30 minutes of downtime for any fix, no matter how small. Reducing build duration directly improves MTTR.
  • Enables frequent integration. Developers are unlikely to integrate multiple times per day if each integration takes 30 minutes to validate. Short builds encourage higher Integration Frequency.
  • Shortens feedback loops. The sooner a developer learns that a change broke something, the less context they have lost and the cheaper the fix. Builds under ten minutes keep developers in flow.
  • Supports continuous deployment. Automated deployment pipelines cannot deliver changes rapidly if the build stage is slow. Build duration is often the largest component of Lead Time.

To improve Build Duration:

  • Parallelize stages. Run unit tests, linting, and security scans concurrently rather than sequentially.
  • Replace slow end-to-end tests. Move heavyweight end-to-end tests to an asynchronous post-deploy verification stage. Use contract tests and service virtualization in the main pipeline.
  • Decompose large services. Smaller codebases compile and test faster. If build duration is stubbornly high, consider breaking the service into smaller domains.
  • Cache aggressively. Cache dependencies, Docker layers, and compilation artifacts between builds.
  • Set a build time budget. Alert the team whenever a new test or step pushes the build past your target, so test efficiency is continuously maintained.

This content is adapted from the Dojo Consortium, licensed under CC BY 4.0.

3 - Development Cycle Time

Average time from when work starts until it is running in production – a key flow metric for identifying delivery bottlenecks.

Adapted from Dojo Consortium

Definition

Development Cycle Time measures the elapsed time from when a developer begins work on a story or task until that work is deployed to production and available to users. It captures the full construction phase of delivery: coding, code review, testing, integration, and deployment.

developmentCycleTime = productionDeployTimestamp - workStartedTimestamp

This is distinct from Lead Time, which includes the time a request spends waiting in the backlog before work begins. Development Cycle Time focuses exclusively on the active delivery phase.

The Accelerate research uses “lead time for changes” (measured from commit to production) as a key DORA metric. Development Cycle Time extends this slightly further back to when work starts, capturing the full development process including any time between starting work and the first commit.

How to Measure

  1. Record when work starts. Capture the timestamp when a story moves to “In Progress” in your issue tracker, or when the first commit for the story appears.
  2. Record when work reaches production. Capture the timestamp of the production deployment that includes the completed story.
  3. Calculate the difference. Subtract the start time from the production deploy time.
  4. Report the median and distribution. The median provides a typical value. The distribution (or a control chart) reveals variability and outliers that indicate process problems.

Sources for this data include:

  • Issue trackers (Jira, GitHub Issues, Azure Boards) – status transition timestamps.
  • Source control – first commit timestamp associated with a story.
  • Deployment logs – timestamp of production deployments linked to stories.

Linking stories to deployments is essential. Use commit message conventions (e.g., story IDs in commit messages) or deployment metadata to create this connection.
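
A minimal sketch of that linkage, assuming Jira-style story keys (e.g., ABC-123) in commit messages; the data shapes for commits, deployments, and "In Progress" transitions are illustrative, not a specific tool's API:

```python
import re
from statistics import median

STORY_KEY = re.compile(r"\b[A-Z]+-\d+\b")  # e.g. matches ABC-123

def median_cycle_time_days(commits, deploys, started):
    """commits: (sha, message) pairs; deploys: {sha: production deploy datetime};
    started: {story key: datetime the story moved to "In Progress"}."""
    finished = {}
    for sha, message in commits:
        if sha not in deploys:
            continue  # commit not yet in production
        for key in STORY_KEY.findall(message):
            # A story is done when its latest commit reaches production.
            finished[key] = max(deploys[sha], finished.get(key, deploys[sha]))
    days = [(finished[key] - started[key]).total_seconds() / 86400
            for key in finished if key in started]
    return median(days)
```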

Targets

Level  | Development Cycle Time
------ | -----------------------
Low    | More than 2 weeks
Medium | 1 – 2 weeks
High   | 2 – 7 days
Elite  | Less than 2 days

Elite teams deliver completed work to production within one to two days of starting it. This is achievable only when work is decomposed into small increments, the pipeline is fast, and deployment is automated.

Common Pitfalls

  • Marking work “Done” before it reaches production. If “Done” means “code complete” rather than “deployed,” the metric understates actual cycle time. The Definition of Done must include production deployment.
  • Skipping the backlog. Moving items from “Backlog” directly to “Done” after deploying hides the true wait time and development duration. Ensure stories pass through the standard workflow stages.
  • Splitting work into functional tasks. Breaking a story into separate “development,” “testing,” and “deployment” tasks obscures the end-to-end cycle time. Measure at the story or feature level.
  • Ignoring variability. A low average can hide a bimodal distribution where some stories take hours and others take weeks. Use a control chart or histogram to expose the full picture.
  • Optimizing for speed without quality. If cycle time drops but Change Fail Rate rises, the team is cutting corners. Use quality metrics as guardrails.

Connection to CD

Development Cycle Time is the most comprehensive measure of delivery flow and sits at the heart of Continuous Delivery:

  • Exposes bottlenecks. A long cycle time reveals where work gets stuck – waiting for code review, queued for testing, blocked by a manual approval, or delayed by a slow pipeline. Each bottleneck is a target for improvement.
  • Drives smaller batches. The only way to achieve a cycle time under two days is to decompose work into very small increments. This naturally leads to smaller changes, less risk, and faster feedback.
  • Reduces waste from changing priorities. Long cycle times mean work in progress is exposed to priority changes, context switches, and scope creep. Shorter cycles reduce the window of vulnerability.
  • Improves feedback quality. The sooner a change reaches production, the sooner the team gets real user feedback. Short cycle times enable rapid learning and course correction.
  • Subsumes other metrics. Cycle time is affected by Integration Frequency, Build Duration, and Work in Progress. Improving any of these upstream metrics will reduce cycle time.

To improve Development Cycle Time:

  • Decompose work into stories that can be completed and deployed within one to two days.
  • Remove handoffs between teams (e.g., separate dev and QA teams).
  • Automate the build and deploy pipeline to eliminate manual steps.
  • Improve test design so the pipeline runs faster without sacrificing coverage.
  • Limit Work in Progress so the team focuses on finishing work rather than starting new items.

This content is adapted from the Dojo Consortium, licensed under CC BY 4.0.

4 - Lead Time

Total time from when a change is committed until it is running in production – a DORA key metric for delivery throughput.

Adapted from Dojo Consortium

Definition

Lead Time measures the total elapsed time from when a code change is committed to the version control system until that change is successfully running in production. This is one of the four key metrics identified by the DORA (DevOps Research and Assessment) team as a predictor of software delivery performance.

leadTime = productionDeployTimestamp - commitTimestamp

In the broader value stream, “lead time” can also refer to the time from a customer request to delivery. The DORA definition focuses specifically on the segment from commit to production, which the Accelerate research calls “lead time for changes.” This narrower definition captures the efficiency of your delivery pipeline and deployment process.

Lead Time includes Build Duration plus any additional time for deployment, approval gates, environment provisioning, and post-deploy verification. It is a superset of build time and a subset of Development Cycle Time, which also includes the coding phase before the first commit.

How to Measure

  1. Record the commit timestamp. Use the timestamp of the commit as recorded in source control (not the local author timestamp, but the time it was pushed or merged to the trunk).
  2. Record the production deployment timestamp. Capture when the deployment containing that commit completes successfully in production.
  3. Calculate the difference. Subtract the commit time from the deploy time.
  4. Aggregate across commits. Report the median lead time across all commits deployed in a given period (daily, weekly, or per release).

Data sources:

  • Source control – commit or merge timestamps from Git, GitHub, GitLab, etc.
  • CI/CD platform – pipeline completion times from Jenkins, GitHub Actions, GitLab CI, etc.
  • Deployment tooling – production deployment timestamps from Argo CD, Spinnaker, Flux, or custom scripts.

For teams practicing continuous deployment, lead time may be nearly identical to build duration. For teams with manual approval gates or scheduled release windows, lead time will be significantly longer.
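
A minimal sketch of the aggregation, assuming commit-and-deploy timestamp pairs have already been pulled from the sources above:

```python
from collections import defaultdict
from statistics import median

def weekly_median_lead_time_hours(changes):
    """changes: (merged-to-trunk datetime, production deploy datetime) pairs.
    Returns the median lead time in hours, grouped by ISO week of the deploy."""
    by_week = defaultdict(list)
    for committed, deployed in changes:
        hours = (deployed - committed).total_seconds() / 3600
        by_week[deployed.isocalendar()[:2]].append(hours)
    return {week: round(median(values), 1)
            for week, values in sorted(by_week.items())}
```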

Targets

Level  | Lead Time for Changes
------ | ----------------------
Low    | More than 6 months
Medium | 1 – 6 months
High   | 1 day – 1 week
Elite  | Less than 1 hour

These levels are drawn from the DORA State of DevOps research. Elite performers deliver changes to production in under an hour from commit, enabled by fully automated pipelines and continuous deployment.

Common Pitfalls

  • Measuring only build time. Lead time includes everything after the commit, not just the CI pipeline. Manual approval gates, scheduled deployment windows, and environment provisioning delays must all be included.
  • Ignoring waiting time. A change may sit in a queue waiting for a release train, a change advisory board (CAB) review, or a deployment window. This wait time is part of lead time and often dominates the total.
  • Tracking requests instead of commits. Some teams measure from customer request to delivery. While valuable, this conflates backlog prioritization with delivery efficiency. Keep this metric focused on the commit-to-production segment.
  • Hiding items from the backlog. Requests tracked in spreadsheets or side channels before entering the backlog distort lead time measurements. Ensure all work enters the system of record promptly.
  • Reducing quality to reduce lead time. Shortening approval processes or skipping test stages reduces lead time at the cost of quality. Pair this metric with Change Fail Rate as a guardrail.

Connection to CD

Lead Time is one of the four DORA metrics and a direct measure of your delivery pipeline’s end-to-end efficiency:

  • Reveals pipeline bottlenecks. A large gap between build duration and lead time points to manual processes, approval queues, or deployment delays that the team can target for automation.
  • Measures the cost of failure recovery. When production breaks, lead time is the minimum time to deliver a fix (unless you roll back). This makes lead time a direct input to Mean Time to Repair.
  • Drives automation. The primary way to reduce lead time is to automate every step between commit and production: build, test, security scanning, environment provisioning, deployment, and verification.
  • Reflects deployment strategy. Teams using continuous deployment have lead times measured in minutes. Teams using weekly release trains have lead times measured in days. The metric makes the cost of batching visible.
  • Connects speed and stability. The DORA research shows that elite performers achieve both low lead time and low Change Fail Rate. Speed and quality are not trade-offs – they reinforce each other when the delivery system is well-designed.

To improve Lead Time:

  • Automate the deployment pipeline end to end, eliminating manual gates.
  • Replace change advisory board (CAB) reviews with automated policy checks and peer review.
  • Deploy on every successful build rather than batching changes into release trains.
  • Reduce Build Duration to shrink the largest component of lead time.
  • Monitor and eliminate environment provisioning delays.

This content is adapted from the Dojo Consortium, licensed under CC BY 4.0.

5 - Change Fail Rate

Percentage of production deployments that cause a failure or require remediation – a DORA key metric for delivery stability.

Adapted from Dojo Consortium

Definition

Change Fail Rate measures the percentage of deployments to production that result in degraded service, negative customer impact, or require immediate remediation such as a rollback, hotfix, or patch.

changeFailRate = failedChangeCount / totalChangeCount * 100

A “failed change” includes any deployment that:

  • Is rolled back.
  • Requires a hotfix deployed within a short window (commonly 24 hours).
  • Triggers a production incident attributed to the change.
  • Requires manual intervention to restore service.

This is one of the four DORA key metrics. It measures the stability side of delivery performance, complementing the throughput metrics of Lead Time and Release Frequency.

How to Measure

  1. Count total production deployments over a defined period (weekly, monthly).
  2. Count deployments classified as failures using the criteria above.
  3. Divide failures by total deployments and express as a percentage.

Data sources:

  • Deployment logs – total deployment count from your CD platform.
  • Incident management – incidents linked to specific deployments (PagerDuty, Opsgenie, ServiceNow).
  • Rollback records – deployments that were reverted, either manually or by automated rollback.
  • Hotfix tracking – deployments tagged as hotfixes or emergency changes.

Automate the classification where possible. For example, if a deployment is followed by another deployment of the same service within a defined window (e.g., one hour), flag the original as a potential failure for review.
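
A minimal sketch of that heuristic plus the rate calculation, assuming a deployment log of (service, timestamp) records; flagged deployments still need human confirmation:

```python
from datetime import timedelta

def flag_potential_failures(deploys, window=timedelta(hours=1)):
    """deploys: (service, deploy datetime) pairs from the CD platform's log.
    Flags any deploy that is followed by another deploy of the same service
    within `window`, as a candidate rollback or hotfix."""
    by_service = {}
    for service, ts in sorted(deploys, key=lambda d: d[1]):
        by_service.setdefault(service, []).append(ts)
    flagged = []
    for service, times in by_service.items():
        for earlier, later in zip(times, times[1:]):
            if later - earlier <= window:
                flagged.append((service, earlier))
    return flagged

def change_fail_rate(failed_change_count, total_change_count):
    return failed_change_count / total_change_count * 100
```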

Targets

Level  | Change Fail Rate
------ | -----------------
Low    | 46 – 60%
Medium | 16 – 45%
High   | 0 – 15%
Elite  | 0 – 5%

These levels are drawn from the DORA State of DevOps research. Elite performers maintain a change fail rate below 5%, meaning fewer than 1 in 20 deployments causes a problem.

Common Pitfalls

  • Not recording failures. Deploying fixes without logging the original failure understates the true rate. Ensure every incident and rollback is tracked.
  • Reclassifying defects. Creating review processes that reclassify production defects as “feature requests” or “known limitations” hides real failures.
  • Inflating deployment count. Re-deploying the same working version to increase the denominator artificially lowers the rate. Only count deployments that contain new changes.
  • Pursuing zero defects at the cost of speed. An obsessive focus on eliminating all failures can slow Release Frequency to a crawl. A small failure rate with fast recovery is preferable to near-zero failures with monthly deployments.
  • Ignoring near-misses. Changes that cause degraded performance but do not trigger a full incident are still failures. Define clear criteria for what constitutes a failed change and apply them consistently.

Connection to CD

Change Fail Rate is the primary quality signal in a Continuous Delivery pipeline:

  • Validates pipeline quality gates. A rising change fail rate indicates that the automated tests, security scans, and quality checks in the pipeline are not catching enough defects. Each failure is an opportunity to add or improve a quality gate.
  • Enables confidence in frequent releases. Teams will only deploy frequently if they trust the pipeline. A low change fail rate builds this trust and supports higher Release Frequency.
  • Smaller changes fail less. The DORA research consistently shows that smaller, more frequent deployments have lower failure rates than large, infrequent releases. Improving Integration Frequency naturally improves this metric.
  • Drives root cause analysis. Each failed change should trigger a blameless investigation: what automated check could have caught this? The answers feed directly into pipeline improvements.
  • Balances throughput metrics. Change Fail Rate is the essential guardrail for Lead Time and Release Frequency. If those metrics improve while change fail rate worsens, the team is trading quality for speed.

To improve Change Fail Rate:

  • Deploy smaller changes more frequently to reduce the blast radius of failures.
  • Identify the root cause of each failure and add automated checks to prevent recurrence.
  • Strengthen the test suite, particularly integration and contract tests that validate interactions between services.
  • Implement progressive delivery (canary releases, feature flags) to limit the impact of defective changes before they reach all users.
  • Conduct blameless post-incident reviews and feed learnings back into the delivery pipeline.

This content is adapted from the Dojo Consortium, licensed under CC BY 4.0.

6 - Mean Time to Repair

Average time from when a production incident is detected until service is restored – a DORA key metric for recovery capability.

Adapted from Dojo Consortium

Definition

Mean Time to Repair (MTTR) measures the average elapsed time between when a production incident is detected and when it is fully resolved and service is restored to normal operation.

mttr = sum(resolvedTimestamp - detectedTimestamp) / incidentCount

MTTR reflects an organization’s ability to recover from failure. It encompasses detection, diagnosis, fix development, build, deployment, and verification. A short MTTR depends on the entire delivery system working well – fast builds, automated deployments, good observability, and practiced incident response.

The Accelerate research identifies MTTR as one of the four key DORA metrics and notes that “software delivery performance is a combination of lead time, release frequency, and MTTR.” It is the stability counterpart to the throughput metrics.

How to Measure

  1. Record the detection timestamp. This is when the team first becomes aware of the incident – typically when an alert fires, a customer reports an issue, or monitoring detects an anomaly.
  2. Record the resolution timestamp. This is when the incident is resolved and service is confirmed to be operating normally. Resolution means the customer impact has ended, not merely that a fix has been deployed.
  3. Calculate the duration for each incident.
  4. Compute the average across all incidents in a given period.

Data sources:

  • Incident management platforms – PagerDuty, Opsgenie, ServiceNow, or Statuspage provide incident lifecycle timestamps.
  • Monitoring and alerting – alert trigger times from Datadog, Prometheus Alertmanager, CloudWatch, or equivalent.
  • Deployment logs – timestamps of rollbacks or hotfix deployments.

Report both the mean and the median. The mean can be skewed by a single long outage, so the median gives a better sense of typical recovery time. Also track the maximum MTTR per period to highlight worst-case incidents.
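
A minimal sketch of the calculation, assuming (detected, resolved) timestamp pairs exported from the incident management platform:

```python
from datetime import timedelta
from statistics import mean, median

def mttr_report(incidents):
    """incidents: (detected datetime, resolved datetime) pairs."""
    seconds = [(resolved - detected).total_seconds()
               for detected, resolved in incidents]
    return {
        "mean": timedelta(seconds=mean(seconds)),
        "median": timedelta(seconds=median(seconds)),
        "max": timedelta(seconds=max(seconds)),
    }
```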

Targets

Level  | Mean Time to Repair
------ | --------------------
Low    | More than 1 week
Medium | 1 day – 1 week
High   | Less than 1 day
Elite  | Less than 1 hour

Elite performers restore service in under one hour. This requires automated rollback or roll-forward capability, fast build pipelines, and well-practiced incident response processes.

Common Pitfalls

  • Closing incidents prematurely. Marking an incident as resolved before the customer impact has actually ended artificially deflates MTTR. Define “resolved” clearly and verify that service is truly restored.
  • Not counting detection time. If the team discovers a problem informally (e.g., a developer notices something odd) and fixes it before opening an incident, the time is not captured. Encourage consistent incident reporting.
  • Ignoring recurring incidents. If the same issue keeps reappearing, each individual MTTR may be short, but the cumulative impact is high. Track recurrence as a separate quality signal.
  • Conflating MTTR with MTTD. Mean Time to Detect (MTTD) and Mean Time to Repair overlap but are distinct. If you only measure from alert to resolution, you miss the detection gap – the time between when the problem starts and when it is detected. Both matter.
  • Optimizing MTTR without addressing root causes. Getting faster at fixing recurring problems is good, but preventing those problems in the first place is better. Pair MTTR with Change Fail Rate to ensure the number of incidents is also decreasing.

Connection to CD

MTTR is a direct measure of how well the entire Continuous Delivery system supports recovery:

  • Pipeline speed is the floor. The minimum possible MTTR for a roll-forward fix is the Build Duration plus deployment time. A 30-minute build means you cannot restore service via a code fix in less than 30 minutes. Reducing build duration directly reduces MTTR.
  • Automated deployment enables fast recovery. Teams that can deploy with one click or automatically can roll back or roll forward in minutes. Manual deployment processes add significant time to every incident.
  • Feature flags accelerate mitigation. If a failing change is behind a feature flag, the team can disable it in seconds without deploying new code. This can reduce MTTR from minutes to seconds for flag-protected changes.
  • Observability shortens detection and diagnosis. Good logging, metrics, and tracing help the team identify the cause of an incident quickly. Without observability, diagnosis dominates the repair timeline.
  • Practice improves performance. Teams that deploy frequently have more experience responding to issues. High Release Frequency correlates with lower MTTR because the team has well-rehearsed recovery procedures.
  • Trunk-based development simplifies rollback. When trunk is always deployable, the team can roll back to the previous commit. Long-lived branches and complex merge histories make rollback risky and slow.

To improve MTTR:

  • Keep the pipeline always deployable so a fix can be deployed at any time.
  • Reduce Build Duration to enable faster roll-forward.
  • Implement feature flags for large changes so they can be disabled without redeployment.
  • Invest in observability – structured logging, distributed tracing, and meaningful alerting.
  • Practice incident response regularly, including deploying rollbacks and hotfixes.
  • Conduct blameless post-incident reviews and feed learnings back into the pipeline and monitoring.

This content is adapted from the Dojo Consortium, licensed under CC BY 4.0.

7 - Release Frequency

How often changes are deployed to production – a DORA key metric for delivery throughput and team capability.

Adapted from Dojo Consortium

Definition

Release Frequency (also called Deployment Frequency) measures how often a team successfully deploys changes to production. It is expressed as deployments per day, per week, or per month, depending on the team’s current cadence.

releaseFrequency = productionDeployments / timePeriod

This is one of the four DORA key metrics. It measures the throughput side of delivery performance – how rapidly the team can get completed work into the hands of users. Higher release frequency enables faster feedback, smaller batch sizes, and reduced deployment risk.

Each deployment should deliver a meaningful change. Re-deploying the same artifact or deploying empty changes does not count.

How to Measure

  1. Count production deployments. Record each successful deployment to the production environment over a defined period.
  2. Exclude non-changes. Do not count re-deployments of unchanged artifacts, infrastructure-only changes (unless relevant), or deployments to non-production environments.
  3. Calculate frequency. Divide the count by the time period. Express as deployments per day (for high performers) or per week/month (for teams earlier in their journey).

Data sources:

  • CD platforms – Argo CD, Spinnaker, Flux, Octopus Deploy, or similar tools track every deployment.
  • CI/CD pipeline logs – GitHub Actions, GitLab CI, Jenkins, and CircleCI record deployment job executions.
  • Cloud provider logs – AWS CodeDeploy, Azure DevOps, GCP Cloud Deploy, and Kubernetes audit logs.
  • Custom deployment scripts – Add a logging line that records the timestamp, service name, and version to a central log or metrics system.
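
Putting those steps together, a minimal sketch that counts deployments per ISO week while excluding re-deployments of an unchanged artifact, assuming (service, version, timestamp) records from the deployment log:

```python
from collections import defaultdict

def weekly_release_frequency(deploys):
    """deploys: (service, version, production deploy datetime) records.
    Each (service, version) pair is counted once, so re-deploying the same
    artifact does not inflate the number."""
    seen = set()
    per_week = defaultdict(int)
    for service, version, ts in sorted(deploys, key=lambda d: d[2]):
        if (service, version) in seen:
            continue
        seen.add((service, version))
        per_week[ts.isocalendar()[:2]] += 1
    return dict(sorted(per_week.items()))
```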

Targets

Level  | Release Frequency
------ | ------------------------------------
Low    | Less than once per 6 months
Medium | Once per month to once per 6 months
High   | Once per week to once per month
Elite  | Multiple times per day

These levels are drawn from the DORA State of DevOps research. Elite performers deploy on demand, multiple times per day, with each deployment containing a small set of changes.

Common Pitfalls

  • Counting empty deployments. Re-deploying the same artifact or building artifacts that contain no changes inflates the metric without delivering value. Count only deployments with meaningful changes.
  • Ignoring failed deployments. If you count deployments that are immediately rolled back, the frequency looks good but the quality is poor. Pair with Change Fail Rate to get the full picture.
  • Equating frequency with value. Deploying frequently is a means, not an end. Deploying 10 times a day delivers no value if the changes do not meet user needs. Release Frequency measures capability, not outcome.
  • Batch releasing to hit a target. Combining multiple changes into a single release to deploy “more often” defeats the purpose. The goal is small, individual changes flowing through the pipeline independently.
  • Focusing on speed without quality. If release frequency increases but Change Fail Rate also increases, the team is releasing faster than its quality processes can support. Slow down and improve the pipeline.

Connection to CD

Release Frequency is the ultimate output metric of a Continuous Delivery pipeline:

  • Validates the entire delivery system. High release frequency is only possible when the pipeline is fast, tests are reliable, deployment is automated, and the team has confidence in the process. It is the end-to-end proof that CD is working.
  • Reduces deployment risk. Each deployment carries less change when deployments are frequent. Less change means less risk, easier rollback, and simpler debugging when something goes wrong.
  • Enables rapid feedback. Frequent releases get features and fixes in front of users sooner. This shortens the feedback loop and allows the team to course-correct before investing heavily in the wrong direction.
  • Exercises recovery capability. Teams that deploy frequently practice the deployment process daily. When a production incident occurs, the deployment process is well-rehearsed and reliable, directly improving Mean Time to Repair.
  • Decouples deploy from release. At high frequency, teams separate the act of deploying code from the act of enabling features for users. Feature flags, progressive delivery, and dark launches become standard practice.

To improve Release Frequency:

  • Reduce Development Cycle Time by decomposing work into smaller increments.
  • Remove manual handoffs to other teams (e.g., ops, QA, change management).
  • Automate every step of the deployment process, from build through production verification.
  • Replace manual change approval boards with automated policy checks and peer review.
  • Convert hard dependencies on other teams or services into soft dependencies using feature flags and service virtualization.
  • Adopt Trunk-Based Development so that trunk is always in a deployable state.

This content is adapted from the Dojo Consortium, licensed under CC BY 4.0.

8 - Work in Progress

Number of work items started but not yet completed – a leading indicator of flow problems, context switching, and delivery delays.

Adapted from Dojo Consortium

Definition

Work in Progress (WIP) is the total count of work items that have been started but not yet completed and delivered to production. This includes all types of work: stories, defects, tasks, spikes, and any other items that a team member has begun but not finished.

wip = countOf(items where status is between "started" and "done")

WIP is a leading indicator from Lean manufacturing. Unlike trailing metrics such as Development Cycle Time or Lead Time, WIP tells you about problems that are happening right now. High WIP predicts future delivery delays, increased cycle time, and lower quality.

Little’s Law provides the mathematical relationship:

cycleTime = wip / throughput

If throughput (the rate at which items are completed) stays constant, increasing WIP directly increases cycle time. The only way to reduce cycle time without working faster is to reduce WIP.
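
For example, a team completing five items per week (a throughput of one item per working day) with ten items in progress should expect a cycle time of roughly 10 ÷ 1 = 10 working days; cutting WIP to five halves the expected cycle time without the team working any faster.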

How to Measure

  1. Count all in-progress items. At a regular cadence (daily or at each standup), count the number of items in any active state on your team’s board. Include everything between “To Do” and “Done.”
  2. Normalize by team size. Divide WIP by the number of team members to get a per-person ratio. This makes the metric comparable across teams of different sizes.
  3. Track over time. Record the WIP count daily and observe trends. A rising WIP count is an early warning of delivery problems.

Data sources:

  • Kanban boards – Jira, Azure Boards, Trello, GitHub Projects, or physical boards. Count cards in any column between the backlog and done.
  • Issue trackers – Query for items with an “In Progress,” “In Review,” “In QA,” or equivalent active status.
  • Manual count – At standup, ask: “How many things are we actively working on right now?”

The simplest and most effective approach is to make WIP visible by keeping the team board up to date and counting active items daily.
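
A minimal sketch of the daily count, assuming item statuses can be pulled from the board; the ACTIVE_STATUSES set below is an example, not a standard:

```python
ACTIVE_STATUSES = {"In Progress", "In Review", "In QA"}  # match your board's columns

def wip_per_person(items, team_size):
    """items: (item id, status) pairs from the board or issue tracker."""
    active = sum(1 for _, status in items if status in ACTIVE_STATUSES)
    return active / team_size

# Example: a five-person team with seven active items (hypothetical data).
items = [("S-1", "In Progress"), ("S-2", "In Review"), ("S-3", "In QA"),
         ("S-4", "In Progress"), ("S-5", "In Progress"), ("S-6", "In Review"),
         ("S-7", "In QA"), ("S-8", "Done"), ("S-9", "To Do")]
print(wip_per_person(items, team_size=5))  # 1.4
```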

Targets

Level  | WIP per Team
------ | ------------------------------------
Low    | More than 2x team size
Medium | Between 1x and 2x team size
High   | Equal to team size
Elite  | Less than team size (ideally half)

The guiding principle is that WIP should never exceed team size. A team of five should have at most five items in progress at any time. Elite teams often work in pairs, bringing WIP to roughly half the team size.

Common Pitfalls

  • Hiding work. Not moving items to “In Progress” when working on them keeps WIP artificially low. The board must reflect reality. If someone is working on it, it should be visible.
  • Marking items done prematurely. Moving items to “Done” before they are deployed to production understates WIP. The Definition of Done must include production deployment.
  • Creating micro-tasks. Splitting a single story into many small tasks (development, testing, code review, deployment) and tracking each separately inflates the item count without changing the actual work. Measure WIP at the story or feature level.
  • Ignoring unplanned work. Production support, urgent requests, and interruptions consume capacity but are often not tracked on the board. If the team is spending time on it, it is WIP and should be visible.
  • Setting WIP limits but not enforcing them. WIP limits only work if the team actually stops starting new work when the limit is reached. Treat WIP limits as a hard constraint, not a suggestion.

Connection to CD

WIP is the most actionable flow metric and directly impacts every aspect of Continuous Delivery:

  • Predicts cycle time. Per Little’s Law, WIP and cycle time are directly proportional. Reducing WIP is the fastest way to reduce Development Cycle Time without changing anything else about the delivery process.
  • Reduces context switching. When developers juggle multiple items, they lose time switching between contexts. Research consistently shows that each additional item in progress reduces effective productivity. Low WIP means more focus and faster completion.
  • Exposes blockers. When WIP limits are in place and an item gets blocked, the team cannot simply start something new. They must resolve the blocker first. This forces the team to address systemic problems rather than working around them.
  • Enables continuous flow. CD depends on a steady flow of small changes moving through the pipeline. High WIP creates irregular, bursty delivery. Low WIP creates smooth, predictable flow.
  • Improves quality. When teams focus on fewer items, each item gets more attention. Code reviews happen faster, testing is more thorough, and defects are caught sooner. This naturally reduces Change Fail Rate.
  • Supports trunk-based development. High WIP often correlates with many long-lived branches. Reducing WIP encourages developers to complete and integrate work before starting something new, which aligns with Integration Frequency goals.

To reduce WIP:

  • Set explicit WIP limits for the team and enforce them. Start with a limit equal to team size and reduce it over time.
  • Prioritize finishing work over starting new work. At standup, ask “What can I help finish?” before “What should I start?”
  • Prioritize code review and pairing to unblock teammates over picking up new items.
  • Make the board visible and accurate. Use it as the single source of truth for what the team is working on.
  • Identify and address recurring blockers that cause items to stall in progress.

This content is adapted from the Dojo Consortium, licensed under CC BY 4.0.