Baseline Metrics
Phase 0 - Assess | Scope: Team
You cannot improve what you have not measured. Before making any changes to your delivery process, capture two types of baseline measurements: CI health metrics and DORA outcome metrics.
- CI health metrics are leading indicators. They reflect current team behaviors and move immediately when those behaviors change. Use them to drive improvement experiments throughout the migration.
- DORA metrics are lagging outcome metrics. They reflect the cumulative effect of many upstream behaviors and move slowly. Capture them now as your honest “before” picture for reporting progress to leadership.
Without baselines, you cannot prove improvement or detect regression, and you will default to fixing what is visible rather than the actual constraint.
CI Health Metrics
These three metrics tell you whether your team’s integration practices are healthy. They surface problems immediately and are your primary signal during the migration.
Integration Frequency
What it measures: How often developers commit and integrate to trunk per day.
How to capture it: Count commits merged to trunk over the last 10 working days. Divide by the number of active developers and by 10.
| Frequency | What It Suggests |
|---|---|
| 2 or more per developer per day | Small batches, fast feedback |
| 1 per developer per day | Reasonable starting point |
| Less than 1 per developer per day | Long-lived branches or large work items |
Record your number: ______ average commits to trunk per developer per day.
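The arithmetic above can be sketched as a small helper. The numbers below are illustrative; in practice the commit count would come from your VCS (for example, `git rev-list --count` against trunk with a date filter):

```python
def integration_frequency(commits_to_trunk, active_developers, working_days=10):
    """Average commits merged to trunk per developer per working day."""
    return commits_to_trunk / active_developers / working_days

# Illustrative: 84 commits merged to trunk by 6 developers over 10 working days
print(integration_frequency(84, 6))  # 1.4 commits per developer per day
```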
Build Success Rate
What it measures: The percentage of CI builds that pass on the first attempt.
How to capture it: Pull the last 30 days of CI build history from your pipeline tool. Divide passing builds by total builds.
| Success Rate | What It Suggests |
|---|---|
| 90% or higher | Reliable pipeline; developers integrate with confidence |
| 70-89% | Flaky tests or inconsistent local validation before pushing |
| Below 70% | Broken build is normalized; integration discipline is low |
Record your number: ______ % of CI builds that pass on first attempt.
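A sketch of the calculation over exported build records. The `passed` and `attempt` fields are hypothetical; map them to whatever your pipeline tool actually exports:

```python
# Hypothetical export of CI build history for the last 30 days
builds = [
    {"passed": True,  "attempt": 1},
    {"passed": False, "attempt": 1},
    {"passed": True,  "attempt": 2},  # retry: not a first-attempt pass
    {"passed": True,  "attempt": 1},
]

# Only first attempts count toward the first-attempt success rate
first_attempts = [b for b in builds if b["attempt"] == 1]
success_rate = 100 * sum(b["passed"] for b in first_attempts) / len(first_attempts)
print(f"{success_rate:.0f}%")  # 67%
```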
Time to Fix a Broken Build
What it measures: The elapsed time from a build breaking on trunk to the next green build.
How to capture it: Identify build failures on trunk over the last 30 days. For each failure, record the time from first red build to next green build. Take the median.
| Time to Fix | What It Suggests |
|---|---|
| Less than 10 minutes | Team treats broken builds as stop-the-line |
| 10-60 minutes | Manual but fast response |
| More than 1 hour | Broken build is not treated as urgent |
Record your number: ______ median time to fix a broken build.
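Pairing each first red build with the next green build can be sketched as below. The build history is hypothetical; feed in chronological trunk build results from your pipeline tool:

```python
from datetime import datetime
from statistics import median

def fix_times(builds):
    """builds: chronological (timestamp, passed) tuples for trunk builds.
    Returns minutes from each first red build to the next green build."""
    times = []
    broke_at = None
    for ts, passed in builds:
        if not passed and broke_at is None:
            broke_at = ts  # trunk just broke
        elif passed and broke_at is not None:
            times.append((ts - broke_at).total_seconds() / 60)
            broke_at = None  # trunk is green again

    return times

# Hypothetical trunk build history
history = [
    (datetime(2024, 5, 1, 9, 0), True),
    (datetime(2024, 5, 1, 10, 0), False),   # breaks
    (datetime(2024, 5, 1, 10, 45), True),   # fixed after 45 minutes
    (datetime(2024, 5, 2, 14, 0), False),   # breaks again
    (datetime(2024, 5, 2, 14, 10), False),  # still red; clock keeps running
    (datetime(2024, 5, 2, 14, 20), True),   # fixed after 20 minutes
]
print(median(fix_times(history)))  # 32.5 minutes
```

Note that consecutive red builds count as one breakage: the clock runs from the first failure to the next green build.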
DORA Metrics
The DORA research program (now part of Google Cloud) identified four metrics that predict software delivery performance and organizational outcomes. These are lagging indicators - they confirm that improvement work is compounding into better delivery outcomes.
Do not use these as improvement targets. See DORA Metrics as Delivery Improvement Goals.
Deployment Frequency
What it measures: How often your team deploys to production.
How to capture it: Count the number of production deployments in the last 30 days. Check your pipeline system, deployment logs, or change management records.
| Frequency | What It Suggests |
|---|---|
| Multiple times per day | You may already be practicing continuous delivery |
| Once per week | Regular cadence but likely batch changes |
| Once per month or less | Large batches, high risk per deployment, likely manual process |
Record your number: ______ deployments in the last 30 days.
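Counting deployments in the window can be sketched as follows, using hypothetical timestamps and a fixed "as of" date for reproducibility:

```python
from datetime import datetime, timedelta

# Hypothetical deployment timestamps from pipeline or change records
deployments = [
    datetime(2024, 5, 3),
    datetime(2024, 5, 17),
    datetime(2024, 5, 28),
    datetime(2024, 4, 2),  # outside the 30-day window
]

as_of = datetime(2024, 5, 31)  # fixed "today" for the example
window_start = as_of - timedelta(days=30)
recent = sum(d >= window_start for d in deployments)
print(recent)  # 3 deployments in the last 30 days
```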
Lead Time for Changes
What it measures: The elapsed time from when code is committed to trunk to when it is running in production.
How to capture it: Pick your last 5-10 production deployments. For each one, find the merge timestamp of the oldest change included and subtract it from the deployment timestamp. Take the median.
| Lead Time | What It Suggests |
|---|---|
| Less than 1 hour | Fast flow, small batches, good automation |
| 1 day to 1 week | Reasonable with room for improvement |
| 1 week to 1 month | Significant queuing or manual gates |
| More than 1 month | Major constraints in testing, approval, or deployment |
Record your number: ______ median lead time for changes.
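The subtraction-and-median step can be sketched as below. The records are hypothetical; each pairs a deployment timestamp with the merge timestamp of the oldest change it contained:

```python
from datetime import datetime
from statistics import median

# Hypothetical records for recent production deployments
deployments = [
    {"deployed": datetime(2024, 5, 6, 16, 0), "oldest_merge": datetime(2024, 5, 6, 10, 0)},
    {"deployed": datetime(2024, 5, 8, 15, 0), "oldest_merge": datetime(2024, 5, 7, 15, 0)},
    {"deployed": datetime(2024, 5, 10, 12, 0), "oldest_merge": datetime(2024, 5, 8, 12, 0)},
]

# Hours from oldest included merge to deployment, per deployment
lead_hours = [
    (d["deployed"] - d["oldest_merge"]).total_seconds() / 3600 for d in deployments
]
print(median(lead_hours))  # 24.0 hours
```

Using the oldest change in each deployment captures the worst case: the longest any merged change waited to reach production.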
Change Failure Rate
What it measures: The percentage of deployments to production that result in a degraded service requiring remediation (rollback, hotfix, or patch).
How to capture it: Look at your last 20-30 production deployments. Count how many caused an incident, required a rollback, or needed an immediate hotfix. Divide by total deployments.
| Failure Rate | What It Suggests |
|---|---|
| 0-15% | Strong quality practices and small change sets |
| 16-30% | Typical for teams with some automation |
| Above 30% | Systemic quality problems |
Record your number: ______ % of deployments that required remediation.
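A minimal sketch of the calculation, classifying each hypothetical deployment by whether it needed remediation:

```python
# Hypothetical outcomes for the last 20 production deployments:
# "ok", or the remediation the deployment required
outcomes = ["ok"] * 16 + ["rollback", "hotfix", "ok", "rollback"]

failure_rate = 100 * sum(o != "ok" for o in outcomes) / len(outcomes)
print(f"{failure_rate:.0f}%")  # 15%
```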
Mean Time to Restore (MTTR)
What it measures: How long it takes to restore service after a production failure caused by a deployment.
How to capture it: Look at your production incidents from the last 3-6 months. For each incident caused by a deployment, record the time from detection to resolution. Take the median.
| MTTR | What It Suggests |
|---|---|
| Less than 1 hour | Good incident response, likely automated rollback |
| 1-4 hours | Manual but practiced recovery process |
| 4-24 hours | Significant manual intervention required |
| More than 1 day | Serious gaps in observability or rollback capability |
Record your number: ______ median time to restore service.
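The filter-then-median step can be sketched over a hypothetical incident log; the field names are illustrative, and the key detail is excluding incidents not caused by a deployment:

```python
from statistics import median

# Hypothetical incident log: minutes from detection to resolution,
# tagged with whether a deployment caused the incident
incidents = [
    {"caused_by_deploy": True,  "detect_to_resolve_min": 35},
    {"caused_by_deploy": False, "detect_to_resolve_min": 300},  # excluded
    {"caused_by_deploy": True,  "detect_to_resolve_min": 90},
    {"caused_by_deploy": True,  "detect_to_resolve_min": 50},
]

deploy_caused = [i["detect_to_resolve_min"] for i in incidents if i["caused_by_deploy"]]
print(median(deploy_caused))  # 50 minutes
```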
What Your Baselines Tell You
Your numbers point toward specific constraints:
| Signal | Likely Constraint | Where to Look |
|---|---|---|
| Low integration frequency | Long-lived branches or large work items | Work Decomposition |
| Low build success rate | Flaky tests or insufficient local validation | Testing Fundamentals |
| Long time to fix broken builds | No stop-the-line culture | Working Agreements |
| Low deployment frequency + long lead time | Large batches or manual gates | Value Stream Map |
| High change failure rate | Insufficient testing | Testing Fundamentals |
| Long MTTR | No rollback capability or poor observability | Rollback |
Use these signals alongside your value stream map to identify your top constraints.
“When a measure becomes a target, it ceases to be a good measure.” - Goodhart's Law
These metrics are diagnostic tools, not performance targets. Use them within the team, for the team. Never use them to rank individuals or compare teams.
Next Step
With your baselines recorded, proceed to Identify Constraints to determine which bottleneck to address first.
Related Content
- Integration Frequency - how often developers integrate to trunk
- Build Duration - pipeline speed as a feedback signal
- Deployment Frequency - how often you deploy to production
- Lead Time for Changes - time from commit to production
- Change Failure Rate - percentage of deployments that cause failures
- Mean Time to Restore - recovery speed after production incidents
- Metrics-Driven Improvement - how to use these metrics to drive improvement in Phase 3
- DORA Metrics as Delivery Improvement Goals - why not to use these as targets