A tech lead reads Accelerate, agrees with the premise, and decides the team should start tracking DORA metrics. Three weeks later, they have a spreadsheet that disagrees with itself week on week, a Jira dashboard that nobody trusts, and four different definitions of “deployment” depending on who you ask. This is where DORA adoption usually stalls. Not at understanding the metrics, but at measuring them consistently. The definitions are clean on paper. Measuring them on a real CI/CD estate is where the trouble starts.
This guide walks through the practical decisions and data sources for measuring each of the four DORA metrics. If you need a refresher on what the metrics are and why they matter, start with what are DORA metrics. The rest of this post assumes you already know the definitions and want to know how to actually compute them.
Before you measure, agree the rules
The single biggest reason DORA numbers vary between teams is that different people are answering different questions with the same labels. Settle these four definitions before you write any queries or pick a tool. Write them down somewhere durable.
What counts as a deployment?
A successful run of a specific workflow? A tag push to main? A promotion through a staging gate to a customer-facing environment? The right answer is whatever correlates with code actually reaching users. A build workflow that only runs tests is not a deployment. A feature-flag toggle in production often is. Pick the rule that matches your release process and apply it uniformly across services.
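Whatever rule you choose, it helps to encode it as a single predicate applied to every CI run, so the definition lives in one place. A minimal sketch, where the `Run` shape and the `deploy-prod.yml` workflow name are assumptions to be replaced with whatever matches your estate:

```python
from dataclasses import dataclass

@dataclass
class Run:
    workflow: str   # e.g. "deploy-prod.yml"
    branch: str
    status: str     # "success", "failure", ...

# Hypothetical rule: a run counts as a deployment only if it is the
# production deploy workflow, ran on main, and succeeded.
def is_deployment(run: Run) -> bool:
    return (
        run.workflow == "deploy-prod.yml"
        and run.branch == "main"
        and run.status == "success"
    )
```

Because every downstream metric filters through this one function, changing the rule later changes all four metrics consistently rather than piecemeal.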
When does the lead-time clock start?
DORA defines lead time as commit-to-production. In practice, most teams start the clock at merge-to-main. The first commit on a feature branch is sensitive to how people work locally, and a merge to main is unambiguous in Git history. The review queue time then shows up separately in PR-health metrics rather than being buried inside lead time.
What qualifies as a change failure?
A rollback is the clearest signal. A hotfix deployed within 24 hours of the previous release is another. An incident opened in PagerDuty or Opsgenie and linked to a specific deployment is a third. Pick two or three of these signals, combine them with OR logic, and stick with the definition. The biggest source of drift in change failure rate over time is quietly narrowing the definition.
When does recovery stop?
The clock stops when customers are no longer affected, not when the deploy goes green. Disabling a broken feature behind a flag is sometimes recovery and sometimes not, depending on whether the feature was required. The cleanest rule is to tie recovery to incident close-time in your on-call system, with “resolved” meaning verified customer impact has ended.
Measuring deployment frequency
The data you need: a timestamped record of every successful production deployment. The simplest source is your CI provider's workflow-run history, filtered to runs whose workflow matches your deployment rule (for example, a workflow called deploy-prod.yml that completed with a success status).
Three practical gotchas to plan for:
- Reruns of the same commit should not count twice. Deduplicate by commit SHA before counting.
- Rollback deploys are a judgement call. Most teams count them, on the basis that they still produced a change in production.
- Multiple services can be aggregated or reported per-service. Aggregation hides healthy per-service frequencies behind a mixed overall number. Per-service is usually more useful for engineering decisions.
Report over rolling windows, not calendar months. A 28-day rolling window smooths out weekend effects and captures genuine velocity change without waiting for a month-end boundary.
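Putting the dedupe rule and the rolling window together, a minimal sketch, assuming you have already extracted `(commit SHA, finish time)` pairs for successful production deploys:

```python
from datetime import datetime, timedelta

# Hypothetical input: (sha, finished_at) pairs for successful
# production deploys, as produced by your deployment rule.
def deployment_frequency(deploys, now, window_days=28):
    cutoff = now - timedelta(days=window_days)
    # Deduplicate reruns of the same commit by SHA before counting.
    shas = {sha for sha, finished_at in deploys if finished_at >= cutoff}
    return len(shas) / window_days  # deploys per day over the window

now = datetime(2024, 6, 1)
deploys = [
    ("abc123", datetime(2024, 5, 20)),
    ("abc123", datetime(2024, 5, 20)),  # rerun: counted once
    ("def456", datetime(2024, 5, 28)),
    ("old999", datetime(2024, 1, 1)),   # outside the 28-day window
]
```

With this input, two unique SHAs fall inside the window, giving roughly 0.07 deploys per day.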
Measuring lead time for changes
This is the metric that most often goes wrong, because it requires linking two events across different systems: the commit-or-merge timestamp in Git, and the deployment-success timestamp in your CI provider.
The practical pattern:
- For every successful production deployment, note the commit SHA that deployed.
- Look up the merge-to-main timestamp for that SHA in Git history.
- Subtract. That is the lead time for that one change.
- Report the distribution (p50, p75, p90), not just the mean. The mean is dominated by outliers like commits that sat in non-released state for weeks.
Two common traps. First, if your deployment pipeline deploys multiple commits at once, every commit in that batch gets the same deploy timestamp and the mean lead time collapses. Count each commit, not each deploy. Second, commits that never deploy (reverted, squashed, abandoned) should be excluded. Lead time measures changes that reached users, not changes that were written.
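A sketch of that join under assumed inputs: `merged_at` maps each commit SHA to its merge-to-main timestamp from Git history, and `deploys` lists, per successful production deploy, its finish timestamp and the SHAs it shipped. Both names are illustrative:

```python
from datetime import datetime

def lead_times_hours(merged_at, deploys):
    hours = []
    for deployed_at, shas in deploys:
        for sha in shas:              # count each commit, not each deploy
            if sha in merged_at:      # exclude commits that never merged
                delta = deployed_at - merged_at[sha]
                hours.append(delta.total_seconds() / 3600)
    return hours

def percentile(values, p):
    # Nearest-rank percentile: good enough for reporting p50/p75/p90.
    vals = sorted(values)
    k = max(0, min(len(vals) - 1, round(p / 100 * (len(vals) - 1))))
    return vals[k]
```

A batched deploy of two commits yields two lead-time samples here, which is exactly the batching behaviour the first trap calls for.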
Measuring change failure rate
The numerator is deployments that caused a failure; the denominator is total production deployments. Both come from the deployment record you already built for deployment frequency, plus a signal for which of those deployments turned bad.
Failure signals, in decreasing order of how cleanly they attribute a failure to a specific deployment:
- Rollbacks — the deployment was undone. Unambiguous.
- Hotfix-within-window — a follow-up deploy within 24 hours explicitly labelled as a hotfix. Strong but not perfect (some hotfixes are for pre-existing bugs).
- Incident linkage — an incident in your on-call system tagged with the deploy. Requires discipline during incidents.
Most mature teams combine all three signals and treat a deploy as failed if any fires. The cheapest way to start is rollbacks-only, which you can detect from CI data alone without needing integration with PagerDuty or incident tooling.
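The OR-of-signals rule is a one-liner once the signal sets exist. A minimal sketch, where the set names are hypothetical placeholders for whatever your rollback, hotfix, and incident detection produce:

```python
# A deploy counts as failed if ANY of the three signals fires for its SHA.
def change_failure_rate(deploy_shas, rolled_back, hotfixed, incident_linked):
    failed = {
        sha for sha in deploy_shas
        if sha in rolled_back or sha in hotfixed or sha in incident_linked
    }
    return len(failed) / len(deploy_shas)
```

Starting rollbacks-only just means passing empty sets for the other two signals, which makes the later upgrade to all three a data change rather than a code change.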
Measuring mean time to recovery
The data you need: for each change failure, the timestamp at which customer impact started and the timestamp at which it ended.
This is the hardest of the four metrics to measure from CI data alone, because both timestamps usually live in your monitoring or on-call system rather than your pipeline. A practical approach:
- Start time: the incident's created-at in PagerDuty or Opsgenie, or the first alert that fired in your monitoring system. Not the deploy time, because impact can be delayed.
- End time: the incident's resolved-at, with a policy that resolved means verified restoration, not acknowledgement.
- Association: link the incident to a deployment via deploy-tag, commit SHA, or a manual field in your incident template. If you cannot associate, the incident counts in MTTR but not in change failure rate.
Report the median MTTR rather than the mean. One six-hour incident will dominate twenty twenty-minute incidents in the mean, making a month look catastrophic when it was mostly healthy.
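The median calculation itself is small. A sketch, assuming you have pulled `(created_at, resolved_at)` pairs from your on-call system under the policy above:

```python
from datetime import datetime
from statistics import median

# Hypothetical input: (created_at, resolved_at) per change failure,
# with resolved_at meaning verified restoration of customer impact.
def mttr_median_minutes(incidents):
    durations = [
        (resolved - created).total_seconds() / 60
        for created, resolved in incidents
    ]
    return median(durations)
```

With two 20-minute incidents and one 6-hour incident, the mean is over two hours while the median stays at 20 minutes, which is the distortion the paragraph above describes.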
The multi-provider complication
Teams that run more than one CI/CD provider face an extra layer of work. Each provider uses different event models (workflow runs in GitHub Actions, pipelines with nested jobs in GitLab CI, freestyle versus Pipeline jobs in Jenkins), different APIs for querying historical runs, and different rate-limit budgets. A team with GitHub Actions for application services and Jenkins for infrastructure deployments cannot simply concatenate the two datasets; the semantics differ.
Rate limits are the constraint that catches most DIY attempts off guard. GitHub's REST API allows 5,000 requests per hour per token. Listing every workflow run for a large organisation at a 15-minute cadence burns through that quickly. ETag conditional requests (responding 304 when nothing has changed) and webhook-driven updates are usually necessary to keep coverage current without exhausting the budget.
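One way to sketch the conditional-request side, assuming a polling loop that caches one ETag per URL (the function names are illustrative; GitHub documents that 304 responses do not count against the primary rate limit):

```python
# Sketch, not a production client: build headers for a conditional poll of
# a GitHub REST endpoint, and decide from the response whether cached data
# is still valid.
def conditional_headers(token, etag=None):
    headers = {
        "Authorization": f"Bearer {token}",
        "Accept": "application/vnd.github+json",
    }
    if etag:
        headers["If-None-Match"] = etag  # server replies 304 if unchanged
    return headers

def should_refresh(status, url, new_etag, etag_cache):
    if status == 304:
        return False             # nothing changed; cached runs still valid
    etag_cache[url] = new_etag   # remember the ETag for the next poll
    return True
```

The same pattern pairs naturally with webhooks: webhooks carry the changes as they happen, and the conditional poll is the cheap safety net for anything a webhook delivery missed.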
Build vs buy
The DIY path is a data pipeline: extract workflow runs and Git events on a schedule, normalise them into a warehouse table, join with incident data from your on-call system, compute the four metrics with SQL, render the result in a BI tool. This works well if you already have a data team and existing warehouse infrastructure. It is a significant ongoing investment if you do not. Every provider API change, every incident tool migration, and every schema drift becomes maintenance for the team that owns the pipeline.
The buy path trades that maintenance for a subscription. Any tool in the DORA-metrics category will calculate the four metrics automatically from your providers, with the trade-off that you are accepting the vendor's definitions unless the tool lets you configure them.
How CI/CD Watch measures DORA metrics
CI/CD Watch, a CI/CD observability platform that monitors pipelines across GitHub Actions, GitLab CI, Bitbucket Pipelines, CircleCI, Azure DevOps, and Jenkins, computes all four DORA metrics from your existing pipeline data. Deployments are detected by configurable deployment rules (by workflow name, branch, tag pattern, or environment), so the definition of “deployment” stays under your control rather than assuming every workflow run counts.
Lead time is computed at the commit level using merge-to-main as the start event. Change failure rate combines rollbacks with optional incident-system integration. MTTR is sourced from the linked incident record. The measurement rules and the performance-level thresholds are configurable per team via metrics settings, so the numbers reflect your release process rather than a generic default. Full methodology is in the DORA metrics documentation.
If most of your CI runs on GitHub Actions, our companion post on building a GitHub Actions dashboard covers the provider-specific side of unified monitoring.
Try DORA metrics in CI/CD Watch
The Free tier covers pipeline monitoring. DORA metrics, trend charts, and alerts are available on Team plan and above. Connect a provider and you will have your own numbers within minutes, with no manual tagging and no custom instrumentation. For a manager-focused view of how teams use these numbers, see our engineering managers use case.
Key takeaways
- Agree definitions before you measure. Most DORA drift comes from quiet definition changes, not bad data.
- Deployment frequency and change failure rate share a source of truth. Build the deployment record once, use it for both.
- Lead time requires joining Git events to deploy events by commit SHA. Report the p50/p75/p90 distribution, not the mean.
- MTTR is sourced from incident tooling, not CI. Start the clock at customer impact, stop it at verified restoration.
- Multi-provider teams face rate-limit and event-model differences that make naive concatenation unreliable.
CI/CD Watch is built by 3CS Technologies Ltd. It started as an internal tool for tracking pipeline health across a mixed GitHub Actions and Jenkins estate. The same engine now powers the SaaS platform.