A tech lead reads Accelerate, agrees with the premise, and decides the team should start tracking DORA metrics. Three weeks later, they have a spreadsheet that disagrees with itself week on week, a Jira dashboard that nobody trusts, and four different definitions of “deployment” depending on who you ask. This is where DORA adoption usually stalls. Not at understanding the metrics, but at measuring them consistently. The definitions are clean on paper. Measuring them on a real CI/CD estate is where the trouble starts.
This guide walks through the practical decisions and data sources for measuring each of the four DORA metrics. If you need a refresher on what the metrics are and why they matter, start with what are DORA metrics. The rest of this post assumes you already know the definitions and want to know how to actually compute them.
Before you measure, agree the rules
The single biggest reason DORA numbers vary between teams is that different people are answering different questions with the same labels. Settle these four definitions before you write any queries or pick a tool. Write them down somewhere durable.
What counts as a deployment?
A successful run of a specific workflow? A tag push to main? A promotion through a staging gate to a customer-facing environment? The right answer is whatever correlates with code actually reaching users. A build workflow that only runs tests is not a deployment. A feature-flag toggle in production often is. Pick the rule that matches your release process and apply it uniformly across services.
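Whatever rule you choose, it helps to encode it as a single predicate applied to every CI run, so the definition lives in one place. A minimal sketch, where the `Run` shape and the `deploy-prod.yml` workflow name are assumptions to be replaced with whatever matches your estate:

```python
from dataclasses import dataclass

@dataclass
class Run:
    workflow: str   # e.g. "deploy-prod.yml"
    branch: str
    status: str     # "success", "failure", ...

# Hypothetical rule: a run counts as a deployment only if it is the
# production deploy workflow, ran on main, and succeeded.
def is_deployment(run: Run) -> bool:
    return (
        run.workflow == "deploy-prod.yml"
        and run.branch == "main"
        and run.status == "success"
    )
```

Because every downstream metric filters through this one function, changing the rule later changes all four metrics consistently rather than piecemeal.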
When does the lead-time clock start?
DORA defines lead time as commit-to-production. In practice, most teams start the clock at merge-to-main. The first commit on a feature branch is sensitive to how people work locally, and a merge to main is unambiguous in Git history. The review queue time then shows up separately in PR-health metrics rather than being buried inside lead time.
What qualifies as a change failure?
A rollback is the clearest signal. A hotfix deployed within 24 hours of the previous release is another. An incident opened in PagerDuty or Opsgenie and linked to a specific deployment is a third. Pick two or three of these signals, combine them with OR logic, and stick with the definition. The biggest source of drift in change failure rate over time is quietly narrowing the definition.
When does recovery stop?
The clock stops when customers are no longer affected, not when the deploy goes green. Disabling a broken feature behind a flag is sometimes recovery and sometimes not, depending on whether the feature was required. The cleanest rule is to tie recovery to incident close-time in your on-call system, with “resolved” meaning verified customer impact has ended.
Measuring deployment frequency
The data you need: a timestamped record of every successful production deployment. The simplest source is your CI provider's workflow-run history, filtered to runs whose workflow matches your deployment rule (for example, a workflow called deploy-prod.yml that completed with a success status).
Three practical gotchas to plan for:
- Reruns of the same commit should not count twice. Deduplicate by commit SHA before counting.
- Rollback deploys are a judgement call. Most teams count them, on the basis that they still produced a change in production.
- Multiple services can be aggregated or reported per-service. Aggregation hides healthy per-service frequencies behind a mixed overall number. Per-service is usually more useful for engineering decisions.
Report over rolling windows, not calendar months. A 28-day rolling window smooths out weekend effects and captures genuine velocity change without waiting for a month-end boundary.
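Putting the dedupe rule and the rolling window together, a minimal sketch, assuming you have already extracted `(commit SHA, finish time)` pairs for successful production deploys:

```python
from datetime import datetime, timedelta

# Hypothetical input: (sha, finished_at) pairs for successful
# production deploys, as produced by your deployment rule.
def deployment_frequency(deploys, now, window_days=28):
    cutoff = now - timedelta(days=window_days)
    # Deduplicate reruns of the same commit by SHA before counting.
    shas = {sha for sha, finished_at in deploys if finished_at >= cutoff}
    return len(shas) / window_days  # deploys per day over the window

now = datetime(2024, 6, 1)
deploys = [
    ("abc123", datetime(2024, 5, 20)),
    ("abc123", datetime(2024, 5, 20)),  # rerun: counted once
    ("def456", datetime(2024, 5, 28)),
    ("old999", datetime(2024, 1, 1)),   # outside the 28-day window
]
```

With this input, two unique SHAs fall inside the window, giving roughly 0.07 deploys per day.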
Measuring lead time for changes
This is the metric that most often goes wrong, because it requires linking two events across different systems: the commit-or-merge timestamp in Git, and the deployment-success timestamp in your CI provider.
The practical pattern:
- For every successful production deployment, note the commit SHA that deployed.
- Look up the merge-to-main timestamp for that SHA in Git history.
- Subtract. That is the lead time for that one change.
- Report the distribution (p50, p75, p90), not just the mean. The mean is dominated by outliers like commits that sat in non-released state for weeks.
Two common traps. First, if your deployment pipeline deploys multiple commits at once, every commit in that batch gets the same deploy timestamp and the mean lead time collapses. Count each commit, not each deploy. Second, commits that never deploy (reverted, squashed, abandoned) should be excluded. Lead time measures changes that reached users, not changes that were written.
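A sketch of that join under assumed inputs: `merged_at` maps each commit SHA to its merge-to-main timestamp from Git history, and `deploys` lists, per successful production deploy, its finish timestamp and the SHAs it shipped. Both names are illustrative:

```python
from datetime import datetime

def lead_times_hours(merged_at, deploys):
    hours = []
    for deployed_at, shas in deploys:
        for sha in shas:              # count each commit, not each deploy
            if sha in merged_at:      # exclude commits that never merged
                delta = deployed_at - merged_at[sha]
                hours.append(delta.total_seconds() / 3600)
    return hours

def percentile(values, p):
    # Nearest-rank percentile: good enough for reporting p50/p75/p90.
    vals = sorted(values)
    k = max(0, min(len(vals) - 1, round(p / 100 * (len(vals) - 1))))
    return vals[k]
```

A batched deploy of two commits yields two lead-time samples here, which is exactly the batching behaviour the first trap calls for.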
Measuring change failure rate
The numerator is deployments that caused a failure; the denominator is total production deployments. Both come from the deployment record you already built for deployment frequency, plus a signal for which of those deployments turned bad.
Failure signals, in decreasing order of how cleanly they attribute a failure to a specific deployment:
- Rollbacks — the deployment was undone. Unambiguous.
- Hotfix-within-window — a follow-up deploy within 24 hours explicitly labelled as a hotfix. Strong but not perfect (some hotfixes are for pre-existing bugs).
- Incident linkage — an incident in your on-call system tagged with the deploy. Requires discipline during incidents.
Most mature teams combine all three signals and treat a deploy as failed if any fires. The cheapest way to start is rollbacks-only, which you can detect from CI data alone without needing integration with PagerDuty or incident tooling.
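The OR-of-signals rule is a one-liner once the signal sets exist. A minimal sketch, where the set names are hypothetical placeholders for whatever your rollback, hotfix, and incident detection produce:

```python
# A deploy counts as failed if ANY of the three signals fires for its SHA.
def change_failure_rate(deploy_shas, rolled_back, hotfixed, incident_linked):
    failed = {
        sha for sha in deploy_shas
        if sha in rolled_back or sha in hotfixed or sha in incident_linked
    }
    return len(failed) / len(deploy_shas)
```

Starting rollbacks-only just means passing empty sets for the other two signals, which makes the later upgrade to all three a data change rather than a code change.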
Measuring mean time to recovery
The data you need: for each change failure, the timestamp at which customer impact started and the timestamp at which it ended.
This is the hardest of the four metrics to measure from CI data alone, because both timestamps usually live in your monitoring or on-call system rather than your pipeline. A practical approach:
- Start time: the incident's created-at in PagerDuty or Opsgenie, or the first alert that fired in your monitoring system. Not the deploy time, because impact can be delayed.
- End time: the incident's resolved-at, with a policy that resolved means verified restoration, not acknowledgement.
- Association: link the incident to a deployment via deploy-tag, commit SHA, or a manual field in your incident template. If you cannot associate, the incident counts in MTTR but not in change failure rate.
Report the median MTTR rather than the mean. One six-hour incident will dominate twenty twenty-minute incidents in the mean, making a month look catastrophic when it was mostly healthy.
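The median calculation itself is small. A sketch, assuming you have pulled `(created_at, resolved_at)` pairs from your on-call system under the policy above:

```python
from datetime import datetime
from statistics import median

# Hypothetical input: (created_at, resolved_at) per change failure,
# with resolved_at meaning verified restoration of customer impact.
def mttr_median_minutes(incidents):
    durations = [
        (resolved - created).total_seconds() / 60
        for created, resolved in incidents
    ]
    return median(durations)
```

With two 20-minute incidents and one 6-hour incident, the mean is over two hours while the median stays at 20 minutes, which is the distortion the paragraph above describes.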
The multi-provider complication
Teams that run more than one CI/CD provider face an extra layer of work. Each provider uses different event models (workflow runs in GitHub Actions, pipelines with nested jobs in GitLab CI, freestyle versus Pipeline jobs in Jenkins), different APIs for querying historical runs, and different rate-limit budgets. A team with GitHub Actions for application services and Jenkins for infrastructure deployments cannot simply concatenate the two datasets; the semantics differ.
Rate limits are the constraint that catches most DIY attempts off guard. GitHub's REST API allows 5,000 requests per hour per token. Listing every workflow run for a large organisation at a 15-minute cadence burns through that quickly. ETag conditional requests (responding 304 when nothing has changed) and webhook-driven updates are usually necessary to keep coverage current without exhausting the budget.
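One way to sketch the conditional-request side, assuming a polling loop that caches one ETag per URL (the function names are illustrative; GitHub documents that 304 responses do not count against the primary rate limit):

```python
# Sketch, not a production client: build headers for a conditional poll of
# a GitHub REST endpoint, and decide from the response whether cached data
# is still valid.
def conditional_headers(token, etag=None):
    headers = {
        "Authorization": f"Bearer {token}",
        "Accept": "application/vnd.github+json",
    }
    if etag:
        headers["If-None-Match"] = etag  # server replies 304 if unchanged
    return headers

def should_refresh(status, url, new_etag, etag_cache):
    if status == 304:
        return False             # nothing changed; cached runs still valid
    etag_cache[url] = new_etag   # remember the ETag for the next poll
    return True
```

The same pattern pairs naturally with webhooks: webhooks carry the changes as they happen, and the conditional poll is the cheap safety net for anything a webhook delivery missed.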
Build vs buy
The DIY path is a data pipeline: extract workflow runs and Git events on a schedule, normalise them into a warehouse table, join with incident data from your on-call system, compute the four metrics with SQL, render the result in a BI tool. This works well if you already have a data team and existing warehouse infrastructure. It is a significant ongoing investment if you do not. Every provider API change, every incident tool migration, and every schema drift becomes maintenance for the team that owns the pipeline.
The buy path trades that maintenance for a subscription. Any tool in the DORA-metrics category will calculate the four metrics automatically from your providers, with the trade-off that you are accepting the vendor's definitions unless the tool lets you configure them.
How CI/CD Watch measures DORA metrics
CI/CD Watch, a CI/CD observability platform that monitors pipelines across GitHub Actions, GitLab CI, Bitbucket Pipelines, CircleCI, Azure DevOps, and Jenkins, computes all four DORA metrics from your existing pipeline data. Deployments are detected by configurable deployment rules (by workflow name, branch, tag pattern, or environment), so the definition of “deployment” stays under your control rather than assuming every workflow run counts.
Lead time is computed at the commit level using merge-to-main as the start event. Change failure rate combines rollbacks with optional incident-system integration. MTTR is sourced from the linked incident record. The measurement rules and the performance-level thresholds are configurable per team via metrics settings, so the numbers reflect your release process rather than a generic default. Full methodology is in the DORA metrics documentation.
If most of your CI runs on GitHub Actions, our companion post on building a GitHub Actions dashboard covers the provider-specific side of unified monitoring.
Try DORA metrics in CI/CD Watch
The Free tier covers pipeline monitoring. DORA metrics, trend charts, and alerts are available on Team plan and above. Connect a provider and you will have your own numbers within minutes, with no manual tagging and no custom instrumentation. For a manager-focused view of how teams use these numbers, see our engineering managers use case.
Key takeaways
- Agree definitions before you measure. Most DORA drift comes from quiet definition changes, not bad data.
- Deployment frequency and change failure rate share a source of truth. Build the deployment record once, use it for both.
- Lead time requires joining Git events to deploy events by commit SHA. Report the p50/p75/p90 distribution, not the mean.
- MTTR is sourced from incident tooling, not CI. Start the clock at customer impact, stop it at verified restoration.
- Multi-provider teams face rate-limit and event-model differences that make naive concatenation unreliable.
CI/CD Watch is built by 3CS Technologies Ltd. It started as an internal tool for tracking pipeline health across a mixed GitHub Actions and Jenkins estate. The same engine now powers the SaaS platform.