Pipeline Stability

CI/CD Watch classifies every pipeline as healthy, flaky, or broken based on its run history. This gives you a quick way to spot unreliable pipelines and understand where your CI/CD system is wasting time and money.

Classification

Each pipeline is classified using its failure rate and flip rate within the selected time window.

Classification	Criteria
Broken	Failure rate ≥ 80%
Flaky	Flip rate ≥ 30% and failure rate > 0%
Healthy	Everything else
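
The rules in the table can be sketched as a small function. This is a minimal illustration, not CI/CD Watch's actual implementation: the function name `classify` is hypothetical, rates are assumed to be fractions between 0 and 1, and the Broken rule is assumed to take precedence when a pipeline meets both the Broken and Flaky criteria.

```python
def classify(failure_rate: float, flip_rate: float) -> str:
    """Classify a pipeline from its failure rate and flip rate (fractions in [0, 1])."""
    # Assumption: Broken is checked first, so it wins over Flaky when both match.
    if failure_rate >= 0.80:
        return "broken"
    if flip_rate >= 0.30 and failure_rate > 0:
        return "flaky"
    # Everything else is healthy.
    return "healthy"


print(classify(0.90, 0.50))  # broken
print(classify(0.40, 0.35))  # flaky
print(classify(0.10, 0.10))  # healthy
```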

Flip Rate

What it measures: How often a pipeline's status transitions between passing and failing. A high flip rate means the pipeline is unpredictable: it passes one run and fails the next without any code change.

How it's calculated: The number of status transitions (pass → fail or fail → pass) divided by the total number of runs minus one. Only runs with a terminal status (succeeded or failed) are counted. Runs that are still in progress, cancelled, or in any other state are excluded.

flipRate = flipCount / (totalRuns - 1)
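
As a worked illustration of the formula, the sketch below computes a flip rate from a chronological list of run statuses. It rests on a few assumptions that the text does not confirm: `totalRuns` is taken to count only the succeeded and failed runs that remain after filtering, a history with fewer than two such runs is given a flip rate of 0, and the status strings `"running"` and `"cancelled"` are hypothetical names for excluded states.

```python
TERMINAL_STATUSES = {"succeeded", "failed"}


def flip_rate(statuses: list[str]) -> float:
    """Return flipCount / (totalRuns - 1) for runs listed in chronological order."""
    # Keep only succeeded/failed runs; everything else is excluded.
    runs = [s for s in statuses if s in TERMINAL_STATUSES]
    if len(runs) < 2:
        # Assumption: with fewer than two runs there is nothing to compare.
        return 0.0
    # A flip is any adjacent pair of runs whose statuses differ.
    flip_count = sum(1 for prev, cur in zip(runs, runs[1:]) if prev != cur)
    return flip_count / (len(runs) - 1)


history = ["succeeded", "failed", "running", "failed",
           "succeeded", "cancelled", "succeeded"]
print(flip_rate(history))  # 0.5
```

After filtering, the example history is succeeded, failed, failed, succeeded, succeeded: two transitions across four adjacent pairs, so the flip rate is 2 / 4 = 0.5.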

Severity

Flaky pipelines are assigned a severity level to help you prioritize which ones to investigate first.

Severity	Criteria
High	Flip rate ≥ 50%
Medium	Flip rate < 50%
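
The severity rule can be sketched the same way. The function name `flaky_severity` is hypothetical, the flip rate is assumed to be a fraction between 0 and 1, and returning `None` for pipelines that are not flaky is an assumption based on severity being described only for flaky pipelines.

```python
from typing import Optional


def flaky_severity(classification: str, flip_rate: float) -> Optional[str]:
    """Return 'high' or 'medium' for a flaky pipeline, or None otherwise."""
    # Assumption: severity is only assigned to pipelines classified as flaky.
    if classification != "flaky":
        return None
    return "high" if flip_rate >= 0.50 else "medium"


print(flaky_severity("flaky", 0.60))  # high
print(flaky_severity("flaky", 0.35))  # medium
```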

Job-Level Stability

Stability is also calculated per job within a pipeline. When a pipeline is flaky, the job-level breakdown helps you pinpoint exactly which job is causing the instability rather than investigating the entire pipeline.

Each job gets its own failure rate and flip rate, calculated the same way as pipeline-level metrics. This makes it easy to see if flakiness is concentrated in a single test job or spread across multiple stages.
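
A job-level breakdown could be sketched by grouping run results per job and applying the same calculation to each group. The input shape (a chronological list of `(job_name, status)` pairs), the function name `job_stability`, and the job names are all hypothetical. The failure rate is assumed here to be failed runs divided by succeeded-plus-failed runs, which the text does not spell out.

```python
from collections import defaultdict


def job_stability(job_runs: list[tuple[str, str]]) -> dict[str, dict[str, float]]:
    """Compute per-job failure rate and flip rate from (job_name, status) pairs."""
    by_job: dict[str, list[str]] = defaultdict(list)
    for job, status in job_runs:
        # Only succeeded/failed runs are considered, as at pipeline level.
        if status in ("succeeded", "failed"):
            by_job[job].append(status)

    report = {}
    for job, runs in by_job.items():
        flips = sum(1 for prev, cur in zip(runs, runs[1:]) if prev != cur)
        report[job] = {
            # Assumption: failure rate = failed runs / counted runs.
            "failure_rate": runs.count("failed") / len(runs),
            "flip_rate": flips / (len(runs) - 1) if len(runs) > 1 else 0.0,
        }
    return report


runs = [
    ("build", "succeeded"), ("test", "failed"),
    ("build", "succeeded"), ("test", "succeeded"),
    ("build", "succeeded"), ("test", "failed"),
]
for job, metrics in job_stability(runs).items():
    print(job, metrics)
```

In this made-up history, `build` never changes status while `test` alternates on every run, so the instability is concentrated in the `test` job.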

Test-Level Stability

Beyond pipeline and job-level stability, CI/CD Watch tracks stability at the individual test level. The flaky tests page (under Stability → Flaky Tests) shows a sortable table of all tests with failures, including their failure rate, flip rate, total runs, and classification (healthy, flaky, or broken).

Expand any test to see details of its last failure, including error messages, stack traces, the commit SHA, and a link to the CI run. This helps you diagnose flaky tests without switching to your CI provider.

Waste Impact

Flaky and broken pipelines waste both compute resources and developer time. Every failed run that didn't need to fail costs money in CI minutes and blocks a developer from getting feedback on their change.

The stability page shows the estimated cost of unreliable pipelines, combining compute charges and developer wait time. This helps you make a business case for fixing the pipelines that are costing you the most.

Flaky Tests: individual test stability tracking and classification
Cost Calculations: how waste from unstable pipelines is quantified
Performance Ratings: pipeline duration ratings and optimisation suggestions
Deployment Detection: how deployment pipeline stability affects DORA metrics