AI-assisted software development

AI-assisted software development amplifies a team's existing CI/CD practices rather than fixing them. The signals that told you whether your delivery was healthy last year matter more this year, not less. The classifications (healthy / flaky / broken), the DORA metrics, the cost-of-wait-time framing, all become the inputs that determine whether AI assistance pays off. Buyers looking for the setup walkthrough should read the companion page on CI/CD for AI-assisted development.

~12 min read · Updated 22 May 2026

A developer is mid-task in Cursor. The agent has drafted a fix for a bug in the auth module. It runs the test suite. Three tests fail. The agent reads the output, suggests a tweak, retries. Two pass. The third keeps failing. The agent tries another tweak. Now four tests fail. The agent narrates its confusion in the chat and asks if the developer can rerun the suite manually.

The third test is flaky. It has been flaky for three weeks. It is on the team’s quarantine list but nobody got around to skipping it. None of that context exists in the assistant’s prompt. The assistant is working blind on a problem the team already classified and shelved.

Figure: the CI/CD Watch per-test flaky tests view, which the assistant could have queried before its second retry. It shows flip rate, failure rate, last failure, and rerun count per test, all structured, all available through one MCP call instead of being inferred from failure output.

Your AI assistant is flying blind without CI/CD context

AI coding assistants see your code, your IDE state, and whatever you paste into the chat. They do not see your pipeline history, your team’s DORA numbers, the audit findings the platform team filed last week, the cost waste that built up on a runaway nightly job, or which tests have been flaky long enough to be quarantined. They infer everything they say about delivery health from the small window of context you happen to give them.

That gap matters more as assistants take more autonomous actions. A pair-programming assistant guessing about a flake costs you minutes of wasted chat. An agent triggering its own rerun on a flake costs you a CI minute and a compounded false signal. An agent suggesting a refactor based on what it assumes about your pipeline practices costs you a review cycle when the suggestion turns out to contradict the team’s actual rules.

The fix is structural: give the assistant a way to read the same data the platform team reads. Not by training, not by pasting screenshots, and not by hoping the assistant guesses correctly, but by exposing the data through a typed interface the assistant can call. The Model Context Protocol (MCP) is the standard for that today. CI/CD Watch implements it; for the buyer-facing setup walkthrough see CI/CD for AI-assisted development, and for the protocol details see the underlying MCP server reference.

What an assistant needs to see

Once connected, an AI assistant gains access to eight tools that mirror the data surfaces in the web app: recent pipeline runs, connected providers, DORA metrics, cost analysis, performance analysis, audit runs, individual audit run detail, and audit findings. Every tool is read-only by design. The assistant can query everything the platform team queries; it cannot trigger reruns, change connections, or alter settings.

The shape of the data matters. The MCP server does not return raw provider payloads; it returns the same normalised data the web app renders, with consistent fields across GitHub Actions, GitLab CI, Bitbucket Pipelines, CircleCI, Azure DevOps, and Jenkins. So an assistant asking “what is the flip rate on login-flow.test.ts” gets the same answer whether the repo runs on GitHub or Jenkins. Provider differences sit behind the protocol, not in the assistant’s prompt.
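To make the normalisation idea concrete, here is a minimal sketch in Python. The payload shapes are invented, trimmed-down illustrations, and `NormalisedRun`, `from_github_style`, and `from_gitlab_style` are hypothetical names, not the CI/CD Watch schema or any provider's real API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NormalisedRun:
    """Hypothetical provider-neutral run record (not the real schema)."""
    provider: str
    workflow: str
    status: str  # "success" | "failure" | "other"
    duration_seconds: int


def normalise_status(raw: str) -> str:
    # Collapse provider-specific vocabularies into one small set.
    if raw in ("success", "passed"):
        return "success"
    if raw in ("failure", "failed"):
        return "failure"
    return "other"


def from_github_style(payload: dict) -> NormalisedRun:
    # Invented payload shape, loosely modelled on a "conclusion" field.
    return NormalisedRun(
        provider="github",
        workflow=payload["workflow_name"],
        status=normalise_status(payload["conclusion"]),
        duration_seconds=payload["duration_seconds"],
    )


def from_gitlab_style(payload: dict) -> NormalisedRun:
    # Invented payload shape, loosely modelled on a "status" field.
    return NormalisedRun(
        provider="gitlab",
        workflow=payload["pipeline_name"],
        status=normalise_status(payload["status"]),
        duration_seconds=payload["duration"],
    )


runs = [
    from_github_style({"workflow_name": "ci", "conclusion": "failure", "duration_seconds": 312}),
    from_gitlab_style({"pipeline_name": "ci", "status": "failed", "duration": 298}),
]
statuses = [run.status for run in runs]
```

Both runs end up with the same `status` value even though the two providers spell failure differently, which is the property that keeps provider differences out of the assistant's prompt.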

The classifications are surfaced too. Healthy, flaky, and broken come back as fields on workflows and tests. The audit pillars (tests, lint, supply chain, workflow efficiency, flaky-test handling, process hygiene, cost waste) come back as structured findings with severity, evidence, and a remediation pointer. The assistant gets the conclusions, not just the raw signal, which means it spends fewer tokens deriving them itself and more tokens acting on them.

What changes in practice once it can

Three patterns show up once an assistant has CI/CD context. The first is grounded triage. The assistant answers “why did this test fail” with a query against the flaky-test data rather than a guess from the error output. If the test is on the quarantine list it says so; if the flip rate is high it says so; if the failure is a genuine regression it says that too. Every answer is checkable against the platform’s own classification.
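Grounded triage can be sketched as a small decision function. This is an illustration under stated assumptions: `FlakySignal`, its fields, and the 0.2 flip-rate cut-off are hypothetical, chosen for the example rather than taken from the product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlakySignal:
    """Hypothetical per-test record an assistant might receive."""
    name: str
    quarantined: bool
    flip_rate: float  # fraction of consecutive runs whose result flipped


FLIP_RATE_THRESHOLD = 0.2  # assumed cut-off, not a product default


def triage(signal: FlakySignal) -> str:
    # Check the team's own classification first, then the raw signal.
    if signal.quarantined:
        return "known-flaky: on the quarantine list, do not retry"
    if signal.flip_rate >= FLIP_RATE_THRESHOLD:
        return "likely-flaky: high flip rate, flag it before retrying"
    return "possible-regression: investigate the change itself"


verdict = triage(FlakySignal("login-flow.test.ts", quarantined=True, flip_rate=0.4))
```

In the opening scenario, a check like this would have stopped the agent after its first retry instead of letting it tweak working code around a quarantined test.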

The second is DORA-aware suggestions. An assistant proposing a workflow change can query get-dora-metrics first and reason about whether the change would help or hurt the team’s lead time. A suggestion to add more matrix dimensions looks different when the assistant can see that the workflow’s p95 duration is already at the slow tier; a suggestion to merge before a flake clears looks different when the assistant can see the change failure rate trend.
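As a rough sketch of the matrix example, an assistant could gate its own suggestion on the performance tier before proposing it. The `WorkflowPerformance` record and the tier names "fast", "moderate", and "slow" are assumptions made for illustration; the actual response fields may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkflowPerformance:
    """Hypothetical slice of a per-workflow performance record."""
    workflow: str
    p95_seconds: float
    tier: str  # assumed values: "fast" | "moderate" | "slow"


def assess_matrix_expansion(perf: WorkflowPerformance, added_dimensions: int) -> str:
    # Adding matrix dimensions multiplies jobs, so check duration first.
    if added_dimensions <= 0:
        return "no-op: nothing to add"
    if perf.tier == "slow":
        return (
            f"hold: {perf.workflow} is already in the slow tier "
            f"(p95 {perf.p95_seconds:.0f}s); shorten it before adding jobs"
        )
    return f"ok: {perf.workflow} has headroom for {added_dimensions} more dimension(s)"


advice = assess_matrix_expansion(WorkflowPerformance("ci", 1840.0, "slow"), 2)
```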

The third is audit-grounded refactors. The stability classification and the audit findings give an assistant a structured view of what the platform team wants. An assistant that knows the supply-chain pillar has flagged unpinned actions can suggest pinning them; an assistant that knows the workflow-efficiency pillar has flagged a missing timeout can add one. The platform team effectively writes the rules once and every MCP-connected assistant follows them.
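One way to picture "writes the rules once" is a lookup from finding to suggested change. The sketch below assumes a simplified `Finding` record; the rule identifiers `unpinned-action` and `missing-timeout` and the severity labels are hypothetical stand-ins for whatever the audit actually emits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """Hypothetical, simplified audit finding."""
    pillar: str
    rule: str
    severity: str
    repository: str


# Assumed rule identifiers mapped to the refactor an assistant would propose.
SUGGESTIONS = {
    ("supply-chain", "unpinned-action"): "pin the action to a full commit SHA",
    ("workflow-efficiency", "missing-timeout"): "add an explicit job timeout",
}

SEVERITY_ORDER = {"high": 0, "medium": 1, "low": 2}


def suggest(findings: list[Finding]) -> list[str]:
    # Most severe findings first; skip rules with no known refactor.
    ordered = sorted(findings, key=lambda f: SEVERITY_ORDER.get(f.severity, 99))
    return [
        f"{f.repository}: {SUGGESTIONS[(f.pillar, f.rule)]}"
        for f in ordered
        if (f.pillar, f.rule) in SUGGESTIONS
    ]


plan = suggest([
    Finding("workflow-efficiency", "missing-timeout", "medium", "web"),
    Finding("supply-chain", "unpinned-action", "high", "api"),
    Finding("lint", "some-unmapped-rule", "low", "web"),
])
```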

The amplification thesis

As the 2025 DORA Report on AI-Assisted Software Development frames it, AI tools amplify a team’s existing practices rather than fixing them. A team with strong trunk-based development, fast tests, and a healthy stability classification gets more value from AI assistance because the feedback the assistant receives is fast and reliable. A team with long-running pipelines, flaky tests, and a permanent rerun-as-policy culture gets less value, because the assistant’s feedback loop is slow and noisy.

The corollary follows: the observability signals that told you whether your CI/CD practices were healthy last year matter more this year, not less. DORA, stability, cost, and audit are not separate concerns from AI-assisted development; they are the inputs that determine whether AI assistance pays off.

This is why the CI/CD Watch product surface and the AI-integration surface are the same product. The same signals the platform team uses to keep delivery healthy are the signals an AI assistant needs to make grounded suggestions. Closing the assistant’s context gap is closing the platform team’s feedback loop, from a new direction.

Anti-pattern: AI throughput as the metric

The strongest anti-pattern around AI-assisted dev right now is measuring AI throughput as a productivity proxy. Lines accepted. Suggestions merged. Time saved per prompt. These numbers are easy to extract from the assistant’s telemetry, hard to interpret, and actively misleading when read as productivity. An assistant whose suggestions are accepted more often may simply be producing more code that needs more review and more rework. A suggestion accepted in the IDE is not a deployment that landed in production.

The DORA metrics measure the right thing for the AI era for the same reason they measured the right thing before. Deployment frequency, lead time for changes, change failure rate, mean time to recovery, and deployment rework rate care about whether code reaches production safely, not about who or what wrote it. If AI assistance is helping, those numbers move in the right direction. If it is not, the throughput metrics will say it is anyway. Lead with the outcome metrics, not the activity metrics.
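To show why these are outcome metrics, here is a small sketch that computes three of them from deployment records alone. The `Deployment` record and the simplified definitions are assumptions for illustration; real implementations differ in how they define a failure or the start of lead time. Nothing in the input says who or what wrote the code.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass(frozen=True)
class Deployment:
    """Hypothetical deployment record."""
    committed_at: datetime
    deployed_at: datetime
    failed: bool


def deployment_frequency(deploys: list[Deployment], period_days: int) -> float:
    # Deployments per day over the reporting period.
    return len(deploys) / period_days


def change_failure_rate(deploys: list[Deployment]) -> float:
    # Share of deployments that caused a failure in production.
    if not deploys:
        return 0.0
    return sum(d.failed for d in deploys) / len(deploys)


def median_lead_time(deploys: list[Deployment]) -> timedelta:
    # Median time from commit to production (requires at least one deploy).
    seconds = [(d.deployed_at - d.committed_at).total_seconds() for d in deploys]
    return timedelta(seconds=median(seconds))


base = datetime(2026, 5, 1, 9, 0)
deploys = [
    Deployment(base, base + timedelta(hours=2), False),
    Deployment(base + timedelta(days=7), base + timedelta(days=7, hours=4), False),
    Deployment(base + timedelta(days=14), base + timedelta(days=14, hours=6), True),
    Deployment(base + timedelta(days=21), base + timedelta(days=21, hours=30), False),
]

frequency = deployment_frequency(deploys, period_days=28)  # 4 deploys / 28 days
failure_rate = change_failure_rate(deploys)                # 1 of 4 failed
lead_time = median_lead_time(deploys)                      # median of 2h, 4h, 6h, 30h
```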

Read together

The amplification thesis is why CI/CD Watch invested in the MCP server early. We treat AI-assisted development as a downstream effect of the same observability signals our customers already use. The pillar pages on CI/CD monitoring, DORA metrics, pipeline stability, and CI/CD cost cover the underlying surfaces; this page covers how an AI assistant reads them.

The signals an assistant can read

The MCP server exposes eight read-only signals that together describe a team’s delivery state. Five are analytical (list-runs, get-dora-metrics, get-costs, get-performance, list-connections) and three are audit-focused (list-audit-runs, get-audit-run, list-audit-findings). All eight are read-only by design, so an assistant can query every surface the platform team queries but cannot trigger reruns, change settings, or alter connections.

Tool                   What it returns
list-runs              Recent pipeline runs across connected providers, optionally filtered by period
list-connections       The CI/CD providers connected to your tenant
get-dora-metrics       Deployment frequency, lead time for changes, change failure rate, MTTR, deployment rework rate
get-costs              Compute and wait-time cost per workflow, with waste categories
get-performance        Median and p95 duration per workflow, rating tier, and trend
list-audit-runs        Recent audit runs with status, timing, and worker info
get-audit-run          A specific audit run by ID
list-audit-findings    Audit findings filtered by state, pillar, rule, organisation, or repository

Four of the analytical tools (list-runs, get-dora-metrics, get-costs, get-performance) accept an optional periodDays parameter (1–365) and default to 30 days. The audit tools take their own inputs: get-audit-run takes a run ID; list-audit-findings takes optional state, pillar, rule, org, and repo filters. Free-tier callers get counts only on audit findings; paid tiers get the full evidence payload.

Where CI/CD Watch fits

CI/CD Watch is a CI/CD observability platform that monitors pipelines across GitHub Actions, GitLab CI, Bitbucket Pipelines, CircleCI, Azure DevOps, and Jenkins. The web app is the team-facing surface; the MCP server and CLI are the assistant-facing surfaces. All three read the same normalised data and apply the same classifications.

The Free tier covers pipeline-run monitoring across connected providers, which lets you wire up the MCP server and validate the assistant integration before committing to a paid tier. DORA metrics, cost analysis, performance analysis, and the rich audit findings live on Team and above. Treat the MCP integration the way you treat the API: same data, different transport, same plan gates.

We built the MCP server early because we treat AI-assisted development as a first-class consumer of CI/CD signals, not a downstream marketing angle. If you are integrating an assistant against a CI/CD tool that does not have an MCP server, the alternative is parsing the provider’s native API yourself for every question the assistant asks. The protocol is the reason this works at all.

FAQ

Common questions

What is MCP and why does it matter for CI/CD?
The Model Context Protocol is an open standard that lets AI assistants connect to external data sources through a typed tool interface. For CI/CD it matters because the alternative is pasting pipeline logs or DORA screenshots into your assistant's prompt. With MCP your assistant queries the same structured data the web app reads, and grounds its answers in what is actually happening in your pipelines rather than in what you remember to share.

Which AI assistants work with the CI/CD Watch MCP server?
Any MCP-compatible client. The ones we have explicit setup steps for are Claude Desktop, Claude Code, Cursor, and Windsurf. Other clients work the same way as long as they accept an HTTP MCP server with a Bearer token, which is the standard transport. There is nothing assistant-specific in the protocol itself.

Do I need to install anything to use the MCP server?
No. The MCP server is hosted; there is no local server, Docker container, or extension to install. The setup walkthrough (config block, file paths per client, API key creation) lives on the buyer-facing companion page at /use-cases/ai-assisted-development.

What data can my AI assistant actually see?
The same eight surfaces the web app exposes: recent pipeline runs, connected providers, DORA metrics, cost analysis, performance analysis, audit runs, individual audit run detail, and audit findings filtered by state or pillar. It cannot write anything; the tools are read-only by design, so an assistant cannot trigger reruns or change settings even if asked.

Does the assistant get my customer's data?
It gets your tenant's data, scoped by the API key. The key you create has read access to your tenant only; it cannot see other tenants. Free-tier callers see counts rather than full evidence on audit findings; paid tiers see the full payload. Treat the API key as a credential and rotate it the way you would any other secret.

What is the CLI for if I have the MCP server?
Agents that prefer a binary over a protocol use the CLI. The cicd binary ships nine subcommands covering the same surface as the MCP tools, plus a TUI dashboard. Most teams use the MCP server day to day and reach for the CLI when scripting against the API from CI workflows themselves or when wiring CI/CD Watch into agentic frameworks that do not yet support MCP.

Does this make my assistant write better code?
It makes the assistant's answers about your pipelines accurate. Whether that translates to better code depends on what you ask. Questions like 'why is this test flaky' or 'what does this workflow's rerun rate look like' get grounded answers instead of plausible-sounding guesses. Code suggestions that act on real DORA or stability signals are stronger than suggestions that ignore them.

Read on

This page covers the concept: why AI-assisted development amplifies rather than fixes a team’s practices, which eight signals an assistant needs to read, and why measuring AI throughput instead of delivery outcomes is an anti-pattern.

For the buyer-facing version, including the MCP setup walkthrough, plan/pricing, and the per-assistant integration shapes, read CI/CD for AI-assisted development. For the canonical MCP server reference (config blocks, file paths per client, full tool schemas), see the MCP server docs. For the DORA framing this page leans on, read DORA metrics.

Last updated 21 May 2026.