Give Your AI Assistant CI/CD Context with MCP

A developer asks Claude Code why the auth test keeps failing in CI. The assistant reads the failure output, suggests a fix to the assertion, asks for a rerun. The test passes. The developer ships it. The next morning the same test fails on another developer's commit. The original failure was not a bug in the assertion. It was a flake that has been flipping for three weeks.

That gap has a name: the AI assistant CI/CD context problem. The assistant saw the failure output, the test source, and the diff. It did not see the flip rate, the rerun history, or the fact that the platform team had already triaged this test and put it on the quarantine list. Every decision it made was downstream of context it did not have.

The AI assistant CI/CD context gap

AI coding assistants see your code and whatever you paste into the chat. They do not see your pipeline state, DORA numbers, audit findings, cost waste, or which tests are quarantined. That data lives behind the provider's API and dashboard; the assistant has no path to it. Pasting screenshots into the prompt does not scale, and prompt context is not persistent between sessions. The fix is structural, not a matter of prompt engineering: give the assistant a typed interface to the same data the platform team reads.

The Model Context Protocol is the open standard for that interface today. An MCP server exposes a fixed set of tools with typed parameters and return values; any MCP-compatible client (Claude Desktop, Claude Code, Cursor, Windsurf, GitHub Copilot in VS Code or JetBrains, plus any agent that speaks the protocol) can call those tools. The server decides which tools to expose and what they do; the assistant can call only what is defined, with the parameters the tool declares.
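To make "typed parameters" concrete, here is a minimal sketch in Python of how a server might describe one tool and reject calls that fall outside the description. The `name` / `description` / `inputSchema` keys follow the general shape MCP uses when listing tools. The tool name `list-runs` appears later in this article, but its parameters (`repo`, `branch`, `status`, `limit`) and the hand-rolled `validate_call` helper are assumptions made for illustration, not the schema of any real server.

```python
from typing import Any

# Hypothetical tool description. The parameter names are assumptions,
# not the published schema of any real server.
LIST_RUNS_TOOL: dict[str, Any] = {
    "name": "list-runs",
    "description": "Recent pipeline runs, filterable by repo, branch, status.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "repo": {"type": "string"},
            "branch": {"type": "string"},
            "status": {"type": "string", "enum": ["success", "failure", "running"]},
            "limit": {"type": "integer"},
        },
        "required": ["repo"],
    },
}

# The registry is the whole callable surface: anything not listed is rejected.
TOOLS = {LIST_RUNS_TOOL["name"]: LIST_RUNS_TOOL}

_PY_TYPES = {"string": str, "integer": int}


def validate_call(tool_name: str, arguments: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the call is acceptable."""
    tool = TOOLS.get(tool_name)
    if tool is None:
        return [f"unknown tool: {tool_name}"]
    schema = tool["inputSchema"]
    problems = []
    for key in schema["required"]:
        if key not in arguments:
            problems.append(f"missing required parameter: {key}")
    for key, value in arguments.items():
        spec = schema["properties"].get(key)
        if spec is None:
            problems.append(f"undeclared parameter: {key}")
            continue
        expected = _PY_TYPES[spec["type"]]
        # bool is a subclass of int in Python, so reject it explicitly.
        if not isinstance(value, expected) or isinstance(value, bool):
            problems.append(f"wrong type for {key}")
        elif "enum" in spec and value not in spec["enum"]:
            problems.append(f"invalid value for {key}: {value}")
    return problems
```

A call such as `validate_call("list-runs", {"repo": "org/api", "status": "failure"})` returns an empty list, while a call to a tool that is not in the registry returns a single "unknown tool" problem. A real server would normally lean on a JSON Schema library rather than a hand-written check; the point here is only that the surface is closed and inspectable.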

What follows covers the capabilities an MCP server for CI/CD should expose, three queries that change the conversation once those capabilities exist, and what implementing the pattern looks like in practice today. The wider framing on why this matters more in 2026 than it did a year ago sits in the AI-assisted software development guide.

Eight capabilities an MCP-connected CI/CD layer should expose

Eight capabilities cover the surface area a platform engineer works from. Five are analytical, three are audit-focused. Together they let an assistant answer almost any question about delivery state with grounded data instead of inference.

Recent pipeline runs. Filterable by repo, workflow, branch, status. The base primitive every other query usually starts from.
Connected providers. What CI/CD systems the tenant has wired up. Useful for the assistant to ground itself before asking anything else.
DORA metrics. Deployment frequency, change lead time, change failure rate, failed deployment recovery time, deployment rework rate. Per repo, per window, banded against the Elite / High / Medium / Low thresholds.
Cost analysis. Compute cost and developer wait-time cost per workflow, with the wait-to-compute ratio. The data the assistant needs to judge whether a slow pipeline is mostly burning machines or mostly burning engineers.
Performance signals. P95 duration, median duration, and a slow / fast / fresh classification per workflow. So an assistant can spot workflows already at the slow tier before suggesting changes that would push them slower.
Audit runs. Recent pipeline health checks across pillars like supply chain, workflow efficiency, flaky-test handling, and cost waste. Pass / fail rolls up at the pillar level, so the assistant can spot which area is currently in poorest shape.
Single audit run detail. Drilldown on one run: which rules fired, which passed, which were dismissed by the team. Used once the rolled-up audit list surfaces something worth a closer look.
Audit findings. Findings filtered by state (gaps, clear, dismissed) and pillar. So the assistant can ask “what unpinned actions do we have” and act on the answer instead of guessing at the security posture.

The shape of the data matters as much as the count. A well-designed CI/CD MCP server returns normalised data across providers rather than passing through raw provider payloads. So a query about flip rate on a specific test returns the same shape whether the repo runs on GitHub Actions, GitLab CI, or Jenkins. Provider differences sit behind the protocol, not in the assistant's prompt.
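As a rough sketch of what "normalised across providers" can mean, the Python below maps two provider-style payloads onto one shared record. The `NormalisedRun` type, the adapter functions, and the flattened payload fields (`repository`, `project`) are hypothetical, and the status vocabularies are a simplified subset of what the providers report, so treat this as an illustration of the idea rather than a description of any real server.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NormalisedRun:
    provider: str
    repo: str
    branch: str
    status: str  # one of: "success", "failure", "running", "other"


# Provider vocabularies mapped to one shared status set. The provider-side
# values are a simplified, illustrative subset, not a complete mapping.
_GITHUB_STATUS = {"success": "success", "failure": "failure"}
_GITLAB_STATUS = {"success": "success", "failed": "failure", "running": "running"}


def from_github(payload: dict) -> NormalisedRun:
    # Assumed shape: a lifecycle "status" plus a separate "conclusion"
    # that is only meaningful once the run has completed.
    if payload["status"] != "completed":
        status = "running"  # simplification: queued and in-progress collapse here
    else:
        status = _GITHUB_STATUS.get(payload["conclusion"], "other")
    return NormalisedRun("github", payload["repository"], payload["head_branch"], status)


def from_gitlab(payload: dict) -> NormalisedRun:
    # Assumed shape: lifecycle and outcome folded into one "status" field.
    status = _GITLAB_STATUS.get(payload["status"], "other")
    return NormalisedRun("gitlab", payload["project"], payload["ref"], status)
```

With adapters like these, a failed run on either provider comes back with `status == "failure"` and the same field names, so the assistant never has to learn that one provider says "failed" and another says "failure".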

Three queries that change the conversation

The capabilities are dry on their own. What matters is the kinds of questions an assistant can answer once they exist.

“Why is this test flaky?”

Without CI/CD context the assistant reads the failure output, guesses at race conditions, suggests a sleep or a retry. With an audit-findings query filtered to the flaky-test pillar, it gets the flip rate, last-failure timestamp, and rerun count for the specific test, and answers with the actual classification. An example response, with illustrative numbers: “this is on the team’s flaky list, flip rate around 18 percent in the last month, several reruns in that window. The team has tagged it for triage. Not a regression in your change.” The diagnostic framework behind that data lives in the pipeline stability reference; the assistant reads the conclusion the framework produces.
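The article does not define how flip rate is calculated, so the sketch below assumes one common reading: the share of consecutive runs of the same test whose pass/fail outcome differs from the run before. The function names and the 0.1 threshold are hypothetical placeholders, not values from any real classifier.

```python
def flip_rate(outcomes: list[bool]) -> float:
    """Share of consecutive run pairs whose pass/fail outcome differs.

    `outcomes` is ordered oldest to newest; True means the test passed.
    This definition is an assumption made for illustration.
    """
    if len(outcomes) < 2:
        return 0.0
    flips = sum(1 for prev, cur in zip(outcomes, outcomes[1:]) if prev != cur)
    return flips / (len(outcomes) - 1)


def looks_flaky(outcomes: list[bool], threshold: float = 0.1) -> bool:
    # The 0.1 threshold is a placeholder, not a recommended value.
    return flip_rate(outcomes) >= threshold
```

Under this definition a test that fails on every run has a flip rate of zero. That is the useful property: a steady failure looks like a regression, while a test that alternates between pass and fail looks like a flake, and the assistant can tell the two apart from history rather than from a single failure log.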

“Will this commit hurt our DORA numbers?”

A common ask before landing a large refactor: would committing it now hurt the team's stability numbers. Without CI/CD context the assistant has opinions but no data. With a DORA-metrics query it sees the team is sitting in the High band on change failure rate and Elite on deployment frequency, and reasons about whether the refactor's characteristics put either at risk. The five canonical metrics are documented in the DORA metrics reference; the assistant gets them as a JSON object rather than as a screenshot the developer would have to interpret aloud.
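To show why a JSON object is easier to reason over than a screenshot, here is a small Python sketch. The payload shape, field names, and the `change_lead_time` band are invented for illustration; only the four band names and the High / Elite example come from the text above.

```python
import json

BAND_ORDER = ["Low", "Medium", "High", "Elite"]  # worst to best

# Hypothetical payload; field names and values are invented for illustration.
SAMPLE = json.loads("""
{
  "repo": "org/api",
  "window_days": 30,
  "metrics": {
    "deployment_frequency": {"band": "Elite"},
    "change_failure_rate": {"band": "High"},
    "change_lead_time": {"band": "Medium"}
  }
}
""")


def metrics_at_or_below(payload: dict, band: str) -> list[str]:
    """Names of metrics whose band is `band` or worse, sorted alphabetically."""
    ceiling = BAND_ORDER.index(band)
    return sorted(
        name
        for name, metric in payload["metrics"].items()
        if BAND_ORDER.index(metric["band"]) <= ceiling
    )
```

Asking for `metrics_at_or_below(SAMPLE, "High")` picks out the metrics with the least headroom, which is where an assistant would focus when weighing the risk of a large refactor.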

“Are we wasting compute on this workflow?”

An assistant asked to optimise CI cost queries cost and performance signals together. As an example: on a workflow whose wait-to-compute ratio is out of proportion to compute spend, it concludes that the bottleneck is queueing rather than compute, and proposes moving from sequential gating to parallel jobs rather than buying a bigger runner. That is the kind of answer that requires two observability signals at once; the protocol is what makes them composable in a single conversational turn.
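The reasoning in that example can be sketched in a few lines of Python. This assumes the wait-to-compute ratio is developer wait-time cost divided by compute cost in the same currency; the function names, the labels, and the `ratio_threshold` default are hypothetical.

```python
def wait_to_compute_ratio(wait_cost: float, compute_cost: float) -> float:
    """Developer wait-time cost divided by compute cost, in the same currency."""
    if compute_cost <= 0:
        raise ValueError("compute_cost must be positive")
    return wait_cost / compute_cost


def likely_bottleneck(wait_cost: float, compute_cost: float, is_slow: bool,
                      ratio_threshold: float = 3.0) -> str:
    """Combine a cost signal and a performance signal into one rough label."""
    # ratio_threshold is a placeholder; a real cut-off would come from the
    # team's own data.
    if not is_slow:
        return "none"
    ratio = wait_to_compute_ratio(wait_cost, compute_cost)
    return "waiting" if ratio >= ratio_threshold else "compute"
```

A slow workflow with 900 in wait cost against 100 in compute cost comes back as "waiting", which points towards restructuring the jobs; the same workflow with 120 in wait cost comes back as "compute", where a bigger runner is at least worth discussing. Neither signal alone supports that distinction.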

What MCP is not

Two clarifications worth making explicit, because the MCP category is young and the marketing around it is loud.

It is not unlimited access. The tools the server exposes are explicit and typed; the assistant can call only what is defined, with the parameters the tool declares. There is no general “ask the assistant to query my CI” capability that bypasses the surface. The eight tools are the whole surface; the assistant composes them to answer any given question. That constraint is what makes the integration auditable: every action the assistant takes is one of a known set, in a shape that can be logged.

It is also not a magic productivity layer. Connecting MCP does not make code suggestions better in general; it makes the subset of suggestions that depend on pipeline state grounded rather than guessed. For assistants used mostly for code completion, MCP changes very little. For assistants used for triage, optimisation, or any agentic workflow that touches CI, the difference is the gap between guessing and querying.

Where CI/CD Watch fits

CI/CD Watch, a CI/CD observability platform that monitors pipelines across GitHub Actions, GitLab CI, Bitbucket Pipelines, CircleCI, Azure DevOps, and Jenkins, ships the capabilities above as a hosted MCP server today. The web app is the team-facing surface; the MCP server and the CLI are the assistant-facing surfaces, reading the same normalised data.

The eight capabilities map one-to-one to the tools the server exposes: list-runs, list-connections, get-dora-metrics, get-costs, get-performance, list-audit-runs, get-audit-run, and list-audit-findings. Input and return shapes live in the MCP server reference; the CLI binary covers the same surface for agents that prefer it over a protocol.

The integration works on the Free tier so the wiring can be tested end to end. DORA metrics and performance bands are on Free; cost analysis, test-level stability detail, the full audit-findings payload, and bulk Public API access sit on the Team plan and above. For the per-assistant breakdown of how Copilot, Cursor, and Claude Code each use this surface, read CI/CD for Copilot, Cursor, and Claude Code.

Wire it up

Three steps: create an API key in the web app, add a JSON config block to your MCP client pointing at the hosted endpoint with the Bearer token, restart the client. Per-client file paths, key rotation, and tool signatures live in the MCP server reference.
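As an illustration of the second step, the Python sketch below renders a JSON config block. The `mcpServers` / `url` / `headers` layout is a shape that several MCP clients accept for remote servers, but the exact keys vary per client, so check the reference for yours. The endpoint URL and the server name `cicd` are placeholders, not real values.

```python
import json


def build_mcp_config(endpoint: str, api_key: str, server_name: str = "cicd") -> str:
    """Render a JSON config block for an MCP client that supports remote servers.

    The key layout is an assumption; some clients expect a different shape.
    """
    config = {
        "mcpServers": {
            server_name: {
                "url": endpoint,
                "headers": {"Authorization": f"Bearer {api_key}"},
            }
        }
    }
    return json.dumps(config, indent=2)
```

Calling `build_mcp_config("https://mcp.example.invalid/mcp", "<your-api-key>")` returns a block that can be pasted into the client's config file. Reading the key from an environment variable or a secrets manager rather than hard-coding it keeps it out of source control.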

The smallest useful test is one query the assistant could not have answered before. Create a Free-tier account, connect one provider, generate an API key, and paste the config into your client. The next time the assistant gets confused about a CI failure, ask it to call the audit-findings tool instead of guessing.

CI/CD Watch is built by 3CS Technologies Ltd, a UK consultancy that has run pipeline audits across regulated programmes and now runs the same engine inside the SaaS platform. The MCP server was the first integration surface we shipped after the web app, because we treat AI-assisted software development as a first-class consumer of CI/CD signals.