Autonomous AI Agents in CI/CD — OpenCode Headless Mode

Imagine you're on call at 2am and your pager goes off because the nightly deployment just broke something. You open your laptop half-awake, bracing yourself for the usual hour of SSH sessions, kubectl commands, scrolling through journals, and cross-referencing recent PRs.

Except someone already did all of that. Logs checked, cluster inspected, recent changes correlated, and a root cause analysis (RCA) already waiting for you in Slack. That someone is a headless AI agent.

Most AI coding tools are interactive: you sit in a terminal or IDE, type a question, get an answer, type another. The human is always in the loop. But what if the AI could just run on its own? No terminal, no human waiting, just a task, access to tools, and the autonomy to figure it out.

That's what headless AI agents are: programs that reason, use tools, and complete tasks triggered by events in your system, not by a person typing. This post is about how to build them using OpenCode's headless mode and MCP servers.

What is OpenCode?

OpenCode is an open-source AI coding tool that runs in the terminal. Interactive mode is like pair programming: you chat, it edits files, runs commands, and so on.

Once you start using these tools, you realise they are not limited to coding. They can act as a single console for most of your work: managing tickets, writing code, deploying, debugging, understanding architecture, managing infra, and more.

But the important feature here is opencode run:

opencode run --model github-copilot/claude-sonnet-4.6 "Your task here"

One command and no interactive UI. The agent receives the prompt, connects to all configured tool servers, runs an autonomous reasoning loop (think → use tool → read result → think again), and exits when done.

This means you can put an AI agent anywhere you can run a shell command: GitHub Actions, Jenkins, GitLab CI, cron jobs, webhooks, Makefiles. Anywhere! (Terms and conditions apply: your LLM provider must be reachable ;))

What is MCP?

MCP — Model Context Protocol — is an open standard for connecting AI models to external tools.

MCP servers expose "tools" over a transport such as stdio or HTTP. The model sees the available tools, decides which to call, passes parameters, and receives results. Then it reasons about the results and decides what to do next.

Examples of MCP tools:

issue_manager: create, read, update, and delete Jira tickets
query_database: run a SQL query
query_actions: read GitHub Actions runs

MCP is the bridge between "AI that thinks" and "AI that acts".
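To make that exchange concrete, here is a minimal sketch in plain Python. It is not the MCP wire protocol or any SDK: the registry, the `tool` decorator, `list_tools`, `call_tool`, and the stubbed tool bodies are all hypothetical names, used only to illustrate "the model sees tools, picks one, passes parameters, gets a result".

```python
import json
from typing import Any, Callable, Dict

# Hypothetical in-process registry standing in for what a tool server advertises.
TOOLS: Dict[str, Dict[str, Any]] = {}


def tool(name: str, description: str) -> Callable:
    """Register a function as a named tool with a human-readable description."""
    def decorator(func: Callable) -> Callable:
        TOOLS[name] = {"description": description, "handler": func}
        return func
    return decorator


@tool("query_database", "Run a SQL query (stubbed: returns a canned row)")
def query_database(sql: str) -> list:
    return [{"sql": sql, "rows": 0}]


@tool("query_actions", "Read a GitHub Actions run (stubbed)")
def query_actions(run_id: int) -> dict:
    return {"run_id": run_id, "conclusion": "failure"}


def list_tools() -> list:
    """What the model 'sees': tool names and descriptions, not the code."""
    return [{"name": n, "description": t["description"]} for n, t in TOOLS.items()]


def call_tool(request_json: str) -> str:
    """Dispatch a model-issued request like {"tool": ..., "arguments": {...}}."""
    request = json.loads(request_json)
    entry = TOOLS.get(request["tool"])
    if entry is None:
        return json.dumps({"error": f"unknown tool: {request['tool']}"})
    result = entry["handler"](**request.get("arguments", {}))
    return json.dumps({"result": result})
```

The model never runs code itself. It only emits a request naming a tool and its arguments, and the server decides whether and how to execute it. That separation is what later makes tool-layer safety possible.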

Combining them: Autonomous agentic workflows

When you combine OpenCode headless with MCP servers, you get something interesting: autonomous workflows that can interact with real systems, reason about what they find, and produce actionable output.

The flow looks like this:

Event (CI failure, cron, webhook, PR opened, stage failure in pipeline)
↓
opencode run --model <model> "task description"
↓
 → connects to MCP Server A (e.g., GitHub)
 → connects to MCP Server B (e.g., target machine)
 → connects to MCP Server C (e.g., DB query)
 → connects to MCP Server D (e.g., Jira)
↓
Agent loop: think → call tool → read result → think → call tool → ⟳
↓
Output (file, Slack message, PR comment, ticket, report)

This is different from traditional automation where you write a bash script that runs a fixed sequence of commands. Here, the "script" is a goal — the agent figures out the steps.
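The loop itself is small. The sketch below is an illustration under stated assumptions, not OpenCode's implementation: `run_agent`, the `Decision` tuple, `scripted_decide`, and `FAKE_TOOLS` are hypothetical, and a scripted function stands in for the LLM so the example runs offline.

```python
from typing import Callable, Dict, List, Tuple

# A decision is either ("call", tool_name, argument) or ("finish", summary, "").
Decision = Tuple[str, str, str]


def run_agent(goal: str,
              decide: Callable[[str, List[str]], Decision],
              tools: Dict[str, Callable[[str], str]],
              max_steps: int = 10) -> str:
    """Loop: think -> call tool -> read result, until the model finishes."""
    observations: List[str] = []
    for _ in range(max_steps):
        kind, name, arg = decide(goal, observations)        # think
        if kind == "finish":
            return name                                      # final output
        result = tools[name](arg)                            # call tool
        observations.append(f"{name}({arg}) -> {result}")    # read result
    raise RuntimeError(f"no answer after {max_steps} steps")


# Stand-in for the LLM: a scripted policy, used only so the sketch runs offline.
def scripted_decide(goal: str, observations: List[str]) -> Decision:
    if not observations:
        return ("call", "get_pods", "kube-system")
    if len(observations) == 1:
        return ("call", "read_journal", "etcd")
    return ("finish", f"{goal}: likely etcd, based on {len(observations)} checks", "")


# Canned tool outputs, invented for the example.
FAKE_TOOLS = {
    "get_pods": lambda ns: f"etcd-0 CrashLoopBackOff in {ns}",
    "read_journal": lambda unit: f"{unit}: disk quota exceeded",
}
```

Calling `run_agent("nightly failure", scripted_decide, FAKE_TOOLS)` walks two tool calls and then returns a summary. Note the `max_steps` cap: even in a toy loop, the step budget is what stops a model that never decides to finish.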

Main Agents: The ones that orchestrate your task

Main agents are the orchestrators. Each one owns a specific scenario and knows what to do and how to do it. When you have multiple independent workflows, like nightly failure triage, PR review, or compliance audits, you give each its own main agent rather than cramming everything into one. A good main agent definition includes:

Scope boundary: what this agent is responsible for, and what it's NOT. "You diagnose CI failures. You do NOT fix them or push code." Without this, agents wander.
Delegation rules: when to hand off. "Need code context, delegate to codebase agent. Need to check known issues, delegate to ticket agent."
Tool access: which MCP servers this agent needs.

They don't carry all the context upfront either. They pull in skills and delegate to sub-agents on the fly, based on what the task actually needs.

Skills: Domain knowledge as code

A model with tool access but no context is like giving a new hire root access on day one. They have the ability to do things but don't know what to look for.

Skills solve this. They're markdown files that get loaded into the agent's context, teaching it domain-specific knowledge:

What your system looks like when it's healthy
Common failure patterns and their root causes
Which logs or commands to check first for specific symptoms
How your deployment pipeline works
What the naming conventions are

This is the highest-leverage thing you can do. A mid-tier model with good skills can outperform a frontier model with no context. Encode your team's expertise as markdown.

Sub-agents: Delegation

For complex tasks, the main agent can delegate to specialist sub-agents:

A codebase agent that knows how to search repositories
A ticket agent that looks up related issues in your tracker
A documentation agent that searches your wiki
A debugging agent that has the environment context to debug

Like a senior engineer who knows when to ask someone else instead of figuring everything out alone.

So the layout of your .md files would look something like:

.opencode/
├── skills/
│   ├── platform-knowledge/SKILL.md # Architecture, services
│   ├── debugging/SKILL.md          # What to check
│   ├── ci-workflows/SKILL.md       # How your pipeline works
│   └── pr-correlation/SKILL.md     # Trace failures to changes
├── agents/
│   ├── codebase-agent.md           # Searches repositories
│   ├── ticket-agent.md             # Looks up related issues
│   ├── docs-agent.md               # Searches your wiki
│   └── debug-agent.md              # env context to debug
└── agent/
    └── nightly-check.md            # Constraints and rules
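How OpenCode discovers and loads these files is not covered here, so treat the following as a sketch of the idea only: skills are plain markdown that ends up in the agent's context. The `load_skills` helper and the `## Skill:` section header are hypothetical. Only the `skills/<name>/SKILL.md` layout comes from the tree above.

```python
from pathlib import Path
from typing import List, Optional


def load_skills(root: Path, wanted: Optional[List[str]] = None) -> str:
    """Concatenate SKILL.md files under root/skills into one context string.

    `wanted` optionally restricts loading to the named skill directories,
    mirroring the idea of pulling in only what the task needs.
    """
    sections = []
    for skill_file in sorted((root / "skills").glob("*/SKILL.md")):
        name = skill_file.parent.name
        if wanted is not None and name not in wanted:
            continue
        body = skill_file.read_text(encoding="utf-8").strip()
        sections.append(f"## Skill: {name}\n{body}")
    return "\n\n".join(sections)
```

For example, `load_skills(Path(".opencode"), wanted=["debugging"])` would return only the debugging skill's text, which keeps the context small when the task is narrow.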

Building a custom MCP server

Off-the-shelf MCP servers exist for GitHub, Jira, databases, etc. But for interacting with your own infrastructure, you'll likely build a custom one.

Ours is a Python application in a Docker container that runs on the target machine:

docker run -d \
  --name mcp_server \
  -p 8080:8080 \
  -e MCP_POLICY=secure \
  -v /var/run/docker.sock:/var/run/docker.sock:ro \
  -v /usr/bin/kubectl:/usr/local/bin/kubectl:ro \
  -v /root/.kube/config:/root/.kube/config:ro \
  -v /var/log/journal:/var/log/journal:ro \
  mcp_server:latest

It exposes tools: execute_command, read_file, list_directory, get_system_info.

The important design choice: command safety classification and safe mounts.

Every command is classified as read-only or destructive:

Read-only (kubectl get pods, docker ps, df -h) → executes immediately
Destructive (kubectl delete, docker stop, rm) → requires explicit confirmation

In headless mode, no human is there to confirm, so destructive commands are never approved and never run. The agent is limited to observation. Safety is enforced at the tool layer, not just in the prompt.
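A classifier like this can be sketched in a few lines. This is not our server's actual code: the allowlist entries, the `classify` and `execute_command` signatures, and the return values are assumptions made for illustration. The sketch fails closed, so anything it does not recognise counts as destructive.

```python
import shlex
from typing import Tuple

# Assumed allowlist: (binary, subcommand) pairs treated as read-only.
# An empty subcommand means the binary is read-only with any arguments.
READ_ONLY = {
    ("kubectl", "get"), ("kubectl", "describe"), ("kubectl", "logs"),
    ("docker", "ps"), ("docker", "logs"), ("docker", "inspect"),
    ("df", ""),
}
# Shell metacharacters could smuggle a second command past the classifier.
SHELL_META = set(";|&`$><\n")


def classify(command: str) -> str:
    """Return 'read_only' or 'destructive'. Unknown commands are destructive."""
    if any(ch in SHELL_META for ch in command):
        return "destructive"
    try:
        parts = shlex.split(command)
    except ValueError:          # unbalanced quotes and similar
        return "destructive"
    if not parts:
        return "destructive"
    binary = parts[0]
    sub = parts[1] if len(parts) > 1 else ""
    if (binary, "") in READ_ONLY or (binary, sub) in READ_ONLY:
        return "read_only"
    return "destructive"


def execute_command(command: str, confirmed: bool = False) -> Tuple[bool, str]:
    """Gate only; actually running the command is left out of this sketch."""
    if classify(command) == "destructive" and not confirmed:
        return (False, "blocked: destructive command needs confirmation")
    return (True, "allowed")
```

An allowlist is deliberately conservative: `kubectl -n prod get pods` is rejected here because the second token is a flag, not a known subcommand. For an unattended agent, a false "destructive" costs one retry, while a false "read-only" can cost an outage.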

OpenCode reads opencode.json from the working directory:

{
  "mcp": {
    "target-system": { // custom MCP
      "type": "remote",
      "url": "http://localhost:8080/sse",
      "enabled": true
    }
  }
}

Safety: Defense in depth

Running AI agents with system access needs multiple safety layers:

System prompt: tells the agent it is in read-only, observation-only mode
MCP server policy: classifies commands and blocks destructive ones unless they are confirmed
Container isolation: on a live machine, filesystems are mounted read-only wherever possible
Network constraints: air-gapped or restricted environments limit the blast radius
Hard timeouts: the agent cannot run indefinitely and consume resources

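No single layer is the sole protection. If the prompt fails, the MCP server policy catches it. If both somehow fail, the read-only mounts on live machines still block writes to the mounted files. One caveat: a read-only bind mount of docker.sock protects only the socket file, not the Docker API behind it, so for Docker the command policy remains the real guard.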
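The hard-timeout layer is the easiest one to add when the agent is launched from a script rather than a CI step. A minimal sketch using Python's standard `subprocess` module follows. The helper names `run_with_deadline` and `opencode_cmd` are hypothetical, and the only OpenCode-specific part is the `opencode run --model <model> "task"` form shown earlier.

```python
import subprocess
from typing import List


def run_with_deadline(cmd: List[str], timeout_s: float) -> str:
    """Run cmd, returning stdout; raise if it fails or exceeds the deadline."""
    try:
        done = subprocess.run(cmd, capture_output=True, text=True,
                              timeout=timeout_s, check=True)
    except subprocess.TimeoutExpired as exc:
        # subprocess.run kills the child before re-raising TimeoutExpired.
        raise RuntimeError(f"agent exceeded {timeout_s}s: {cmd[0]}") from exc
    except subprocess.CalledProcessError as exc:
        raise RuntimeError(f"agent exited with {exc.returncode}") from exc
    return done.stdout


def opencode_cmd(model: str, task: str) -> List[str]:
    """Build the argv for the `opencode run` form shown earlier in this post."""
    return ["opencode", "run", "--model", model, task]
```

Passing the command as a list avoids a shell entirely, so nothing in the task text is interpreted as shell syntax. A call such as `run_with_deadline(opencode_cmd(model, task), 15 * 60)` then either returns the agent's output or raises, which is the "fail loudly" behaviour you want from an unattended job.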

Example: How we use it

We operate air-gapped Linux-based Kubernetes clusters. Our nightly CI pipeline installs/upgrades the platform and runs 200+ automated tests. When things fail, an AI analysis job triggers automatically.

The workflow:

Failure detected → analysis workflow triggers
MCP server deployed on target VM
SSH tunnel from CI runner to VM's MCP port
PR changelog generated (all PRs merged since last stable)
opencode run executes with a diagnostic prompt
Agent runs kubectl, reads journals, checks containers, correlates with PRs
Summary posted as Slack thread reply to the original failure notification

- name: Run Analyzer
  timeout-minutes: 15
  run: |
    opencode run --model github-copilot/claude-sonnet-4.6 \
      "Investigate the failure on the connected VM.
       Correlate with PRs in reports/prs_summary.json."

The on-call engineer wakes up to a pre-triaged finding instead of a raw failure notification. Even when the agent is wrong (about 30% of the time), it narrows the search space enough to cut triage time in half.

The agent config lives in a separate repository, cloned at runtime. We iterate on skills and prompts without touching CI YAML. Push a better skill → next pipeline run uses it.

Key lessons

Timeouts are mandatory. LLM agents can get stuck in loops, so always set one. If the agent hasn't converged in 15 minutes, fail loudly.

Skills > model size. Encode "what would a senior engineer check first?" as markdown. This is your highest ROI investment.

Decouple agent config from CI config. You'll iterate on prompts 10x faster than workflow YAML. Separate repos, runtime clone.

Wrong answers still help. "The agent thinks it's etcd-related" is better than "something failed somewhere." It gives the human a starting point.

Correlation is the real value. "System is broken" is obvious. "PR #1432 broke it because ..." is actionable. Always give the agent context about recent changes.
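Preparing that context can be a small pre-processing step before the agent runs. The sketch below assumes a hypothetical PR record shape (`number`, `merged_at` as an ISO 8601 string with an explicit UTC offset, and `files`). It is not the actual format of our reports/prs_summary.json.

```python
from datetime import datetime
from typing import Dict, List


def prs_since(prs: List[Dict], last_stable: str) -> List[Dict]:
    """Keep PRs merged strictly after the last stable build, oldest first."""
    cutoff = datetime.fromisoformat(last_stable)
    recent = [p for p in prs if datetime.fromisoformat(p["merged_at"]) > cutoff]
    return sorted(recent, key=lambda p: datetime.fromisoformat(p["merged_at"]))


def suspects(prs: List[Dict], failing_path: str) -> List[int]:
    """PR numbers that touched files under the failing component's path."""
    return [p["number"] for p in prs
            if any(f.startswith(failing_path) for f in p["files"])]
```

The filtered list can be written out with `json.dump` and referenced from the prompt. A path-prefix match is a crude heuristic, but it gives the agent a short list of candidates to check against what it finds on the machine instead of the full merge history.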

Use cases I can think of where headless agents make sense

PagerDuty incident response pre-work: By the time the on-call opens their laptop, relevant logs, metrics, and recent changes are already collected.
PR review with live system context: Not just static code review — the agent checks if the proposed change conflicts with actual system state.
Scheduled compliance audits: Check CIS benchmarks, network policies, secret rotation, certificate expiry.
Release documentation: Not just PR titles — actual summaries of what changed, why, and what it affects.
Database migration validation: After a migration runs, the agent checks schema, data integrity, and query plans, then reports issues.

Getting started

Install OpenCode: curl -fsSL https://opencode.ai/install | bash
Configure MCP servers in opencode.json: start with the GitHub MCP server, add custom ones as needed
Write skills: markdown files describing your system, common failures, and what to check
Build or reuse an MCP server: the MCP specification is open, and a minimal server is ~200 lines of Python
Add to workflows: opencode run --model <model> "your task"

Your pipelines can now talk to external infra and reason about what they find. That changes what's automatable.

In the end, you are just giving hands to your AI models. Be sure you're not handing over a loaded AK-47.
