What You Will Learn

  • What GitHub Agentic Workflows are
  • How they differ from traditional GitHub Actions
  • Benefits for Claude Code users
  • What can be automated in a 200K-line SaaS project
  • Key factors in the adoption decision

What Are GitHub Agentic Workflows?

On February 13, 2026, GitHub released Agentic Workflows as a technical preview. Co-developed by GitHub Next, Microsoft Research, and Azure Core Upstream, the project is open source under the MIT license.

In short, it is a mechanism for automatically running AI coding agents on GitHub Actions.

Traditional GitHub Actions strictly define “when X happens, do Y” in YAML. Agentic Workflows write “when X happens, make this kind of judgment” in Markdown. The AI makes the judgment.

Traditional GitHub Actions:
  Event → YAML-defined steps → Deterministic execution

Agentic Workflows:
  Event → Markdown-described objectives → AI judges and executes

How It Works

Workflow Definition

Place Markdown files in .github/workflows/. Markdown, not YAML.

---
on:
  issues:
    types: [opened]
permissions:
  contents: read
  issues: write
safe-outputs:
  add-comment: true
  add-labels: true
engine: claude
---

## Issue Triage

When a new issue is created, analyze its content and apply appropriate labels.

## Criteria

- Bug report → `bug` label
- Feature request → `enhancement` label
- Question → `question` label
- Security-related → `security` label + raise priority

## Comments

Leave triage results as a comment.

The frontmatter specifies the trigger, permissions, and AI engine to use. The body describes “what you want done” in natural language.
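To make the two-part layout concrete, here is a minimal Python sketch that separates the frontmatter from the natural-language body. It is only an illustration of the file structure, not how gh aw compile is implemented; the function name split_workflow and the sample text are hypothetical.

```python
def split_workflow(text: str) -> tuple[str, str]:
    """Split a workflow Markdown file into (frontmatter, body).

    Assumption: the file starts with a line containing only '---' and the
    frontmatter ends at the next such line.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("workflow file must start with a '---' line")
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            frontmatter = "\n".join(lines[1:i])
            body = "\n".join(lines[i + 1:]).strip()
            return frontmatter, body
    raise ValueError("frontmatter is not closed by a second '---' line")


# Hypothetical, shortened workflow file used only for illustration.
EXAMPLE = """---
permissions:
  contents: read
engine: claude
---

## Issue Triage

When a new issue is created, analyze its content and apply appropriate labels.
"""

frontmatter, body = split_workflow(EXAMPLE)
print(frontmatter)  # the configuration part
print(body)         # the instructions handed to the AI engine
```

The frontmatter is the part a machine has to interpret strictly; the body is free-form text for the agent.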

Compilation and Execution

# Compile with CLI (generates lock.yml from Markdown)
gh aw compile

# Manual trigger
gh aw run

gh aw compile parses the Markdown and generates a .lock.yml for GitHub Actions. This lock file is the actual workflow that runs. The Markdown is the human-readable specification; the lock.yml is the machine-executable procedure.
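Because the lock file is what actually runs, a Markdown file that was never compiled does nothing. The Python sketch below lists such files. It assumes that a workflow named triage.md compiles to triage.lock.yml in the same directory; that naming rule and both function names are assumptions for illustration, so check them against the files gh aw compile produces in your repository.

```python
from pathlib import Path


def lock_path_for(markdown_path: Path) -> Path:
    """Return the assumed lock-file path next to a workflow Markdown file."""
    # Assumption: 'triage.md' compiles to 'triage.lock.yml' in the same directory.
    return markdown_path.with_name(markdown_path.stem + ".lock.yml")


def workflows_missing_lock(workflows_dir: Path) -> list[str]:
    """List workflow Markdown files that have no compiled lock file yet."""
    return sorted(
        md.name
        for md in workflows_dir.glob("*.md")
        if not lock_path_for(md).exists()
    )
```

Calling workflows_missing_lock(Path(".github/workflows")) before pushing would flag specifications that still need compiling.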

Available AI Engines

| Engine | Authentication | Notes |
| --- | --- | --- |
| Copilot CLI | Account auth tied to Copilot license | Default |
| Claude Code | ANTHROPIC_API_KEY | Requires Anthropic API key |
| OpenAI Codex | OPENAI_API_KEY | Requires OpenAI API key |

Because Claude Code can be selected as the engine, Agentic Workflows are a natural fit for developers already using Claude Code.
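The authentication requirements can be written down as a small lookup. The Python sketch below reports which API key variable is missing for a chosen engine. The identifier claude comes from the workflow example; codex is an assumed identifier, and the function name is hypothetical.

```python
from typing import Optional

# Engine identifier -> environment variable holding its API key.
# 'claude' appears in the workflow example; 'codex' is an assumed identifier.
# Copilot CLI is omitted because it authenticates through the account's
# Copilot license rather than an API key.
REQUIRED_KEY = {
    "claude": "ANTHROPIC_API_KEY",
    "codex": "OPENAI_API_KEY",
}


def missing_credential(engine: str, env: dict[str, str]) -> Optional[str]:
    """Return the name of the missing API key variable, or None if satisfied."""
    variable = REQUIRED_KEY.get(engine)
    if variable is None:
        return None  # this sketch treats the engine as not needing an API key
    return None if env.get(variable) else variable
```

Passing dict(os.environ) as env checks the current environment; in GitHub Actions the key would normally be supplied as a repository secret.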

Differences from Traditional GitHub Actions

| Aspect | GitHub Actions (YAML) | Agentic Workflows (Markdown) |
| --- | --- | --- |
| Definition | YAML (strict syntax) | Markdown (natural language) |
| Execution nature | Deterministic (same input → same output) | Non-deterministic (AI judges) |
| Best suited for | Builds, tests, deploys | Triage, reviews, reports |
| Permissions | Specified in workflow definition | Read-only by default + safe outputs |
| Error handling | Explicitly defined | AI judges |

The important point is that Agentic Workflows are not a replacement for CI/CD. GitHub’s official blog states this explicitly:

Don’t use agentic workflows as a replacement for GitHub Actions YAML workflows for CI/CD.

Builds, tests, and deploys remain with traditional YAML workflows. Agentic Workflows handle “ambiguous tasks” requiring AI judgment. Under the concept of “Continuous AI,” they complement existing CI/CD.

Considering Application to the Saru Project

Current Automation

Saru already has the following automated via GitHub Actions:

| Workflow | Purpose | Type |
| --- | --- | --- |
| build-apis.yml | Go lint, unit tests, integration tests | CI (YAML) |
| build-portals.yml | Frontend type-check, lint, build | CI (YAML) |
| e2e-tests.yml | E2E tests for all portals | CI (YAML) |
| security-scan.yml | gosec, npm audit | CI (YAML) |
| cross-post.yml | Blog cross-posting to platforms | CD (YAML) |

These are all deterministic processes — no reason to replace them with Agentic Workflows.

What Could Be Automated with Agentic Workflows

So what can be automated? Let me identify “manual tasks that require judgment.”

1. Automatic Issue Triage

Current state: After creating an issue, I manually add labels and set priority. As a solo developer, I’m the only one doing this.

With Agentic Workflows: Trigger on issue creation to automatically analyze content, apply labels, set priority, and identify related files.

Assessment: Low impact for solo development. Little need to triage issues I wrote myself. Would be effective once the project goes OSS and external issues increase.

2. Automatic CI Failure Investigation

Current state: When CI fails, I read logs, investigate the cause, and fix it. As covered in Part 7, CI stabilization required enormous effort.

With Agentic Workflows: Trigger on CI failure to analyze logs, identify root causes, and automatically create fix PRs.

Assessment: The most compelling use case, especially for flaky E2E test failures, where identifying the root cause takes time. Even having AI do only the initial investigation would save significant time.
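One check an initial investigation could start with is whether a failure is flaky at all. The Python sketch below uses a common heuristic: a test that both passed and failed on the same commit is treated as flaky. This heuristic and the RunResult type are assumptions for illustration, not something Agentic Workflows provide.

```python
from collections import defaultdict
from typing import NamedTuple


class RunResult(NamedTuple):
    test: str     # test name
    commit: str   # commit SHA the run executed against
    passed: bool  # outcome of this single run


def flaky_tests(runs: list[RunResult]) -> list[str]:
    """Return tests that both passed and failed on the same commit."""
    outcomes: dict[tuple[str, str], set[bool]] = defaultdict(set)
    for run in runs:
        outcomes[(run.test, run.commit)].add(run.passed)
    # Mixed outcomes on identical code point at the test or environment,
    # not at the change under test.
    return sorted(
        {test for (test, _), seen in outcomes.items() if seen == {True, False}}
    )
```

A test flagged this way is a candidate for retry or quarantine, while a test that fails consistently on one commit is more likely a real regression worth a fix PR.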

3. Automatic Dependabot PR Triage

Current state: When Dependabot PRs pile up, I review each one individually before merging.

With Agentic Workflows: Trigger on Dependabot PRs to review changes and make judgments: “patch version + tests pass → auto-merge,” “major version → add needs-manual-review label.”

Assessment: Effective. Dependabot PR handling is monotonous yet requires judgment — exactly what Agentic Workflows excel at.
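The two rules quoted above can be written as a small decision function. The Python sketch below assumes plain MAJOR.MINOR.PATCH version strings (pre-release suffixes are not handled) and sends every case the rules do not cover, such as minor bumps, to manual review. That fallback and the function names are my assumptions.

```python
def bump_kind(old: str, new: str) -> str:
    """Classify a version change as 'major', 'minor', 'patch', or 'none'."""
    # Assumption: versions look like '1.2.3' or 'v1.2.3'.
    old_parts = [int(p) for p in old.lstrip("v").split(".")[:3]]
    new_parts = [int(p) for p in new.lstrip("v").split(".")[:3]]
    for kind, a, b in zip(("major", "minor", "patch"), old_parts, new_parts):
        if a != b:
            return kind
    return "none"


def triage(old: str, new: str, tests_pass: bool) -> str:
    """Return 'auto-merge' or 'needs-manual-review' for a dependency update."""
    if bump_kind(old, new) == "patch" and tests_pass:
        return "auto-merge"
    # Major bumps come from the rule above; everything else is a
    # conservative fallback chosen for this sketch.
    return "needs-manual-review"
```

For example, triage("1.2.3", "1.2.4", tests_pass=True) yields auto-merge, while a bump to 2.0.0 is always routed to manual review.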

4. Daily Status Report

Current state: None. Development status exists only in my head.

With Agentic Workflows: Auto-generate reports on daily issue/PR status, CI health, and outstanding items.

Assessment: Overkill for solo development. Would be effective for team development or when the project has OSS contributors.

Application Summary

| Use Case | Impact | Priority |
| --- | --- | --- |
| CI failure investigation | High | |
| Dependabot PR triage | Medium | |
| Issue triage | Low (solo phase) | |
| Daily status report | Low (solo phase) | |

Concerns About Adoption

1. Cost

Every Agentic Workflow run consumes billable AI engine usage:

  • Copilot: ~2 premium requests per execution (agent execution + safe outputs)
  • Claude Code: API billing via ANTHROPIC_API_KEY
  • Codex: API billing via OPENAI_API_KEY

If AI runs on every CI failure, monthly costs become unpredictable. E2E tests especially have many jobs, so failure frequency × API cost must be estimated.
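The estimate is a simple product, sketched here in Python. Every number passed in is a placeholder to be replaced with measured values; only the figure of roughly 2 premium requests per Copilot execution comes from the list above.

```python
def monthly_cost(failures_per_month: int,
                 agent_runs_per_failure: int,
                 cost_per_run: float) -> float:
    """Estimated monthly API spend: failure frequency x runs x cost per run."""
    return failures_per_month * agent_runs_per_failure * cost_per_run


def copilot_premium_requests(executions: int, requests_per_execution: int = 2) -> int:
    """Premium requests consumed, using the ~2 per execution figure above."""
    return executions * requests_per_execution


# Placeholder inputs, not measurements: 40 failed CI runs per month,
# one agent run per failure, 0.50 (in your billing currency) per agent run.
print(monthly_cost(40, 1, 0.50))
print(copilot_premium_requests(40))
```

If each failed E2E job triggers its own agent run, agent_runs_per_failure grows with the number of jobs, which is why the job count matters as much as the failure rate.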

2. Technical Preview Instability

As of February 2026, it’s still a technical preview. GitHub’s official documentation explicitly states “at your own risk.” It is too early to integrate it into production CI/CD pipelines.

Documentation is still developing — details around Markdown frontmatter specifications and engine configuration require some trial-and-error exploration.

3. Trust in Non-Deterministic Execution

In the CI/CD world, “same input → same output” is a fundamental principle. Agentic Workflows are inherently non-deterministic — AI judgment may differ each time.

Safe outputs and read-only defaults provide safety margins, but handling cases like “AI applied the wrong label” or “created an irrelevant fix PR” becomes necessary.
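One way to contain a wrong label is to check the AI's proposal against a fixed allowlist before anything is applied. The Python sketch below uses the four labels from the triage example; the function name and the idea of running such a check as a separate deterministic step are assumptions, not a feature of safe outputs.

```python
# Labels taken from the issue triage criteria earlier in this article.
ALLOWED_LABELS = {"bug", "enhancement", "question", "security"}


def filter_labels(proposed: list[str]) -> tuple[list[str], list[str]]:
    """Split AI-proposed labels into (accepted, rejected) lists."""
    accepted: list[str] = []
    rejected: list[str] = []
    for label in proposed:
        normalized = label.strip().lower()
        if normalized in ALLOWED_LABELS:
            accepted.append(normalized)
        else:
            rejected.append(normalized)
    return accepted, rejected
```

Rejected labels can be logged for review rather than applied, so an unexpected judgment surfaces as a report instead of a change to the issue.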

4. Compatibility with Self-Hosted Runners

Saru runs parallel E2E tests on 15 self-hosted runners. Whether Agentic Workflows function correctly on self-hosted runners is unverified. Official documentation mostly assumes GitHub-hosted runners.

5. Coexistence with Claude Code CLI

This is the most important consideration. Saru already uses Claude Code CLI locally for development. If Claude Code also runs automatically on GitHub, clear role separation becomes essential:

Local development:
  Human + Claude Code CLI → Code implementation, test creation

On GitHub:
  Copilot → PR review (already in use)
  Agentic Workflows → CI failure investigation, triage (under consideration)

Multiple AIs operating on the same repository with different contexts requires clearly defined roles to avoid confusion.

Next Steps

This article stays at the investigation and evaluation level. In the next article, I plan to actually implement Agentic Workflows in the Saru repository and verify:

  • Building a CI failure auto-investigation workflow
  • Execution with the Claude Code engine
  • Operation on self-hosted runners
  • Actual cost measurement

Summary

| Item | Detail |
| --- | --- |
| What are GitHub Agentic Workflows | A mechanism for auto-running AI agents on GitHub Actions |
| Definition method | Natural language in Markdown, not YAML |
| AI engines | Copilot CLI / Claude Code / OpenAI Codex |
| Relationship with CI/CD | Complement, not replacement (Continuous AI) |
| Effective use cases for solo dev | CI failure investigation, Dependabot PR triage |
| Current judgment | Worth evaluating, but too early for production given technical preview status |

Series Articles