A repo-agnostic harness for the Claude Agent SDK and Codex SDK.
Drop a harness.config.json into any repo. The harness reads your plans, drives a coding agent through implement → review → fix → ship, and migrates the plan file when the PR merges. No CLI subprocesses. No direct API calls. Just the official SDKs, in-process.
[implement] thread.started
[implement] turn.started
[implement] command.executed: $ rg -n "TODO" src/
[implement] file.changed: [completed] add src/feature.ts
[implement] file.changed: [completed] update src/index.ts
[implement] turn.completed (104680/1095 tok)
{
"ok": true,
"planId": "0001",
"branch": "task/0001-feature",
"phasesRun": ["implement"],
"tokensTotal": { "inputTokens": 104680, ... }
}
What it is
Every path — plan dir, app paths, dev server, e2e command, base branch — is read from harness.config.{json,ts} in your target repo. The harness has no opinion about your layout.
Every harness role — implementer, reviewer, planner, fidelity-checker — runs through @harness/agent-runner, which wraps the official Claude Agent SDK and Codex SDK. No claude / codex CLI subprocesses. No direct OpenAI or Anthropic API calls.
The 1,375-line bash runner is gone. The orchestrator daemon calls runTask() directly. Logs, token tracking, rate-limit detection — all typed, all visible.
How it works
A plan moves through phases. Each phase is its own agent role: a different prompt, a different tool permission set, a different convergence criterion.
┌──────── plan (planDir/NNNN-*.md) ────────┐
│ │
▼ │
┌──────────┐ ┌────────┐ ┌─────┐ ┌────────────┐
│implement │───▶│ review │───▶│ fix │───▶│ preflight │┘
└──────────┘ └────────┘ ◀──┴─────┘ └────────────┘
│ "No blocking findings."
▼
┌─────────────┐ ┌──────────┐
│ prepare-pr │───▶│ open PR │
└─────────────┘ └──────────┘
│
▼
┌─────────────────┐
│ wait for CI │
│ + e2e-verify │
└─────────────────┘
│
▼
┌─────────────────┐
│ merge-readiness │
│ → label + merge │
└─────────────────┘
Implementer / fix / prepare-pr. Full tools, file edits, command execution. Bypasses tool prompts.
Reviewer / merge-checker / fidelity. Read-only repo access. Sentinels gate convergence.
Planner / site-reverse JSON output. Single-turn, no tools by default. Optional output schema.
Quick start
{
"agent": {
"provider": "claude",
"maxReviewPasses": 5
},
"appPaths": ["src", "app"],
"baseBranch": "main",
"devServer": {
"command": "npm run dev",
"url": "http://localhost:3000"
},
"e2e": {
"command": "npm run e2e",
"artifactDirs": [
"playwright-report",
"test-results"
]
}
}
# One-shot: drive a single plan HARNESS_TARGET_REPO=/path/to/your-app \ npm run harness -- run 0001 --phase all # Long-running: the daemon picks up # new plans + observes merges HARNESS_TARGET_REPO=/path/to/your-app \ npm run harness -- daemon start # Operator surface (port 4500) curl localhost:4500/state curl -X POST localhost:4500/freeze curl -X POST localhost:4500/resume
Both providers (claude and codex) implement all three modes. Switch by editing one config field; no code change in your harness or your target repo.
Why this shape
The harness should outlive any one model. Replacing the SDK is a config flag, not a rewrite. That forced the agent-runner abstraction.
The harness is its own product. Coupling it to one target repo's layout meant every fork rewrote half the runner. That forced harness.config.
Bash hid bugs behind exit codes and tee'd logs. Every weird daemon failure traced back to a string-parse on stdout. Porting to TypeScript made these typed and visible.
Reference
Design notes — north star, the loop, convergence rules, daemon architecture, lessons.
Full Zod schema, two complete examples (Next.js+Playwright, Codex+Python), env-var override table.
Three-mode contract (exec / review / complete), CompletionClient adapter, provider extension guide.
MIT licensed. TypeScript monorepo. Node 22+.