From 8e95a8714b0e6dd2310b04cd015f3589c8b88036 Mon Sep 17 00:00:00 2001
From: Christopher Tso <christso@gmail.com>
Date: Fri, 3 Jul 2026 13:55:55 +0200
Subject: [PATCH 01/10] docs: plan coding-agent target runtimes

---
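Note on the "SDK Rule" section added in this patch: the child-runner flow and its failure mapping can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not AgentV's implementation. The helper names `parse_ndjson` and `run_child`, the `type: result` marker for the final envelope, and the `status`/`reason` strings are all hypothetical. Process-group handling assumes a POSIX host.

```python
import json
import os
import signal
import subprocess


def parse_ndjson(stdout):
    """Split child stdout into (events, final_result, malformed)."""
    events, result, malformed = [], None, False
    for line in stdout.splitlines():
        if not line.strip():
            continue
        try:
            message = json.loads(line)
        except json.JSONDecodeError:
            malformed = True  # keep the events parsed so far as partial logs
            break
        # Assumption: the final envelope is tagged with type == "result".
        if isinstance(message, dict) and message.get("type") == "result":
            result = message
        else:
            events.append(message)
    return events, result, malformed


def run_child(argv, request, timeout_seconds=30.0):
    """Run one provider request in a child process and map failures to a target result."""
    proc = subprocess.Popen(
        argv,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
        start_new_session=True,  # child gets its own process group (POSIX only)
    )
    timed_out = False
    try:
        stdout, stderr = proc.communicate(json.dumps(request), timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        timed_out = True
        try:
            os.killpg(proc.pid, signal.SIGKILL)  # kill the whole child group
        except ProcessLookupError:
            pass  # child exited between the timeout and the kill
        stdout, stderr = proc.communicate()

    events, result, malformed = parse_ndjson(stdout)
    outcome = {
        "events": events,  # partial transcript survives every failure path
        "result": result,
        "stderr": stderr,
        "exit_code": proc.returncode,
    }
    if timed_out:
        outcome.update(status="timeout", reason="timeout")
    elif malformed:
        outcome.update(status="error", reason="malformed_output")
    elif proc.returncode != 0:
        outcome.update(status="error", reason="nonzero_exit")
    elif result is None:
        outcome.update(status="error", reason="missing_result")
    else:
        outcome.update(status="ok", reason=None)
    return outcome
```

Every branch returns a dictionary rather than raising, which mirrors the plan's rule that target crashes are target results and the parent still finalizes its run bundle.
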
 ...03-coding-agent-target-runtime-contract.md | 313 ++++++++++++++++++
 1 file changed, 313 insertions(+)
 create mode 100644 docs/plans/2026-07-03-coding-agent-target-runtime-contract.md

diff --git a/docs/plans/2026-07-03-coding-agent-target-runtime-contract.md b/docs/plans/2026-07-03-coding-agent-target-runtime-contract.md
new file mode 100644
index 000000000..17de8f900
--- /dev/null
+++ b/docs/plans/2026-07-03-coding-agent-target-runtime-contract.md
@@ -0,0 +1,313 @@
+---
+artifact_contract: ce-unified-plan/v1
+artifact_readiness: implementation-ready
+product_contract_source: av-vrx8-research
+execution: code
+title: "Coding-agent target runtime contract"
+created_at: 2026-07-03
+type: feature
+bead: av-y7eq
+---
+
+# Coding-agent target runtime contract
+
+## Goal Capsule
+
+- **Objective:** Make AgentV's coding-agent targets reliable by default while
+  preserving rich transcripts and local "run the agent I use" workflows.
+- **Core decision:** Target authoring uses the compact shape
+  `label` + `provider` + `runtime` + `config`. SDK-backed coding-agent
+  providers, when retained, default to internal process isolation rather than
+  importing risky agent SDKs in the AgentV orchestrator process.
+- **Primary Bead:** `av-y7eq`
+- **Implementation Beads:** `av-y7eq.1` through `av-y7eq.5`; existing SDK
+  subprocess follow-up `av-57i` / `av-57i.1`.
+- **Non-goal:** Do not replace AgentV with Promptfoo, Symphony, Kata, Margin, or
+  Vercel agent-eval. Borrow their proven boundaries and keep AgentV's
+  repo-native run bundle model.
+
+## Summary
+
+AgentV should treat coding-agent targets as external runtimes to orchestrate,
+not as libraries to call in-process by default. The default path should be
+subprocess, protocol, or sandbox based:
+
+- Codex: `codex-app-server` first for rich protocol control, `codex-cli` as the
+  simpler process-boundary path, `codex-sdk` explicit and internally isolated.
+- Pi: `pi-rpc` or `pi-cli` first, following Kata's `pi --mode rpc` pattern;
+  `pi-coding-agent`/`pi-sdk` explicit and internally isolated if retained.
+- Claude: `claude-cli` first; `claude-sdk` explicit and internally isolated if
+  retained. There is no separate Claude app-server/RPC surface identified.
+- Copilot: prefer CLI/session-log/process-boundary paths where possible;
+  `copilot-sdk` follows the same explicit SDK isolation rule.
+
+The target schema should not expose every implementation detail as a top-level
+field. Runtime placement is a single concept:
+
+```yaml
+targets:
+  - label: codex-local
+    provider: codex-app-server
+    runtime: host
+    config:
+      command: codex
+      model: gpt-5-codex
+```
+
+Expanded form is used only when needed:
+
+```yaml
+targets:
+  - label: codex-clean
+    provider: codex-cli
+    runtime:
+      mode: profile
+      home: .agentv/profiles/codex-clean
+    config:
+      command: codex
+      model: gpt-5-codex
+```
+
+```yaml
+targets:
+  - label: pi-rpc-local
+    provider: pi-rpc
+    runtime: host
+    config:
+      command: pi
+      model: gpt-5-codex
+```
+
+## Product Contract
+
+### Stable Fields
+
+| Field | Meaning |
+| --- | --- |
+| `label` | Human and result identity for the target. Used by CLI selection, run artifacts, Dashboard, and comparisons. |
+| `provider` | Adapter/control protocol kind: `codex-cli`, `codex-app-server`, `codex-sdk`, `pi-cli`, `pi-rpc`, `claude-cli`, `claude-sdk`, etc. |
+| `runtime` | Where and how the provider runs: `host`, `profile`, or `sandbox`. May be a string shorthand or an object with `mode`. |
+| `config` | Provider-specific configuration. Keep `model`, `command`, timeouts, permission flags, and provider knobs here. |
+
+Do not add competing top-level fields such as `isolation`, `sandbox`,
+`install`, `container`, `environment`, or `profile`. Those details live under
+`runtime` or `config` only when a provider needs them.
+
+### Runtime Modes
+
+| Runtime | Boundary | Use case |
+| --- | --- | --- |
+| `host` | User's installed runtime and normal config/auth/skills/plugins. | Local research and "evaluate the exact agent I use." |
+| `profile` | Host process execution with isolated home/config/env, such as `CODEX_HOME`, `HOME`, temp dirs, and explicit auth profile. | Cleaner local evals without full container cost. |
+| `sandbox` | Separate execution substrate such as Docker, Vercel Sandbox, remote worker, or another container/sandbox backend. | CI, reproducibility, untrusted tasks, stronger crash and filesystem containment. |
+
+A sandbox may contain an internal profile, but the top-level runtime remains
+`sandbox` because the execution substrate boundary is stronger than host-side
+config isolation.
+
+### SDK Rule
+
+SDK-backed coding-agent providers are allowed only as explicit provider kinds
+and should default to internal process isolation:
+
+```yaml
+targets:
+  - label: codex-sdk-isolated
+    provider: codex-sdk
+    runtime: host
+    config:
+      model: gpt-5-codex
+```
+
+The YAML should not need an opt-in such as `sdk_isolation: process` for the
+safe path. If AgentV cannot isolate an SDK provider yet, that provider should be
+documented as explicit/non-default or temporarily rejected with an actionable
+message.
+
+The parent AgentV process must not import the risky coding-agent SDK for the
+default safe path. Instead, use a provider child runner:
+
+```text
+AgentV parent
+  -> spawn child runner with target config + provider request JSON
+  <- NDJSON events/logs
+  <- one final ProviderResponse envelope
+  <- child exit status
+```
+
+Failure mapping:
+
+- child nonzero exit before result -> target error
+- malformed child JSON -> target error
+- timeout/cancel -> kill child process group, target timeout error
+- crash after partial transcript -> failed target result with partial logs
+- parent still finalizes `index.jsonl`, summaries, transcripts, and run bundle
+
+## External Pattern Mapping
+
+| Source | Relevant pattern | AgentV decision |
+| --- | --- | --- |
+| Promptfoo | Provider object uses `id`/`label`/`config`; Codex and Claude SDK providers put `model` in `config.model`; direct SDK adapters exist. | Keep `label`/`provider`/`config` ergonomics; keep `model` under `config`; do not make in-process SDK the default. |
+| OpenAI Symphony | Codex app-server subprocess with workspace/session orchestration, approval/sandbox policy, max-turn boundaries, and structured streaming/status. | Use `codex-app-server` as the preferred rich-control Codex provider. |
+| Kata Symphony | Pi is launched as `pi --mode rpc` locally or over SSH and controlled over stdio/RPC; workers must already have the runtime installed. | Add/prefer `pi-rpc` for rich Pi control; do not import Pi coding-agent SDK into AgentV's orchestrator. |
+| Vercel agent-eval | Installs agent CLIs inside ephemeral sandboxes and captures transcripts from CLI JSON/session logs. | `runtime.mode: sandbox` should support managed/pinned CLI install and transcript capture without host config bleed. |
+| Margin Evals | Runs cases in Docker, captures PTY/runtime/control logs, optional ATIF trajectory hooks. | Treat container/sandbox as runtime substrate and preserve logs/trajectories as run artifacts. |
+| SWE-bench | Applies predictions and runs tests inside Docker containers with logs, timeouts, and cleanup. | Keep container details under runtime/harness config, not target identity. |
+| DeepEval | Pytest/metric/tracing loop that coding agents can call, not a coding-agent target orchestrator. | Useful grader/eval-loop reference, not a target runtime model. |
+
+## Provider Contract
+
+### Codex
+
+Use explicit provider kinds:
+
+- `codex-cli`: spawn `codex exec` or a user shim. Capture stdout/stderr, JSONL
+  stream, exit code, final text, and raw logs.
+- `codex-app-server`: spawn `codex app-server` or a user shim plus app-server
+  args. Prefer for rich transcript, turn/session control, cancellation, and
+  structured JSON-RPC events.
+- `codex-sdk`: explicit SDK provider. Internally isolated in a child process if
+  retained.
+
+Do not add `codex-rpc` unless Codex exposes a distinct RPC mode separate from
+app-server. For Codex, app-server is the protocol provider.
+
+`config.command` is the executable or shim, not the provider identity:
+
+```yaml
+targets:
+  - label: codex-personal
+    provider: codex-cli
+    runtime: host
+    config:
+      command: codex-personal
+```
+
+### Pi
+
+Use explicit provider kinds:
+
+- `pi-cli`: simple Pi CLI subprocess and transcript capture.
+- `pi-rpc`: Kata-style protocol subprocess that launches `pi --mode rpc` and
+  controls it over stdio/RPC.
+- `pi-coding-agent` or `pi-sdk`: explicit SDK provider only; internally
+  isolated if retained.
+
+Keep `pi-ai` for plain LLM/model calls. Do not treat `pi-ai` as the coding-agent
+runtime boundary.
+
+### Claude
+
+Use explicit provider kinds:
+
+- `claude-cli`: default subprocess path using structured stream output.
+- `claude-sdk`: explicit SDK provider using `@anthropic-ai/claude-agent-sdk`,
+  internally isolated if retained.
+
+No separate Claude app-server/RPC provider has been identified. The CLI
+structured stream is the subprocess-first rich transcript path. Claude Agent SDK
+may spawn Claude Code internally, but importing the SDK in AgentV still creates
+an in-process adapter risk unless wrapped by a child runner.
+
+### Copilot
+
+Keep provider names explicit by control boundary:
+
+- `copilot-cli`: subprocess/protocol CLI path.
+- `copilot-log`: passive transcript/log replay path.
+- `copilot-sdk`: explicit SDK path, internally isolated if retained.
+
+## Implementation Units
+
+### U1. Target Schema And Docs (`av-y7eq.1`)
+
+- Add `runtime: host` shorthand and `runtime.mode: host | profile | sandbox`.
+- Keep `model` and `command` under `config`.
+- Preserve `label` as target identity and `provider` as adapter/backend kind.
+- Reject invalid runtime modes with focused validation errors.
+- Document why `runtime` is the umbrella field.
+
+### U2. Codex Host/Profile Providers (`av-y7eq.2`)
+
+- Split current ambiguous `codex` registry behavior into explicit
+  `codex-cli`, `codex-app-server`, and `codex-sdk`.
+- Make bare `codex`, if retained at all, alias to the chosen safe default
+  (`codex-app-server`) or reject it during the cleanup. It must not silently
+  select in-process SDK.
+- Support `config.command` shims such as `codex-personal` and `codex-eng`.
+- Implement host/profile environment construction, including deliberate
+  `HOME`, `CODEX_HOME`, temp dirs, and env allowlists for profile mode.
+
+### U3. Sandbox Runtime (`av-y7eq.3`)
+
+- Implement `runtime.mode: sandbox` using the existing or smallest viable
+  sandbox/container substrate.
+- Install or locate the target CLI inside the sandbox with pinned/configurable
+  inputs.
+- Mount only explicit workspace, result, and credential paths.
+- Preserve stdout/stderr/transcript artifacts and distinguish sandbox infra
+  failure from target task failure.
+
+### U4. SDK Provider Isolation (`av-y7eq.4`, `av-57i`, `av-57i.1`)
+
+- Move retained coding-agent SDK providers behind child-runner process
+  boundaries.
+- Start with Pi SDK isolation if that remains the quickest proof slice.
+- Generalize only after the first provider proves the protocol.
+- Do not install broad parent-process exception/EPIPE swallowing.
+
+### U5. Pi RPC Runtime (`av-y7eq.5`)
+
+- Add or document `pi-rpc` as the preferred rich-control Pi provider.
+- Launch `pi --mode rpc` through a process/stdio boundary.
+- Model remote execution after Kata only where AgentV needs it; worker
+  provisioning can remain explicit and out of scope for the first slice.
+- Keep `pi-coding-agent` SDK explicit/non-default.
+
+## Result And Artifact Requirements
+
+Every coding-agent provider must return or fail through a structured result
+envelope. AgentV must preserve:
+
+- target label, provider kind, runtime mode, command, cwd, and model
+- stdout/stderr logs
+- structured event transcript when available
+- final assistant output
+- tool/file-change events when available
+- timeout, cancellation, spawn failure, nonzero exit, malformed output, and
+  crash metadata
+- partial transcript/logs on failure
+
+Target crashes are target results. They must not become AgentV orchestrator
+crashes.
+
+## Open Questions
+
+- Whether to keep a bare `codex` alias at all. If kept, it should resolve to the
+  safe default, not SDK.
+- Whether to rename `pi-coding-agent` to `pi-sdk` during the major cleanup or
+  keep the existing provider name as an explicit legacy SDK provider.
+- Which sandbox substrate should be the first implementation target if existing
+  AgentV runner support is insufficient.
+- How much transcript normalization belongs in provider adapters versus a shared
+  transcript post-processor.
+
+## Validation Plan
+
+- Schema tests for `runtime` shorthand/object forms and invalid values.
+- Provider registry tests proving explicit provider names and safe aliases.
+- Codex CLI/app-server tests for command shims, host/profile env, timeout kill,
+  nonzero exit, malformed output, and transcript capture.
+- Pi RPC tests with a fake `pi --mode rpc` process.
+- SDK child-runner tests for success, child crash before result, child crash
+  after partial events, malformed JSON, timeout, and cancellation.
+- Docs/examples validation after examples are updated.
+- Live provider dogfood before implementation PRs are marked ready, per repo
+  verification rules.
+
+## Handoff
+
+Implementation workers should start with `av-y7eq.1` before provider changes so
+the normalized contract exists. `av-y7eq.2` and `av-y7eq.5` can then proceed in
+parallel for Codex and Pi subprocess/protocol providers. `av-y7eq.4` should
+coordinate with `av-57i.1` rather than creating a second SDK isolation design.

From ab9cb40b3adddfbf61d14fd6f1f3ce22672b981a Mon Sep 17 00:00:00 2001
From: Christopher Tso <christso@gmail.com>
Date: Fri, 3 Jul 2026 14:04:56 +0200
Subject: [PATCH 02/10] docs: preserve target config in runtime plan

---
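Note on the "Preserve Existing AgentV Surface" section added in this patch: the `config.command` / `config.args` normalization rule can be sketched as below. This is a minimal Python sketch under stated assumptions. The function name `normalize_command` is hypothetical, the `executable`/`binary` and `arguments` aliases are deliberately left out, and the sketch always rejects `config.args` alongside an argv-array command, whereas the plan allows a provider to define an unambiguous merge order instead.

```python
def normalize_command(config):
    """Return (executable, argv_tokens) for a coding-agent provider config."""
    command = config.get("command")
    args = config.get("args")

    if args is not None:
        if not isinstance(args, list) or not all(isinstance(t, str) for t in args):
            raise ValueError("config.args must be an array of strings")

    if isinstance(command, str):
        if not command.strip():
            raise ValueError("config.command must not be empty")
        # A string is one executable or shim name; it is never shell-split.
        executable, tokens = command, []
    elif isinstance(command, list):
        if not command or not all(isinstance(t, str) and t for t in command):
            raise ValueError("config.command array must contain non-empty strings")
        if args is not None:
            raise ValueError(
                "config.args is not accepted when config.command is an argv array"
            )
        # First token is the executable, the rest are extra arguments.
        executable, tokens = command[0], list(command[1:])
    else:
        raise ValueError("config.command must be a string or an argv array")

    if args is not None:
        tokens = tokens + list(args)
    return executable, tokens
```

Both authored forms in the plan's Codex examples reduce to the same pair, for example `("codex-eng", ["--model", "gpt-5-codex"])` for the argv-array form.
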
 ...03-coding-agent-target-runtime-contract.md | 63 ++++++++++++++++++-
 1 file changed, 61 insertions(+), 2 deletions(-)

diff --git a/docs/plans/2026-07-03-coding-agent-target-runtime-contract.md b/docs/plans/2026-07-03-coding-agent-target-runtime-contract.md
index 17de8f900..8352a5649 100644
--- a/docs/plans/2026-07-03-coding-agent-target-runtime-contract.md
+++ b/docs/plans/2026-07-03-coding-agent-target-runtime-contract.md
@@ -51,6 +51,7 @@ targets:
     runtime: host
     config:
       command: codex
+      args: ["--config", "model_reasoning_effort=high"]
       model: gpt-5-codex
 ```
 
@@ -65,6 +66,7 @@ targets:
       home: .agentv/profiles/codex-clean
     config:
       command: codex
+      args: ["--sandbox", "workspace-write"]
       model: gpt-5-codex
 ```
 
@@ -87,12 +89,57 @@ targets:
 | `label` | Human and result identity for the target. Used by CLI selection, run artifacts, Dashboard, and comparisons. |
 | `provider` | Adapter/control protocol kind: `codex-cli`, `codex-app-server`, `codex-sdk`, `pi-cli`, `pi-rpc`, `claude-cli`, `claude-sdk`, etc. |
 | `runtime` | Where and how the provider runs: `host`, `profile`, or `sandbox`. May be a string shorthand or an object with `mode`. |
-| `config` | Provider-specific configuration. Keep `model`, `command`, timeouts, permission flags, and provider knobs here. |
+| `config` | Provider-specific configuration. Keep `model`, `command`, `args`, timeouts, permission flags, and provider knobs here. |
 
 Do not add competing top-level fields such as `isolation`, `sandbox`,
 `install`, `container`, `environment`, or `profile`. Those details live under
 `runtime` or `config` only when a provider needs them.
 
+### Preserve Existing AgentV Surface
+
+This plan is a targeted provider-boundary cleanup, not a rewrite from Promptfoo
+or another framework. Preserve AgentV's current target capabilities unless an
+implementation Bead explicitly removes one.
+
+For coding-agent providers, `config.command` is the executable or shim identity,
+such as `codex`, `codex-personal`, `pi`, or an absolute binary path. It may
+also be a non-empty argv array where the first token is the executable and the
+remaining tokens are extra arguments. Normalize that form internally to
+executable plus argv tokens. `config.args` remains the explicit argv token array
+for extra provider-specific arguments. Keep the existing `executable`/`binary`
+compatibility aliases and the existing `args`/`arguments` array aliases during
+migration. Do not require users to pass shell-joined command strings for
+coding-agent providers.
+
+If `config.command` is an argv array, reject simultaneous `config.args` unless
+the provider defines an unambiguous merge order. This keeps command resolution
+predictable and avoids hidden shell parsing.
+
+The generic `provider: cli` path is different: it currently uses a command
+template string with placeholders such as `{PROMPT}` and healthcheck support.
+Keep that compatibility path intact while adding coding-agent-specific runtime
+boundaries.
+
+Also preserve the common and provider-specific knobs already used by AgentV:
+
+- common target behavior: `grader_target`, `fallback_targets`, `workers`,
+  `subagent_mode_allowed`, env interpolation, `cwd`, and `timeout_seconds`
+- artifact/log behavior: `stream_log`, `log_dir`/`log_directory`, stdout/stderr
+  capture, raw protocol events, and partial logs on failure
+- Codex knobs: `model`, `reasoning_effort`/`model_reasoning_effort`,
+  `model_verbosity`, `base_url`/`endpoint`, `api_key`, `api_format`,
+  `sandbox_mode`, `approval_policy`, and `system_prompt`
+- Pi knobs: `subprovider`, `model`/`pi_model`, `api_key`, `base_url`/`endpoint`,
+  `tools`/`pi_tools`, `thinking`/`pi_thinking`, `args`, and `system_prompt`
+- Claude knobs: `model`, `max_turns`, `max_budget_usd`,
+  `bypass_permissions`, and `system_prompt`
+- Copilot knobs: `model`, custom provider settings, GitHub token/auth knobs,
+  ACP/prompt execution behavior, `args`, and `system_prompt`
+
+Where the new normalized contract uses nested `config`, implement migration by
+normalizing the existing flat target fields into that internal shape. Do not
+drop existing accepted YAML fields as a side effect of adding `runtime`.
+
 ### Runtime Modes
 
 | Runtime | Boundary | Use case |
@@ -172,7 +219,9 @@ Use explicit provider kinds:
 Do not add `codex-rpc` unless Codex exposes a distinct RPC mode separate from
 app-server. For Codex, app-server is the protocol provider.
 
-`config.command` is the executable or shim, not the provider identity:
+`config.command` is the executable or shim, not the provider identity. Extra
+arguments may be supplied with `config.args` or, for compact argv-style input,
+as a command array:
 
 ```yaml
 targets:
@@ -181,6 +230,16 @@ targets:
     runtime: host
     config:
       command: codex-personal
+      args: ["--model", "gpt-5-codex"]
+```
+
+```yaml
+targets:
+  - label: codex-eng
+    provider: codex-cli
+    runtime: host
+    config:
+      command: ["codex-eng", "--model", "gpt-5-codex"]
 ```
 
 ### Pi

From 00fc889cdb3241ed3fedadd77c9ffbf17e59b3dd Mon Sep 17 00:00:00 2001
From: Christopher Tso <christso@gmail.com>
Date: Fri, 3 Jul 2026 14:06:13 +0200
Subject: [PATCH 03/10] docs: separate target runtime from scheduler policy

---
 ...2026-07-03-coding-agent-target-runtime-contract.md | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/docs/plans/2026-07-03-coding-agent-target-runtime-contract.md b/docs/plans/2026-07-03-coding-agent-target-runtime-contract.md
index 8352a5649..ac8c2098b 100644
--- a/docs/plans/2026-07-03-coding-agent-target-runtime-contract.md
+++ b/docs/plans/2026-07-03-coding-agent-target-runtime-contract.md
@@ -122,8 +122,8 @@ boundaries.
 
 Also preserve the common and provider-specific knobs already used by AgentV:
 
-- common target behavior: `grader_target`, `fallback_targets`, `workers`,
-  `subagent_mode_allowed`, env interpolation, `cwd`, and `timeout_seconds`
+- common target behavior: `use_target`, `grader_target`, `fallback_targets`,
+  env interpolation, `cwd`, and `timeout_seconds`
 - artifact/log behavior: `stream_log`, `log_dir`/`log_directory`, stdout/stderr
   capture, raw protocol events, and partial logs on failure
 - Codex knobs: `model`, `reasoning_effort`/`model_reasoning_effort`,
@@ -140,6 +140,13 @@ Where the new normalized contract uses nested `config`, implement migration by
 normalizing the existing flat target fields into that internal shape. Do not
 drop existing accepted YAML fields as a side effect of adding `runtime`.
 
+Do not promote orchestration scheduler fields into the new target runtime
+contract. `workers`, `batch_requests`, and `subagent_mode_allowed` are existing
+compatibility/runtime-policy fields, not part of the target's coding-agent
+control boundary. Continue to handle them where AgentV already accepts them,
+but prefer `--workers`, project `execution.workers`, `evaluate_options`, or
+runtime policy for new scheduling behavior.
+
 ### Runtime Modes
 
 | Runtime | Boundary | Use case |

From 9324c29d1682d53d1f6c063488f1337d168596dc Mon Sep 17 00:00:00 2001
From: Christopher Tso <christso@gmail.com>
Date: Fri, 3 Jul 2026 14:08:46 +0200
Subject: [PATCH 04/10] docs: clarify grader target routing

---
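Note on the grader routing paragraph added in this patch: the resolution order it lists can be sketched as a simple first-match lookup. This is a minimal Python sketch, and it rests on an assumption: it reads "strongest run-level override" as meaning the CLI flag also wins over a per-evaluator `target`. If AgentV's current behavior lets the per-evaluator value win, the first two entries swap. The function and parameter names are hypothetical.

```python
def resolve_grader_target(cli_grader_target=None,
                          evaluator_target=None,
                          target_grader_target=None):
    """Return (grader_target, source) using the first configured candidate."""
    candidates = (
        ("cli", cli_grader_target),          # --grader-target
        ("evaluator", evaluator_target),     # per-evaluator `target`
        ("target", target_grader_target),    # target-level `grader_target`
    )
    for source, value in candidates:
        if value:
            return value, source
    raise LookupError("no grader target configured for this evaluation")
```

Returning the source alongside the value makes it possible to record in run artifacts why a particular grader was chosen.
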
 ...03-coding-agent-target-runtime-contract.md | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/docs/plans/2026-07-03-coding-agent-target-runtime-contract.md b/docs/plans/2026-07-03-coding-agent-target-runtime-contract.md
index ac8c2098b..a29a63494 100644
--- a/docs/plans/2026-07-03-coding-agent-target-runtime-contract.md
+++ b/docs/plans/2026-07-03-coding-agent-target-runtime-contract.md
@@ -147,6 +147,25 @@ control boundary. Continue to handle them where AgentV already accepts them,
 but prefer `--workers`, project `execution.workers`, `evaluate_options`, or
 runtime policy for new scheduling behavior.
 
+`grader_target` is different. It is not a coding-agent runtime field, but the
+concept is not redundant: coding-agent targets usually cannot act as structured
+LLM graders, and AgentV workspaces often contain multiple LLM providers or
+endpoints. AgentV still needs a default grader target selection. Preserve the
+current resolution behavior while cleaning up provider runtimes:
+
+- CLI `--grader-target` is the strongest run-level override.
+- Per-evaluator `target` remains the specific grader override.
+- Target-level `grader_target` remains the compatibility/default grader for
+  that target until a clearer eval/project-level default is introduced.
+- If a new canonical default is added later, prefer a grader/eval policy field
+  such as `default_grader_target` over putting grader selection inside
+  `runtime` or coding-agent provider `config`.
+
+Promptfoo's comparable mechanism is assertion/test grading provider selection:
+assertions can set a `provider`, tests/defaultTest can provide fallback grading
+providers, and model-graded matchers fall back to type-specific default grading
+providers. It does not put grader selection in the target provider runtime.
+
 ### Runtime Modes
 
 | Runtime | Boundary | Use case |

From db5c39b26bba3582f5869437277b23cfff583a97 Mon Sep 17 00:00:00 2001
From: Christopher Tso <christso@gmail.com>
Date: Fri, 3 Jul 2026 14:16:04 +0200
Subject: [PATCH 05/10] docs: clarify target registry file layout

---
 ...03-coding-agent-target-runtime-contract.md | 45 +++++++++++++++++++
 1 file changed, 45 insertions(+)

diff --git a/docs/plans/2026-07-03-coding-agent-target-runtime-contract.md b/docs/plans/2026-07-03-coding-agent-target-runtime-contract.md
index a29a63494..204a45f40 100644
--- a/docs/plans/2026-07-03-coding-agent-target-runtime-contract.md
+++ b/docs/plans/2026-07-03-coding-agent-target-runtime-contract.md
@@ -166,6 +166,51 @@ assertions can set a `provider`, tests/defaultTest can provide fallback grading
 providers, and model-graded matchers fall back to type-specific default grading
 providers. It does not put grader selection in the target provider runtime.
 
+### Project File Layout
+
+Keep registries separate from policy:
+
+```text
+.agentv/
+  config.yaml
+  targets.yaml
+  graders.yaml
+```
+
+Project-local `.agentv/config.yaml` should remain the portable project policy
+file: defaults, `execution`, `eval_patterns`, `refs`, tags, result defaults, and
+other run-level settings. It may point at the default target/grader by name, but
+it should not become the registry that holds all target and grader definitions.
+
+`targets.yaml` should remain the registry of subjects under test. `graders.yaml`
+should be the registry of reusable grading providers. This keeps target runtime
+contracts reviewable, keeps grader credentials/endpoints separate from agent
+runtimes, and matches AgentV's existing artifact model where run manifests carry
+explicit `targets_path` and `graders_path` entries.
+
+The global `$AGENTV_HOME/config.yaml` is different: it owns Dashboard/operator
+state such as the `projects:` registry. Do not use the existence of global
+`projects:` as a reason to put project-local target/grader registries into
+project-local `.agentv/config.yaml`. If custom locations are needed, add
+project-local config pointers such as `targets_file` / `graders_file` rather
+than embedding both registries inline.
+
+Greenfield, the cleanest global shape would put Dashboard project registry
+state in `$AGENTV_HOME/projects.yaml` and leave `$AGENTV_HOME/config.yaml` for
+global settings. Current AgentV code and docs use `$AGENTV_HOME/config.yaml`
+with a top-level `projects:` registry, so do not migrate this as part of the
+coding-agent target-runtime work. If the team wants the cleaner split, create a
+separate migration Bead with backwards-compatible reading from current
+`projects:` locations and a clear write target.
+
+Promptfoo's comparable file-structure guidance is simpler: a main
+`promptfooconfig.yaml` commonly contains `providers`, `prompts`, `defaultTest`,
+and `tests`, while larger configs can reference external files such as provider
+YAML with `file://...`. Promptfoo does not have AgentV's separate home-scoped
+Dashboard project registry, so it is useful as a modular-config reference but
+not a direct reason to collapse AgentV's project, target, and grader registries
+into one file.
+
 ### Runtime Modes
 
 | Runtime | Boundary | Use case |

From d1e42e658f067b5034a8b9ecd8d12f8317d5879a Mon Sep 17 00:00:00 2001
From: Christopher Tso <christso@gmail.com>
Date: Fri, 3 Jul 2026 14:19:54 +0200
Subject: [PATCH 06/10] docs: align config references with promptfoo style

---
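Note on the field-reference loading described in this patch: resolving a `file://` reference and accepting both the bare and wrapped registry shapes can be sketched as below. This is a minimal Python sketch under stated assumptions. The function name `resolve_field` is hypothetical, references are assumed to resolve relative to the directory of the config file, and JSON parsing stands in for a YAML parser so the example needs only the standard library.

```python
import json
from pathlib import Path

FILE_PREFIX = "file://"


def resolve_field(field, value, base_dir, parse=json.loads):
    """Resolve a config field that is either inline or a file:// reference."""
    if isinstance(value, str) and value.startswith(FILE_PREFIX):
        # Assumption: the path is relative to the directory holding config.yaml.
        path = Path(base_dir) / value[len(FILE_PREFIX):]
        value = parse(path.read_text(encoding="utf-8"))

    # Compatibility: accept the wrapped form, e.g. {"targets": [...]}.
    if isinstance(value, dict) and set(value) == {field}:
        value = value[field]

    if not isinstance(value, list):
        raise ValueError(f"{field} must resolve to a list of entries")
    return value
```

A real loader would pass a YAML parser as `parse`; the bare list stays the authored shape and the wrapped form is unwrapped only for compatibility.
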
 ...03-coding-agent-target-runtime-contract.md | 60 ++++++++++++++++---
 1 file changed, 52 insertions(+), 8 deletions(-)

diff --git a/docs/plans/2026-07-03-coding-agent-target-runtime-contract.md b/docs/plans/2026-07-03-coding-agent-target-runtime-contract.md
index 204a45f40..986b2f5b0 100644
--- a/docs/plans/2026-07-03-coding-agent-target-runtime-contract.md
+++ b/docs/plans/2026-07-03-coding-agent-target-runtime-contract.md
@@ -181,6 +181,27 @@ Project-local `.agentv/config.yaml` should remain the portable project policy
 file: defaults, `execution`, `eval_patterns`, `refs`, tags, result defaults, and
 other run-level settings. It may point at the default target/grader by name, but
 it should not become the registry that holds all target and grader definitions.
+Following Promptfoo's modular-config idiom, use direct field references rather
+than a named import table:
+
+```yaml
+# .agentv/config.yaml
+targets: file://targets.yaml
+graders: file://graders.yaml
+
+defaults:
+  target: codex-local
+  grader: openai-grader
+
+execution:
+  workers: 3
+```
+
+Do not introduce a greenfield `files:` or `imports:` section for this unless
+AgentV needs a capability that direct field references cannot express.
+Promptfoo's pattern is `providers: file://configs/providers.yaml`,
+`tests: file://tests/`, and `defaultTest: file://configs/default-test.yaml`;
+the field being configured names the thing being loaded.
 
 `targets.yaml` should remain the registry of subjects under test. `graders.yaml`
 should be the registry of reusable grading providers. This keeps target runtime
@@ -188,20 +209,43 @@ contracts reviewable, keeps grader credentials/endpoints separate from agent
 runtimes, and matches AgentV's existing artifact model where run manifests carry
 explicit `targets_path` and `graders_path` entries.
 
+For Promptfoo-style field references, the referenced file should contain the
+value for that field. Greenfield examples:
+
+```yaml
+# .agentv/targets.yaml
+- id: codex-local
+  provider: codex-app-server
+  runtime: host
+  config:
+    command: ["codex"]
+```
+
+```yaml
+# .agentv/graders.yaml
+- id: openai-grader
+  provider: openai
+  config:
+    model: gpt-5-mini
+```
+
+For compatibility with AgentV's existing standalone `targets.yaml` convention,
+the loader can also accept wrapped forms such as `targets: [...]` and
+`graders: [...]`, but the Promptfoo-like authored shape is the bare field value.
+
 The global `$AGENTV_HOME/config.yaml` is different: it owns Dashboard/operator
 state such as the `projects:` registry. Do not use the existence of global
 `projects:` as a reason to put project-local target/grader registries into
-project-local `.agentv/config.yaml`. If custom locations are needed, add
-project-local config pointers such as `targets_file` / `graders_file` rather
-than embedding both registries inline.
+project-local `.agentv/config.yaml`.
 
 Greenfield, the cleanest global shape would put Dashboard project registry
 state in `$AGENTV_HOME/projects.yaml` and leave `$AGENTV_HOME/config.yaml` for
-global settings. Current AgentV code and docs use `$AGENTV_HOME/config.yaml`
-with a top-level `projects:` registry, so do not migrate this as part of the
-coding-agent target-runtime work. If the team wants the cleaner split, create a
-separate migration Bead with backwards-compatible reading from current
-`projects:` locations and a clear write target.
+global settings. If using Promptfoo-style references, the global config would
+say `projects: file://projects.yaml`. Current AgentV code and docs use
+`$AGENTV_HOME/config.yaml` with a top-level `projects:` registry, so do not
+migrate this as part of the coding-agent target-runtime work. If the team wants
+the cleaner split, create a separate migration Bead with backwards-compatible
+reading from current `projects:` locations and a clear write target.
 
 Promptfoo's comparable file-structure guidance is simpler: a main
 `promptfooconfig.yaml` commonly contains `providers`, `prompts`, `defaultTest`,

From 5eff05b57f724f0ae1b96e1380b3625ab3a96eea Mon Sep 17 00:00:00 2001
From: Christopher Tso <christso@gmail.com>
Date: Fri, 3 Jul 2026 14:23:32 +0200
Subject: [PATCH 07/10] docs: make target runtime plan greenfield

---
 ...03-coding-agent-target-runtime-contract.md | 181 ++++++++----------
 1 file changed, 79 insertions(+), 102 deletions(-)

diff --git a/docs/plans/2026-07-03-coding-agent-target-runtime-contract.md b/docs/plans/2026-07-03-coding-agent-target-runtime-contract.md
index 986b2f5b0..a7ddfa392 100644
--- a/docs/plans/2026-07-03-coding-agent-target-runtime-contract.md
+++ b/docs/plans/2026-07-03-coding-agent-target-runtime-contract.md
@@ -16,7 +16,7 @@ bead: av-y7eq
 - **Objective:** Make AgentV's coding-agent targets reliable by default while
   preserving rich transcripts and local "run the agent I use" workflows.
 - **Core decision:** Target authoring uses the compact shape
-  `label` + `provider` + `runtime` + `config`. SDK-backed coding-agent
+  `id` + `provider` + `runtime` + `config`. SDK-backed coding-agent
   providers, when retained, default to internal process isolation rather than
   importing risky agent SDKs in the AgentV orchestrator process.
 - **Primary Bead:** `av-y7eq`
@@ -46,12 +46,11 @@ field. Runtime placement is a single concept:
 
 ```yaml
 targets:
-  - label: codex-local
+  - id: codex-local
     provider: codex-app-server
     runtime: host
     config:
-      command: codex
-      args: ["--config", "model_reasoning_effort=high"]
+      command: ["codex", "--config", "model_reasoning_effort=high"]
       model: gpt-5-codex
 ```
 
@@ -59,24 +58,23 @@ Expanded form is used only when needed:
 
 ```yaml
 targets:
-  - label: codex-clean
+  - id: codex-clean
     provider: codex-cli
     runtime:
       mode: profile
       home: .agentv/profiles/codex-clean
     config:
-      command: codex
-      args: ["--sandbox", "workspace-write"]
+      command: ["codex", "--sandbox", "workspace-write"]
       model: gpt-5-codex
 ```
 
 ```yaml
 targets:
-  - label: pi-rpc-local
+  - id: pi-rpc-local
     provider: pi-rpc
     runtime: host
     config:
-      command: pi
+      command: ["pi"]
       model: gpt-5-codex
 ```
 
@@ -86,80 +84,64 @@ targets:
 
 | Field | Meaning |
 | --- | --- |
-| `label` | Human and result identity for the target. Used by CLI selection, run artifacts, Dashboard, and comparisons. |
+| `id` | Stable target identity. Used by CLI selection, run artifacts, Dashboard, and comparisons. |
 | `provider` | Adapter/control protocol kind: `codex-cli`, `codex-app-server`, `codex-sdk`, `pi-cli`, `pi-rpc`, `claude-cli`, `claude-sdk`, etc. |
 | `runtime` | Where and how the provider runs: `host`, `profile`, or `sandbox`. May be a string shorthand or an object with `mode`. |
-| `config` | Provider-specific configuration. Keep `model`, `command`, `args`, timeouts, permission flags, and provider knobs here. |
+| `config` | Provider-specific configuration. Keep `model`, `command`, timeouts, permission flags, and provider knobs here. |
 
 Do not add competing top-level fields such as `isolation`, `sandbox`,
 `install`, `container`, `environment`, or `profile`. Those details live under
 `runtime` or `config` only when a provider needs them.
 
-### Preserve Existing AgentV Surface
-
-This plan is a targeted provider-boundary cleanup, not a rewrite from Promptfoo
-or another framework. Preserve AgentV's current target capabilities unless an
-implementation Bead explicitly removes one.
-
-For coding-agent providers, `config.command` is the executable or shim identity,
-such as `codex`, `codex-personal`, `pi`, or an absolute binary path. It may
-also be a non-empty argv array where the first token is the executable and the
-remaining tokens are extra arguments. Normalize that form internally to
-executable plus argv tokens. `config.args` remains the explicit argv token array
-for extra provider-specific arguments. Keep the existing `executable`/`binary`
-compatibility aliases and the existing `args`/`arguments` array aliases during
-migration. Do not require users to pass shell-joined command strings for
-coding-agent providers.
-
-If `config.command` is an argv array, reject simultaneous `config.args` unless
-the provider defines an unambiguous merge order. This keeps command resolution
-predictable and avoids hidden shell parsing.
-
-The generic `provider: cli` path is different: it currently uses a command
-template string with placeholders such as `{PROMPT}` and healthcheck support.
-Keep that compatibility path intact while adding coding-agent-specific runtime
-boundaries.
-
-Also preserve the common and provider-specific knobs already used by AgentV:
-
-- common target behavior: `use_target`, `grader_target`, `fallback_targets`,
-  env interpolation, `cwd`, and `timeout_seconds`
-- artifact/log behavior: `stream_log`, `log_dir`/`log_directory`, stdout/stderr
-  capture, raw protocol events, and partial logs on failure
-- Codex knobs: `model`, `reasoning_effort`/`model_reasoning_effort`,
-  `model_verbosity`, `base_url`/`endpoint`, `api_key`, `api_format`,
-  `sandbox_mode`, `approval_policy`, and `system_prompt`
-- Pi knobs: `subprovider`, `model`/`pi_model`, `api_key`, `base_url`/`endpoint`,
-  `tools`/`pi_tools`, `thinking`/`pi_thinking`, `args`, and `system_prompt`
-- Claude knobs: `model`, `max_turns`, `max_budget_usd`,
-  `bypass_permissions`, and `system_prompt`
-- Copilot knobs: `model`, custom provider settings, GitHub token/auth knobs,
-  ACP/prompt execution behavior, `args`, and `system_prompt`
-
-Where the new normalized contract uses nested `config`, implement migration by
-normalizing the existing flat target fields into that internal shape. Do not
-drop existing accepted YAML fields as a side effect of adding `runtime`.
-
-Do not promote orchestration scheduler fields into the new target runtime
-contract. `workers`, `batch_requests`, and `subagent_mode_allowed` are existing
-compatibility/runtime-policy fields, not part of the target's coding-agent
-control boundary. Continue to handle them where AgentV already accepts them,
-but prefer `--workers`, project `execution.workers`, `evaluate_options`, or
-runtime policy for new scheduling behavior.
-
-`grader_target` is different. It is not a coding-agent runtime field, but the
-concept is not redundant: coding-agent targets usually cannot act as structured
-LLM graders, and AgentV workspaces often contain multiple LLM providers or
-endpoints. AgentV still needs a default grader target selection. Preserve the
-current resolution behavior while cleaning up provider runtimes:
-
-- CLI `--grader-target` is the strongest run-level override.
-- Per-evaluator `target` remains the specific grader override.
-- Target-level `grader_target` remains the compatibility/default grader for
-  that target until a clearer eval/project-level default is introduced.
-- If a new canonical default is added later, prefer a grader/eval policy field
-  such as `default_grader_target` over putting grader selection inside
-  `runtime` or coding-agent provider `config`.
+### Clean Contract
+
+This plan assumes a breaking cleanup. Do not preserve legacy target aliases or
+compatibility-only fields in the new authored contract.
+
+For process-backed coding-agent providers, `config.command` is a non-empty argv
+array. The first token is the executable or shim, such as `codex`,
+`codex-personal`, `pi`, or an absolute binary path. Remaining tokens are extra
+arguments. Do not add separate `args`, `arguments`, `executable`, or `binary`
+fields to the new contract.
+
+```yaml
+targets: file://targets.yaml
+graders: file://graders.yaml
+
+defaults:
+  target: codex-local
+  grader: openai-grader
+```
+
+```yaml
+# targets.yaml
+- id: codex-local
+  provider: codex-app-server
+  runtime: host
+  config:
+    command: ["codex", "--config", "model_reasoning_effort=high"]
+    model: gpt-5-codex
+```
+
+Keep provider-specific knobs under `config`, using one canonical name per
+concept. Examples:
+
+- common target runtime config: `command`, `model`, `cwd`, `timeout_seconds`,
+  `system_prompt`, `stream_log`, `log_dir`
+- Codex config: `reasoning_effort`, `model_verbosity`, `base_url`, `api_key`,
+  `api_format`, `sandbox_mode`, `approval_policy`
+- Pi config: `subprovider`, `tools`, `thinking`
+- Claude config: `max_turns`, `max_budget_usd`, `bypass_permissions`
+- Copilot config: custom provider/auth settings and ACP/prompt mode settings
+
+Orchestration policy is not target runtime config. Keep `workers`, batching,
+retry policy, and subagent dispatch under project/run policy such as
+`execution`, not inside target definitions.
+
+Grader selection is a separate registry/default concern. Do not put
+`grader_target` on targets in the clean schema. Use `defaults.grader` for the
+project default, CLI `--grader` / `--grader-target` for run override, and
+per-evaluator `target` for a specific grader override.
 
 Promptfoo's comparable mechanism is assertion/test grading provider selection:
 assertions can set a `provider`, tests/defaultTest can provide fallback grading
@@ -229,9 +211,9 @@ value for that field. Greenfield examples:
     model: gpt-5-mini
 ```
 
-For compatibility with AgentV's existing standalone `targets.yaml` convention,
-the loader can also accept wrapped forms such as `targets: [...]` and
-`graders: [...]`, but the Promptfoo-like authored shape is the bare field value.
+Do not accept wrapped forms such as `targets: [...]` inside a file already
+loaded through `targets: file://targets.yaml`. The referenced file is the field
+value.
 
 The global `$AGENTV_HOME/config.yaml` is different: it owns Dashboard/operator
 state such as the `projects:` registry. Do not use the existence of global
@@ -241,11 +223,10 @@ project-local `.agentv/config.yaml`.
 Greenfield, the cleanest global shape would put Dashboard project registry
 state in `$AGENTV_HOME/projects.yaml` and leave `$AGENTV_HOME/config.yaml` for
 global settings. If using Promptfoo-style references, the global config would
-say `projects: file://projects.yaml`. Current AgentV code and docs use
-`$AGENTV_HOME/config.yaml` with a top-level `projects:` registry, so do not
-migrate this as part of the coding-agent target-runtime work. If the team wants
-the cleaner split, create a separate migration Bead with backwards-compatible
-reading from current `projects:` locations and a clear write target.
+say `projects: file://projects.yaml`.
+
+Do not add `dashboard.app_name` or other user-configurable AgentV branding to
+the clean config contract. Dashboard product identity is not project policy.
 
 Promptfoo's comparable file-structure guidance is simpler: a main
 `promptfooconfig.yaml` commonly contains `providers`, `prompts`, `defaultTest`,
@@ -274,7 +255,7 @@ and should default to internal process isolation:
 
 ```yaml
 targets:
-  - label: codex-sdk-isolated
+  - id: codex-sdk-isolated
     provider: codex-sdk
     runtime: host
     config:
@@ -309,7 +290,7 @@ Failure mapping:
 
 | Source | Relevant pattern | AgentV decision |
 | --- | --- | --- |
-| Promptfoo | Provider object uses `id`/`label`/`config`; Codex and Claude SDK providers put `model` in `config.model`; direct SDK adapters exist. | Keep `label`/`provider`/`config` ergonomics; keep `model` under `config`; do not make in-process SDK the default. |
+| Promptfoo | Provider object uses `id` plus optional `label` and `config`; Codex and Claude SDK providers put `model` in `config.model`; direct SDK adapters exist. | Use `id` for stable identity, keep `provider`/`config` ergonomics, keep `model` under `config`, and do not make in-process SDK the default. |
 | OpenAI Symphony | Codex app-server subprocess with workspace/session orchestration, approval/sandbox policy, max-turn boundaries, and structured streaming/status. | Use `codex-app-server` as the preferred rich-control Codex provider. |
 | Kata Symphony | Pi is launched as `pi --mode rpc` locally or over SSH and controlled over stdio/RPC; workers must already have the runtime installed. | Add/prefer `pi-rpc` for rich Pi control; do not import Pi coding-agent SDK into AgentV's orchestrator. |
 | Vercel agent-eval | Installs agent CLIs inside ephemeral sandboxes and captures transcripts from CLI JSON/session logs. | `runtime.mode: sandbox` should support managed/pinned CLI install and transcript capture without host config bleed. |
@@ -334,23 +315,21 @@ Use explicit provider kinds:
 Do not add `codex-rpc` unless Codex exposes a distinct RPC mode separate from
 app-server. For Codex, app-server is the protocol provider.
 
-`config.command` is the executable or shim, not the provider identity. Extra
-arguments may be supplied with `config.args` or, for compact argv-style input,
-as a command array:
+`config.command` is the argv array for the executable or shim. It is not the
+provider identity:
 
 ```yaml
 targets:
-  - label: codex-personal
+  - id: codex-personal
     provider: codex-cli
     runtime: host
     config:
-      command: codex-personal
-      args: ["--model", "gpt-5-codex"]
+      command: ["codex-personal", "--model", "gpt-5-codex"]
 ```
 
 ```yaml
 targets:
-  - label: codex-eng
+  - id: codex-eng
     provider: codex-cli
     runtime: host
     config:
@@ -397,7 +376,7 @@ Keep provider names explicit by control boundary:
 
 - Add `runtime: host` shorthand and `runtime.mode: host | profile | sandbox`.
 - Keep `model` and `command` under `config`.
-- Preserve `label` as target identity and `provider` as adapter/backend kind.
+- Use `id` as target identity and `provider` as adapter/backend kind.
 - Reject invalid runtime modes with focused validation errors.
 - Document why `runtime` is the umbrella field.
 
@@ -405,9 +384,8 @@ Keep provider names explicit by control boundary:
 
 - Split current ambiguous `codex` registry behavior into explicit
   `codex-cli`, `codex-app-server`, and `codex-sdk`.
-- Make bare `codex`, if retained at all, alias to the chosen safe default
-  (`codex-app-server`) or reject it during the cleanup. It must not silently
-  select in-process SDK.
+- Remove the bare `codex` provider name from the authored clean contract. Users
+  must choose `codex-cli`, `codex-app-server`, or `codex-sdk` explicitly.
 - Support `config.command` shims such as `codex-personal` and `codex-eng`.
 - Implement host/profile environment construction, including deliberate
   `HOME`, `CODEX_HOME`, temp dirs, and env allowlists for profile mode.
@@ -443,7 +421,7 @@ Keep provider names explicit by control boundary:
 Every coding-agent provider must return or fail through a structured result
 envelope. AgentV must preserve:
 
-- target label, provider kind, runtime mode, command, cwd, and model
+- target id, provider kind, runtime mode, command, cwd, and model
 - stdout/stderr logs
 - structured event transcript when available
 - final assistant output
@@ -457,10 +435,8 @@ crashes.
 
 ## Open Questions
 
-- Whether to keep a bare `codex` alias at all. If kept, it should resolve to the
-  safe default, not SDK.
 - Whether to rename `pi-coding-agent` to `pi-sdk` during the major cleanup or
-  keep the existing provider name as an explicit legacy SDK provider.
+  replace the existing provider name with the shorter explicit SDK name.
 - Which sandbox substrate should be the first implementation target if existing
   AgentV runner support is insufficient.
 - How much transcript normalization belongs in provider adapters versus a shared
@@ -469,7 +445,8 @@ crashes.
 ## Validation Plan
 
 - Schema tests for `runtime` shorthand/object forms and invalid values.
-- Provider registry tests proving explicit provider names and safe aliases.
+- Provider registry tests proving explicit provider names and no bare `codex`
+  fallback to SDK.
 - Codex CLI/app-server tests for command shims, host/profile env, timeout kill,
   nonzero exit, malformed output, and transcript capture.
 - Pi RPC tests with a fake `pi --mode rpc` process.

From 4b5af83e4fd8f8a06abb2fc0cd726d83dce0ecb2 Mon Sep 17 00:00:00 2001
From: Christopher Tso <christso@gmail.com>
Date: Fri, 3 Jul 2026 14:31:35 +0200
Subject: [PATCH 08/10] docs: prefer clean AgentV contracts over peer baggage

---
 .agents/product-boundary.md | 2 ++
 AGENTS.md                   | 1 +
 2 files changed, 3 insertions(+)

diff --git a/.agents/product-boundary.md b/.agents/product-boundary.md
index bb821851d..ec959166c 100644
--- a/.agents/product-boundary.md
+++ b/.agents/product-boundary.md
@@ -85,6 +85,8 @@ Research those references from local cloned repositories first when a clone is a
 
 Treat these as reference inputs, not dependencies. AgentV should adopt the shared lowest common denominator when it fits the repo-native artifact model, and document any intentional divergence in the relevant plan, ADR, or contract docs.
 
+Do not copy another framework's schema baggage just because the framework is credible. When a peer contract carries historical constraints, overloaded field names, or compatibility aliases, prefer a cleaner AgentV contract if it preserves the core user need. Document the reason for diverging so future workers do not "realign" it back to the peer shape. For target/provider contracts, keep identity and backend/control boundary separate: use a stable AgentV `id` for the target registry key when `provider` already names the adapter/backend kind. Promptfoo's `label` is useful evidence but should not be copied as target identity merely because Promptfoo uses `id` for provider/backend specs.
+
 ### 5. YAGNI - You Aren't Gonna Need It
 
 Do not build features until there is a concrete need. Start with the simplest version that satisfies current demand.
diff --git a/AGENTS.md b/AGENTS.md
index 6cb2ce7b6..66d672b35 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -26,6 +26,7 @@ Design guardrails:
 - Document composition patterns before inventing a new feature.
 - Match industry-standard lowest-common-denominator contracts when possible.
 - When designing AgentV contracts, check public reference standards such as Claude Skills, Vercel agent-eval, Hugging Face Datasets, and OpenInference before inventing AgentV-specific shapes. Use their shared lowest common denominator where it fits, and document any intentional divergence.
+- Treat peer frameworks as evidence, not schema authority. Do not inherit baggage such as overloaded field names, compatibility aliases, or framework-specific historical constraints when AgentV can express a cleaner repo-native contract. Example: prefer `id` for stable AgentV target identity when `provider` already names the backend/control boundary, even if Promptfoo uses `label` because its `id` field is overloaded as a provider spec.
 - For peer-framework research, use local cloned repositories and DeepWiki MCP before broad web search. In this operator workspace, Promptfoo is cloned at `/home/entity/projects/promptfoo/promptfoo` and DeepEval is cloned at `/home/entity/projects/confident-ai/deepeval`; use DeepWiki repos `promptfoo/promptfoo` and `confident-ai/deepeval` for architecture-level orientation, then verify exact claims with `rg` and `git` in the local clone. If a public contract must be checked for currentness, use official docs and record the source URL or clone commit behind the conclusion.
 - Apply YAGNI aggressively and solve the current request with the smallest surface that works.
 - Keep extensions non-breaking unless a same-week unreleased surface should be hard-corrected.

From c4131d4941e4953a0185c4f2fa53d32821511a7f Mon Sep 17 00:00:00 2001
From: Christopher Tso <christso@gmail.com>
Date: Fri, 3 Jul 2026 14:45:36 +0200
Subject: [PATCH 09/10] docs: clarify composable AgentV config contract

---
 ...03-coding-agent-target-runtime-contract.md | 133 ++++++++++--------
 1 file changed, 77 insertions(+), 56 deletions(-)

diff --git a/docs/plans/2026-07-03-coding-agent-target-runtime-contract.md b/docs/plans/2026-07-03-coding-agent-target-runtime-contract.md
index a7ddfa392..9bdffda4e 100644
--- a/docs/plans/2026-07-03-coding-agent-target-runtime-contract.md
+++ b/docs/plans/2026-07-03-coding-agent-target-runtime-contract.md
@@ -105,24 +105,25 @@ arguments. Do not add separate `args`, `arguments`, `executable`, or `binary`
 fields to the new contract.
 
 ```yaml
-targets: file://targets.yaml
-graders: file://graders.yaml
+targets:
+  - id: codex-local
+    provider: codex-app-server
+    runtime: host
+    config:
+      command: ["codex", "--config", "model_reasoning_effort=high"]
+      model: gpt-5-codex
+
+graders:
+  - id: openai-grader
+    provider: openai
+    config:
+      model: gpt-5-mini
 
 defaults:
   target: codex-local
   grader: openai-grader
 ```
 
-```yaml
-# targets.yaml
-- id: codex-local
-  provider: codex-app-server
-  runtime: host
-  config:
-    command: ["codex", "--config", "model_reasoning_effort=high"]
-    model: gpt-5-codex
-```
-
 Keep provider-specific knobs under `config`, using one canonical name per
 concept. Examples:
 
@@ -134,9 +135,11 @@ concept. Examples:
 - Claude config: `max_turns`, `max_budget_usd`, `bypass_permissions`
 - Copilot config: custom provider/auth settings and ACP/prompt mode settings
 
-Orchestration policy is not target runtime config. Keep `workers`, batching,
-retry policy, and subagent dispatch under project/run policy such as
-`execution`, not inside target definitions.
+Orchestration policy is not target runtime config. Keep general eval
+concurrency, batching, retry policy, and subagent dispatch under project/run
+policy such as `execution`, not inside target definitions. Use
+`execution.max_concurrency` for general parallelism. Reserve `workers` for a
+provider-specific config only when that provider truly uses worker processes.
 
 Grader selection is a separate registry/default concern. Do not put
 `grader_target` on targets in the clean schema. Use `defaults.grader` for the
@@ -150,49 +153,70 @@ providers. It does not put grader selection in the target provider runtime.
 
 ### Project File Layout
 
-Keep registries separate from policy:
+Support composable/decomposable configuration. A single `.agentv/config.yaml`
+and split files should be two authoring forms of the same config graph:
 
 ```text
 .agentv/
   config.yaml
-  targets.yaml
-  graders.yaml
 ```
 
-Project-local `.agentv/config.yaml` should remain the portable project policy
-file: defaults, `execution`, `eval_patterns`, `refs`, tags, result defaults, and
-other run-level settings. It may point at the default target/grader by name, but
-it should not become the registry that holds all target and grader definitions.
-Following Promptfoo's modular-config idiom, use direct field references rather
-than a named import table:
+Project-local `.agentv/config.yaml` should be able to hold the full project
+contract: targets, graders, defaults, `execution`, `eval_patterns`, refs, tags,
+result defaults, and other run-level settings. This matches Promptfoo's primary
+authoring model, where `promptfooconfig.yaml` commonly contains providers,
+prompts, tests, defaultTest, and run options in one file.
+
+In other words, `.agentv/config.yaml` can technically contain every supported
+field that an `eval.yaml` can contain. An eval file is a focused, shareable
+slice of the same config graph, while `.agentv/config.yaml` is the project-root
+manifest that can also carry project defaults and policy. Avoid creating two
+competing top-level schemas for "project config" versus "eval config" unless a
+field is intentionally scoped to one of those contexts.
+
+The `.agentv/` folder still matters even though Promptfoo does not have the same
+project/global split. It gives AgentV a conventional project root for automatic
+discovery, checked-in defaults, repo-local policy, result/artifact adjacency,
+and composable config without requiring every command to pass explicit file
+paths. The global AgentV config can provide operator/user defaults across
+projects, while `.agentv/config.yaml` overrides or composes project-specific
+targets, graders, tests, datasets, and execution policy.
 
 ```yaml
 # .agentv/config.yaml
-targets: file://targets.yaml
-graders: file://graders.yaml
+targets:
+  - id: codex-local
+    provider: codex-app-server
+    runtime: host
+    config:
+      command: ["codex"]
+      model: gpt-5-codex
+
+graders:
+  - id: openai-grader
+    provider: openai
+    config:
+      model: gpt-5-mini
 
 defaults:
   target: codex-local
   grader: openai-grader
 
 execution:
-  workers: 3
+  max_concurrency: 3
 ```
 
-Do not introduce a greenfield `files:` or `imports:` section for this unless
-AgentV needs a capability that direct field references cannot express.
-Promptfoo's pattern is `providers: file://configs/providers.yaml`,
-`tests: file://tests/`, and `defaultTest: file://configs/default-test.yaml`;
-the field being configured names the thing being loaded.
-
-`targets.yaml` should remain the registry of subjects under test. `graders.yaml`
-should be the registry of reusable grading providers. This keeps target runtime
-contracts reviewable, keeps grader credentials/endpoints separate from agent
-runtimes, and matches AgentV's existing artifact model where run manifests carry
-explicit `targets_path` and `graders_path` entries.
+For larger projects, generated configs, or secret-splitting workflows, any
+supported config field can be decomposed into a Promptfoo-style direct field
+reference whose target file contains that field's value. Do not introduce a
+greenfield `files:` or `imports:` section unless AgentV needs a capability that
+direct field references cannot express. Promptfoo's pattern is `providers:
+file://configs/providers.yaml`, `tests: file://tests/`, and `defaultTest:
+file://configs/default-test.yaml`; the field being configured names the thing
+being loaded.
 
 For Promptfoo-style field references, the referenced file should contain the
-value for that field. Greenfield examples:
+value for that field. Optional split-file examples:
 
 ```yaml
 # .agentv/targets.yaml
@@ -212,29 +236,26 @@ value for that field. Greenfield examples:
 ```
 
 Do not accept wrapped forms such as `targets: [...]` inside a file already
-loaded through `targets: file://targets.yaml`. The referenced file is the field
-value.
+loaded through `targets: file://targets.yaml`, or `tests: [...]` inside a file
+already loaded through `tests: file://tests.yaml`. The referenced file is the
+field value.
 
-The global `$AGENTV_HOME/config.yaml` is different: it owns Dashboard/operator
-state such as the `projects:` registry. Do not use the existence of global
-`projects:` as a reason to put project-local target/grader registries into
-project-local `.agentv/config.yaml`.
+The global `$AGENTV_HOME/config.yaml` can also use the same direct-field style,
+including inline `projects:` for small installations or `projects:
+file://projects.yaml` for larger registries. Do not add a separate import table
+for global config either.
 
-Greenfield, the cleanest global shape would put Dashboard project registry
-state in `$AGENTV_HOME/projects.yaml` and leave `$AGENTV_HOME/config.yaml` for
-global settings. If using Promptfoo-style references, the global config would
-say `projects: file://projects.yaml`.
+Greenfield, the cleanest default is one readable config graph. Inline and split
+forms should normalize to the same internal shape.
 
 Do not add `dashboard.app_name` or other user-configurable AgentV branding to
 the clean config contract. Dashboard product identity is not project policy.
 
-Promptfoo's comparable file-structure guidance is simpler: a main
-`promptfooconfig.yaml` commonly contains `providers`, `prompts`, `defaultTest`,
-and `tests`, while larger configs can reference external files such as provider
-YAML with `file://...`. Promptfoo does not have AgentV's separate home-scoped
-Dashboard project registry, so it is useful as a modular-config reference but
-not a direct reason to collapse AgentV's project, target, and grader registries
-into one file.
+Promptfoo's comparable file-structure guidance is the closest reference here:
+a main `promptfooconfig.yaml` commonly contains `providers`, `prompts`,
+`defaultTest`, `tests`, and run options, while larger configs can reference
+external files with `file://...`. AgentV should follow that authoring posture
+while keeping cleaner AgentV field names.
 
 ### Runtime Modes
 

From 7c486f06fee9ef3ca846b5ee9505b7ad8417f293 Mon Sep 17 00:00:00 2001
From: Christopher Tso <christso@gmail.com>
Date: Fri, 3 Jul 2026 14:49:11 +0200
Subject: [PATCH 10/10] docs: split config contract from target orchestration
 plan

---
 .../2026-07-03-agentv-config-contract.md      | 232 ++++++++++++++++++
 ...03-coding-agent-target-runtime-contract.md | 115 +--------
 2 files changed, 239 insertions(+), 108 deletions(-)
 create mode 100644 docs/plans/2026-07-03-agentv-config-contract.md

diff --git a/docs/plans/2026-07-03-agentv-config-contract.md b/docs/plans/2026-07-03-agentv-config-contract.md
new file mode 100644
index 000000000..41af7ddca
--- /dev/null
+++ b/docs/plans/2026-07-03-agentv-config-contract.md
@@ -0,0 +1,232 @@
+---
+artifact_contract: ce-unified-plan/v1
+artifact_readiness: implementation-ready
+product_contract_source: av-vrx8-research
+execution: code
+title: "AgentV composable config contract"
+created_at: 2026-07-03
+type: feature
+bead: av-y7eq.1
+---
+
+# AgentV composable config contract
+
+## Goal Capsule
+
+- **Objective:** Give AgentV one clean config graph that works as project
+  manifest, eval definition, and composable split-file config without copying
+  Promptfoo's legacy naming baggage.
+- **Core decision:** `.agentv/config.yaml` and `eval.yaml` use the same eval
+  config graph for eval-definition fields. `.agentv/config.yaml` is the
+  project-root manifest and can additionally carry project defaults and policy.
+- **Primary Bead:** `av-y7eq.1`
+- **Related Beads:** `av-y7eq`, `av-y7eq.8`
+- **Non-goal:** Do not create separate competing schemas for project config and
+  eval config unless a field is intentionally scoped to one context.
+
+## Summary
+
+AgentV should have one composable/decomposable config graph.
+
+Small projects can keep everything in `.agentv/config.yaml`. Larger projects can
+split any supported field into a `file://...` reference whose target file
+contains that field's value. Both forms normalize to the same internal shape.
+
+This follows Promptfoo's useful authoring posture without copying all Promptfoo
+field names. Promptfoo commonly lets `promptfooconfig.yaml` contain providers,
+prompts, tests, defaultTest, and run options directly, and also lets those fields
+point at files. AgentV should do the same at the graph level while preserving
+AgentV terms such as targets, graders, projects, and run bundles.
+
+## Contract
+
+### Config Graph
+
+`.agentv/config.yaml` can technically contain every supported field that an
+`eval.yaml` can contain:
+
+```yaml
+targets:
+  - id: codex-local
+    provider: codex-app-server
+    runtime: host
+    config:
+      command: ["codex", "app-server"]
+      model: gpt-5-codex
+
+graders:
+  - id: openai-grader
+    provider: openai
+    config:
+      model: gpt-5-mini
+
+tests:
+  - id: smoke
+    input: "Fix the failing test"
+
+defaults:
+  target: codex-local
+  grader: openai-grader
+
+execution:
+  max_concurrency: 3
+```
+
+An `eval.yaml` is a focused, shareable slice of the same graph. It may contain
+targets, graders, tests/evaluators, datasets, defaults, execution overrides, and
+other eval-definition fields. `.agentv/config.yaml` is the project-root
+manifest, so it may also own persistent project defaults and policy.
+
+### Scope Distinction
+
+The schemas should be shared where the field meaning is shared, but the file
+roles are not identical:
+
+| File | Role |
+| --- | --- |
+| `.agentv/config.yaml` | Project-root manifest. Provides automatic discovery, checked-in defaults, repo-local policy, result/artifact adjacency, and composition against global defaults. |
+| `eval.yaml` | Portable eval slice. Good for sharing, one-off suites, examples, or benchmark-specific overrides. |
+| `$AGENTV_HOME/config.yaml` | User/operator defaults across projects. May include project registry, default result locations, or global provider defaults. |
+
+Do not pretend every field is valid in every context. Project identity,
+Dashboard project registry, and persistent operator defaults belong in
+`.agentv/config.yaml` or global config, not an eval slice. Eval-definition
+fields should remain shared.
+
+### Field References
+
+Any supported config field can be decomposed into a direct `file://...` reference
+whose target file contains that field's value:
+
+```yaml
+targets: file://targets.yaml
+graders: file://graders.yaml
+tests: file://tests.yaml
+
+defaults:
+  target: codex-local
+  grader: openai-grader
+```
+
+A file referenced by an array-valued field contains a bare array:
+
+```yaml
+# .agentv/targets.yaml
+- id: codex-local
+  provider: codex-app-server
+  runtime: host
+  config:
+    command: ["codex", "app-server"]
+```
+
+```yaml
+# .agentv/tests.yaml
+- id: smoke
+  input: "Fix the failing test"
+```
+
+Referenced object-valued fields contain a bare object:
+
+```yaml
+# .agentv/defaults.yaml
+target: codex-local
+grader: openai-grader
+```
+
+Do not introduce a separate `files:` or `imports:` table unless AgentV needs a
+capability that direct field references cannot express. The field being
+configured names the value being loaded.
+
+Do not accept wrapped forms such as `targets: [...]` inside a file already
+loaded through `targets: file://targets.yaml`, or `tests: [...]` inside a file
+loaded through `tests: file://tests.yaml`. The referenced file is the field
+value.
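+
+For illustration, the following file would be rejected when loaded through
+`targets: file://targets.yaml`, because it wraps the array in a `targets:` key
+instead of being the field value itself. The error wording is left unspecified
+here:
+
+```yaml
+# .agentv/targets.yaml (rejected wrapped form)
+targets:
+  - id: codex-local
+    provider: codex-app-server
+    runtime: host
+    config:
+      command: ["codex", "app-server"]
+```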
+
+### Target And Grader Fields
+
+Target objects use:
+
+| Field | Meaning |
+| --- | --- |
+| `id` | Stable AgentV identity for selection, artifacts, dashboard, and comparisons. |
+| `provider` | Adapter/control boundary such as `codex-cli`, `codex-app-server`, `pi-rpc`, `claude-cli`, or `openai`. |
+| `runtime` | Coding-agent execution placement: `host`, `profile`, or `sandbox`. |
+| `config` | Provider-specific configuration such as `model`, `command`, timeouts, env, protocol, and provider knobs. |
+
+Use `defaults.target` and `defaults.grader` for run defaults. Do not put
+`grader_target` on targets.
+
+Use `config.command` as a non-empty argv array for process-backed providers:
+
+```yaml
+config:
+  command: ["codex-personal", "app-server"]
+```
+
+Do not add parallel `args`, `arguments`, `executable`, or `binary` fields in the
+authored contract.
+
+### Execution Policy
+
+Use `execution.max_concurrency` for general eval parallelism:
+
+```yaml
+execution:
+  max_concurrency: 3
+```
+
+Promptfoo evidence checked on 2026-07-03:
+
+- DeepWiki for `promptfoo/promptfoo` reports general concurrency through
+  `evaluateOptions.maxConcurrency`, `commandLineOptions.maxConcurrency`, and
+  CLI `--max-concurrency` / `-j`.
+- Local Promptfoo clone
+  `/home/entity/projects/promptfoo/promptfoo` at
+  `6bfc5a0c7f16f9c4717ac731d276b578e63d0769` verifies that `src/node/doEval.ts`
+  resolves `maxConcurrency` from CLI, `commandLineOptions`, `evaluateOptions`,
+  then default, and that Python `config.workers` is provider-specific in
+  `src/providers/pythonCompletion.ts`.
+
+Therefore, `workers` should not be AgentV's general run-policy field. Reserve it
+for provider-specific config only when a provider truly manages worker
+processes.
+
+## Rejected Baggage
+
+Do not include these in the greenfield authored contract:
+
+- `label` or `name` as target identity.
+- bare ambiguous provider aliases such as `provider: codex`.
+- target-level `grader_target`.
+- user-configurable `dashboard.app_name`.
+- process field variants `executable`, `binary`, `args`, `arguments`.
+- target-level `workers`, batching, retry, or subagent-dispatch controls.
+- compatibility-only wrapper files for direct field refs.
+
+## Implementation Notes
+
+- Implement refs as field-level resolution before schema normalization.
+- Keep wire-format keys `snake_case`; translate to internal TypeScript
+  `camelCase` only at boundaries.
+- Ensure inline and split forms produce identical normalized objects.
+- Validation errors should point to the authored path, including the referenced
+  file path when applicable.
+- Public docs should show both inline and split-file forms, without presenting
+  split files as mandatory.
+- Migration text is unnecessary unless a later decision requires backward
+  compatibility.
+
+## Acceptance Criteria
+
+- `.agentv/config.yaml` can inline targets, graders, tests/evaluators, defaults,
+  and execution policy.
+- `eval.yaml` can contain the same eval-definition fields and normalize through
+  the same schema path.
+- Any supported field can be a `file://...` ref whose file contains that field's
+  value.
+- Inline and split forms normalize identically.
+- Context-scoped fields are validated according to file role, so project/global
+  identity and registry fields do not accidentally become portable eval-slice
+  fields.
+- `execution.max_concurrency` is the general concurrency field.
+- Fields listed under Rejected Baggage are rejected with focused errors.
diff --git a/docs/plans/2026-07-03-coding-agent-target-runtime-contract.md b/docs/plans/2026-07-03-coding-agent-target-runtime-contract.md
index 9bdffda4e..c61d9c0e5 100644
--- a/docs/plans/2026-07-03-coding-agent-target-runtime-contract.md
+++ b/docs/plans/2026-07-03-coding-agent-target-runtime-contract.md
@@ -20,8 +20,9 @@ bead: av-y7eq
   providers, when retained, default to internal process isolation rather than
   importing risky agent SDKs in the AgentV orchestrator process.
 - **Primary Bead:** `av-y7eq`
-- **Implementation Beads:** `av-y7eq.1` through `av-y7eq.5`; existing SDK
-  subprocess follow-up `av-57i` / `av-57i.1`.
+- **Implementation Beads:** `av-y7eq.2` through `av-y7eq.7`; config contract
+  prerequisite `av-y7eq.1`; existing SDK subprocess follow-up `av-57i` /
+  `av-57i.1`.
 - **Non-goal:** Do not replace AgentV with Promptfoo, Symphony, Kata, Margin, or
   Vercel agent-eval. Borrow their proven boundaries and keep AgentV's
   repo-native run bundle model.
@@ -78,6 +79,10 @@ targets:
       model: gpt-5-codex
 ```
 
+For the config graph, file layout, the `eval.yaml` relationship, and
+field-level `file://...` references, see
+[AgentV composable config contract](2026-07-03-agentv-config-contract.md).
+
 ## Product Contract
 
 ### Stable Fields
@@ -151,112 +156,6 @@ assertions can set a `provider`, tests/defaultTest can provide fallback grading
 providers, and model-graded matchers fall back to type-specific default grading
 providers. It does not put grader selection in the target provider runtime.
 
-### Project File Layout
-
-Support composable/decomposable configuration. A single `.agentv/config.yaml`
-and split files should be two authoring forms of the same config graph:
-
-```text
-.agentv/
-  config.yaml
-```
-
-Project-local `.agentv/config.yaml` should be able to hold the full project
-contract: targets, graders, defaults, `execution`, `eval_patterns`, refs, tags,
-result defaults, and other run-level settings. This matches Promptfoo's primary
-authoring model, where `promptfooconfig.yaml` commonly contains providers,
-prompts, tests, defaultTest, and run options in one file.
-
-In other words, `.agentv/config.yaml` can technically contain every supported
-field that an `eval.yaml` can contain. An eval file is a focused, shareable
-slice of the same config graph, while `.agentv/config.yaml` is the project-root
-manifest that can also carry project defaults and policy. Avoid creating two
-competing top-level schemas for "project config" versus "eval config" unless a
-field is intentionally scoped to one of those contexts.
-
-The `.agentv/` folder still matters even though Promptfoo does not have the same
-project/global split. It gives AgentV a conventional project root for automatic
-discovery, checked-in defaults, repo-local policy, result/artifact adjacency,
-and composable config without requiring every command to pass explicit file
-paths. The global AgentV config can provide operator/user defaults across
-projects, while `.agentv/config.yaml` overrides or composes project-specific
-targets, graders, tests, datasets, and execution policy.
-
-```yaml
-# .agentv/config.yaml
-targets:
-  - id: codex-local
-    provider: codex-app-server
-    runtime: host
-    config:
-      command: ["codex"]
-      model: gpt-5-codex
-
-graders:
-  - id: openai-grader
-    provider: openai
-    config:
-      model: gpt-5-mini
-
-defaults:
-  target: codex-local
-  grader: openai-grader
-
-execution:
-  max_concurrency: 3
-```
-
-For larger projects, generated configs, or secret-splitting workflows, any
-supported config field can be decomposed into a Promptfoo-style direct field
-reference whose target file contains that field's value. Do not introduce a
-greenfield `files:` or `imports:` section unless AgentV needs a capability that
-direct field references cannot express. Promptfoo's pattern is `providers:
-file://configs/providers.yaml`, `tests: file://tests/`, and `defaultTest:
-file://configs/default-test.yaml`; the field being configured names the thing
-being loaded.
-
-For Promptfoo-style field references, the referenced file should contain the
-value for that field. Optional split-file examples:
-
-```yaml
-# .agentv/targets.yaml
-- id: codex-local
-  provider: codex-app-server
-  runtime: host
-  config:
-    command: ["codex"]
-```
-
-```yaml
-# .agentv/graders.yaml
-- id: openai-grader
-  provider: openai
-  config:
-    model: gpt-5-mini
-```
-
-Do not accept wrapped forms such as `targets: [...]` inside a file already
-loaded through `targets: file://targets.yaml`, or `tests: [...]` inside a file
-already loaded through `tests: file://tests.yaml`. The referenced file is the
-field value.
-
-The global `$AGENTV_HOME/config.yaml` can also use the same direct-field style,
-including inline `projects:` for small installations or `projects:
-file://projects.yaml` for larger registries. Do not add a separate import table
-for global config either.
-
-Greenfield, the cleanest default is one readable config graph. Inline and split
-forms should normalize to the same internal shape.
-
-Do not add `dashboard.app_name` or other user-configurable AgentV branding to
-the clean config contract. Dashboard product identity is not project policy.
-
-Promptfoo's comparable file-structure guidance is the closest reference here:
-a main `promptfooconfig.yaml` commonly contains `providers`, `prompts`,
-`defaultTest`, `tests`, and run options, while larger configs can reference
-external files with `file://...`. AgentV should follow that authoring posture
-while keeping cleaner AgentV field names.
-
 ### Runtime Modes
 
 | Runtime | Boundary | Use case |