diff --git a/.agents/product-boundary.md b/.agents/product-boundary.md index bb821851d..ec959166c 100644 --- a/.agents/product-boundary.md +++ b/.agents/product-boundary.md @@ -85,6 +85,8 @@ Research those references from local cloned repositories first when a clone is a Treat these as reference inputs, not dependencies. AgentV should adopt the shared lowest common denominator when it fits the repo-native artifact model, and document any intentional divergence in the relevant plan, ADR, or contract docs. +Do not copy another framework's schema baggage just because the framework is credible. When a peer contract carries historical constraints, overloaded field names, or compatibility aliases, prefer a cleaner AgentV contract if it preserves the core user need. Document the reason for diverging so future workers do not "realign" it back to the peer shape. For target/provider contracts, keep identity and backend/control boundary separate: use a stable AgentV `id` for the target registry key when `provider` already names the adapter/backend kind. Promptfoo's `label` is useful evidence but should not be copied as target identity merely because Promptfoo uses `id` for provider/backend specs. + ### 5. YAGNI - You Aren't Gonna Need It Do not build features until there is a concrete need. Start with the simplest version that satisfies current demand. diff --git a/AGENTS.md b/AGENTS.md index 6cb2ce7b6..66d672b35 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -26,6 +26,7 @@ Design guardrails: - Document composition patterns before inventing a new feature. - Match industry-standard lowest-common-denominator contracts when possible. - When designing AgentV contracts, check public reference standards such as Claude Skills, Vercel agent-eval, Hugging Face Datasets, and OpenInference before inventing AgentV-specific shapes. Use their shared lowest common denominator where it fits, and document any intentional divergence. +- Treat peer frameworks as evidence, not schema authority. Do not inherit baggage such as overloaded field names, compatibility aliases, or framework-specific historical constraints when AgentV can express a cleaner repo-native contract. Example: prefer `id` for stable AgentV target identity when `provider` already names the backend/control boundary, even if Promptfoo uses `label` because its `id` field is overloaded as a provider spec. - For peer-framework research, use local cloned repositories and DeepWiki MCP before broad web search. In this operator workspace, Promptfoo is cloned at `/home/entity/projects/promptfoo/promptfoo` and DeepEval is cloned at `/home/entity/projects/confident-ai/deepeval`; use DeepWiki repos `promptfoo/promptfoo` and `confident-ai/deepeval` for architecture-level orientation, then verify exact claims with `rg` and `git` in the local clone. If a public contract must be checked for currentness, use official docs and record the source URL or clone commit behind the conclusion. - Apply YAGNI aggressively and solve the current request with the smallest surface that works. - Keep extensions non-breaking unless a same-week unreleased surface should be hard-corrected. diff --git a/docs/plans/2026-07-03-agentv-config-contract.md b/docs/plans/2026-07-03-agentv-config-contract.md new file mode 100644 index 000000000..41af7ddca --- /dev/null +++ b/docs/plans/2026-07-03-agentv-config-contract.md @@ -0,0 +1,232 @@ +--- +artifact_contract: ce-unified-plan/v1 +artifact_readiness: implementation-ready +product_contract_source: av-vrx8-research +execution: code +title: "AgentV composable config contract" +created_at: 2026-07-03 +type: feature +bead: av-y7eq.1 +--- + +# AgentV composable config contract + +## Goal Capsule + +- **Objective:** Give AgentV one clean config graph that works as project + manifest, eval definition, and composable split-file config without copying + Promptfoo's legacy naming baggage. +- **Core decision:** `.agentv/config.yaml` and `eval.yaml` use the same eval + config graph for eval-definition fields. `.agentv/config.yaml` is the + project-root manifest and can additionally carry project defaults and policy. +- **Primary Bead:** `av-y7eq.1` +- **Related Beads:** `av-y7eq`, `av-y7eq.8` +- **Non-goal:** Do not create separate competing schemas for project config and + eval config unless a field is intentionally scoped to one context. + +## Summary + +AgentV should have one composable/decomposable config graph. + +Small projects can keep everything in `.agentv/config.yaml`. Larger projects can +split any supported field into a `file://...` reference whose target file +contains that field's value. Both forms normalize to the same internal shape. + +This follows Promptfoo's useful authoring posture without copying all Promptfoo +field names. Promptfoo commonly lets `promptfooconfig.yaml` contain providers, +prompts, tests, defaultTest, and run options directly, and also lets those fields +point at files. AgentV should do the same at the graph level while preserving +AgentV terms such as targets, graders, projects, and run bundles. + +## Contract + +### Config Graph + +`.agentv/config.yaml` can technically contain every supported field that an +`eval.yaml` can contain: + +```yaml +targets: + - id: codex-local + provider: codex-app-server + runtime: host + config: + command: ["codex", "app-server"] + model: gpt-5-codex + +graders: + - id: openai-grader + provider: openai + config: + model: gpt-5-mini + +tests: + - id: smoke + input: "Fix the failing test" + +defaults: + target: codex-local + grader: openai-grader + +execution: + max_concurrency: 3 +``` + +An `eval.yaml` is a focused, shareable slice of the same graph. It may contain +targets, graders, tests/evaluators, datasets, defaults, execution overrides, and +other eval-definition fields. `.agentv/config.yaml` is the project-root +manifest, so it may also own persistent project defaults and policy. + +### Scope Distinction + +The schemas should be shared where the field meaning is shared, but the file +roles are not identical: + +| File | Role | +| --- | --- | +| `.agentv/config.yaml` | Project-root manifest. Provides automatic discovery, checked-in defaults, repo-local policy, result/artifact adjacency, and composition against global defaults. | +| `eval.yaml` | Portable eval slice. Good for sharing, one-off suites, examples, or benchmark-specific overrides. | +| `$AGENTV_HOME/config.yaml` | User/operator defaults across projects. May include project registry, default result locations, or global provider defaults. | + +Do not pretend every field is valid in every context. Project identity, +Dashboard project registry, and persistent operator defaults belong in +`.agentv/config.yaml` or global config, not an eval slice. Eval-definition +fields should remain shared. + +### Field References + +Any supported config field can be decomposed into a direct `file://...` reference +whose target file contains that field's value: + +```yaml +targets: file://targets.yaml +graders: file://graders.yaml +tests: file://tests.yaml + +defaults: + target: codex-local + grader: openai-grader +``` + +Referenced array-valued fields contain a bare array: + +```yaml +# .agentv/targets.yaml +- id: codex-local + provider: codex-app-server + runtime: host + config: + command: ["codex"] +``` + +```yaml +# .agentv/tests.yaml +- id: smoke + input: "Fix the failing test" +``` + +Referenced object-valued fields contain a bare object: + +```yaml +# .agentv/defaults.yaml +target: codex-local +grader: openai-grader +``` + +Do not introduce a separate `files:` or `imports:` table unless AgentV needs a +capability direct field references cannot express. The field being configured +names the value being loaded. + +Do not accept wrapped forms such as `targets: [...]` inside a file already +loaded through `targets: file://targets.yaml`, or `tests: [...]` inside a file +loaded through `tests: file://tests.yaml`. The referenced file is the field +value. + +### Target And Grader Fields + +Target objects use: + +| Field | Meaning | +| --- | --- | +| `id` | Stable AgentV identity for selection, artifacts, dashboard, and comparisons. | +| `provider` | Adapter/control boundary such as `codex-cli`, `codex-app-server`, `pi-rpc`, `claude-cli`, or `openai`. | +| `runtime` | Coding-agent execution placement: `host`, `profile`, or `sandbox`. | +| `config` | Provider-specific configuration such as `model`, `command`, timeouts, env, protocol, and provider knobs. | + +Use `defaults.target` and `defaults.grader` for run defaults. Do not put +`grader_target` on targets. + +Use `config.command` as a non-empty argv array for process-backed providers: + +```yaml +config: + command: ["codex-personal", "app-server"] +``` + +Do not add parallel `args`, `arguments`, `executable`, or `binary` fields in the +authored contract. + +### Execution Policy + +Use `execution.max_concurrency` for general eval parallelism: + +```yaml +execution: + max_concurrency: 3 +``` + +Promptfoo evidence checked on 2026-07-03: + +- DeepWiki for `promptfoo/promptfoo` reports general concurrency through + `evaluateOptions.maxConcurrency`, `commandLineOptions.maxConcurrency`, and + CLI `--max-concurrency` / `-j`. +- Local Promptfoo clone + `/home/entity/projects/promptfoo/promptfoo` at + `6bfc5a0c7f16f9c4717ac731d276b578e63d0769` verifies that `src/node/doEval.ts` + resolves `maxConcurrency` from CLI, `commandLineOptions`, `evaluateOptions`, + then default, and that Python `config.workers` is provider-specific in + `src/providers/pythonCompletion.ts`. + +Therefore, `workers` should not be AgentV's general run-policy field. Reserve it +for provider-specific config only when a provider truly manages worker +processes. + +## Rejected Baggage + +Do not include these in the greenfield authored contract: + +- `label` or `name` as target identity. +- bare ambiguous provider aliases such as `provider: codex`. +- target-level `grader_target`. +- user-configurable `dashboard.app_name`. +- process field variants `executable`, `binary`, `args`, `arguments`. +- target-level `workers`, batching, retry, or subagent-dispatch controls. +- compatibility-only wrapper files for direct field refs. + +## Implementation Notes + +- Implement refs as field-level resolution before schema normalization. +- Keep wire-format keys `snake_case`; translate to internal TypeScript + `camelCase` only at boundaries. +- Ensure inline and split forms produce identical normalized objects. +- Validation errors should point to the authored path, including the referenced + file path when applicable. +- Public docs should show both inline and split-file forms, without presenting + split files as mandatory. +- Migration text is unnecessary unless a later decision requires backward + compatibility. + +## Acceptance Criteria + +- `.agentv/config.yaml` can inline targets, graders, tests/evaluators, defaults, + and execution policy. +- `eval.yaml` can contain the same eval-definition fields and normalize through + the same schema path. +- Any supported field can be a `file://...` ref whose file contains that field's + value. +- Inline and split forms normalize identically. +- Context-scoped fields are validated according to file role, so project/global + identity and registry fields do not accidentally become portable eval-slice + fields. +- `execution.max_concurrency` is the general concurrency field. +- Removed Promptfoo/legacy baggage fields are rejected with focused errors. diff --git a/docs/plans/2026-07-03-coding-agent-target-runtime-contract.md b/docs/plans/2026-07-03-coding-agent-target-runtime-contract.md new file mode 100644 index 000000000..c61d9c0e5 --- /dev/null +++ b/docs/plans/2026-07-03-coding-agent-target-runtime-contract.md @@ -0,0 +1,384 @@ +--- +artifact_contract: ce-unified-plan/v1 +artifact_readiness: implementation-ready +product_contract_source: av-vrx8-research +execution: code +title: "Coding-agent target runtime contract" +created_at: 2026-07-03 +type: feature +bead: av-y7eq +--- + +# Coding-agent target runtime contract + +## Goal Capsule + +- **Objective:** Make AgentV's coding-agent targets reliable by default while + preserving rich transcripts and local "run the agent I use" workflows. +- **Core decision:** Target authoring uses the compact shape + `id` + `provider` + `runtime` + `config`. SDK-backed coding-agent + providers, when retained, default to internal process isolation rather than + importing risky agent SDKs in the AgentV orchestrator process. +- **Primary Bead:** `av-y7eq` +- **Implementation Beads:** `av-y7eq.2` through `av-y7eq.7`; config contract + prerequisite `av-y7eq.1`; existing SDK subprocess follow-up `av-57i` / + `av-57i.1`. +- **Non-goal:** Do not replace AgentV with Promptfoo, Symphony, Kata, Margin, or + Vercel agent-eval. Borrow their proven boundaries and keep AgentV's + repo-native run bundle model. + +## Summary + +AgentV should treat coding-agent targets as external runtimes to orchestrate, +not as libraries to call in-process by default. The default path should be +subprocess, protocol, or sandbox based: + +- Codex: `codex-app-server` first for rich protocol control, `codex-cli` as the + simpler process-boundary path, `codex-sdk` explicit and internally isolated. +- Pi: `pi-rpc` or `pi-cli` first, following Kata's `pi --mode rpc` pattern; + `pi-coding-agent`/`pi-sdk` explicit and internally isolated if retained. +- Claude: `claude-cli` first; `claude-sdk` explicit and internally isolated if + retained. There is no separate Claude app-server/RPC surface identified. +- Copilot: prefer CLI/session-log/process-boundary paths where possible; + `copilot-sdk` follows the same explicit SDK isolation rule. + +The target schema should not expose every implementation detail as a top-level +field. Runtime placement is a single concept: + +```yaml +targets: + - id: codex-local + provider: codex-app-server + runtime: host + config: + command: ["codex", "--config", "model_reasoning_effort=high"] + model: gpt-5-codex +``` + +Expanded form is used only when needed: + +```yaml +targets: + - id: codex-clean + provider: codex-cli + runtime: + mode: profile + home: .agentv/profiles/codex-clean + config: + command: ["codex", "--sandbox", "workspace-write"] + model: gpt-5-codex +``` + +```yaml +targets: + - id: pi-rpc-local + provider: pi-rpc + runtime: host + config: + command: ["pi"] + model: gpt-5-codex +``` + +For config graph, file layout, `eval.yaml` relationship, and field-level +`file://...` references, see +[AgentV composable config contract](2026-07-03-agentv-config-contract.md). + +## Product Contract + +### Stable Fields + +| Field | Meaning | +| --- | --- | +| `id` | Stable target identity. Used by CLI selection, run artifacts, Dashboard, and comparisons. | +| `provider` | Adapter/control protocol kind: `codex-cli`, `codex-app-server`, `codex-sdk`, `pi-cli`, `pi-rpc`, `claude-cli`, `claude-sdk`, etc. | +| `runtime` | Where and how the provider runs: `host`, `profile`, or `sandbox`. May be a string shorthand or an object with `mode`. | +| `config` | Provider-specific configuration. Keep `model`, `command`, timeouts, permission flags, and provider knobs here. | + +Do not add competing top-level fields such as `isolation`, `sandbox`, +`install`, `container`, `environment`, or `profile`. Those details live under +`runtime` or `config` only when a provider needs them. + +### Clean Contract + +This plan assumes a breaking cleanup. Do not preserve legacy target aliases or +compatibility-only fields in the new authored contract. + +For process-backed coding-agent providers, `config.command` is a non-empty argv +array. The first token is the executable or shim, such as `codex`, +`codex-personal`, `pi`, or an absolute binary path. Remaining tokens are extra +arguments. Do not add separate `args`, `arguments`, `executable`, or `binary` +fields to the new contract. + +```yaml +targets: + - id: codex-local + provider: codex-app-server + runtime: host + config: + command: ["codex", "--config", "model_reasoning_effort=high"] + model: gpt-5-codex + +graders: + - id: openai-grader + provider: openai + config: + model: gpt-5-mini + +defaults: + target: codex-local + grader: openai-grader +``` + +Keep provider-specific knobs under `config`, using one canonical name per +concept. Examples: + +- common target runtime config: `command`, `model`, `cwd`, `timeout_seconds`, + `system_prompt`, `stream_log`, `log_dir` +- Codex config: `reasoning_effort`, `model_verbosity`, `base_url`, `api_key`, + `api_format`, `sandbox_mode`, `approval_policy` +- Pi config: `subprovider`, `tools`, `thinking` +- Claude config: `max_turns`, `max_budget_usd`, `bypass_permissions` +- Copilot config: custom provider/auth settings and ACP/prompt mode settings + +Orchestration policy is not target runtime config. Keep general eval +concurrency, batching, retry policy, and subagent dispatch under project/run +policy such as `execution`, not inside target definitions. Use +`execution.max_concurrency` for general parallelism. Reserve `workers` for a +provider-specific config only when that provider truly uses worker processes. + +Grader selection is a separate registry/default concern. Do not put +`grader_target` on targets in the clean schema. Use `defaults.grader` for the +project default, CLI `--grader` / `--grader-target` for run override, and +per-evaluator `target` for a specific grader override. + +Promptfoo's comparable mechanism is assertion/test grading provider selection: +assertions can set a `provider`, tests/defaultTest can provide fallback grading +providers, and model-graded matchers fall back to type-specific default grading +providers. It does not put grader selection in the target provider runtime. + +### Runtime Modes + +| Runtime | Boundary | Use case | +| --- | --- | --- | +| `host` | User's installed runtime and normal config/auth/skills/plugins. | Local research and "evaluate the exact agent I use." | +| `profile` | Host process execution with isolated home/config/env, such as `CODEX_HOME`, `HOME`, temp dirs, and explicit auth profile. | Cleaner local evals without full container cost. | +| `sandbox` | Separate execution substrate such as Docker, Vercel Sandbox, remote worker, or another container/sandbox backend. | CI, reproducibility, untrusted tasks, stronger crash and filesystem containment. | + +A sandbox may contain an internal profile, but the top-level runtime remains +`sandbox` because the execution substrate boundary is stronger than host-side +config isolation. + +### SDK Rule + +SDK-backed coding-agent providers are allowed only as explicit provider kinds +and should default to internal process isolation: + +```yaml +targets: + - id: codex-sdk-isolated + provider: codex-sdk + runtime: host + config: + model: gpt-5-codex +``` + +The YAML should not need an opt-in such as `sdk_isolation: process` for the +safe path. If AgentV cannot isolate an SDK provider yet, that provider should be +documented as explicit/non-default or temporarily rejected with an actionable +message. + +The parent AgentV process must not import the risky coding-agent SDK for the +default safe path. Instead, use a provider child runner: + +```text +AgentV parent + -> spawn child runner with target config + provider request JSON + <- NDJSON events/logs + <- one final ProviderResponse envelope + <- child exit status +``` + +Failure mapping: + +- child nonzero exit before result -> target error +- malformed child JSON -> target error +- timeout/cancel -> kill child process group, target timeout error +- crash after partial transcript -> failed target result with partial logs +- parent still finalizes `index.jsonl`, summaries, transcripts, and run bundle + +## External Pattern Mapping + +| Source | Relevant pattern | AgentV decision | +| --- | --- | --- | +| Promptfoo | Provider object uses `id` plus optional `label` and `config`; Codex and Claude SDK providers put `model` in `config.model`; direct SDK adapters exist. | Use `id` for stable identity, keep `provider`/`config` ergonomics, keep `model` under `config`, and do not make in-process SDK the default. | +| OpenAI Symphony | Codex app-server subprocess with workspace/session orchestration, approval/sandbox policy, max-turn boundaries, and structured streaming/status. | Use `codex-app-server` as the preferred rich-control Codex provider. | +| Kata Symphony | Pi is launched as `pi --mode rpc` locally or over SSH and controlled over stdio/RPC; workers must already have the runtime installed. | Add/prefer `pi-rpc` for rich Pi control; do not import Pi coding-agent SDK into AgentV's orchestrator. | +| Vercel agent-eval | Installs agent CLIs inside ephemeral sandboxes and captures transcripts from CLI JSON/session logs. | `runtime.mode: sandbox` should support managed/pinned CLI install and transcript capture without host config bleed. | +| Margin Evals | Runs cases in Docker, captures PTY/runtime/control logs, optional ATIF trajectory hooks. | Treat container/sandbox as runtime substrate and preserve logs/trajectories as run artifacts. | +| SWE-bench | Applies predictions and runs tests inside Docker containers with logs, timeouts, and cleanup. | Keep container details under runtime/harness config, not target identity. | +| DeepEval | Pytest/metric/tracing loop that coding agents can call, not a coding-agent target orchestrator. | Useful grader/eval-loop reference, not a target runtime model. | + +## Provider Contract + +### Codex + +Use explicit provider kinds: + +- `codex-cli`: spawn `codex exec` or a user shim. Capture stdout/stderr, JSONL + stream, exit code, final text, and raw logs. +- `codex-app-server`: spawn `codex app-server` or a user shim plus app-server + args. Prefer for rich transcript, turn/session control, cancellation, and + structured JSON-RPC events. +- `codex-sdk`: explicit SDK provider. Internally isolated in a child process if + retained. + +Do not add `codex-rpc` unless Codex exposes a distinct RPC mode separate from +app-server. For Codex, app-server is the protocol provider. + +`config.command` is the argv array for the executable or shim. It is not the +provider identity: + +```yaml +targets: + - id: codex-personal + provider: codex-cli + runtime: host + config: + command: ["codex-personal", "--model", "gpt-5-codex"] +``` + +```yaml +targets: + - id: codex-eng + provider: codex-cli + runtime: host + config: + command: ["codex-eng", "--model", "gpt-5-codex"] +``` + +### Pi + +Use explicit provider kinds: + +- `pi-cli`: simple Pi CLI subprocess and transcript capture. +- `pi-rpc`: Kata-style protocol subprocess that launches `pi --mode rpc` and + controls it over stdio/RPC. +- `pi-coding-agent` or `pi-sdk`: explicit SDK provider only; internally + isolated if retained. + +Keep `pi-ai` for plain LLM/model calls. Do not treat `pi-ai` as the coding-agent +runtime boundary. + +### Claude + +Use explicit provider kinds: + +- `claude-cli`: default subprocess path using structured stream output. +- `claude-sdk`: explicit SDK provider using `@anthropic-ai/claude-agent-sdk`, + internally isolated if retained. + +No separate Claude app-server/RPC provider has been identified. The CLI +structured stream is the subprocess-first rich transcript path. Claude Agent SDK +may spawn Claude Code internally, but importing the SDK in AgentV still creates +an in-process adapter risk unless wrapped by a child runner. + +### Copilot + +Keep provider names explicit by control boundary: + +- `copilot-cli`: subprocess/protocol CLI path. +- `copilot-log`: passive transcript/log replay path. +- `copilot-sdk`: explicit SDK path, internally isolated if retained. + +## Implementation Units + +### U1. Target Schema And Docs (`av-y7eq.1`) + +- Add `runtime: host` shorthand and `runtime.mode: host | profile | sandbox`. +- Keep `model` and `command` under `config`. +- Use `id` as target identity and `provider` as adapter/backend kind. +- Reject invalid runtime modes with focused validation errors. +- Document why `runtime` is the umbrella field. + +### U2. Codex Host/Profile Providers (`av-y7eq.2`) + +- Split current ambiguous `codex` registry behavior into explicit + `codex-cli`, `codex-app-server`, and `codex-sdk`. +- Remove the bare `codex` provider name from the authored clean contract. Users + must choose `codex-cli`, `codex-app-server`, or `codex-sdk` explicitly. +- Support `config.command` shims such as `codex-personal` and `codex-eng`. +- Implement host/profile environment construction, including deliberate + `HOME`, `CODEX_HOME`, temp dirs, and env allowlists for profile mode. + +### U3. Sandbox Runtime (`av-y7eq.3`) + +- Implement `runtime.mode: sandbox` using the existing or smallest viable + sandbox/container substrate. +- Install or locate the target CLI inside the sandbox with pinned/configurable + inputs. +- Mount only explicit workspace, result, and credential paths. +- Preserve stdout/stderr/transcript artifacts and distinguish sandbox infra + failure from target task failure. + +### U4. SDK Provider Isolation (`av-y7eq.4`, `av-57i`, `av-57i.1`) + +- Move retained coding-agent SDK providers behind child-runner process + boundaries. +- Start with Pi SDK isolation if that remains the quickest proof slice. +- Generalize only after the first provider proves the protocol. +- Do not install broad parent-process exception/EPIPE swallowing. + +### U5. Pi RPC Runtime (`av-y7eq.5`) + +- Add or document `pi-rpc` as the preferred rich-control Pi provider. +- Launch `pi --mode rpc` through a process/stdio boundary. +- Model remote execution after Kata only where AgentV needs it; worker + provisioning can remain explicit and out of scope for the first slice. +- Keep `pi-coding-agent` SDK explicit/non-default. + +## Result And Artifact Requirements + +Every coding-agent provider must return or fail through a structured result +envelope. AgentV must preserve: + +- target id, provider kind, runtime mode, command, cwd, and model +- stdout/stderr logs +- structured event transcript when available +- final assistant output +- tool/file-change events when available +- timeout, cancellation, spawn failure, nonzero exit, malformed output, and + crash metadata +- partial transcript/logs on failure + +Target crashes are target results. They must not become AgentV orchestrator +crashes. + +## Open Questions + +- Whether to rename `pi-coding-agent` to `pi-sdk` during the major cleanup or + replace the existing provider name with the shorter explicit SDK name. +- Which sandbox substrate should be the first implementation target if existing + AgentV runner support is insufficient. +- How much transcript normalization belongs in provider adapters versus a shared + transcript post-processor. + +## Validation Plan + +- Schema tests for `runtime` shorthand/object forms and invalid values. +- Provider registry tests proving explicit provider names and no bare `codex` + fallback to SDK. +- Codex CLI/app-server tests for command shims, host/profile env, timeout kill, + nonzero exit, malformed output, and transcript capture. +- Pi RPC tests with a fake `pi --mode rpc` process. +- SDK child-runner tests for success, child crash before result, child crash + after partial events, malformed JSON, timeout, and cancellation. +- Docs/examples validation after examples are updated. +- Live provider dogfood before implementation PRs are marked ready, per repo + verification rules. + +## Handoff + +Implementation workers should start with `av-y7eq.1` before provider changes so +the normalized contract exists. `av-y7eq.2` and `av-y7eq.5` can then proceed in +parallel for Codex and Pi subprocess/protocol providers. `av-y7eq.4` should +coordinate with `av-57i.1` rather than creating a second SDK isolation design.