From f91f2437a6aeb0ae0c4e147f75b04cd6e0123a95 Mon Sep 17 00:00:00 2001
From: Claude <noreply@anthropic.com>
Date: Tue, 23 Jun 2026 15:42:13 +0000
Subject: [PATCH 1/2] Rely on deepagents' auto-added general-purpose subagent
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The live cascade hand-built the general-purpose `task` subagent as an explicit
`subagents=[…]` spec, restating `interrupt_on` and the model/tools omissions.
But deepagents already auto-adds a general-purpose subagent and derives its
`interrupt_on` from the top-level `create_deep_agent(interrupt_on=…)` we pass —
so the subagent inherits the write-gating, gateway-bound model, and sandboxed
toolset without us declaring it. A delegated write still surfaces at the parent
approval gate (the HITL invariant), now verified against the *auto-added*
subagent.

The only thing worth customizing is its prose: deepagents' default subagent
prompt asks for a "complete answer", but a live voice turn wants a short,
spoken-length summary. `subagents.py` now collapses to a harness profile
(`GeneralPurposeSubagentProfile`) carrying just that prompt + description,
registered under the gateway model's provider so the override survives a
`--model` change. Drops the explicit subagent spec and the redundant
per-subagent `interrupt_on`.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01LXmmmJ53yfuc3CKemRQi7x
---
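Note for reviewers (below the `---`, so not part of the commit message): a toy
model of the inheritance this patch relies on. It does not import deepagents.
`build_subagent_specs` and the dict shapes are hypothetical stand-ins chosen for
illustration, not the library's API. It only sketches the behaviour described
above: with no declared subagent, the auto-added one picks up the top-level
`interrupt_on`.

```python
from __future__ import annotations

WRITE_TOOLS = ("write_file", "edit_file", "execute")


def build_subagent_specs(
    interrupt_on: dict[str, bool],
    declared: list[dict[str, object]] | None = None,
) -> list[dict[str, object]]:
    """Hypothetical stand-in for the merge step; not deepagents' real code."""
    specs = [dict(spec) for spec in (declared or [])]
    if not any(spec.get("name") == "general-purpose" for spec in specs):
        # Nothing declared under that name: auto-add the default subagent and
        # give it a copy of the top-level gating.
        specs.append({"name": "general-purpose", "interrupt_on": dict(interrupt_on)})
    return specs


top_level = dict.fromkeys(WRITE_TOOLS, True)
specs = build_subagent_specs(top_level)
gp = specs[0]
```

In this sketch `gp["interrupt_on"]` equals the top-level mapping without any
per-subagent restatement, which is the property the regression test pins
against the real graph.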
 aai_cli/AGENTS.md                     |   2 +-
 aai_cli/agent_cascade/brain.py        |   9 ++-
 aai_cli/agent_cascade/subagents.py    |  59 +++++++++-----
 tests/test_agent_cascade_subagents.py | 110 ++++++++++++++++----------
 4 files changed, 114 insertions(+), 66 deletions(-)

diff --git a/aai_cli/AGENTS.md b/aai_cli/AGENTS.md
index 7efd884a..25354894 100644
--- a/aai_cli/AGENTS.md
+++ b/aai_cli/AGENTS.md
@@ -153,7 +153,7 @@ heavily-reworked commands with long bodies; small commands keep the inline
 - **`streaming/`** + `client.stream_audio` — v3 realtime API. Event callbacks run on the SDK reader thread and guard against `BrokenPipeError` (`stdio.silence_stdout()`) so a closed pipe never dumps a thread traceback.
 - **`core/sync_stt.py`** + **`core/signals.py`** + `commands/dictate/` — `assembly dictate`: headless dictation over the **Sync STT API** (`Environment.sync_base`, one POST `/transcribe` per utterance with the required `X-AAI-Model: u3-sync-pro` header; 80 ms–120 s of PCM/WAV). It needs no terminal: recording starts immediately and `dictate_exec._record` polls `signals.stop_on_terminate` between ~100 ms mic chunks for a SIGTERM, which finishes the utterance (clean exit 0) — so a hotkey tool like Hammerspoon can launch it as a background task and `kill -TERM`/`task:terminate()` to transcribe. SIGINT (Ctrl-C) still cancels (exit 130). Both boundaries (the stop latch, mic, HTTP) are injectable, so the suite never needs a real signal or microphone (`tests/test_dictate_exec.py` scripts the SIGTERM latch). Contrast `signals.terminate_as_interrupt` (used by `stream`/`agent`/`speak`), which routes SIGTERM into the *cancel* path instead.
 - **`agent/`** — full-duplex voice agent (mic in, TTS out via `voices.py`).
-- **`agent_cascade/`** + `commands/agent_cascade/` — `assembly agent-cascade`: the same live terminal conversation as `assembly agent`, but **client-orchestrated** — `engine.run_cascade` wires Streaming STT → the LLM Gateway → streaming TTS itself instead of talking to the Voice Agent endpoint, mirroring what the `agent-cascade` `assembly init` template does server-side. **Sandbox-only** (streaming TTS has no prod host; guarded via `tts.session.require_available`). Reuses the agent slice's `DuplexAudio`/`AgentRenderer` and `core.client.stream_audio`/`core.llm.complete`/`tts.session.synthesize`; the three network legs are injected through `engine.CascadeDeps` (the `tts/session.py` seam) so the cascade — greeting, clause-level streaming TTS, barge-in, history window — is unit-tested against fakes with no sockets/mic/speaker. The LLM leg is a deepagents graph (`brain.py`) streamed token-by-token via `brain.build_streamer` (`graph.stream(stream_mode="messages")`): the engine buffers `SpeechDelta`s, flushes complete clauses with `text.pop_clauses` (soft-separator clauses gated by `engine._MIN_CLAUSE_CHARS`), and synthesizes each clause with **streaming TTS** (`tts.session.synthesize(on_audio=…)`) so audio starts on the first frame instead of after the whole reply. The reply runs on a throwaway producer thread feeding a `queue.Queue` the worker drains under a monotonic deadline (the wall-clock backstop that replaced `_complete_within`), and an abandoned-on-timeout graph leg's langchain `ThreadPoolExecutor` worker is detached (`_detach_executor_threads_since`) so it can't wedge interpreter exit. A `ToolNotice` surfaces the "Searching the web…" affordance and drops any unspoken preamble. Under `-v` (`debuglog.active()`) `brain._stream_graph` logs each accumulated assistant line, tool call, and tool result as it streams. 
**Front-end:** an interactive mic session in human mode runs a **voice-only Textual TUI** (`agent_cascade/tui.py`, `LiveAgentApp`) by default — there's no text input (you can't type to it), just a transcript + an animated voice bar tracking listening/thinking/speaking. It uses its own `banner` wordmark, `messages` widgets, and `tui_status.voicebar_markup`/`VOICE_FRAMES` — all modules that now live in `agent_cascade/`; the blocking `run_cascade` runs on a worker thread and reaches the UI through a `_TuiRenderer` (the `engine.Renderer` protocol) that hops each call onto the UI thread, and a quit calls `DuplexAudio.close` to end the mic iterator and unblock that worker. `_exec._should_use_tui` gates it: file/sample input, `--json`/`-o text`, and a non-TTY all fall back to the plain `AgentRenderer` line output. **`--files`** (on by default; `--no-files` opts out) swaps the brain's in-memory backend for a real-cwd, sandbox-capable `SandboxedShellBackend` (`aai_cli/agent_cascade/sandbox.py`): file ops behave as before (traversal-blocked `virtual_mode`), and because it implements `SandboxBackendProtocol` deepagents binds a *functional* `execute` that runs commands OS-sandboxed in the real cwd — `sandbox-exec` (SBPL) on macOS, `bwrap` on Linux, refused (never an unconfined fallback) on any other platform or with the sandbox binary missing; the OS sandbox blocks the network, confines writes to cwd (+ the temp dir), and read-denies credential stores (`~/.ssh`/`~/.aws`/…, `.env*`, `.claude/`). The policy renderers are pure and the subprocess/capability boundaries injected, so the suite asserts *what we'd run* with no real sandbox. 
`write_file`/`edit_file`/`execute` are gated via `interrupt_on` + an `InMemorySaver`; `brain._stream_gated` detects the post-stream interrupt (`graph.get_state(config).interrupts`), asks an injected `Approver`, and resumes with `Command(resume=…)`, bracketing the human wait in `ApprovalPause` events so `engine._consume` suspends its reply deadline (`risk.py` surfaces a shell-risk warning on the prompt). The voice TUI supplies the approver via `agent_cascade.modals.ApprovalScreen` (`y`/`a`/`n`), which can *also* be resolved hands-free by voice: while a write awaits approval, `_consume` arms `_awaiting_approval` and `engine.on_turn` routes the next final transcript to `app.submit_voice_approval` → `ApprovalScreen.try_voice`, which applies `spoken_approval.spoken_decision` (an unambiguous affirmative approves, anything else rejects — fail-safe; destructive `risk.py`-flagged commands ignore the spoken answer and require a keypress). Headless runs auto-deny (`_exec._deny_writes`). `--files` also turns on durable per-project memory via deepagents' `MemoryMiddleware` (`memory=["./.deepagents/AGENTS.md"]`), distinct from the in-session `InMemorySaver`, and binds one gateway-bound, sandbox-backed general-purpose subagent (deepagents' `task` tool; spec in `agent_cascade/subagents.py`, omitting `model`/`tools` so it inherits both) for delegating a focused subtask. The subagent's own `interrupt_on` mirrors `_WRITE_TOOLS`, and a delegated `write_file`/`edit_file`/`execute` surfaces at the *parent* `get_state().interrupts` (so `_pending_writes` gates it too — verified by a HITL spike, locked in `tests/test_agent_cascade_subagents.py`). Reads (incl. `grep`) stay ungated.
+- **`agent_cascade/`** + `commands/agent_cascade/` — `assembly agent-cascade`: the same live terminal conversation as `assembly agent`, but **client-orchestrated** — `engine.run_cascade` wires Streaming STT → the LLM Gateway → streaming TTS itself instead of talking to the Voice Agent endpoint, mirroring what the `agent-cascade` `assembly init` template does server-side. **Sandbox-only** (streaming TTS has no prod host; guarded via `tts.session.require_available`). Reuses the agent slice's `DuplexAudio`/`AgentRenderer` and `core.client.stream_audio`/`core.llm.complete`/`tts.session.synthesize`; the three network legs are injected through `engine.CascadeDeps` (the `tts/session.py` seam) so the cascade — greeting, clause-level streaming TTS, barge-in, history window — is unit-tested against fakes with no sockets/mic/speaker. The LLM leg is a deepagents graph (`brain.py`) streamed token-by-token via `brain.build_streamer` (`graph.stream(stream_mode="messages")`): the engine buffers `SpeechDelta`s, flushes complete clauses with `text.pop_clauses` (soft-separator clauses gated by `engine._MIN_CLAUSE_CHARS`), and synthesizes each clause with **streaming TTS** (`tts.session.synthesize(on_audio=…)`) so audio starts on the first frame instead of after the whole reply. The reply runs on a throwaway producer thread feeding a `queue.Queue` the worker drains under a monotonic deadline (the wall-clock backstop that replaced `_complete_within`), and an abandoned-on-timeout graph leg's langchain `ThreadPoolExecutor` worker is detached (`_detach_executor_threads_since`) so it can't wedge interpreter exit. A `ToolNotice` surfaces the "Searching the web…" affordance and drops any unspoken preamble. Under `-v` (`debuglog.active()`) `brain._stream_graph` logs each accumulated assistant line, tool call, and tool result as it streams. 
**Front-end:** an interactive mic session in human mode runs a **voice-only Textual TUI** (`agent_cascade/tui.py`, `LiveAgentApp`) by default — there's no text input (you can't type to it), just a transcript + an animated voice bar tracking listening/thinking/speaking. It uses its own `banner` wordmark, `messages` widgets, and `tui_status.voicebar_markup`/`VOICE_FRAMES` — all modules that now live in `agent_cascade/`; the blocking `run_cascade` runs on a worker thread and reaches the UI through a `_TuiRenderer` (the `engine.Renderer` protocol) that hops each call onto the UI thread, and a quit calls `DuplexAudio.close` to end the mic iterator and unblock that worker. `_exec._should_use_tui` gates it: file/sample input, `--json`/`-o text`, and a non-TTY all fall back to the plain `AgentRenderer` line output. **`--files`** (on by default; `--no-files` opts out) swaps the brain's in-memory backend for a real-cwd, sandbox-capable `SandboxedShellBackend` (`aai_cli/agent_cascade/sandbox.py`): file ops behave as before (traversal-blocked `virtual_mode`), and because it implements `SandboxBackendProtocol` deepagents binds a *functional* `execute` that runs commands OS-sandboxed in the real cwd — `sandbox-exec` (SBPL) on macOS, `bwrap` on Linux, refused (never an unconfined fallback) on any other platform or with the sandbox binary missing; the OS sandbox blocks the network, confines writes to cwd (+ the temp dir), and read-denies credential stores (`~/.ssh`/`~/.aws`/…, `.env*`, `.claude/`). The policy renderers are pure and the subprocess/capability boundaries injected, so the suite asserts *what we'd run* with no real sandbox. 
`write_file`/`edit_file`/`execute` are gated via `interrupt_on` + an `InMemorySaver`; `brain._stream_gated` detects the post-stream interrupt (`graph.get_state(config).interrupts`), asks an injected `Approver`, and resumes with `Command(resume=…)`, bracketing the human wait in `ApprovalPause` events so `engine._consume` suspends its reply deadline (`risk.py` surfaces a shell-risk warning on the prompt). The voice TUI supplies the approver via `agent_cascade.modals.ApprovalScreen` (`y`/`a`/`n`), which can *also* be resolved hands-free by voice: while a write awaits approval, `_consume` arms `_awaiting_approval` and `engine.on_turn` routes the next final transcript to `app.submit_voice_approval` → `ApprovalScreen.try_voice`, which applies `spoken_approval.spoken_decision` (an unambiguous affirmative approves, anything else rejects — fail-safe; destructive `risk.py`-flagged commands ignore the spoken answer and require a keypress). Headless runs auto-deny (`_exec._deny_writes`). `--files` also turns on durable per-project memory via deepagents' `MemoryMiddleware` (`memory=["./.deepagents/AGENTS.md"]`), distinct from the in-session `InMemorySaver`. The gateway-bound, sandbox-backed general-purpose subagent (deepagents' `task` tool) for delegating a focused subtask is **auto-added by deepagents** — we don't declare it. We only override its prose for a voice turn (a spoken-length summary, not the SDK's "complete answer" default) via a harness profile keyed by the gateway model's provider (`subagents.register_gp_subagent_profile`, called from `build_graph` so the deepagents import stays lazy — and kept off `brain.py`, which sits at the 500-line gate). 
It inherits the gateway-bound model, the sandboxed toolset, *and* the top-level `interrupt_on` (deepagents' `graph.py` merges the top-level config into the auto-added subagent), so a delegated `write_file`/`edit_file`/`execute` surfaces at the *parent* `get_state().interrupts` with no per-subagent restatement (so `_pending_writes` gates it too — verified by a HITL spike, locked in `tests/test_agent_cascade_subagents.py`). Reads (incl. `grep`) stay ungated.
 - **`tts/`** + `commands/speak.py` — `assembly speak` synthesizes text to speech over the sandbox streaming-TTS WebSocket (`streaming-tts.sandbox000.…`). **Sandbox-only:** `session.is_available()` is false in production (empty `Environment.streaming_tts_host`), so the command exits 2 with a `--sandbox` hint. `session.synthesize` drives a Begin→Generate→Flush→Audio→Terminate protocol with an injectable `connect` for hermetic tests (mirrors `agent/session.py`); `audio.py` plays the PCM (default) or writes a WAV (`--out`). The single-voice default-playback path **streams**: `synthesize`'s `on_audio(chunk, sample_rate)` callback is wired to `audio.PcmPlayer.feed`, so speech starts on the first Audio frame (it opens the device lazily, since the rate is only known at Begin) instead of after the whole text — the win for a long `--url` page. `--out` (needs the full buffer) and the multi-voice dialogue path (`synthesize_dialogue` → `_output_audio` → buffered `play_pcm`) stay buffered; `synthesize` still returns the complete PCM for the summary regardless.
 - **`code_gen/`** — backs `--show-code` on `transcribe`/`stream`/`agent`: builds a ready-to-run Python SDK script from exactly the flags passed (no API key needed; generated code reads `ASSEMBLYAI_API_KEY`).
 - **`auth/`** — browser-assisted `assembly login` via AMS + **Stytch B2B OAuth discovery** (`discovery.py`, `flow.py`, `loopback.py`, `ams.py`). Not Stytch Connected Apps.
diff --git a/aai_cli/agent_cascade/brain.py b/aai_cli/agent_cascade/brain.py
index ead1d7fe..aadb306f 100644
--- a/aai_cli/agent_cascade/brain.py
+++ b/aai_cli/agent_cascade/brain.py
@@ -218,20 +218,19 @@ def _graph_kwargs(
 
     Empty when ``--files`` is off, so the graph is built exactly as before. When on: a real-cwd
     backend, ``interrupt_on`` pausing only the mutating tools for human approval, and an
-    in-memory checkpointer (interrupt/resume needs one). ``backend_factory`` is the test seam.
+    in-memory checkpointer (interrupt/resume needs one). ``backend_factory`` is the test seam. No
+    ``subagents`` key: deepagents auto-adds a general-purpose subagent that inherits this
+    ``interrupt_on`` (so a delegated write surfaces at the same parent gate — see ``subagents.py``).
     """
     if not config.files:
         return {}
     from langgraph.checkpoint.memory import InMemorySaver
 
-    from aai_cli.agent_cascade.subagents import general_purpose_subagent
-
     return {
         "backend": backend_factory(),
         "interrupt_on": dict.fromkeys(_WRITE_TOOLS, True),
         "checkpointer": InMemorySaver(),
         "memory": ["./.deepagents/AGENTS.md"],
-        "subagents": [general_purpose_subagent(dict.fromkeys(_WRITE_TOOLS, True))],
     }
 
 
@@ -270,7 +269,9 @@ def build_graph(
 
     from aai_cli.agent_cascade.mcp_tools import load_mcp_tools
     from aai_cli.agent_cascade.model import build_model
+    from aai_cli.agent_cascade.subagents import register_gp_subagent_profile
 
+    register_gp_subagent_profile()
     model = build_model(
         api_key, model=config.model, max_tokens=config.max_tokens, extra=config.llm_extra
     )
diff --git a/aai_cli/agent_cascade/subagents.py b/aai_cli/agent_cascade/subagents.py
index 0d571610..2eeb5dae 100644
--- a/aai_cli/agent_cascade/subagents.py
+++ b/aai_cli/agent_cascade/subagents.py
@@ -1,34 +1,53 @@
-"""The general-purpose subagent for ``assembly live --files`` (deepagents' ``task`` tool).
+"""Tune deepagents' auto-added general-purpose subagent for ``assembly live`` (the ``task`` tool).
 
-One subagent the live agent delegates a focused multi-step subtask to. It OMITS ``model`` (so it
-inherits the AssemblyAI gateway-bound model — never a ``provider:model`` string) and ``tools`` (so
-it inherits the main sandboxed toolset, keeping its ``execute`` OS-confined). Its ``interrupt_on``
-mirrors the main agent's write tools, so the subagent's own mutations prompt through the same
-approval loop (verified to surface at the parent gate — see the HITL regression test).
+deepagents auto-adds a ``general-purpose`` subagent and derives its ``interrupt_on`` from the
+top-level ``create_deep_agent(interrupt_on=…)``, so we don't declare the subagent ourselves — it
+inherits the gateway-bound model, the sandboxed toolset, *and* the write-gating, and a delegated
+write still surfaces at the *parent* approval gate (locked by ``tests/test_agent_cascade_subagents.py``).
+The only thing we override is its prose: the SDK's default subagent prompt asks for a "complete
+answer", but a live voice turn wants a short, spoken-length summary. We set that (and the
+description) through a harness profile, keeping this off ``brain.py`` (which sits at the 500-line gate).
 """
 
 from __future__ import annotations
 
-_SYSTEM_PROMPT = (
+_GP_SYSTEM_PROMPT = (
     "You are a focused coworker handling one delegated subtask in the user's project. Work in the "
     "current directory, use the available tools to research or make a contained change, and return "
     "a concise, spoken-length summary of what you did or found — not a transcript."
 )
+_GP_DESCRIPTION = (
+    "Delegate a focused multi-step subtask — research, gather context, or implement a "
+    "contained change — and get back a short summary. Keeps the main voice turn lean."
+)
+
+# The harness-profile registry is keyed by model provider/identifier; the gateway model is a
+# ChatOpenAI subclass, so its provider is "openai". We register under the bare provider (not
+# provider:model) so the override still applies when --model overrides the default identifier.
+# Safe to scope this broadly: brain.build_graph is the *only* create_deep_agent call in the CLI.
+_GP_PROFILE_MODEL_PROVIDER = "openai"
 
 
-def general_purpose_subagent(interrupt_on: dict[str, bool]) -> dict[str, object]:
-    """The ``task`` subagent spec: gateway-bound (no ``model``), full sandboxed tools (no ``tools``),
-    with ``interrupt_on`` mirroring the caller's write tools so its mutations stay gated.
+def register_gp_subagent_profile() -> None:
+    """Override the auto-added general-purpose subagent's prompt + description for a voice turn.
 
-    ``interrupt_on`` is a parameter (not a local constant) so this module needn't import
-    ``brain._WRITE_TOOLS`` — that would be a circular import, since ``brain`` imports this.
+    Registers a harness profile that swaps in a spoken-length summary prompt (instead of
+    deepagents' "complete answer" default) and our short description. The subagent keeps
+    inheriting the gateway-bound model, the sandboxed toolset, and the top-level ``interrupt_on``.
+    Idempotent — re-registers the same profile under the same key; ``brain.build_graph`` calls it
+    once per graph build (the deepagents import stays lazy here, off the startup path).
     """
-    return {
-        "name": "general-purpose",
-        "description": (
-            "Delegate a focused multi-step subtask — research, gather context, or implement a "
-            "contained change — and get back a short summary. Keeps the main voice turn lean."
+    from deepagents import (
+        GeneralPurposeSubagentProfile,
+        HarnessProfile,
+        register_harness_profile,
+    )
+
+    register_harness_profile(
+        _GP_PROFILE_MODEL_PROVIDER,
+        HarnessProfile(
+            general_purpose_subagent=GeneralPurposeSubagentProfile(
+                system_prompt=_GP_SYSTEM_PROMPT, description=_GP_DESCRIPTION
+            )
         ),
-        "system_prompt": _SYSTEM_PROMPT,
-        "interrupt_on": interrupt_on,
-    }
+    )
diff --git a/tests/test_agent_cascade_subagents.py b/tests/test_agent_cascade_subagents.py
index 19baf998..f0db500a 100644
--- a/tests/test_agent_cascade_subagents.py
+++ b/tests/test_agent_cascade_subagents.py
@@ -1,48 +1,81 @@
-"""Tests for the general-purpose subagent spec (`assembly live --files` task tool)."""
+"""Tests for the live agent's general-purpose subagent (`assembly live` task tool).
+
+deepagents auto-adds a `general-purpose` subagent (its `task` tool) and derives that subagent's
+`interrupt_on` from the top-level `create_deep_agent(interrupt_on=…)`, so we don't declare the
+subagent ourselves. We only register a harness profile that overrides its prose for a voice turn
+(`subagents.register_gp_subagent_profile`). These tests guard both halves: the profile override and
+the inherited write-gating (a delegated write must still surface at the *parent* approval gate).
+"""
 
 from __future__ import annotations
 
+import deepagents
 from langchain_core.language_models.chat_models import BaseChatModel
 from langchain_core.messages import AIMessage
 
-from aai_cli.agent_cascade import brain
+from aai_cli.agent_cascade import brain, subagents
 from aai_cli.agent_cascade.config import CascadeConfig
-from aai_cli.agent_cascade.subagents import general_purpose_subagent
 from tests._cascade_fakes import FakeChatModel
 
 
-def test_spec_has_required_keys_and_omits_model_and_tools():
-    spec = general_purpose_subagent({"write_file": True, "edit_file": True, "execute": True})
-    assert spec["name"] == "general-purpose"
-    assert isinstance(spec["description"], str) and spec["description"]
-    assert isinstance(spec["system_prompt"], str) and spec["system_prompt"]
-    # AssemblyAI-only invariant: no provider:model string — must inherit the gateway-bound model.
-    assert "model" not in spec
-    # Full-tools path: tools omitted so the subagent inherits the sandboxed main toolset.
-    assert "tools" not in spec
+def test_register_gp_subagent_profile_overrides_prompt_and_description(monkeypatch):
+    # Capture the registration instead of reading the process-global registry (other tests
+    # populate it, and `register_harness_profile` is imported inside the helper, so patching
+    # the module attribute is picked up at call time).
+    captured: dict[str, object] = {}
+    monkeypatch.setattr(
+        deepagents,
+        "register_harness_profile",
+        lambda key, profile: captured.update(key=key, profile=profile),
+    )
+
+    subagents.register_gp_subagent_profile()
+
+    # Keyed by the bare provider so the override still applies when --model overrides the
+    # default identifier (the gateway model is a ChatOpenAI subclass → provider "openai").
+    assert captured["key"] == "openai"
+    gp = captured["profile"].general_purpose_subagent
+    # Spoken-length summary, not deepagents' "complete answer" default.
+    assert "summary" in gp.system_prompt and "transcript" in gp.system_prompt
+    assert "subtask" in gp.description and "summary" in gp.description
+    # Only the prose is overridden: `enabled` is left unset, so deepagents' default applies.
+    assert gp.enabled is None
+
 
+def test_build_graph_registers_the_gp_subagent_profile(monkeypatch):
+    # The override only takes effect if build_graph actually registers it before create_deep_agent.
+    keys: list[str] = []
+    monkeypatch.setattr(
+        deepagents, "register_harness_profile", lambda key, profile: keys.append(key)
+    )
+
+    brain.build_graph("k", CascadeConfig(files=False), tools=[], mcp_tools=[])
 
-def test_spec_interrupt_on_is_the_passed_mapping():
-    # Mirrors the caller's write tools verbatim, so the subagent's mutations also prompt. Passing
-    # a distinct mapping proves it isn't hardcoded (kills a "return a fixed dict" mutant).
-    io = {"write_file": True, "edit_file": True, "execute": True}
-    assert general_purpose_subagent(io)["interrupt_on"] == io
-    assert general_purpose_subagent({"write_file": True})["interrupt_on"] == {"write_file": True}
+    assert keys == ["openai"]
 
 
-def test_graph_kwargs_wires_one_gated_gateway_bound_subagent(monkeypatch, tmp_path):
-    # --files binds exactly one subagent: gateway-bound (no model) with every mutating tool gated.
+def test_profile_override_lands_on_the_auto_added_subagent(monkeypatch, tmp_path):
+    # End-to-end: the registered profile must reach deepagents' auto-added general-purpose
+    # subagent — proving "openai" matches the gateway model's resolved provider. The task tool's
+    # description embeds each subagent's description, so we read it back from the compiled graph.
     monkeypatch.chdir(tmp_path)
-    subs = brain._graph_kwargs(CascadeConfig(files=True))["subagents"]
-    assert isinstance(subs, list) and len(subs) == 1
-    spec = subs[0]
-    assert spec["name"] == "general-purpose"
-    assert "model" not in spec  # inherits the gateway-bound model
-    assert spec["interrupt_on"] == {"write_file": True, "edit_file": True, "execute": True}
+    graph = brain.build_graph("k", CascadeConfig(files=True), tools=[], mcp_tools=[])
+
+    task_tool = graph.nodes["tools"].bound.tools_by_name["task"]
+    assert subagents._GP_DESCRIPTION in task_tool.description
+
+
+def test_graph_kwargs_on_gates_writes_without_declaring_a_subagent():
+    # --files binds the gating + checkpointer but no explicit subagent: the gateway-bound GP
+    # subagent is auto-added and inherits this interrupt_on (see the surfacing test below).
+    kw = brain._graph_kwargs(CascadeConfig(files=True))
+    assert "subagents" not in kw
+    assert kw["interrupt_on"] == {"write_file": True, "edit_file": True, "execute": True}
+    assert "checkpointer" in kw and "backend" in kw
 
 
-def test_graph_kwargs_off_binds_no_subagents():
-    assert "subagents" not in brain._graph_kwargs(CascadeConfig(files=False))
+def test_graph_kwargs_off_is_empty():
+    assert brain._graph_kwargs(CascadeConfig(files=False)) == {}
 
 
 def test_tool_label_task_is_working_on_a_subtask():
@@ -50,26 +83,19 @@ def test_tool_label_task_is_working_on_a_subtask():
 
 
 def _delegating_graph(model: BaseChatModel, root: str):
-    """A real deepagents graph that binds a gated general-purpose subagent (mirrors the gated
-    write graph). Inline literals get bidirectional typing; no return annotation so pyright
-    accepts it as build_streamer's graph (same shape as the gated-write tests)."""
+    """A real deepagents graph that gates writes but declares NO subagent — exercising the
+    auto-added general-purpose subagent and its inherited top-level interrupt_on (mirrors the
+    gated write graph). Inline literals get bidirectional typing; no return annotation so
+    pyright accepts it as build_streamer's graph (same shape as the gated-write tests)."""
     from deepagents import create_deep_agent
     from deepagents.backends import FilesystemBackend
-    from deepagents.middleware.subagents import SubAgent
     from langgraph.checkpoint.memory import InMemorySaver
 
-    spec: SubAgent = {
-        "name": "general-purpose",
-        "description": "delegate a focused subtask and return a summary",
-        "system_prompt": "be a focused helper; return a concise summary",
-        "interrupt_on": {"write_file": True, "edit_file": True},
-    }
     return create_deep_agent(
         model=model,
         backend=FilesystemBackend(root_dir=root, virtual_mode=True),
         interrupt_on={"write_file": True, "edit_file": True},
         checkpointer=InMemorySaver(),
-        subagents=[spec],
         system_prompt="be a live agent",
     )
 
@@ -103,8 +129,10 @@ def _delegate_then_write(reply: str) -> FakeChatModel:
 
 
 def test_subagent_write_surfaces_through_the_parent_gate_and_is_approved(tmp_path):
-    # The DECISIVE M2 invariant (the resolved spike): a subagent's write pauses through OUR parent
-    # approval loop (build_streamer -> _stream_gated -> _pending_writes -> approver). Approved, it lands.
+    # The DECISIVE M2 invariant (the resolved spike): a write delegated to the AUTO-ADDED
+    # general-purpose subagent pauses through OUR parent approval loop (build_streamer ->
+    # _stream_gated -> _pending_writes -> approver) purely via the inherited top-level
+    # interrupt_on. Approved, it lands.
     asked: list[tuple[str, dict]] = []
     graph = _delegating_graph(_delegate_then_write("Saved it via the helper."), str(tmp_path))
     streamer = brain.build_streamer(

From dcc9d602f1889d9d6c34b87d88dde4b49f525ce4 Mon Sep 17 00:00:00 2001
From: Claude <noreply@anthropic.com>
Date: Tue, 23 Jun 2026 16:01:38 +0000
Subject: [PATCH 2/2] Trim _graph_kwargs docstring to keep brain.py at the
 500-line gate

The origin/main merge added a two-line module-docstring note, nudging
brain.py to 501 lines and failing the max-file-length gate in CI.
Condense the _graph_kwargs docstring (no logic change) back to 500.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01LXmmmJ53yfuc3CKemRQi7x
---
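Note for reviewers (below the `---`, so not part of the commit message): a
minimal sketch of the kind of check a max-file-length gate performs. The real
CI gate is not shown in this series, so `over_limit` and `MAX_LINES` are
hypothetical names. Only the 500-line limit comes from the message above.

```python
from __future__ import annotations

MAX_LINES = 500  # the limit named in this series; the real gate's config is not shown


def over_limit(source: str, limit: int = MAX_LINES) -> int:
    """Return how many lines `source` exceeds `limit` by (0 if within it)."""
    return max(0, len(source.splitlines()) - limit)
```

Under this sketch a 501-line file is over by one line, which matches the
situation the docstring trim resolves.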
 aai_cli/agent_cascade/brain.py | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/aai_cli/agent_cascade/brain.py b/aai_cli/agent_cascade/brain.py
index 9f81eab8..df4b4c07 100644
--- a/aai_cli/agent_cascade/brain.py
+++ b/aai_cli/agent_cascade/brain.py
@@ -217,11 +217,10 @@ def _graph_kwargs(
 ) -> dict[str, object]:
     """Extra ``create_deep_agent`` kwargs that turn on real-cwd files + write-gating.
 
-    Empty when ``--files`` is off, so the graph is built exactly as before. When on: a real-cwd
-    backend, ``interrupt_on`` pausing only the mutating tools for human approval, and an
-    in-memory checkpointer (interrupt/resume needs one). ``backend_factory`` is the test seam. No
-    ``subagents`` key: deepagents auto-adds a general-purpose subagent that inherits this
-    ``interrupt_on`` (so a delegated write surfaces at the same parent gate — see ``subagents.py``).
+    Empty when ``--files`` is off, so the graph is built as before. When on: a real-cwd backend,
+    ``interrupt_on`` gating only the mutating tools, an in-memory checkpointer (interrupt/resume
+    needs one), and ``backend_factory`` as the test seam. No ``subagents`` key: deepagents
+    auto-adds a general-purpose subagent that inherits this ``interrupt_on`` (see ``subagents.py``).
     """
     if not config.files:
         return {}