From f91f2437a6aeb0ae0c4e147f75b04cd6e0123a95 Mon Sep 17 00:00:00 2001 From: Claude Date: Tue, 23 Jun 2026 15:42:13 +0000 Subject: [PATCH 1/2] Rely on deepagents' auto-added general-purpose subagent MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The live cascade hand-built the general-purpose `task` subagent as an explicit `subagents=[…]` spec, restating `interrupt_on` and the model/tools omissions. But deepagents already auto-adds a general-purpose subagent and derives its `interrupt_on` from the top-level `create_deep_agent(interrupt_on=…)` we pass — so the subagent inherits the write-gating, gateway-bound model, and sandboxed toolset without us declaring it. A delegated write still surfaces at the parent approval gate (the HITL invariant), now verified against the *auto-added* subagent. The only thing worth customizing is its prose: deepagents' default subagent prompt asks for a "complete answer", but a live voice turn wants a short, spoken-length summary. `subagents.py` now collapses to a harness profile (`GeneralPurposeSubagentProfile`) carrying just that prompt + description, registered under the gateway model's provider so the override survives a `--model` change. Drops the explicit subagent spec and the redundant per-subagent `interrupt_on`. Co-Authored-By: Claude Opus 4.8 Claude-Session: https://claude.ai/code/session_01LXmmmJ53yfuc3CKemRQi7x --- aai_cli/AGENTS.md | 2 +- aai_cli/agent_cascade/brain.py | 9 ++- aai_cli/agent_cascade/subagents.py | 59 +++++++++----- tests/test_agent_cascade_subagents.py | 110 ++++++++++++++++---------- 4 files changed, 114 insertions(+), 66 deletions(-) diff --git a/aai_cli/AGENTS.md b/aai_cli/AGENTS.md index 7efd884a..25354894 100644 --- a/aai_cli/AGENTS.md +++ b/aai_cli/AGENTS.md @@ -153,7 +153,7 @@ heavily-reworked commands with long bodies; small commands keep the inline - **`streaming/`** + `client.stream_audio` — v3 realtime API. 
Event callbacks run on the SDK reader thread and guard against `BrokenPipeError` (`stdio.silence_stdout()`) so a closed pipe never dumps a thread traceback. - **`core/sync_stt.py`** + **`core/signals.py`** + `commands/dictate/` — `assembly dictate`: headless dictation over the **Sync STT API** (`Environment.sync_base`, one POST `/transcribe` per utterance with the required `X-AAI-Model: u3-sync-pro` header; 80 ms–120 s of PCM/WAV). It needs no terminal: recording starts immediately and `dictate_exec._record` polls `signals.stop_on_terminate` between ~100 ms mic chunks for a SIGTERM, which finishes the utterance (clean exit 0) — so a hotkey tool like Hammerspoon can launch it as a background task and `kill -TERM`/`task:terminate()` to transcribe. SIGINT (Ctrl-C) still cancels (exit 130). Both boundaries (the stop latch, mic, HTTP) are injectable, so the suite never needs a real signal or microphone (`tests/test_dictate_exec.py` scripts the SIGTERM latch). Contrast `signals.terminate_as_interrupt` (used by `stream`/`agent`/`speak`), which routes SIGTERM into the *cancel* path instead. - **`agent/`** — full-duplex voice agent (mic in, TTS out via `voices.py`). -- **`agent_cascade/`** + `commands/agent_cascade/` — `assembly agent-cascade`: the same live terminal conversation as `assembly agent`, but **client-orchestrated** — `engine.run_cascade` wires Streaming STT → the LLM Gateway → streaming TTS itself instead of talking to the Voice Agent endpoint, mirroring what the `agent-cascade` `assembly init` template does server-side. **Sandbox-only** (streaming TTS has no prod host; guarded via `tts.session.require_available`). 
Reuses the agent slice's `DuplexAudio`/`AgentRenderer` and `core.client.stream_audio`/`core.llm.complete`/`tts.session.synthesize`; the three network legs are injected through `engine.CascadeDeps` (the `tts/session.py` seam) so the cascade — greeting, clause-level streaming TTS, barge-in, history window — is unit-tested against fakes with no sockets/mic/speaker. The LLM leg is a deepagents graph (`brain.py`) streamed token-by-token via `brain.build_streamer` (`graph.stream(stream_mode="messages")`): the engine buffers `SpeechDelta`s, flushes complete clauses with `text.pop_clauses` (soft-separator clauses gated by `engine._MIN_CLAUSE_CHARS`), and synthesizes each clause with **streaming TTS** (`tts.session.synthesize(on_audio=…)`) so audio starts on the first frame instead of after the whole reply. The reply runs on a throwaway producer thread feeding a `queue.Queue` the worker drains under a monotonic deadline (the wall-clock backstop that replaced `_complete_within`), and an abandoned-on-timeout graph leg's langchain `ThreadPoolExecutor` worker is detached (`_detach_executor_threads_since`) so it can't wedge interpreter exit. A `ToolNotice` surfaces the "Searching the web…" affordance and drops any unspoken preamble. Under `-v` (`debuglog.active()`) `brain._stream_graph` logs each accumulated assistant line, tool call, and tool result as it streams. **Front-end:** an interactive mic session in human mode runs a **voice-only Textual TUI** (`agent_cascade/tui.py`, `LiveAgentApp`) by default — there's no text input (you can't type to it), just a transcript + an animated voice bar tracking listening/thinking/speaking. 
It uses its own `banner` wordmark, `messages` widgets, and `tui_status.voicebar_markup`/`VOICE_FRAMES` — all modules that now live in `agent_cascade/`; the blocking `run_cascade` runs on a worker thread and reaches the UI through a `_TuiRenderer` (the `engine.Renderer` protocol) that hops each call onto the UI thread, and a quit calls `DuplexAudio.close` to end the mic iterator and unblock that worker. `_exec._should_use_tui` gates it: file/sample input, `--json`/`-o text`, and a non-TTY all fall back to the plain `AgentRenderer` line output. **`--files`** (on by default; `--no-files` opts out) swaps the brain's in-memory backend for a real-cwd, sandbox-capable `SandboxedShellBackend` (`aai_cli/agent_cascade/sandbox.py`): file ops behave as before (traversal-blocked `virtual_mode`), and because it implements `SandboxBackendProtocol` deepagents binds a *functional* `execute` that runs commands OS-sandboxed in the real cwd — `sandbox-exec` (SBPL) on macOS, `bwrap` on Linux, refused (never an unconfined fallback) on any other platform or with the sandbox binary missing; the OS sandbox blocks the network, confines writes to cwd (+ the temp dir), and read-denies credential stores (`~/.ssh`/`~/.aws`/…, `.env*`, `.claude/`). The policy renderers are pure and the subprocess/capability boundaries injected, so the suite asserts *what we'd run* with no real sandbox. `write_file`/`edit_file`/`execute` are gated via `interrupt_on` + an `InMemorySaver`; `brain._stream_gated` detects the post-stream interrupt (`graph.get_state(config).interrupts`), asks an injected `Approver`, and resumes with `Command(resume=…)`, bracketing the human wait in `ApprovalPause` events so `engine._consume` suspends its reply deadline (`risk.py` surfaces a shell-risk warning on the prompt). 
The voice TUI supplies the approver via `agent_cascade.modals.ApprovalScreen` (`y`/`a`/`n`), which can *also* be resolved hands-free by voice: while a write awaits approval, `_consume` arms `_awaiting_approval` and `engine.on_turn` routes the next final transcript to `app.submit_voice_approval` → `ApprovalScreen.try_voice`, which applies `spoken_approval.spoken_decision` (an unambiguous affirmative approves, anything else rejects — fail-safe; destructive `risk.py`-flagged commands ignore the spoken answer and require a keypress). Headless runs auto-deny (`_exec._deny_writes`). `--files` also turns on durable per-project memory via deepagents' `MemoryMiddleware` (`memory=["./.deepagents/AGENTS.md"]`), distinct from the in-session `InMemorySaver`, and binds one gateway-bound, sandbox-backed general-purpose subagent (deepagents' `task` tool; spec in `agent_cascade/subagents.py`, omitting `model`/`tools` so it inherits both) for delegating a focused subtask. The subagent's own `interrupt_on` mirrors `_WRITE_TOOLS`, and a delegated `write_file`/`edit_file`/`execute` surfaces at the *parent* `get_state().interrupts` (so `_pending_writes` gates it too — verified by a HITL spike, locked in `tests/test_agent_cascade_subagents.py`). Reads (incl. `grep`) stay ungated. +- **`agent_cascade/`** + `commands/agent_cascade/` — `assembly agent-cascade`: the same live terminal conversation as `assembly agent`, but **client-orchestrated** — `engine.run_cascade` wires Streaming STT → the LLM Gateway → streaming TTS itself instead of talking to the Voice Agent endpoint, mirroring what the `agent-cascade` `assembly init` template does server-side. **Sandbox-only** (streaming TTS has no prod host; guarded via `tts.session.require_available`). 
Reuses the agent slice's `DuplexAudio`/`AgentRenderer` and `core.client.stream_audio`/`core.llm.complete`/`tts.session.synthesize`; the three network legs are injected through `engine.CascadeDeps` (the `tts/session.py` seam) so the cascade — greeting, clause-level streaming TTS, barge-in, history window — is unit-tested against fakes with no sockets/mic/speaker. The LLM leg is a deepagents graph (`brain.py`) streamed token-by-token via `brain.build_streamer` (`graph.stream(stream_mode="messages")`): the engine buffers `SpeechDelta`s, flushes complete clauses with `text.pop_clauses` (soft-separator clauses gated by `engine._MIN_CLAUSE_CHARS`), and synthesizes each clause with **streaming TTS** (`tts.session.synthesize(on_audio=…)`) so audio starts on the first frame instead of after the whole reply. The reply runs on a throwaway producer thread feeding a `queue.Queue` the worker drains under a monotonic deadline (the wall-clock backstop that replaced `_complete_within`), and an abandoned-on-timeout graph leg's langchain `ThreadPoolExecutor` worker is detached (`_detach_executor_threads_since`) so it can't wedge interpreter exit. A `ToolNotice` surfaces the "Searching the web…" affordance and drops any unspoken preamble. Under `-v` (`debuglog.active()`) `brain._stream_graph` logs each accumulated assistant line, tool call, and tool result as it streams. **Front-end:** an interactive mic session in human mode runs a **voice-only Textual TUI** (`agent_cascade/tui.py`, `LiveAgentApp`) by default — there's no text input (you can't type to it), just a transcript + an animated voice bar tracking listening/thinking/speaking. 
It uses its own `banner` wordmark, `messages` widgets, and `tui_status.voicebar_markup`/`VOICE_FRAMES` — all modules that now live in `agent_cascade/`; the blocking `run_cascade` runs on a worker thread and reaches the UI through a `_TuiRenderer` (the `engine.Renderer` protocol) that hops each call onto the UI thread, and a quit calls `DuplexAudio.close` to end the mic iterator and unblock that worker. `_exec._should_use_tui` gates it: file/sample input, `--json`/`-o text`, and a non-TTY all fall back to the plain `AgentRenderer` line output. **`--files`** (on by default; `--no-files` opts out) swaps the brain's in-memory backend for a real-cwd, sandbox-capable `SandboxedShellBackend` (`aai_cli/agent_cascade/sandbox.py`): file ops behave as before (traversal-blocked `virtual_mode`), and because it implements `SandboxBackendProtocol` deepagents binds a *functional* `execute` that runs commands OS-sandboxed in the real cwd — `sandbox-exec` (SBPL) on macOS, `bwrap` on Linux, refused (never an unconfined fallback) on any other platform or with the sandbox binary missing; the OS sandbox blocks the network, confines writes to cwd (+ the temp dir), and read-denies credential stores (`~/.ssh`/`~/.aws`/…, `.env*`, `.claude/`). The policy renderers are pure and the subprocess/capability boundaries injected, so the suite asserts *what we'd run* with no real sandbox. `write_file`/`edit_file`/`execute` are gated via `interrupt_on` + an `InMemorySaver`; `brain._stream_gated` detects the post-stream interrupt (`graph.get_state(config).interrupts`), asks an injected `Approver`, and resumes with `Command(resume=…)`, bracketing the human wait in `ApprovalPause` events so `engine._consume` suspends its reply deadline (`risk.py` surfaces a shell-risk warning on the prompt). 
The voice TUI supplies the approver via `agent_cascade.modals.ApprovalScreen` (`y`/`a`/`n`), which can *also* be resolved hands-free by voice: while a write awaits approval, `_consume` arms `_awaiting_approval` and `engine.on_turn` routes the next final transcript to `app.submit_voice_approval` → `ApprovalScreen.try_voice`, which applies `spoken_approval.spoken_decision` (an unambiguous affirmative approves, anything else rejects — fail-safe; destructive `risk.py`-flagged commands ignore the spoken answer and require a keypress). Headless runs auto-deny (`_exec._deny_writes`). `--files` also turns on durable per-project memory via deepagents' `MemoryMiddleware` (`memory=["./.deepagents/AGENTS.md"]`), distinct from the in-session `InMemorySaver`. The gateway-bound, sandbox-backed general-purpose subagent (deepagents' `task` tool) for delegating a focused subtask is **auto-added by deepagents** — we don't declare it. We only override its prose for a voice turn (a spoken-length summary, not the SDK's "complete answer" default) via a harness profile keyed by the gateway model's provider (`subagents.register_gp_subagent_profile`, called from `build_graph` so the deepagents import stays lazy — and kept off `brain.py`, which sits at the 500-line gate). It inherits the gateway-bound model, the sandboxed toolset, *and* the top-level `interrupt_on` (deepagents' `graph.py` merges the top-level config into the auto-added subagent), so a delegated `write_file`/`edit_file`/`execute` surfaces at the *parent* `get_state().interrupts` with no per-subagent restatement (so `_pending_writes` gates it too — verified by a HITL spike, locked in `tests/test_agent_cascade_subagents.py`). Reads (incl. `grep`) stay ungated. - **`tts/`** + `commands/speak.py` — `assembly speak` synthesizes text to speech over the sandbox streaming-TTS WebSocket (`streaming-tts.sandbox000.…`). 
**Sandbox-only:** `session.is_available()` is false in production (empty `Environment.streaming_tts_host`), so the command exits 2 with a `--sandbox` hint. `session.synthesize` drives a Begin→Generate→Flush→Audio→Terminate protocol with an injectable `connect` for hermetic tests (mirrors `agent/session.py`); `audio.py` plays the PCM (default) or writes a WAV (`--out`). The single-voice default-playback path **streams**: `synthesize`'s `on_audio(chunk, sample_rate)` callback is wired to `audio.PcmPlayer.feed`, so speech starts on the first Audio frame (it opens the device lazily, since the rate is only known at Begin) instead of after the whole text — the win for a long `--url` page. `--out` (needs the full buffer) and the multi-voice dialogue path (`synthesize_dialogue` → `_output_audio` → buffered `play_pcm`) stay buffered; `synthesize` still returns the complete PCM for the summary regardless. - **`code_gen/`** — backs `--show-code` on `transcribe`/`stream`/`agent`: builds a ready-to-run Python SDK script from exactly the flags passed (no API key needed; generated code reads `ASSEMBLYAI_API_KEY`). - **`auth/`** — browser-assisted `assembly login` via AMS + **Stytch B2B OAuth discovery** (`discovery.py`, `flow.py`, `loopback.py`, `ams.py`). Not Stytch Connected Apps. diff --git a/aai_cli/agent_cascade/brain.py b/aai_cli/agent_cascade/brain.py index ead1d7fe..aadb306f 100644 --- a/aai_cli/agent_cascade/brain.py +++ b/aai_cli/agent_cascade/brain.py @@ -218,20 +218,19 @@ def _graph_kwargs( Empty when ``--files`` is off, so the graph is built exactly as before. When on: a real-cwd backend, ``interrupt_on`` pausing only the mutating tools for human approval, and an - in-memory checkpointer (interrupt/resume needs one). ``backend_factory`` is the test seam. + in-memory checkpointer (interrupt/resume needs one). ``backend_factory`` is the test seam. 
No + ``subagents`` key: deepagents auto-adds a general-purpose subagent that inherits this + ``interrupt_on`` (so a delegated write surfaces at the same parent gate — see ``subagents.py``). """ if not config.files: return {} from langgraph.checkpoint.memory import InMemorySaver - from aai_cli.agent_cascade.subagents import general_purpose_subagent - return { "backend": backend_factory(), "interrupt_on": dict.fromkeys(_WRITE_TOOLS, True), "checkpointer": InMemorySaver(), "memory": ["./.deepagents/AGENTS.md"], - "subagents": [general_purpose_subagent(dict.fromkeys(_WRITE_TOOLS, True))], } @@ -270,7 +269,9 @@ def build_graph( from aai_cli.agent_cascade.mcp_tools import load_mcp_tools from aai_cli.agent_cascade.model import build_model + from aai_cli.agent_cascade.subagents import register_gp_subagent_profile + register_gp_subagent_profile() model = build_model( api_key, model=config.model, max_tokens=config.max_tokens, extra=config.llm_extra ) diff --git a/aai_cli/agent_cascade/subagents.py b/aai_cli/agent_cascade/subagents.py index 0d571610..2eeb5dae 100644 --- a/aai_cli/agent_cascade/subagents.py +++ b/aai_cli/agent_cascade/subagents.py @@ -1,34 +1,53 @@ -"""The general-purpose subagent for ``assembly live --files`` (deepagents' ``task`` tool). +"""Tune deepagents' auto-added general-purpose subagent for ``assembly live`` (the ``task`` tool). -One subagent the live agent delegates a focused multi-step subtask to. It OMITS ``model`` (so it -inherits the AssemblyAI gateway-bound model — never a ``provider:model`` string) and ``tools`` (so -it inherits the main sandboxed toolset, keeping its ``execute`` OS-confined). Its ``interrupt_on`` -mirrors the main agent's write tools, so the subagent's own mutations prompt through the same -approval loop (verified to surface at the parent gate — see the HITL regression test). 
+deepagents auto-adds a ``general-purpose`` subagent and derives its ``interrupt_on`` from the +top-level ``create_deep_agent(interrupt_on=…)``, so we don't declare the subagent ourselves — it +inherits the gateway-bound model, the sandboxed toolset, *and* the write-gating, and a delegated +write still surfaces at the *parent* approval gate (locked by ``tests/test_agent_cascade_subagents.py``). +The only thing we override is its prose: the SDK's default subagent prompt asks for a "complete +answer", but a live voice turn wants a short, spoken-length summary. We set that (and the +description) through a harness profile, keeping this off ``brain.py`` (which sits at the 500-line gate). """ from __future__ import annotations -_SYSTEM_PROMPT = ( +_GP_SYSTEM_PROMPT = ( "You are a focused coworker handling one delegated subtask in the user's project. Work in the " "current directory, use the available tools to research or make a contained change, and return " "a concise, spoken-length summary of what you did or found — not a transcript." ) +_GP_DESCRIPTION = ( + "Delegate a focused multi-step subtask — research, gather context, or implement a " + "contained change — and get back a short summary. Keeps the main voice turn lean." +) + +# The harness-profile registry is keyed by model provider/identifier; the gateway model is a +# ChatOpenAI subclass, so its provider is "openai". We register under the bare provider (not +# provider:model) so the override still applies when --model overrides the default identifier. +# Safe to scope this broadly: brain.build_graph is the *only* create_deep_agent call in the CLI. +_GP_PROFILE_MODEL_PROVIDER = "openai" -def general_purpose_subagent(interrupt_on: dict[str, bool]) -> dict[str, object]: - """The ``task`` subagent spec: gateway-bound (no ``model``), full sandboxed tools (no ``tools``), - with ``interrupt_on`` mirroring the caller's write tools so its mutations stay gated. 
+def register_gp_subagent_profile() -> None: + """Override the auto-added general-purpose subagent's prompt + description for a voice turn. - ``interrupt_on`` is a parameter (not a local constant) so this module needn't import - ``brain._WRITE_TOOLS`` — that would be a circular import, since ``brain`` imports this. + Registers a harness profile that swaps in a spoken-length summary prompt (instead of + deepagents' "complete answer" default) and our short description. The subagent keeps + inheriting the gateway-bound model, the sandboxed toolset, and the top-level ``interrupt_on``. + Idempotent — re-registers the same profile under the same key; ``brain.build_graph`` calls it + once per graph build (the deepagents import stays lazy here, off the startup path). """ - return { - "name": "general-purpose", - "description": ( - "Delegate a focused multi-step subtask — research, gather context, or implement a " - "contained change — and get back a short summary. Keeps the main voice turn lean." + from deepagents import ( + GeneralPurposeSubagentProfile, + HarnessProfile, + register_harness_profile, + ) + + register_harness_profile( + _GP_PROFILE_MODEL_PROVIDER, + HarnessProfile( + general_purpose_subagent=GeneralPurposeSubagentProfile( + system_prompt=_GP_SYSTEM_PROMPT, description=_GP_DESCRIPTION + ) ), - "system_prompt": _SYSTEM_PROMPT, - "interrupt_on": interrupt_on, - } + ) diff --git a/tests/test_agent_cascade_subagents.py b/tests/test_agent_cascade_subagents.py index 19baf998..f0db500a 100644 --- a/tests/test_agent_cascade_subagents.py +++ b/tests/test_agent_cascade_subagents.py @@ -1,48 +1,81 @@ -"""Tests for the general-purpose subagent spec (`assembly live --files` task tool).""" +"""Tests for the live agent's general-purpose subagent (`assembly live` task tool). 
+deepagents auto-adds a `general-purpose` subagent (its `task` tool) and derives that subagent's +`interrupt_on` from the top-level `create_deep_agent(interrupt_on=…)`, so we don't declare the +subagent ourselves. We only register a harness profile that overrides its prose for a voice turn +(`subagents.register_gp_subagent_profile`). These tests guard both halves: the profile override and +the inherited write-gating (a delegated write must still surface at the *parent* approval gate). +""" from __future__ import annotations +import deepagents from langchain_core.language_models.chat_models import BaseChatModel from langchain_core.messages import AIMessage -from aai_cli.agent_cascade import brain +from aai_cli.agent_cascade import brain, subagents from aai_cli.agent_cascade.config import CascadeConfig -from aai_cli.agent_cascade.subagents import general_purpose_subagent from tests._cascade_fakes import FakeChatModel -def test_spec_has_required_keys_and_omits_model_and_tools(): - spec = general_purpose_subagent({"write_file": True, "edit_file": True, "execute": True}) - assert spec["name"] == "general-purpose" - assert isinstance(spec["description"], str) and spec["description"] - assert isinstance(spec["system_prompt"], str) and spec["system_prompt"] - # AssemblyAI-only invariant: no provider:model string — must inherit the gateway-bound model. - assert "model" not in spec - # Full-tools path: tools omitted so the subagent inherits the sandboxed main toolset. - assert "tools" not in spec +def test_register_gp_subagent_profile_overrides_prompt_and_description(monkeypatch): + # Capture the registration instead of reading the process-global registry (other tests + # populate it, and `register_harness_profile` is imported inside the helper, so patching + # the module attribute is picked up at call time). 
+ captured: dict[str, object] = {} + monkeypatch.setattr( + deepagents, + "register_harness_profile", + lambda key, profile: captured.update(key=key, profile=profile), + ) + + subagents.register_gp_subagent_profile() + + # Keyed by the bare provider so the override still applies when --model overrides the + # default identifier (the gateway model is a ChatOpenAI subclass → provider "openai"). + assert captured["key"] == "openai" + gp = captured["profile"].general_purpose_subagent + # Spoken-length summary, not deepagents' "complete answer" default. + assert "summary" in gp.system_prompt and "transcript" in gp.system_prompt + assert "subtask" in gp.description and "summary" in gp.description + # Don't pin a model/tools: the subagent must inherit the gateway-bound model + sandboxed tools. + assert gp.enabled is None + +def test_build_graph_registers_the_gp_subagent_profile(monkeypatch): + # The override only takes effect if build_graph actually registers it before create_deep_agent. + keys: list[str] = [] + monkeypatch.setattr( + deepagents, "register_harness_profile", lambda key, profile: keys.append(key) + ) + + brain.build_graph("k", CascadeConfig(files=False), tools=[], mcp_tools=[]) -def test_spec_interrupt_on_is_the_passed_mapping(): - # Mirrors the caller's write tools verbatim, so the subagent's mutations also prompt. Passing - # a distinct mapping proves it isn't hardcoded (kills a "return a fixed dict" mutant). - io = {"write_file": True, "edit_file": True, "execute": True} - assert general_purpose_subagent(io)["interrupt_on"] == io - assert general_purpose_subagent({"write_file": True})["interrupt_on"] == {"write_file": True} + assert keys == ["openai"] -def test_graph_kwargs_wires_one_gated_gateway_bound_subagent(monkeypatch, tmp_path): - # --files binds exactly one subagent: gateway-bound (no model) with every mutating tool gated. 
+def test_profile_override_lands_on_the_auto_added_subagent(monkeypatch, tmp_path): + # End-to-end: the registered profile must reach deepagents' auto-added general-purpose + # subagent — proving "openai" matches the gateway model's resolved provider. The task tool's + # description embeds each subagent's description, so we read it back from the compiled graph. monkeypatch.chdir(tmp_path) - subs = brain._graph_kwargs(CascadeConfig(files=True))["subagents"] - assert isinstance(subs, list) and len(subs) == 1 - spec = subs[0] - assert spec["name"] == "general-purpose" - assert "model" not in spec # inherits the gateway-bound model - assert spec["interrupt_on"] == {"write_file": True, "edit_file": True, "execute": True} + graph = brain.build_graph("k", CascadeConfig(files=True), tools=[], mcp_tools=[]) + + task_tool = graph.nodes["tools"].bound.tools_by_name["task"] + assert subagents._GP_DESCRIPTION in task_tool.description + + +def test_graph_kwargs_on_gates_writes_without_declaring_a_subagent(): + # --files binds the gating + checkpointer but no explicit subagent: the gateway-bound GP + # subagent is auto-added and inherits this interrupt_on (see the surfacing test below). + kw = brain._graph_kwargs(CascadeConfig(files=True)) + assert "subagents" not in kw + assert kw["interrupt_on"] == {"write_file": True, "edit_file": True, "execute": True} + assert "checkpointer" in kw and "backend" in kw -def test_graph_kwargs_off_binds_no_subagents(): - assert "subagents" not in brain._graph_kwargs(CascadeConfig(files=False)) +def test_graph_kwargs_off_is_empty(): + assert brain._graph_kwargs(CascadeConfig(files=False)) == {} def test_tool_label_task_is_working_on_a_subtask(): @@ -50,26 +83,19 @@ def test_tool_label_task_is_working_on_a_subtask(): def _delegating_graph(model: BaseChatModel, root: str): - """A real deepagents graph that binds a gated general-purpose subagent (mirrors the gated - write graph). 
Inline literals get bidirectional typing; no return annotation so pyright - accepts it as build_streamer's graph (same shape as the gated-write tests).""" + """A real deepagents graph that gates writes but declares NO subagent — exercising the + auto-added general-purpose subagent and its inherited top-level interrupt_on (mirrors the + gated write graph). Inline literals get bidirectional typing; no return annotation so + pyright accepts it as build_streamer's graph (same shape as the gated-write tests).""" from deepagents import create_deep_agent from deepagents.backends import FilesystemBackend - from deepagents.middleware.subagents import SubAgent from langgraph.checkpoint.memory import InMemorySaver - spec: SubAgent = { - "name": "general-purpose", - "description": "delegate a focused subtask and return a summary", - "system_prompt": "be a focused helper; return a concise summary", - "interrupt_on": {"write_file": True, "edit_file": True}, - } return create_deep_agent( model=model, backend=FilesystemBackend(root_dir=root, virtual_mode=True), interrupt_on={"write_file": True, "edit_file": True}, checkpointer=InMemorySaver(), - subagents=[spec], system_prompt="be a live agent", ) @@ -103,8 +129,10 @@ def _delegate_then_write(reply: str) -> FakeChatModel: def test_subagent_write_surfaces_through_the_parent_gate_and_is_approved(tmp_path): - # The DECISIVE M2 invariant (the resolved spike): a subagent's write pauses through OUR parent - # approval loop (build_streamer -> _stream_gated -> _pending_writes -> approver). Approved, it lands. + # The DECISIVE M2 invariant (the resolved spike): a write delegated to the AUTO-ADDED + # general-purpose subagent pauses through OUR parent approval loop (build_streamer -> + # _stream_gated -> _pending_writes -> approver) purely via the inherited top-level + # interrupt_on. Approved, it lands. 
 asked: list[tuple[str, dict]] = [] graph = _delegating_graph(_delegate_then_write("Saved it via the helper."), str(tmp_path)) streamer = brain.build_streamer( From dcc9d602f1889d9d6c34b87d88dde4b49f525ce4 Mon Sep 17 00:00:00 2001 From: Claude Date: Tue, 23 Jun 2026 16:01:38 +0000 Subject: [PATCH 2/2] Trim _graph_kwargs docstring to keep brain.py at the 500-line gate The origin/main merge added a two-line module-docstring note, nudging brain.py to 501 lines and failing the max-file-length gate in CI. Condense the _graph_kwargs docstring (no logic change) back to 500. Co-Authored-By: Claude Opus 4.8 Claude-Session: https://claude.ai/code/session_01LXmmmJ53yfuc3CKemRQi7x --- aai_cli/agent_cascade/brain.py | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/aai_cli/agent_cascade/brain.py b/aai_cli/agent_cascade/brain.py index 9f81eab8..df4b4c07 100644 --- a/aai_cli/agent_cascade/brain.py +++ b/aai_cli/agent_cascade/brain.py @@ -217,11 +217,10 @@ def _graph_kwargs( ) -> dict[str, object]: """Extra ``create_deep_agent`` kwargs that turn on real-cwd files + write-gating. - Empty when ``--files`` is off, so the graph is built exactly as before. When on: a real-cwd - backend, ``interrupt_on`` pausing only the mutating tools for human approval, and an - in-memory checkpointer (interrupt/resume needs one). ``backend_factory`` is the test seam. No - ``subagents`` key: deepagents auto-adds a general-purpose subagent that inherits this - ``interrupt_on`` (so a delegated write surfaces at the same parent gate — see ``subagents.py``). + Empty when ``--files`` is off, so the graph is built as before. When on: a real-cwd backend, + ``interrupt_on`` gating only the mutating tools, and an in-memory checkpointer (interrupt/resume + needs one); ``backend_factory`` is the test seam. No ``subagents`` key: deepagents + auto-adds a general-purpose subagent that inherits this ``interrupt_on`` (see ``subagents.py``). """ if not config.files: return {}