feat(cli): add `lk agent session` for headless text-mode agent runs #857
toubatbrian wants to merge 18 commits
Introduces a three-process model (ephemeral CLI command, detached singleton daemon, agent subprocess) that drives a Python/JS agent over TCP using the lk.agent.session protobuf protocol, with no audio/CGO dependency:

- `lk agent session start <file>`: re-execs the lk binary as a detached daemon bound to a fixed loopback port (singleton), which spawns the agent and applies text mode; rejects start if a session already runs.
- `lk agent session say "..."`: streams a user turn and renders the agent reply, tool calls/outputs, and handoffs to the terminal.
- `lk agent session end`: tears down the daemon and agent.

The CLI<->daemon control protocol reuses pkg/ipc length-prefixed framing over the same TCP port, disambiguated from agent connections by a magic preamble. The headless renderer covers all ChatItem variants plus the FunctionToolsExecuted event.

Drops the now-unnecessary U1000 file-ignore directives added while the helpers were unused.

Co-authored-by: Cursor <cursoragent@cursor.com>
Tools that return no string (e.g. handoff tools returning an Agent) produced a bare "✓ " line. Suppress the output line when the summarized output is empty for successful calls; error outputs still render. Co-authored-by: Cursor <cursoragent@cursor.com>
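A minimal sketch of the suppression rule described in this commit, assuming a hypothetical `renderToolOutput` helper and a hypothetical "✗ " prefix for errors (only the "✓ " prefix appears in the commit message):

```go
package main

import (
	"fmt"
	"strings"
)

// renderToolOutput returns the line to print for a finished tool call, or ""
// when nothing should be printed. The "✗ " error prefix is an assumption.
func renderToolOutput(summary string, isError bool) string {
	summary = strings.TrimSpace(summary)
	if isError {
		// Error outputs always render, even when the summary is empty.
		return "✗ " + summary
	}
	if summary == "" {
		// Successful call with no string output (e.g. a handoff tool
		// returning an Agent): skip the line instead of printing a bare "✓ ".
		return ""
	}
	return "✓ " + summary
}

func main() {
	for _, c := range []struct {
		out   string
		isErr bool
	}{
		{"72F and sunny", false},
		{"", false},
		{"timeout", true},
	} {
		if line := renderToolOutput(c.out, c.isErr); line != "" {
			fmt.Println(line)
		}
	}
}
```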
Replace the env-gated branch at the top of main() with a dedicated, hidden `lk agent session daemon` subcommand (mirroring the existing hidden `generate-fish-completion` command). `start` now re-execs the binary into that subcommand instead of setting LK_SESSION_DAEMON=1, so the daemon has its own entrypoint dispatched by the CLI framework rather than special-casing main(). Re-exec of the same binary is retained (a separate binary can't be located reliably after `go install`); runtime params still flow through the LK_SESSION_* env vars. Co-authored-by: Cursor <cursoragent@cursor.com>
A registered subcommand is always invokable (Hidden only drops it from help), so a stray `lk agent session daemon` previously spawned a half-configured daemon (random port, empty project dir) that exited silently. Guard the entrypoint on the inherited readiness pipe that `start` always provides: without it, return a clear error directing the user to `lk agent session start`. Co-authored-by: Cursor <cursoragent@cursor.com>
Sorry for late question, but now that
From my understanding, With console, you don’t really get that separation, since it starts an interactive terminal UI. There isn’t a command-line-friendly way to distinguish between starting a session and sending input to that session, which is especially important for AI agents using a bash tool. cc @theomonnom in case you have more thoughts on the goal/scope of this feature.
@toubatbrian got it, makes sense
Is this supposed to be used in a way where the agent starts the room (if so, what's the command for that? is it just starting the agent via
It reuses the console mode, so `lk agent session` starts the actual agent in a subprocess and communicates with it over TCP.
* test(cli): add e2e test for the agent session lifecycle

  Adds an opt-in end-to-end test that drives the real `lk agent session` start/say/end flow against a minimal one-file echo agent (testdata/echo-agent), asserting the model echoes a token back and that the detached daemon exits afterward. The fixture is a uv project so the daemon's `uv run python` auto-syncs deps; its __main__ dispatches console mode to the TCP console directly since cli.run_app() doesn't expose --connect-addr on released agents.

  Includes a GitHub Actions workflow that runs the test on Linux and Windows, triggered by workflow_dispatch and pushes to any repo branch. Gated behind LIVEKIT_API_KEY so it skips without credentials.

* fix(cli): use a readiness file instead of an inherited pipe fd

  The session daemon spawn passed the readiness pipe to the detached child via cmd.ExtraFiles (fd 3), but os/exec's ExtraFiles is unsupported on Windows, so daemon.Start() failed with "fork/exec ...: not supported by windows" and the session never started there.

  Replace the inherited fd with a temp readiness file: the daemon writes its status atomically (write + rename) and `start` polls it until it sees a status, the daemon process exits, or a timeout slightly past the daemon's own agent-connect deadline. Works identically on Linux and Windows.
The console path links the vendored PortAudio cgo package, which needs the pa_src git submodule checked out and (on Linux) libasound2-dev for the ALSA host API. Add submodules: recursive to checkout and a Linux-only apt step.
pkg/apm/webrtc adds -fms-extensions to #cgo windows CXXFLAGS; Go's cgo flag allowlist rejects it unless CGO_CXXFLAGS_ALLOW permits it, matching what .goreleaser.yaml already does for Windows. test.yaml dodged this by excluding Windows from its matrix.
windows-latest's default mingw GCC can't compile pkg/apm/webrtc's MSVC SEH (__try/__except). The repo only builds Windows via zig/clang (.goreleaser.yaml), so install zig and point CC/CXX at it for the Windows arm.
The Windows link failed with 'fork/exec zig.exe: The filename or extension is too long' (Win error 206). The cgo build links ~560 object files whose paths overflow Windows' command-line limit. Go writes link args to a @response file when they're too long, but only after probing that the linker accepts @file -- and that probe runs only argv[0] of $CC. With CC="zig cc -target ...", argv[0] is bare 'zig' (no 'cc' subcommand), so the probe fails, Go skips the response file, and the argv overflows. Build zcc.exe/zxx.exe launchers that forward to 'zig cc'/'zig c++' so $CC is a single executable, the probe passes, and Go uses a response file.
…aser Drop the single-binary zig launcher wrappers in favor of goreleaser's direct CC=zig cc -target x86_64-windows-gnu / CXX=zig c++ ... and carry over CGO_CXXFLAGS=-fno-sanitize=all, so the Windows e2e build uses the same toolchain config as the release cross-build.
The Windows cgo link of ~560 webrtc/portaudio objects overflows the native command-line limit, so build like goreleaser does -- on Linux with zig. Split the Windows arm into cross-build-windows (cross-compiles lk.exe and the e2e test binary on ubuntu with zig + setup-cross.sh) and e2e-windows (downloads them and runs natively). buildLK now honors LK_SESSION_E2E_BIN so the test drives the prebuilt lk instead of rebuilding on the Windows runner.
Summary

Adds `lk agent session start|say|end`, a headless, text-mode way to drive a LiveKit agent (Python or JS) straight from the terminal, with no audio/CGO dependency (it lives under the default tag-free build, not the `console` audio build).

It uses a three-process model that mirrors the existing `lk agent console` plumbing:

- CLI command (`start`/`say`/`end`): short-lived, talks to the daemon and exits.
- Daemon: the `lk` binary re-exec'd into a hidden daemon mode (gated by an env var, never exposed as a subcommand). It binds a fixed loopback TCP port to enforce a single active session, spawns the agent, and applies text mode.
- Agent subprocess: the Python/JS agent, driven over TCP using the `lk.agent.session` protobuf protocol.

The CLI↔daemon control protocol reuses `pkg/ipc` length-prefixed framing on the same TCP port, disambiguated from agent connections by a 4-byte magic preamble. The headless renderer (`session_render.go`) prints user turns, agent replies, tool calls/outputs, and handoffs.

Command running / IO example

Notes

- `start` while a session is live is rejected (`a session is already running on 127.0.0.1:<port>`).
- … `console` tag. This drops the temporary `//lint:file-ignore U1000` directives that were added while the shared spawn/detect helpers were unused.
- `TODO(node)`/`TODO(audio)` placeholders mark the follow-up surfaces (JS agent detection, audio mode).

Test plan

- `go build ./...` (default) and `CGO_ENABLED=1 go build -tags console ./...`
- `go vet -tags console ./cmd/lk/`, `gofmt` clean
- `start → say (tool call) → say (handoff/end_call) → end` against `basic_agent.py` (see IO example above)
- … `TODO(node)`)