feat(cli): add lk agent session for headless text-mode agent runs #857

Open
toubatbrian wants to merge 18 commits into main from feat/agent-session-daemon

Conversation

@toubatbrian toubatbrian commented Jun 3, 2026

Contributor

Summary

Adds lk agent session start|say|end — a headless, text-mode way to drive a LiveKit agent (Python or JS) straight from the terminal, with no audio/CGO dependency (it lives under the default tag-free build, not the console audio build).

It uses a three-process model that mirrors the existing lk agent console plumbing:

  1. Ephemeral CLI command (start/say/end) — short-lived, talks to the daemon and exits.
  2. Detached singleton daemon — the lk binary re-exec'd into a hidden daemon mode (gated by an env var, never exposed as a subcommand). It binds a fixed loopback TCP port to enforce a single active session, spawns the agent, and applies text mode.
  3. Agent subprocess — the user's agent, connected over the lk.agent.session protobuf protocol.

The CLI↔daemon control protocol reuses pkg/ipc length-prefixed framing on the same TCP port, disambiguated from agent connections by a 4-byte magic preamble. The headless renderer (session_render.go) prints user turns, agent replies, tool calls/outputs, and handoffs.

Command running / IO example

$ lk agent session start examples/voice_agents/basic_agent.py
Detected Python agent (basic_agent.py in .../examples/voice_agents)
Session started. Use `lk agent session say "..."` to talk, `lk agent session end` to stop.

$ lk agent session say "what's the weather in San Francisco?"

  ● You
    what's the weather in San Francisco?

  ● function_tool: lookup_weather
    ✓ sunny with a temperature of 70 degrees.

  ● Agent
    The weather in San Francisco is sunny with a temperature of 70 degrees. Want to know the forecast for any other city?

$ lk agent session say "thanks, that's all"

  ● You
    thanks, that's all

  ● function_tool: end_call
    ✓ say goodbye to the user

  ● Agent
    Goodbye! Have a great day!

$ lk agent session end
Session ended.

Notes

  • Singleton enforcement: a second start while a session is live is rejected (a session is already running on 127.0.0.1:<port>).
  • No CGO/audio: builds in the default tag-free binary; the audio pipeline stays behind the console tag. This drops the temporary //lint:file-ignore U1000 directives that were added while the shared spawn/detect helpers were unused.
  • TODO(node) / TODO(audio) placeholders mark the follow-up surfaces (JS agent detection, audio mode).

Test plan

  • go build ./... (default) and CGO_ENABLED=1 go build -tags console ./...
  • go vet -tags console ./cmd/lk/, gofmt clean
  • End-to-end start → say (tool call) → say (handoff/end_call) → end against basic_agent.py (see IO example above)
  • JS agent run (pending TODO(node))

Introduces a three-process model (ephemeral CLI command, detached
singleton daemon, agent subprocess) that drives a Python/JS agent over
TCP using the lk.agent.session protobuf protocol, with no audio/CGO
dependency:

- `lk agent session start <file>`: re-execs the lk binary as a detached
  daemon bound to a fixed loopback port (singleton), which spawns the
  agent and applies text mode; rejects start if a session already runs.
- `lk agent session say "..."`: streams a user turn and renders the
  agent reply, tool calls/outputs, and handoffs to the terminal.
- `lk agent session end`: tears down the daemon and agent.

The CLI<->daemon control protocol reuses pkg/ipc length-prefixed framing
over the same TCP port, disambiguated from agent connections by a magic
preamble. The headless renderer covers all ChatItem variants plus the
FunctionToolsExecuted event. Drops the now-unnecessary U1000 file-ignore
directives added while the helpers were unused.

Co-authored-by: Cursor <cursoragent@cursor.com>

Tools that return no string (e.g. handoff tools returning an Agent)
produced a bare "✓ " line. Suppress the output line when the summarized
output is empty for successful calls; error outputs still render.

Co-authored-by: Cursor <cursoragent@cursor.com>
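The rendering rule from this commit might look roughly like the sketch below. `renderToolOutput` is a hypothetical helper, not the function in session_render.go, and the `✗` marker for errors is an assumption (the PR text only shows `✓`).

```go
package main

import (
	"fmt"
	"strings"
)

// renderToolOutput returns the output line for a tool call, or "" when the
// line should be skipped entirely.
func renderToolOutput(output string, isError bool) string {
	summary := strings.TrimSpace(output)
	if isError {
		// Error outputs always render, even when the text is empty.
		return fmt.Sprintf("    ✗ %s", summary)
	}
	if summary == "" {
		// A successful call with no string result prints nothing,
		// instead of a bare "✓ " line.
		return ""
	}
	return fmt.Sprintf("    ✓ %s", summary)
}

func main() {
	for _, out := range []string{"sunny with a temperature of 70 degrees.", ""} {
		if line := renderToolOutput(out, false); line != "" {
			fmt.Println(line)
		}
	}
}
```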
theomonnom approved these changes Jun 3, 2026

@theomonnom theomonnom left a comment

Member

uh stamped the wrong PR

Comment thread cmd/lk/main.go Outdated
toubatbrian and others added 2 commits June 3, 2026 15:46
Replace the env-gated branch at the top of main() with a dedicated,
hidden `lk agent session daemon` subcommand (mirroring the existing
hidden `generate-fish-completion` command). `start` now re-execs the
binary into that subcommand instead of setting LK_SESSION_DAEMON=1, so
the daemon has its own entrypoint dispatched by the CLI framework rather
than special-casing main(). Re-exec of the same binary is retained
(a separate binary can't be located reliably after `go install`);
runtime params still flow through the LK_SESSION_* env vars.

Co-authored-by: Cursor <cursoragent@cursor.com>

A registered subcommand is always invokable (Hidden only drops it from
help), so a stray `lk agent session daemon` previously spawned a
half-configured daemon (random port, empty project dir) that exited
silently. Guard the entrypoint on the inherited readiness pipe that
`start` always provides: without it, return a clear error directing the
user to `lk agent session start`.

Co-authored-by: Cursor <cursoragent@cursor.com>
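The two commits above (re-exec into a hidden subcommand, then guarding that entrypoint) could be sketched like this. The names are hypothetical: `LK_SESSION_READY` stands in for whatever handshake `start` hands to the daemon (the commit describes an inherited readiness pipe), and the CLI framework wiring is left out.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// readyEnv is a hypothetical variable that only `start` sets for the daemon.
const readyEnv = "LK_SESSION_READY"

// daemonCommand builds the re-exec of the same binary into the hidden
// `agent session daemon` subcommand, passing runtime params via env.
func daemonCommand(self, ready string) *exec.Cmd {
	cmd := exec.Command(self, "agent", "session", "daemon")
	cmd.Env = append(os.Environ(), readyEnv+"="+ready)
	return cmd
}

// guardDaemon rejects a stray manual invocation of the hidden subcommand.
func guardDaemon(getenv func(string) string) error {
	if getenv(readyEnv) == "" {
		return fmt.Errorf("internal command (%s unset); use `lk agent session start`", readyEnv)
	}
	return nil
}

func main() {
	self, err := os.Executable()
	if err != nil {
		fmt.Println(err)
		return
	}
	cmd := daemonCommand(self, "ready-token")
	fmt.Println(cmd.Args[1:])
	fmt.Println(guardDaemon(os.Getenv))
}
```

Passing `getenv` as a parameter keeps the guard testable without mutating the process environment.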
@toubatbrian toubatbrian requested a review from theomonnom June 3, 2026 22:52
@rektdeckard

Member

Sorry for the late question, but now that the console feature is compiled by default on all platforms, how different is this from just using console in text mode?

@toubatbrian

Contributor Author

From my understanding, lk agent session gives users/agents a way to send multiple independent commands to a session.

With console, you don’t really get that separation, since it starts an interactive terminal UI. There isn’t a command-line-friendly way to distinguish between starting a session and sending input to that session, which is especially important for AI agents using a bash tool.

cc @theomonnom in case you have more thoughts on the goal/scope of this feature.

@rektdeckard

Member

@toubatbrian got it, makes sense

@u9g

u9g commented Jun 17, 2026

Contributor

Is this supposed to be used in a way where the agent starts the room (if so, what's the command for that? is it just starting the agent via npm start?) and then the agent uses a different terminal to send text messages to the agent via lk agent session?

@toubatbrian

Contributor Author

> Is this supposed to be used in a way where the agent starts the room (if so, what's the command for that? is it just starting the agent via npm start?) and then the agent uses a different terminal to send text messages to the agent via lk agent session?

It reuses the console mode, so lk agent session starts the actual agent in a subprocess and communicates with it over TCP.

u9g and others added 13 commits June 22, 2026 16:06
* test(cli): add e2e test for the agent session lifecycle

Adds an opt-in end-to-end test that drives the real `lk agent session`
start/say/end flow against a minimal one-file echo agent (testdata/echo-agent),
asserting the model echoes a token back and that the detached daemon exits
afterward. The fixture is a uv project so the daemon's `uv run python`
auto-syncs deps; its __main__ dispatches console mode to the TCP console
directly since cli.run_app() doesn't expose --connect-addr on released agents.

Includes a GitHub Actions workflow that runs the test on Linux and Windows,
triggered by workflow_dispatch and pushes to any repo branch. Gated behind
LIVEKIT_API_KEY so it skips without credentials.

* fix(cli): use a readiness file instead of an inherited pipe fd

The session daemon spawn passed the readiness pipe to the detached child via
cmd.ExtraFiles (fd 3), but os/exec's ExtraFiles is unsupported on Windows, so
daemon.Start() failed with "fork/exec ...: not supported by windows" and the
session never started there.

Replace the inherited fd with a temp readiness file: the daemon writes its
status atomically (write + rename) and `start` polls it until it sees a status,
the daemon process exits, or a timeout slightly past the daemon's own
agent-connect deadline. Works identically on Linux and Windows.
The console path links the vendored PortAudio cgo package, which needs the
pa_src git submodule checked out and (on Linux) libasound2-dev for the ALSA
host API. Add submodules: recursive to checkout and a Linux-only apt step.

pkg/apm/webrtc adds -fms-extensions to #cgo windows CXXFLAGS; Go's cgo flag
allowlist rejects it unless CGO_CXXFLAGS_ALLOW permits it, matching what
.goreleaser.yaml already does for Windows. test.yaml dodged this by excluding
Windows from its matrix.

windows-latest's default mingw GCC can't compile pkg/apm/webrtc's MSVC SEH
(__try/__except). The repo only builds Windows via zig/clang (.goreleaser.yaml),
so install zig and point CC/CXX at it for the Windows arm.

The Windows link failed with 'fork/exec zig.exe: The filename or
extension is too long' (Win error 206). The cgo build links ~560 object
files whose paths overflow Windows' command-line limit. Go writes link
args to a @response file when they're too long, but only after probing
that the linker accepts @file -- and that probe runs only argv[0] of
$CC. With CC="zig cc -target ...", argv[0] is bare 'zig' (no 'cc'
subcommand), so the probe fails, Go skips the response file, and the
argv overflows.

Build zcc.exe/zxx.exe launchers that forward to 'zig cc'/'zig c++' so
$CC is a single executable, the probe passes, and Go uses a response
file.
…aser

Drop the single-binary zig launcher wrappers in favor of goreleaser's
direct CC=zig cc -target x86_64-windows-gnu / CXX=zig c++ ... and carry
over CGO_CXXFLAGS=-fno-sanitize=all, so the Windows e2e build uses the
same toolchain config as the release cross-build.

The Windows cgo link of ~560 webrtc/portaudio objects overflows the
native command-line limit, so build like goreleaser does -- on Linux
with zig. Split the Windows arm into cross-build-windows (cross-compiles
lk.exe and the e2e test binary on ubuntu with zig + setup-cross.sh) and
e2e-windows (downloads them and runs natively). buildLK now honors
LK_SESSION_E2E_BIN so the test drives the prebuilt lk instead of
rebuilding on the Windows runner.

4 participants