perf: baseline recursion benches by Oppen · Pull Request #718 · yetanotherco/lambda_vm

Oppen · 2026-06-25T21:31:56Z

Summary

Baseline benchmarking/profiling infrastructure for the naive recursion verifier
guest (the STARK verifier compiled to RISC-V and run inside the VM).

The bulk of this branch — the recursion guest itself, the VmVerifyingKey
preprocessed-commitment caching, and the deserialize-only / per-step / page-count
diagnostics — is Nicole Graus' work (19 commits). This PR adds work on top of
that foundation (4 commits, Mario Rugiero):

Changes on top of Nicole's work

Make the verifier guest build no_std. ecsm/executor/stark/prover
now compile for the riscv64im-lambda-vm-elf target: ecsm gains
no_std + alloc (numeric deps behind a std feature, core::error::Error),
the executor VM modules are un-gated from std (the PRINT syscall stays
std-only), and prover-only trace-building code (CpuOperation::from_log,
collect_ecsm_ops, collect_bitwise_from_keccak, …) is gated behind prove
so the verifier guest never compiles it. sysinfo becomes optional
(disk-spill only) so guests drop its std pull, and the obsolete
keccak-patched [patch.crates-io] hack is removed from both guests.
Resolve histogram PCs via the addr2line crate (dev-dependency, 0.27)
instead of printing raw addresses for the user to pipe through the addr2line
binary. Each histogram now prints a per-function summary (cycles folded
over each function's PCs across the full histogram) plus the per-address
detail, via a shared print_pc_histogram helper.
Add a multi-query recursion histogram. The PC-histogram test is
parameterized on ProofOptions; blowup_factor and fri_number_of_queries
are coupled (the query count is JBR-derived from blowup for a fixed security
target), so the single- and multi-query tests differ only in that config:
- test_recursion_pc_histogram — blowup=2, 1 query (cheap baseline)
- test_recursion_pc_histogram_multiquery — blowup=8, 128-bit → ~73 queries
- Fixing this surfaced a stale 6.7 MiB MAX_PRIVATE_INPUT_SIZE hardcoded in
  vm/memory.rs that rejected the larger multi-query inner-proof blob; it now
  re-exports the canonical 64 MiB constant from constants (same fix
  perf-integrate needed after rebasing onto main).
Vkey plumbing fix: thread the optional decode/page commitments through
VmAirs::new_with_vkey / verify_with_options_with_vkey, and consult the
vkey's cached decode/register/per-page roots during AIR construction
(precedence: explicit caller param → vkey → recompute). This also fixed the
vkey tamper-detection tests.
/profile_recursion CI workflow (mirrors the /bench comment trigger): a
/profile_recursion comment from a repo member runs both histogram configs in
parallel via a matrix on the self-hosted bench runner and posts a single
combined per-function profile comment, parsed by
.github/scripts/aggregate_recursion_histogram.py.

Measured (local)

Config	Total cycles	Top function
deserialize-only (blowup=2, 1q)	21.4M	TLSF `alloc` 23.5%
recursion single-query (blowup=2, 1q)	118M	`keccak::keccak_p` 22%
recursion multi-query (blowup=8, ~73q)	3.27B	`keccak::keccak_p` 37%, verifier `reconstruct_deep_composition` 10%

As FRI queries scale, per-query verifier work (Keccak transcript hashing,
deep-composition reconstruction, extension-field mul) comes to dominate the
fixed setup cost (decode/allocator).

Testing

make test is green except the GPU/CUDA tests (no local libcuda).
All three histogram diagnostics (#[ignore]d) run and were executed locally;
numbers above.
CI matrix / combined-comment path is validated end-to-end on captured output,
but its GitHub trigger only exercises once a /profile_recursion comment is
posted (done on this PR).

Draft for now — flipping out of draft once the CI profiling comment lands and
the run is reviewed.

Oppen · 2026-06-25T21:32:04Z

/profile_recursion

Oppen · 2026-06-25T21:52:24Z

Note: `/profile_recursion` won't run until this merges

The /profile_recursion comment above is currently inert, and manual
workflow_dispatch also fails (HTTP 404: workflow profile-recursion.yml not found on the default branch).

This is a GitHub Actions constraint, not a bug in the workflow: both
issue_comment and workflow_dispatch triggers are resolved from the copy of
the workflow file on the default branch (main). profile-recursion.yml is
new in this PR and not yet on main, so neither trigger can find it.

We can't land it via a small separate PR either — the workflow runs
test_recursion_pc_histogram / test_recursion_pc_histogram_multiquery, which
depend on the no_std verifier-guest build and the multi-query plumbing
introduced in this PR. On main alone the job would fail at the test step.

So the profiling will start working after this PR merges to main. At that
point either path triggers it:

comment /profile_recursion on any PR, or
gh workflow run profile-recursion.yml --ref feat/recursion-guest (dispatch
reads the trigger config from main but runs against the given ref).

Until then, the per-function profiles can be reproduced locally:

cargo test -p lambda-vm-prover --lib test_recursion_pc_histogram -- --ignored --nocapture
cargo test -p lambda-vm-prover --lib test_recursion_pc_histogram_multiquery -- --ignored --nocapture

(numbers are in the PR description).

Oppen · 2026-06-26T15:48:53Z

/ai-review

Oppen · 2026-06-26T15:48:57Z

/bench

claude · 2026-06-26T15:53:32Z

                        .iter()
                        .find(|(pb, _)| *pb == config.page_base)
                        .map(|(_, c)| *c)
+                        .or_else(|| vkey.map(|vk| vk.pages[index]))


Medium — panic on malformed vkey. vk.pages[index] indexes by the full page_configs position, assuming vk.pages.len() == page_configs.len(). In the recursion path the vkey is decoded from the (untrusted) private-input blob and passed straight through verify_with_options_with_vkey → new_with_vkey with no length validation. A vkey whose pages Vec is shorter than the reconstructed page_configs causes an out-of-bounds panic instead of a clean verification failure — i.e. a panic on attacker-influenced input in a verifier path, rather than the graceful "Fiat-Shamir rejects it" the docstring promises.

Prefer a non-panicking fallback that degrades to recomputation:

Suggested change

.or_else(|| vkey.map(|vk| vk.pages[index]))

.or_else(|| vkey.and_then(|vk| vk.pages.get(index).copied()))

claude · 2026-06-26T15:53:34Z


+use alloc::vec;
+use alloc::vec::Vec;
+#[cfg(feature = "prove")]


Minor/consistency: this gates std::collections::HashMap behind #[cfg(feature = "prove")], but prove does not imply std (only parallel/debug-checks/instruments pull in std). A --no-default-features --features prove build would fail to compile here. page.rs handles the same pattern with #[cfg(feature = "prove")] use hashbrown::HashMap; — worth aligning so the gating is genuinely no_std-safe rather than relying on prove and std always co-occurring in practice.

claude · 2026-06-26T15:53:53Z

Review summary

Reviewed PR #718 (baseline recursion benches + no_std plumbing). The bulk is mechanical and low-risk: no_std+alloc gating across ecsm/stark/executor/prover (std→core, use alloc::*), prove-gating of trace-builder code, dep swaps (serde_cbor→minicbor-serde, optional sysinfo, addr2line dev-dep), the bench guests, and the /profile_recursion workflow. These look correct and self-consistent.

Safety/correctness

Medium — panic on malformed vkey (prover/src/lib.rs, new_with_vkey page loop): vk.pages[index] assumes vk.pages.len() == page_configs.len(), but the vkey is decoded from the untrusted private-input blob and threaded through verify_with_options_with_vkey with no length check. A short pages Vec panics instead of failing verification cleanly. Inline suggestion to use .get(index) and fall back to recomputation. Impact is limited (DoS / failed verify, not a soundness break — Fiat-Shamir still binds the real commitments), but a verifier shouldn't panic on attacker-influenced input.
Low — no_std gating inconsistency (prover/src/tables/register.rs): std::collections::HashMap gated on prove rather than std; harmless under current feature combos but inconsistent with page.rs (hashbrown). Inline note.

Things I checked that are fine

options.rs libm shim (sqrt/log2/ceil) is numerically equivalent to the std f64 methods.
MAX_PRIVATE_INPUT_SIZE bump 6.7 MiB → 64 MiB: the num_private_input_pages bound in verify_with_options scales with it and stays finite (~256 pages), so no DoS regression.
vkey precedence (caller param → vkey → recompute) for decode/register/keccak_rc/pages is consistent between new_with_vkey and precomputed_commitment_cached, and the Fiat-Shamir tamper-detection argument holds.
Bench-guest unsafe (volatile private-input read, ecall asm, allocator init) is standard for these guests and scoped to test/bench code.

No blocking issues; the vkey index panic is the one worth addressing.

- Make crypto/stark and executor compileable without std, gated by a new default-on std feature - Add a prove cargo feature to lambda-vm-prover so the verify path can compile without pulling in the executor crate

- Build sampled flamegraph - Build PC histogram w/addr2line symbol resolution - Per-step cycle tracker for recursion - SP1 host+guest crate that compiles lambda-vm's `verify_with_options` - Cache preprocessed commitments for page.table, bitwise, decode, register and keccak_rc - Deserialize only guest - Verifier guest w/parameterized blowup+FRI queries, instanced for (2,1) & (73,8) - /profile_recursion CI workflow

Oppen · 2026-06-26T23:04:15Z

Split into functional PRs. Also this carried an optimization that doesn't belong to the bringup.

claude Bot reviewed Jun 26, 2026

View reviewed changes

nicole-graus added 2 commits June 26, 2026 15:37

feat: lambda-vm-prover no_std guest builds

2ba8a9b

- Make crypto/stark and executor compileable without std, gated by a new default-on std feature - Add a prove cargo feature to lambda-vm-prover so the verify path can compile without pulling in the executor crate

Oppen force-pushed the feat/recursion-guest branch from f0b1140 to e0f419c Compare June 26, 2026 18:39

Oppen closed this Jun 26, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

perf: baseline recursion benches#718

perf: baseline recursion benches#718
Oppen wants to merge 2 commits into
mainfrom
feat/recursion-guest

Oppen commented Jun 25, 2026

Uh oh!

Oppen commented Jun 25, 2026

Uh oh!

Oppen commented Jun 25, 2026

Uh oh!

Oppen commented Jun 26, 2026

Uh oh!

Oppen commented Jun 26, 2026

Uh oh!

claude Bot Jun 26, 2026

Uh oh!

claude Bot Jun 26, 2026

Uh oh!

claude Bot commented Jun 26, 2026

Uh oh!

Oppen commented Jun 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

	.or_else(\|\| vkey.map(\|vk\| vk.pages[index]))
	.or_else(\|\| vkey.and_then(\|vk\| vk.pages.get(index).copied()))

Uh oh!

Conversation

Oppen commented Jun 25, 2026

Summary

Changes on top of Nicole's work

Measured (local)

Testing

Uh oh!

Oppen commented Jun 25, 2026

Uh oh!

Oppen commented Jun 25, 2026

Note: /profile_recursion won't run until this merges

Uh oh!

Oppen commented Jun 26, 2026

Uh oh!

Oppen commented Jun 26, 2026

Uh oh!

claude Bot Jun 26, 2026

Choose a reason for hiding this comment

Uh oh!

claude Bot Jun 26, 2026

Choose a reason for hiding this comment

Uh oh!

claude Bot commented Jun 26, 2026

Review summary

Uh oh!

Oppen commented Jun 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Note: `/profile_recursion` won't run until this merges