Skip to content

perf: baseline recursion benches#718

Closed
Oppen wants to merge 2 commits into
mainfrom
feat/recursion-guest
Closed

perf: baseline recursion benches#718
Oppen wants to merge 2 commits into
mainfrom
feat/recursion-guest

Conversation

@Oppen

@Oppen Oppen commented Jun 25, 2026

Copy link
Copy Markdown
Collaborator

Summary

Baseline benchmarking/profiling infrastructure for the naive recursion verifier
guest (the STARK verifier compiled to RISC-V and run inside the VM).

The bulk of this branch — the recursion guest itself, the VmVerifyingKey
preprocessed-commitment caching, and the deserialize-only / per-step / page-count
diagnostics — is Nicole Graus' work (19 commits). This PR adds work on top of
that foundation (4 commits, Mario Rugiero):

Changes on top of Nicole's work

  • Make the verifier guest build no_std. ecsm/executor/stark/prover
    now compile for the riscv64im-lambda-vm-elf target: ecsm gains
    no_std + alloc (numeric deps behind a std feature, core::error::Error),
    the executor VM modules are un-gated from std (the PRINT syscall stays
    std-only), and prover-only trace-building code (CpuOperation::from_log,
    collect_ecsm_ops, collect_bitwise_from_keccak, …) is gated behind prove
    so the verifier guest never compiles it. sysinfo becomes optional
    (disk-spill only) so guests drop its std pull, and the obsolete
    keccak-patched [patch.crates-io] hack is removed from both guests.
  • Resolve histogram PCs via the addr2line crate (dev-dependency, 0.27)
    instead of printing raw addresses for the user to pipe through the addr2line
    binary. Each histogram now prints a per-function summary (cycles folded
    over each function's PCs across the full histogram) plus the per-address
    detail, via a shared print_pc_histogram helper.
  • Add a multi-query recursion histogram. The PC-histogram test is
    parameterized on ProofOptions; blowup_factor and fri_number_of_queries
    are coupled (the query count is JBR-derived from blowup for a fixed security
    target), so the single- and multi-query tests differ only in that config:
    • test_recursion_pc_histogram — blowup=2, 1 query (cheap baseline)
    • test_recursion_pc_histogram_multiquery — blowup=8, 128-bit → ~73 queries
    • Fixing this surfaced a stale 6.7 MiB MAX_PRIVATE_INPUT_SIZE hardcoded in
      vm/memory.rs that rejected the larger multi-query inner-proof blob; it now
      re-exports the canonical 64 MiB constant from constants (same fix
      perf-integrate needed after rebasing onto main).
  • Vkey plumbing fix: thread the optional decode/page commitments through
    VmAirs::new_with_vkey / verify_with_options_with_vkey, and consult the
    vkey's cached decode/register/per-page roots during AIR construction
    (precedence: explicit caller param → vkey → recompute). This also fixed the
    vkey tamper-detection tests.
  • /profile_recursion CI workflow (mirrors the /bench comment trigger): a
    /profile_recursion comment from a repo member runs both histogram configs in
    parallel via a matrix on the self-hosted bench runner and posts a single
    combined per-function profile comment, parsed by
    .github/scripts/aggregate_recursion_histogram.py.

Measured (local)

Config Total cycles Top function
deserialize-only (blowup=2, 1q) 21.4M TLSF alloc 23.5%
recursion single-query (blowup=2, 1q) 118M keccak::keccak_p 22%
recursion multi-query (blowup=8, ~73q) 3.27B keccak::keccak_p 37%, verifier reconstruct_deep_composition 10%

As FRI queries scale, per-query verifier work (Keccak transcript hashing,
deep-composition reconstruction, extension-field mul) comes to dominate the
fixed setup cost (decode/allocator).

Testing

  • make test is green except the GPU/CUDA tests (no local libcuda).
  • All three histogram diagnostics (#[ignore]d) run and were executed locally;
    numbers above.
  • CI matrix / combined-comment path is validated end-to-end on captured output,
    but its GitHub trigger only exercises once a /profile_recursion comment is
    posted (done on this PR).

Draft for now — flipping out of draft once the CI profiling comment lands and
the run is reviewed.

@Oppen

Oppen commented Jun 25, 2026

Copy link
Copy Markdown
Collaborator Author

/profile_recursion

@Oppen

Oppen commented Jun 25, 2026

Copy link
Copy Markdown
Collaborator Author

Note: /profile_recursion won't run until this merges

The /profile_recursion comment above is currently inert, and manual
workflow_dispatch also fails (HTTP 404: workflow profile-recursion.yml not found on the default branch).

This is a GitHub Actions constraint, not a bug in the workflow: both
issue_comment and workflow_dispatch triggers are resolved from the copy of
the workflow file on the default branch (main). profile-recursion.yml is
new in this PR and not yet on main, so neither trigger can find it.

We can't land it via a small separate PR either — the workflow runs
test_recursion_pc_histogram / test_recursion_pc_histogram_multiquery, which
depend on the no_std verifier-guest build and the multi-query plumbing
introduced in this PR. On main alone the job would fail at the test step.

So the profiling will start working after this PR merges to main. At that
point either path triggers it:

  • comment /profile_recursion on any PR, or
  • gh workflow run profile-recursion.yml --ref feat/recursion-guest (dispatch
    reads the trigger config from main but runs against the given ref).

Until then, the per-function profiles can be reproduced locally:

cargo test -p lambda-vm-prover --lib test_recursion_pc_histogram -- --ignored --nocapture
cargo test -p lambda-vm-prover --lib test_recursion_pc_histogram_multiquery -- --ignored --nocapture

(numbers are in the PR description).

@Oppen

Oppen commented Jun 26, 2026

Copy link
Copy Markdown
Collaborator Author

/ai-review

@Oppen

Oppen commented Jun 26, 2026

Copy link
Copy Markdown
Collaborator Author

/bench

Comment thread prover/src/lib.rs
.iter()
.find(|(pb, _)| *pb == config.page_base)
.map(|(_, c)| *c)
.or_else(|| vkey.map(|vk| vk.pages[index]))

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Medium — panic on malformed vkey. vk.pages[index] indexes by the full page_configs position, assuming vk.pages.len() == page_configs.len(). In the recursion path the vkey is decoded from the (untrusted) private-input blob and passed straight through verify_with_options_with_vkeynew_with_vkey with no length validation. A vkey whose pages Vec is shorter than the reconstructed page_configs causes an out-of-bounds panic instead of a clean verification failure — i.e. a panic on attacker-influenced input in a verifier path, rather than the graceful "Fiat-Shamir rejects it" the docstring promises.

Prefer a non-panicking fallback that degrades to recomputation:

Suggested change
.or_else(|| vkey.map(|vk| vk.pages[index]))
.or_else(|| vkey.and_then(|vk| vk.pages.get(index).copied()))


use alloc::vec;
use alloc::vec::Vec;
#[cfg(feature = "prove")]

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor/consistency: this gates std::collections::HashMap behind #[cfg(feature = "prove")], but prove does not imply std (only parallel/debug-checks/instruments pull in std). A --no-default-features --features prove build would fail to compile here. page.rs handles the same pattern with #[cfg(feature = "prove")] use hashbrown::HashMap; — worth aligning so the gating is genuinely no_std-safe rather than relying on prove and std always co-occurring in practice.

@claude

claude Bot commented Jun 26, 2026

Copy link
Copy Markdown

Review summary

Reviewed PR #718 (baseline recursion benches + no_std plumbing). The bulk is mechanical and low-risk: no_std+alloc gating across ecsm/stark/executor/prover (std→core, use alloc::*), prove-gating of trace-builder code, dep swaps (serde_cborminicbor-serde, optional sysinfo, addr2line dev-dep), the bench guests, and the /profile_recursion workflow. These look correct and self-consistent.

Safety/correctness

  • Medium — panic on malformed vkey (prover/src/lib.rs, new_with_vkey page loop): vk.pages[index] assumes vk.pages.len() == page_configs.len(), but the vkey is decoded from the untrusted private-input blob and threaded through verify_with_options_with_vkey with no length check. A short pages Vec panics instead of failing verification cleanly. Inline suggestion to use .get(index) and fall back to recomputation. Impact is limited (DoS / failed verify, not a soundness break — Fiat-Shamir still binds the real commitments), but a verifier shouldn't panic on attacker-influenced input.
  • Low — no_std gating inconsistency (prover/src/tables/register.rs): std::collections::HashMap gated on prove rather than std; harmless under current feature combos but inconsistent with page.rs (hashbrown). Inline note.

Things I checked that are fine

  • options.rs libm shim (sqrt/log2/ceil) is numerically equivalent to the std f64 methods.
  • MAX_PRIVATE_INPUT_SIZE bump 6.7 MiB → 64 MiB: the num_private_input_pages bound in verify_with_options scales with it and stays finite (~256 pages), so no DoS regression.
  • vkey precedence (caller param → vkey → recompute) for decode/register/keccak_rc/pages is consistent between new_with_vkey and precomputed_commitment_cached, and the Fiat-Shamir tamper-detection argument holds.
  • Bench-guest unsafe (volatile private-input read, ecall asm, allocator init) is standard for these guests and scoped to test/bench code.

No blocking issues; the vkey index panic is the one worth addressing.

- Make crypto/stark and executor compileable without std, gated by a new default-on std feature
- Add a prove cargo feature to lambda-vm-prover so the verify path can compile without pulling in the executor crate
- Build sampled flamegraph
- Build PC histogram w/addr2line symbol resolution
- Per-step cycle tracker for recursion
- SP1 host+guest crate that compiles lambda-vm's `verify_with_options`
- Cache preprocessed commitments for page.table, bitwise, decode,
  register and keccak_rc
- Deserialize only guest
- Verifier guest w/parameterized blowup+FRI queries, instanced for (2,1)
  & (73,8)
- /profile_recursion CI workflow
@Oppen Oppen force-pushed the feat/recursion-guest branch from f0b1140 to e0f419c Compare June 26, 2026 18:39
@Oppen

Oppen commented Jun 26, 2026

Copy link
Copy Markdown
Collaborator Author

Split into functional PRs. Also this carried an optimization that doesn't belong to the bringup.

@Oppen Oppen closed this Jun 26, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants