Skip to content

refactor(stark): unify & clean up the commitment layer#691

Open
diegokingston wants to merge 14 commits into
mainfrom
refactor/commitment-module
Open

refactor(stark): unify & clean up the commitment layer#691
diegokingston wants to merge 14 commits into
mainfrom
refactor/commitment-module

Conversation

@diegokingston

@diegokingston diegokingston commented Jun 19, 2026

Copy link
Copy Markdown
Collaborator

What

Cleans up and unifies the Merkle-tree / bit-reversed commitment layer of the STARK prover, which was duplicated and entangled inside prover.rs. The commitment how (leaf layout, bit-reverse permutation, Keccak hashing, tree build) now lives in a dedicated module; the prover only orchestrates when to commit.

Commits

  1. refactor(stark): extract Merkle commitment into a commitment module — moves keccak_leaves_* + commit_* out of prover.rs into crypto/stark/src/commitment.rs, behind a single grouped core (keccak_leaves_bit_reversed_grouped / commit_bit_reversed). Keeps both serializations (as_bytes / write_bytes_be) and FieldElementPairBackend as-is.
  2. perf(stark): row-pair the trace commitment — the trace LDE now uses the same row-pair leaf layout (ROWS_PER_LEAF = 2) the composition poly already used: leaf i hashes bit-reversed rows 2i, 2i+1 (a FRI query's value + its symmetric counterpart), so one Merkle path authenticates both. ⚠️ This is a proof-format change (smaller trace authentication paths), not byte-identical — verifier updated to match (verify_opening_pair).
  3. refactor(stark): prover cleanup (A+D)par helpers at 7 sites, ROWS_PER_LEAF const, error propagation, doc/log fixes.
  4. docs(stark): fix commitment leaf-layout docs after the trace pairing.
  5. refactor(prover): dedup commit pipeline (B) — extract commit_plain<C> + spill_tree<C> on IsStarkProver, collapsing the near-duplicate main-trace / aux-trace plain-commit code.

Correctness

  • cargo test -p stark --lib: 128/128 · with --features disk-spill: 133/133.
  • Commits 1, 3, 4, 5 are byte-identical to the prior proof output (the test suite is the oracle). Commit 2 is the deliberate proof-format change (validated by the full prove→verify roundtrip suite).
  • clippy clean on default / disk-spill / instruments; builds on the combined feature set.
  • One profiling-only delta: under instruments, the aux commit's timing bucket now includes its (tiny) disk spill — matching what the main path already measured.

Follow-ups (not in this PR)

  • cuda: the GPU trace leaf+tree builders (gpu_lde) still emit 1-row leaves and must migrate to the row-pair layout before cuda builds pass. The default (non-cuda) build is fully verified here.
  • C (separate branch): decompose multi_prove into named phase helpers; move self-less utilities off the IsStarkProver trait.

Unifies the two near-identical bit-reversed leaf hashers (per-row + per-row-pair)
into one keccak_leaves_bit_reversed_grouped(columns, rows_per_leaf), and the two
near-identical commit fns (commit_columns_bit_reversed + commit_composition_polynomial)
into commit_bit_reversed(columns, rows_per_leaf), in a new crypto/stark/src/commitment.rs.
Removes them from the IsStarkProver trait (they used no self). prover.rs re-exports
the named leaf hashers for the math-cuda GPU parity test. Byte-identical (stark 128/128).
Every FRI query opens a value and its symmetric counterpart (LDE positions
2*iota, 2*iota+1). The composition-poly commitment already grouped that pair into
one leaf; the trace (main/aux/precomputed) committed one row per leaf and opened
TWO leaves (proof + proof_sym) per query. Commit the trace with rows_per_leaf=2
too, so each query opens ONE leaf/path; drop the now-redundant proof_sym from
PolynomialOpenings (also shrinks the composition opening, which stored the path
twice). verify_opening_pair now reconstructs the paired leaf and verifies once
(mirrors verify_composition_poly_opening); the dead verify_opening is removed.

Halves trace Merkle authentication-path data per query (smaller proofs + less
verifier hashing). stark 128/128 (prove+verify). NOTE: proof FORMAT change (not
byte-identical). cuda follow-up: the GPU trace leaf+tree builders
(gpu_lde::try_expand_leaf_and_tree_batched_keep/_ext3_keep + math-cuda kernels)
still build 1-row leaves and must switch to the row-pair pattern (the GPU
composition builder already pairs) or cuda proofs will fail verification.
The trace commitment now uses the row-pair leaf layout (ROWS_PER_LEAF=2), same as
composition; rows_per_leaf=1 is only kept for the GPU parity test. Update the
module/const/wrapper docs that still described the pre-pairing per-row trace.
Extract two helpers on IsStarkProver, collapsing the near-duplicate
main-trace and aux-trace commit code:

- spill_tree<C>: the identical-except-label disk-spill block, shared by
  the main / preprocessed-split / aux commit sites (4 call sites).
- commit_plain<C>: commit_bit_reversed + spill_tree + TableCommit::plain,
  shared by the main-trace (non-preprocessed None arm) and aux-trace
  plain-commit paths.

Proof output is byte-identical (stark 128/128, +disk-spill 133/133).
The only behavioral delta is in the instruments profiling feature: the
aux commit's timing bucket now includes its (tiny) disk spill, matching
what the main path already measured.

clippy clean on default / disk-spill / instruments; builds on the
combined feature set.
@diegokingston diegokingston changed the title Refactor/commitment module refactor(stark): unify & clean up the commitment layer Jun 19, 2026
The trace Merkle commitment moved to a row-pair leaf layout
(ROWS_PER_LEAF=2), but the 5 preprocessed-table commitment computers
(bitwise/keccak_rc/page/decode/register) still built a 1-row-per-leaf
tree manually, so the prover's row-pair precomputed root no longer
matched the computed/hardcoded one -> PrecomputedCommitmentMismatch at
prove time.

- Route all 5 compute_*commitment fns through the shared
  stark::commitment::commit_bit_reversed(.., ROWS_PER_LEAF), dropping the
  manual bit-reverse + columns2rows + 1-row BatchedMerkleTree::build.
- Regenerate the hardcoded bitwise/keccak_rc/zero_page static commitments
  for the row-pair layout (via compute_static_commitments).
- cargo fmt (prover.rs + touched tables).

Verified: static_commitments drift tests 5/5, stark 128/128, clippy +
fmt clean, no PrecomputedCommitmentMismatch. Full ELF prove/verify runs
in CI (guest artifacts absent locally).
…layout

The compile-time decode-commitment const for sub.elf shifted with the
row-pair preprocessed commitment; regenerated via commitment_from_elf
(the print_decode_commitment_for_sub regen path).
…air value

Regenerated from CI's sub.elf (local riscv toolchain unavailable);
removed the temporary print instrumentation.

@MauroToscano MauroToscano left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This cannot be merged, it breaks CUDA

@MauroToscano

Copy link
Copy Markdown
Contributor

/review-ai

@github-actions

Copy link
Copy Markdown

Codex Code Review

No issues found in the PR diff.

I reviewed the paired-leaf commitment changes across CPU, verifier, proof openings, CUDA leaf/tree paths, and preprocessed table recomputation. The prover/verifier index space and leaf byte ordering appear consistent. Static review only; I did not build or run tests per instructions.

@claude

claude Bot commented Jun 26, 2026

Copy link
Copy Markdown

Review: commitment-layer unification

Reviewed the diff focusing on correctness of the proof-format change, safety, and the refactor. No bugs found — the change is internally consistent and well covered.

Correctness of the trace row-pair change (commit 2) — verified consistent
The deliberate proof-format change lines up cleanly across all three sides:

  • Leaf layout: keccak_leaves_bit_reversed_grouped(.., 2) ⇒ leaf i = cols[br(2i)] ‖ cols[br(2i+1)].
  • Prover open: proof = tree.get_proof_by_pos(challenge), evaluations = gather(br(2·challenge)), evaluations_sym = gather(br(2·challenge+1)).
  • Verifier verify_opening_pair: reconstructs evaluations ‖ evaluations_sym and verifies once at iota — exactly the committed leaf, and now identical to the pre-existing verify_composition_poly_opening path.

proof_sym removal is clean (no remaining references), and the wasteful proof.clone() in the composition open path is gone. Preprocessed-table static commitments and the SUB_DECODE_COMMITMENT_BLOWUP_2 test vector were regenerated to match ROWS_PER_LEAF=2, consistent with the prover's preprocessed-split commit and the verifier checks on precomputed/aux roots.

GPU path
Trace commit uses the _keep variants migrated to num_leaves = lde_size / ROWS_PER_LEAF, with new base/ext3 row-pair leaf-parity tests against the CPU helpers. The non-keep with_merkle_tree/with_leaf_hash variants correctly retain rows_per_leaf=1 (test-only callers). alloc_merkle_nodes requires num_leaves >= 2 (lde_size >= 4); below that it returns None and falls back to CPU — fine for real traces. (Note: the PR description's cuda follow-up claim that the GPU still emits 1-row leaves looks stale — this PR already migrates the GPU trace path.)

Refactor
The commitment module, par helpers, commit_plain/spill_tree dedup, and From<FFTError> propagation are behavior-preserving. par_map_collect preserves index order (correct for LDE columns / DEEP composition). keccak_leaves_bit_reversed (rows_per_leaf=1) is retained but still exercised by GPU parity tests, so not dead.

Minor (Low) — stale cross-crate doc comments reference now-renamed/moved CPU functions: crypto/math-cuda/src/merkle.rs:6 and src/lde.rs:598 and kernels/keccak.cu:163 point at commit_columns_bit_reversed in prover.rs (now commit_bit_reversed in commitment.rs); tests/comp_poly_tree.rs:3,86 reference the removed commit_composition_polynomial. Worth a quick rename pass so the parity docs stay accurate.

@github-actions

Copy link
Copy Markdown

AI Review

PR #691 · 19 changed files

Findings

Status Sev Location Finding Found by
confirmed low crypto/math-cuda/src/lde.rs:604 Dead/inconsistent public LDE+Keccak wrappers still build per-row leaves kimi
openrouter/moonshotai/kimi-k2.7-code
glm
openrouter/z-ai/glm-5.2
confirmed low crypto/math-cuda/src/merkle.rs:6 Stale doc comment references removed prover function kimi
openrouter/moonshotai/kimi-k2.7-code
confirmed low crypto/math-cuda/src/merkle.rs:32 Missing rows_per_leaf vs num_rows guard in public keccak leaf helpers kimi
openrouter/moonshotai/kimi-k2.7-code

Status column reflects the verdict from the verifier: deepseek-verifier (openrouter/deepseek/deepseek-v4-pro).

AI-001: Dead/inconsistent public LDE+Keccak wrappers still build per-row leaves
  • Status: confirmed
  • Severity: low
  • Location: crypto/math-cuda/src/lde.rs:604
  • Found by: kimi:openrouter/moonshotai/kimi-k2.7-code, glm:openrouter/z-ai/glm-5.2
  • Verified by: deepseek-verifier:openrouter/deepseek/deepseek-v4-pro
  • Rejected by: -

Claim

The public entry points coset_lde_batch_base_into_with_leaf_hash (line 604), coset_lde_batch_base_into_with_merkle_tree (line 632), coset_lde_batch_ext3_into_with_leaf_hash (line 862), and coset_lde_batch_ext3_into_with_merkle_tree (line 887) are not called anywhere in the repository and still pass rows_per_leaf=1. The rest of the prover now commits row-pairs (ROWS_PER_LEAF=2), so these wrappers would produce Merkle trees that are incompatible with verify_opening_pair if they were ever used again. They should be removed or updated to row-pair semantics.

Evidence

grep across the repo shows the only callers of these four symbols are their own definitions and doc-comments in crypto/math-cuda/src/lde.rs; gpu_lde.rs exclusively calls the _keep variants with rows_per_leaf=2. The PR added the rows_per_leaf parameter to the inner function and wired 2 only into the _keep wrappers, leaving the four non-_keep wrappers pinned to 1.

Suggested fix

Either delete the four unused non-_keep wrappers, or expose rows_per_leaf on them (and have callers pass ROWS_PER_LEAF) so the public API matches the prover's row-pair commitment. If kept as a per-row test path, document explicitly that they are test-only and not the prover's commitment layout.

AI-003: Stale doc comment references removed prover function
  • Status: confirmed
  • Severity: low
  • Location: crypto/math-cuda/src/merkle.rs:6
  • Found by: kimi:openrouter/moonshotai/kimi-k2.7-code
  • Verified by: deepseek-verifier:openrouter/deepseek/deepseek-v4-pro
  • Rejected by: -

Claim

The module doc comment says the GPU code matches commit_columns_bit_reversed at crypto/stark/src/prover.rs, but that function was removed and replaced by commit_bit_reversed in crypto/stark/src/commitment.rs. The stale reference will confuse future readers.

Evidence

Line 6 of crypto/math-cuda/src/merkle.rs contains the old name. prover.rs no longer defines commit_columns_bit_reversed.

Suggested fix

Update the comment to reference stark::commitment::commit_bit_reversed with rows_per_leaf=2.

AI-004: Missing rows_per_leaf vs num_rows guard in public keccak leaf helpers
  • Status: confirmed
  • Severity: low
  • Location: crypto/math-cuda/src/merkle.rs:32
  • Found by: kimi:openrouter/moonshotai/kimi-k2.7-code
  • Verified by: deepseek-verifier:openrouter/deepseek/deepseek-v4-pro
  • Rejected by: -

Claim

keccak_leaves_base and keccak_leaves_ext3 accept rows_per_leaf but only assert it is 1 or 2 and that num_rows is a power of two. They do not assert num_rows >= rows_per_leaf. With rows_per_leaf=2 and num_rows=1 (a valid power of two), the output buffer length becomes 0 and keccak_launch_cfg(0) produces a zero-grid launch. This is a malformed input path exposed by the newly added rows_per_leaf parameter.

Evidence

Functions assert rows_per_leaf == 1 || rows_per_leaf == 2 and num_rows.is_power_of_two(), then allocate (num_rows / rows_per_leaf) * 32 bytes and call keccak_launch_cfg(num_rows / rows_per_leaf). For num_rows=1, rows_per_leaf=2 this creates a zero-length device allocation and zero-block grid.

Suggested fix

Add assert!(num_rows >= rows_per_leaf, "num_rows must be at least rows_per_leaf") to both keccak_leaves_base and keccak_leaves_ext3.

Reviewer Lanes

Lane Model Prompt Status Findings
glm openrouter/z-ai/glm-5.2 general success 2
kimi openrouter/moonshotai/kimi-k2.7-code general success 3
minimax minimax/MiniMax-M3 general success 0
moonmath zro/minimax-m3 general success 0
nemotron openrouter/nvidia/nemotron-3-ultra-550b-a55b general success 0

Verification Lanes

Lane Model Status Confirmed Rejected Uncertain
deepseek-verifier openrouter/deepseek/deepseek-v4-pro success 3 1 0

Native Codex and Claude reviews run separately and post their own comments. They are not included in this structured provenance report.

Discarded candidates (1) — rejected by the verifier
  • keccak_leaves_bit_reversed (per-row CPU helper) is retained only for GPU parity tests but is otherwise unused by the prover (crypto/stark/src/commitment.rs:115, found by glm:openrouter/z-ai/glm-5.2) — The function keccak_leaves_bit_reversed at commitment.rs:115 is explicitly documented: 'The prover no longer commits per-row... this stays a named public function only so the GPU parity tests in dependent crates can compare the per-row code path.' It is actively used by keccak_leaves_base_matches_cpu and keccak_leaves_ext3_matches_cpu tests in crypto/math-cuda/tests/keccak_leaves.rs. The finding's own evidence acknowledges this documented purpose. Having a named, public, intentionally-kept test utility that is in fact used by tests is not a defect.

Raw lane outputs, candidates, final issues, and model metrics are uploaded as workflow artifacts.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants