perf(stark): fuse composition half-extension onto coset_lde_full (precomputed twiddles) #700
decompose_and_extend_d2's extend_half_to_lde did iFFT(g²) → coefficient Polynomial → evaluate_polynomial_on_lde_domain(g) as two separate FFTs with an intermediate coefficient allocation per half. Replace with a single fused coset_lde_full: iFFT(n) → coset reshift g²→g → forward FFT(2n=lde_size). The weights (g⁻ʲ/n, folding the 1/n iFFT normalization and the net g²→g shift) and the inverse twiddles (size lde_size/2) are precomputed once per domain in LdeTwiddles (the forward FFT reuses the existing fwd twiddles), and threaded through prove_rounds_2_to_4 → round_2 → decompose_and_extend_d2 — no per-call recomputation. This path is now production (degree-3 tables use the 2-part decompose_and_extend_d2 after #699). Byte-identical: test_decompose_and_extend_d2_matches_original (decompose output == original break_in_parts path), a new formula test, stark 130/130, real VM proof (fib_iterative_1200k) prove+verify OK, clippy + fmt clean.
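The fused pipeline described above can be sketched on a toy field. This is a minimal illustration and not the PR's code: it assumes the prime field F_257 (where 3 generates the multiplicative group), uses a naive O(n²) DFT in place of a real FFT, and the `HalfExtensionTable` struct and its fields are hypothetical stand-ins for whatever `LdeTwiddles` actually stores. The point it shows is why the weight `g⁻ʲ/n` suffices: the unnormalized inverse transform of evaluations on the g² coset yields `n·c_j·g^(2j)`, and multiplying by `g⁻ʲ/n` leaves `c_j·gʲ`, which the size-2n forward transform maps to evaluations on the g coset.

```rust
// Toy field F_257; 3 generates its multiplicative group (order 256).
const P: u64 = 257;

fn pow_mod(mut base: u64, mut exp: u64) -> u64 {
    let mut acc = 1u64;
    base %= P;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = acc * base % P;
        }
        base = base * base % P;
        exp >>= 1;
    }
    acc
}

/// Fermat inverse; `x` must be nonzero.
fn inv(x: u64) -> u64 {
    pow_mod(x, P - 2)
}

/// Primitive `order`-th root of unity; `order` must be a power of two <= 256.
fn root_of_unity(order: usize) -> u64 {
    pow_mod(3, (P - 1) / order as u64)
}

/// Naive O(n^2) DFT standing in for an FFT: out[k] = sum_j input[j] * w^(j*k).
fn dft(input: &[u64], w: u64) -> Vec<u64> {
    (0..input.len())
        .map(|k| {
            let step = pow_mod(w, k as u64);
            let (mut acc, mut x) = (0u64, 1u64);
            for &v in input {
                acc = (acc + v * x) % P;
                x = x * step % P;
            }
            acc
        })
        .collect()
}

/// Hypothetical per-domain table (not the PR's actual `LdeTwiddles` layout).
struct HalfExtensionTable {
    weights: Vec<u64>, // g^(-j) / n, built once per domain
    inv_root: u64,     // generator for the size-n inverse transform
    fwd_root: u64,     // generator for the size-2n forward transform
}

impl HalfExtensionTable {
    fn new(g: u64, n: usize) -> Self {
        let (g_inv, n_inv) = (inv(g), inv(n as u64));
        let weights = (0..n)
            .map(|j| pow_mod(g_inv, j as u64) * n_inv % P)
            .collect();
        Self {
            weights,
            inv_root: inv(root_of_unity(n)),
            fwd_root: root_of_unity(2 * n),
        }
    }

    /// Evaluations on g^2 * <w_n>  ->  evaluations on g * <w_2n>.
    fn coset_lde_full(&self, evals: &[u64]) -> Vec<u64> {
        // Unnormalized inverse transform: entry j is n * c_j * g^(2j).
        let mut buf = dft(evals, self.inv_root);
        // Weight j = g^(-j) / n turns that into c_j * g^j.
        for (c, w) in buf.iter_mut().zip(&self.weights) {
            *c = *c * w % P;
        }
        // Zero-pad to 2n and run the forward transform.
        buf.resize(2 * evals.len(), 0);
        dft(&buf, self.fwd_root)
    }
}

/// Horner evaluation, used only to check the result.
fn eval_poly(coeffs: &[u64], x: u64) -> u64 {
    coeffs.iter().rev().fold(0, |acc, &c| (acc * x + c) % P)
}
```

Building the table once and calling `coset_lde_full` for each half mirrors the "no per-call recomputation" idea; in a real prover the two `dft` calls would be in-place FFTs driven by precomputed twiddle vectors.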
/bench 5
Benchmark: ethrex 20 transfers (median of 3)
Table parallelism: auto (cores / 3)
Commit: 7e9b048 · Baseline: cached · Runner: self-hosted bench
/review-ai
Codex Code Review
Findings
No build or test commands were run, per the static-review constraints.
Reviewed the fused composition half-extension. Looks correct and well-tested.
Correctness (verified): The fused path runs an unnormalized iFFT (
Tests:
Wiring: all callers of
One Low-severity note left inline about
AI Review
PR #700 · 2 changed files
Findings
Status column reflects the verdict from the verifier: deepseek-verifier (openrouter/deepseek/deepseek-v4-pro).
AI-003: Unconditional `comp_inv`/`comp_fwd` twiddles add ~3·lde_size field elements per `LdeTwiddles` in release builds, even when only used for `number_of_parts == 2`
Claim The new
Evidence
Suggested fix Either gate
Reviewer Lanes
Verification Lanes
Native Codex and Claude reviews run separately and post their own comments. They are not included in this structured provenance report.
Discarded candidates (6), rejected by the verifier
Raw lane outputs, candidates, final issues, and model metrics are uploaded as workflow artifacts.
What
Follow-up to #699. Now that degree-3 tables take the 2-part decompose_and_extend_d2 path, fuse its inner extend_half_to_lde.
It previously did the g²→g coset extension as two separate FFTs with an intermediate coefficient Polynomial: iFFT(g²) → coefficient Polynomial → evaluate_polynomial_on_lde_domain(g).
Replaced with a single fused coset_lde_full: iFFT(n) → coset reshift g²→g → forward FFT(2n) in one pass, no intermediate coefficient Vec.
Precomputation
The weights g⁻ʲ/n (folding the 1/n iFFT normalization and the net g²→g shift) and the inverse twiddles (size lde_size/2) are precomputed once per domain in LdeTwiddles (the forward FFT reuses the existing fwd twiddles), and threaded through prove_rounds_2_to_4 → round_2 → decompose_and_extend_d2 → extend_half_to_lde. No per-call recomputation (previously the weights + both twiddle sets were rebuilt on each of the 2 halves × every table).
Byte-identical
Pure refactor of how the half-extension is computed — output unchanged.
- test_decompose_and_extend_d2_matches_original: decompose_and_extend_d2's output equals the original interpolate_offset_fft + break_in_parts + evaluate_polynomial_on_lde_domain path.
- composition_extend_half_fused_matches_reference: the fused weights/twiddles reproduce the reference iFFT(g²)→FFT(g) across sizes, byte-for-byte.
- stark lib 130/130; real VM proof (fib_iterative_1200k) prove+verify OK; clippy + fmt clean.
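The kind of equivalence check listed above can be sketched in isolation. This is a hedged illustration and not the PR's tests: it assumes the toy field F_257 with generator 3, a naive DFT instead of the real FFT, and hypothetical function names (`reference_extend`, `fused_extend`, `fused_matches_reference`). The reference path materializes the coefficient vector and then evaluates it on the LDE coset; the fused path applies the `g⁻ʲ/n` weights and goes straight into the size-2n forward transform. Weights are rebuilt inline here only to keep the sketch short.

```rust
// Toy field F_257; 3 generates its multiplicative group (order 256).
const P: u64 = 257;

fn pow_mod(mut base: u64, mut exp: u64) -> u64 {
    let mut acc = 1u64;
    base %= P;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = acc * base % P;
        }
        base = base * base % P;
        exp >>= 1;
    }
    acc
}

/// Fermat inverse; `x` must be nonzero.
fn inv(x: u64) -> u64 {
    pow_mod(x, P - 2)
}

/// Primitive `order`-th root of unity; `order` must be a power of two <= 256.
fn root_of_unity(order: usize) -> u64 {
    pow_mod(3, (P - 1) / order as u64)
}

/// Naive O(n^2) DFT standing in for an FFT: out[k] = sum_j input[j] * w^(j*k).
fn dft(input: &[u64], w: u64) -> Vec<u64> {
    (0..input.len())
        .map(|k| {
            let step = pow_mod(w, k as u64);
            let (mut acc, mut x) = (0u64, 1u64);
            for &v in input {
                acc = (acc + v * x) % P;
                x = x * step % P;
            }
            acc
        })
        .collect()
}

/// Reference: inverse transform on the g^2 coset -> explicit coefficient
/// vector -> evaluate that polynomial on g * <w_2n>.
fn reference_extend(evals: &[u64], g: u64) -> Vec<u64> {
    let n = evals.len();
    let n_inv = inv(n as u64);
    let g2_inv = inv(g * g % P);
    // Intermediate coefficient allocation that the fused path avoids.
    let coeffs: Vec<u64> = dft(evals, inv(root_of_unity(n)))
        .iter()
        .enumerate()
        .map(|(j, &v)| v * n_inv % P * pow_mod(g2_inv, j as u64) % P)
        .collect();
    let w = root_of_unity(2 * n);
    (0..2 * n)
        .map(|k| {
            let x = g * pow_mod(w, k as u64) % P;
            // Horner evaluation at the k-th LDE point.
            coeffs.iter().rev().fold(0, |acc, &c| (acc * x + c) % P)
        })
        .collect()
}

/// Fused: unnormalized inverse transform, weights g^(-j)/n, zero-pad, forward transform.
fn fused_extend(evals: &[u64], g: u64) -> Vec<u64> {
    let n = evals.len();
    let (g_inv, n_inv) = (inv(g), inv(n as u64));
    let mut buf = dft(evals, inv(root_of_unity(n)));
    for (j, v) in buf.iter_mut().enumerate() {
        *v = *v * pow_mod(g_inv, j as u64) % P * n_inv % P;
    }
    buf.resize(2 * n, 0);
    dft(&buf, root_of_unity(2 * n))
}

/// Compare both paths on deterministic inputs of length `n`.
fn fused_matches_reference(n: usize, g: u64) -> bool {
    let evals: Vec<u64> = (0..n as u64).map(|i| (i * i + 7 * i + 3) % P).collect();
    reference_extend(&evals, g) == fused_extend(&evals, g)
}
```

Looping `fused_matches_reference` over several power-of-two sizes gives the "across sizes" flavour of the check; in this toy field `2 * n` must stay at or below 256.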