perf(ecsm): replace num-bigint with crypto-bigint#716
Draft
Oppen wants to merge 6 commits into
Draft
Conversation
AffinePoint coordinates, the scalar k, and all quotients in the witness
are now fixed-size crypto-bigint types (U256/U512/U1024) instead of
heap-allocated BigUint/BigInt.
- curve.rs: AffinePoint{x,y} and lambda are U256; k256 bridge uses
to_be_bytes()/from_be_slice(); msb_position and schedule use
bits_vartime()/bit_vartime()
- lib.rs: p(), n() return U256; prepare() uses U256; to_le_32() removed
(callers use .to_le_bytes() directly)
- witness.rs: quotient computation uses U1024 intermediates (needed
because 3p² exceeds U512); shifted_quotient takes explicit pos/neg
U1024 terms and an offset of r²; carries and limb arrays unchanged
- tests: rewritten without num-bigint; reference_field.rs uses
DynResidue for Fp arithmetic; prover ecdas_tests.rs adapted to U256
Collaborator
Author
|
/bench |
Benchmark — ethrex 20 transfers (median of 3)Table parallelism: auto (cores / 3)
Commit: cb14979 · Baseline: built from main · Runner: self-hosted bench |
- p() and n() are now const fn (using from_be_hex) - P_SQ (p²) and THREE_P_SQ (3p²) are U1024 constants, eliminating the runtime square_wide + widening that happened on every compute_witness call - shifted_quotient takes pos/neg as U1024; callers widen via w1024() only where needed (products stay as U512, only sums that could overflow 2^512 are widened before addition) - Test p_sq_constants_match_computed verifies P_SQ and THREE_P_SQ against runtime computation of p.square_wide()
…Fp params - crypto-bigint 0.5.5 → 0.7.5, k256 0.13 → 0.14.0-rc.14 - mul_wide → widening_mul, square_wide → widening_square - to_le_bytes() now returns EncodedUint; add .into() at call sites - EncodedPoint/FieldElement no longer at k256 root; bridge in curve.rs replaced with Sec1Point / FromSec1Point / ToSec1Point - DynResidue/DynResidueParams → FixedMontyForm/FixedMontyParams in 0.7.5 - Fp in curve.rs and reference_field.rs now use const_monty_params! macro, computing Montgomery reduction constants (R², R³, mod_neg_inv) at compile time rather than on every field operation
…_quotient The shifted quotient r + num/p has num < p² < 2^512 and r ≤ 3p < 2^258. Using I576 = Int<9> (9 × 64-bit limbs) rather than U1024 (16 limbs) halves the limb count for the division, reducing schoolbook div work from 16-limb to 9-limb arithmetic. - shifted_quotient now takes pos/neg as Uint<9> and computes num = pos - neg directly as a signed I576, divides by p as NonZero<Int<5>> (p < 2^320), then adds r_offset (p or 3p as Uint<5>) to produce the positive quotient - All intermediate sums that can exceed 2^512 (2λyA, 3xA²) are now computed in Uint<9> to avoid overflow; U512 is only used for single products < p² - P_SQ / THREE_P_SQ constants removed (no longer needed) - R_BYTES (3p, 33 LE bytes) used directly to construct the Uint<5> offset
- Hoist every per-step modulus operand to a compile-time const: p, 3p, p as NonZero<Int<5>>, the U512 divisor, and the zero-extended p/3p/b limb arrays. build_step(s) now takes no runtime modulus parameters. - Merge shifted_quotient + to_le_33 into a direct [u8;33] return, dropping the intermediate U512 round-trip. - Replace Fermat pow(p-2) inversion with crypto-bigint safegcd invert(). - Materialize p/n as const P and const N (single compile-time eval). - Shrink the convolution/term/ext limb arrays from i128 to i32 (terms stay below 2^25; ~64x headroom), cutting build_step live limb working set ~4x (~8KiB to ~2KiB). Public carry arrays stay [i64;64]. - Use checked_div_rem_vartime (witness gen is already variable-time).
…rsions) - batch_invert: acc *= / inv *= instead of x = x * y - msb_position: drop redundant u32 cast on bits_vartime() - drop useless .into() on checked_div_rem quotient (already Int<9>)
Collaborator
Author
|
/bench |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
num-bigint/num-traitswithcrypto-bigint(already a transitive dep viaelliptic-curve) in theecsmcrateAffinePointcoordinates and the scalarkare nowU256(fixed-size, no heap allocation)U1024intermediates — necessary because the shifted-quotient offset3p²exceeds U512num-bigint; test-onlyreference_field.rsusesDynResiduefor modular Fp arithmeticTest plan
cargo test -p ecsm— 15/15 passmake test— all non-CUDA tests pass (CUDA failures are pre-existing on this machine)/benchon CI to measure impact