IxVM kernel: unblock ByteArray.utf8DecodeChar?_utf8EncodeChar_append#451
Merged
Conversation
76a4267 to
6bf777f
Compare
Aiur op `unconstrained_big_uint_div_mod(a: KLimbs, b: KLimbs) -> (KLimbs, KLimbs)`:
runtime computes (q, r) via `num_bigint::BigUint::div_rem` and inserts
both result lists into `memory[10]` content-addressed with multiplicity 0;
no constraints emitted by the op itself. Wired through
bytecode/execute/constraints/trace + FFI decoder +
Source/Typed/Simple/Concrete/Bytecode stages + Meta syntax +
Compiler/{Check, Concretize, Simple, Match, Lower, Layout}. Lean reference
evaluators (Interpret/SourceEval/BytecodeEval) are stubbed with TODOs;
src/aiur/execute.rs is the authoritative impl.
Primitive.lean: `klimbs_div_mod` now reads (q, r) from the op then verifies
`klimbs_normalize(q*b + r) == klimbs_normalize(a)` and (when b != 0) the
strict bound `r < b` via `klimbs_le(klimbs_succ(r), b) == 1`. Drops
recursive `klimbs_div_mod_go`: its 2^24-step memo growth on (2^32, 256)
was the klimbs OOM driver.
`lake exe check '_private.Init.Data.String.Decode.0.ByteArray.utf8DecodeChar?.helper₃'`
with `ulimit -v 20000000`: previously OOM (~1.2 GB single allocation
inside klimbs_div_mod_go); now passes at 40_381_441_414 FFT.
The `r < b` check was the variant-shopping axis (all three sound):
* `klimbs_le(r,b)==1 ∧ klimbs_eq(r,b)==0`: 40_381_441_414 FFT
* `klimbs_le(klimbs_succ(r), b) == 1`: 40_381_437_756 FFT (this commit)
* `klimbs_le(b, r) == 0`: 40_381_440_143 FFT
Collapse N consecutive `expr_inst1` rewalks of a residual `cod` into one `expr_inst_many` per spine step on the ORIGINAL `dom`, with a growing innermost-first subs list. `collect_spine` once, `k_infer` the head once, then `k_infer_app_spine_loop` peels the Forall telescope arg-by-arg. On non-Forall load (only reachable when the head_ty itself wasn't a Forall syntactically), materialise via `whnf` on the concretised t (= prior subs applied to original t) — post-whnf dom/cod live in the OUTER context, so the descent restarts with subs_rev = [a]. `lake exe check`: * Nat.add_comm: 55_504_714 → 54_670_728 FFT (-1.5%) * `_private....utf8DecodeChar?.helper₃`: 40_381_437_756 → 38_696_601_688 FFT (-4.2%) k_infer_only retains the per-arg expr_inst1 path; the fusion is local to k_infer for now.
`list_lookup(list, idx)` previously walked the list inline (Cons + match + recurse). Replacing the body with `head(load(list_drop(list, idx)))` lets `list_drop`'s memo dedup sublist pointers — content-addressed intermediates collapse across all lookups that pass through them. `lake exe check`: * Nat.add_comm: 54_670_728 → 54_350_107 FFT (-0.6%) * `_private....utf8DecodeChar?.helper₃`: 38_696_601_688 → 37_793_682_359 FFT (-2.3%)
`klimbs_to_ctor_form` recursively unfolded a Nat literal `n` into the full ctor chain `Nat.succ^n(Nat.zero)`. For large `n` the recursion created O(n) memo entries (one per `klimbs_dec`, one per `klimbs_normalize`, one per the recursive `klimbs_to_ctor_form` call) and OOM'd the per-fn function_queries before completing. Rust's `nat_to_constructor` (src/ix/kernel/whnf.rs:1664-1687) does the right thing: produce `Nat.succ(Lit(n-1))` — a single ctor wrap whose predecessor stays a `Lit(Nat)`. Subsequent iota reductions re-trigger expansion only as needed. Mirror that semantics in Aiur. `lake exe check '_private.Init.Data.String.Decode.0.ByteArray.utf8DecodeChar?.<bomb>'` with `ulimit -v 20000000`: * assemble₂._proof_1: OOM → 11_936_582_540 FFT * assemble₃._proof_3: OOM → 11_934_210_486 FFT * assemble₃._proof_4: OOM → 11_968_289_150 FFT * assemble₄_eq_some_iff_..._proof_1_13: OOM → 8_510_124_580 FFT * helper₃: 37_793_684_978 → 16_870_284_540 FFT (-55.4%) `lake exe check 'ByteArray.utf8DecodeChar?_utf8EncodeChar_append'` with `ulimit -v 22000000`: passes at 43_544_984_712 FFT (the full theorem, which was the original UTF-8 blocker). Tests/Ix/IxVM.lean: 18 small FFT pin increases — every const whose WHNF path now eagerly synthesizes a `Lit(n-1)` predecessor (one extra `mk_nat_lit` per ctor-form coercion instead of zero in the old recursive body that walked the whole chain anyway).
6bf777f to
65ff3f5
Compare
ByteArray.utf8DecodeChar?_utf8EncodeChar_append
gabriel-barrett
approved these changes
Jun 26, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Unblocks the full UTF-8 round-trip theorem
ByteArray.utf8DecodeChar?_utf8EncodeChar_append(the original blocker) — kernel-checks at 43_544_995_360 FFT underulimit -v 22000000. Along the way, drops cost across the rest of the Nat-heavy kernel surface. Five commits, each independent, each measured.Commits
1.
a25e94e— nativeunconstrained_big_uint_div_modAiur opNew Aiur op
unconstrained_big_uint_div_mod(a: KLimbs, b: KLimbs) -> (KLimbs, KLimbs). The runtime computes(q, r)vianum_bigint::BigUint::div_remand inserts both result lists intomemory[10]content-addressed with multiplicity 0 — no constraints emitted by the op itself. Wired through bytecode / execute / constraints / trace + FFI decoder + Source / Typed / Simple / Concrete / Bytecode stages + Meta syntax + Compiler / {Check, Concretize, Simple, Match, Lower, Layout}. Lean reference evaluators (Interpret,SourceEval,BytecodeEval) are stubbed with TODOs;src/aiur/execute.rsis the authoritative impl.Kernel/Primitive.lean::klimbs_div_modnow reads(q, r)from the op then verifiesklimbs_normalize(q*b + r) == klimbs_normalize(a)and (whenb != 0) the strict boundr < bviaklimbs_le(klimbs_succ(r), b) == 1. Drops recursiveklimbs_div_mod_go: its 2^24-step memo growth on(2^32, 256)was the klimbs OOM driver for the UTF-8 decode helpers.Soundness on prover-supplied bytes: every limb byte flows through
u8_mulinsideklimbs_mul(q, b)andu8_addinsideklimbs_add(q*b, r), both of which push to theu8_range_checkchannel (src/aiur/gadgets/bytes2.rs), so a byte > 255 fails the gadget's range check. Trailing junk limbs that mul doesn't touch are caught by the post-normalize equality.r < bvariant shopping (all three sound, measured onhelper₃):klimbs_le(r,b)==1 ∧ klimbs_eq(r,b)==0klimbs_le(klimbs_succ(r), b) == 1(this PR)klimbs_le(b, r) == 0Spread is ~3.6K FFT (noise-tier) but the succ-then-le variant is the consistent winner.
Per-commit deltas vs
main:Nat.add_commhelper₃2.
1141f2d—k_inferApp-spine fusionCollapses N consecutive
expr_inst1rewalks of a residualcodinto oneexpr_inst_manyper spine step on the ORIGINALdom, with a growing innermost-first subs list.collect_spineonce,k_inferthe head once, thenk_infer_app_spine_looppeels the Forall telescope arg-by-arg.On non-Forall
load(only reachable whenhead_tyitself wasn't a Forall syntactically), materialise viawhnfon the concretisedt(= prior subs applied to originalt) — post-whnfdom/codlive in the OUTER context, so the descent restarts withsubs_rev = [a].k_infer_onlyretains the per-argexpr_inst1path; the fusion is local tok_inferfor now (k_infer_onlysaw a marginal-but-noisy delta when fused; not worth the extra surface area until there's a workload that exercises it more).Per-commit deltas vs the previous commit:
Nat.add_commhelper₃3.
1c148db— routelist_lookupthroughlist_droplist_lookup(list, idx)previously walked the list inline (Cons+ match + recurse). Replacing the body withhead(load(list_drop(list, idx)))letslist_drop's memo dedup sublist pointers — content-addressed intermediates collapse across all lookups that pass through them.Per-commit deltas vs the previous commit:
Nat.add_commhelper₃4.
345999a— fixklimbs_to_ctor_formto single-step expansionklimbs_to_ctor_formrecursively unfolded a Nat literalninto the full ctor chainNat.succ^n(Nat.zero). For largenthe recursion created O(n) memo entries (one perklimbs_dec, one perklimbs_normalize, one per the recursiveklimbs_to_ctor_formcall) and OOM'd the per-fnfunction_queriesbefore completing — the second OOM driver behind the UTF-8 round-trip blocker.Rust's
nat_to_constructor(src/ix/kernel/whnf.rs:1664-1687) does the right thing: produceNat.succ(Lit(n-1))— a single ctor wrap whose predecessor stays aLit(Nat). Subsequent iota reductions re-trigger expansion only as needed. This commit mirrors that semantics in Aiur.This is the commit that unblocks the full UTF-8 round-trip theorem.
Per-commit deltas vs the previous commit:
Nat.add_commhelper₃ByteArray.utf8DecodeChar?_utf8EncodeChar_append5.
65ff3f5— re-pinTests/Ix/IxVM.leanFFT costsPin-only commit reflecting the cumulative improvement across the prior four.
lake test -- --ignored ixvmgreen.Cumulative impact
Cumulative
lake exe checkdeltas,main→ HEAD:Nat.add_commNat.decLeNat.sub_le_of_le_addArray.append_assocVector.appendIxVMPrim.nat_gcd_litIxVMPrim.nat_land_lithelper₃(_private....utf8DecodeChar?.helper₃)ByteArray.utf8DecodeChar?_utf8EncodeChar_appendTest plan
lake buildcleanlake test -- --ignored ixvmgreen (incl. all 41 kernel-check FFT pins)lake exe check 'ByteArray.utf8DecodeChar?_utf8EncodeChar_append'passes at 43_544_995_360 FFT (ulimit -v 22000000)