spec: update constraint ID system by erik-3milabs · Pull Request #570 · yetanotherco/lambda_vm

erik-3milabs · 2026-04-28T13:22:15Z

This PR introduces content-derived constraint IDs, i.e., the ID of a constraint is derived from its content, rather than its index in the constraint list. This

permit addition, removing and reorganizing constraints without changing any IDs, and
enforces ID updates upon constraint modification

i.e., two constraints with the same ID are 100-ε% certain to be identical.

Note: the ID is derived in a way that updating a variable name does not lead to a name update. This is at the expense of an ID update when variables are reordered.

Before:

After:

claude · 2026-04-28T13:25:07Z

Review

This PR adds stable, content-derived constraint IDs to the spec rendering pipeline: an FNV-1a hash of each constraint's structure is encoded into a short 4-char human-readable tag (e.g. BRH-1J3K). It also restructures the constraint table (separate tag/counter columns, per-constraint dividers) and adds short code fields to all chip TOML files.

The approach is sound. Three bugs found:

Medium — Crash on empty variable groups (src.typ:191–197)

.sum() without a default: on an empty array returns none in Typst, not an empty dict. If any chip has a variable group with zero entries, the inner .sum() returns none, causing a crash on the outer .sum(). Fix: add default: (:) to both calls.

Low — `z-fill` panics for ≥100 constraints in one group (chip.typ:310)

"0" * (2 - s.len()) produces a negative repeat count when the counter hits 100+. Typst panics on negative string repeat. Fix: calc.max(0, 2 - s.len()).

Low — Hash input ambiguity from unescaped separator (src.typ:182)

The chip name, group, and type are joined with , without escaping. A comma in any field makes two distinct inputs hash identically, potentially causing ID collisions. Current names are safe, but it's a latent fragility. Fix: use a separator that cannot appear in those values (e.g. \x00).

github-actions · 2026-04-28T13:26:32Z

Codex Code Review

No actionable issues found in the PR diff.

The changes are limited to the Typst spec rendering/tagging logic and chip metadata. I did not find security-relevant runtime changes, VM behavior changes, or significant performance concerns.

I could not run a full Typst/shiroa render because neither typst nor shiroa is installed in this environment.

github-actions · 2026-04-28T13:41:21Z

Codex Code Review

Findings:

Low - spec/tooling/chip.py:990 ignores the new TOML code field. It sets self.code = data.get("name", ""), so the validator checks name twice and never validates code. Invalid code values would pass Python tooling, and any tooling using Chip.code will see the full chip name instead of the short code added in this PR. Use data.get("code", self.name) and report self.code in the identifier error.

No security issues found in the PR diff. I did not run a Typst render because typst is not installed in this environment.

claude · 2026-04-28T13:42:30Z

Review: spec/constraint_id

Overview

Replaces sequential constraint indices with content-derived IDs using FNV-1a hashing. Variables are identified by (chip, group, position, type) so renaming a variable doesn't change constraint IDs, but reordering does. The 4-character Base32 ID (20 bits of entropy) is stable across reorganisation and signals stale IDs when a constraint's semantics change. TOML chips all gain a code shorthand field.

Bugs

[High] chip.py reads wrong key for code — self.code = data.get("name", "") always resolves to the chip name; the new code field (e.g. "BRH") is never read or validated. See inline comment on line 990.

[Medium] tag excluded from constraint digest — digestable_constraint omits the tag field for interaction and template constraints. Two constraints calling different tables with identical inputs/outputs/iters receive the same ID, breaking the core invariant. See inline comment on lines 210-213.

[Low] Wrong variable in error message — reporter.asserts(self.code.isidentifier(), f"Invalid identifier: {self.name!r}") prints the name, not the code. See inline comment on line 994.

FNV-1a implementation

The two-limb 64-bit multiplication is mathematically correct: carry propagation from lo into hi is handled, and the hash[1] * prime[1] term (≥ 2⁶⁴) is correctly discarded. No issues here.

Minor observations

CONSTRAINT_ID_CHAR_COUNT = 4 gives 20 bits of ID space. With ~50 constraints per chip the birthday-collision probability is negligible (~0.1%), so this is fine for the current scale.
bytes-to-hex's z-fill silently produces wrong output for strings longer than 2 chars, but since all inputs are single bytes (0–255) this can't trigger in practice.

RobinJadoul

Very brief look only, so far; haven't properly looked at the canonicalization/serialization of the constraints yet

RobinJadoul · 2026-06-01T09:50:55Z

+      .map(cat => (str(cat): replace_variable_with_ID(c.at(cat))))
+      .sum(default: (:))
+
+    repr(id_tagged)


This may depend a bit too much on typst internals for canonicalization
Maybe json.encode can work, but then there's still a potential ordering issue

Agreed. I reworked it. Let me know if you agree that it is better now.

It's definitely better, the encoding for dicts feels a bit weird, but not necessarily problematically so (unless we suddenly could have vars named :)

I updated the encoding. Among others, I included domain separators. Let me know what you think!

Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>

RobinJadoul · 2026-06-24T13:32:33Z

@@ -1,4 +1,5 @@
 name = "SHA256ROUND"
+code = "SHR"


Hmm, maybe dangerous to confuse with shift right?

good catch. Let's make it... SHD

Suggested change

code = "SHR"

code = "SHD"

(please apply if you agree)

RobinJadoul · 2026-06-24T14:06:29Z

+    }
+
+    let flattened_type = lower(flatten_vartype(var.type))    
+    let input = (chip, group, str(idx), flattened_type).join("\x00")


variables. e.g. for #670 I had to re-order some (virtual) variables that depended on each other in SHA256. So I'm tempted to think that variables should maybe be tied to their name rather than position.
I think we discussed a while back that the De Bruijn index idea was originally mostly for iters and summations

But all that would need, I think, it just replacing the str(idx) by the name of the variable here, and the rest would keep working

RobinJadoul · 2026-06-24T14:09:09Z

+    if type(lisp) == array {
+      "(" + lisp.map(replace_variable_with_ID).join(",") + ")"
+    } else {
+      variable_to_ID.at(str(lisp), default: str(lisp))


Not sure, I think it just felt a bit weird as a canonicalization step

RobinJadoul · 2026-06-24T14:12:59Z

+  let LOG_ID_RADIX = 5
+  let RADIX = calc.pow(2, LOG_ID_RADIX)
+  assert(ID_CHAR_SET.len() == RADIX, message: "ID_CHAR_SET <> RADIX mismatch")
+  assert(ID_CHAR_LEN * LOG_ID_RADIX <= 64, message: "ID_CHAR_LEN and RADIX incompatible")


Suggested change

assert(ID_CHAR_LEN * LOG_ID_RADIX <= 64, message: "ID_CHAR_LEN and RADIX incompatible")

assert(ID_CHAR_LEN * LOG_ID_RADIX <= 64, message: "digest size too small for configured ID entropy")

RobinJadoul · 2026-06-24T14:22:37Z

+  }
+
+  // Recursively flatten nested object
+  let flatten(obj, terminator: str, braces: ("(", ")"), sep: ",") = {


Perhaps for naming

Suggested change

let flatten(obj, terminator: str, braces: ("(", ")"), sep: ",") = {

let flatten(obj, leaf_transform: str, braces: ("(", ")"), sep: ",") = {

RobinJadoul · 2026-06-24T14:26:22Z

+      .replace("\n", "")
+      .replace(" ", "")


Are we expecting whitespace to be generated?

RobinJadoul · 2026-06-24T14:29:13Z

+      .map(cat => (str(cat): replace_variable_with_ID(c.at(cat))))
+      .sum(default: (:))
+
+    repr(id_tagged)


It's definitely better, the encoding for dicts feels a bit weird, but not necessarily problematically so (unless we suddenly could have vars named :)

erik-3milabs · 2026-06-25T14:40:10Z

/ai-review

github-actions · 2026-06-25T14:42:36Z

Codex Code Review

Findings:

Low spec/src.typ:137: The new FNV-1a implementation does not compute FNV-1a correctly. hi is computed using lo after lo = lo * prime.at(0), so the cross term uses the multiplied low limb instead of the original low limb. If these IDs are meant to be FNV-derived/stable across implementations, keep the pre-multiply lo in a temp and use that for lo * prime_hi.
Low spec/chip.typ:307: lookup_group and the group parameter to tag are now dead. This also leaves existing prefix fields in spec/src/*.toml orphaned, for example spec/src/cpu.toml:288. Either wire prefix into the new tag format or remove the unused helper/argument and stale TOML fields.

No safety/security, VM semantics, GPU, or performance issues found in the PR diff.

claude · 2026-06-25T14:43:36Z

+    lo = lo * prime.at(0)
+    hi = lo * prime.at(1) + hi * prime.at(0)


Low / correctness — this doesn't implement FNV-1a as documented. The high limb is computed from the already-multiplied lo.

For hash = hi·2³² + lo and prime = ph·2³² + pl, the high word should be L·ph + hi·pl where L = (old_lo ⊕ b). But lo is overwritten on the previous line (lo = lo * prime.at(0)), so this line computes (L·pl)·ph + hi·pl instead of L·ph + hi·pl.

Fix by computing hi before overwriting lo:

Suggested change

lo = lo * prime.at(0)

hi = lo * prime.at(1) + hi * prime.at(0)

// hash := hash × FNV_prime

let new_lo = lo * prime.at(0)

hi = lo * prime.at(1) + hi * prime.at(0)

lo = new_lo

Note the low limb lo (and thus the 24-bit constraint ID, which only reads the 3 low bytes) is unaffected — the low 32 bits of a modular product depend only on the low operands. The bug only corrupts the high limb, which feeds the full-8-byte variable IDs (bytes-to-hex(nchf(...))) and therefore propagates into constraint IDs. Output stays deterministic and content-derived, so the feature still works; it's just not the FNV-1a it claims to be.

erik-3milabs · 2026-06-25T14:43:36Z

@claude

claude · 2026-06-25T14:43:49Z

Review summary

Reviewed the diff (a spec-only PR: Typst rendering changes + content-derived constraint IDs). No safety/security or VM/proving concerns — the ID scheme lives entirely in the Typst doc-rendering path and doesn't touch the prover (chip.py only stores the new code field, it doesn't compute IDs).

Findings

Low (correctness): FNV-1a doesn't match its documentation — the high limb is computed from the post-multiplication lo, so it isn't true FNV-1a. The low limb (and thus the 24-bit constraint ID) is correct, but the buggy high word propagates into variable IDs. Output stays deterministic and content-derived, so the feature works; flagged inline with a suggested fix.
Nit (consistency): code defaults differ between tools — chip.py falls back to "" while src.typ uses chip.at("code", default: chip.name). Harmless today since Python doesn't use code, but worth aligning.

Looks good

.sum(default: 0) / .sum(default: (:)) additions correctly guard empty-array sums.
Excluding desc/ref/constraint from the ID hash is sound: constraint is human-readable markup while the semantic poly field is retained, so IDs still reflect constraint meaning.
digest_to_id reads 3 bytes (24 bits) for a 20-bit ID — consistent with MIN_DIGEST_BYTES.

Nothing blocking.

erik-3milabs requested a review from RobinJadoul April 28, 2026 13:22

erik-3milabs self-assigned this Apr 28, 2026

erik-3milabs added the spec Updates and improvements to the spec document label Apr 28, 2026