spec: update constraint ID system#570
Conversation
ReviewThis PR adds stable, content-derived constraint IDs to the spec rendering pipeline: an FNV-1a hash of each constraint's structure is encoded into a short 4-char human-readable tag (e.g. The approach is sound. Three bugs found: Medium — Crash on empty variable groups (src.typ:191–197)
Low —
|
Codex Code ReviewNo actionable issues found in the PR diff. The changes are limited to the Typst spec rendering/tagging logic and chip metadata. I did not find security-relevant runtime changes, VM behavior changes, or significant performance concerns. I could not run a full Typst/shiroa render because neither |
Codex Code ReviewFindings:
No security issues found in the PR diff. I did not run a Typst render because |
Review: spec/constraint_idOverviewReplaces sequential constraint indices with content-derived IDs using FNV-1a hashing. Variables are identified by (chip, group, position, type) so renaming a variable doesn't change constraint IDs, but reordering does. The 4-character Base32 ID (20 bits of entropy) is stable across reorganisation and signals stale IDs when a constraint's semantics change. TOML chips all gain a Bugs[High] [Medium] [Low] Wrong variable in error message — FNV-1a implementationThe two-limb 64-bit multiplication is mathematically correct: carry propagation from Minor observations
|
RobinJadoul
left a comment
There was a problem hiding this comment.
Very brief look only, so far; haven't properly looked at the canonicalization/serialization of the constraints yet
99898d2 to
a27e78f
Compare
| .map(cat => (str(cat): replace_variable_with_ID(c.at(cat)))) | ||
| .sum(default: (:)) | ||
|
|
||
| repr(id_tagged) |
There was a problem hiding this comment.
This may depend a bit too much on typst internals for canonicalization
Maybe json.encode can work, but then there's still a potential ordering issue
There was a problem hiding this comment.
Agreed. I reworked it. Let me know if you agree that it is better now.
There was a problem hiding this comment.
It's definitely better, the encoding for dicts feels a bit weird, but not necessarily problematically so (unless we suddenly could have vars named :)
There was a problem hiding this comment.
I updated the encoding. Among others, I included domain separators. Let me know what you think!
Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>
a27e78f to
b657685
Compare
| @@ -1,4 +1,5 @@ | |||
| name = "SHA256ROUND" | |||
| code = "SHR" | |||
There was a problem hiding this comment.
Hmm, maybe dangerous to confuse with shift right?
There was a problem hiding this comment.
good catch. Let's make it... SHD
| code = "SHR" | |
| code = "SHD" |
(please apply if you agree)
| } | ||
|
|
||
| let flattened_type = lower(flatten_vartype(var.type)) | ||
| let input = (chip, group, str(idx), flattened_type).join("\x00") |
There was a problem hiding this comment.
variables. e.g. for #670 I had to re-order some (virtual) variables that depended on each other in SHA256. So I'm tempted to think that variables should maybe be tied to their name rather than position.
I think we discussed a while back that the De Bruijn index idea was originally mostly for iters and summations
But all that would need, I think, it just replacing the str(idx) by the name of the variable here, and the rest would keep working
| if type(lisp) == array { | ||
| "(" + lisp.map(replace_variable_with_ID).join(",") + ")" | ||
| } else { | ||
| variable_to_ID.at(str(lisp), default: str(lisp)) |
There was a problem hiding this comment.
Not sure, I think it just felt a bit weird as a canonicalization step
| let LOG_ID_RADIX = 5 | ||
| let RADIX = calc.pow(2, LOG_ID_RADIX) | ||
| assert(ID_CHAR_SET.len() == RADIX, message: "ID_CHAR_SET <> RADIX mismatch") | ||
| assert(ID_CHAR_LEN * LOG_ID_RADIX <= 64, message: "ID_CHAR_LEN and RADIX incompatible") |
There was a problem hiding this comment.
| assert(ID_CHAR_LEN * LOG_ID_RADIX <= 64, message: "ID_CHAR_LEN and RADIX incompatible") | |
| assert(ID_CHAR_LEN * LOG_ID_RADIX <= 64, message: "digest size too small for configured ID entropy") |
| } | ||
|
|
||
| // Recursively flatten nested object | ||
| let flatten(obj, terminator: str, braces: ("(", ")"), sep: ",") = { |
There was a problem hiding this comment.
Perhaps for naming
| let flatten(obj, terminator: str, braces: ("(", ")"), sep: ",") = { | |
| let flatten(obj, leaf_transform: str, braces: ("(", ")"), sep: ",") = { |
| .replace("\n", "") | ||
| .replace(" ", "") |
There was a problem hiding this comment.
Are we expecting whitespace to be generated?
| .map(cat => (str(cat): replace_variable_with_ID(c.at(cat)))) | ||
| .sum(default: (:)) | ||
|
|
||
| repr(id_tagged) |
There was a problem hiding this comment.
It's definitely better, the encoding for dicts feels a bit weird, but not necessarily problematically so (unless we suddenly could have vars named :)
|
/ai-review |
Codex Code ReviewFindings:
No safety/security, VM semantics, GPU, or performance issues found in the PR diff. |
| lo = lo * prime.at(0) | ||
| hi = lo * prime.at(1) + hi * prime.at(0) |
There was a problem hiding this comment.
Low / correctness — this doesn't implement FNV-1a as documented. The high limb is computed from the already-multiplied lo.
For hash = hi·2³² + lo and prime = ph·2³² + pl, the high word should be L·ph + hi·pl where L = (old_lo ⊕ b). But lo is overwritten on the previous line (lo = lo * prime.at(0)), so this line computes (L·pl)·ph + hi·pl instead of L·ph + hi·pl.
Fix by computing hi before overwriting lo:
| lo = lo * prime.at(0) | |
| hi = lo * prime.at(1) + hi * prime.at(0) | |
| // hash := hash × FNV_prime | |
| let new_lo = lo * prime.at(0) | |
| hi = lo * prime.at(1) + hi * prime.at(0) | |
| lo = new_lo |
Note the low limb lo (and thus the 24-bit constraint ID, which only reads the 3 low bytes) is unaffected — the low 32 bits of a modular product depend only on the low operands. The bug only corrupts the high limb, which feeds the full-8-byte variable IDs (bytes-to-hex(nchf(...))) and therefore propagates into constraint IDs. Output stays deterministic and content-derived, so the feature still works; it's just not the FNV-1a it claims to be.
Review summaryReviewed the diff (a spec-only PR: Typst rendering changes + content-derived constraint IDs). No safety/security or VM/proving concerns — the ID scheme lives entirely in the Typst doc-rendering path and doesn't touch the prover ( Findings
Looks good
Nothing blocking. |
This PR introduces content-derived constraint IDs, i.e., the ID of a constraint is derived from its content, rather than its index in the constraint list. This
i.e., two constraints with the same ID are 100-ε% certain to be identical.
Note: the ID is derived in a way that updating a variable name does not lead to a name update. This is at the expense of an ID update when variables are reordered.
Before:

After:
