Skip to content

spec: update constraint ID system#570

Open
erik-3milabs wants to merge 25 commits into
spec/mainfrom
spec/constraint_id
Open

spec: update constraint ID system#570
erik-3milabs wants to merge 25 commits into
spec/mainfrom
spec/constraint_id

Conversation

@erik-3milabs

Copy link
Copy Markdown
Collaborator

This PR introduces content-derived constraint IDs, i.e., the ID of a constraint is derived from its content, rather than its index in the constraint list. This

  1. permit addition, removing and reorganizing constraints without changing any IDs, and
  2. enforces ID updates upon constraint modification

i.e., two constraints with the same ID are 100-ε% certain to be identical.

Note: the ID is derived in a way that updating a variable name does not lead to a name update. This is at the expense of an ID update when variables are reordered.

Before:
image

After:
image

@erik-3milabs erik-3milabs self-assigned this Apr 28, 2026
@erik-3milabs erik-3milabs added the spec Updates and improvements to the spec document label Apr 28, 2026
Comment thread spec/src.typ Outdated
Comment thread spec/chip.typ Outdated
Comment thread spec/src.typ Outdated
@claude

claude Bot commented Apr 28, 2026

Copy link
Copy Markdown

Review

This PR adds stable, content-derived constraint IDs to the spec rendering pipeline: an FNV-1a hash of each constraint's structure is encoded into a short 4-char human-readable tag (e.g. BRH-1J3K). It also restructures the constraint table (separate tag/counter columns, per-constraint dividers) and adds short code fields to all chip TOML files.

The approach is sound. Three bugs found:


Medium — Crash on empty variable groups (src.typ:191–197)

.sum() without a default: on an empty array returns none in Typst, not an empty dict. If any chip has a variable group with zero entries, the inner .sum() returns none, causing a crash on the outer .sum(). Fix: add default: (:) to both calls.

Low — z-fill panics for ≥100 constraints in one group (chip.typ:310)

"0" * (2 - s.len()) produces a negative repeat count when the counter hits 100+. Typst panics on negative string repeat. Fix: calc.max(0, 2 - s.len()).

Low — Hash input ambiguity from unescaped separator (src.typ:182)

The chip name, group, and type are joined with , without escaping. A comma in any field makes two distinct inputs hash identically, potentially causing ID collisions. Current names are safe, but it's a latent fragility. Fix: use a separator that cannot appear in those values (e.g. \x00).

@github-actions

Copy link
Copy Markdown

Codex Code Review

No actionable issues found in the PR diff.

The changes are limited to the Typst spec rendering/tagging logic and chip metadata. I did not find security-relevant runtime changes, VM behavior changes, or significant performance concerns.

I could not run a full Typst/shiroa render because neither typst nor shiroa is installed in this environment.

@erik-3milabs erik-3milabs marked this pull request as draft April 28, 2026 13:28
@erik-3milabs erik-3milabs marked this pull request as ready for review April 28, 2026 13:37
@github-actions

Copy link
Copy Markdown

Codex Code Review

Findings:

  • Low - spec/tooling/chip.py:990 ignores the new TOML code field. It sets self.code = data.get("name", ""), so the validator checks name twice and never validates code. Invalid code values would pass Python tooling, and any tooling using Chip.code will see the full chip name instead of the short code added in this PR. Use data.get("code", self.name) and report self.code in the identifier error.

No security issues found in the PR diff. I did not run a Typst render because typst is not installed in this environment.

Comment thread spec/tooling/chip.py Outdated
Comment thread spec/tooling/chip.py Outdated
Comment thread spec/src.typ Outdated
@claude

claude Bot commented Apr 28, 2026

Copy link
Copy Markdown

Review: spec/constraint_id

Overview

Replaces sequential constraint indices with content-derived IDs using FNV-1a hashing. Variables are identified by (chip, group, position, type) so renaming a variable doesn't change constraint IDs, but reordering does. The 4-character Base32 ID (20 bits of entropy) is stable across reorganisation and signals stale IDs when a constraint's semantics change. TOML chips all gain a code shorthand field.

Bugs

[High] chip.py reads wrong key for codeself.code = data.get("name", "") always resolves to the chip name; the new code field (e.g. "BRH") is never read or validated. See inline comment on line 990.

[Medium] tag excluded from constraint digestdigestable_constraint omits the tag field for interaction and template constraints. Two constraints calling different tables with identical inputs/outputs/iters receive the same ID, breaking the core invariant. See inline comment on lines 210-213.

[Low] Wrong variable in error messagereporter.asserts(self.code.isidentifier(), f"Invalid identifier: {self.name!r}") prints the name, not the code. See inline comment on line 994.

FNV-1a implementation

The two-limb 64-bit multiplication is mathematically correct: carry propagation from lo into hi is handled, and the hash[1] * prime[1] term (≥ 2⁶⁴) is correctly discarded. No issues here.

Minor observations

  • CONSTRAINT_ID_CHAR_COUNT = 4 gives 20 bits of ID space. With ~50 constraints per chip the birthday-collision probability is negligible (~0.1%), so this is fine for the current scale.
  • bytes-to-hex's z-fill silently produces wrong output for strings longer than 2 chars, but since all inputs are single bytes (0–255) this can't trigger in practice.

@RobinJadoul RobinJadoul left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very brief look only, so far; haven't properly looked at the canonicalization/serialization of the constraints yet

Comment thread spec/src.typ
Comment thread spec/chip.typ Outdated
Comment thread spec/chip.typ Outdated
Comment thread spec/src.typ Outdated
Comment thread spec/chip.typ Outdated
Comment thread spec/chip.typ Outdated
Comment thread spec/src.typ
Comment thread spec/src.typ Outdated
Comment thread spec/src.typ Outdated
Comment thread spec/src.typ Outdated
Comment thread spec/src.typ Outdated
.map(cat => (str(cat): replace_variable_with_ID(c.at(cat))))
.sum(default: (:))

repr(id_tagged)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may depend a bit too much on typst internals for canonicalization
Maybe json.encode can work, but then there's still a potential ordering issue

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I reworked it. Let me know if you agree that it is better now.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's definitely better, the encoding for dicts feels a bit weird, but not necessarily problematically so (unless we suddenly could have vars named :)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the encoding. Among others, I included domain separators. Let me know what you think!

Comment thread spec/src.typ Outdated
Comment thread spec/src.typ Outdated
Comment thread spec/src.typ Outdated
Base automatically changed from spec/main to main June 8, 2026 20:38
@RobinJadoul RobinJadoul changed the base branch from main to spec/main June 9, 2026 12:33
@erik-3milabs erik-3milabs requested a review from RobinJadoul June 24, 2026 12:06
Comment thread spec/src/sha256round.toml
@@ -1,4 +1,5 @@
name = "SHA256ROUND"
code = "SHR"

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, maybe dangerous to confuse with shift right?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch. Let's make it... SHD

Suggested change
code = "SHR"
code = "SHD"

(please apply if you agree)

Comment thread spec/src.typ Outdated
}

let flattened_type = lower(flatten_vartype(var.type))
let input = (chip, group, str(idx), flattened_type).join("\x00")

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

variables. e.g. for #670 I had to re-order some (virtual) variables that depended on each other in SHA256. So I'm tempted to think that variables should maybe be tied to their name rather than position.
I think we discussed a while back that the De Bruijn index idea was originally mostly for iters and summations

But all that would need, I think, it just replacing the str(idx) by the name of the variable here, and the rest would keep working

Comment thread spec/src.typ Outdated
if type(lisp) == array {
"(" + lisp.map(replace_variable_with_ID).join(",") + ")"
} else {
variable_to_ID.at(str(lisp), default: str(lisp))

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure, I think it just felt a bit weird as a canonicalization step

Comment thread spec/src.typ Outdated
let LOG_ID_RADIX = 5
let RADIX = calc.pow(2, LOG_ID_RADIX)
assert(ID_CHAR_SET.len() == RADIX, message: "ID_CHAR_SET <> RADIX mismatch")
assert(ID_CHAR_LEN * LOG_ID_RADIX <= 64, message: "ID_CHAR_LEN and RADIX incompatible")

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assert(ID_CHAR_LEN * LOG_ID_RADIX <= 64, message: "ID_CHAR_LEN and RADIX incompatible")
assert(ID_CHAR_LEN * LOG_ID_RADIX <= 64, message: "digest size too small for configured ID entropy")

Comment thread spec/src.typ Outdated
}

// Recursively flatten nested object
let flatten(obj, terminator: str, braces: ("(", ")"), sep: ",") = {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps for naming

Suggested change
let flatten(obj, terminator: str, braces: ("(", ")"), sep: ",") = {
let flatten(obj, leaf_transform: str, braces: ("(", ")"), sep: ",") = {

Comment thread spec/src.typ Outdated
Comment on lines +239 to +240
.replace("\n", "")
.replace(" ", "")

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we expecting whitespace to be generated?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed it

Comment thread spec/src.typ Outdated
.map(cat => (str(cat): replace_variable_with_ID(c.at(cat))))
.sum(default: (:))

repr(id_tagged)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's definitely better, the encoding for dicts feels a bit weird, but not necessarily problematically so (unless we suddenly could have vars named :)

@erik-3milabs

Copy link
Copy Markdown
Collaborator Author

/ai-review

@github-actions

Copy link
Copy Markdown

Codex Code Review

Findings:

  • Low spec/src.typ:137: The new FNV-1a implementation does not compute FNV-1a correctly. hi is computed using lo after lo = lo * prime.at(0), so the cross term uses the multiplied low limb instead of the original low limb. If these IDs are meant to be FNV-derived/stable across implementations, keep the pre-multiply lo in a temp and use that for lo * prime_hi.

  • Low spec/chip.typ:307: lookup_group and the group parameter to tag are now dead. This also leaves existing prefix fields in spec/src/*.toml orphaned, for example spec/src/cpu.toml:288. Either wire prefix into the new tag format or remove the unused helper/argument and stale TOML fields.

No safety/security, VM semantics, GPU, or performance issues found in the PR diff.

Comment thread spec/src.typ Outdated
Comment on lines +137 to +138
lo = lo * prime.at(0)
hi = lo * prime.at(1) + hi * prime.at(0)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Low / correctness — this doesn't implement FNV-1a as documented. The high limb is computed from the already-multiplied lo.

For hash = hi·2³² + lo and prime = ph·2³² + pl, the high word should be L·ph + hi·pl where L = (old_lo ⊕ b). But lo is overwritten on the previous line (lo = lo * prime.at(0)), so this line computes (L·pl)·ph + hi·pl instead of L·ph + hi·pl.

Fix by computing hi before overwriting lo:

Suggested change
lo = lo * prime.at(0)
hi = lo * prime.at(1) + hi * prime.at(0)
// hash := hash × FNV_prime
let new_lo = lo * prime.at(0)
hi = lo * prime.at(1) + hi * prime.at(0)
lo = new_lo

Note the low limb lo (and thus the 24-bit constraint ID, which only reads the 3 low bytes) is unaffected — the low 32 bits of a modular product depend only on the low operands. The bug only corrupts the high limb, which feeds the full-8-byte variable IDs (bytes-to-hex(nchf(...))) and therefore propagates into constraint IDs. Output stays deterministic and content-derived, so the feature still works; it's just not the FNV-1a it claims to be.

@erik-3milabs

Copy link
Copy Markdown
Collaborator Author

@claude

@claude

claude Bot commented Jun 25, 2026

Copy link
Copy Markdown

Review summary

Reviewed the diff (a spec-only PR: Typst rendering changes + content-derived constraint IDs). No safety/security or VM/proving concerns — the ID scheme lives entirely in the Typst doc-rendering path and doesn't touch the prover (chip.py only stores the new code field, it doesn't compute IDs).

Findings

  • Low (correctness): FNV-1a doesn't match its documentation — the high limb is computed from the post-multiplication lo, so it isn't true FNV-1a. The low limb (and thus the 24-bit constraint ID) is correct, but the buggy high word propagates into variable IDs. Output stays deterministic and content-derived, so the feature works; flagged inline with a suggested fix.

  • Nit (consistency): code defaults differ between tools — chip.py falls back to "" while src.typ uses chip.at("code", default: chip.name). Harmless today since Python doesn't use code, but worth aligning.

Looks good

  • .sum(default: 0) / .sum(default: (:)) additions correctly guard empty-array sums.
  • Excluding desc/ref/constraint from the ID hash is sound: constraint is human-readable markup while the semantic poly field is retained, so IDs still reflect constraint meaning.
  • digest_to_id reads 3 bytes (24 bits) for a 20-bit ID — consistent with MIN_DIGEST_BYTES.

Nothing blocking.

@yetanotherco yetanotherco deleted a comment from github-actions Bot Jun 25, 2026
@erik-3milabs erik-3milabs requested a review from RobinJadoul June 26, 2026 07:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

spec Updates and improvements to the spec document

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants