Fix O(n^2) apply/revert; add text comparison functions#8
Conversation
Replace Array.concat in apply and revert with element-wise push-back to avoid quadratic allocation. Add lcs-length, edit-distance, and similarity functions for text comparison use cases.
There was a problem hiding this comment.
Build & Tests
Build: pass (ARM Linux, local).
Tests: all 54 pass — original 36 diff/apply/revert/unified tests plus 18 new tests for lcs-length, edit-distance, and similarity.
CI: Both macOS and Ubuntu pass (tests, lint, format check).
Findings
1. push-all! is correct and efficient
(Array.push-back! res @(Array.unsafe-nth xs j)) copies each element and pushes in-place. This replaces the old (set! res (Array.concat &[res (Array.copy xs)])) which allocated a fresh array on every hunk iteration. The semantics are identical — both produce an array of copies — but push-all! is amortized O(k) per hunk instead of O(accumulated_length).
2. lcs-length is correct
Computes the full diff and sums Array.length of all Eq hunks. This is mathematically correct — the LCS is exactly the set of equal elements in the diff output.
3. edit-distance is mathematically correct
Uses the identity edit_distance = |old| + |new| - 2 * |LCS|. Verified against all test cases.
4. similarity is correct
Dice coefficient: 2 * |LCS| / (|old| + |new|). Empty-array guard returns 1.0 (two empty arrays are identical). Range is [0.0, 1.0]. All correct.
5. Observation: edit-distance and similarity each independently recompute the diff (non-blocking)
edit-distance calls lcs-length which calls diff. similarity also calls lcs-length which calls diff. If a caller uses both on the same pair, the diff is computed twice. This is fine for the standalone API — each function is self-contained — but worth noting if someone later builds a "compute all metrics" function. Not a bug, just a design observation.
6. Test coverage is good
18 new tests cover: identical arrays, partial matches, completely different arrays, empty arrays (both, one-sided), and prefix relationships. Both directions tested for edit-distance. Similarity tests cover 0.0, 1.0, and the empty-array edge case.
7. No CHANGELOG in this repo
No CHANGELOG.md exists in the diff repo, so no update is expected.
Verdict: merge
Performance fix is correct and the new functions are well-designed with comprehensive tests. No bugs found.
Summary
applyandrevert: replacedArray.concat &[res ...](which allocates a fresh array each iteration) with apush-all!helper that appends elements in-place viaArray.push-back!.lcs-length: returns the length of the longest common subsequence of two arrays (derived from the diff).edit-distance: returns the number of insertions + deletions needed to transform one array into the other.similarity: returns a 0.0–1.0 ratio based on LCS length. Two empty arrays return 1.0.Why
applyandreverthad quadratic behaviour becauseArray.concatcopies the entire accumulator on each hunk. The new functions fill a gap for fuzzy matching and quality metrics on sequences.Opened by the carpentry-org heartbeat agent (Claude). Veit has not reviewed this yet.