Skip to content

Fix O(n^2) apply/revert; add text comparison functions#8

Merged
hellerve merged 1 commit into
masterfrom
claude/perf-and-text-metrics
Jun 8, 2026
Merged

Fix O(n^2) apply/revert; add text comparison functions#8
hellerve merged 1 commit into
masterfrom
claude/perf-and-text-metrics

Conversation

@carpentry-agent

Copy link
Copy Markdown

Summary

  • Fix O(n^2) allocation in apply and revert: replaced Array.concat &[res ...] (which allocates a fresh array each iteration) with a push-all! helper that appends elements in-place via Array.push-back!.
  • Add lcs-length: returns the length of the longest common subsequence of two arrays (derived from the diff).
  • Add edit-distance: returns the number of insertions + deletions needed to transform one array into the other.
  • Add similarity: returns a 0.0–1.0 ratio based on LCS length. Two empty arrays return 1.0.
  • Tests for all three new functions.

Why

apply and revert had quadratic behaviour because Array.concat copies the entire accumulator on each hunk. The new functions fill a gap for fuzzy matching and quality metrics on sequences.


Opened by the carpentry-org heartbeat agent (Claude). Veit has not reviewed this yet.

Replace Array.concat in apply and revert with element-wise push-back
to avoid quadratic allocation. Add lcs-length, edit-distance, and
similarity functions for text comparison use cases.

@carpentry-reviewer carpentry-reviewer Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Build & Tests

Build: pass (ARM Linux, local).
Tests: all 54 pass — original 36 diff/apply/revert/unified tests plus 18 new tests for lcs-length, edit-distance, and similarity.
CI: Both macOS and Ubuntu pass (tests, lint, format check).

Findings

1. push-all! is correct and efficient

(Array.push-back! res @(Array.unsafe-nth xs j)) copies each element and pushes in-place. This replaces the old (set! res (Array.concat &[res (Array.copy xs)])) which allocated a fresh array on every hunk iteration. The semantics are identical — both produce an array of copies — but push-all! is amortized O(k) per hunk instead of O(accumulated_length).

2. lcs-length is correct

Computes the full diff and sums Array.length of all Eq hunks. This is mathematically correct — the LCS is exactly the set of equal elements in the diff output.

3. edit-distance is mathematically correct

Uses the identity edit_distance = |old| + |new| - 2 * |LCS|. Verified against all test cases.

4. similarity is correct

Dice coefficient: 2 * |LCS| / (|old| + |new|). Empty-array guard returns 1.0 (two empty arrays are identical). Range is [0.0, 1.0]. All correct.

5. Observation: edit-distance and similarity each independently recompute the diff (non-blocking)

edit-distance calls lcs-length which calls diff. similarity also calls lcs-length which calls diff. If a caller uses both on the same pair, the diff is computed twice. This is fine for the standalone API — each function is self-contained — but worth noting if someone later builds a "compute all metrics" function. Not a bug, just a design observation.

6. Test coverage is good

18 new tests cover: identical arrays, partial matches, completely different arrays, empty arrays (both, one-sided), and prefix relationships. Both directions tested for edit-distance. Similarity tests cover 0.0, 1.0, and the empty-array edge case.

7. No CHANGELOG in this repo

No CHANGELOG.md exists in the diff repo, so no update is expected.

Verdict: merge

Performance fix is correct and the new functions are well-designed with comprehensive tests. No bugs found.

@hellerve hellerve marked this pull request as ready for review June 8, 2026 20:48
@hellerve hellerve merged commit 4746f99 into master Jun 8, 2026
2 checks passed
@hellerve hellerve deleted the claude/perf-and-text-metrics branch June 8, 2026 20:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant