Skip to content

Fix multi-column ColumnIndex rebuild ordering via per-column radix sort + merge#914

Open
yihozhang wants to merge 4 commits into
mainfrom
yihozhang-better-index-order-fix
Open

Fix multi-column ColumnIndex rebuild ordering via per-column radix sort + merge#914
yihozhang wants to merge 4 commits into
mainfrom
yihozhang-better-index-order-fix

Conversation

@yihozhang

Copy link
Copy Markdown
Collaborator

Problem

The multi-column ColumnIndex rebuild (the rebuild_index covering all of a table's rebuildable value columns) produced incorrect subsets. It collected (value, row_id) pairs column-major — all of column 0's pairs, then column 1's, etc. — and sorted them only by value. Within a single value's group the rows then came out as column 0's rows (RowId-ascending) followed by column 1's rows (RowId-ascending), i.e. not globally RowId-ascending.

That violates the invariant that each key's subset is built from rows in ascending RowId order (the contract add_row_sorted and the offset consumers rely on), so a value present in more than one column could get a subset with out-of-order offsets. The single-column path was unaffected because its scan order is already RowId-ascending.

Solution

Rebuild each key's subset from a properly (Value, RowId)-sorted, de-duplicated pair list:

  1. Collect each column into its own contiguous block (still RowId-ascending from scan order).
  2. Value-only radix sort each block. Because the sort is stable and each block arrives RowId-ascending, the block ends up ordered by (Value, RowId) — with no RowId passes.
  3. Balanced tournament two-way merge of the sorted blocks, dropping duplicate pairs (a value appearing in several of a row's columns). Dedup composes through the merge tree since merging two deduped sorted runs leaves any shared pair adjacent.

Single- and multi-column rebuilds now share one code path; the previous global-sort-plus-per-value-group-re-sort is gone.

Why this shape

  • No single-column overhead. Fixing ordering by adding RowId passes to the shared radix sort regressed the (hot) single-column path ~1–2.6%, because those passes are wasted work when pairs already arrive RowId-ordered. Sorting per-column and merging keeps the single-column case at its original cost.
  • O(n log k) merge. A left-fold that merges one growing accumulator against each block in turn is O(n*k) in the column count k (the accumulator is re-copied every step). The balanced tournament merge halves the run count each round → O(n log k), which matters for wide tables. For the common two-column case it is the same single merge.

Performance

Benchmarked with scripts/bench.py against the prior per-group-sort implementation across the standard suite (conv1d, luminal-llama, python_array, cykjson, eggcc-extraction). Net change is within the measurement noise floor (≈ ±1% run-to-run on these inputs), with no consistent regression — the single-column hot path is unchanged and eggcc/cykjson trend slightly faster. The O(n log k) merge improvement is asymptotic and applies to wide multi-column tables, which the current benchmark suite does not exercise.

Testing

  • New regression test multi_column_column_index_rebuild_orders_each_value_by_row asserts each value's subset matches RowId-ascending table scan order after rebuilds.
  • All core-relations unit tests and the full egglog integration suite (723 tests) pass; cargo fmt --check and clippy -D warnings clean.

🤖 Generated with Claude Code

saulshanabrook and others added 3 commits June 2, 2026 22:56
Replace the global value sort plus per-value-group re-sort in the
multi-column ColumnIndex rebuild with: a value-only radix sort of each
column's (value, row) block (each block arrives RowId-ascending, so the
stable sort yields (Value, RowId) order), followed by a merge of the
sorted blocks that drops duplicate pairs. The single- and multi-column
cases now share one code path, and no value group is ever re-sorted.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Merging the per-column sorted blocks by folding a single growing
accumulator against each block in turn is O(n*k) in the column count: the
accumulator is re-copied on every merge. Replace it with a balanced
(tournament) two-way merge -- merge adjacent runs pairwise, then merge the
results pairwise, halving the run count each round -- which is O(n log k).
For the common two-column case this is the same single merge as before;
the win shows up on wide tables with many covered columns.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@yihozhang yihozhang requested a review from a team as a code owner June 4, 2026 19:35
@yihozhang yihozhang requested review from ezrosent and removed request for a team June 4, 2026 19:35
@codspeed-hq

codspeed-hq Bot commented Jun 4, 2026

Copy link
Copy Markdown

Merging this PR will not alter performance

✅ 29 untouched benchmarks
⏩ 219 skipped benchmarks1


Comparing yihozhang-better-index-order-fix (a3ace91) with main (7d102db)

Open in CodSpeed

Footnotes

  1. 219 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@codecov-commenter

codecov-commenter commented Jun 4, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.59%. Comparing base (57108da) to head (a3ace91).
⚠️ Report is 68 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #914      +/-   ##
==========================================
- Coverage   87.17%   86.59%   -0.59%     
==========================================
  Files          88       89       +1     
  Lines       26106    26872     +766     
==========================================
+ Hits        22759    23269     +510     
- Misses       3347     3603     +256     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@oflatt

oflatt commented Jun 4, 2026

Copy link
Copy Markdown
Member

Does this fix #911 ? Or different issue?

@saulshanabrook

Copy link
Copy Markdown
Member

This I believe is instead of #899?

The tournament merge allocated a fresh Vec per pairwise merge and a
Vec<Vec<_>> per round. Instead, pack each round's merged runs contiguously
into a second buffer of the same size and ping-pong the two buffers,
reusing the moved-in `pairs` as one of them. The whole merge now uses a
single extra allocation regardless of column count (k-1 allocations -> 1),
with run boundaries tracked in a small inline array. merge2_into appends
with a dedup window scoped to its own output so adjacent runs packed in one
buffer are not merged together.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants