fix: scan prefix-sum correctness, binding hazards, and GPU dispatch optimizations by LessUp · Pull Request #21 · AICL-Lab/webgpu-sorting

LessUp · 2026-07-01T10:54:36Z

See commit message for full details. Fixes critical scan bugs (half-element add_block_prefixes, binding hazard, 512-block limit) and optimizes GPU dispatch.

…dispatch

Critical correctness fixes:
- Fix add_block_prefixes shader only processing half the elements (1 per thread instead of 2), which broke radix sort for arrays > 8192 elements. Now matches blelloch_scan's 2-elements-per-thread layout.
- Fix scan_block_sums WebGPU binding hazard: the same blockSumsBuffer was bound as both read-only-storage (binding 0) and read_write storage (binding 1/2), which is a validation error. Each pipeline now has a dedicated bind group layout with only the bindings it uses.
- Fix scan_block_sums 512-block limit: for arrays > ~4M elements the single-workgroup block-sum scan silently skipped excess blocks. Replaced with a recursive multi-level scan that handles arbitrarily large inputs.

Architecture improvements:
- ScanModule: three dedicated bind group layouts (scanLayout, blockSumsScanLayout, addPrefixesLayout) instead of one shared layout, eliminating all read-only/read-write binding conflicts.

Performance optimizations:
- BitonicSorter: batch all bitonic passes into a single command encoder with copyBufferToBuffer for uniform updates, reducing queue submissions from 100+ to 1 for large arrays.
- RadixSorter: reuse zero-histogram buffer across passes instead of allocating a new Uint32Array per pass.
- Benchmark: preallocate GPU buffers per target size so iterations measure steady-state sort performance (buffer reuse), not allocation overhead.
- fillRandomUint32Array: fill in-place via subarray instead of allocating and copying per-chunk temporary arrays.

Generated with [Devin](https://devin.ai)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
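
For readers who want to see the idea behind the recursive multi-level scan and the add-prefixes step mentioned in the commit message, here is a CPU-only TypeScript model. It is a rough sketch, not the repository's shader or ScanModule code: `BLOCK_SIZE`, `scanBlock`, and `exclusiveScan` are hypothetical names, and the block size of 8 is chosen only to keep the example small.

```typescript
// CPU reference model of a recursive multi-level exclusive prefix sum.
// All names and the block size are illustrative assumptions, not taken
// from the webgpu-sorting source.
const BLOCK_SIZE = 8; // elements covered by one block in this model

// Exclusive scan of data[start, end) in place; returns the block total.
function scanBlock(data: Uint32Array, start: number, end: number): number {
  let running = 0;
  for (let i = start; i < end; i++) {
    const value = data[i];
    data[i] = running;
    running = (running + value) >>> 0; // wrap like 32-bit unsigned arithmetic
  }
  return running;
}

// Exclusive scan of the whole array, recursing on the per-block sums so
// that no level is limited to a fixed number of blocks.
function exclusiveScan(data: Uint32Array): void {
  const n = data.length;
  if (n === 0) return;

  // Level 1: scan each block independently and record its total.
  const numBlocks = Math.ceil(n / BLOCK_SIZE);
  const blockSums = new Uint32Array(numBlocks);
  for (let b = 0; b < numBlocks; b++) {
    const start = b * BLOCK_SIZE;
    blockSums[b] = scanBlock(data, start, Math.min(start + BLOCK_SIZE, n));
  }
  if (numBlocks === 1) return; // a single block needs no prefix fix-up

  // Level 2+: the block totals are scanned by the same routine.
  exclusiveScan(blockSums);

  // Add-prefixes step: every element of every block receives its block's
  // prefix. Covering only part of a block here leaves stale values behind.
  for (let b = 1; b < numBlocks; b++) {
    const start = b * BLOCK_SIZE;
    const end = Math.min(start + BLOCK_SIZE, n);
    for (let i = start; i < end; i++) {
      data[i] = (data[i] + blockSums[b]) >>> 0;
    }
  }
}
```

In this model the final loop plays the role of the add-prefixes pass: it has to visit the same element range that the block scan covered, which is the kind of mismatch the first correctness fix describes. The recursion on `blockSums` stands in for the multi-level scan that replaces a single fixed-size block-sum pass.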

LessUp merged commit fd98cb7 into master Jul 1, 2026
1 check failed

LessUp deleted the fix/scan-correctness-and-optimizations branch July 1, 2026 10:54
