fix: scan prefix-sum correctness, binding hazards, and GPU dispatch optimizations #21
Merged
…dispatch

Critical correctness fixes:
- Fix add_block_prefixes shader only processing half the elements (1 per thread instead of 2), which broke radix sort for arrays > 8192 elements. Now matches blelloch_scan's 2-elements-per-thread layout.
- Fix scan_block_sums WebGPU binding hazard: the same blockSumsBuffer was bound as both read-only-storage (binding 0) and read_write storage (binding 1/2), which is a validation error. Each pipeline now has a dedicated bind group layout with only the bindings it uses.
- Fix scan_block_sums 512-block limit: for arrays > ~4M elements the single-workgroup block-sum scan silently skipped excess blocks. Replaced with a recursive multi-level scan that handles arbitrarily large inputs.

Architecture improvements:
- ScanModule: three dedicated bind group layouts (scanLayout, blockSumsScanLayout, addPrefixesLayout) instead of one shared layout, eliminating all read-only/read-write binding conflicts.

Performance optimizations:
- BitonicSorter: batch all bitonic passes into a single command encoder with copyBufferToBuffer for uniform updates, reducing queue submissions from 100+ to 1 for large arrays.
- RadixSorter: reuse zero-histogram buffer across passes instead of allocating a new Uint32Array per pass.
- Benchmark: preallocate GPU buffers per target size so iterations measure steady-state sort performance (buffer reuse) not allocation overhead.
- fillRandomUint32Array: fill in-place via subarray instead of allocating and copying per-chunk temporary arrays.

Generated with [Devin](https://devin.ai)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
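For reviewers who want a CPU reference for the scan fixes, the following is a minimal sketch of a recursive multi-level exclusive scan with a 2-elements-per-thread prefix-add step. It is not the shader code from this PR. The names `THREADS`, `BLOCK`, `scanBlock`, `addBlockPrefixes`, and `exclusiveScan` are hypothetical, and the workgroup size of 256 threads (512 elements per block) is an assumption chosen to be consistent with the 512-block limit described above.

```typescript
// Assumed sizes (not taken from the PR's source): 256 threads per workgroup,
// 2 elements per thread, so one workgroup covers 512 elements.
const THREADS = 256;
const BLOCK = THREADS * 2;

// Exclusive scan of data[start, end) in place; returns the block total.
// Stands in for what one workgroup of a block-level scan shader produces.
function scanBlock(data: Uint32Array, start: number, end: number): number {
  let running = 0;
  for (let i = start; i < end; i++) {
    const v = data[i];
    data[i] = running;
    running = (running + v) >>> 0; // wrap like u32 arithmetic
  }
  return running;
}

// Adds each block's scanned prefix to every element of that block.
// Each simulated thread t touches elements 2t and 2t+1, so the whole
// block is covered; handling only one element per thread would leave
// half of the block without its prefix.
function addBlockPrefixes(data: Uint32Array, prefixes: Uint32Array): void {
  for (let block = 0; block < prefixes.length; block++) {
    for (let t = 0; t < THREADS; t++) {
      const i0 = block * BLOCK + 2 * t;
      const i1 = i0 + 1;
      if (i0 < data.length) data[i0] = (data[i0] + prefixes[block]) >>> 0;
      if (i1 < data.length) data[i1] = (data[i1] + prefixes[block]) >>> 0;
    }
  }
}

// Multi-level exclusive scan: scan each block, recursively scan the
// block sums, then add the scanned sums back. The recursion removes any
// fixed cap on the number of blocks.
function exclusiveScan(data: Uint32Array): void {
  const numBlocks = Math.ceil(data.length / BLOCK);
  if (numBlocks === 0) return;
  const blockSums = new Uint32Array(numBlocks);
  for (let b = 0; b < numBlocks; b++) {
    const end = Math.min((b + 1) * BLOCK, data.length);
    blockSums[b] = scanBlock(data, b * BLOCK, end);
  }
  if (numBlocks === 1) return; // a single block needs no prefix pass
  exclusiveScan(blockSums);
  addBlockPrefixes(data, blockSums);
}
```

With these assumed sizes, an input of 300,000 elements yields 586 block sums, which exceeds one 512-element block and therefore exercises a third scan level.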
See commit message for full details. Fixes critical scan bugs (half-element add_block_prefixes, binding hazard, 512-block limit) and optimizes GPU dispatch.
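To illustrate why batching matters for the bitonic sorter, here is a rough sketch that enumerates the passes of a bitonic sorting network and packs their parameters into one staging array. It is not the `BitonicSorter` code from this PR. The names `BitonicPass`, `bitonicPasses`, `packPassParams`, and `applyPass` are hypothetical, and the two-word `(k, j)` uniform layout is an assumption.

```typescript
// Hypothetical per-pass parameters: k is the size of the bitonic
// sequences being merged, j is the compare-exchange distance.
interface BitonicPass {
  k: number;
  j: number;
}

// Enumerates every pass of a bitonic sorting network for n elements.
// n must be a power of two; there are log2(n) * (log2(n) + 1) / 2 passes.
function bitonicPasses(n: number): BitonicPass[] {
  if (n < 2 || (n & (n - 1)) !== 0) {
    throw new Error("n must be a power of two >= 2");
  }
  const passes: BitonicPass[] = [];
  for (let k = 2; k <= n; k *= 2) {
    for (let j = k / 2; j >= 1; j /= 2) {
      passes.push({ k, j });
    }
  }
  return passes;
}

// Packs all pass parameters into one array (assumed layout: k then j).
// A GPU implementation could upload this once and, before each compute
// pass, record a copyBufferToBuffer from the matching offset into the
// uniform buffer, so all passes fit in a single command encoder.
function packPassParams(passes: BitonicPass[]): Uint32Array {
  const staging = new Uint32Array(passes.length * 2);
  passes.forEach((p, i) => {
    staging[2 * i] = p.k;
    staging[2 * i + 1] = p.j;
  });
  return staging;
}

// CPU stand-in for one compute pass: each index i is one thread.
function applyPass(data: Uint32Array, pass: BitonicPass): void {
  for (let i = 0; i < data.length; i++) {
    const partner = i ^ pass.j;
    if (partner <= i) continue; // each pair is handled once
    const ascending = (i & pass.k) === 0;
    const outOfOrder = ascending ? data[i] > data[partner] : data[i] < data[partner];
    if (outOfOrder) {
      const tmp = data[i];
      data[i] = data[partner];
      data[partner] = tmp;
    }
  }
}
```

For 2^20 elements this enumeration produces 210 passes, which is in line with the "100+" submissions mentioned in the commit message when each pass is submitted separately.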