fix(NODE-7606): restore serializer performance by removing generator-based iteration#890
fix(NODE-7606): restore serializer performance by removing generator-based iteration#890tadjik1 wants to merge 6 commits into
Conversation
…based iteration The iterative rewrite in 0231603 introduced a ~54% throughput regression on flat documents (flat_bson serialize: ~21 µs vs ~10 µs baseline). Root cause: serializeInto used a generator function (toKvPairs / yield*) to produce key-value pairs, driving the hot loop via Iterator.next(). For every field this paid: generator state-machine transitions, {value,done} heap allocations, and [key,value] pair arrays from Object.entries(). Fix: replace the generator with index-based iteration in SerializationFrame. makeFrame() pre-computes Object.keys() once per plain object and stores the array and a keyIndex counter on the frame. The hot loop reads keys[keyIndex++] and accesses the value directly via property lookup — no allocations per field. Arrays use keyIndex as the numeric index directly. Maps retain an explicit iterator since they lack numeric indexing. Cycle detection and toBSON() semantics are unchanged.
- Replace key.match(regexp) with regexp.test(key) — avoids allocating a match-result array on every key of every object - Guard ignoreKeys.has(key) with key[0] === '$' — all four ignored keys start with '$', so non-dollar keys skip the Set lookup entirely - Remove redundant typeof key === 'string' guard — keys are always strings - Cache value._bsontype in the BSON-type dispatch branch — eliminates 4-5 repeated property reads per BSON-typed value (Long, ObjectId, etc.)
The iterative deserializer rewrite in #886 left several costs that the recursive version did not have: - A per-field recompute of `isArray` (read currentFrame, branch, assign on every key) — replaced with a `currentIsArray` flag updated only when a frame is pushed or popped. - A `setValue` closure invoked once per field that re-derived the destination document from `currentFrame` on every call — replaced with a maintained `currentDest` and a module-level `assignValue`, so the common path is a direct property write with no closure call or branch. - A dead `globalUTFValidation` field written into every frame literal but never read — removed, shrinking each frame allocation (helps nested docs where frame churn dominates). `toPotentialDbRef` is also hoisted to module scope so it is not recreated per deserialize call. Behavior is unchanged; the proto-pollution guard, DBRef promotion, code-with-scope, raw, and array handling are all preserved.
Both the serializer and deserializer carried an explicit array as the frame stack (push/pop plus length/top bookkeeping). This replaces that array with a linked list threaded through a `prev` pointer on each frame: descending into a nested document points the new frame at its parent, and finishing a frame restores `currentFrame = frame.prev`. The deserializer's `nestedParsingStack` array is removed entirely, since it duplicated the `currentFrame` pointer. Suggested by @addaleax during the #886 review as a follow-up to try. Local interleaved A/B measurement (both builds in one process, min of 11 reps per shape) shows serialize consistently ~3% faster on flat/deep/full and deserialize neutral; the array-vs-linked difference is removed allocation and index bookkeeping per nested frame. Behavior is unchanged: deep-nesting, DBRef promotion, code-with-scope, raw, Map, and __proto__ handling all verified.
Commit a22387a dropped the `typeof key === 'string'` guard from the key validation block on the assumption that keys are always strings. That holds for plain objects and arrays, but Map keys can be non-string. Without the guard, a non-string Map key with checkKeys:true reached `key.includes('.')` and threw a TypeError from a different site than the baseline. Restoring the guard makes behavior byte-identical to the pre-regression implementation: non-string Map keys (unsupported by design, NODE-3027) fail in encodeUTF8Into exactly as before, regardless of checkKeys. The Set lookup optimization is unaffected — it remains gated by `key[0] === '$'`.
b6fb5f3 to
be49fe2
Compare
There was a problem hiding this comment.
Pull request overview
This PR restores BSON serializer performance by replacing generator-based key/value iteration with a lower-allocation frame iterator, and applies additional hot-path optimizations in both serialization and deserialization while preserving the iterative (non-recursive) parsing approach introduced in #886.
Changes:
- Replaced generator-based iteration (
toKvPairs) with amakeFrame()helper that precomputes keys once and advances with numeric indices; switched the frame stack from an array to a linked list (prev). - Reduced per-field overhead in the serializer (e.g.,
regexp.test, gated$ignoreKeys check, cached_bsontypelookup). - Reduced per-field overhead in the deserializer by maintaining
currentDest/currentIsArray, removing a dead frame field, and switching the parsing stack to a linked list.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| src/parser/serializer.ts | Removes generator/Object.entries iteration in favor of indexed frame iteration and linked-list frames; applies small per-field hot-path reductions. |
| src/parser/deserializer.ts | Switches nested frame stack to a linked list and reduces per-field work via maintained destination/array state and helper functions. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| throw new BSONVersionError(); | ||
| } else if (value._bsontype === 'ObjectId') { | ||
| } | ||
| const bsontype = value._bsontype; |
There was a problem hiding this comment.
| const bsontype = value._bsontype; | |
| const bsontype = value[bsonType]; |
Might be a really good opportunity for this drive-by cleanup
There was a problem hiding this comment.
Thanks! Replaced with const tag = ... to avoid naming problems.
There was a problem hiding this comment.
else if ('_bsontype' in object && typeof object._bsontype === 'string') {
throw new BSONError(`BSON types cannot be serialized as a document`);
We have this error checking code, I don't think we can/should change it but it's an interesting limitation that's now lifted 😅 we can support { "_bsontype" : "str" } documents for those who like to have fun with their data schema....
There was a problem hiding this comment.
It's not even just necessarily a limitation – I think it would be a reasonable invariant to expect that bson.serialize() never throws if its argument is a return value of bson.deserialize(). I'm pretty sure you this means that for example in the driver, right now there are documents that you can read from the database but not copy/replace back into it.
There was a problem hiding this comment.
Yes another driver could totally insert a document that looks like the example I gave and our driver couldn't round trip it. so, bug ticket?
Are there issues with backwards compat 🤔 like if we lift the prevention of this is there the chance that it could be more likely to accidentally insert a document of this shape when you meant to just have a BSON primitive... need to think on that.
There was a problem hiding this comment.
Yeah, seems like a pre-existing issue - it was there before the PR. Worth a ticket though, I'll file one, thanks!
https://jira.mongodb.org/browse/NODE-7608
Per @addaleax's review suggestion on #890, dispatch on the public `bsonType` symbol (`value[bsonType]`) instead of the `_bsontype` string in serializeInto and the serializeLong/serializeMinMax helpers, so the tag read goes through the documented interface. The local is renamed to `tag` so it doesn't collide with the imported `bsonType` symbol. The checks that guard against duck-typed or foreign objects keep reading the `_bsontype` string: the root guard, the plain-object check, and the "unrecognized _bsontype" catch-all all need to fire for an object that carries a string `_bsontype` (and possibly a forged version symbol) but no `bsonType` symbol. Routing those through the symbol would let such objects slip past, as the existing serializer tests confirm. Output is unchanged, verified byte-identical against the recursive baseline (33 832 differential checks).
Per @addaleax's review suggestion on #890, dispatch on the public `bsonType` symbol (`value[bsonType]`) instead of the `_bsontype` string in serializeInto and the serializeLong/serializeMinMax helpers, so the tag read goes through the documented interface. The local is renamed to `tag` so it doesn't collide with the imported `bsonType` symbol.
e0fea98 to
0954119
Compare
nbbeeken
left a comment
There was a problem hiding this comment.
great work! linked lists are under utilized nice to see it lending a hand :)))
PavelSafronov
left a comment
There was a problem hiding this comment.
Changes look good, approving. Thanks!
Description
Summary of Changes
#886 (NODE-5937) made
serializeIntoiterative and, along the way, started producing each frame's key/value pairs from a generator (toKvPairs). Generators aren't free: every.next()saves and restores execution state and hands back a fresh{value, done}object, andObject.entries()builds a[key, value]array for each field on top of that. On a flat document with 145 fields that's hundreds of throwaway allocations perserialize, and it cost about 54% onflat_bson._serialize_bson.This PR drops the generator.
makeFrame()callsObject.keys()once when the frame is created and the loop walks them by index. Maps keep anentries()iterator since they have no random access, and arrays index straight intokeyIndex.A few smaller changes in the same hot loops:
regexp.test(key)replaceskey.match(regexp), and theignoreKeysSet lookup only runs whenkey[0] === '$'.value._bsontypeis read once per BSON-typed value instead of several times down the dispatch chain.isArrayrecompute becomes a maintainedcurrentIsArray; thesetValueclosure becomes a top-levelassignValue(dest, …)withcurrentDesttracked next tocurrentFrameso the destination is never recomputed per field; and a deadglobalUTFValidationframe field is removed.prevpointer per frame) instead of a separate array, which drops the array push/pop per nested frame (suggested by @addaleax in the fix(NODE-5937): rewrite deserializeObject as iterative, not recursive #886 review).What is the motivation for this change?
flat_bson._serialize_bsonnormalized throughput dropped from ~216 to ~99 after #886 merged. The perf alerting context on theperfbuild variant caught it.Benchmarks
rhel90-dbx-perf-large, Node v22.11.0, 10 000 iterations, 1 000 warmup.normalized_throughput, higher is better.2ea8dbb)0231603)flat_bson._serialize_bsondeep_bson._serialize_bsonfull_bson._serialize_bsonflat_bson._deserialize_bsondeep_bson._deserialize_bsonfull_bson._deserialize_bsonSerialize is back above baseline on all three document shapes. Flat and full deserialize sit within noise of baseline.
https://spruce.corp.mongodb.com/task/node_bson_perf_run_spec_benchmarks_patch_c5d1dd89d933556a9c9880152156d15caa0f293b_6a282aa58fce9500076b6fbc_26_06_09_15_00_55/trend-charts?execution=0
The only metric still under baseline is
deep_bson._deserialize_bson(−7.0%), and that's structural rather than something this PR introduced. deep_bson is 63 documents nested six deep, so the iterative deserializer allocates a frame object for each nested document where the old recursive code just leaned on the call stack. Those allocations are most of the gap. This PR nudges the number up (63.87 to 64.19) and makes deep serialize a lot faster, but it doesn't close it. We can probably claw the rest back with lighter or pooled frames; I left that for a follow-up, since what remains is small and inside the benchmark's run-to-run noise.Driver benchmarks
I also ran the driver's spec benchmark suite end to end with
node-mongodb-nativepointed at this branch (theperformance-testsvariant), comparing against the bson^7.2.0it currently ships, which predates #886 and is the known-good recursive code. Nothing regressed on the BSON-sensitive benchmarks. Serialize came out at parity or faster. Deserialize was at parity apart from a ~6% dip on the 1M-tiny-document aggregate, which is the same per-document setup cost from #886 noted above and not new here. The large-document bulk-insert benchmark bounced between −22% and +5% across runs, with the slow slot moving between it and the single-document variant; a micro-benchmark and the benchmark's own ~8% mainline CV confirm that's noise from the 2.73 MB document, not the serializer.1 run
2 run
Double check the following
npm run check:lint)type(NODE-xxxx)[!]: descriptionfeat(NODE-1234)!: rewriting everything in coffeescript