Remove global rmem page slab #392

Open: ianks wants to merge 1 commit into msgpack:master from ianks:ractor-safe-rmem

@ianks

@ianks ianks commented Jun 10, 2026

The notable change here is the removal of the `msgpack_rmem_*` routines, which serve as a mechanism for efficiently providing chunks of memory for decoding work.

Why is it not ractor safe?

The old page-recycling slab is a process-global msgpack_rmem_t mutated through an unsynchronized bitmask, so parallel Ractors would race on it.
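
To make the race concrete, here is a minimal sketch of a bitmask-tracked page slab. It is not the actual `msgpack_rmem_t` code: the `demo_*` names, the page size, and the page count are all hypothetical, chosen only to illustrate the pattern described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sizes, for illustration only. */
#define DEMO_PAGE_SIZE  4096u
#define DEMO_PAGE_COUNT 32u

/* Hypothetical stand-in for a process-global page slab. */
typedef struct {
    uint32_t free_mask; /* bit i set => page i is free */
    unsigned char pages[DEMO_PAGE_COUNT][DEMO_PAGE_SIZE];
} demo_slab_t;

/* One shared instance, visible to every thread in the process. */
static demo_slab_t g_demo_slab = { 0xffffffffu, {{0}} };

/* Unsynchronized allocation: the read at (1) and the write at (3) are
 * separate steps, so two threads can both observe the same free bit and
 * both return the same page. */
static void *demo_slab_alloc(demo_slab_t *slab)
{
    uint32_t mask = slab->free_mask;          /* (1) read the mask */
    if (mask == 0) {
        return NULL;                          /* slab exhausted */
    }
    unsigned i = 0;
    while (((mask >> i) & 1u) == 0) {         /* (2) find lowest free page */
        i++;
    }
    slab->free_mask = mask & ~(1u << i);      /* (3) write the mask back */
    return slab->pages[i];
}

/* Unsynchronized release: a plain read-modify-write on the same mask. */
static void demo_slab_free(demo_slab_t *slab, void *page)
{
    ptrdiff_t off = (unsigned char *)page - &slab->pages[0][0];
    assert(off >= 0 && (size_t)off < sizeof slab->pages);
    assert((size_t)off % DEMO_PAGE_SIZE == 0);
    slab->free_mask |= 1u << ((size_t)off / DEMO_PAGE_SIZE);
}
```

Used from a single thread this behaves correctly. The problem only appears when two threads interleave between steps (1) and (3), which is the situation parallel Ractors would create.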

How did you address this?

Drop the global slab entirely and use plain xmalloc/xfree. Modern arena-based mallocs (e.g. jemalloc) are good at recycling and avoiding thread contention, so maintaining a custom slab allocator is not worth it.
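
As a rough sketch of what the replacement amounts to, the page helpers reduce to thin wrappers over the system allocator with no shared state. The `demo_*` names and the page size below are hypothetical, and `malloc`/`free` stand in for `xmalloc`/`xfree` so the snippet compiles without `ruby.h`.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical page size, for illustration only. */
#define DEMO_RMEM_PAGE_SIZE 4096u

/* Inside the extension this would call xmalloc; malloc is a stand-in here.
 * No global bookkeeping is touched, so there is nothing to race on. */
static void *demo_page_alloc(void)
{
    void *page = malloc(DEMO_RMEM_PAGE_SIZE);
    /* Assumption: xmalloc raises NoMemoryError rather than returning NULL,
     * so the real code would not need this check. */
    assert(page != NULL);
    return page;
}

/* Inside the extension this would call xfree; free is a stand-in here. */
static void demo_page_free(void *page)
{
    free(page);
}
```

Recycling of freed pages is then left to the allocator's own per-thread or per-arena caches instead of a hand-maintained bitmask.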

Did you try alternatives?

Yes:

  • Per-instance slab on the factory - turned out to be slightly slower, and more memory intensive

No:

  • Lock-free concurrent linked list - too much work for something that may not show much improvement over malloc(3)

Perf (local HTTP requests)

⚠️ TAKE THESE WITH A BIG GRAIN OF SALT, THE ERROR BARS ARE BIG
⚠️ THE GOAL OF THIS PR IS RACTOR COMPATIBILITY NOT PERF

Single-threaded, jemalloc 5.3, decode-heavy workload

On a realistic HTTP request benchmark, bare xmalloc is maybe a touch faster, but the difference is mostly noise:

| warm, paired (no-slab this PR vs slab, not ractor-safe) | Δ | 95% CI |
| --- | --- | --- |
| wall time | −0.1% | −0.8% to +4.9% |
| CPU | −0.3% | −1.9% to +3.9% |
| allocations | identical | |

RSS Impact

Memory usage seems fine as well, with the bare-xmalloc (this PR) having a higher peak as decay time increases (this is expected, and not a problem).

| sustained decode | slab (today) | xmalloc (this PR) |
| --- | --- | --- |
| dirty_decay_ms:10000 (default) | 43 MiB | 102 MiB |
| dirty_decay_ms:1000 | 33 MiB | 34 MiB |
| peak (live working set) | ~113 MiB | ~114 MiB |

Microbenchmarks (ruby --yjit)

Generated by /tmp/msgpack_format_yjit_pr_body.py from raw benchmark/ips output; table values were parsed, not hand-entered.

  • Ruby: ruby 4.0.4 (2026-05-12 revision b89eb1bcbf) +YJIT +PRISM [arm64-darwin25]
  • Command per ref: bundle exec rake compile && bundle exec ruby --yjit -Ilib -Iext <generated bench>
  • Benchmark config: benchmark/ips warmup 15s, measurement 60s per case
  • Started: 2026-06-15T22:14:36Z; finished: 2026-06-15T22:29:47Z
  • origin/master: 09c914d
  • PR branch: b82fefc

| benchmark | origin/master | PR branch | delta |
| --- | --- | --- | --- |
| pack-plain | 5.729M i/s (±2.7%) | 5.349M i/s (±1.3%) | -6.6% |
| pack-structured | 2.873M i/s (±2.8%) | 2.787M i/s (±1.2%) | -3.0% |
| pack-extended | 2.077M i/s (±1.8%) | 1.913M i/s (±2.1%) | -7.9% |
| unpack-plain | 4.809M i/s (±1.1%) | 4.553M i/s (±1.4%) | -5.3% |
| unpack-structured | 1.100M i/s (±1.6%) | 1.058M i/s (±1.2%) | -3.8% |
| unpack-extended | 1.523M i/s (±3.2%) | 1.462M i/s (±2.6%) | -4.0% |

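
The delta column appears to be the relative change in iterations per second from origin/master to the PR branch. The script that generated the table is not shown, so the helper below is only an assumed reconstruction of that arithmetic, with a hypothetical function name.

```c
#include <assert.h>

/* Relative change of `candidate` versus `baseline`, in percent.
 * Negative means the candidate ran fewer iterations per second. */
static double relative_delta_percent(double baseline, double candidate)
{
    assert(baseline > 0.0);
    return (candidate - baseline) / baseline * 100.0;
}
```

For pack-plain, `relative_delta_percent(5.729, 5.349)` gives roughly -6.6, matching the table.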
Raw benchmark output

origin/master

Warming up --------------------------------------
          pack-plain   500.901k i/100ms
     pack-structured   274.579k i/100ms
       pack-extended   201.774k i/100ms
        unpack-plain   437.897k i/100ms
   unpack-structured   108.509k i/100ms
     unpack-extended   149.952k i/100ms
Calculating -------------------------------------
          pack-plain      5.729M (± 2.7%) i/s -    343.618M in  60.030490s
     pack-structured      2.873M (± 2.8%) i/s -    172.436M in  60.066053s
       pack-extended      2.077M (± 1.8%) i/s -    124.696M in  60.046588s
        unpack-plain      4.809M (± 1.1%) i/s -    288.574M in  60.010534s
   unpack-structured      1.100M (± 1.6%) i/s -     66.082M in  60.098069s
     unpack-extended      1.523M (± 3.2%) i/s -     91.321M in  60.032914s

PR branch

Warming up --------------------------------------
          pack-plain   419.123k i/100ms
     pack-structured   261.892k i/100ms
       pack-extended   186.273k i/100ms
        unpack-plain   422.506k i/100ms
   unpack-structured   103.562k i/100ms
     unpack-extended   144.592k i/100ms
Calculating -------------------------------------
          pack-plain      5.349M (± 1.3%) i/s -    321.048M in  60.029030s
     pack-structured      2.787M (± 1.2%) i/s -    167.349M in  60.053708s
       pack-extended      1.913M (± 2.1%) i/s -    114.744M in  60.000081s
        unpack-plain      4.553M (± 1.4%) i/s -    273.361M in  60.055928s
   unpack-structured      1.058M (± 1.2%) i/s -     63.484M in  60.031311s
     unpack-extended      1.462M (± 2.6%) i/s -     87.767M in  60.056025s

@ianks ianks force-pushed the ractor-safe-rmem branch from cc53acc to d1c9fce on June 10, 2026 20:43
@ianks ianks changed the title from "Make the C extension Ractor-safe" to "Make the C extension Ractor-safe by removing msgpack_rmem_* slab allocator" on Jun 10, 2026
@ianks ianks marked this pull request as ready for review on June 10, 2026 22:06
@ianks ianks force-pushed the ractor-safe-rmem branch 3 times, most recently from bb0785b to b82fefc on June 11, 2026 02:46
@byroot

byroot commented Jun 13, 2026

Member

So the memory arena is something I've always been dubious of the benefit of, so I wouldn't mind getting rid of it.

That being said, MessagePack needs way more changes than this to be usable in Ractors.

#390 is on my TODO list, I'll come around to it eventually.

@ianks ianks force-pushed the ractor-safe-rmem branch from b82fefc to 335b893 on June 15, 2026 22:53
The page-recycling slab was a process-global msgpack_rmem_t mutated through an unsynchronized bitmask, so concurrent packing or unpacking could race on it.

Drop the slab and serve rmem pages from xmalloc/xfree instead. Modern arena-based mallocs are good at recycling these allocations without maintaining process-global mutable state in msgpack-ruby.
@ianks

ianks commented Jun 15, 2026

Author

@byroot Added micro-benchmarks to the PR body. On an M4 Pro with jemalloc, it's slightly faster to just use plain xmalloc 🤷🏻

Also, I've removed all Ractor references from this PR, so it should be independently mergeable.

Hope you are well!

@ianks ianks force-pushed the ractor-safe-rmem branch from 335b893 to b0bc81c on June 15, 2026 22:55
@ianks ianks changed the title from "Make the C extension Ractor-safe by removing msgpack_rmem_* slab allocator" to "Remove global rmem page slab" on Jun 15, 2026
@ianks

ianks commented Jun 15, 2026

Author

> That being said, MessagePack needs way more changes than this to be usable in Ractors.

After looking through the C code a bit more, I didn't find any remaining global mutable state that would make rb_ext_ractor_safe(true) obviously unsafe. Curious if you know of anything?

The non-shareable MessagePack::DefaultFactory is a separate ergonomics concern IME, which can be addressed in another PR

I'm tempted to add rb_ext_ractor_safe(true) back if you agree the rest of the C code is Ractor-safe.
