fix(thread_pool): re-read outstanding work under lock before stopping by sgerbino · Pull Request #339 · cppalliance/capy

sgerbino · 2026-06-25T19:31:27Z

testJoinDrainsWork could intermittently fail: join() returned with posted tasks still queued and never run.

on_work_finished() decremented outstanding_work_ lock-free and decided to stop from that decrement alone. A worker could observe the count transiently reach zero, get preempted before taking the mutex, and then latch stop_ after more work had been posted and join() had begun waiting; join() woke and abandoned the still-outstanding work. The same hole strands a task that suspends and is resumed after the count briefly hits zero, since its run queue is empty while it is in flight.

Keep outstanding_work_ atomic and lock-free on the start path, but have the worker that drives the count to zero re-read it under mutex_ before latching stop_. The re-read observes any on_work_started() that landed in the window after the lock-free decrement, so work started before the decision is never stranded; work whose count is raised after the decision is post-drain and abandoned as before. join() still blocks until the count reaches zero.
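
A minimal sketch of the re-read-under-lock pattern described above, assuming a much simplified tracker. The names `outstanding_work_`, `mutex_`, `stop_`, `on_work_started()`, `on_work_finished()` and `join()` come from the description; the class `work_tracker`, the condition variable `cv_` and the `stopped()` accessor are hypothetical and only there to make the example self-contained. It is not capy's actual implementation.

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>

// Hypothetical stand-in for the pool's bookkeeping (illustrative only).
class work_tracker
{
    std::atomic<long> outstanding_work_{0};
    std::mutex mutex_;
    std::condition_variable cv_; // assumption: join() waits on a condition variable
    bool stop_ = false;          // guarded by mutex_

public:
    // Start path stays lock-free: a single atomic increment.
    void on_work_started() noexcept
    {
        outstanding_work_.fetch_add(1);
    }

    void on_work_finished()
    {
        // Lock-free decrement. Only the caller that drove the count
        // from 1 to 0 goes on to consider stopping.
        if (outstanding_work_.fetch_sub(1) != 1)
            return;

        std::lock_guard<std::mutex> lock(mutex_);

        // Re-read under the lock: an on_work_started() may have landed
        // between the decrement above and acquiring mutex_.
        if (outstanding_work_.load() != 0)
            return;

        stop_ = true; // latch the decision
        cv_.notify_all();
    }

    // Blocks until stop_ has been latched.
    void join()
    {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return stop_; });
    }

    // Hypothetical accessor, used only to observe the state in examples.
    bool stopped()
    {
        std::lock_guard<std::mutex> lock(mutex_);
        return stop_;
    }
};
```

Before the fix, the decision to stop was based on the value returned by the decrement alone; in this sketch the second `load()` under `mutex_` is what allows a late `on_work_started()` to veto the stop.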

Also correct the class example: a bare post() does not register outstanding work, so join() does not wait for it. Use run_async, which holds a work guard for the operation, and document the contract.
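
The work-guard contract can be illustrated with a small RAII sketch. Everything here is hypothetical (`work_counter`, `work_guard`, and their members) and is not the signature of capy's `run_async` or of its guard type; it only shows the idea that an operation registers outstanding work for its whole lifetime, whereas a bare `post()` touches no such counter.

```cpp
#include <atomic>
#include <utility>

// Hypothetical counter standing in for the pool's outstanding-work count.
struct work_counter
{
    std::atomic<long> outstanding{0};
    void on_work_started() noexcept { ++outstanding; }
    void on_work_finished() noexcept { --outstanding; }
};

// Hypothetical RAII guard: registers work on construction and
// releases it on destruction, so the count covers the whole operation.
class work_guard
{
    work_counter* counter_;

public:
    explicit work_guard(work_counter& c) noexcept
        : counter_(&c)
    {
        counter_->on_work_started();
    }

    // Moving transfers responsibility without changing the count.
    work_guard(work_guard&& other) noexcept
        : counter_(std::exchange(other.counter_, nullptr))
    {
    }

    work_guard(work_guard const&) = delete;
    work_guard& operator=(work_guard const&) = delete;
    work_guard& operator=(work_guard&&) = delete;

    ~work_guard()
    {
        if (counter_)
            counter_->on_work_finished();
    }
};
```

Under this assumption, a `join()` that waits for the count to reach zero would wait for any operation still holding a guard, and would not wait for work submitted without one.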

codecov · 2026-06-25T19:37:37Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (develop-2@51116a0).

Additional details and impacted files

@@             Coverage Diff              @@
##             develop-2     #339   +/-   ##
============================================
  Coverage             ?   98.39%           
============================================
  Files                ?       83           
  Lines                ?     4234           
  Branches             ?        0           
============================================
  Hits                 ?     4166           
  Misses               ?       68           
  Partials             ?        0

| Flag | Coverage Δ |
|---|---|
| linux | `98.39% <ø> (?)` |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 51116a0...572394a.

cppalliance-bot · 2026-06-25T19:38:02Z

An automated preview of the documentation is available at https://339.capy.prtest3.cppalliance.org/index.html

If more commits are pushed to the pull request, the docs will rebuild at the same URL.

2026-06-25 19:38:01 UTC

sgerbino merged commit 690ab36 into cppalliance:develop-2 Jun 25, 2026
37 checks passed

sgerbino deleted the fix/thread-pool-join-race branch June 25, 2026 19:49
