connection_pool: fix excessive idle CPU usage while waiting for requests #336

Open
ThCompiler wants to merge 2 commits into tarantool:master from ThCompiler:master
Conversation

@ThCompiler ThCompiler commented Apr 15, 2026

While using ConnectionPool, I found that its threads use 100% CPU when idle. The cause is that _request_process_loop spins in an infinite loop without blocking while it waits for the next request. To reduce the load, I suggest using the blocking wait for the next element that the queue package already implements.
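
The blocking wait can be illustrated without Tarantool. The following is only a minimal sketch, assuming a hypothetical `worker` function, a plain `queue.Queue` in place of `unit.input_queue`, and a `threading.Event` in place of `unit.request_processing_enabled`; it is not the pool's actual code.

```python
import queue
import threading
import time


def worker(tasks, results, stop, refresh_delay=0.05):
    """Hypothetical stand-in for the pool loop: block until a task arrives."""
    last_refresh = time.monotonic()
    refreshes = 0
    while not stop.is_set():
        try:
            # Sleeps inside get() instead of spinning; wakes on a new task
            # or after refresh_delay seconds, whichever comes first.
            task = tasks.get(timeout=refresh_delay)
        except queue.Empty:
            task = None

        if task is not None:
            results.put(task * 2)

        # Periodic housekeeping still runs while the queue stays empty.
        if time.monotonic() - last_refresh > refresh_delay:
            refreshes += 1
            last_refresh = time.monotonic()
    results.put(("refreshes", refreshes))


tasks, results, stop = queue.Queue(), queue.Queue(), threading.Event()
thread = threading.Thread(target=worker, args=(tasks, results, stop))
thread.start()
tasks.put(21)
answer = results.get(timeout=5)  # 42: the task is handled as soon as it arrives
time.sleep(0.2)                  # idle period: the worker wakes only on timeouts
stop.set()
thread.join(timeout=5)
label, refresh_count = results.get(timeout=5)
```

One consequence of this shape is that an idle worker notices the stop flag only when `get()` returns, so shutdown can take up to `refresh_delay` seconds.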

To check the solution, I implemented a simple test that calculates the n-th element of the Fibonacci sequence using a ConnectionPool running in the background:

import queue
import time

from tarantool import Connection
from tarantool.connection_pool import ConnectionPool as OriginalConnectionPool

def fib(n):
    a = 0
    b = 1
    for _ in range(n):
        a, b = b, a + b
    return a


class ConnectionPool(OriginalConnectionPool):
    # Override the standard pool loop, which loaded the processor at 100%
    # by spinning in a non-blocking infinite loop
    def _request_process_loop( # noqa: max-complexity=8
        self, key, unit, last_refresh,
    ):
        """
        Request process background loop for a pool server. Started in
        a separate thread, one thread per server.

        :param key: Result of
            :meth:`~tarantool.connection_pool._make_key`.
        :type key: :obj:`str`

        :param unit: Server metainfo.
        :type unit: :class:`~tarantool.connection_pool.PoolUnit`

        :param last_refresh: Time of last metainfo refresh.
        :type last_refresh: :obj:`float`
        """

        while unit.request_processing_enabled:
            try:
                task = unit.input_queue.get(timeout=self.refresh_delay)
            except queue.Empty:
                task = None

            if task:
                method = getattr(Connection, task.method_name)
                try:
                    resp = method(unit.conn, *task.args, **task.kwargs)
                except Exception as exc:
                    unit.output_queue.put(exc)
                else:
                    unit.output_queue.put(resp)

            now = time.time()
            if now - last_refresh > self.refresh_delay:
                self._refresh_state(key)
                last_refresh = time.time()

def test_bench_updated_pool(benchmark):
    pool = ConnectionPool(
        addrs=[
            {
                'host': '127.0.0.1',
                'port': 3301,
            },
            {
                'host': '127.0.0.1',
                'port': 3302,
            },
            {
                'host': '127.0.0.1',
                'port': 3303,
            },
        ],
        user='guest',
    )

    benchmark(fib, 1000000)

    pool.close()

def test_bench_original_pool(benchmark):
    pool = OriginalConnectionPool(
        addrs=[
            {
                'host': '127.0.0.1',
                'port': 3301,
            },
            {
                'host': '127.0.0.1',
                'port': 3302,
            },
            {
                'host': '127.0.0.1',
                'port': 3303,
            },
        ],
        user='guest',
    )

    benchmark(fib, 1000000)

    pool.close()

I ran the test with both the changed and the original ConnectionPool behavior and got the following results:
image

I also looked at the distribution of processor activity using py-spy for each of the tests:

  • for current behavior:
image
  • for new behavior:
image

As the py-spy snapshots show, after the change the processor is used almost entirely for calculating the Fibonacci sequence, while with the current behavior most of the time is spent in _request_process_loop.
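
The same effect can be checked roughly without py-spy by comparing process CPU time while each loop shape idles on an empty queue. This is a sketch under stated assumptions: `polling_loop`, `blocking_loop` and `idle_cpu_seconds` are hypothetical helpers written for illustration, not part of the tarantool package, and the exact numbers depend on the machine.

```python
import queue
import threading
import time


def polling_loop(tasks, stop):
    # Shape of the current behavior: re-check the queue without ever sleeping.
    while not stop.is_set():
        if not tasks.empty():
            tasks.get()


def blocking_loop(tasks, stop, timeout=0.05):
    # Shape of the proposed behavior: sleep inside get() until a task or timeout.
    while not stop.is_set():
        try:
            tasks.get(timeout=timeout)
        except queue.Empty:
            pass


def idle_cpu_seconds(loop, idle_for=0.3):
    """CPU time the whole process uses while `loop` waits on an empty queue."""
    tasks, stop = queue.Queue(), threading.Event()
    thread = threading.Thread(target=loop, args=(tasks, stop))
    started = time.process_time()
    thread.start()
    time.sleep(idle_for)
    stop.set()
    thread.join()
    return time.process_time() - started


polling_cpu = idle_cpu_seconds(polling_loop)
blocking_cpu = idle_cpu_seconds(blocking_loop)
print(f"polling: {polling_cpu:.3f}s CPU, blocking: {blocking_cpu:.3f}s CPU")
```

`time.process_time()` counts CPU time across all threads of the process and excludes time spent sleeping, so the polling variant should report a value close to the idle interval while the blocking variant stays near zero.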

@bigbes bigbes added the full-ci label Jun 22, 2026

@bigbes bigbes left a comment

Contributor

lgtm

Comment on lines -663 to +666
-            if not unit.input_queue.empty():
-                task = unit.input_queue.get()
+            try:
+                task = unit.input_queue.get(timeout=self.refresh_delay)
+            except queue.Empty:
+                task = None

Member

Seems like a nice fix, let's add a CHANGELOG entry about its effect.

Author

Added entry about changes to CHANGELOG.

         while unit.request_processing_enabled:
-            if not unit.input_queue.empty():
-                task = unit.input_queue.get()
+            try:

Member

The commit message should follow the code style, something like:

connection_pool: fix cpu overload if stale

Author

Updated commit message in accordance with code style.

@oleg-jukovec oleg-jukovec left a comment

Contributor

Please, update the commit message (see above) and add a new entry to the CHANGELOG.md.

@ThCompiler ThCompiler changed the title from "Fix cpu overload when working with connection_pool" to "connection_pool: fix excessive idle CPU usage while waiting for requests" Jun 22, 2026
@ThCompiler

Author

I had to update the build OS version for the plugin to 24.04, as it stopped supporting 20.04. I chose 24.04 to match a similar version update in #330.
