Skip to content

CLDSRV-918: raise SDK request timeout in functional test client#6187

Merged
bert-e merged 2 commits into
development/9.3from
bugfix/CLDSRV-918-raise-test-request-timeout
Jun 23, 2026
Merged

CLDSRV-918: raise SDK request timeout in functional test client#6187
bert-e merged 2 commits into
development/9.3from
bugfix/CLDSRV-918-raise-test-request-timeout

Conversation

@tcarmet

@tcarmet tcarmet commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

The AWS SDK v3 migration introduced a 5-second request timeout in the functional test client where SDK v2 previously ran with its 120-second default. Test operations that get no response bytes until the server finishes working - a 50MB server-side part copy, 1000-key batch deletes - regularly exceed 5 seconds of socket silence under CI load, producing Connection timed out after 5000 ms failures on more than a dozen unrelated branches over the past 60 days, including one merge-queue hit.

Raising the timeout to 30 seconds restores the pre-migration behavior within mocha's 40-second global cap: a genuine hang still fails the test, surfacing as the SDK's informative TimeoutError instead of a generic mocha timeout.

Why this also bumps arsenal (ARSN-594, scality/Arsenal#2640):

Raising the timeout uncovered a latent server-crash bug. In @smithy/node-http-handler, a requestTimeout below 6s is registered through request.setTimeout(), which Node cleans up when the request completes. At 6s and above, registration is deferred by 3s; for any request lasting longer than 3s the timer then attaches directly to the socket (socket.setTimeout()) and is never removed when the request finishes. With a keep-alive agent, the pooled socket keeps that stale timer and its handler later destroys the connection out from under whichever request is using it - so the 30s config turns every >3s test request into a delayed, random mid-stream disconnect.

A client vanishing mid-stream then hit a latent bug in arsenal's retrieveData: its once() guard around the iteration callback was created per data-layer callback invocation rather than per iteration, so a double callback during the teardown race ran the iteration callback twice, raising Callback was already called as an uncaughtException and killing the server. This crashed the file-ft-tests suites deterministically (3 out of 3 runs, 200+ cascading ECONNREFUSED failures each). The arsenal fix makes the guard iteration-scoped; a client disconnect can no longer take the server down.

Reference for the previous 120s behavior:

@bert-e

bert-e commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

Hello tcarmet,

My role is to assist you with the merge of this
pull request. Please type @bert-e help to get information
on this process, or consult the user documentation.

Available options
name description privileged authored
/after_pull_request Wait for the given pull request id to be merged before continuing with the current one.
/bypass_author_approval Bypass the pull request author's approval
/bypass_build_status Bypass the build and test status
/bypass_commit_size Bypass the check on the size of the changeset TBA
/bypass_incompatible_branch Bypass the check on the source branch prefix
/bypass_jira_check Bypass the Jira issue check
/bypass_peer_approval Bypass the pull request peers' approval
/bypass_leader_approval Bypass the pull request leaders' approval
/approve Instruct Bert-E that the author has approved the pull request. ✍️
/create_pull_requests Allow the creation of integration pull requests.
/create_integration_branches Allow the creation of integration branches.
/no_octopus Prevent Wall-E from doing any octopus merge and use multiple consecutive merge instead
/unanimity Change review acceptance criteria from one reviewer at least to all reviewers
/wait Instruct Bert-E not to run until further notice.
Available commands
name description privileged
/help Print Bert-E's manual in the pull request.
/status Print Bert-E's current status in the pull request TBA
/clear Remove all comments from Bert-E from the history TBA
/retry Re-start a fresh build TBA
/build Re-start a fresh build TBA
/force_reset Delete integration branches & pull requests, and restart merge process from the beginning.
/reset Try to remove integration branches unless there are commits on them which do not appear on the source branch.

Status report is not available.

@bert-e

bert-e commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

Incorrect fix version

The Fix Version/s in issue CLDSRV-918 contains:

  • None

Considering where you are trying to merge, I ignored possible hotfix versions and I expected to find:

  • 9.3.10

  • 9.4.0

Please check the Fix Version/s of CLDSRV-918, or the target
branch of this pull request.

@codecov

codecov Bot commented Jun 10, 2026

Copy link
Copy Markdown

❌ 2 Tests Failed:

Tests completed Failed Passed Skipped
8483 2 8481 0
View the top 2 failed test(s) by shortest run time
should log correct objectPutRetention operation with all required fields::Server Access Logs - File Output With default signature should log correct objectPutRetention operation with all required fields
Stack Traces | 0.04s run time
Expected 3 log entries, got 4

4 !== 3
should log correct objectGetRetention operation with all required fields::Server Access Logs - File Output With default signature should log correct objectGetRetention operation with all required fields
Stack Traces | 0.041s run time
Expected 4 log entries, got 5

5 !== 4
View the full list of 2 ❄️ flaky test(s)
"after each" hook for "should fail if trying to overwrite a delete marker"::MPU with x-scal-s3-version-id header With default signature "after each" hook for "should fail if trying to overwrite a delete marker"

Flake rate in main: 100.00% (Passed 0 times, Failed 77 times)

Stack Traces | 0.013s run time
We encountered an internal error. Please try again.
"after each" hook for "should fail if trying to overwrite a delete marker"::MPU with x-scal-s3-version-id header With v4 signature "after each" hook for "should fail if trying to overwrite a delete marker"

Flake rate in main: 100.00% (Passed 0 times, Failed 67 times)

Stack Traces | 0.016s run time
We encountered an internal error. Please try again.

To view more test analytics, go to the Test Analytics Dashboard
📋 Got 3 mins? Take this short survey to help us improve Test Analytics.

@claude

claude Bot commented Jun 10, 2026

Copy link
Copy Markdown

LGTM

Review by Claude Code

@bert-e

bert-e commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

Request integration branches

Waiting for integration branch creation to be requested by the user.

To request integration branches, please comment on this pull request with the following command:

/create_integration_branches

Alternatively, the /approve and /create_pull_requests commands will automatically
create the integration branches.

@claude

claude Bot commented Jun 10, 2026

Copy link
Copy Markdown

LGTM

Review by Claude Code

@tcarmet tcarmet force-pushed the bugfix/CLDSRV-918-raise-test-request-timeout branch from 28c72f9 to f908622 Compare June 11, 2026 00:18
@tcarmet tcarmet changed the title CLDSRV-918: raise SDK request timeout for heavy-operation test suites CLDSRV-918: raise SDK request timeout in functional test client Jun 11, 2026
@claude

claude Bot commented Jun 11, 2026

Copy link
Copy Markdown

LGTM

Review by Claude Code

@tcarmet tcarmet requested a review from a team June 11, 2026 17:33
@tcarmet tcarmet marked this pull request as ready for review June 11, 2026 17:37
@tcarmet tcarmet requested a review from a team June 11, 2026 17:38
@tcarmet tcarmet marked this pull request as draft June 11, 2026 18:48
Comment thread package.json Outdated
@claude

claude Bot commented Jun 11, 2026

Copy link
Copy Markdown
  • Arsenal dependency pinned to a commit hash instead of a tag (package.json). Tags 8.4.5 and 8.4.6 already exist — pin to a released tag instead.

    The timeout fix in config.js looks correct: 30s is well within mocha's 40s cap, the socketTimeout-to-requestTimeout rename aligns with the NodeHttpHandler API, and the constant avoids drift between the two handler configs.

    Review by Claude Code

Comment thread tests/functional/aws-node-sdk/test/support/config.js Outdated
Heavy server-side operations (multi-megabyte UploadPartCopy, 1000-key
batch deletes) send no response bytes until they complete and can
exceed the SDK request timeout under CI load, failing unrelated tests.
Raise it to 30s - ample headroom while staying under mocha's 40s global
timeout so a genuine hang still surfaces as the SDK TimeoutError.
@tcarmet tcarmet force-pushed the bugfix/CLDSRV-918-raise-test-request-timeout branch from 5a7ea24 to 362f2d9 Compare June 23, 2026 21:28
@bert-e

bert-e commented Jun 23, 2026

Copy link
Copy Markdown
Contributor

Incorrect fix version

The Fix Version/s in issue CLDSRV-918 contains:

  • 9.3.10

  • 9.4.0

Considering where you are trying to merge, I ignored possible hotfix versions and I expected to find:

  • 9.3.11

  • 9.4.0

Please check the Fix Version/s of CLDSRV-918, or the target
branch of this pull request.

@bert-e

bert-e commented Jun 23, 2026

Copy link
Copy Markdown
Contributor

Request integration branches

Waiting for integration branch creation to be requested by the user.

To request integration branches, please comment on this pull request with the following command:

/create_integration_branches

Alternatively, the /approve and /create_pull_requests commands will automatically
create the integration branches.

@tcarmet

tcarmet commented Jun 23, 2026

Copy link
Copy Markdown
Contributor Author

/create_integration_branches

@bert-e

bert-e commented Jun 23, 2026

Copy link
Copy Markdown
Contributor

Integration data created

I have created the integration data for the additional destination branches.

The following branches will NOT be impacted:

  • development/7.10
  • development/7.4
  • development/7.70
  • development/8.8
  • development/9.0
  • development/9.1
  • development/9.2

You can set option create_pull_requests if you need me to create
integration pull requests in addition to integration branches, with:

@bert-e create_pull_requests

The following options are set: create_integration_branches

@bert-e

bert-e commented Jun 23, 2026

Copy link
Copy Markdown
Contributor

Waiting for approval

The following approvals are needed before I can proceed with the merge:

  • the author

  • 2 peers

The following options are set: create_integration_branches

@bert-e

bert-e commented Jun 23, 2026

Copy link
Copy Markdown
Contributor

Conflict

A conflict has been raised during the creation of
integration branch w/9.4/bugfix/CLDSRV-918-raise-test-request-timeout with contents from bugfix/CLDSRV-918-raise-test-request-timeout
and development/9.4.

I have not created the integration branch.

Here are the steps to resolve this conflict:

 git fetch
 git checkout -B w/9.4/bugfix/CLDSRV-918-raise-test-request-timeout origin/development/9.4
 git merge origin/bugfix/CLDSRV-918-raise-test-request-timeout
 # <intense conflict resolution>
 git commit
 git push -u origin w/9.4/bugfix/CLDSRV-918-raise-test-request-timeout

The following options are set: create_integration_branches

@bert-e

bert-e commented Jun 23, 2026

Copy link
Copy Markdown
Contributor

Waiting for approval

The following approvals are needed before I can proceed with the merge:

  • the author

  • 2 peers

The following options are set: create_integration_branches

@tcarmet

tcarmet commented Jun 23, 2026

Copy link
Copy Markdown
Contributor Author

/approve

@tcarmet tcarmet closed this Jun 23, 2026
@tcarmet tcarmet reopened this Jun 23, 2026
@tcarmet tcarmet marked this pull request as ready for review June 23, 2026 21:36
@scality scality deleted a comment from bert-e Jun 23, 2026
@bert-e

bert-e commented Jun 23, 2026

Copy link
Copy Markdown
Contributor

Build failed

The build for commit did not succeed in branch bugfix/CLDSRV-918-raise-test-request-timeout

The following options are set: approve, create_integration_branches

@bert-e

bert-e commented Jun 23, 2026

Copy link
Copy Markdown
Contributor

Build failed

The build for commit did not succeed in branch w/9.4/bugfix/CLDSRV-918-raise-test-request-timeout

The following options are set: approve, create_integration_branches

@bert-e

bert-e commented Jun 23, 2026

Copy link
Copy Markdown
Contributor

I have successfully merged the changeset of this pull request
into targetted development branches:

  • ✔️ development/9.3

  • ✔️ development/9.4

The following branches have NOT changed:

  • development/7.10
  • development/7.4
  • development/7.70
  • development/8.8
  • development/9.0
  • development/9.1
  • development/9.2

This pull request did not target the following hotfix branch(es) so they
were left untouched:

  • hotfix/7.70.45
  • hotfix/8.8.45
  • hotfix/9.0.7
  • hotfix/7.4.1
  • hotfix/7.10.15
  • hotfix/7.4.0
  • hotfix/7.4.5
  • hotfix/7.8.0
  • hotfix/7.4.2
  • hotfix/7.10.3
  • hotfix/7.4.3
  • hotfix/7.10.27
  • hotfix/7.10.8
  • hotfix/7.10.30
  • hotfix/7.4.7
  • hotfix/9.0.32
  • hotfix/7.4.6
  • hotfix/7.70.51
  • hotfix/9.2.24
  • hotfix/7.10.2
  • hotfix/7.10.49
  • hotfix/7.70.11
  • hotfix/7.70.21
  • hotfix/7.4.9
  • hotfix/7.10.1
  • hotfix/7.9.0
  • hotfix/7.6.0
  • hotfix/6.4.7
  • hotfix/7.2.0
  • hotfix/7.7.0
  • hotfix/7.70.73
  • hotfix/7.10.28
  • hotfix/7.4.10
  • hotfix/7.4.4
  • hotfix/7.10.0
  • hotfix/7.10.4
  • hotfix/7.4.8

Please check the status of the associated issue CLDSRV-918.

Goodbye tcarmet.

The following options are set: approve, create_integration_branches

@bert-e bert-e merged commit 4182382 into development/9.3 Jun 23, 2026
82 of 88 checks passed
@bert-e bert-e deleted the bugfix/CLDSRV-918-raise-test-request-timeout branch June 23, 2026 22:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants