MSOP-7727 added Redisques batch queue support by ZhengXinCN · Pull Request #770 · swisspost/gateleen

ZhengXinCN · 2026-06-24T09:39:10Z

Added support for batch queues from Redisques.

Kusig · 2026-06-25T16:17:12Z

Please await review from @mcweba

mcweba · 2026-06-26T13:17:36Z

Could you please add some documentation on what this feature is about? How is the batchQueue property used, etc.? It is quite hard to review without knowing what it's about.

Have you taken a look at the gateleen-packing module? Isn't this what you are trying to achieve?

ZhengXinCN · 2026-06-29T02:03:41Z

@mcweba This looks like the packing module, but it is used for dequeuing. vertx-redisques normally sends queue items one by one, but it now supports sending many queue items in a batch. Yes, I will add more documentation.

codecov · 2026-06-29T02:50:36Z

Codecov Report

❌ Patch coverage is 65.78947% with 13 lines in your changes missing coverage. Please review.
✅ Project coverage is 49.67%. Comparing base (23051b2) to head (321e3d7).
⚠️ Report is 53 commits behind head on develop.

Files with missing lines	Patch %	Lines
...isspush/gateleen/queue/queuing/QueueProcessor.java	65.78%	11 Missing and 2 partials ⚠️

Additional details and impacted files

@@              Coverage Diff              @@
##             develop     #770      +/-   ##
=============================================
+ Coverage      49.35%   49.67%   +0.32%     
- Complexity      2016     2045      +29     
=============================================
  Files            244      244              
  Lines          12771    12893     +122     
  Branches        1368     1392      +24     
=============================================
+ Hits            6303     6405     +102     
- Misses          5882     5890       +8     
- Partials         586      598      +12


mcweba · 2026-06-29T14:34:20Z

AI helped me to list potential problems with this new feature:

Potential Problems with Batched Queue Feature

1. Silent Data Loss

// Only first item's routing info used - rest silently discarded
JsonObject baseJsonObject = queueItems.getJsonObject(0).copy();

No warning if items have different URIs, methods, or headers
No validation that items are compatible
Debugging nightmare: "Why did my DELETE go to /users/1 as a PUT?"
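
One way to surface this instead of silently dropping routing info would be to check, before merging, that every item shares the first item's method and URI. The following is only a minimal sketch under that assumption; `QueuedItem` and `BatchValidator` are hypothetical names that do not exist in Gateleen, and header comparison is left out for brevity.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical stand-in for a queued request; not a Gateleen class.
final class QueuedItem {
    final String method;
    final String uri;

    QueuedItem(String method, String uri) {
        this.method = method;
        this.uri = uri;
    }
}

// Hypothetical helper, shown only to illustrate the check.
final class BatchValidator {
    private BatchValidator() {}

    /**
     * Returns the index of the first item whose method or URI differs
     * from item 0, or empty if the batch is homogeneous (or empty).
     */
    static Optional<Integer> firstMismatch(List<QueuedItem> items) {
        if (items.isEmpty()) {
            return Optional.empty();
        }
        QueuedItem base = items.get(0);
        for (int i = 1; i < items.size(); i++) {
            QueuedItem item = items.get(i);
            if (!base.method.equals(item.method) || !base.uri.equals(item.uri)) {
                return Optional.of(i);
            }
        }
        return Optional.empty();
    }
}
```

A caller could log the mismatching index and refuse to merge the batch, rather than sending every payload with the first item's routing.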

2. Atomicity Issues

Scenarios

Merged request fails -> All N items go back to queue and retry together
Backend partially processes -> Gateleen sees success, but some items silently failed
One bad payload -> Entire batch rejected

No per-item success/failure tracking!

3. Error Attribution Impossible

If the backend returns 400 Bad Request:

Which of the N payloads was invalid?
Logs show one error, not which original queue item caused it
Retry will fail repeatedly with same bad item in the batch

4. Payload Format Assumptions

JsonObject payloadObject = new JsonObject(new String(Base64Unit.decodeBase64Safe(...)));
payloadArray.add(payloadObject);

Assumes all payloads are JSON objects - breaks if payload is a string, number, or array
Assumes payloads are valid JSON - no try-catch around decode
Binary payloads not supported

5. Expiry Check Flawed

if (ExpiryCheckHandler.isExpired(queuedRequest.getHeaders(), jsonRequest.getLong(QueueClient.QUEUE_TIMESTAMP))) {

Uses first item's timestamp for expiry check
If first item is fresh but others expired → expired items still get processed
If first item expired → fresh items get discarded too
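
A per-item expiry check could look roughly like the sketch below. It assumes each item carries its own enqueue timestamp and an optional time-to-live in seconds; `TimedItem` and `ExpiryFilter` are hypothetical names, and the real check in Gateleen goes through `ExpiryCheckHandler`, which is not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical item: enqueue timestamp plus optional time-to-live.
final class TimedItem {
    final long enqueuedAtMillis;
    final Long expireAfterSeconds; // null means "never expires"

    TimedItem(long enqueuedAtMillis, Long expireAfterSeconds) {
        this.enqueuedAtMillis = enqueuedAtMillis;
        this.expireAfterSeconds = expireAfterSeconds;
    }

    boolean isExpired(long nowMillis) {
        return expireAfterSeconds != null
                && nowMillis > enqueuedAtMillis + expireAfterSeconds * 1000L;
    }
}

// Hypothetical helper that evaluates expiry for every item, not only the first.
final class ExpiryFilter {
    private ExpiryFilter() {}

    /** Returns the items that are still fresh at nowMillis, in queue order. */
    static List<TimedItem> keepFresh(List<TimedItem> batch, long nowMillis) {
        List<TimedItem> fresh = new ArrayList<>();
        for (TimedItem item : batch) {
            if (!item.isExpired(nowMillis)) {
                fresh.add(item);
            }
        }
        return fresh;
    }
}
```

If `keepFresh` returns an empty list, the whole batch is expired and can be discarded; otherwise only the fresh items would be merged.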

6. Timeout Inheritance

String xTimeout = queuedRequest.getHeaders().get("x-timeout");

Uses first item's x-timeout header
Batch of 100 items might need more time than single item's timeout
No timeout scaling based on batch size
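
If timeout scaling were wanted, one possible policy is a base timeout plus a fixed increment per additional item, capped at an upper bound. This is purely an illustrative sketch: `BatchTimeout` and all of its parameters are assumptions, not anything the PR or Gateleen defines.

```java
// Hypothetical policy: base timeout + increment per extra item, with a cap.
final class BatchTimeout {
    private BatchTimeout() {}

    static long scaledTimeoutMillis(long baseMillis, long perItemMillis,
                                    int batchSize, long maxMillis) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        // A batch of one keeps the single-item timeout unchanged.
        long scaled = baseMillis + perItemMillis * (batchSize - 1L);
        return Math.min(scaled, maxMillis);
    }
}
```

With a 1000 ms base, 50 ms per extra item and a 10000 ms cap, a batch of 100 items would get 5950 ms.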

7. Circuit Breaker Distortion

performCircuitBreakerActions(queueName, queuedRequest, SUCCESS, state);

One success/failure recorded for N items
Statistics become meaningless (1 success could mean 1 or 1000 items)
Circuit may stay closed despite high individual failure rate

8. Monitoring Inaccuracy

if(monitoringHandler != null) {
    monitoringHandler.updateDequeue();  // Called once, not N times
}

Dequeue count underreported by factor of batch size
Throughput metrics completely wrong
Capacity planning based on bad data
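
To keep the dequeue count meaningful, the counter could be advanced by the batch size rather than by one. The sketch below uses a hypothetical `DequeueCounter`; it is not the `monitoringHandler` from the snippet above, whose API is not shown in this thread.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical counter that records one dequeue per item in a batch.
final class DequeueCounter {
    private final AtomicLong dequeued = new AtomicLong();

    /** Adds batchSize to the running total; a single item is a batch of 1. */
    void recordBatch(int batchSize) {
        if (batchSize < 0) {
            throw new IllegalArgumentException("batchSize must be >= 0");
        }
        dequeued.addAndGet(batchSize);
    }

    long total() {
        return dequeued.get();
    }
}
```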

9. Size/Limit Risks

No check on merged payload size
Could exceed HTTP body limits
Could cause OOM if batching thousands of items
Backend might reject oversized requests
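
A size guard could split the dequeued payloads into chunks that each stay under a byte limit before merging. The following is a rough sketch with a hypothetical `BatchChunker`; it counts only the UTF-8 size of the payloads and ignores the overhead of the surrounding JSON array.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not part of Gateleen or vertx-redisques.
final class BatchChunker {
    private BatchChunker() {}

    /**
     * Splits payloads into consecutive chunks whose summed UTF-8 size stays
     * within maxBytes. A single payload larger than maxBytes gets its own chunk.
     */
    static List<List<String>> chunkBySize(List<String> payloads, int maxBytes) {
        List<List<String>> chunks = new ArrayList<>();
        List<String> current = new ArrayList<>();
        int currentBytes = 0;
        for (String payload : payloads) {
            int size = payload.getBytes(StandardCharsets.UTF_8).length;
            if (!current.isEmpty() && currentBytes + size > maxBytes) {
                chunks.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(payload);
            currentBytes += size;
        }
        if (!current.isEmpty()) {
            chunks.add(current);
        }
        return chunks;
    }
}
```

Each chunk would then be sent as its own merged request, which keeps queue order but turns one batch into several.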

10. Tight Backend Coupling

Backend must accept JSON arrays
Backend must handle batch semantics (all-or-nothing vs partial)
Changes how backend reports errors
Not a drop-in optimization - requires backend changes

11. Queue Semantics Violation

Normal queue contract: "Each item processed independently"
Batched queue: Items are coupled - success/failure/retry happens together. This breaks assumptions code might have about queue behavior.

12. No Batch-Level Error Handling

} catch (Exception exception) {
    log.error("Could not build batched request: {} ...", exception.getMessage());
    message.reply(new JsonObject().put(STATUS, ERROR).put(MESSAGE, exception.getMessage()));
    return;
}

If one item's payload can't be decoded, entire batch fails
No option to skip bad items and process the rest
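
Skipping undecodable items could be sketched as below, using the JDK's `java.util.Base64` instead of the decoder from the PR. `DecodeResult` and `TolerantDecoder` are hypothetical names, and what should happen to the rejected items (log, dead-letter queue, retry) is left open; note that skipping changes the all-or-nothing retry behaviour of the batch.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;

// Hypothetical result holder: decoded payloads plus indexes of rejected items.
final class DecodeResult {
    final List<String> decoded = new ArrayList<>();
    final List<Integer> rejectedIndexes = new ArrayList<>();
}

// Hypothetical helper that decodes item by item instead of failing the batch.
final class TolerantDecoder {
    private TolerantDecoder() {}

    static DecodeResult decodeAll(List<String> base64Payloads) {
        DecodeResult result = new DecodeResult();
        for (int i = 0; i < base64Payloads.size(); i++) {
            try {
                byte[] raw = Base64.getDecoder().decode(base64Payloads.get(i));
                result.decoded.add(new String(raw, StandardCharsets.UTF_8));
            } catch (IllegalArgumentException e) {
                // Invalid Base64: remember the position, keep processing the rest.
                result.rejectedIndexes.add(i);
            }
        }
        return result;
    }
}
```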

Summary

The feature trades correctness and observability for throughput. It's only safe when:

All items are truly homogeneous (same destination, method, headers)
Backend is designed for bulk operations
You accept all-or-nothing batch semantics
Monitoring/alerting accounts for batch sizes

Honestly, I see more problems than benefits with this solution. Consuming single queue items over the EventBus should be fast. Why don't you just implement a custom queue processor (maybe in Houston/Eagle) to consume multiple queue items and process them as a batch?

mcweba

See last comment

Kusig · 2026-06-29T20:36:55Z

Thanks for the helpful comments. The technical issues must certainly be taken into account. We discussed different solutions for this dedicated use case in depth. For some cases it might really make sense, whereas for others it doesn't. And all the mentioned drawbacks must be properly described in the usage guide where no common solutions are possible.

With thousands of messages per second and multiple Pods all receiving them, receiving item by item simply no longer makes sense and causes a lot of unnecessary network traffic (yes, we hit limitations there). It also tops out at a maximum number of messages that is much lower than when sending in batches for this dedicated use case.

Just some first thoughts:

10/11 and others) The client decides to consume this way and therefore must be changed and prepared anyway. If the client does not request it, the way the system behaves now must not change at all.

1/2 and others) Logging of invalid items belongs to the consuming client. It is fully intended that all items in the batch are retried. If a client can't deal with that, it should simply not use it :-)

The problem with different methods or other differences must certainly be resolved and clarified.
If all messages in a batch are expired, the whole batch should expire and be discarded. Therefore, the last message is relevant, not the first one.
The client can set the timeout on its registered listeners and could therefore deal with that itself and tune it as needed, I think.

As for the proposed alternative, it simply makes no difference: most of the mentioned problems remain, and the message bus load remains high, which we also want to bring down with this for certain dedicated, well-matching use cases.

Proper documentation of the remaining drawbacks is mandatory, of course.

ZhengXinCN · 2026-06-30T03:21:56Z

@mcweba

1. In my use case there are only PUT requests. (Do I need to group them by method?)

2.1 That is by design.
2.2 That depends on the endpoint controller; returning any non-20X status rejects the whole batch.
2.3 I don't see any problem.

3. This can be improved on the backend side.
4. Single (normal) items are handled the same way; I don't see a difference.
5. I can add this.
6. I think this needs to be adjusted on enqueue, not on dequeue.
7. Can be improved.
8. Can be improved.
9. There is a max limit on the Redisques side; users need to take care of this (documentation missing).
10. This is already by design; the backend needs to change.
11. The related services that use batch queues need to be adjusted.
12. Can be improved at the backend.

mcweba · 2026-06-30T09:33:08Z

@ZhengXinCN

Your specific case may only have PUT requests. When we add such features to Gateleen, they should work for all cases. Generally speaking, when working on gateleen or vertx-redisques we/you really have to look at the whole picture. Changes may work for a specific case but break everything else. When this happens in production, we are screwed. For this point I would suggest allowing only queue items with the same HTTP method in a batch.
Timeout handling is not done on enqueue but on dequeue. When a queue item has been in the queue for a long time, its processing has to be cancelled once it has expired.

As @Kusig mentioned, we need really good documentation, including drawbacks and potential problems.
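
The suggestion above to allow only queue items with the same HTTP method in a batch could be approached by splitting the dequeued items into consecutive runs of equal method, which preserves queue order. This is only a sketch: `MethodRuns` is a hypothetical helper, and it works on bare method names rather than on real queue items.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not part of Gateleen or vertx-redisques.
final class MethodRuns {
    private MethodRuns() {}

    /** Splits methods into consecutive runs of equal value, preserving queue order. */
    static List<List<String>> split(List<String> methods) {
        List<List<String>> runs = new ArrayList<>();
        for (String method : methods) {
            // Start a new run whenever the method changes.
            if (runs.isEmpty() || !runs.get(runs.size() - 1).get(0).equals(method)) {
                runs.add(new ArrayList<>());
            }
            runs.get(runs.size() - 1).add(method);
        }
        return runs;
    }
}
```

For the sequence PUT, PUT, DELETE, PUT this yields three runs, so a stray DELETE would be sent on its own rather than merged into a PUT batch.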

MSOP-7727 added Redisques batch queue support

3cfc2d1

ZhengXinCN requested a review from srudin June 24, 2026 09:39

Kusig requested changes Jun 24, 2026

Comment thread gateleen-queue/src/main/java/org/swisspush/gateleen/queue/queuing/QueueProcessor.java Outdated

Comment thread gateleen-queue/src/main/java/org/swisspush/gateleen/queue/queuing/QueueProcessor.java Outdated

srudin self-assigned this Jun 25, 2026

MSOP-7727 fixed comments

d7cca21

ZhengXinCN requested a review from Kusig June 25, 2026 06:19

Kusig requested review from hiddenalpha and mcweba June 25, 2026 16:15

Kusig approved these changes Jun 25, 2026

Xin Zheng added 2 commits June 29, 2026 09:16

MSOP-7727 added documents

5858c93

MSOP-7727 try to stable tests

321e3d7

ZhengXinCN requested a review from Kusig June 29, 2026 03:03

mcweba requested changes Jun 29, 2026

ZhengXinCN wants to merge 4 commits into develop from MSOP-7727-handle-the-batch-queue-items

4 participants
