Skip to content

MSOP-7727 added Redisques batch queue support#770

Open
ZhengXinCN wants to merge 4 commits into
developfrom
MSOP-7727-handle-the-batch-queue-items
Open

MSOP-7727 added Redisques batch queue support#770
ZhengXinCN wants to merge 4 commits into
developfrom
MSOP-7727-handle-the-batch-queue-items

Conversation

@ZhengXinCN

Copy link
Copy Markdown
Collaborator

added supports of batch queue from Redisques

@ZhengXinCN ZhengXinCN requested a review from srudin June 24, 2026 09:39
@srudin srudin self-assigned this Jun 25, 2026
@ZhengXinCN ZhengXinCN requested a review from Kusig June 25, 2026 06:19
@Kusig Kusig requested review from hiddenalpha and mcweba June 25, 2026 16:15
@Kusig

Kusig commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

@mcweba

mcweba commented Jun 26, 2026

Copy link
Copy Markdown
Collaborator

Could you please add some documentation what this feature is about? How is the batchQueue property used, etc. It is quite hard to make a review when not knowing what's it about.

Have you took a look at the gateleen-packing module? Isn't this what you are trying to achieve?

@ZhengXinCN

Copy link
Copy Markdown
Collaborator Author

@mcweba This looks like packing module, but it use for dequeue, vertx-redisques normally send queue items one by one, but now it supports send many queue items in a batch. yes I will add more documents

@codecov

codecov Bot commented Jun 29, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 65.78947% with 13 lines in your changes missing coverage. Please review.
✅ Project coverage is 49.67%. Comparing base (23051b2) to head (321e3d7).
⚠️ Report is 53 commits behind head on develop.

Files with missing lines Patch % Lines
...isspush/gateleen/queue/queuing/QueueProcessor.java 65.78% 11 Missing and 2 partials ⚠️
Additional details and impacted files
@@              Coverage Diff              @@
##             develop     #770      +/-   ##
=============================================
+ Coverage      49.35%   49.67%   +0.32%     
- Complexity      2016     2045      +29     
=============================================
  Files            244      244              
  Lines          12771    12893     +122     
  Branches        1368     1392      +24     
=============================================
+ Hits            6303     6405     +102     
- Misses          5882     5890       +8     
- Partials         586      598      +12     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@ZhengXinCN ZhengXinCN requested a review from Kusig June 29, 2026 03:03
@mcweba

mcweba commented Jun 29, 2026

Copy link
Copy Markdown
Collaborator

AI helped me to list potential problems with this new feature:

Potential Problems with Batched Queue Feature

1. Silent Data Loss

// Only first item's routing info used - rest silently discarded
JsonObject baseJsonObject = queueItems.getJsonObject(0).copy();

  • No warning if items have different URIs, methods, or headers
  • No validation that items are compatible
  • Debugging nightmare: "Why did my DELETE go to /users/1 as a PUT?"

2. Atomicity Issues

Scenarios

  • Merged request fails -> All N items go back to queue and retry together
  • Backend partially processes -> Gateleen sees success, but some items silently failed
  • One bad payload -> Entire batch rejected

No per-item success/failure tracking!

3. Error Attribution Impossible

If the backend returns 400 Bad Request:

  • Which of the N payloads was invalid?
  • Logs show one error, not which original queue item caused it
  • Retry will fail repeatedly with same bad item in the batch

4. Payload Format Assumptions

JsonObject payloadObject = new JsonObject(new String(Base64Unit.decodeBase64Safe(...)));
payloadArray.add(payloadObject);

  • Assumes all payloads are JSON objects - breaks if payload is a string, number, or array
  • Assumes payloads are valid JSON - no try-catch around decode
  • Binary payloads not supported

5. Expiry Check Flawed

if (ExpiryCheckHandler.isExpired(queuedRequest.getHeaders(), jsonRequest.getLong(QueueClient.QUEUE_TIMESTAMP))) {

  • Uses first item's timestamp for expiry check
  • If first item is fresh but others expired → expired items still get processed
  • If first item expired → fresh items get discarded too

6. Timeout Inheritance

String xTimeout = queuedRequest.getHeaders().get("x-timeout");

  • Uses first item's x-timeout header
  • Batch of 100 items might need more time than single item's timeout
  • No timeout scaling based on batch size

7. Circuit Breaker Distortion

performCircuitBreakerActions(queueName, queuedRequest, SUCCESS, state);

  • One success/failure recorded for N items
  • Statistics become meaningless (1 success could mean 1 or 1000 items)
  • Circuit may stay closed despite high individual failure rate

8. Monitoring Inaccuracy

if(monitoringHandler != null) {
    monitoringHandler.updateDequeue();  // Called once, not N times
}
  • Dequeue count underreported by factor of batch size
  • Throughput metrics completely wrong
  • Capacity planning based on bad data

9. Size/Limit Risks

  • No check on merged payload size
  • Could exceed HTTP body limits
  • Could cause OOM if batching thousands of items
  • Backend might reject oversized requests

10. Tight Backend Coupling

  • Backend must accept JSON arrays
  • Backend must handle batch semantics (all-or-nothing vs partial)
  • Changes how backend reports errors
  • Not a drop-in optimization - requires backend changes

11. Queue Semantics Violation

Normal queue contract: "Each item processed independently"
Batched queue: Items are coupled - success/failure/retry happens together. This breaks assumptions code might have about queue behavior.

12. No Batch-Level Error Handling

} catch (Exception exception) {
    log.error("Could not build batched request: {} ...", exception.getMessage());
    message.reply(new JsonObject().put(STATUS, ERROR).put(MESSAGE, exception.getMessage()));
    return;
}
  • If one item's payload can't be decoded, entire batch fails
  • No option to skip bad items and process the rest

Summary

The feature trades correctness and observability for throughput. It's only safe when:

  • All items are truly homogeneous (same destination, method, headers)
  • Backend is designed for bulk operations
  • You accept all-or-nothing batch semantics
  • Monitoring/alerting accounts for batch sizes

Honestly, I see more problems than benefits with this solution. Consuming single queue items over the EventBus should be fast. Why don't you just implement a custom queue processor (maybe in Houston/Eagle) to consume multiple queue items and process them as a batch?

@mcweba mcweba left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See last comment

@Kusig

Kusig commented Jun 29, 2026

Copy link
Copy Markdown
Contributor

Thanks for helpful comments. For sure the technical issues must be taken in account. We discussed in deep about different solutions for this dedicated use case. For some cases it might really make sense where as for others it doesn't. And all the mentioned drawbacks must be properly described in the usage guide where no common solutions are possible.

In the case of about 1000's of messages per second and multiple Pods all receiving them, the receiption by item makes simply no more sense and causes a lot of non necessary network traffic (yes, we hit limitations there). It simply as well get's to a maximum number of messages which is much lower that when sending in batches in dedicated use case.

Just some first thoughts about:

10/11 and others) The client decides to consume this way and therefore must be changed and prepared anyway. If the client does not request it, it must not change at all the way the system behaves now.

1/2/ and others) Logging of invalid item belongs to the consuming client. It is totally foreseen that all items in the batch are retried. If a client can't deal with that, it should simply not use it :-)

  1. For sure the problem with different methods or other differences must be resolved and clarified.

  2. If all messages in a batch are expired, the batch should expire at all and be discarded. Therefore, the last message is relevant, not the first one.

  3. The client can set the timeout on its registered listeners and therefore could deal perfectly with that itself and tune it as needed I think.

About the proposed alternative, this simply makes no difference, most of the mentioned problems remain and the message bus load remains high which we wan't to get down as well with this for certain dedicated well matching use cases.

Proper documentation of the remaining drawbacks is mandatory of course.

@ZhengXinCN

Copy link
Copy Markdown
Collaborator Author

@mcweba

  1. In the use case I have for this case, there only PUT. (Do I need group them by method?)

2.1 that is by design
2.2 that is depends on endpoint controller, just return non 20X will reject all
2.3 I didn't see any problem

  1. this can be improved at backend side

  2. for single (normal) also do the same, I didn't see different

  3. I can add this

  4. I think this need adjust when enqueue, not dequeue

  5. Can be improved

  6. Can be improved

  7. there have a max limit at Redisques side, user need take care about this (documentation missing)

  8. This is already by design, backend needs change

  9. The related service use batch queues, need adjuest

  10. Can be improved at backend

@mcweba

mcweba commented Jun 30, 2026

Copy link
Copy Markdown
Collaborator

@ZhengXinCN

  1. Your specific case may only have PUT requests. When we add such features to Gateleen it should work for all cases. Generally spoken when working on gateleen or vertx-redisques we/you really have to take a look at the whole picture. Changes may work for a specific case but break everything else. When this happens in production, we are screwed. For this point I would suggest to only allow queue items with the same http method in a batch.

  2. Timeout handling is not done on enqueue but on dequeue. When there is a queue item for long time in the queue the processing of the queue item has to be cancelled when it's expired.

As @Kusig mentioned, we need a really good documentation including drawbacks and potential problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants