surface real exception_log cause by mason-sharp · Pull Request #511 · pgEdge/spock

mason-sharp · 2026-06-19T01:56:29Z

spock_apply: surface real exception_log cause, not "unavailable"
Backport the exception_log message-quality work from main, adapted to
v5_STABLE. The files have diverged enough that these do not cherry-pick
cleanly, so the logic was reapplied against the current v5_STABLE code:

- ed17523 "exception_log: replace 'unavailable' with informative discard
  message" -- collateral rows (those not themselves the failing command)
  now carry a message naming the active behaviour and the command that
  failed, instead of the opaque placeholder "unavailable".
- b32ef95 "spock_apply: surface root cause when a discard is not
  attributable to a row" -- when failed_action is 0 (e.g. a COMMIT-time
  failure) the captured root cause is surfaced instead of a dangling
  command_counter pointer.
- 1f5738c "fix: clean up exception_log error-propagation code" -- folded
  in for the error-propagation fix the two above build on.
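The row-level rule from the first two commits can be sketched as one small C function. This is a minimal sketch, not spock's code: the name `choose_exception_message()`, its parameters and the "behaviour: ..." layout are hypothetical, and only the "discarded due to exception at command_counter N" wording comes from this PR. It assumes failed_action holds the command_counter of the failing command, with 0 meaning the failure cannot be attributed to a row.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical sketch: choose the error_message text for one exception_log
 * row.  row_cc is this row's command_counter; failed_action is the
 * command_counter of the command that raised the error, or 0 when the
 * failure is not attributable to a row (e.g. at COMMIT time).  root_cause
 * is the captured initial error message.
 */
static void
choose_exception_message(char *buf, size_t buflen,
                         unsigned row_cc, unsigned failed_action,
                         const char *behaviour, const char *root_cause)
{
    assert(buf != NULL && buflen > 0);

    if (failed_action == 0 || row_cc == failed_action)
    {
        /*
         * Either no single row failed (surface the root cause instead of
         * pointing at command_counter 0), or this row is the failing
         * command itself: record the real error.
         */
        snprintf(buf, buflen, "%s", root_cause);
    }
    else
    {
        /* Collateral row: name the behaviour and the command that failed. */
        snprintf(buf, buflen,
                 "%s: discarded due to exception at command_counter %u",
                 behaviour, failed_action);
    }
}
```

For example, with failed_action = 3, row 3 records the root cause itself, row 2 records "transdiscard: discarded due to exception at command_counter 3" (`transdiscard` is only an example behaviour name), and with failed_action = 0 every row records the root cause.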

Adapting was non-trivial: v5_STABLE's log_insert_exception() dropped the
message via "(failed) ? errmsg : NULL", so the matching row's
initial_error_message never reached exception_log. That is corrected here,
so the real message is now recorded.

On top of the backport:

- Tag every captured/surfaced message with its SQLSTATE
  ([SQLSTATE xxxxx] ...), both in exception_log and in the "caught initial
  exception" log line. exception_log has no sqlstate column, so carrying
  the code inline is the cheapest way to make the root cause unambiguous
  (e.g. to tell a real constraint violation from a transient conflict).
- Allocate the formatted messages in ApplyOperationContext (reset per
  row) rather than in the long-lived TopTransactionContext that is current
  on the exception path, so that logging every discarded row of a large
  transaction does not accumulate memory.
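Why the allocation context matters can be illustrated with a toy model. This is not the PostgreSQL MemoryContext API (in the backend this would be done with MemoryContextSwitchTo() and MemoryContextReset()); the hypothetical `Arena` type only counts live bytes. Resetting it after each row stands in for a per-row context such as ApplyOperationContext; never resetting it stands in for TopTransactionContext.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Toy stand-in for a memory context (NOT PostgreSQL's API): it only counts
 * how many bytes are live and records the high-water mark.
 */
typedef struct Arena
{
    size_t used;   /* bytes currently allocated */
    size_t peak;   /* largest value "used" has reached */
} Arena;

static void
arena_alloc(Arena *a, size_t n)
{
    a->used += n;
    if (a->used > a->peak)
        a->peak = a->used;
}

/* Free everything at once, as resetting a context would. */
static void
arena_reset(Arena *a)
{
    a->used = 0;
}

/*
 * Format one collateral-row message per discarded row and charge it to the
 * arena.  reset_per_row != 0 models a per-row context; 0 models a
 * transaction-lifetime context where every message survives until the end
 * of the transaction.  Returns the arena's peak usage.
 */
static size_t
log_discarded_rows(Arena *a, unsigned nrows, unsigned failed_cc,
                   int reset_per_row)
{
    char buf[128];

    for (unsigned row = 0; row < nrows; row++)
    {
        int len = snprintf(buf, sizeof buf,
                           "discarded due to exception at command_counter %u",
                           failed_cc);

        assert(len > 0 && (size_t) len < sizeof buf);
        arena_alloc(a, (size_t) len + 1);   /* +1 for the terminator */

        /* ... the message would be written to exception_log here ... */

        if (reset_per_row)
            arena_reset(a);
    }
    return a->peak;
}
```

With 1000 discarded rows, the per-row arena peaks at the size of a single message, while the transaction-lifetime arena peaks at 1000 times that: the growth the change is meant to avoid.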

Diagnostic change only: discard/disable/LSN behaviour is unchanged.
Connection-class errors still rethrow before this path (b75cb4f) and never
reach exception_log.
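As an illustration of how a connection-class error can be recognised from its error code, the sketch below copies PostgreSQL's SQLSTATE packing macros from utils/elog.h and compares the two-character class. It assumes "connection-class" means SQLSTATE class 08 (Connection Exception); the check introduced in b75cb4f may cover a different set of codes, and `is_connection_class_error()` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>

/* Local copies of PostgreSQL's SQLSTATE encoding macros (utils/elog.h). */
#define PGSIXBIT(ch) (((ch) - '0') & 0x3F)
#define MAKE_SQLSTATE(ch1, ch2, ch3, ch4, ch5) \
    (PGSIXBIT(ch1) + (PGSIXBIT(ch2) << 6) + (PGSIXBIT(ch3) << 12) + \
     (PGSIXBIT(ch4) << 18) + (PGSIXBIT(ch5) << 24))
/* The class is the first two characters, i.e. the low 12 bits. */
#define ERRCODE_TO_CATEGORY(ec) ((ec) & ((1 << 12) - 1))

#define ERRCODE_CONNECTION_EXCEPTION MAKE_SQLSTATE('0', '8', '0', '0', '0')

/*
 * Hypothetical helper: true for any SQLSTATE in class 08, e.g. 08006
 * (connection_failure) or 08P01 (protocol_violation).
 */
static bool
is_connection_class_error(int sqlerrcode)
{
    return ERRCODE_TO_CATEGORY(sqlerrcode) == ERRCODE_CONNECTION_EXCEPTION;
}
```

Comparing only the class keeps the check independent of the individual codes within class 08.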

Test: 013_exception_handling.pl Part 5 updated to the new contract -- no
"unavailable"; failing row carries its SQLSTATE; bystander rows carry the
informative discard message.


Reproduces a firewall-style replication outage: establish replication, block the provider's TCP port with iptables, commit a transaction during the outage (driven over the Unix socket, which iptables does not touch), then restore the connection. Asserts the customer-relevant outcomes:

- rows committed during the outage do not reach the subscriber while the port is blocked;
- in SUB_DISABLE mode the outage does not disable the subscription;
- after the block is lifted, the subscription returns to replicating with no row loss;
- the blip produces no spurious spock.exception_log entries.

Whether iptables tears the connection down or merely stalls it is host-dependent (over loopback it typically stalls), so the test reports which occurred as a diag message rather than asserting a specific teardown. Requires iptables usable as root or via passwordless sudo; skips cleanly otherwise. Not in the schedule -- run manually: PERL5LIB=t prove -v t/104_iptables_conn_block.pl

coderabbitai · 2026-06-19T01:56:38Z

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.


codacy-production · 2026-06-19T01:59:08Z

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues


🟢 Metrics 0 complexity · 0 duplication


The SQLSTATE prefix added to captured/surfaced exception_log messages is valuable for real error codes (constraint violations, deadlocks, connection failures) but is just noise for a bare elog(ERROR), which defaults to ERRCODE_INTERNAL_ERROR (XX000), e.g. spock's own "logical replication did not find row to be updated". errmsg_with_sqlstate() now omits the prefix when the code is XX000 and keeps it otherwise. The "caught initial exception" log line and the initial_error_message capture are routed through the same helper, so the rule is applied consistently. Updates the replication_set regression expected output: the not-found-row message no longer carries the XX000 prefix, while the collateral discard rows keep their informative "discarded due to exception at command_counter N" text.
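A standalone sketch of the prefix rule, assuming messages are formatted into a caller-supplied buffer. The macros mirror PostgreSQL's MAKE_SQLSTATE/PGUNSIXBIT encoding and the decoding loop mirrors unpack_sql_state(); `format_with_sqlstate()` is a hypothetical stand-in, since the real errmsg_with_sqlstate() signature is not shown in this PR.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Local copies of PostgreSQL's SQLSTATE encoding macros (utils/elog.h). */
#define PGSIXBIT(ch)    (((ch) - '0') & 0x3F)
#define PGUNSIXBIT(val) (((val) & 0x3F) + '0')
#define MAKE_SQLSTATE(ch1, ch2, ch3, ch4, ch5) \
    (PGSIXBIT(ch1) + (PGSIXBIT(ch2) << 6) + (PGSIXBIT(ch3) << 12) + \
     (PGSIXBIT(ch4) << 18) + (PGSIXBIT(ch5) << 24))

#define ERRCODE_INTERNAL_ERROR MAKE_SQLSTATE('X', 'X', '0', '0', '0')

/* Decode a packed code into its five characters, as unpack_sql_state() does. */
static void
sqlstate_to_text(int sqlerrcode, char out[6])
{
    for (int i = 0; i < 5; i++)
    {
        out[i] = (char) PGUNSIXBIT(sqlerrcode);
        sqlerrcode >>= 6;
    }
    out[5] = '\0';
}

/*
 * Hypothetical stand-in for errmsg_with_sqlstate(): prefix the message with
 * "[SQLSTATE xxxxx] " unless the code is XX000, the default for a bare
 * elog(ERROR), where the prefix carries no information.
 */
static void
format_with_sqlstate(char *buf, size_t buflen, int sqlerrcode,
                     const char *message)
{
    char state[6];

    assert(buf != NULL && buflen > 0);

    if (sqlerrcode == ERRCODE_INTERNAL_ERROR)
    {
        snprintf(buf, buflen, "%s", message);
        return;
    }

    sqlstate_to_text(sqlerrcode, state);
    snprintf(buf, buflen, "[SQLSTATE %s] %s", state, message);
}
```

Under this rule a unique violation is recorded as "[SQLSTATE 23505] duplicate key value violates unique constraint ...", while an XX000 message such as "logical replication did not find row to be updated" is recorded unchanged.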

rasifr · 2026-06-19T11:28:56Z

+	Assert(error_message != NULL);
+	values[Anum_exception_log_error_message - 1] =


The Assert() here is not needed and should be removed.

mason-sharp added 2 commits June 18, 2026 15:52

mason-sharp requested a review from rasifr June 19, 2026 01:56

rasifr reviewed Jun 19, 2026


mason-sharp wants to merge 3 commits into v5_STABLE from v5_exception_msg
