Skip to content

Make large object migration node-local#38

Open
danolivo wants to merge 2 commits into
mainfrom
spoc-583
Open

Make large object migration node-local#38
danolivo wants to merge 2 commits into
mainfrom
spoc-583

Conversation

@danolivo

Copy link
Copy Markdown
Contributor

The migration functions shuffle rows between pg_catalog.pg_largeobject and the lolor tables. The native side is a system catalog that logical decoding never streams, so each node holds an independent set of native large objects and the migration can only ever be performed per node. The lolor side, however, consists of ordinary replicated tables, so the bulk migration DML used to leak to subscribers: migrate_from_native() fanned out rows that remote nodes must produce themselves, and migrate_to_native() replicated its deletes while the re-created native objects stayed local, destroying large objects on the subscriber side.

When spock is installed, run both functions under spock.repair_mode() so the migration DML is excluded from logical decoding. Enter repair mode only when spock is fully operational (extension installed, function present, library preloaded) and leave it again right after the migration DML, so statements later in the caller's transaction replicate normally. The pg_extension check also prevents a planted 'spock' schema from being executed with superuser rights.

Without spock the DML cannot be excluded from decoding, so refuse to migrate while logical replication slots exist in the database: migrate_from_native() warns and returns -1 (0 stays reserved for "nothing to migrate"), migrate_to_native() raises an error, which also makes DROP EXTENSION fail instead of silently dropping the lolor tables with the objects still inside.

After a successful migration, emit a NOTICE reminding the operator to run the same migration on each replica before modifying large objects.

Update README, docs and release notes accordingly: the previous "migrate on one node and let the rows replicate" model is replaced by the node-local model, orchestrated e.g. via spock.replicate_ddl().

The migration functions shuffle rows between pg_catalog.pg_largeobject
and the lolor tables. The native side is a system catalog that logical
decoding never streams, so each node holds an independent set of native
large objects and the migration can only ever be performed per node.
The lolor side, however, consists of ordinary replicated tables, so the
bulk migration DML used to leak to subscribers: migrate_from_native()
fanned out rows that remote nodes must produce themselves, and
migrate_to_native() replicated its deletes while the re-created native
objects stayed local, destroying large objects on the subscriber side.

When spock is installed, run both functions under spock.repair_mode()
so the migration DML is excluded from logical decoding. Enter repair
mode only when spock is fully operational (extension installed,
function present, library preloaded) and leave it again right after the
migration DML, so statements later in the caller's transaction
replicate normally. The pg_extension check also prevents a planted
'spock' schema from being executed with superuser rights.

Without spock the DML cannot be excluded from decoding, so refuse to
migrate while logical replication slots exist in the database:
migrate_from_native() warns and returns -1 (0 stays reserved for
"nothing to migrate"), migrate_to_native() raises an error, which also
makes DROP EXTENSION fail instead of silently dropping the lolor tables
with the objects still inside.

After a successful migration, emit a NOTICE reminding the operator to
run the same migration on each replica before modifying large objects.

Update README, docs and release notes accordingly: the previous
"migrate on one node and let the rows replicate" model is replaced by
the node-local model, orchestrated e.g. via spock.replicate_ddl().
@danolivo danolivo requested a review from mason-sharp June 10, 2026 13:01
@danolivo danolivo self-assigned this Jun 10, 2026
@danolivo danolivo added the bug Something isn't working label Jun 10, 2026
@coderabbitai

coderabbitai Bot commented Jun 10, 2026

Copy link
Copy Markdown

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b3e60c60-2700-45fa-81cb-3154dd8ed52e

📥 Commits

Reviewing files that changed from the base of the PR and between 79f3cda and 697605d.

📒 Files selected for processing (1)
  • lolor--1.2.2--1.3.0.sql
🚧 Files skipped from review as they are similar to previous changes (1)
  • lolor--1.2.2--1.3.0.sql

📝 Walkthrough

Walkthrough

This PR adds logical replication awareness to large-object migration functions by detecting replication slots and conditionally using spock.repair_mode() to suppress migration DML replication. When spock is unavailable and slots exist, the behavior differs: migrate_from_native() warns and returns -1, while migrate_to_native() raises an exception. Documentation is updated across README, index, and release notes to clarify node-local migration semantics and OID collision handling.

Changes

Replication-safe large-object migration with spock repair mode

Layer / File(s) Summary
Migration function replication safety logic
lolor--1.2.2--1.3.0.sql
lolor.migrate_from_native() and lolor.migrate_to_native() now detect logical replication slots and conditionally enable spock.repair_mode() to prevent migration DML replication. When spock is unavailable and slots exist, migrate_from_native() warns and returns -1, while migrate_to_native() raises an exception with hint. Both functions emit notices indicating local-only execution and re-disable repair mode after migration.
Replication behavior documentation
README.md, docs/index.md, docs/lolor_release_notes.md
Documentation clarifies that spock.repair_mode() suppresses migration DML replication, explains failure/warning behavior when spock is absent and replication slots exist, documents that large-object migration is node-local (native pg_largeobject objects never replicate), and describes OID collision risks for preserved native OIDs versus collision-free behavior for newly created node-encoded objects.

Poem

🐰 Through slots and modes the objects hop,
With spock repair they never stop,
Each node migrates its local store,
No replication, safe and sure!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title 'Make large object migration node-local' accurately summarizes the main change: the migration functions are being redesigned to operate per-node (node-local) rather than cross-node.
Description check ✅ Passed The description comprehensively explains the technical rationale, implementation details with spock.repair_mode(), behavior without spock, and documentation updates, all directly related to the changeset.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch spoc-583

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@codacy-production

codacy-production Bot commented Jun 10, 2026

Copy link
Copy Markdown

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues

View in Codacy

NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer
TIP This summary will be updated as you push new changes.

@mason-sharp mason-sharp requested a review from ibrarahmad June 15, 2026 15:03
Comment thread lolor--1.2.2--1.3.0.sql
-- Without spock the migration DML cannot be excluded from logical decoding,
-- so if any logical replication slot exists the migrated rows would leak to
-- subscribers.
IF EXISTS (SELECT 1 FROM pg_catalog.pg_extension WHERE extname = 'spock')

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spock.repair_mode only suppresses spock's own output plugin. Any non-spock logical slot in this database (pgoutput from a vanilla subscription, wal2json, debezium) will still receive the bulk migration DML on commit.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's why this code is a no-op for non-spock configurations

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concern is the mixed case: spock installed and a non-spock logical slot also present. There the code takes the spock branch, and repair_mode only silences the spock_output plugin, so the lr_slots_exists refusal never runs and a pgoutput/wal2json/debezium slot still gets the migration on commit. Could we also refuse inside the spock branch when a logical slot exists with plugin <> 'spock_output'?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correction on the filter I suggested: spock treats a slot as its own when the plugin is either 'spock_output' or 'spock' (see spock_rpc.c, is_spock_slot check). So the condition should be plugin NOT IN ('spock_output', 'spock'), otherwise a legacy 'spock'-plugin slot would be wrongly refused. The point about foreign slots still receiving the DML stands.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I don't see in the code such a check as well as anything like is_spock_slot routine. It seems to me that the plugin should have a standardised name.

Also, as I see, functions in the spock_sync.c never check the plugin name, only slot name. It seems unsafe.
Anyway, it is not hard to introduce such a filter if we find any evidence that it might be registered with an alternative name.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair on the name (it's spock_remote_slot_active, not is_spock_slot) and current spock always creates slots as spock_output (spock_sync.c:354/623/642). The evidence for the alias is spock's own slot handling: cleanup at spock_functions.c:347 and validation at spock_rpc.c:227-228, both 'plugin = spock_output OR plugin = spock'. So matching both just mirrors how spock itself identifies its slots for legacy cases.

Comment thread lolor--1.2.2--1.3.0.sql
Comment thread lolor--1.2.2--1.3.0.sql
@danolivo danolivo requested a review from ibrarahmad June 17, 2026 05:44
spock.repair_mode() only suppresses decoding by spock's own output
plugin. A logical replication slot using any other plugin (pgoutput,
wal2json, decoderbufs, ...) still decodes the bulk migration DML, even
when spock is fully operational.

For migrate_to_native() this is a data-loss hazard: the consumer would
replicate the lolor deletes but not the re-created native large objects,
losing LOs on the subscriber. For migrate_from_native() it leaks the
node-local row shuffling to that subscriber.

Detect such foreign slots and refuse to migrate. migrate_to_native()
raises an ERROR (a silently ignored failure could lose LOs);
migrate_from_native() is non-destructive, so it returns the soft -1
sentinel alongside a WARNING and leaves the native objects untouched for
a retry. Both point the operator at the offending slots via a HINT.

Update the header comments of both functions accordingly.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants