Conversation
The migration functions shuffle rows between pg_catalog.pg_largeobject and the lolor tables. The native side is a system catalog that logical decoding never streams, so each node holds an independent set of native large objects and the migration can only ever be performed per node. The lolor side, however, consists of ordinary replicated tables, so the bulk migration DML used to leak to subscribers: migrate_from_native() fanned out rows that remote nodes must produce themselves, and migrate_to_native() replicated its deletes while the re-created native objects stayed local, destroying large objects on the subscriber side. When spock is installed, run both functions under spock.repair_mode() so the migration DML is excluded from logical decoding. Enter repair mode only when spock is fully operational (extension installed, function present, library preloaded) and leave it again right after the migration DML, so statements later in the caller's transaction replicate normally. The pg_extension check also prevents a planted 'spock' schema from being executed with superuser rights. Without spock the DML cannot be excluded from decoding, so refuse to migrate while logical replication slots exist in the database: migrate_from_native() warns and returns -1 (0 stays reserved for "nothing to migrate"), migrate_to_native() raises an error, which also makes DROP EXTENSION fail instead of silently dropping the lolor tables with the objects still inside. After a successful migration, emit a NOTICE reminding the operator to run the same migration on each replica before modifying large objects. Update README, docs and release notes accordingly: the previous "migrate on one node and let the rows replicate" model is replaced by the node-local model, orchestrated e.g. via spock.replicate_ddl().
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughThis PR adds logical replication awareness to large-object migration functions by detecting replication slots and conditionally using ChangesReplication-safe large-object migration with spock repair mode
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Up to standards ✅🟢 Issues
|
| -- Without spock the migration DML cannot be excluded from logical decoding, | ||
| -- so if any logical replication slot exists the migrated rows would leak to | ||
| -- subscribers. | ||
| IF EXISTS (SELECT 1 FROM pg_catalog.pg_extension WHERE extname = 'spock') |
There was a problem hiding this comment.
spock.repair_mode only suppresses spock's own output plugin. Any non-spock logical slot in this database (pgoutput from a vanilla subscription, wal2json, debezium) will still receive the bulk migration DML on commit.
There was a problem hiding this comment.
That's why this code is a no-op for non-spock configurations
There was a problem hiding this comment.
My concern is the mixed case: spock installed and a non-spock logical slot also present. There the code takes the spock branch, and repair_mode only silences the spock_output plugin, so the lr_slots_exists refusal never runs and a pgoutput/wal2json/debezium slot still gets the migration on commit. Could we also refuse inside the spock branch when a logical slot exists with plugin <> 'spock_output'?
There was a problem hiding this comment.
Correction on the filter I suggested: spock treats a slot as its own when the plugin is either 'spock_output' or 'spock' (see spock_rpc.c, is_spock_slot check). So the condition should be plugin NOT IN ('spock_output', 'spock'), otherwise a legacy 'spock'-plugin slot would be wrongly refused. The point about foreign slots still receiving the DML stands.
There was a problem hiding this comment.
Hmm, I don't see in the code such a check as well as anything like is_spock_slot routine. It seems to me that the plugin should have a standardised name.
Also, as I see, functions in the spock_sync.c never check the plugin name, only slot name. It seems unsafe.
Anyway, it is not hard to introduce such a filter if we find any evidence that it might be registered with an alternative name.
There was a problem hiding this comment.
Fair on the name (it's spock_remote_slot_active, not is_spock_slot) and current spock always creates slots as spock_output (spock_sync.c:354/623/642). The evidence for the alias is spock's own slot handling: cleanup at spock_functions.c:347 and validation at spock_rpc.c:227-228, both 'plugin = spock_output OR plugin = spock'. So matching both just mirrors how spock itself identifies its slots for legacy cases.
spock.repair_mode() only suppresses decoding by spock's own output plugin. A logical replication slot using any other plugin (pgoutput, wal2json, decoderbufs, ...) still decodes the bulk migration DML, even when spock is fully operational. For migrate_to_native() this is a data-loss hazard: the consumer would replicate the lolor deletes but not the re-created native large objects, losing LOs on the subscriber. For migrate_from_native() it leaks the node-local row shuffling to that subscriber. Detect such foreign slots and refuse to migrate. migrate_to_native() raises an ERROR (a silently ignored failure could lose LOs); migrate_from_native() is non-destructive, so it returns the soft -1 sentinel alongside a WARNING and leaves the native objects untouched for a retry. Both point the operator at the offending slots via a HINT. Update the header comments of both functions accordingly.
The migration functions shuffle rows between pg_catalog.pg_largeobject and the lolor tables. The native side is a system catalog that logical decoding never streams, so each node holds an independent set of native large objects and the migration can only ever be performed per node. The lolor side, however, consists of ordinary replicated tables, so the bulk migration DML used to leak to subscribers: migrate_from_native() fanned out rows that remote nodes must produce themselves, and migrate_to_native() replicated its deletes while the re-created native objects stayed local, destroying large objects on the subscriber side.
When spock is installed, run both functions under spock.repair_mode() so the migration DML is excluded from logical decoding. Enter repair mode only when spock is fully operational (extension installed, function present, library preloaded) and leave it again right after the migration DML, so statements later in the caller's transaction replicate normally. The pg_extension check also prevents a planted 'spock' schema from being executed with superuser rights.
Without spock the DML cannot be excluded from decoding, so refuse to migrate while logical replication slots exist in the database: migrate_from_native() warns and returns -1 (0 stays reserved for "nothing to migrate"), migrate_to_native() raises an error, which also makes DROP EXTENSION fail instead of silently dropping the lolor tables with the objects still inside.
After a successful migration, emit a NOTICE reminding the operator to run the same migration on each replica before modifying large objects.
Update README, docs and release notes accordingly: the previous "migrate on one node and let the rows replicate" model is replaced by the node-local model, orchestrated e.g. via spock.replicate_ddl().