Summary
A rollup-node follower/sequencer node can get permanently stuck after being restarted during partial sync. On restart, the node restores the execution head from the local execution provider, but the forkchoice state's safe and finalized blocks remain at genesis. The L1 watcher then continues deriving later batches, and the chain orchestrator repeatedly fails with InvalidBatchReorg { safe_block_number: 0, ... }.
This was observed with scrolltech/rollup-node:v1.0.7-rc6 on a custom/dev Scroll-compatible chain using persistent storage.
Environment
- Image: scrolltech/rollup-node:v1.0.7-rc6
- Deployment: Kubernetes StatefulSet with persistent /data
- Role: follower / standby sequencer, sequencing enabled but automatic sequencing disabled
- Data source: L1 RPC + blob/S3 batch data
- Discovery: disabled, trusted peers configured
- Source checked locally at commit: bc3d500
No private keys or node keys are relevant to this issue.
What happened
The node was partially synced. Before restart, it had imported historical derived L2 blocks. After a restart, startup found the last L2 block that existed in the execution node and set the local L2 head to that block:
Checking for L2 head block in EN l2_head_block_number=157478
Checking for L2 head block in EN l2_head_block_number=157477
Checking for L2 head block in EN l2_head_block_number=157476
Found L2 head block in EN l2_head_block_number=157476
Then the engine driver started with this forkchoice state:
Starting engine driver fcs="ForkchoiceState {
head: BlockInfo { number: 157476, hash: 0xbe4fe851ea59ee7cbf959165dc7cb6b45f987fa515d36a62721d358b4fc0cc25 },
safe: BlockInfo { number: 0, hash: 0xf9f7c524dce38b51a4d28ec2f18680773e5ba9d3f5f430d0e05f92cfeb65b1bc },
finalized: BlockInfo { number: 0, hash: 0xf9f7c524dce38b51a4d28ec2f18680773e5ba9d3f5f430d0e05f92cfeb65b1bc }
}"
The L1 watcher then started from a later finalized L1 block:
Starting L1 watcher l1_block_startup_info=FinalizedBlockNumber(14092206)
The next derived batch (index 799) needed to continue at L2 block 157479, but because safe was still at genesis, the orchestrator rejected it and every batch after it:
Handling derived batch batch_info="BatchInfo { index: 799, hash: 0x57b75e730f5ae14637cababbe13721508c1f965ea019ffdc3fae9bc1938242b4 }" num_blocks=684
Reorging chain to derived block block_number=157479
Encountered error in the chain orchestrator err="InvalidBatchReorg { batch_info: BatchInfo { index: 799, hash: 0x57b75e730f5ae14637cababbe13721508c1f965ea019ffdc3fae9bc1938242b4 }, safe_block_number: 0, derived_block_number: 157479 }"
The same error repeated for later batches with increasing derived block numbers. The node stopped making L2 progress.
RPC status at this point showed:
{
"l1": {
"status": "Syncing",
"latest": 38713685,
"finalized": 38713685,
"processed": 15671705
},
"l2": {
"status": "Synced",
"head": {
"number": 157476,
"hash": "0xbe4fe851ea59ee7cbf959165dc7cb6b45f987fa515d36a62721d358b4fc0cc25"
},
"safe": {
"number": 0,
"hash": "0xf9f7c524dce38b51a4d28ec2f18680773e5ba9d3f5f430d0e05f92cfeb65b1bc"
},
"finalized": {
"number": 0,
"hash": "0xf9f7c524dce38b51a4d28ec2f18680773e5ba9d3f5f430d0e05f92cfeb65b1bc"
}
}
}
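The stuck state in this status payload has a recognizable signature: a non-genesis head with safe or finalized still at block 0. A minimal Python sketch of a check for it follows. The function name is hypothetical, the payload shape is taken from the status output in this report, and the check is only a heuristic, since the same shape might appear briefly on a healthy node early in sync.

```python
import json


def forkchoice_looks_stuck(status: dict) -> bool:
    """Heuristic: head is past genesis while safe or finalized is still block 0."""
    l2 = status["l2"]
    head = l2["head"]["number"]
    safe = l2["safe"]["number"]
    finalized = l2["finalized"]["number"]
    return head > 0 and (safe == 0 or finalized == 0)


# L2 part of the status payload from this report (hashes omitted for brevity).
STUCK_STATUS = json.loads(
    '{"l2": {"head": {"number": 157476},'
    ' "safe": {"number": 0},'
    ' "finalized": {"number": 0}}}'
)

if __name__ == "__main__":
    print("stuck signature present:", forkchoice_looks_stuck(STUCK_STATUS))
```

A check like this could run as a readiness probe or an alert rule, but it should be paired with a progress sample before concluding that the node is wedged.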
A 20s height sample showed no progress:
rpc1=157476 db1=157476
rpc2=157476 db2=157476
delta_rpc=0 delta_db=0
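For reference, a height sample of this kind can be sketched as below. This is not the exact script that was used: `sample_progress` and its two fetcher callbacks are hypothetical names, and how the heights are actually read (an RPC call, a DB query) is left to the caller.

```python
import time
from typing import Callable, Dict


def sample_progress(
    get_rpc_height: Callable[[], int],
    get_db_height: Callable[[], int],
    interval_s: float = 20.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, int]:
    """Read both heights twice, interval_s apart, and report the deltas."""
    rpc1, db1 = get_rpc_height(), get_db_height()
    sleep(interval_s)
    rpc2, db2 = get_rpc_height(), get_db_height()
    return {
        "rpc1": rpc1,
        "db1": db1,
        "rpc2": rpc2,
        "db2": db2,
        "delta_rpc": rpc2 - rpc1,
        "delta_db": db2 - db1,
    }


def format_sample(sample: Dict[str, int]) -> str:
    """Render a sample in the same key=value layout used in this report."""
    return (
        f"rpc1={sample['rpc1']} db1={sample['db1']}\n"
        f"rpc2={sample['rpc2']} db2={sample['db2']}\n"
        f"delta_rpc={sample['delta_rpc']} delta_db={sample['delta_db']}"
    )
```

Both deltas being zero over the interval is what distinguishes a wedged node from one that is merely behind.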
Relevant database state
The rollup DB metadata had:
l1_finalized_block|38713685
l1_latest_block|38713685
l1_processed_block|14151705
l2_head_block|157476
The rollup DB still had safe block records ahead of the execution node's persisted head:
select max(block_number) from l2_block where reverted=0;
157478
But the execution node only had blocks up to 157476; blocks 157477 and 157478 were missing from the execution provider after restart:
eth_getBlockByNumber(157476) -> 0xbe4fe851ea59ee7cbf959165dc7cb6b45f987fa515d36a62721d358b4fc0cc25
eth_getBlockByNumber(157477) -> null
eth_getBlockByNumber(157478) -> null
eth_getBlockByNumber(157479) -> null
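The mismatch can be expressed as a single query: non-reverted l2_block rows whose block number is above the execution node's head. The sketch below assumes a SQLite-backed rollup DB and uses only the two columns that appear in the query above (block_number, reverted). The real table has more columns, and the in-memory demo table is purely illustrative.

```python
import sqlite3
from typing import List


def safe_rows_above_en_head(conn: sqlite3.Connection, en_head: int) -> List[int]:
    """Return non-reverted l2_block numbers above the execution node's head."""
    rows = conn.execute(
        "SELECT block_number FROM l2_block "
        "WHERE reverted = 0 AND block_number > ? "
        "ORDER BY block_number",
        (en_head,),
    ).fetchall()
    return [row[0] for row in rows]


def demo_connection() -> sqlite3.Connection:
    """Build an in-memory stand-in for the rollup DB (assumed, simplified schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE l2_block ("
        "block_number INTEGER PRIMARY KEY, "
        "reverted INTEGER NOT NULL DEFAULT 0)"
    )
    # Rows 157470..157478, mirroring max(block_number) = 157478 from the report.
    conn.executemany(
        "INSERT INTO l2_block (block_number) VALUES (?)",
        [(n,) for n in range(157470, 157479)],
    )
    return conn


if __name__ == "__main__":
    conn = demo_connection()
    # The execution node's head after restart was 157476.
    print(safe_rows_above_en_head(conn, 157476))
```

A non-empty result means the rollup DB records safe blocks that the execution provider no longer has.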
Expected behavior
After restart during partial sync, the node should recover to a consistent forkchoice state and continue syncing.
In particular, if startup rolls l2_head_block back to the latest block present in the execution provider, it should also make safe/finalized consistent with the recovered execution state, or prune/reconcile rollup DB safe block records that are above the recovered execution head.
The node should not continue with:
head = recovered EN block
safe = genesis
finalized = genesis
when the L1 watcher is going to continue deriving later batches.
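One possible reading of this expectation, sketched in Python rather than in the node's own Rust: take the highest safe block recorded in the rollup DB that the execution node still has, and mark every record above the recovered head for pruning. `recover_safe_block` is a hypothetical helper and not an existing function in the codebase. Derivation would presumably also have to resume from the batch that produced the pruned blocks, which this sketch does not model.

```python
from typing import Iterable, List, Tuple


def recover_safe_block(en_head: int, db_safe_blocks: Iterable[int]) -> Tuple[int, List[int]]:
    """Pick a safe block consistent with the execution node's head.

    Returns (safe_block_number, blocks_to_prune), where:
      * safe_block_number is the highest DB safe record at or below en_head
        (0, i.e. genesis, if there is none);
      * blocks_to_prune are DB safe records above en_head, in ascending order.
    """
    blocks = sorted(set(db_safe_blocks))
    kept = [n for n in blocks if n <= en_head]
    to_prune = [n for n in blocks if n > en_head]
    safe = max(kept, default=0)
    return safe, to_prune


if __name__ == "__main__":
    # Values from this report: EN head 157476, DB safe records up to 157478.
    safe, to_prune = recover_safe_block(157476, range(157470, 157479))
    print("recovered safe:", safe, "prune:", to_prune)
```

The point of the sketch is only that the information needed to avoid a genesis safe block is already available at startup; the correct reconciliation strategy is for the maintainers to decide.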
Actual behavior
The node starts with a non-genesis head but genesis safe/finalized, then repeatedly fails with:
InvalidBatchReorg { safe_block_number: 0, derived_block_number: <large number> }
It does not make further L2 progress without manual intervention.
Workaround used
The node was recovered manually by:
- Calling rollupNodeAdmin_revertToL1Block to rewind rollup-node state to a previous known-good L1 batch boundary.
- Restarting the pod again so the node could replay from that clean point.
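For anyone reproducing the first step, the call is an ordinary JSON-RPC request. The sketch below only builds the request body. The method name comes from this report, but the parameter shape (a single positional L1 block number) is an assumption and should be checked against the node's admin RPC definition before use.

```python
import json


def build_revert_request(l1_block_number: int, request_id: int = 1) -> str:
    """Build a JSON-RPC 2.0 body for rollupNodeAdmin_revertToL1Block."""
    payload = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "rollupNodeAdmin_revertToL1Block",
        # ASSUMPTION: one positional parameter, the target L1 block number.
        "params": [l1_block_number],
    }
    return json.dumps(payload)
```

The body can then be POSTed to the node's admin RPC endpoint with any HTTP client, using a known-good L1 batch boundary as the block number.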
After doing this, forkchoice recovered to a consistent state and syncing resumed:
{
"l2": {
"head": { "number": 162854 },
"safe": { "number": 162854 },
"finalized": { "number": 162854 }
}
}
A subsequent 30s sample showed progress again:
rpc1=167868 db1=167452
rpc2=174031 db2=173904
delta_rpc=6163 delta_db=6452
Source locations that look related
Startup finds the latest L2 head block present in the execution provider and updates the FCS head / DB head:
crates/node/src/args.rs around the startup flow that calls ForkchoiceState::from_provider, prepare_l1_watcher_start_info, and then scans for l2_head_block_number in the execution provider.
prepare_l1_watcher_start_info resets processing batches and returns L1 startup info, but does not appear to restore an L2 safe/finalized forkchoice state:
crates/database/db/src/operations.rs around prepare_l1_watcher_start_info.
The DB has a helper to fetch the latest safe L2 block:
crates/database/db/src/operations.rs around get_latest_safe_l2_info.
The orchestrator fails when safe_block_number != derived_block_number - 1:
crates/chain-orchestrator/src/lib.rs around the InvalidBatchReorg check in handle_derived_batch.
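The condition as described here can be modeled in a few lines. This is a Python model of the check as this report understands it, not the Rust implementation; the class and function names are illustrative.

```python
class InvalidBatchReorg(Exception):
    """Model of the error seen in the logs; the fields mirror the log output."""

    def __init__(self, batch_index: int, safe_block_number: int, derived_block_number: int):
        super().__init__(
            f"InvalidBatchReorg {{ batch index: {batch_index}, "
            f"safe_block_number: {safe_block_number}, "
            f"derived_block_number: {derived_block_number} }}"
        )
        self.batch_index = batch_index
        self.safe_block_number = safe_block_number
        self.derived_block_number = derived_block_number


def check_derived_block(batch_index: int, safe_block_number: int, derived_block_number: int) -> None:
    """Raise unless the derived block directly follows the current safe block."""
    if safe_block_number != derived_block_number - 1:
        raise InvalidBatchReorg(batch_index, safe_block_number, derived_block_number)


if __name__ == "__main__":
    # With safe stuck at genesis, batch 799 (starting at 157479) can never pass.
    try:
        check_derived_block(799, 0, 157479)
    except InvalidBatchReorg as err:
        print("rejected:", err)
```

With safe pinned at 0, only a batch whose first derived block is 1 could pass, which is why every later batch fails in the same way.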
Notes
This is easiest to trigger when the execution provider and rollup DB are slightly out of sync at shutdown/restart, for example when the rollup DB has recorded safe L2 block rows that are above the last block actually persisted by the execution provider.
The issue is not related to S3/blob availability or P2P connectivity in this case; blobs were reachable, peers were connected, and the node resumed syncing after the forkchoice/rollup DB state was manually rewound and the pod restarted.