
Queued updates and withdrawable stake #95

Open
awmacpherson wants to merge 2 commits into ethersphere:master from awmacpherson:swip-40-41-combined


@awmacpherson

Merged spec from SWIPs 40 and 41, including everything except revert of SWIP-20. Updated to align with implementation at ethersphere/storage-incentives#309.

…o separate per-workflow methods. Lookahead methods semantics and interface defined. Interaction of update enqueue and freeze clarified.
@awmacpherson awmacpherson mentioned this pull request May 14, 2026
@significance

significance commented Jun 9, 2026

Member

thank you for the SWIP 🙇

as discussed, please can you emphasise the rationale behind withdrawal delay i.e. encouraging predictability / reducing unpredictability

in this vein i think we have agreement that limiting the amount of times an overlay can be changed over some similar period would also be beneficial in this regard without incurring too much friction in terms of neighbourhood rebalancing

i would suggest it were set at once every half or quarter the period of the withdrawal delay, but certainly less, to allow for some margin of error in coordination of node repositioning

it was also mooted that 14 days could well be sufficient delay and that the exact value of that constant and/or mode and means of tuning it would benefit from some more discussion

@awmacpherson

Author

> as discussed, please can you emphasise the rationale behind withdrawal delay i.e. encouraging predictability / reducing unpredictability

Sure, I will expand on this.

> in this vein i think we have agreement that limiting the amount of times an overlay can be changed over some similar period would also be beneficial in this regard without incurring too much friction in terms of neighbourhood rebalancing
>
> i would suggest it were set at once every half or quarter the period of the withdrawal delay, but certainly less, to allow for some margin of error in coordination of node repositioning

While I think this is a reasonable direction to explore, given the sync costs that excessive node churn could impose on the network, it is independent of the present SWIP in terms of both motivation and implementation. This SWIP does not make any change to the rate at which one can change neighbourhoods (whether staked or not). So I'd argue that this research direction should not block merging this PR. Would you agree?

> it was also mooted that 14 days could well be sufficient delay and that the exact value of that constant and/or mode and means of tuning it would benefit from some more discussion

Regardless of what value we choose for the delay parameter now, we ought to establish processes to monitor the behaviour of the network and determine whether the delay parameter is performing as desired or if it ought to be changed. I'll work on adding something on that to the SWIP too.

If we are concerned about committing to a particular withdrawal notice period that may be awkward to change later, one option would be to add an admin function that allows the deployer to change the parameter on the live contract without having to redeploy. However, I would highlight a couple of caveats to this approach:

  1. Unrestricted control of the wait period would give the admin power to effectively rug funds by setting the wait to a very large value. That in turn places a higher target value on the admin keys, e.g. for social engineering attacks. To protect depositors, the admin function ought therefore to have some safeguards that restrict the admin's power, for example:
    • A hardcoded maximum value (30 days?)
    • An enforced wait period before admin-triggered changes to the value come into effect, so that depositors not happy with the new configuration have time to trigger exits or withdrawals.
  2. Implementing such a function does not save us from the need to set up processes for monitoring and deciding when the parameter should be updated. The situation in which it does buy us something is if we do decide there needs to be a retuning and there are no other upcoming updates to the stake registry that would necessitate a redeployment anyway.
  3. We need to decide whether decisions about calling the update function necessitate a SWIP or if it can be conducted more efficiently through some other sufficiently well-defined process.
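
To make the two safeguards in (1) concrete, here is a minimal sketch of a capped, delayed parameter update. It is written in Python purely for illustration and is not the stake registry's actual interface: the names `NoticePeriodConfig`, `queue_change`, `value_at`, and the constants `MAX_NOTICE_PERIOD` and `CHANGE_DELAY` are all hypothetical, and the values are placeholders.

```python
from dataclasses import dataclass
from typing import Optional

DAY = 24 * 60 * 60  # seconds

# Hypothetical constants, chosen only for illustration.
MAX_NOTICE_PERIOD = 30 * DAY  # hardcoded ceiling the admin cannot exceed
CHANGE_DELAY = 7 * DAY        # wait before a queued change takes effect


@dataclass
class NoticePeriodConfig:
    current: int                        # notice period in force, in seconds
    pending: Optional[int] = None       # queued new value, if any
    effective_at: Optional[int] = None  # time at which `pending` applies

    def queue_change(self, new_value: int, now: int) -> None:
        """Admin call: queue a new notice period, subject to the cap."""
        if not 0 < new_value <= MAX_NOTICE_PERIOD:
            raise ValueError("notice period outside permitted range")
        self.pending = new_value
        self.effective_at = now + CHANGE_DELAY

    def value_at(self, now: int) -> int:
        """Notice period that applies at time `now`."""
        if self.pending is not None and now >= self.effective_at:
            return self.pending
        return self.current
```

In this sketch the old value stays in force until `CHANGE_DELAY` has elapsed, which is the window in which depositors unhappy with the new configuration could trigger exits or withdrawals. How long that window should be relative to the notice period itself is one of the things that would need to be decided.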

Understanding these factors, especially (1), means that trying to get this function into the current SWIP could further delay the decision process. My view is that it isn't really necessary right now for the following reasons:

  1. We are probably going to need to update the stake registry again, and further parameter updates can be bundled with that change.
  2. The choice between 14 or 28 days, or even 7 days, probably does not matter all that much at the current network size. The real win here is coming from bringing it down from ∞ to less than a month.

Nonetheless, if you want it I also don't mind adding it.

@awmacpherson

Author

Posting expanded rationale as a reply first so it's easier to discuss. If it satisfies you @significance, I'll merge the arguments into the SWIP.

TLDR

Reducing the withdrawal notice period attracts more stake but carries more risk of inducing operator behaviour that leads to a loss of service — call this network risk. Identifying the sweet spot depends on being able to measure this risk.

The current proposal text calls for a withdrawal notice period of 28 days. Other contributors have suggested that the wait could be shorter: either 7 or 14 days. Which should we choose?

Assessment. We expect that reducing the notice period from 28 days to 7 days (or any value between 7 and 28 days) will not attract a significant quantity of additional stake or nodes; nor will it significantly decrease the risk of dirty exits. On the other hand, the excess network risk associated with such reduction is hard to quantify: more research is needed. That said, we don't know any a priori reason to think it will be critical.

Recommendation. Dedicate some resources to identifying and monitoring risk signals that could inform our choice of notice period. However, do not block the update on the outcome of such research — we have enough information to agree on a value now and move forward.

We suggest starting at the upper limit of the discussed range, i.e. 28 days, because this has the lowest risk, and we are not yet well equipped to measure how much additional risk would be added by a shorter period.

Analysis

Our proposals for stake withdrawal and exit notice periods argue that the risk of service issues is greater for shorter notice periods because of the following two effects:

  • Filtering. A notice period filters out some operator policies that involve withdrawing stake and potentially also turning off nodes. Specifically, if the wait period is $N$ days, policies that require the operator to withdraw stake less than $N$ days after observing a signal are not possible. Increasing $N$ enlarges the family of policies that are filtered out.

  • Signalling. A notice period broadcasts the operator's intent to reduce service in advance of the change taking effect, allowing other ecosystem participants to react before the service reduction occurs.

    • Preemptive repair. Advance signalling gives other operators a window in which to repair the service gap that will be left by the exiting nodes before such a gap ever appears. Increasing $N$ enlarges this window, hence increasing the chances that preemptive repair occurs.

    • Price signalling. Storage prices are a function of supply and demand signals available onchain. If there is a function of data available onchain $N$ days in advance that can be used to bound prices from above, this makes prices more predictable, improving DX (compare issue #86, "Rapid storage price changes adversely impact UX/DX").

      Concretely, in Swarm today, the supply side input to the pricing function is the number of nodes that reveal in each neighbourhood. If there are no dirty exits and all staked nodes reveal each round, a notice period of $N$ days for exits allows us to bound from below the number of nodes in each neighbourhood up to $N$ days into the future. This gives us an upper bound on pricing (assuming fixed demand).

      This reasoning survives some changes to the way prices are quoted in Swarm: as long as the number of nodes revealing in each neighbourhood is an input to the price, advance signalling of decreases in these numbers improves price predictability.
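
The lower bound in the price-signalling argument can be sketched as follows. This is an illustration under the same assumptions as the text (no dirty exits, all staked nodes reveal), with times measured in days. The function name `node_count_lower_bound` and the shape of its inputs are hypothetical and do not correspond to any existing contract or tooling.

```python
from collections import Counter


def node_count_lower_bound(staked, queued_exits, now, horizon, notice_period):
    """Lower-bound the staked node count per neighbourhood at time `horizon`.

    staked:        dict mapping neighbourhood -> currently staked node count
    queued_exits:  list of (neighbourhood, executable_at) pairs for exits
                   that have already been enqueued
    notice_period: the wait N between enqueueing and executing an exit

    An exit that has not been enqueued by `now` cannot execute before
    now + notice_period, so within that window only queued exits count.
    """
    if horizon > now + notice_period:
        raise ValueError("no bound is available beyond now + notice_period")
    leaving = Counter(
        nb for nb, executable_at in queued_exits if executable_at <= horizon
    )
    # New nodes may join in the meantime, so this is only a lower bound.
    return {nb: count - leaving[nb] for nb, count in staked.items()}
```

For example, with `staked = {"a": 5, "b": 3}`, queued exits `[("a", 10), ("a", 40), ("b", 20)]`, `now = 0` and a 28 day notice period, the bound at day 28 is `{"a": 4, "b": 2}`: the exit from `a` that becomes executable on day 40 falls outside the window and does not count.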

There is also a risk effect that increases with longer notice periods:

  • Dirty exit. If the exit wait is too long, operators may choose to turn off nodes without respecting the withdrawal notice period, forgoing expected revenue to which they would otherwise be entitled. This risk increases as $N$ increases, with the extreme being the status quo $N=\infty$ in which only dirty exits are possible. Dirty exit risk could likely be counteracted by introducing explicit penalties for missed rounds.

@awmacpherson

Author

Just to make things a bit more concrete, here are some candidates for risk signals that we can monitor to evaluate our choice of wait period:

  • Worst case service quality. Smallest number of staked nodes in any neighbourhood.
  • Reaction to market signals. Monitor the relationship, at various lags, between operator behaviour (especially node exits) and market signals such as the price of BZZ or Swarm network revenue.
  • Repair events. For each replication rate $N\leq 4$, monitor events in which the number of staked nodes in a given neighbourhood drops below $N$. For each such event, record the time to repair (TTR), that is, the length of time before the node count in that neighbourhood returns to $N$.
  • Preemptive repair events. As above, but measure the exit time from when the exit is enqueued rather than executed. Identify events in which repair occurs — i.e. the node count increases — before the exit is executed.
  • Dirty exits. Monitor nodes that trigger an exit but miss reveals before the wait period has elapsed.
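
As a rough sketch of how the repair events signal could be computed, the following takes a time series of staked node counts for a single neighbourhood and extracts each drop below a threshold together with its time to repair. The function name `repair_events` and the input format are assumptions made for illustration, not part of any existing monitoring tool.

```python
def repair_events(samples, threshold):
    """Find repair events in one neighbourhood's node-count history.

    samples:   list of (timestamp, node_count) pairs, sorted by timestamp
    threshold: the replication rate being monitored

    Returns a list of (start, ttr) pairs, where `start` is the timestamp at
    which the count first dropped below `threshold` and `ttr` is the time
    until it returned to at least `threshold`. `ttr` is None if the count
    had not recovered by the end of the data.
    """
    events = []
    start = None  # timestamp at which the current gap opened, if any
    for ts, count in samples:
        if count < threshold and start is None:
            start = ts
        elif count >= threshold and start is not None:
            events.append((start, ts - start))
            start = None
    if start is not None:
        events.append((start, None))
    return events
```

The preemptive variant would use the same loop but take `start` from the time the exit was enqueued rather than executed, which requires exit queue data in addition to node counts.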
