Slashing penalty analysis; EIP-7251

by mike & barnabé
august 30, 2023
The primary concern around the proposal to increase the MAXIMUM_EFFECTIVE_BALANCE (EIP-7251) is the increased slashing risk taken on by validators with large effective balances. This doc explores the 4 types of penalties incurred by a slashed validator and how these penalties scale with effective balance.

We propose
(i) the initial penalty is fixed to a constant amount or modified to scale sublinearly, and
(ii) the correlation penalty is modified to scale quadratically rather than linearly.

Conversely, we suggest leaving the attestation and inactivity leak penalties unchanged.

The code used to generate the figures in this article is available here. Thanks to Ben Edgington for his ETH2 Book and Vitalik for his Annotated spec – both of which serve as excellent references.
Validator rewards are computed to scale with validator balances, such that an entity with half the stake receives half the rewards. This fairness principle is important to facilitate economic decentralization and avoid economies of scale. However, validator penalties need not exhibit the same dynamics. Today's penalties are designed around two goals: (i) making it possible to slash the entire balance of every validator participating in an attack involving 1/3+ of the stake, and (ii) preventing the strategic use of intentional slashing to avoid incurring inactivity penalties. Slashed amounts function as a credible threat: while it is important to levy high penalties for critical attacks, we also don't want unnecessarily high risks to deter socially beneficial participation in protocol changes (e.g., validator balance consolidation).

  1. Initial slashing penalty
  2. Correlation penalty
  3. Attestation penalty
  4. Inactivity leak penalty

Related work

  • proposal – initial post
  • diff pr – diff-view PR showing the proposed spec change
  • security considerations doc – analysis of the security implications of this change
  • eip pr – EIP-7251 open PR
  • faq doc – FAQ on the proposal
  • responses to questions – Q&A based on questions from Lido

Many thanks to Terence, Vitalik, Mikhail, Lion, stokes and Izzy for relevant discussions.

1. Initial slashing penalty

When a validator is slashed, slash_validator applies an initial penalty proportional to the effective balance of the validator.

def slash_validator(state: BeaconState, slashed_index: ValidatorIndex, ...) -> None:
    validator = state.validators[slashed_index]
    slashing_penalty = validator.effective_balance // MIN_SLASHING_PENALTY_QUOTIENT_BELLATRIX
    decrease_balance(state, slashed_index, slashing_penalty)

With MIN_SLASHING_PENALTY_QUOTIENT_BELLATRIX=32, validators with an effective balance of 32 ETH (the current MAX_EFFECTIVE_BALANCE) are slashed exactly 1 ETH. Without changing this function, the slashing penalty of a validator with an effective balance of 2048 ETH (the proposed new MAX_EFFECTIVE_BALANCE), would be 64 ETH. The initial slashing penalty scales linearly with the effective balance of the validator, making it inherently more risky to run validators with larger effective balances. We could simply make this initial penalty constant.
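For concreteness, the linear rule can be sketched as a minimal standalone model (hypothetical helper name, Gwei-denominated as on the beacon chain; not the actual spec function):

```python
# Sketch of the current (linear) initial slashing penalty.
MIN_SLASHING_PENALTY_QUOTIENT_BELLATRIX = 32
GWEI_PER_ETH = 10**9

def initial_penalty_gwei(effective_balance_gwei: int) -> int:
    """Linear rule: the penalty grows 1:1 with effective balance."""
    return effective_balance_gwei // MIN_SLASHING_PENALTY_QUOTIENT_BELLATRIX

# A 32 ETH validator loses exactly 1 ETH; a 2048 ETH validator would lose 64 ETH.
assert initial_penalty_gwei(32 * GWEI_PER_ETH) == 1 * GWEI_PER_ETH
assert initial_penalty_gwei(2048 * GWEI_PER_ETH) == 64 * GWEI_PER_ETH
```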

def slash_validator(state: BeaconState, slashed_index: ValidatorIndex, ...) -> None:
-    slashing_penalty = validator.effective_balance // MIN_SLASHING_PENALTY_QUOTIENT_BELLATRIX
+    slashing_penalty = MIN_SLASHING_PENALTY
     decrease_balance(state, slashed_index, slashing_penalty)

Here, with MIN_SLASHING_PENALTY=1, we ensure that everyone is slashed exactly 1 ETH for the initial penalty. If we decide that a constant penalty is insufficient, there are various sublinear functions we could use to make the initial penalty less punitive but still monotone increasing in the effective balance. Let EB denote the effective balance of a validator. For the range EB in [32, 2048], consider the family of polynomials

\text{Initial Penalty} = \frac{EB^{\text{pow}}}{32}.

The figure below shows a few examples for values of pow ≤ 1.

On inspection, if we were to choose from this family of non-constant functions, pow = 3/4 or pow = 7/8 seem to provide a reasonable compromise: they still punish larger validators while reducing their absolute risk significantly.
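A quick sketch of this family, assuming EB is denominated in ETH (hypothetical helper name, floating point for illustration only):

```python
# Initial Penalty = EB**pow / 32, with EB in ETH (pow = 1 recovers today's rule).
def initial_penalty_eth(eb_eth: float, pow_: float) -> float:
    return eb_eth**pow_ / 32

# pow = 1 reproduces the linear rule: 64 ETH for a 2048 ETH validator.
assert initial_penalty_eth(2048, 1.0) == 64.0
# Sublinear exponents shrink the penalty for large validators:
# pow = 3/4 gives roughly 9.5 ETH, pow = 7/8 roughly 24.7 ETH, for EB = 2048.
assert 9.4 < initial_penalty_eth(2048, 3 / 4) < 9.6
assert 24.5 < initial_penalty_eth(2048, 7 / 8) < 24.8
```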

2. Correlation penalty

The correlation penalty is applied at the halfway point of the slashed validator's withdrawability period (EPOCHS_PER_SLASHINGS_VECTOR / 2 = 4096 epochs, about 18 days, after the slashing). The process_slashings function applies this penalty.

def process_slashings(state: BeaconState) -> None:
    epoch = get_current_epoch(state)
    total_balance = get_total_active_balance(state)
    adjusted_total_slashing_balance = min(
        sum(state.slashings) * PROPORTIONAL_SLASHING_MULTIPLIER_BELLATRIX,  # [Modified in Bellatrix]
        total_balance
    )
    for index, validator in enumerate(state.validators):
        if validator.slashed and epoch + EPOCHS_PER_SLASHINGS_VECTOR // 2 == validator.withdrawable_epoch:
            increment = EFFECTIVE_BALANCE_INCREMENT  # Factored out from penalty numerator to avoid uint64 overflow
            penalty_numerator = validator.effective_balance // increment * adjusted_total_slashing_balance
            penalty = penalty_numerator // total_balance * increment
            decrease_balance(state, ValidatorIndex(index), penalty)

Let EB denote the “effective balance” of the validator, SB denote the “slashable balance” (the correlated ETH that is slashable), and TB denote the “total balance” of the beacon chain, then

\text{Correlation Penalty} = \frac{3 \cdot EB \cdot SB}{TB}.

If 1/3 of the total stake is slashable, the penalty equals the effective balance of the validator;

SB = TB/3 \implies \text{Correlation Penalty} = EB.

On the other hand, because the penalty is calculated with integer division, we have

3 \cdot EB \cdot SB < TB \implies \text{Correlation Penalty} = 0.

This implies that for isolated slashing events, the correlation penalty is zero. Putting some numbers to this, we currently have 24 million ETH staked. This implies that even a 2048 ETH validator that is slashed in isolation would not have any correlation penalty because

3 \cdot 2048 \cdot 2048 = 1.2582912 \times 10^7 < 2.4 \times 10^7.
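This zero-penalty behavior follows directly from the spec's integer arithmetic, as the following sketch shows (hypothetical helper name; constants as in the Bellatrix spec, amounts in Gwei):

```python
# Why an isolated slashing yields a zero correlation penalty today.
INCREMENT = 10**9        # EFFECTIVE_BALANCE_INCREMENT, in Gwei
MULTIPLIER = 3           # PROPORTIONAL_SLASHING_MULTIPLIER_BELLATRIX

def correlation_penalty(eb_gwei: int, slashed_sum_gwei: int, total_gwei: int) -> int:
    adjusted = min(slashed_sum_gwei * MULTIPLIER, total_gwei)
    penalty_numerator = eb_gwei // INCREMENT * adjusted
    # Integer division truncates to zero whenever 3*EB*SB < TB.
    return penalty_numerator // total_gwei * INCREMENT

eth = 10**9
# A lone 2048 ETH validator slashed against 24M ETH of total stake: penalty is 0.
assert correlation_penalty(2048 * eth, 2048 * eth, 24_000_000 * eth) == 0
# With 1M ETH slashable, the 2048 ETH validator loses 3*2048*1e6/24e6 = 256 ETH.
assert correlation_penalty(2048 * eth, 1_000_000 * eth, 24_000_000 * eth) == 256 * eth
```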

The figure below shows the correlation penalty as a function of the proportion of slashable ETH for different validator sizes.


The rate at which the correlation penalty increases for large validators is higher because the validator effective balance is the coefficient of the linear function on the ratio of SB and TB. Notice that these linear functions must have the property that when the proportion of ETH slashed is 1/3, the penalty is the entire validator balance. We could leave this as is, but if we wanted to encourage more validator consolidation, we could reduce the risk of larger correlation penalties for validators with higher effective balances. Consider the following new correlation penalty:

\text{New Correlation Penalty} = \frac{3^2 \cdot EB \cdot SB^2}{TB^2}.

Notice that this still has the property

SB = TB/3 \implies \text{New Correlation Penalty} = \frac{3^2 \cdot EB \cdot (TB/3)^2}{TB^2} = EB.

The penalty scales quadratically rather than linearly. The plot below demonstrates this.


Now the validator effective balance is the coefficient of the quadratic. The important point is that at 1/3-slashable ETH, the penalty is the full effective balance. Under this scheme, the consolidated validator faces less risk than in the linearly scaling penalty of today. The figure below demonstrates this point.


At every proportion of ETH slashed below 1/3, the quadratic penalty (solid line) results in less ETH lost than the linear penalty (the dashed line). We can also take a zoomed-in look at the penalties when smaller amounts of ETH are slashed. The figure below shows the penalties for different validator sizes under both the linear and quadratic schemes for up to 500 * 2048 = 1,024,000 ETH slashed in the correlation period.
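The two schemes can be compared directly in a small sketch (hypothetical helper names; EB in ETH, slashable stake expressed as a fraction of TB, capped at the full balance):

```python
# Linear (today) vs. quadratic (proposed) correlation penalties.
def linear_penalty(eb: float, slashed_fraction: float) -> float:
    return min(eb, 3 * eb * slashed_fraction)

def quadratic_penalty(eb: float, slashed_fraction: float) -> float:
    return min(eb, 9 * eb * slashed_fraction**2)

eb = 2048
# At 1/3 of the stake slashable, both schemes burn the full effective balance.
assert abs(linear_penalty(eb, 1 / 3) - eb) < 1e-9
assert abs(quadratic_penalty(eb, 1 / 3) - eb) < 1e-9
# Below 1/3, the quadratic penalty is strictly smaller, e.g. at 10% slashed.
assert quadratic_penalty(eb, 0.10) < linear_penalty(eb, 0.10)
```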


This figure shows that all validator sizes have less correlation risk under the quadratic scaling. The diff below shows the proposed modified process_slashings function.

def process_slashings(state: BeaconState) -> None:
    ...
-    penalty_numerator = validator.effective_balance // increment * adjusted_total_slashing_balance
-    penalty = penalty_numerator // total_balance * increment
+    penalty_numerator = validator.effective_balance // increment * adjusted_total_slashing_balance**2
+    penalty = penalty_numerator // total_balance**2 * increment
     decrease_balance(state, ValidatorIndex(index), penalty)

3. Attestation penalty

When a validator is slashed, their attestations are no longer considered valid. They are penalized as if they went offline for the 8192 epochs until they become withdrawable. Attestations contain “source”, “target”, and “head” votes. In get_flag_index_deltas the penalties are applied only for the “source” and “target” votes. The relative weights of these votes are specified here. We care about TIMELY_SOURCE_WEIGHT=14, TIMELY_TARGET_WEIGHT=26, and WEIGHT_DENOMINATOR=64. For each of the EPOCHS_PER_SLASHINGS_VECTOR=8192 epochs, the slashed validator will be penalized

\text{Epoch penalty} = \frac{\text{base reward} \cdot EB \cdot (14 + 26)}{64}

Here the “base reward” is defined in get_base_reward as

\text{base reward} = \frac{\text{increment} \cdot 64}{\lfloor\sqrt{TB}\rfloor}

where TB is denominated in Gwei. With TB at 24 million ETH (2.4 \times 10^{16} Gwei), we have \lfloor\sqrt{TB}\rfloor \approx 1.549 \times 10^8, which gives a base reward of about 413 GWEI per 1 ETH increment of effective balance. For a 32 ETH validator, we have

\text{total attestation penalty (32 ETH Val)} = 8192 \cdot \frac{413 \cdot 32 \cdot 40}{64} \approx 6.767 \times 10^7 \;\text{GWEI} \approx 0.06767 \;\text{ETH}

For a full 2048 ETH validator, the attestation penalty just scales linearly with the effective balance, so we have

\text{total attestation penalty (2048 ETH Val)} \approx 64 \cdot 0.06767 \;\text{ETH} \approx 4.331 \;\text{ETH}
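The arithmetic above can be reproduced in a few lines (hypothetical helper name; the 413 Gwei base reward per increment is taken from the ~24M ETH staked figure in the text):

```python
# Total attestation penalty over the slashing period, in Gwei.
EPOCHS_PER_SLASHINGS_VECTOR = 8192
TIMELY_SOURCE_WEIGHT, TIMELY_TARGET_WEIGHT, WEIGHT_DENOMINATOR = 14, 26, 64
BASE_REWARD_PER_INCREMENT = 413   # Gwei, assuming ~24M ETH of total stake

def total_attestation_penalty(eb_eth: int) -> int:
    # Missed source + target rewards each epoch, for EB in 1 ETH increments.
    per_epoch = (BASE_REWARD_PER_INCREMENT * eb_eth
                 * (TIMELY_SOURCE_WEIGHT + TIMELY_TARGET_WEIGHT)) // WEIGHT_DENOMINATOR
    return EPOCHS_PER_SLASHINGS_VECTOR * per_epoch

assert total_attestation_penalty(32) == 67_665_920          # ~0.0677 ETH
assert total_attestation_penalty(2048) == 64 * total_attestation_penalty(32)
```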

We don’t think this penalty needs to change because it is still relatively small. However, we could consider reducing the attestation penalties by modifying the number of epochs that we consider the validator “offline”. These penalties ensure that it is never worth it to self-slash intentionally to avoid inactivity penalties. Thus as long as the penalty is larger than what an unslashed, exiting, offline validator would pay, we don’t change the security model.

4. Inactivity leak penalty

If the chain is in an “inactivity leak” state, where we have not finalized for MIN_EPOCHS_TO_INACTIVITY_PENALTY=4 epochs (see is_in_inactivity_leak), there is an additional set of penalties levied against the slashed validator. Fully online validators should earn exactly 0 rewards, while any offline validators will start to leak some of their stake. This enables the chain to reset by increasing the relative weight of online validators until it can finalize again. Since a slashed validator appears “offline” to the chain, the inactivity leak can significantly punish them for not fulfilling their duties.

Inactivity leak penalties are calculated in get_inactivity_penalty_deltas, which is included below.

def get_inactivity_penalty_deltas(state: BeaconState) -> Tuple[Sequence[Gwei], Sequence[Gwei]]:
    """
    Return the inactivity penalty deltas by considering timely target participation flags and inactivity scores.
    """
    rewards = [Gwei(0) for _ in range(len(state.validators))]
    penalties = [Gwei(0) for _ in range(len(state.validators))]
    previous_epoch = get_previous_epoch(state)
    matching_target_indices = get_unslashed_participating_indices(state, TIMELY_TARGET_FLAG_INDEX, previous_epoch)
    for index in get_eligible_validator_indices(state):
        if index not in matching_target_indices:
            penalty_numerator = state.validators[index].effective_balance * state.inactivity_scores[index]
            # [Modified in Bellatrix]
            penalty_denominator = config.INACTIVITY_SCORE_BIAS * INACTIVITY_PENALTY_QUOTIENT_BELLATRIX
            penalties[index] += Gwei(penalty_numerator // penalty_denominator)
    return rewards, penalties

A few notable constants:

  • INACTIVITY_SCORE_BIAS=4 (see here)
  • INACTIVITY_SCORE_RECOVERY_RATE=16
  • INACTIVITY_PENALTY_QUOTIENT_BELLATRIX=2**24

The penalty_numerator is the product of the effective balance of the validator and their “inactivity score”. See Vitalik’s annotated spec for more details about the inactivity scoring. The inactivity score of each validator is updated in process_inactivity_updates.

def process_inactivity_updates(state: BeaconState) -> None:
    # Skip the genesis epoch as score updates are based on the previous epoch participation
    if get_current_epoch(state) == GENESIS_EPOCH:
        return
    for index in get_eligible_validator_indices(state):
        # Increase the inactivity score of inactive validators
        if index in get_unslashed_participating_indices(state, TIMELY_TARGET_FLAG_INDEX, get_previous_epoch(state)):
            state.inactivity_scores[index] -= min(1, state.inactivity_scores[index])
        else:
            state.inactivity_scores[index] += config.INACTIVITY_SCORE_BIAS
        # Decrease the inactivity score of all eligible validators during a leak-free epoch
        if not is_in_inactivity_leak(state):
            state.inactivity_scores[index] -= min(config.INACTIVITY_SCORE_RECOVERY_RATE, state.inactivity_scores[index])

During an inactivity leak period, a slashed validator will have their inactivity score incremented by 4 points every epoch. Each point is a pseudo “1 ETH” of additional effective balance to increase the punishment against offline validators. The table below shows varying-length inactivity leak penalties for differing validator sizes. The penalties scale linearly with the validator’s effective balance.

| validator size | 16 epoch leak | 128 epoch leak | 1024 epoch leak |
|----------------|---------------|----------------|-----------------|
| 32 ETH         | 0.000259 ETH  | 0.0157 ETH     | 1.00 ETH        |
| 256 ETH        | 0.00208 ETH   | 0.126 ETH      | 8.01 ETH        |
| 2048 ETH       | 0.0166 ETH    | 1.01 ETH       | 64.1 ETH        |

These penalties feel pretty well contained for large validators, so we propose not modifying them because the leak is already relatively gradual.
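The table values can be roughly reproduced with a small simulation (hypothetical helper name; assumes the validator is "offline" for the whole leak, starting from an inactivity score of zero that grows by INACTIVITY_SCORE_BIAS each epoch):

```python
# Approximate total inactivity-leak penalty for a slashed validator, in Gwei.
INACTIVITY_SCORE_BIAS = 4
INACTIVITY_PENALTY_QUOTIENT_BELLATRIX = 2**24

def leak_penalty_gwei(eb_gwei: int, leak_epochs: int) -> int:
    denom = INACTIVITY_SCORE_BIAS * INACTIVITY_PENALTY_QUOTIENT_BELLATRIX
    score, total = 0, 0
    for _ in range(leak_epochs):
        score += INACTIVITY_SCORE_BIAS   # slashed => counted as inactive each epoch
        total += eb_gwei * score // denom
    return total

# Matches the 32 ETH / 16-epoch cell of the table above (~0.000259 ETH).
assert 255_000 < leak_penalty_gwei(32 * 10**9, 16) < 263_000
```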


No free lunch – a new inclusion list design

by vitalik & mike
august 15, 2023
tl;dr: The free data availability problem is the core limitation of many inclusion list instantiations. We outline the mechanics of a new design under which the inclusion list is split into a Summary, which the proposer signs over, and a list of Txns, which remain unsigned. By walking through the lifecycle of this new inclusion list, we show that the free data availability problem is solved, while the commitments of the inclusion list are enforceable by the state-transition function. We conclude by modifying the design to be more data efficient.

  1. The free data availability problem
  2. Core mechanics

  3. Solving the data efficiency problem

  4. FAQ
  5. Appendix 1: Rebuilder encoding strategy
  6. Appendix 2: ReducedSummary stuffing

Related work

Acronyms & abbreviations

  • IL – inclusion list
  • DA – data availability

Many thanks to Justin and Barnabé for comments on the draft. Additional thanks to Jon, Hasu, Tomasz, Chris, Toni, Terence, Potuz, Dankrad, and Danny for relevant discussions.

1. The free data availability problem

As outlined in Vitalik’s State of research piece, one of the key desiderata of an anti-censorship scheme is not providing free data availability (abbr. DA). Francesco’s Forward Inclusion List proposal addresses this by not incorporating data about the inclusion list (abbr. IL) into any block. The slot n IL is enforced by the slot n+1 attesting committee based on their local view of the p2p data. While this is an elegant solution that eliminates the free DA problem, it is a subjective enforcement of the IL. A non-conformant block can still become canonical if, for example, the slot n+1 attesters collude to censor by pretending to not see the IL on time. Additionally, it adds another sub-slot synchrony point to the protocol, as a deadline for the availability of the IL must be set.

Ideally, we want objective enforcement of the IL. It should be impossible to produce a valid block that doesn't conform to the constraints set out in the IL. The naïve solution is to place the IL into the block body for slot n, allowing the slot n+1 attesters to use the data as part of their state-transition function. This is objective because any block that doesn't conform to the IL would be seen as invalid, and thus could not become canonical. Unfortunately, this idea falls victim to the free DA problem.

The key issue here is that a proposer must be able to commit to their IL before seeing the contents of their block. The reason is simple: in proposer-builder separation (PBS) schemes (mev-boost today, potentially ePBS in the future), the proposer has to commit to their block before receiving its contents to protect the builder from MEV stealing. Because the proposer blindly commits to their block, we cannot enforce that all of the transactions in the IL are valid after the slot n payload is executed. The figure below depicts an example:

Here the proposer commits to an IL which includes txn 0xef, which is from: b with nonce: 7. Unfortunately (or perhaps intentionally), the payload for their slot includes txn 0xde, which is also from: b with nonce: 7. Thus txn 0xef is no longer valid and won't pay gas, even if it is much larger than txn 0xde. Getting txn 0xef into the IL but not the block itself may offer extreme gas savings, because the originator never pays for the calldata stored with the transaction. However, since the IL is part of the state-transition function, the transaction must remain available in the chain history.

(Observation 1) Any inclusion list scheme that

  • allows proposers to commit to specific transactions before seeing their payload, and
  • relies on the state-transition function to enforce the IL commitments,

admits free DA.

The reasoning here is quite simple – if the conditions of (Obs. 1) are met, the contents of the IL transactions must be available to validate the block. Even if the block only committed to a hash of the IL transaction, we still need to see the full transaction in the clear for the state-transition function to be deterministic.

2. Core mechanics

To solve the free DA problem, we begin by specifying the building blocks of the new IL design and lifecycle, which is split into the construction, inclusion, and validation phases.


  • slot n pre-state – The execution-layer state before the slot n payload is executed (i.e., the state based on the parent block).
  • slot n post-state – The execution-layer state after the slot n payload is executed.
  • InclusionList (abbr. IL) – The transactions and associated metadata that a slot n proposer constructs to enforce validity conditions on the slot n+1 block. The IL is decomposed into two, equal-length lists – Summary and Txns.
  • Summary – A list of (address, gasLimit) pairs, which specify the from and gasLimit of each transaction in Txns. Each pair is referred to as an Entry.
  • Txns – A list of full transactions corresponding to the metadata specified in the Summary. These transactions must be valid in the slot n pre-state and have a maxFeePerGas greater than the slot n block base fee times 1.125 (to account for the possible base fee increase in the next block).
  • Entry – A specific (address, gasLimit) element in the Summary. An Entry represents a commitment that a transaction from address will be included in either slot n or n+1, unless the remaining gas in the slot n+1 payload is less than gasLimit.
  • Entry satisfaction – An Entry can be satisfied in one of three ways:
    1. a transaction from address is included in the slot n payload,
    2. a transaction from address is included in the slot n+1 payload, or
    3. the gas remaining (i.e., the block.gasLimit minus gas used) in the slot n+1 payload is less than the gasLimit.
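The three satisfaction cases above can be sketched as a small predicate (hypothetical names and types; a block's transactions are represented by their sender addresses only):

```python
# Sketch of Entry satisfaction: any one of the three cases suffices.
from typing import List, Tuple

Entry = Tuple[str, int]   # (address, gasLimit)

def entry_satisfied(entry: Entry,
                    slot_n_senders: List[str],
                    slot_n1_senders: List[str],
                    slot_n1_gas_remaining: int) -> bool:
    address, gas_limit = entry
    return (address in slot_n_senders              # case 1: included in slot n
            or address in slot_n1_senders          # case 2: included in slot n+1
            or slot_n1_gas_remaining < gas_limit)  # case 3: slot n+1 block too full

# (0xb, 2) is satisfied by a slot n+1 transaction from 0xb:
assert entry_satisfied(("0xb", 2), ["0xa", "0xc"], ["0xb"], 10_000_000)
# With no matching transaction and plenty of gas remaining, the Entry is unmet:
assert not entry_satisfied(("0xb", 2), ["0xa"], ["0xc"], 10_000_000)
```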

(Observation 2) A transaction that is valid in the slot n pre-state will be invalid in the slot n post-state if

  • the slot n payload includes at least one transaction from the same address (nonce reuse) or
  • the maxFeePerGas is less than the base fee of the subsequent block.

While transactions may fail for exogenous reasons (e.g., the price on a uniswap pool moving outside of the slippage set by the original transaction), they remain valid.
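The base-fee half of (Obs. 2) reduces to a simple integer check, sketched below (hypothetical helper name; 1.125 = 9/8 is the maximum per-block base fee increase under EIP-1559):

```python
# An IL transaction must be able to pay the base fee even if it rises the
# maximum 12.5% in the next block, i.e. maxFeePerGas >= base_fee * 9/8.
def fee_valid_for_il(max_fee_per_gas: int, current_base_fee: int) -> bool:
    return max_fee_per_gas * 8 >= current_base_fee * 9

assert fee_valid_for_il(45, 40)        # 45 >= 40 * 1.125
assert not fee_valid_for_il(44, 40)    # 44 <  40 * 1.125
```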

Inclusion list lifecycle

We now present the new IL design (this is a slightly simplified version – we add a few additional features later). Using slot n as the starting point, we split the IL lifecycle into three phases. The slot n proposer performs the construction, the slot n+1 proposer does the inclusion, and the entire network does the validation. Each phase is detailed below.

  1. Construction – The proposer for slot n constructs at least one IL = Summary + Txns, and signs the Summary (the fact that the proposer can construct multiple ILs is important).

    • The transactions in Txns must be valid based on the slot n pre-state (and have a high enough maxFeePerGas), but the proposer does not sign over them.
    • The proposer then gossips an object containing:
      1. their SignedBeaconBlock, and
      2. their IL = Summary (signed) + Txns (unsigned).
    • Both the block and an IL must be present in the validator’s view to consider the block as eligible for the fork-choice rule.
  2. Inclusion – The proposer for slot n+1 creates a block that conforms to a Summary that they have observed (there must be at least one for them to build on that block).

    • The slot n+1 block includes a slot n Summary along with the signature from the slot n proposer.
  3. Validation – The network validates the block using the state-transition function.

    • Each Entry in the Summary must be satisfied for the block to be valid.
    • The signature over the Summary must be valid.

Wait… that’s it? yup :slight_smile: (well this solves the free DA problem – we introduce a few extra tricks later, but this is the gist of it). The figure below shows the construction and inclusion stages.


  • The slot n proposer signs the Summary=[(0xa, 3), (0xb, 2), (0xc, 7)] and broadcasts it along with Txns = [txn a, txn b, txn c].
  • The slot n payload includes txn c and txn a (order doesn’t matter). These transactions satisfy (0xc, 7) and (0xa, 3) respectively.
  • The slot n+1 proposer sees that the only entry that they need to satisfy in their payload is (0xb, 2), which they do by including txn b.
  • The rest of the network checks that each Entry is satisfied and that the signature over the Summary is valid.

Validators require that there exists at least one valid IL before they consider the block for the fork-choice rule. If a malicious proposer publishes a block without a corresponding IL=Summary+Txns, the honest attesters in their slot (and future slots) will vote against the block because they don’t have an available IL.

How does that solve the free DA problem?

Two important facts allow this scheme to avoid admitting free DA.

  1. Potential for multiple ILs. Since the proposer doesn’t include anything about their IL in their block, they can create multiple without committing a proposer equivocation.
  2. Reduced specificity of the IL commitments. The Summary can be satisfied by a transaction in either the slot n or the slot n+1 payload and the transaction that satisfies a specific Entry in the Summary needn’t be the same transaction that accompanied the Summary in the Txns list.

By signing over the list of (address, gasLimit) pairs, the proposer is saying: "I commit that during slot n or slot n+1, a transaction from address will be included, or the remaining gas in the slot n+1 payload will be less than gasLimit."

By not committing to a specific set of transactions, the slot n proposer gives the network deniability. This concept relates to cryptographic deniability in that validators can deny having received a transaction without an adversary being able to disprove that. This property follows from the observation below.

(Observation 3) The only way to achieve free DA is by sending multiple transactions from the same address with the same nonce.

Recall that the free DA problem arises when a transaction that was valid in the slot n pre-state is no longer valid in the slot n post-state but is still committed to in the inclusion list. From (Obs. 2), the only way this can happen is through nonce reuse (the base fee is covered by requiring the transactions to have 12.5% higher maxFeePerGas than the current block base fee). This leads to the final observation.

(Observation 4) If txn b aims to achieve free DA, then there exists a txn a such that txn a satisfies the same Entry in the Summary as txn b. Thus validators can safely deny having seen txn b, because they can claim to have seen txn a instead.

In other words, validators don’t have to store the contents of any transactions that don’t make it on chain, and the state-transition function is still deterministic. The figure below encapsulates this deniability.


  • The slot n proposer creates their Block and Summary. They notice that txn a and txn b are both valid at the slot n pre-state and both satisfy the Entry (0xa, 3).
  • The slot n proposer distributes two versions of Txns, one with each transaction.
  • The slot n+1 proposer sees that txn b is invalid in the slot n post state (because txn a, which is also from the 0xa address, is in the slot n payload).
  • The slot n+1 proposer constructs their block with the Summary, but safely drops txn b, because txn a satisfies the Entry.
  • The slot n+1 attester includes the block in their view because they have seen a valid IL (where the Txns object contains txn a).
  • The slot n+1 attester verifies the signature and confirms that the Entry is satisfied.
  • Key point – the slot n+1 attester never saw txn b, but they are still able to verify the slot n+1 block. This implies that the attester can credibly deny having txn b available.

Thus txn b can safely be dropped from the beacon nodes because it is not needed for the state-transition function; txn b is no longer available. This example is slightly simplified in that txn a and txn b are both satisfied by the same Entry in the Summary (meaning they have the same gasLimit). With different gasLimit values, the slot n proposer would need to create and sign multiple Summary objects, which is fine because the Summary is not part of their block.

3. Solving the data efficiency problem

The design above solves the free DA problem, but it introduces a new (smaller) problem around data efficiency. The slot n+1 proposer includes the entire slot n Summary in their block. With 30M gas available and the minimum transaction consuming 21k gas, a block could have up to 1428 transactions. Thus the Summary could have 1428 entries, each of which consumes 20 (address) + 1 (gasLimit) bytes (using a single byte to represent the gasLimit in the Summary). This implies that the Summary could be up to 29988 bytes, which is a lot of additional data for each block. Based on the fact that each Entry in the Summary is either satisfied in slot n or slot n+1, we decompose the Summary object into two components:

  • ReducedSummary – the remaining Entry values that are not satisfied by the slot n payload, and
  • Rebuilder – an array used to indicate which transactions in the slot n payload satisfy Entry values in the original Summary.

The slot n+1 proposer only needs to include the ReducedSummary and the Rebuilder for the rest of the network to reconstruct the full Summary. With the full Summary, the slot n proposer signature can be verified as part of the slot n+1 block validation.
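The worst-case sizing claimed above is simple arithmetic (constant names here are illustrative, not spec identifiers):

```python
# Worst-case Summary size: one Entry per minimal (21k gas) transaction.
BLOCK_GAS_LIMIT = 30_000_000
MIN_TX_GAS = 21_000
ENTRY_BYTES = 20 + 1     # 20-byte address + single-byte gasLimit

max_entries = BLOCK_GAS_LIMIT // MIN_TX_GAS
assert max_entries == 1428
assert max_entries * ENTRY_BYTES == 29_988   # bytes in the worst-case Summary
```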

What the heck is the “Rebuilder”?

The Rebuilder is a (likely sparse) array with the same length as the number of transactions in the slot n payload. For each index i:

  • Rebuilder[i] = 0 implies that the ith transaction of the slot n payload can be ignored.
  • Rebuilder[i] = x, where x != 0, implies that the ith transaction of the slot n payload corresponds to an Entry in the signed Summary, where x indicates the gasLimit from the original Entry.

Now the algorithm to reconstruct the original Summary is as follows:

ReconstructedEntries = []
for i in range(len(Rebuilder)):
    if Rebuilder[i] != 0:
        # Recover the Entry from the i-th slot n transaction's sender and the stored gasLimit
        ReconstructedEntries.append((Payload[i].sender, Rebuilder[i]))
Summary = sorted(ReducedSummary + ReconstructedEntries)

The Summary needs some deterministic order to verify the slot n proposer signature; the easiest solution is to sort based on the address of each Entry. We can further reduce the amount of data in the Rebuilder by representing the gasLimit with a uint8 rather than a full uint32.
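A worked example of the reconstruction, with hypothetical data matching the running (0xa, 3), (0xb, 2), (0xc, 7) example:

```python
# Rebuilding the full Summary from the ReducedSummary and the Rebuilder.
payload_senders = ["0xc", "0xd", "0xa"]   # slot n payload senders, in order
rebuilder       = [7, 0, 3]               # stored gasLimits; 0 means "ignore"
reduced_summary = [("0xb", 2)]            # entries left to satisfy in slot n+1

reconstructed = [(payload_senders[i], rebuilder[i])
                 for i in range(len(rebuilder)) if rebuilder[i] != 0]
# Sort by address to get the deterministic order used for signature checks.
summary = sorted(reduced_summary + reconstructed)

assert summary == [("0xa", 3), ("0xb", 2), ("0xc", 7)]
```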

Inclusion list lifecycle (revisited)

The IL lifecycle largely remains the same, but it is probably worth revisiting it with the addition of the ReducedSummary and Rebuilder.

  1. (unchanged) Construction – The proposer for slot n constructs at least one IL = Summary + Txns, and signs the Summary (the fact that the proposer can sign multiple ILs is important).
    • The transactions in Txns must be valid based on the slot n pre-state (and have a high enough maxFeePerGas), but the proposer does not sign over them.
    • The proposer then gossips an object containing:
      1. their SignedBeaconBlock, and
      2. their IL = Summary (signed) + Txns (unsigned).
    • Both the block and an IL must be present in the validator’s view to consider the block as eligible for the fork-choice rule.
  2. (changed) Inclusion – The proposer for slot n+1 creates a block that conforms to the Summary they have observed.
    • They construct the ReducedSummary and Rebuilder based on the slot n payload.
    • The block includes the ReducedSummary, Rebuilder, and the original signature from the slot n proposer.
  3. (changed) Validation – The network validates the block using the state-transition function.
    • The full Summary is reconstructed using the ReducedSummary and the Rebuilder.
    • The slot n proposer signature is verified against the full Summary.
    • Each Entry in the Summary must be satisfied for the block to be valid.

The figure below demonstrates this process.


  • (unchanged) The slot n proposer signs the Summary=[(0xa, 3), (0xb, 2), (0xc, 7)] and broadcasts it along with Txns = [txn a, txn b, txn c], which must be valid in the slot n pre-state.
  • (unchanged) The slot n payload includes txn c and txn a (order doesn’t matter).
  • (changed) The slot n+1 proposer sees that entries 0 and 2 in the Summary are satisfied, so they make ReducedSummary=[(0xb, 2)]. This is the only entry that they need to satisfy in slot n+1, which they do by including txn b in their payload.
  • (changed) The slot n+1 proposer constructs the Rebuilder by referencing the transaction indices in the slot n payload needed to recover the addresses. The Rebuilder array also contains the original gasLimit values that the slot n+1 proposer received.
  • (changed) The slot n+1 attesters use the Rebuilder, the ReducedSummary, and the slot n payload to reconstruct the full Summary object to verify the signature.

This scheme takes advantage of the fact that most of the Summary data (the address of each Entry satisfied in slot n) will be already stored in the slot n payload. Rather than storing these addresses twice, the Rebuilder acts as a pointer to the existing data. The Rebuilder needs to store the gasLimit of each original Entry because the transaction in the slot n payload may be different than what originally came in the Txns.

*thanks for reading! :open_book::heart:*

4. FAQ

  • What is the deal with the maxFeePerGas?

    • One of the transaction fields is maxFeePerGas. This specifies how much the transaction is willing to pay for the base fee. To ensure the transaction is valid in the slot n post-state, we need to enforce that the maxFeePerGas is at least 12.5% (the max amount the base fee can increase from block to block) higher than the current base fee.
  • Why do we need to include the ReducedSummary in the slot n+1 payload?

    • We technically don’t! We could use a Rebuilder structure to recover the Summary entries that are satisfied in the slot n+1 payload as well. It is just a little extra complexity that we didn’t think was necessary for this post. This ultimately comes down to an implementation decision that we can make.
  • What happens if a proposer never publishes their IL, but still accumulates malicious fork-choice votes on their block?

    • Part of the honest behavior of accepting a block into their fork-choice view is that a valid IL accompanies it. Even if the malicious attesters vote for a block that doesn’t have an IL, all of the subsequent honest attesters will vote against that fork based on not seeing the IL.
  • Can the slot n proposer play timing games with the release of their IL?

    • Yes, but no more than they can do already. It is the same as if the slot n proposer tried to grief the slot n+1 proposer by not sending them the block in time. They risk not accumulating enough attestations to overpower the proposer boost of the subsequent slot.
  • What happens if a proposer credibly commits (e.g., through the use of a TEE) to only signing a single Summary?

    • Justin came up with a scenario where a proposer and a transaction originator can collude to get a single valid Summary published (e.g., by using a TEE) that has an Entry that is only satisfied by a single transaction. This would break the free DA in that all honest attesters would need to see this transaction as part of the IL they require to accept the block into their fork-choice view. We can avoid this by allowing anyone to sign arbitrary Summary objects for any slot that is at least n slots in the past. The default behavior could be for some validators to simply sign empty Summary objects after 5 slots have passed.
  • How does a sync work with the IL?

    • This is related to the question above, because seeing a block as valid in the fork-choice view requires a full IL for that slot. If we allow anyone to sign ILs for past slots, the syncing node can simply sign ILs for each historical slot until it reaches the head of the chain.
  • Why use uint8 instead of uint32 for the gas limits in the Summary?

    • This is just a small optimization to reduce the potential size of the maximum Summary by a factor of four. The constraint would be that the Txns must use less than or equal to the uint8 gasLimit specified in the corresponding entry. This becomes an implementation decision as well.
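
The maxFeePerGas check from the first question above can be made concrete with a short sketch (function and constant names are illustrative, not from any client). EIP-1559 lets the base fee rise by at most 1/8 (12.5%) per block, so an inclusion-list transaction must clear the next block's worst-case base fee to stay valid in the slot n post-state:

```python
BASE_FEE_MAX_CHANGE_DENOMINATOR = 8  # base fee moves by at most 1/8 per block

def is_entry_viable(max_fee_per_gas: int, current_base_fee: int) -> bool:
    # Worst case: the base fee increases by the full 12.5% next block.
    next_base_fee_ceiling = current_base_fee + current_base_fee // BASE_FEE_MAX_CHANGE_DENOMINATOR
    return max_fee_per_gas >= next_base_fee_ceiling

# Example: a 16 gwei base fee can rise to at most 18 gwei next block.
assert is_entry_viable(18 * 10**9, 16 * 10**9)
assert not is_entry_viable(17 * 10**9, 16 * 10**9)
```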

Appendix 1: Rebuilder encoding strategy

The slot n proposer has control over some of the data that ends up in the slot n+1 Rebuilder, and thus can use it to achieve a small amount of free DA (up to 1428 bits = 178.5 bytes). The technique is quite simple. Let’s use the case where the proposer’s payload contains 1000 transactions, which allows the proposer to store a 1000-bit message for free, denoted msg. Let payload[i] and msg[i] denote the ith transaction in their payload and the ith bit in the message respectively.

  1. The slot n proposer self-builds a block, thus they know the contents of the block before creating their Summary.
  2. To construct their Summary, for each index i, do
    • if msg[i] == 0, don’t include payload[i] in the Summary.
    • if msg[i] == 1, include payload[i] in the Summary.

It follows that by casting Rebuilder from a byte array to a bit array, msg is recovered. Since the Rebuilder is part of the slot n+1 block, msg is encoded into the historical state. However, the fact that this is at most 178.5 bytes per block makes it unlikely to be an attractive source of DA. Additionally, it’s only possible to store as many bits as there are valid transactions to include in the slot n payload. The maximum is 1428 if each transaction is a simple transfer, but historically blocks contain closer to 150-200 transactions on average.
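
The encode/decode round trip can be modeled in a few lines (a purely illustrative toy, not spec code): the slot n proposer encodes one message bit per payload transaction by choosing whether that transaction's entry appears in the Summary.

```python
def encode_msg(payload_size: int, msg_bits: list) -> list:
    # Indices of payload transactions the proposer includes in the Summary;
    # these are exactly the indices that end up in the slot n+1 Rebuilder.
    return [i for i in range(payload_size) if msg_bits[i] == 1]

def decode_msg(payload_size: int, rebuilder_indices: list) -> list:
    # Reading the Rebuilder indices as a bit array recovers the message.
    included = set(rebuilder_indices)
    return [1 if i in included else 0 for i in range(payload_size)]

msg = [1, 0, 1, 1, 0]
assert decode_msg(5, encode_msg(5, msg)) == msg
```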

Appendix 2: ReducedSummary stuffing

It is also worth considering the case where the slot n proposer tries to ensure that the slot n+1 ReducedSummary is large. The most they can do is self-build an empty block while putting every valid transaction they see into their Summary object. With an empty block, the slot n+1 ReducedSummary is equivalent to the slot n Summary (because none of the entries have been satisfied in the slot n payload). As we calculated above, the max size of the ReducedSummary would be 29988 bytes, which is rather large, but only achievable if there are 1428 transfers lying around in the mempool. Even if that happens, the slot n proposer just gave up all of their execution layer rewards to make the next block (at most) 30kB larger. Blocks can already be much larger than that (some are hundreds of kB), and post-4844, this will be even less relevant. Thus this doesn’t seem like a real griefing vector to be concerned about. We also could simply use a Rebuilder for the slot n+1 payload as well if necessary.
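
These figures are consistent with a quick back-of-the-envelope check, assuming a 30M block gas limit, 21,000 gas per simple transfer, and a 21-byte Entry (20-byte address + 1-byte uint8 gasLimit):

```python
GAS_LIMIT = 30_000_000
TRANSFER_GAS = 21_000
ENTRY_BYTES = 20 + 1  # address + uint8 gasLimit

max_entries = GAS_LIMIT // TRANSFER_GAS        # 1428 transfers per block
max_summary_bytes = max_entries * ENTRY_BYTES  # 29988 bytes, just under 30kB
max_free_da_bytes = max_entries / 8            # 178.5 bytes of free DA (Appendix 1)

assert max_entries == 1428
assert max_summary_bytes == 29988
assert max_free_da_bytes == 178.5
```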


Relays in a post-ePBS world

by mike, jon, hasu, tomasz, chris, & toni
based on discussions with justin, caspar, & stokes
august 4, 2023
Continued ePBS research and the evolving mev-boost landscape have made it clear that the incentive to use relays will likely remain even if we enshrine a PBS mechanism. This document describes the exact services that relays offer today and how they could change under ePBS.

Post enshrinement, the protocol would serve as a default “neutral relay” while the out-of-protocol relay market continues to develop, offering potential latency optimizations and other ancillary services (e.g., bid cancellations and more flexible payments). This greatly reduces the current dependency on public goods relays.

We also present a new in-protocol unconditional payment design proposed by Caspar and Justin, which we call Top-of-Block (abbr. ToB) Payments. This modification simplifies ePBS meaningfully and further reduces the scope of services that require relays.

Although removing relays has often been cited as the raison d’être for enshrinement, we believe ePBS is still highly beneficial even if relays persist in some (reduced) form. The primary tradeoff is the added protocol complexity.
(1) Why enshrine PBS? revisits the original question and sets the stage for why we expect the relay market to exist post-ePBS.
(2) Relay roles today presents the current relay functionality.
(3) A simple ePBS instantiation outlines the core primitives needed for ePBS and introduces Top-of-Block payments.
(4) Relay role evolution post-ePBS revisits (2) and presents the advantages that future relays may have over the enshrined mechanism.
(5) The bull case for enshrinement presents the argument that ePBS is still worth doing despite (4), and also explores the counter-factual of allowing the mev-boost market to evolve unchecked.

Note: we continue using the term “relay” for the post-enshrinement out-of-protocol PBS facilitator. It’s worth considering adopting a different name for these entities to not conflate them with relays of today, but for clarity in this article, we continue using the familiar term.
Many thanks to Justin, Barnabé, Thomas, Vitalik, & Bert for your comments.

PBS – Proposer-Builder Separation
ePBS – enshrined Proposer-Builder Separation
PTC – Payload-Timeliness Committee

(1) Why enshrine PBS?

Why enshrine Proposer-Builder Separation? outlines 3 reasons:

(i) relays oppose Ethereum’s values, (note: strong wording is a quote from the original)
(ii) out-of-protocol software is brittle, and
(iii) relays are expensive public goods.

The core idea was that ePBS eliminates the need for mev-boost and the relay ecosystem by enshrining a mechanism in the consensus layer to facilitate outsourced block production.

While points (i-iii) remain true, it is not clear that ePBS can fully eliminate the relay market. It appears likely that relays would continue to offer services that both proposers and builders may be incentivized to use.

We can’t mandate that proposers only use the ePBS mechanism. If we tried to enforce that all blocks were seen in the P2P layer, for example, it’s still possible for proposers to receive them from side channels (e.g., at the last second from a latency-optimized relay) before sending them to the in-protocol mechanism. This document presents the case that enshrining is still worthwhile while being pragmatic about the realities of latency, centralization pressures, and the incentives at play.

(2) Relay roles today

Relays are mutually-trusted entities that facilitate the PBS auction between proposers and builders. The essence of a PBS mechanism is:

(i) a commit-reveal scheme to protect the builder from the proposer, and
(ii) a payment enforcement mechanism to protect the proposer from the builder.

For (i), relays provide two complementary services:

  1. MEV-stealing/unbundling protection Relays protect builders from proposers by enforcing a blind signing of the header to prevent the stealing and/or unbundling of builder transactions.
  2. Block validity enforcement Relays check builder blocks for validity. This ensures that proposers only commit to blocks that are valid and thus should become canonical (if they are not late).

For (ii), relays implement one of the following:

  1. Payment verification Relays verify that builder blocks correctly pay the proposer fee recipient. In the original Flashbots implementation, the payment was enforced at the last transaction in the block. Other relays allow for more flexible payment mechanisms (e.g., using the coinbase transfer for the proposer payment) and there is an active PR in the Flashbots builder repo to upstream this logic.
  2. Collateral escrow Optimistic relays remove payment verification and block validity enforcement to reduce latency. They instead escrow collateral from the builder to protect proposers from invalid/unpaying blocks.

Lastly, relays offer cancellations (an add-on feature not necessary for PBS):

  1. Cancellation support Relays allow builders to cancel bids. Cancellations are especially valuable for CEX-DEX arbitrageurs to update their bids throughout the slot as CEX prices fluctuate. Cancellations also allow for other builder bidding strategies.

(3) A simple ePBS instantiation

We now present a simple ePBS instantiation, which allows us to consider the relay role post-ePBS. While we focus on a specific design, other versions of ePBS have the same/similar effects in terms of relay evolution. Let’s continue using the following framing for PBS mechanisms:

(i) a commit-reveal scheme to protect the builder from the proposer, and
(ii) a payment enforcement mechanism to protect the proposer from the builder.

For (i) we can use the Payload-Timeliness Committee (abbr. PTC) to enforce that builder blocks are included if they are made available on time (though other designs like Two-slot and PEPC are also possible).


In the PTC design, a committee of attesters votes on the timeliness of the builder payload. The subsequent proposer uses these votes to determine whether to build on the “full” CL block (which includes the builder’s ExecutionPayload) or the “empty” CL block (which doesn’t include the builder’s ExecutionPayload).
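
A toy version of that fork-choice decision is sketched below; the actual PTC rules are more involved, and the majority threshold used here is an illustrative assumption, not the spec's.

```python
def next_block_parent(payload_timely_votes: int, ptc_size: int) -> str:
    # Build on the "full" CL block only if enough PTC members saw the
    # builder's ExecutionPayload on time; otherwise use the "empty" block.
    return "full" if 2 * payload_timely_votes > ptc_size else "empty"

assert next_block_parent(400, 512) == "full"
assert next_block_parent(100, 512) == "empty"
```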

For (ii) we present a new unconditional payment mechanism called “Top-of-Block” payments. h/t to Caspar for casually coming up with this neat solution over a bistro dinner in Paris and Justin for describing a very similar mechanism in MEV burn – a simple design; c’est parfait :fr:.

Top-of-Block Payments (abbr. ToB)

To ensure that proposers are paid fairly despite committing to the builder’s bid without knowing the contents of the ExecutionPayload, we need an unconditional payment mechanism to protect proposers in case the builder doesn’t release a payload on time. The idea here is simple:

  • Part of the builder bid is a transaction (the ToB payment) to the proposer fee recipient; the transaction will likely be a transfer from an EOA (we could also make use of smart contract payments, but this adds complexity in that the amount of gas used in the payment must be capped – since the outcome is the same, we exclude the implementation details).
  • The payment is valid if and only if the consensus block commits to the ExecutionPayloadHeader corresponding to the bid.
  • The payment must be valid given the current head of the chain (i.e., it builds on the state of the parent block). In other words, it is valid at the top of the block.
  • The ExecutionPayload from the builder then extends the state containing the ToB payment (i.e., it builds on the state after the unconditional payment).
  • If the builder never reveals the payload, the transaction that pays the proposer is still executed.
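
The ordering in the bullets above can be sketched as a minimal state transition (illustrative structure only, not spec code); the key point is that the ToB payment executes against the parent state, so the builder needs the liquidity on hand rather than posting collateral:

```python
def apply_slot(parent_state: dict, builder: str, proposer: str,
               bid: int, payload_revealed: bool) -> dict:
    state = dict(parent_state)
    # 1. The unconditional ToB payment executes at the top of the block,
    #    built on the parent state: builder -> proposer fee recipient.
    assert state[builder] >= bid  # builder must be liquid up front
    state[builder] -= bid
    state[proposer] += bid
    # 2. The ExecutionPayload, if revealed on time, extends the state
    #    that already contains the payment.
    if payload_revealed:
        state["el_block_number"] = state.get("el_block_number", 0) + 1
    return state

# Slot n+1 case from the figure: no payload, but the proposer is still paid.
state = apply_slot({"builder": 10, "proposer": 0}, "builder", "proposer",
                   bid=3, payload_revealed=False)
assert state["proposer"] == 3 and state["builder"] == 7
```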

The figure below depicts the ToB payment flow:


For slots n and n+2, the corresponding ExecutionPayloads are included and the EL state is updated. In slot n+1, the builder didn’t reveal their payload, but the payment transaction is still valid and included. This is a just-in-time (JIT) payment mechanism with two key points:

  • the builder no longer needs to post collateral with the protocol, and
  • the builder must still have sufficient liquidity on hand to make the ToB payment (i.e., they’re still unable to use the value they would generate within the ExecutionPayload to fund their bid).

If a builder does not have sufficient capital before the successful execution of the ExecutionPayload, a relay would still be required to verify the payment to the proposer.

(4) Relay role evolution post-ePBS

Let’s revisit how the relays’ services evolve if PTC + ToB payments are introduced. We use :white_check_mark: to denote that the relay is no longer needed and :x: to denote that the relay may have some edge over ePBS.

  1. MEV-stealing/unbundling protection Relay is no longer relevant :white_check_mark:
    The consensus layer enforces the commit-reveal through the PTC, so the builder is protected in that a proposer must commit to their block before they reveal it.
  2. Block validity enforcement Relay is no longer relevant :white_check_mark:
    No block validity check is made, but today’s proposers only care about the validity of the block insofar as it ensures their payment is valid. ToB payments give them that guarantee. Note that there is an assumption that proposers only care about their payment attached to a block (and not the block contents itself). While this is generally the case, proposers may make other commitments (e.g., via restaking) that are slashable if not upheld (outside of the Ethereum slashing conditions). In this case, a proposer would need to know that a builder block also fulfills the criteria of their commitment made via restaking (e.g., to enforce some transaction order).
  3. Payment verification Relay is superior for high-value blocks :x:
    The ToB payment enforces the unconditional payment to the proposer. However, the relay can allow more flexible payments (e.g., the last transaction in the block) and thus doesn’t require the builder to have enough liquidity up front to make the payment as the first transaction.
  4. Collateral escrow Relay is no longer relevant :white_check_mark:
    Collateral escrow now becomes unnecessary and capital inefficient. If the builder has sufficient liquidity to post collateral, it is strictly better for them to just use ToB payments rather than locking up collateral.
  5. Cancellation support Relay is still needed for cancellations :x:
    Relays support cancellations, whereas the protocol does not.

Relay advantages over ePBS

Now the critical question: what incentives do proposers and builders have to bypass ePBS through the use of relays? Key takeaway – Relays probably will still exist post-ePBS, but they will be much less important and hopefully only provide a marginal advantage over the in-protocol solution.

  1. More flexible payments – Relays can offer flexible payments (rather than just ToB payments) because they have access to the full block contents. Enforcing this requires simulation by the relay to ensure that the block is valid. This adds latency, which may be a deterrent for using relays to bypass the P2P layer in normal circumstances. However, this would be needed for high-value payments which cannot be expressed via ToB payments (i.e., if the builder needs to capture the payment within the ExecutionPayload to pay at the end of the block). Relays could also allow builders to pay rewards denominated in currencies other than ETH. Note that with zkEVMs, this relay advantage disappears because the builder can post a bid along the encrypted payload with proof that the corresponding block is valid and accurately pays the proposer (VDFs or threshold decryption would be needed to ensure the payload is decrypted promptly).
  2. Lower latency connection – Because relays have direct TCP connections with the proposer and builder, the fastest path between the two may be through the relay rather than through the P2P gossip layer (most notably if the relay is vertically integrated with a builder, implying the builder & proposer have a direct connection). It is not clear exactly how large this advantage may be, especially when compared to a builder that is well-peered in the P2P network.
  3. Bid cancellations & bid privacy – Because relays determine which bid to serve to the proposer, they can support cancellations and/or bid privacy. For a sealed-bid auction, relays could choose not to reveal the value of the bid to other builders and only allow the proposer to call getHeader a single time. It doesn’t seem plausible to support cancellations on the P2P layer, and bid privacy in ePBS is also an unsolved problem. With FHE or other cryptographic primitives, it may be possible to enshrine bid privacy, but this is likely infeasible in the short term.

These benefits may be enough for some builders to prefer relays over ePBS, so we should expect the relay ecosystem to evolve based on the value that relays add. In short, we expect a relay market to exist. A few examples of possible entities:

  1. Vertically-integrated builder/relay – Some builders might vertically integrate to reduce latency and overhead in submitting blocks to relays. They will need to convince validators to trust them, the same as any relay.
  2. Relay as a Service (RaaS) – Rather than start a relay, builders may continue to use third-party relays. Relay operators already have trusted reputations, validator connections, and experience running this infrastructure. If the services mentioned above are sufficiently valuable, these relays can begin to operate as viable profit-making entities.
  3. Public goods relays – Some of the existing third-party relays may remain operational through public goods funding. These would likely be non-censoring relays that are credibly neutral and are supported by ecosystem funds. However, it’s not clear that relay public goods funding would be necessary anymore after ePBS. At this point, relays will only provide optional services with a potentially minimal benefit versus the in-protocol option.

A proposer may hook up to multiple entities to source bids for their slot; consider the example below.


Here the proposer is connected to two relays and the P2P bidpool. The proposer may always choose the highest bid, but they could also have different heuristics for selection (e.g., use the P2P bid if it’s within 3% of the highest non-P2P bid, which is very similar to the min-bid feature in mev-boost).
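
The 3% heuristic mentioned above can be sketched in a few lines (names and the default tolerance are illustrative): prefer the P2P bidpool bid whenever it is close enough to the best relay bid, analogous to mev-boost's min-bid setting.

```python
def select_bid(p2p_bid: float, relay_bids: list, tolerance: float = 0.03):
    # Take the P2P bid if it is within `tolerance` of the best relay bid;
    # otherwise fall back to the highest relay bid.
    best_relay = max(relay_bids, default=0.0)
    if p2p_bid >= best_relay * (1 - tolerance):
        return ("p2p", p2p_bid)
    return ("relay", best_relay)

assert select_bid(0.98, [1.0]) == ("p2p", 0.98)   # within 3% -> favor P2P
assert select_bid(0.90, [1.0]) == ("relay", 1.0)  # too far below -> relay
```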

This diagram also presents three different builder behaviors:

  • builder A is part of the vertically-integrated builder/relay, resulting in a latency advantage over the other builders (and their relay may only accept bids from their builder).
  • builder B may be a smaller builder who doesn’t want to run a relay, but is willing to pay an independent relay for RaaS to get more payment flexibility or better latency.
  • builder C might not be willing to pay for RaaS or run a relay, but instead chooses to be well connected in the P2P layer and get blocks relayed through the enshrined mechanism.

Note that builder A and builder B are sending their bids to the bidpool as well because there is a chance that the proposer is only listening over the P2P layer or that some issue in the relay causes their bid to go undelivered. It is always worth it to send to the bidpool as well (except in the case where the builder may want to cancel the bid).

The obvious concern is that builder A will have a significant advantage over the other builders, and thus will dominate the market and lead to further builder centralization. This is a possibility, but we note that this is a fundamental risk of PBS, and not something unique to any ePBS proposal. There are still additional benefits of doing ePBS instead of allowing the mev-boost ecosystem to evolve entirely outside the protocol.

(5) The bull case for enshrinement

Despite the realities of the potential relay advantages over the in-protocol mechanism, we still believe there is value in moving forward with ePBS for the following reasons:

  • ePBS may be more efficient than running a relay – It is possible that instead of running a relay or paying for RaaS, builders are competitive by just having good P2P connectivity. The relay-specific benefits described above may be too marginal to justify the additional operations and associated costs. In this case, it would be economically rational to simply use the enshrined mechanism.
  • ePBS significantly reduces the cost of altruism – Presently, the “honest” behavior is to build blocks locally instead of outsourcing to mev-boost. However, 95% of blocks are built through mev-boost because the reward gap between honest and mev-boost blocks is too high (i.e., altruism is too expensive h/t Vitalik). With ePBS, honest behavior allows for outsourcing of the block production to a builder, whereas side-channeling a block through a relay remains out of protocol. Hopefully, the value difference between the P2P bid and the relay bid will be small enough that a larger percentage of validators choose to follow the honest behavior of using the P2P layer to source their blocks (i.e., altruism is less expensive). Additionally, relying only on the in-protocol mechanism explicitly reduces the proposers’ risks associated with running out-of-protocol infrastructure.
  • ePBS delineates in-protocol PBS and out-of-protocol mev-boost – Currently, with 95% of block share, mev-boost is de facto in-protocol software (though there is circuit-breaking in the beacon clients to revert to local block building in the case of many missed slots). This leads to issues around ownership of the software maintenance/testing, consensus stability depending on mev-boost, and continued friction around the integration with consensus client software (see Out-of-protocol software is brittle). By clearly drawing the line between ePBS and mev-boost, these issues become less pronounced because anyone running mev-boost is now taking on the risk of running this sidecar for a much smaller reward. The marginally higher rewards gained from running mev-boost come with the added risk of going out of protocol.
  • ePBS removes the neutral relay funding issues – The current relay market is not in a stable equilibrium. A huge topic of discussion continues to be relay funding, which is the tragedy of the commons issue faced in supporting public goods. Through ePBS, the protocol becomes the canonical neutral relay, while allowing the relay marketplace to evolve.
  • ePBS is future-compatible with mev-burn, inclusion lists, and L1 zkEVM proof generation – By enshrining the PBS auction, mev-burn becomes possible through the use of the bidpool as an MEV oracle for each slot (we could use relay bids to set the bid floor, but this essentially forces proposers to use relays rather than relying only on the bidpool, which seems fragile). This constrains the builder blocks that are side-channeled in that they must burn some ETH despite not going through the P2P layer, which may compress the margin of running relays even further. Inclusion lists are also a very natural extension of ePBS (inclusion lists could be implemented without ePBS, but we defer that discussion to an upcoming post). Inclusion lists also constrain builder behavior by forcing blocks to contain a certain set of transactions to be considered valid, which is critical for censorship resistance of the protocol (especially in a regime with a relatively oligopolistic builder market). Once we move to an L1 zkEVM world, having a mechanism in place for proposers to outsource proof generation is also highly desirable (see Vitalik’s Endgame).
  • ePBS backstops the builder market in the case of relay outages – As relays evolve, bugs and outages may occur; this is the risk associated with connecting to relays. If the relays experience an outage, the P2P layer at least allows for the PBS market to continue running without forcing all proposers back into a local block-building regime. This may be critical in high-MEV scenarios where relays struggle under the surge of builder blocks and each slot may be highly valuable.

Overall, it’s clear that relays can still provide services in an ePBS world. What’s not yet clear is the precise economic value of these services versus the associated costs and risks. If the delta is high, it is reasonable to expect that relays would continue to play a prominent role. If the delta is low, it may be economically rational for many actors to simply follow the in-protocol mechanism. We hope the reality lies somewhere in the middle.

What happens if we don’t do ePBS?

It is worth asking the question of what happens if we don’t enshrine anything. One thing is very clear – we are not in a stable equilibrium in today’s relay ecosystem. Below are some possible outcomes.

  • Non-monetizing, public goods relays continue searching for sustainable paths forward – The continued survival of credibly neutral relays becomes a higher priority because, without ePBS, the only access validators have to the builder market is through the relays. Neutral relays will need to be supported through public goods funding or some sort of deal between builders, relays, and validators.
  • Inclusion lists and censorship resistance are prioritized – If we capitulate and allow the existing market to evolve, censorship resistance becomes increasingly important. We would likely need to enforce some sort of inclusion list mechanism either through mev-boost or directly in the protocol (again, we think it is possible to do inclusion lists without ePBS – this discussion is forthcoming).
  • We give up on mev-burn in the near-term – Without ePBS, there is no clear way to implement mev-burn.
  • We continue relying on mev-boost software – Without ePBS, mev-boost and the relays continue to be de facto enshrined. We would probably benefit from more explicit ownership over the software and its relationship with the consensus client implementations.

Overall, we assess that the benefits of ePBS outweigh the downside (mostly protocol complexity) even if there exists some incentive to bypass it at times. The remaining uncertainty isn’t so much if we should enshrine something, but rather what we should enshrine (which is a different discussion that I defer to Barnabé) – c’est la vie.

Appendix – advantages of well-capitalized builders under Top-of-Block Payments or collateralized bidding

In the case of multiple equal or roughly equal bids received, the proposer is always incentivized to select bids that include a ToB payment (rather than a flexible payment later in the block). This is strictly less risky for them than trusting a third party for accurate payments. Additionally, there is a latency advantage with ToB payments versus flexible payments if the relay must simulate the block to validate the payment (though the vertically integrated builder/relay wouldn’t check their bids).

Relays would likely support ToB payments, though this is not possible if the builder doesn’t have the collateral on hand. This inherently presents some advantage to larger builders with more capital (it’s possible for the relay or some other entity to loan money to the smaller builder to make the ToB payment, but this would presumably be accompanied by some capital cost – again making the smaller builder less competitive on the margin).

Note that this same tradeoff exists whether the payment is guaranteed via a ToB payment or an in-protocol collateral mechanism. In general, this advantage is likely to arise very infrequently (i.e., builders will nearly always have capital for ToB payment on hand).

The way to counter this capital advantage for large builders would be to impose some in-protocol cap on the guaranteed payment (either via collateral or ToB). Then, no builder could offer a trustless payment to the proposer above the cap. However, this would simply increase the incentive for proposers to turn to out-of-protocol services during high-MEV periods, and they would more frequently be forced into trusting some third party for their payment verification, so we don’t think this is the right path to take.

The frequency and magnitude of this advantage could be diminished if mev-burn is implemented because the only part of the payment that must be guaranteed is the priority fee of the bid. In mev-burn, it may be reasonable to set two limits:

  • Protocol payment (burn) – Cap the maximum ToB burn transaction (or collateral) for the burn payments. In MEV burn – a simple design, Justin discussed the hypothetical example of a 32 ETH cap on collateral.
  • Proposer payment (priority fee) – This ToB payment can be left unbounded so that the proposer is never forced into trusting out-of-protocol solutions. This still favors well-capitalized builders, but at least it is not the full value of the bid, and the burn portion doesn’t need to be guaranteed.

In this case, a failure to deliver the ExecutionPayload would result in the following:

  • Protocol – the burn is guaranteed up to some capped amount (e.g., 32 ETH), but beyond that, a failure to deliver the ExecutionPayload would result in the protocol socializing some loss (i.e., the amount of ETH that should have otherwise been burned).
  • Proposer – the priority payment is received in full regardless.
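
A worked example of this settlement (numbers illustrative; the 32 ETH cap is the hypothetical figure from the text): on a failed payload reveal, the burn is guaranteed only up to the cap and any excess is socialized, while the proposer's priority fee is guaranteed in full.

```python
BURN_CAP = 32.0  # ETH; hypothetical cap on the guaranteed burn

def settle_failed_reveal(burn_bid: float, priority_fee: float):
    guaranteed_burn = min(burn_bid, BURN_CAP)
    # ETH that should have been burned but is instead a socialized loss.
    socialized_loss = burn_bid - guaranteed_burn
    # The proposer's priority fee is an unbounded ToB payment, paid in full.
    return guaranteed_burn, socialized_loss, priority_fee

# A 50 ETH burn bid with a 2 ETH priority fee:
burn, loss, fee = settle_failed_reveal(50.0, 2.0)
assert (burn, loss, fee) == (32.0, 18.0, 2.0)
```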

Merci d’avoir lu :open_book:


Why enshrine Proposer-Builder Separation? A viable path to ePBS

by mike neuder and justin drake; may 25, 2023

tl;dr: Proposer-Builder Separation (PBS) decouples the task of block proposing (done by PoS validators) from block building (done most profitably by MEV searchers). PBS aims to improve access to MEV for validators by explicitly creating a permissionless market for block production. By allowing proposers to outsource block construction, validators can continue running on consumer-grade hardware without missing out on the valuable MEV exposed while they are the elected proposer. mev-boost implements out-of-protocol PBS and accounts for ≈90% of Ethereum blocks. Enshrined PBS (ePBS, sometimes referred to as in-protocol/IP PBS) evolves the consensus layer to implement PBS at the protocol level. While ePBS has been discussed since 2021 and is part of Ethereum’s roadmap, recent events (Low-Carb Crusader, Shapella mev-boost prysm signature bug, & relay response to unbundling) have turned the attention of the Ethereum community towards mev-boost and the evolution of PBS.

This document aims to…
[a] outline arguments for ePBS (as opposed to continuing with mev-boost),
[b] present and respond to the counter-arguments to ePBS,
[c] describe desirable properties of ePBS mechanisms,
[d] sketch an ePBS design based on the existing research,
[e] highlight optimistic relaying as an additional tool to build towards ePBS, and
[f] encourage discussions around the ePBS design space.

This document does not aim to…
[a] perform an exhaustive literature review (see here),
[b] fully spec out an ePBS implementation, or
[c] cover alternative designs for ePBS.

Many thanks to Barnabé, Dan Marzec, Terence, Chris Hager, Toni, Francesco, Rajiv, Thomas, and Jacob for comments on draft versions of this document.


Proposer-Builder Separation (PBS) allows validators to outsource their block building duties to a set of specialized builders who are well equipped to extract MEV (hence separating the roles of proposer and builder). Proposers sell their block-production rights to builders who pay for the privilege of choosing the transaction ordering in a block. Proposers earn MEV rewards in addition to their protocol issuance, and block builders compete to assemble valuable blocks while saving a portion of the MEV for themselves as profit.

Enshrined PBS

Enshrined PBS (ePBS) advocates for implementing PBS into the consensus layer of the Ethereum protocol. Because there was no in-protocol solution at the time of the merge, Flashbots built mev-boost, which became a massively adopted out-of-protocol solution for PBS that accounts for ≈90% of Ethereum blocks produced.


Figure 1mev-boost slot share (orange) since the merge. Source

mev-boost continues to be critical infrastructure provided to grant permissionless access to the external block-building market for all validators, but it relies heavily on a small set of centralized relays to act as mutually-trusted auctioneers facilitating the block-production pipeline. We present the case for ePBS by highlighting (i) that relays are antithetical to Ethereum’s core values, (ii) the risks and inefficiencies of side-car software, and (iii) the costs and unsustainability of relay operations.

Reasons to enshrine

  • Relays oppose Ethereum’s values. The following core tenets of Ethereum are eroded by the mass dependence on relays.

    • Decentralization: Relays are centralized. Six relays, operated by five different entities, account for 99% of mev-boost blocks. This small consortium of relay operators should not play such an outsized role in the ecosystem.
    • Censorship resistance: Relays can censor blocks. Since relays are centralized, they are exposed to regulation. This played out post-merge as some relays were pressured to censor transactions interacting with addresses on the OFAC sanctions list.
    • Trustlessness: Relays are trusted by validators and builders. Validators trust relays to provide them a valid block header and to publish the full beacon block; builders trust relays not to steal MEV. A violation of either of these trust assumptions would be detectable, but as demonstrated by the “Low-Carb Crusader”, dishonesty can be profitable, even if only through a one-time attack.
  • Out-of-protocol software is brittle.

    • The “Low-Carb Crusader” unbundling exploited a relay vulnerability for 20+ million USD. This attack and the general class of equivocation attacks it embodies demonstrate that relays are valuable targets outside of the protocol.
    • The relay response to the unbundling attack caused consensus instability. Due to the latency relays introduced into the block-production pipeline, there was a 5x increase in reorged blocks immediately after the attack. See “Time, slots, and the ordering of events in Ethereum Proof-of-Stake” for more details.
    • During the Shapella upgrade, there was a bug in the Prysm code that interacts with mev-boost. This resulted in a brief 10x spike in missed slots immediately following the hard-fork. This bug was not caught because the code path for externally-built blocks is not covered by the consensus spec tests.
    • There are significant core-dev coordination costs involved in maintaining compatibility between beacon clients & relays. Each hard-fork represents a significant amount of work from the relay and core developers to ensure mev-boost continues functioning. This involves designing the builder spec, maintaining/improving the relay spec, and the software changes on the beacon clients, mev-boost, and the mev-boost relays. Because mev-boost is out-of-protocol, this coordination is strictly additive to the standard ACD pipeline and usually happens later in the development cycle as a result.
    • mev-boost does not inherit the benefits of client diversity and the full consensus specification process. Though there are multiple relay repositories, the vast majority of blocks flow through relays using the flashbots implementation. While this is simpler to maintain, it lacks the structural benefits of client diversity enjoyed by the beacon nodes; the full specification/spec-test infrastructure is also not leveraged by the differing relay repositories.
  • Relays are expensive public goods.

    • Relay operational costs range from ≈20k-100k USD per year depending on the desired performance. This doesn’t include the engineering and DevOps costs associated with running a highly-available production service.
    • Relays are public goods that don’t have a clear funding model. While there are discussions around guilds, grants, and other funding vehicles, there is no obvious way to support relay development and operation (similar to the issues faced in supporting core-dev teams).

ePBS resolves these issues by eliminating the relay.

Note: MEV burn is currently being explored for its economic & security benefits. If we decide to pursue MEV burn, these benefits serve as yet another carrot on a stick for enshrinement.

Reasons not to enshrine

We believe it is important to present the case against enshrinement by addressing the main counter-arguments and offering our responses.

  • “If it ain’t broke don’t fix it.” mev-boost has worked incredibly well given the scale of its adoption. As the implementation continues to harden, we can gain confidence in its security properties and build out the specification. If we can find a credibly neutral way to fund a set of relays, then we could continue depending on them into the future.

    • Response – mev-boost has worked well, but there are no guarantees that this stability will continue. Another major censorship event, further attacks, and continued centralization pressures of relays, builders, and searchers pose significant risks to Ethereum. There is value in having clarity about ePBS to handle a situation where there is a pressing need for faster enshrinement. Additionally, ePBS will take time to design and implement – we should start formalizing it now, even if we continue with the relays for the next O(1-2) years as ePBS progresses.
  • Could MEV be addressed with different tools? There is a growing discourse around protecting users from MEV on the application/transaction level. SUAVE, CoW swap, and MEVBlocker are three of many solutions that are continuing to gain usage. If a significant portion of MEV can be protected against, perhaps enshrining PBS is an unnecessary step on an already ambitious roadmap.

    • Response – We hope that this line of work can help protect users from “toxic” MEV, but we don’t expect on-chain MEV to ever be fully eliminated. Further, some MEV is extracted off-chain, requiring sophistication beyond just computational power. For example, in order to execute a CEX-DEX arbitrage, a validator would need liquidity and connectivity with a CEX in addition to the algorithmic resources to find and execute such an opportunity. We don’t envision a future in which there is little to no MEV in Ethereum or a solo-staking validator could meaningfully extract it on their own.
  • There are other roadmap items that should take precedence. The roadmap has many goals beyond ePBS. If we choose to go ahead with ePBS, it raises the question of where this fits on the roadmap and which upgrades will be pushed down the line as a result.

    • Response – We believe that ePBS depends on Single-Slot Finality (SSF) for security and complexity reasons. Additionally, a validator set consolidation is a prerequisite for any SSF progress (see Increase the MAX_EFFECTIVE_BALANCE). The resource allocation problem is difficult, but we believe that ePBS should be part of these discussions, especially in the context of being bundled (pun-intended) with a larger consensus upgrade.
  • What is the right thing to enshrine? From a protocol design perspective, there are many mechanisms that could be implemented. Barnabé explores these concepts in “Unbundling PBS”, “Notes on Proposer-Builder Separation”, and “Seeing like a protocol”. One takeaway from this work is that mev-boost implements a block-auction, which is not the only option for ePBS. Julian explores this further in “Block vs Slot Auction PBS”.

    • Response – ePBS is only useful insofar as it is adopted by builders and validators; the worst-case scenario is that ePBS is sidestepped by out-of-protocol solutions we didn’t foresee. While we acknowledge that any protocol upgrade has unknown-unknowns, we believe that opening the discussion, working to achieve rough community consensus, and taking the next step of formalizing the ePBS design space will improve confidence around what we are working towards. We also present the optimistic relay roadmap below, which takes a more iterative approach to evolving mev-boost.

ePBS design space

For extensive ePBS literature links see “Bookmarks relevant for Proposer-Builder Separation researchers”. We define the following properties as desirable:

  1. honest builder publication safety – If an honest builder wins the auction, the builder (i) must have an opportunity to create a block, and (ii) must be confident that any payload contents they release become canonical (i.e., protection from unbundling & equivocation attacks from the proposer).
  2. honest builder payment safety – If an honest builder payment is processed, the builder must be able to publish a block that becomes canonical.
  3. honest proposer safety – If an honest proposer commits to a block on-time, they must receive a payment at least as large as specified by the bid they selected.
  4. permissionless – Any builder can participate in the auction and any validator can outsource block production.
  5. censorship resistance – There must be a mechanism by which honest proposers can force through transactions they suspect are being censored without significantly sacrificing on their own rewards (“If we rely on altruism, don’t make altruism expensive” –Vitalik).
  6. roadmap compatibility – The design must be compatible with future roadmap upgrades (SSF, mev-burn, distributed block-building, SSLE, DAS, etc).

One design instantiation – Two-Block HeadLock (TBHL)

While there are many proposed ePBS implementations, we present a small modification of the original two-slot design from Vitalik. We call it Two-Block HeadLock (TBHL) because it uses a single slot to produce two blocks. The first is a proposer block that contains a commitment to a specific execution payload and the second is a builder block that contains the actual transaction contents (here we just call the overall pair of blocks a “single” slot because only one execution payload is produced). Note that with a second round of attestations, the slot time will likely need to increase. TBHL also incorporates some of the features of headlock to protect builders from proposer equivocations. TBHL shares many components with the current mechanism (HLMD-GHOST) and satisfies the six properties specified above.

Note: This is a sketch of the design; it is intentionally brief to improve readability. If we gain confidence that TBHL is a good overall approach, we can begin the specification and full security analysis. The aim is to present a simple, concrete example of a mechanism that satisfies the ePBS design properties, without overloading the reader with implementation details.


Figure 2The slot anatomy of TBHL. A proposer block is proposed and attested to in the purple phase, while a builder block is proposed and attested to in the yellow phase. The proposers, attesters, and builders each make different observations at various timestamps in the slot.

TBHL has the notion of proposer and builder blocks. Each slot can contain at most one proposer block and one builder block, each of which receives attestations. The slot duration is divided into 4 periods.

t=t_0 : The proposer chooses a winning bid and publishes a proposer block. The proposer starts by observing the bidpool, which is a p2p topic where builders send their bids. The proposer selects one of these bids and includes it in a block they publish before t_1.
t=t_1 : The attesting committee for the proposer block observes for a timely proposal. This is the equivalent of the “attestation deadline” at t=4 in the current mechanism. If at least one block is seen, the attesting committee votes for the first one that they saw. If no block is observed, the attesting committee votes for an empty slot (this requires block, slot voting).
t=t_{1.5} : The attesting committee for the builder block checks for equivocations. If the attesting committee sees (i) more than one proposer block or (ii) no proposer blocks, they give no proposer boost to any subsequent builder block. If the attesting committee sees a unique proposer block, they give proposer boost to the builder associated with that bid (see Headlock in ePBS for more details).
t=t_2 : The builder checks if they are the unique winner. If a builder sees an equivocation, they produce a block that includes the equivocation as proof that their unconditional payment should be reverted. Otherwise, the builder can safely publish their builder block with a payload (the transaction contents). If the builder does not see the proposer block as the head of the chain, they publish an empty block extending their head (see Headlock in ePBS for more details).
t=t_3 : The attesting committee for the builder block observes for a timely proposal. This is a round of attestations that vote for the builder block. This makes t_3 a second attestation deadline.
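Two of the timing rules above can be sketched in Python. This is an illustrative sketch only; all names (`proposer_boost_recipient`, `builder_action`) are hypothetical and not from any spec:

```python
# Illustrative sketch of the t=1.5 and t=2 TBHL rules; names are hypothetical.

def proposer_boost_recipient(proposer_blocks_seen):
    """t=1.5: the builder-block committee grants proposer boost only if
    exactly one proposer block was observed (no equivocation, no miss)."""
    if len(proposer_blocks_seen) == 1:
        return proposer_blocks_seen[0]["builder"]
    return None  # equivocation or empty slot: no boost for anyone

def builder_action(proposer_blocks_seen, head_is_proposer_block):
    """t=2: the winning builder decides what to publish."""
    if len(proposer_blocks_seen) > 1:
        # include the equivocation as proof to revert the unconditional payment
        return "publish_equivocation_proof"
    if not head_is_proposer_block:
        return "publish_empty_block"  # extend the builder's own view of the head
    return "publish_full_payload"     # safe to reveal the transaction contents
```

Note how the committee and the builder make the same observation (is there a unique, timely proposer block?), which is what lets the builder publish with confidence.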

We can assert that this mechanism satisfies the ePBS design properties.

  1. honest builder publication safety – The only situation where builder safety could be in question is if the proposer equivocates. For brevity, the details of the equivocation protection are left out of this document. Please see “Headlock in ePBS”.
  2. honest builder payment safety – If an honest builder is selected and their payment is processed, they will either (i) see no equivocations and have the opportunity to create a block with confidence that they are the unique recipient of proposer boost or (ii) see an equivocation and use it as proof to revert the payment. Again, please see “Headlock in ePBS” for further details.
  3. honest proposer safety – If an honest proposer commits to a block on-time, their block will receive attestations and the unconditional payment will go through without reversion because the builder will not have any proof of an equivocation. Even if the builder block is not produced, the bid payment occurs so long as no equivocation proof is presented.
  4. permissionless – The p2p layer is permissionless and any builder can submit bids to the bidpool. Any validator can listen to the bidpool if they want to outsource block building, or they can choose to build locally instead.
  5. censorship resistance – The proposal is compatible with censorship resistance schemes. For example, the proposer block could contain a forward inclusion list. See “PBS censorship-resistance alternatives” for more context.
  6. roadmap compatibility – SSF fits naturally with this proposal by adding a third round of attestations after the builder block attestation round. The third round includes the full validator set and justifies the block immediately with a supermajority consensus. See “A simple Single Slot Finality protocol” for more details. This mechanism is also highly compatible with mev-burn, as the base fee floor deadline, D, could precede t_0.

Optimistic relaying – an iterative approach to PBS

The design framework and TBHL presented above provide a “top-down” approach to ePBS. This has historically been the way R&D is done in Ethereum. Once the design is fleshed out, a spec is written, and the client teams implement it.

The existence of mev-boost and in particular mev-boost relays gives us an interesting additional angle to approach the problem – “bottom-up”. We can imagine there are many PBS implementations that lie on a spectrum between the original mev-boost implementation and full ePBS. By modifying the relay, we can move “up” towards an ePBS implementation without needing to modify the spec and make changes to the consensus node software. This allows us to forerun and derisk some of the features of a full ePBS system (e.g., are builders OK with us removing cancellations?) while also remaining agile. This objective has already been presented in the optimistic roadmap.

The main theme of the optimistic roadmap is to remove relay responsibilities. This has the added benefit of improving the operational efficiency of running a relay. As mentioned in “Reasons to enshrine,” relay operation is expensive and is currently being done only as a public good. By lowering the barrier to entry for relay operators, we enable a more sustainable future for mev-boost as we flesh out the details of ePBS.

Block submission in mev-boost

Before describing optimistic relaying, we briefly present the builder bid submission pipeline in the mev-boost-relay. Processing builder bids is the main function of the relay, and incurs the highest latency and compute costs. When a builder submits a bid to the relay, the following occurs.


Figure 3Once the builder block is received by the relay, it is validated against an execution layer (EL) client. Once it is validated, the block is eligible to win the auction and may be signed by the proposer. Once the relay receives the signed header, it publishes the block to the p2p through a consensus layer (CL) client.

Since hundreds of bids are submitted each slot, the relay must (i) handle the ingress bytes of all the builder submissions, (ii) simulate the blocks on the EL clients, and (iii) serve as a data availability layer for the execution payloads. Additionally, the validator relies on the relay to publish the block in a timely manner once they sign the header.

Optimistic relaying v1

The first version of optimistic relaying simply removes the block validation step from the block submission pipeline.


Figure 4Once the builder block is received by the relay, it is immediately eligible to win the auction and be signed by the proposer. Once the relay receives the signed header, it publishes the block to the p2p network through a consensus layer (CL) client.

The risk incurred by skipping the block validation is that an invalid block may be unknowingly signed by the validator. This results in a missed slot because the attesting committee will reject the invalid block. The relay financially protects the validator against this situation by holding builder collateral. If a bad builder block results in a proposer missing a slot, the relay uses the builder collateral to refund the proposer. Optimistic relaying v1 is already upstreamed into the Flashbots mev-boost-relay repository and running on ultra sound relay. See “An optimistic weekend” and “Optimistic relays and where to find them” for more details.
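The collateral-based refund described above can be sketched as follows. This is a simplified model under assumed semantics; the actual relay accounting is more involved:

```python
def settle_optimistic_slot(block_valid, winning_bid_value, builder_collateral):
    """Sketch of optimistic-relay-v1 settlement: if an unvalidated builder
    block turns out invalid and the slot is missed, the proposer is
    refunded the bid value out of the builder's collateral."""
    if block_valid:
        return {"refund": 0.0, "collateral_left": builder_collateral}
    refund = min(winning_bid_value, builder_collateral)
    return {"refund": refund, "collateral_left": builder_collateral - refund}

# e.g., a builder with 1.0 ETH collateral submits an invalid 0.25 ETH bid:
# the proposer is made whole and the remaining collateral drops to 0.75 ETH.
```

The key design point is that the proposer's payoff is the same whether or not the block was valid, so skipping validation does not change validator incentives.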

Optimistic relaying endgame

The final iteration of optimistic relaying behaves more like TBHL. Instead of the attesting committee enforcing the rules, the relay serves as a centralized “oracle” for the timeliness of events that take place in the bidpool. The flow of a block proposal is diagrammed below.


Figure 5Builders now directly submit bids to the p2p layer (instead of the relay). Proposers observe these bids and sign the corresponding header of the winning bid. The builder of that signed header publishes the full block. The relay observes the bidpool and checks for timeliness of (i) the proposer’s signed header and (ii) the builder’s block publication. Notice that these observations are exactly what the attesting committee is responsible for in TBHL. The relay still holds builder collateral to refund a proposer if they sign a header on-time, but the builder doesn’t produce a valid block.

Endgame optimistic relaying contains some of the ePBS machinery; proposers and builders will be interacting directly through the bidpool and relays will be implementing the validity conditions that the attesting committee would enforce at the consensus layer. Additionally, relay operation at that point is reduced to a collateralized mempool oracle service, which should be much cheaper and easier to run than the full validating relays of today.


Proposer-Builder Separation is an important piece of Ethereum’s roadmap and continues to gain momentum in the public discourse. This document aims to present the arguments for and against enshrinement, lay out design goals of an ePBS mechanism, present Two-Block HeadLock (a minor variant of Two-slot PBS), and describe the utility of the optimistic relay roadmap. We hope to open up the enshrinement discussion and solicit alternative ePBS proposals from the community. While these “top-down” design and specification discussions continue, we hope to move forward on the “bottom-up” approach of optimistic relaying with the goal of making relays cheaper and more sustainable in the medium-term.

For any questions, concerns, or corrections, please don’t hesitate to reach out on twitter or through telegram.

thanks for reading!
-mike & justin


Bid cancellations considered harmful


mike neuder & thomas thiery in collaboration with chris hager – May 5, 2023

tl;dr; Under the current implementation of mev-boost block auctions, builders can cancel bids by submitting a later bid with a lower value. While relays facilitate bid cancellations, they cannot guarantee their effectiveness because proposers can end the auction at any given time by asking the relay for the highest bid. In this post, we explore the usage and implications of bid cancellations. We present the case that they are harmful because they (i) incentivize validators to behave dishonestly, (ii) increase the “gameability” of the auction, (iii) waste relay resources, and (iv) are incompatible with current enshrined-PBS designs.

Thanks to Barnabé, Julian, and Jolene for the helpful comments on the draft! Additional thanks to Xin, Davide, Francesco, Justin, the Flashbots team, relay operators, and block builders for discussions around cancellations.

Block proposals using mev-boost

Ethereum validators run mev-boost to interact with the external block-building market. Builders submit blocks and their associated bids to relays in hopes of winning the opportunity to produce the next block. The relay executes a first-price auction based on the builders’ submissions and provides the proposer with the header associated with the highest-paying bid. The figure below shows the flow of messages between the proposer, the relay, and the p2p layer.

Figure 1. Builders compete to have the highest bid by the time the proposer calls getHeader, which fetches an ExecutionPayloadHeader from the relay. The relay returns the header corresponding to the highest-paying bid among the builders’ submissions. The proposer then signs the header and returns it to the relay by calling getPayload, which causes the relay to publish the block to the p2p network.

Let n denote the proposer’s slot. The block auction mostly takes place during slot n-1. The figure below shows an example auction with hypothetical timestamps where t=0 represents the beginning of slot n.


Figure 2. An example timeline of the auction for slot n. At t=-8 (the attestation deadline of slot n-1 – see Time, slots, and the ordering of events in Ethereum Proof-of-Stake), a canonical block will generally be observed by the network. Builders immediately begin building blocks for slot n, and compete by submitting bids relative to the extractable value from transactions. The proposer calls getHeader at t=0.3, right after the slot boundary. Note that the auction is effectively over at this point, but neither the builders nor the relay know because getHeader doesn’t check the identity of the caller. After signing the header, the proposer initiates a getPayload request to the relay at t=1.7. Once the signature is verified, the relay knows the auction is over, and stops accepting bids for slot n. At t=2.3, the relay beacon nodes publish the new beacon block, and the builders immediately begin building for slot n+1. Units are displayed in seconds.

During the auction, new MEV-producing transactions are created. Builders compete by using these new transactions to construct higher-paying blocks thereby increasing the value of their bids; the winning bid typically arrives very near to or right after the start of slot n. In Figure 3, we show increasing bids being submitted by builders and collected by the relay over the duration of a single slot.


Figure 3. The value of builder bids as time passes. The green circle denotes the winning bid for this slot. A majority of builder bids arrive around t=0, which marks the beginning of the proposer’s slot. Bids generally increase as time passes, because new MEV opportunities arise.

NOTE: builder bid data is public via the Data API as specified in the relay spec.

Bid cancellations

A lesser-known feature in the relay architecture allows builders to decrease the value of their bid by submitting a subsequent bid with a lower value (and the cancellations argument set in the request). The cancelling bid has to arrive at the relay later than the cancelled bid; the relay only tracks the latest bid received from each builder pubkey. The cancellation property is applied to the incoming bid, and the relay determines what to do with this bid according to the following logic:

  1. if the bid is higher than the existing bid from that builder, update the builder’s bid
  2. if the bid is lower than the existing bid from that builder
    • if cancellations are enabled, update the builder’s bid
    • if cancellations are not enabled, do not update the builder’s bid

For example, consider a builder who currently has a bid of 0.1 ETH. If they submit a bid with 0.2 ETH, that bid will become the builder’s bid no matter what. If they submit a bid with 0.05 ETH, it will become the builder’s bid only if cancellations are enabled.
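The update rule can be written out directly. This is a simplified sketch; the real relay also keys bids on slot, parent hash, and proposer pubkey:

```python
def update_builder_bid(current_value, new_value, cancellations_enabled):
    """Return the bid value the relay keeps for a given builder pubkey."""
    if current_value is None or new_value > current_value:
        return new_value  # higher bids always replace the existing bid
    # lower bids replace the existing bid only when cancellations are enabled
    return new_value if cancellations_enabled else current_value

# The 0.1 ETH example from the text:
assert update_builder_bid(0.1, 0.2, cancellations_enabled=False) == 0.2
assert update_builder_bid(0.1, 0.05, cancellations_enabled=True) == 0.05
assert update_builder_bid(0.1, 0.05, cancellations_enabled=False) == 0.1
```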

Builder cancellations are used in part to allow searchers to cancel bundles sent to builders. One of the prominent use cases for searcher cancellations is CEX-DEX arbitrage, where the bundles are non-atomic with on-chain transactions. Centralized exchanges typically have ticker data and price updates at a much higher frequency than the 12-second slot duration. Thus a CEX-DEX arb opportunity that is available at the beginning of the slot might not be available by the end, and searchers would like to cancel the bundle to avoid an unprofitable trade. If we decide to keep cancellations, further research into searcher cancellation strategies should be done.

Cancellations impacting the outcome of the auction

Effective cancellations change the outcome of the auction. Given a winning bid, we define a bid as an effective cancellation if (a) its value is larger than the winning bid, and (b) it was eligible in the auction before the winning bid.

We need (b) because the relay cannot always know when the proposer called getHeader. From the winning bid, we know the getHeader call came after that bid became eligible (otherwise it wouldn’t have won the auction), thus any bids that were eligible before the winning bid must also have arrived before getHeader.

This subset of bids is relevant as each could have won the auction had cancellations not been allowed. We found that effective cancellations are quite common; from a sample of data from ultra sound relay over a 24-hour period on April 27-28th (slot 6316941 to 6324141), 269/2846 (9.5%) of the slots relayed had at least 1 effective cancellation. Similarly, a sample from Flashbots’ relay over the same time showed 256/2110 (12.1%) slots with effective cancellations. Figure 4 shows that most cancellations are submitted around t=0 (median = -510 ms), as well as the distributions of cancellation bid values (median = 0.07 ETH), and the percentage of cancellation value increase relative to winning bids (median = 0.97%).


Figure 4. (left) The distribution of cancellation times for various builders, where 0 denotes the beginning of slot n. (middle) The distribution of the value of the canceled bids. (right) The percentage increase of canceled bids. For example, a value of 10% means the canceled bid was 10% larger than the bid that replaced it.
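Given per-slot bid data from the Data API, effective cancellations can be identified with a filter like the following. This is a sketch; `bids` as a list of hypothetical `(eligible_at, value)` pairs is our own framing, not the API's schema:

```python
def effective_cancellations(bids, winning_bid):
    """Return bids that (a) pay more than the winner and (b) became
    eligible before the winner did -- i.e., bids that would have won
    the auction had cancellations not been allowed."""
    win_time, win_value = winning_bid
    return [(t, v) for (t, v) in bids if v > win_value and t < win_time]

# A 0.30 ETH bid canceled before the 0.15 ETH winner became eligible is an
# effective cancellation; a higher bid arriving after the winner is not.
bids = [(-2.0, 0.30), (-1.0, 0.12), (0.1, 0.25)]
assert effective_cancellations(bids, (-0.5, 0.15)) == [(-2.0, 0.30)]
```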

Why are bid cancellations considered harmful?

We highlight four issues:

  1. cancellations are not incentive compatible for validators
  2. cancellations increase the “gameability” of the block auction
  3. cancellations are wasteful of relay resources
  4. cancellations are not compatible with existing enshrined-PBS designs

Cancellations and validator behavior

Validators control when they call getHeader and thus can effectively end the auction at any arbitrary time. The honest behavior as implemented in mev-boost is to call getHeader a single time at the beginning of the proposer’s slot (t=0).


Figure 5. (left) The distribution of timings for getHeader and getPayload from a sample of blocks from ultra sound relay. (right) The distribution of the difference between the call timestamps. This represents the time it takes for the proposer to sign and return the header.

We collected this data by matching the IP address of the getPayload and getHeader calls, which limits the sample to proposers who make the call from the same IP address. The vast majority of getHeader calls arrive right after the slot begins (at t=0). However, with cancellations, rational validators are incentivized to call getHeader multiple times, and only sign the header of the highest bid that they received. This is demonstrated in the example below.


Figure 6. Honest vs optimal validator behavior for calling getHeader. Each circle represents a builder bid, where the builder has cancellations enabled. If the validator behaves honestly and calls getHeader at t=0, the relay will return the latest builder bid, which has a value v=1 (in red). However, if the proposer calls getHeader at t=-1 (one second before their slot begins), they will receive a higher bid with a value v=4 (in blue).

Validators can effectively remove cancellations by calling getHeader repeatedly, and only signing the highest response. Furthermore, there exists an incentive for them to do this because it can only increase the bid value of the header they end up signing.
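The gap between honest and rational behavior can be modeled with a toy single-builder bid stream (timestamps relative to the slot start; all function names here are hypothetical):

```python
def get_header(bid_stream, t):
    """Honest mev-boost behavior: one getHeader call at time t returns the
    builder's latest (possibly canceled-down) bid value at that moment."""
    live = [value for (ts, value) in bid_stream if ts <= t]
    return live[-1] if live else None

def rational_header(bid_stream, poll_times):
    """Rational behavior under cancellations: poll getHeader repeatedly
    and sign only the highest header ever received."""
    responses = [get_header(bid_stream, t) for t in poll_times]
    return max(r for r in responses if r is not None)

# The Figure 6 scenario: a v=4 bid at t=-1 is canceled down to v=1.
stream = [(-3.0, 2), (-1.0, 4), (-0.5, 1)]
assert get_header(stream, 0.0) == 1               # honest: signs v=1
assert rational_header(stream, [-1.0, 0.0]) == 4  # rational: signs v=4
```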

Cancellations increase the “gameability” of the auction

By allowing builders to cancel bids, the action space in the auction increases dramatically. We focus on two strategies enabled through cancellations (though there are likely many more!):

  1. bid erosion – where a winning builder reduces their bid near the end of the slot.
  2. bid shielding – where a builder “hides” the true value of the highest bid with an artificially high bid that they know they will cancel.

Bid erosion

This strategy is quite simple: if the builder knows that they have the highest bid on the relay, they can gradually reduce the value of the bid so long as they maintain a lead over the other bids, increasing their profits as a direct consequence of paying less to the proposer. Another signature of bid erosion is the winning builder’s bids decreasing in value while other builders’ bids continue increasing in value. Figure 7 shows this strategy playing out.




Figure 7. In both plots, the green circle represents the winning bid and the red x’s are effective cancellations. (left) We see that the winning builder continually reduces their bid, but still wins the auction. (right) The builder submits a set of high bids and quickly erodes them down to a reduced value.

Bid shielding

As described in Undesirable and Fraudulent Behaviour in Online Auctions, a bid shielding strategy places an artificially high bid to obfuscate the true value of the auction, only to cancel the bid before the auction concludes. Applied to the block auction, a builder can hide the true value for a slot by posting a high bid and canceling it before the proposer calls getHeader. This strategy takes on some risk because it is possible that the builder must pay out the high bid if it wins the auction, but cancelling the bid a few seconds before the start of the slot makes the strategy quite safe. Additionally, this strategy could be used to grief other builders who have an advantage during a slot into bidding higher than they would have if the shielding bid was not present.




Figure 8. Potential bid shielding examples. (left) We see multiple builders bidding high between t=-4 and t=-1.5, which may be an attempt to cause other builders to bid higher than they would have otherwise. As the slot boundary approaches, the shielding bids are cancelled, leaving the winning bid to a different builder. (right) A builder setting a cluster of high bids at t=-2 only to reduce their bid closer to t=0, while still winning the auction.

Cancellations are wasteful of relay resources

With cancellations, relays must process each incoming bid from the builders, regardless of the bid value. This results in the unnecessary simulation of many blocks that have a very low probability of winning the auction. Without cancellations, the relay would only need to accept bids that were greater than the current highest value. On a sample of 252 recent slots, ultra sound relay had an average of 400 submissions per slot. Of those 400, on average only 60 (15%) were greater than the top bid at the time. This implies an 85% reduction in simulation load on the relay infrastructure by removing cancellations. To illustrate this, consider the example of slot 6249130 below; of the 1014 bids received, only 59 (6%) incrementally improve the highest bid, and the remaining 955 bids could safely be ignored without cancellations.
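The filtering a relay could apply without cancellations amounts to keeping only the bids that raise the running maximum. A minimal sketch (with hypothetical bid values; a real relay receives full signed block submissions):

```python
# Sketch: without cancellations, a relay only needs to simulate bids
# that improve on the current top bid; everything else can be rejected
# before block simulation.

def incremental_bids(bids):
    """Return only the bids that raise the running maximum, i.e., the
    bids a relay would simulate if cancellations were disabled."""
    kept, top = [], float("-inf")
    for value in bids:
        if value > top:
            kept.append(value)
            top = value
    return kept

# Hypothetical stream of 10 bid values (in ETH) arriving over a slot.
stream = [0.01, 0.05, 0.03, 0.06, 0.02, 0.08, 0.04, 0.09, 0.07, 0.10]
kept = incremental_bids(stream)
print(f"processed without cancellations: {len(kept)}/{len(stream)}")
# → processed without cancellations: 6/10
```

On the slot 6249130 data above, the same filter would retain 59 of 1014 bids.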


Figure 9. The value of bids as time evolves over a single slot. With cancellations, every bid (in gray) must be processed by the relays. Without cancellations, only bids that incrementally increase the highest bid (in purple) would need to be processed.

Cancellations conflict with enshrined-PBS

Lastly, cancellations are not compatible with current designs of enshrined-PBS. For example, if we examine proposer behavior in the two-slot mechanism presented by Vitalik, we see that just like in mev-boost, there exists an incentive for the validator to ignore builder cancellations.


Figure 10. A builder submits two headers as bids. The first has val=2 and the second (a cancelling bid) val=1. The proposer observes the headers in the order that they were published. With cancellations, the proposer should include h2 in their beacon block, because it was the later bid from the builder. However, this is not rational behavior: by including h1 instead, they earn a higher reward.

Without the relay serving as an intermediary, builder bids will be gossiped through the p2p network. Similar to the rational validator behavior of calling getHeader and only signing the highest-paying bid, any validator can listen to the gossip network and only sign the highest bid they observe. Without an additional mechanism to enforce ordering on builder bids, there is no way to prove that a validator observed the canceling bid on time or in any specific order.
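The incentive problem in Figure 10 can be made concrete with a toy sketch (field names hypothetical): a rational proposer simply signs the highest-paying header it has seen, so a later, lower "cancelling" bid is ignored.

```python
# Toy model of the Figure 10 incentive: the proposer picks the
# highest-value header regardless of publication order, which is
# exactly what makes cancellations unenforceable without a relay.

def rational_choice(headers):
    """Return the highest-paying header, ignoring arrival order."""
    return max(headers, key=lambda h: h["val"])

h1 = {"id": "h1", "val": 2}  # original bid
h2 = {"id": "h2", "val": 1}  # later, "cancelling" bid
best = rational_choice([h1, h2])
print(best["id"])  # → h1
```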

Additionally, if the final ePBS design implements mev-burn or mev-smoothing, where consensus bids from the attesting committee enforce that the highest-paying bid is selected by the proposer, bid cancellations are again incompatible without significant design modifications (e.g., block validity conditions are modified to (i) the block captures the maximal MEV as seen by attesters listening to bids (ii) attesters have not seen a timely cancellation message for the bid). This would increase the complexity of the change and require extensive study to understand the new potential attack vectors it exposes.

Future directions

Below are some short & medium-term changes we encourage the community to consider:

  1. Remove cancellations from the relays.

    • This is the simplest approach, but would require a coordinated effort from the relays and generally is not favorable for builders. The relays would enter a prisoner’s dilemma, where defecting is a competitive advantage (i.e., there exists an incentive to allow cancellations because it may grant the relay exclusive access to valuable builder blockflow).
  2. Implement SSE stream for top bids from relays.

    • An SSE stream of top bids would eliminate the need for builders to continually poll getHeader; on ultra sound relay, around 1 million calls to getHeader are received every hour. The biggest challenge here is ensuring fair ordering of the message delivery from the SSE. Perhaps by using builder collateral or reputation as an anti-sybil mechanism, the relays can limit the number of connections and randomize the ordering. As implemented today, builders are already incentivized to colocate with relays and call getHeader as often as possible to get the value of the highest bid, so the SSE could also simplify the builder-side logic.
  3. Require proposer signature for getHeader.

    • With (2) above, we could limit getHeader to just the current proposer by using a signature to check the identity of the caller. We use the same mechanism to ensure that getPayload is only called by the proposer. This change would alter the builder spec because the request would need a signature. Note that this could further incentivize relay-builder or builder-proposer collusion, as builders will want to access bidding information.
  4. Encourage proposer-side polling of getHeader.

    • On the mev-boost (proposer side) software, we could implement the highest-bid logic. This is accomplished either by polling getHeader throughout the slot (e.g., one request every 100ms) or by listening to the SSE stream if we implement (2). This effectively removes cancellations via the validator, so it would not require a coordinated relay effort. Validators could opt in to the new software and would earn more rewards if they updated. This change could cause builders to decrease the value of bids they cannot cancel, which could also incentivize proposer-builder trusted relationships where the proposer gets access to higher bids if they allow cancellations and ignore the relay altogether. Before implementing this, more discussion is needed to avoid this scenario.
  5. Research the role of cancellations in auctions.

    • We hope this post can lead to more research exploring the nuanced spectrum between simply enabling and disabling cancellations. While we argue that cancellations are harmful, examining the motivations behind their use by auction participants (e.g., builders, searchers) and assessing their impact on auction revenue and MEV distribution between proposers and builders remain open questions.

Please reach out with any comments, suggestions, or critiques! Thanks for reading!


Equivocation attacks in mev-boost and ePBS


authors (alphabetically)
francesco d’amato & mike neuder – April 18, 2023

cross posted at Equivocation attacks in mev-boost and ePBS – HackMD

tl;dr: We analyze “equivocation” attacks, in which a malicious proposer double-signs a header in an attempt to unblind a builder block. These attacks can be used to steal MEV and unbundle transactions. We present “headlock”, which is also explored in Subverting the total eclipse (of the heart) by Dan Marzec and Louis Thibault, as a potential short-term solution for the current mev-boost ecosystem and discuss its implications as a long-term solution to equivocation attacks in two-slot ePBS.


April 2nd unbundling attack

During slot 6137846, a malicious proposer exploited the mev-boost relay to unbundle a sandwiching-searcher’s transactions for a profit of over 20 million USD. This attack has been well analyzed; see Further Reading from Subverting the total eclipse (of the heart). The figure below schematizes the attack.

Two factors made this attack easier to execute: (1) the relay didn’t check the validity of the signed header sent from the proposer, and (2) the relay returned the block body to the proposer before publishing it to the p2p layer. These relay weaknesses were quickly addressed in commit 1 & commit 2 respectively, and a number of other relay modifications were published in the days following the attack. Special thanks to Chris Hager and the Flashbots team for the huge effort leading the charge to patch the relays. As a result of this work, the attack as executed on April 2nd is no longer feasible. However, as the community began exploring the space of double-signing attacks, many individuals and teams coalesced around a more general (though harder to execute) attack that the relay in its current form cannot defend against.

Equivocation attack mev-boost

The diagram below outlines the general version of the attack. Due to the fixes mentioned above, the proposer now must submit a valid signed header to the relay and will only observe the block body over the p2p network after it has been published by the relay.

In order to successfully launch this attack in a single slot, the attacker must (1) observe the block body over the p2p network, (2) unbundle it to construct a conflicting block b, and (3) publish block b fast enough to win a majority of the attestations for that slot. It has been pointed out that if the attacker controls two slots in a row, the second slot block can build on block b, giving proposer boost to that fork. This reduces the proportion of attestations that the attacker needs. Nonetheless, the attacker still must acquire at least 30% attestation weight on block b (proposer boost gives 40%, so if a block A has more than 70% attestation weight, it won’t be reorged out, even with proposer boost on the attacker fork).

Equivocation attack in ePBS

This attack has implications for the current ePBS proposals. For this section we use Vitalik’s two-slot ePBS as the reference design. The figure below demonstrates the “happy path” of block production in two-slot ePBS.

In two-slot ePBS, each proposer slot (where validators are elected as block-producers according to the existing Proof-of-Stake protocol) is paired with a builder slot. During the proposer slot, the elected validator publishes a beacon block (b1 above) that includes a builder block header (h1 above). This block header elects the builder as the block-producer for the subsequent slot. If b1 remains canonical, the builder of h1 unconditionally pays the proposer. During the subsequent slot, an honest builder of h1 publishes a new beacon block (b1' above) including the execution block body. Because b1' is a child block of b1 and there are no equivocations, honest attesters will correctly grant proposer boost to the builder of b1' for the duration of the slot. The next PoS proposer will treat b1' as the head of the chain and use it as the parent of their beacon block.

An equivocating proposer can execute a griefing attack against the builder as follows.

By concurrently publishing two conflicting beacon blocks (with different execution headers h1 & h2), the attacker partitions the attester set. The builder observes the equivocation, and is forced to make a decision: either (a) publish their beacon block (b1' above) and hope that it is not reorged, or (b) withhold their beacon block and run the risk of b1 becoming canonical. In the former case, the builder exposes e1 (the execution payload associated with h1), which contains the MEV and bundled transactions. If b1 gets orphaned by b2, the builder does not make a payment, but the transactions in b1' may be unbundled and the MEV stolen in the subsequent slot. In the latter case, if b1 becomes canonical, the builder unconditionally pays the proposer for the slot, but doesn’t earn any MEV or block reward from the block publication. To drill down on the reason e1 could easily be reorged, we can examine the different views of attesters during these slots.

This attack is made possible by the lack of a unique builder. With the attestations between b1 and b2 being so close, the builder has no way of knowing which will become canonical and is thus forced into making a decision. Because h1 & h2 specify different builders, attesters that see b1 vs b2 as canonical assign proposer boost to their respective builders. Note that the single-slot unbundling is not possible in this scenario because the attacker had to commit to a header h2 before ever seeing the contents of e1. In this case, the attack griefs the builder into exposing e1 without any guarantee that the block becomes canonical.

Defending against proposer equivocation

Headlock in mev-boost

As proposed in Subverting the total eclipse (of the heart), a near-term solution to the equivocation attack can be achieved by modifying the honest attestation behavior to “lock” into a specific header before the data of the body is made available. The idea relies on a new pub-sub topic on the p2p layer, where headers are propagated between beacon nodes. With this in place, “headlock” is instantiated through the following changes:

New honest attesting behavior

  1. Listen on the header topic for incoming headers for the next slot.
  2. If a valid header is observed, the node will only accept a block with a matching body (this header is “locked”).
  3. If an equivocating header is observed, the node will reject it and not propagate it.
  4. If no block corresponding to the locked header is published by 4s (the attestation deadline), vote for the parent block.
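The four attester rules can be condensed into a small state machine. This is a minimal sketch under simplifying assumptions (headers compared by equality, blocks as dicts with a hypothetical "header" field); a real implementation lives inside a consensus client's fork choice.

```python
# Sketch of the headlock attester behavior described above.

class HeadlockAttester:
    def __init__(self):
        self.locked_header = None

    def on_header(self, header):
        """Steps 1-3: lock onto the first valid header seen for the
        slot; return False (reject, do not propagate) for any
        conflicting (equivocating) header."""
        if self.locked_header is None:
            self.locked_header = header
            return True  # propagate the first header
        return header == self.locked_header

    def attest(self, block, parent):
        """Step 4: at the attestation deadline, vote for the block only
        if its body matches the locked header; otherwise vote for the
        parent block."""
        if (self.locked_header is not None and block is not None
                and block["header"] == self.locked_header):
            return block
        return parent
```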

New relay behavior

  1. Gossip the first signed header that is received.
  2. Wait for a fixed amount of time.
  3. Check for equivocations and refuse to publish if any are observed.
  4. Publish the block body otherwise.
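The relay side can be sketched as a single publish routine. Gossip and header tracking are stubbed with hypothetical callables; the wait duration is a placeholder for whatever fixed period the relay chooses.

```python
# Sketch of the four-step headlock relay flow described above.
import time

def headlock_publish(signed_header, gossip, seen_headers, body, wait=1.0):
    """Gossip the first signed header, wait a fixed period, then
    publish the body only if no equivocating header for the same slot
    was observed in the meantime."""
    gossip(signed_header)                 # step 1: gossip the header
    time.sleep(wait)                      # step 2: fixed waiting period
    slot = signed_header["slot"]
    equivocations = [h for h in seen_headers()
                     if h["slot"] == slot and h != signed_header]
    if equivocations:                     # step 3: refuse on equivocation
        return None
    return body                           # step 4: publish otherwise
```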

These new mechanics are demonstrated in the figure below.

This allows the relay to only release the body of the block when they are confident that most honest attesters are locked on to the non-equivocating header that they have observed.

Headlock timing impact

Headlock introduces more overhead into the relay publication time. The 4s attestation deadline is already a small window, and we saw a large increase in missed slots and orphaned blocks during the April 2nd attack response when relays were modifying the acceptable deadline for block publications. The figure below shows the distribution of the arrival of getPayload (signed header received) calls on the ultra sound relay. The relay still has a number of checks to perform before broadcasting the block as quickly as possible to ensure most validators see it by the attestation deadline 4000ms into the slot.

Adding an additional header propagation period and equivocation check would likely not be possible without extending the attestation deadline to 6s, which also implies reducing the attestation aggregation and propagation time from 4s to 3s each (or increasing the slot time). More data and discussion with the core-dev, staking, mev-boost, and research communities would need to take place before committing to this direction.

Headlock in ePBS

Headlock can also be extended to ePBS by making two changes to the two-slot ePBS design:

  • A change in how the “proposer for builder slots” (the builder) is selected.
  • A change in the payment logic, so that the payment is not released if equivocation evidence is available.

Together, the two changes ensure that builders can safely abstain from publishing whenever there is no agreement on a single builder due to proposer equivocation, without risking having to make a payment anyway. If there is agreement on a single builder, they can be protected from reorgs to the same extent as proposers are protected in proposer slots, e.g., through proposer boost. The key here is that proposer boost works if everyone agrees on who is the proposer, implying that there’s a unique recipient of the boost.

The second change is straightforward, as it only entails delaying the payment for sufficiently long that equivocation evidence for the proposer (two signatures for the same slot) can be submitted on-chain. Let’s now examine the first change, by going through the steps of a proposer slot.

New proposer slot behavior

  1. Proposer broadcasts a beacon block normally at t=0s (we do not rely on honest timing).
  2. Attesters cast attestations for beacon blocks normally at t=4s.
  3. Any validator locally selects the block builder for the following builder slot at t=8s. This is only a local update, based on the messages observed at this point. If no beacon block is seen, no builder is selected. Similarly, no builder is selected if two equivocating beacon blocks are seen. If only a single beacon block is seen, the builder is set to the one it specifies.
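The local selection in step 3 reduces to a simple rule over the blocks a validator has seen. A minimal sketch, with hypothetical block fields:

```python
# Sketch of step 3: local builder selection at t=8s, based purely on
# the beacon blocks observed for the proposer slot.

def select_builder(seen_blocks):
    """Return the builder named by the unique beacon block observed
    for this slot, or None if zero blocks (nothing seen) or more than
    one block (equivocation) were seen."""
    if len(seen_blocks) != 1:
        return None
    return seen_blocks[0]["builder"]
```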

In other words, whereas the proposer for proposer slots is selected on-chain, the “proposer” for a builder slot is selected locally, 8s into the previous slot. During a builder slot, this selection affects attestation behavior, in the same way that the (on-chain) proposer selection affects it during proposer slots: if validator v has selected builder v’ as the builder for slot n, v treats v’ exactly as if it were the proposer of slot n (in a world without PBS). In particular, v assigns proposer boost to v’. If all validators agree on the builder selection, they all assign proposer boost to the same builder, which makes the proposer boost mechanism effective in protecting that builder from reorgs (alternative reorg resilience mechanisms to proposer boost, like view-merge, similarly require the identification of a unique proposer which is protected by the mechanism). To demonstrate this mechanism, let’s go through a builder slot:

  1. At 0s, a builder v which has observed a single beacon block b selecting it as builder runs the fork-choice and determines the head b'. If b = b', v publishes a beacon block extending b, with the appropriate payload. If b != b', v publishes a beacon block extending b', without a payload. A builder which has observed equivocation from the proposer does not publish, regardless of whether they are selected by one of the equivocating blocks.
  2. At 4s, attesters cast attestations, giving proposer boost to the builder which they have selected in the previous slot, if there was one. If they had not selected any builder, they ignore any proposal and just vote for the empty slot extending the head of the chain in their view.
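The builder's decision at the start of its slot (step 1 above) can be sketched as a three-way branch. Block identities and return labels are hypothetical simplifications:

```python
# Sketch of the builder-slot decision: withhold on equivocation;
# publish with the payload only if the selecting block is the
# fork-choice head; otherwise publish an empty block extending the head.

def builder_action(selecting_block, head, equivocation_seen):
    if equivocation_seen:
        return "withhold"               # never expose the payload
    if selecting_block == head:
        return "publish_with_payload"   # extend b with the payload
    return "publish_empty_on_head"      # extend b' without a payload
```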

The new flow is represented below.

The key here is that the builder only needs to publish if no equivocations are present. If there is an equivocation, the builder can use it as proof to avoid the unconditional payment. This ensures that the builder is only required to release the block body when they are certain their block will end up on chain, by virtue of being the only builder with proposer boost; this establishes the builder-uniqueness property needed to properly protect the builder.

Guarantees of ePBS with headlock

Let’s more thoroughly examine the guarantees which ePBS with headlock gives to honest builders. We want the following two properties:

  • Honest builder payload safety: If an honest builder publishes a beacon block with a payload, it becomes canonical.
  • Honest builder payment safety: If a payment from an honest builder is released, the corresponding payload becomes canonical.

Looking at the possible cases during a proposer slot assigned to a possibly malicious proposer:

  • Proposer does not publish any beacon block on time, before 4s: all attesters in the proposer slot vote for an empty slot. If the proposer publishes some beacon block b later, it will not become canonical, so no payment will be released. Moreover, the builder selected by b will either not publish (if it doesn’t see b or sees an equivocation) or publish a block with empty payload (if it sees b and no equivocation): crucially, it will never publish a block with a payload, because that would require seeing b as the canonical head. Therefore, no builder loses money nor reveals their payload, satisfying both properties.
  • Proposer publishes multiple beacon blocks before 8s: any builder will observe the equivocation by 12s, and not publish, satisfying the first property. Moreover, no payment will be released, satisfying the second property.
  • Proposer publishes a single beacon block b before 4s, and no equivocation before 8s: all validators see b before 8s, and do not see any equivocation by 8s, so at 8s they all locally set the builder to be the one selected by b, say v. Now v is fully in control, because they are recognized by all as the “proposer” for the next slot, so all validators assign it boost. There are three possible cases:
    • v sees an equivocation before 12s. In this case, we again clearly satisfy all properties (no payment released, v does not publish).
    • v does not see an equivocation before 12s, and sees b as the head of the chain. v publishes a block extending b, with the right payload, which receives boost and becomes canonical. Both properties are satisfied.
    • v does not see an equivocation before 12s, and sees b' != b as the head of the chain. v publishes a block with empty payload extending b', which receives boost and becomes canonical. Therefore, b does not become canonical, so the payment is not released. Since neither the payment is released nor the payload published, both properties are satisfied.

Additionally, we still have a strong guarantee for honest proposers, i.e. honest proposer reorg safety (also known as reorg resilience): the beacon block proposed by an honest proposer becomes (and stays) canonical. This is simply due to the normal applications of reorg protections for proposers, such as proposer boost, and immediately implies honest proposer payment safety (that they receive the agreed-upon payment), because honest proposers do not equivocate, so the payment is always released.

Finally, the protocol is also protected from proposers and builders colluding to extract more MEV by releasing late: we have time-buying collusion-resistance. This is simply because we employ (block, slot) attestations (as in regular two-slot PBS), i.e., we allow (and require) validators to vote against late proposals. If sufficiently many validators do so, the builder is unable to do anything to force a late proposal to be canonical. This is similar to the Allow honest validators to reorg late blocks spec PR from Michael Sproul, where proposers use their proposer boost to enforce timeliness in the previous slot. The key difference is that with (block, slot) attestations we do not rely on honest behavior from the next proposer, only on an honest majority of attesters. This is particularly important here, given that the next “proposer” is the chosen builder, and we are precisely trying to guard against collusion of a proposer and the builder of their choice.
