Factors of decentralization of web3 protocols: Tools for planning greater decentralization

https://a16zcrypto.com/posts/article/decentralization-factors-web3-protocols-tables/

Decentralization is the key innovation enabled by blockchain technology and one of the most important characteristics of web3 protocols. As a result, web3 participants, policymakers, and regulators must develop a more uniform and nuanced understanding of what constitutes decentralization to enable more accurate assessments and comparisons of the decentralization of various web3 protocols. This will also better position web3 regulation and policy to account for how decentralization can reduce risk, and will ultimately incentivize web3 builders to pursue decentralization in a way that maximizes the public benefits that web3 promises.

To that end, we define three types of decentralization and suggest factors relevant for each type of decentralization in the context of (1) tokenized blockchain protocols (e.g., layer 1 and layer 2 blockchains like Bitcoin, Ethereum, Polygon, Solana, Optimism, Arbitrum, zkSync, etc.) and (2) tokenized smart contract protocols deployed to blockchains (e.g., Uniswap, Aave, Compound, Curve, etc.). We also introduce two tables – one for (1) tokenized blockchain protocols and one for (2) tokenized smart contract protocols, viewable at their respective links – that should help enumerate the components of decentralization to provide a more concrete and standardized definition.

Any analysis of the decentralization of a blockchain protocol or smart contract protocol must consider the totality of the circumstances surrounding such a protocol. The factors we lay out here are an attempt to chart a course for such analysis.

Why decentralization?

Web3 has ushered in a new age of the internet – the age of “read, write, own.” The technology underpinning web3 enables “trustless computation,” which removes the need to rely on a centralized entity to navigate the web and databases. This makes it possible to develop more complex and sophisticated protocols that offer the functionality of the modern internet but that can also be owned by users. For example, a decentralized social media protocol could enable various applications to be built, all of which leverage the same protocol to distribute ownership and control over that protocol to a broad constituency of developers and users through token ownership.

Decentralization is the critical feature of web3 protocols that enables this paradigm shift. Decentralization will drive the creation of a democratized internet and will enable three important shifts: promoting competition, safeguarding freedoms, and rewarding stakeholders.

First, decentralization enables web3 systems to be credibly neutral (they cannot discriminate against any individual stakeholder or any group of stakeholders, which is critical to incentivize developers to build within ecosystems) and composable (to mix and match software components like Lego bricks). As a result, web3 systems function more like public infrastructure than proprietary technology platforms. In contrast to the gated software of Web2, web3 protocols provide decentralized internet infrastructure on which anybody can build and create an internet business. Crucially, in web3, this can be done without the permission of the original deployer of the protocol or the need to use a centrally controlled interface. For example, contrast, say, Twitter with a web3 protocol that provides an underlying data architecture designed for social media that is controlled by the public through token ownership rather than a corporation. In such a system, anyone could build their own client or application on top of the protocol and gain access to its network of users.

This is abstract, so consider this diagram of a web3 ecosystem, with a decentralized blockchain, a decentralized smart contract protocol governed by a DAO of token-holders, and several proprietary clients operated as separate businesses using traditional entity forms on top of the network and protocol. Each of the blockchain and smart contract protocols functions as decentralized internet infrastructure on which businesses can build, compete, and innovate.

Second, decentralization necessitates the broad distribution of control and participation in web3 protocols, ensuring that the evolution and usage of the network reflect the input of a wide variety of stakeholders and not just the companies that created them. Properly structured protocols that promote decentralization limit the power that can accrue to one or a small group of companies. As such, decentralization should limit corporate or individual power to gatekeep, and ensure that any changes to the protocol are aligned with its broad ecosystem of users who hold tokens and ultimately govern the network.

Third, decentralization enables the design of systems that prioritize stakeholder capitalism – systems that are designed to more equitably serve the interests of all participants rather than a certain subset of them. Token-incentivized stakeholder capitalism distributes ownership and control to a broader set of stakeholders rather than prioritizing equity holders over all other stakeholders, including customers and employees. As a result, web3 protocols and networks serve as a fertile design space for systems that more equitably serve the interests of all stakeholders. And such decentralized protocols provide more stable internet infrastructure on which a broader group of stakeholders can confidently build. 

Types of decentralization

There are three different but interrelated lenses through which to view decentralization: Technical, Economic, and Legal. All three are important but often have competing interests, creating a complex design challenge in maximizing overall decentralization and utility.

Technical decentralization (T)

Technical decentralization relates primarily to the security and structural mechanisms of web3 systems. Programmable blockchains and autonomous smart contract protocols can support technical decentralization by providing an autonomous, permissionless, trustless, and verifiable ecosystem in which value can be transferred. Products and services can be deployed and run without requiring trusted, centralized intermediaries to operate (or pull the rug out from under) them, opening a vast world of possibilities.

For blockchain protocols, technical decentralization is an exceedingly challenging problem, and one that requires a balancing of several competing forces. But for smart contract protocols, this type of decentralization can be achieved relatively quickly and easily, by making the smart contracts immutable (i.e., uncontrollable and unupgradable by anyone). (For more examples, see here and here.)

Economic decentralization (E)

The ability of blockchains and smart contract protocols to make use of their own native tokens unlocks the potential for these open source and decentralized systems to have their own decentralized economies (i.e., autonomous free-market economies) and for more people to participate in and benefit from these decentralized ecosystems.

Builders of web3 systems can facilitate the formation of decentralized economies through careful design decisions that lead to the exchange and accrual of value — whether information, economic value, voting power, or otherwise — from a broad array of sources. Decentralized ecosystems, if properly structured, can use tokens to incentivize participants to contribute value to the ecosystem and correspondingly distribute that value more equitably among system stakeholders according to their contributions. To achieve this, web3 systems need to vest meaningful power, control, and ownership with system stakeholders (via airdrops, other token distributions, decentralized governance, etc.). As a consequence, the value of the ecosystem as a whole accrues to a broader array of participants rather than one central entity and its shareholders.

The ongoing balancing of incentives among the stakeholders — developers, contributors, and consumers — can then drive further contributions of value to the overall system, to the benefit of all. In other words: all the benefits of modern network effects, but without the pitfalls of centralized control and captive economies.

Legal decentralization (L)

Legal decentralization depends on whether the decentralization of a system eliminates the risks that a specific regulation may be intended to address.

For instance, technically decentralized blockchains and smart contract protocols can eliminate the risks associated with trusted intermediaries. As a result, the technical decentralization of such systems may also mean that they are legally decentralized when it comes to regulations that target trusted intermediaries.

Systems that are technically and economically decentralized can eliminate additional risks, including those associated with a web3 system’s tokens and their underlying value. Such decentralization would negate the need to apply U.S. securities laws to transactions of tokens that might otherwise significantly restrict their broad distribution.

Based on SEC guidance, we can define this level of legal decentralization as that point at which a web3 system can eliminate both the potential for significant information asymmetries to arise and reliance on essential managerial efforts of others to drive the success or failure of that enterprise. Upon meeting such a threshold the system may be “sufficiently decentralized” so that the application of U.S. securities laws to such system’s tokens should be unnecessary. As a threshold matter, this necessitates that a given token does not provide its holder with any contractual rights with respect to the ongoing efforts, assets, income, or resources of the issuer or any of its affiliates. 

Factors of decentralization

In web3 systems that make use of native tokens, the three types of decentralization — technical, economic, legal — must be viewed holistically. Changes to one may affect the others. For example, decentralized economies help drive systems towards legal decentralization by prioritizing decentralized ownership among stakeholders, value accretion from decentralized sources, and value distribution to decentralized stakeholders. All of these decrease the risk of information asymmetries and the need to rely on managerial efforts of individuals.

Conversely, if the value of a digital asset of a web3 system is dependent on the ongoing managerial efforts of the original development team, then the decentralization of the system on all three levels may be jeopardized. For example, the management team’s departure could place enormous downward pressure on the price of the digital asset, which could make the system more susceptible to a 51% attack.

Given this interplay, we have broken down decentralization into the many factors that may influence it. Tables 1 and 2 present a comprehensive list of the most important factors that influence the decentralization of tokenized consensus blockchain protocols and tokenized smart contract protocols, respectively. These factors are broken down by both type (Technical, Economic, and Legal) and category (Computational, Development, Governance, Value Accrual, and Usage & Accessibility).

***

Decentralization is a process that is assessed not on an absolute basis but on a spectrum that includes the totality of the circumstances of any web3 system. The relative importance of the factors will change depending on the web3 system and the purpose of the assessor. Additionally, the preferred tradeoffs between different types of decentralization may differ between projects and people.

The two matrices we’ve prepared should, we hope, prove useful tools for providing a more concrete and standardized definition of decentralization. In turn, we hope this allows web3 participants to contribute to building more decentralized projects while empowering policymakers and regulators to design regulatory frameworks that recognize the power of decentralization to reduce and eliminate risks.

***

***

Miles Jennings is General Counsel and Head of Decentralization of a16z crypto, where he advises the firm and its portfolio companies on decentralization, DAOs, governance, NFTs, and state and federal securities laws.

Steve Wink is Co-Chair of Latham’s Fintech Industry Group and Global Digital Assets & Web3 Practice and is a Partner in the New York office of Latham & Watkins. He advises clients on a variety of matters involving the regulation of markets, as well as related compliance and enforcement matters, and is ranked in Band 1 by Chambers’ Professional Advisers FinTech, and is recognized by Chambers USA as one of the country’s leading financial services broker-dealer regulation lawyers.

Adam Zuckerman is a member of the Digital Assets & Web3 and Emerging Companies practices in the Bay Area offices of Latham & Watkins, where he advises crypto and Web3 clients across their lifecycle on token launches, DAO formation, product counseling, a variety of regulatory matters, general corporate matters, and financings.

***

Disclaimer: These materials were written in partnership between a16z crypto and Latham & Watkins LLP and are for informational purposes only.  The materials do not and should not be construed as legal advice for any particular facts or circumstances.  None of the materials provided hereby are intended to be treated as legal advice or to create an attorney-client relationship.  The materials might not reflect all current updates to the law or applicable interpretive guidance and the authors disclaim any obligation to update the materials.  We strongly urge you to contact a reputable attorney in your jurisdiction to address your specific legal needs.

Under applicable Rules of Professional Responsibility, portions of this communication may contain attorney advertising.  Prior results do not guarantee a similar outcome.  Results depend upon a variety of factors unique to each representation.  Please direct all inquiries regarding the conduct of Latham & Watkins attorneys under New York’s Disciplinary Rules to Latham & Watkins LLP, 1271 Avenue of the Americas, New York, NY 10020, Phone: +1.212.906.1200.

Additionally, the views expressed here are those of the individual AH Capital Management, L.L.C. (“a16z”) personnel quoted and are not necessarily the views of a16z or its affiliates. a16z is an investment adviser registered with the U.S. Securities and Exchange Commission. Registration as an investment adviser does not imply any special skill or training. Certain information contained in here has been obtained from third-party sources, including from portfolio companies of funds managed by a16z. While taken from sources believed to be reliable, a16z has not independently verified such information and makes no representations about the enduring accuracy of the information or its appropriateness for a given situation. In addition, this content may include third-party information; a16z has not reviewed such material and does not endorse any advertising content contained therein.

This content is provided for informational purposes only, and should not be relied upon as legal, business, investment, or tax advice. You should consult your own advisers as to those matters. References to any securities, digital assets, investment strategies or techniques are for illustrative purposes only, and do not constitute an investment recommendation or offer to provide investment advisory services. Furthermore, this content is not directed at nor intended for use by any investors or prospective investors, and may not under any circumstances be relied upon when making a decision to invest in any fund managed by a16z. (An offering to invest in an a16z fund will be made only by the private placement memorandum, subscription agreement, and other relevant documentation of any such fund and should be read in their entirety.) Any investments or portfolio companies mentioned, referred to, or described are not representative of all investments in vehicles managed by a16z, and there can be no assurance that the investments or investment strategies will be profitable or that other investments made in the future will have similar characteristics or results. A list of investments made by funds managed by Andreessen Horowitz (excluding investments for which the issuer has not provided permission for a16z to disclose publicly as well as unannounced investments in publicly traded digital assets) is available at https://a16z.com/investments/.

Additionally, this material is provided for informational purposes solely and should not be relied upon when making any investment decision. Past performance is not indicative of future results. Investing in pooled investment vehicles and/or digital assets includes many risks not fully discussed herein, including but not limited to, significant volatility, liquidity, technological, and regulatory risks. The content speaks only as of the date indicated. Any projections, estimates, forecasts, targets, prospects, and/or opinions expressed in these materials are subject to change without notice and may differ or be contrary to opinions expressed by others. All materials used in this document, unless otherwise stated, are joint copyright works of a16z crypto and Latham & Watkins. Please see https://a16z.com/disclosures/ and https://www.lw.com/ for additional important information regarding regulatory disclosures.

Private, on-chain voting with Cicada

https://a16zcrypto.com/posts/article/building-cicada-private-on-chain-voting-using-time-lock-puzzles/

All voting systems rely on integrity and transparency to function in any meaningful way. At face value, this makes blockchains an ideal platform to build these systems on – and indeed, many decentralized organizations have embraced permissionless voting to express collective intent, often in the context of wielding substantial treasuries or tuning critical protocol parameters. But there are drawbacks to on-chain voting, and privacy remains underexplored, to the detriment of web3 voting systems – in the majority of on-chain voting protocols used today, ballots and vote tallies are fully public. Without privacy, voting results are susceptible to manipulation and voter incentives become misaligned, potentially leading to undemocratic outcomes.

That’s why we are releasing Cicada: a new, open source Solidity library that leverages time-lock puzzles and zero-knowledge proofs for private on-chain voting. Compared to existing systems, Cicada has novel privacy properties, minimizes trust assumptions, and is efficient enough to be used on Ethereum mainnet. 

In this post, we survey the landscape of voting privacy and provide a high-level description of how Cicada works (with formal proofs to come). We also encourage developers to check out the GitHub repository – Cicada can be adapted and extended in many ways to support different voting schemes and features, and we hope to collaborate with the community to explore these possibilities. 

A brief survey of private voting

In any voting system (on-chain or otherwise), there are many different layers of privacy to consider. The disclosure of individual ballots, the running tally, and voter identities can all influence voter incentives in different ways. Which privacy properties are necessary depends on the context of the vote. A few that frequently arise in cryptography and social science literature:

  • Ballot privacy: The secret ballot, also called “the Australian ballot,” was developed for real-world voting systems as a way to keep the preferences of individual voters private and to mitigate bribery and coercion (in an on-chain setting, we may need a stronger property than ballot privacy – see “receipt-freeness” below). Ballot privacy also mitigates social desirability bias – there is less pressure for someone to vote based on what others think of their choice.
  • Running tally privacy: Many voting systems hide the running tally, or how many votes have been cast for each option, while voters are still casting ballots to avoid impacting turnout and voter incentives. We’ve seen this play out in the real world; for example, US Senators who vote later are more likely to align with their party than those who vote earlier. And on-chain: in token-weighted voting, whales can lull their opponents into a false sense of security by letting them hold the lead (some may not bother to vote, assuming they will win regardless) and then cast their ballot at the last minute to swing the outcome.
  • Voter anonymity: Your vote is private in many real-world voting systems, but the fact that you voted is often public. This can be important as a safeguard against voter fraud, because publishing records of who voted allows people to check whether someone else cast a ballot in their name. On-chain, however, we can prevent voter fraud while preserving anonymity using cryptographic primitives – with Semaphore, for example, you can prove in zero knowledge that you are an eligible voter who hasn’t cast a ballot yet. 
  • Receipt-freeness: Voters should be unable to produce a “receipt” of their ballot proving to a third party how they voted, since such receipts would otherwise enable vote selling. A closely related but stronger property is coercion-resistance, which prevents someone from coercing a voter to vote a certain way. These properties are especially appealing in a decentralized setting, where voting power can be made liquid via smart contract marketplaces. Unfortunately, they are also very difficult to achieve – in fact, Juels et al. state that it is impossible in a permissionless setting without trusted hardware.

Cicada is focused on running tally privacy, but (as we discuss later) it can be composed with zero-knowledge group membership proofs to obtain voter anonymity and ballot privacy as well. 

Introducing Cicada: Tally privacy from homomorphic time-lock puzzles

To achieve running tally privacy, Cicada draws from cryptographic primitives that (to our knowledge) have never been used on-chain before. 

First, a time-lock puzzle (Rivest, Shamir, Wagner, 1996) is a cryptographic puzzle that encapsulates a secret which can only be revealed after some predetermined amount of time has elapsed – more specifically, the puzzle can be decrypted by repeatedly performing some non-parallelizable computation. Time-lock puzzles are useful in the context of voting for achieving running tally privacy: Users can submit their ballots as time-lock puzzles, so that they are secret during the vote but can be revealed afterwards. Unlike most other private voting constructions, this enables running tally privacy without relying on tallying authorities (like election staff counting paper or digital ballots), threshold encryption (where several trusted parties must cooperate to decrypt a message), or any other trusted parties: anybody can solve a time-lock puzzle to ensure results are revealed after the vote.
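
To make this concrete, here is a toy sketch in Python of the Rivest–Shamir–Wagner idea (our own illustration, not Cicada’s implementation): the puzzle creator uses knowledge of the factorization of N as a shortcut, while anyone else must grind through T sequential squarings to open the puzzle. The parameters are deliberately tiny and insecure.

```python
# Toy Rivest-Shamir-Wagner time-lock puzzle (illustrative only: tiny primes,
# additive masking instead of real encryption, insecure parameters).
import math, random

p, q = 1009, 1013                  # toy primes; real use needs ~1024-bit primes
N, phi = p * q, (p - 1) * (q - 1)
T = 10_000                         # number of sequential squarings ("time")

def lock(secret):
    """Creator hides `secret`; knowing phi(N) gives them a cheap shortcut."""
    while True:
        g = random.randrange(2, N - 1)
        if math.gcd(g, N) == 1:
            break
    mask = pow(g, pow(2, T, phi), N)      # g^(2^T) mod N, computed cheaply
    return g, (secret + mask) % N         # puzzle = (base, masked secret)

def solve(puzzle):
    """Anyone can open the puzzle, but only via T sequential squarings."""
    g, masked = puzzle
    w = g
    for _ in range(T):                    # inherently sequential work
        w = pow(w, 2, N)                  # w = g^(2^T) mod N, the slow way
    return (masked - w) % N

print(solve(lock(42)))                    # -> 42
```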

Second, a homomorphic time-lock puzzle (Malavolta and Thyagarajan, 2019) has the additional property that some computation on the encrypted value is possible without knowing the secret key, decrypting the puzzle, or using a backdoor. In particular, a linearly homomorphic time-lock puzzle allows us to combine puzzles together, producing a new puzzle that encapsulates the sum of the original puzzles’ secret values.

As the authors of the paper note, linearly homomorphic time-lock puzzles are a particularly suitable primitive for private voting: Ballots can be encoded as puzzles, and they can be homomorphically combined to obtain a puzzle encoding the final tally. This means that only one computation is needed to reveal the final tally, instead of solving a unique puzzle for every ballot.
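
Continuing the toy sketch above, the following Python snippet illustrates the linear homomorphism in the spirit of Malavolta and Thyagarajan’s Paillier-based construction (again with insecure toy parameters of our own choosing): each ballot is a puzzle, multiplying puzzles adds the hidden ballots, and only the single combined puzzle ever needs to be solved.

```python
# Sketch of a linearly homomorphic time-lock puzzle, using the Paillier-style
# identity (1 + N)^s = 1 + s*N (mod N^2). Toy parameters; not secure.
import random

p, q = 1009, 1013
N, phi = p * q, (p - 1) * (q - 1)
T = 10_000
g = 4                                      # fixed public base, coprime to N
h = pow(g, pow(2, T, phi), N)              # h = g^(2^T) mod N (setup shortcut)

def encode_ballot(s):
    """Puzzle hiding ballot s; opening it alone would take T squarings."""
    r = random.randrange(1, N)
    u = pow(g, r, N)
    v = (pow(h, r * N, N * N) * pow(1 + N, s, N * N)) % (N * N)
    return u, v

def combine(z1, z2):
    """Multiplying puzzles adds the hidden ballots -- no solving required."""
    return (z1[0] * z2[0]) % N, (z1[1] * z2[1]) % (N * N)

def solve(z):
    """One slow opening of the combined puzzle reveals the whole tally."""
    u, v = z
    w = u
    for _ in range(T):
        w = pow(w, 2, N)                   # u^(2^T) mod N, sequentially
    rest = (v * pow(pow(w, N, N * N), -1, N * N)) % (N * N)
    return (rest - 1) // N                 # since (1+N)^s = 1 + s*N (mod N^2)

ballots = [encode_ballot(b) for b in (1, 1, 0, 1)]   # three "yes", one "no"
tally = ballots[0]
for z in ballots[1:]:
    tally = combine(tally, z)
print(solve(tally))                        # -> 3
```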

A new construction: efficiency and tradeoffs

There are several more considerations to make for a voting scheme to be practical on-chain. First, an attacker may try to manipulate the vote by casting an incorrectly encoded ballot. For example, we might expect each ballot’s time-lock puzzle to encode a boolean value: “1” to support the voted-upon proposal, “0” to oppose it. An ardent supporter of the proposal may instead attempt to encode e.g. “100” to amplify their effective voting power.

We can prevent this sort of attack by having voters submit a zero-knowledge proof of ballot validity alongside the ballot itself. Zero-knowledge proofs can be computationally expensive though – to keep the costs of voter participation as low as possible, the proof should be (1) efficiently computable client-side and (2) efficiently verifiable on-chain.

To make proving as efficient as possible, we use a bespoke sigma protocol – a zero-knowledge proof designed for a specific algebraic relation, as opposed to a generalized proof system. This enables extremely fast prover times: generating a ballot validity proof in Python takes 14ms on an off-the-shelf laptop. 
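
For readers unfamiliar with sigma protocols, here is a minimal, self-contained example in Python: a Schnorr proof of knowledge of a discrete log, made non-interactive with the Fiat-Shamir heuristic. Cicada’s actual protocol proves a different, ballot-specific relation; this sketch (with toy parameters of our choosing) only shows the commit / challenge / response pattern that keeps proving cheap.

```python
# Minimal non-interactive sigma protocol: Schnorr proof of knowledge of x
# such that h = g^x (mod p), compiled with Fiat-Shamir. Toy group, not secure.
import hashlib, random

q = 1019                 # prime order of the subgroup (real use: ~256 bits)
p = 2 * q + 1            # 2039, a safe prime
g = 4                    # generator of the order-q subgroup

def fiat_shamir(*vals):
    data = b"|".join(str(v).encode() for v in vals)
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % q

def prove(x):
    """Prove knowledge of x with h = g^x mod p, without revealing x."""
    h = pow(g, x, p)
    k = random.randrange(1, q)
    a = pow(g, k, p)                 # commitment
    e = fiat_shamir(g, h, a)         # challenge, derived by hashing
    s = (k + e * x) % q              # response
    return h, (a, s)

def verify(h, proof):
    a, s = proof
    e = fiat_shamir(g, h, a)
    return pow(g, s, p) == (a * pow(h, e, p)) % p   # g^s ?= a * h^e

h, proof = prove(x=123)
print(verify(h, proof))              # -> True
```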

Though the verifier for this sigma protocol is conceptually simple, it requires a few large modular exponentiations. Malavolta and Thyagarajan’s linearly-homomorphic scheme uses Paillier encryption, so these exponentiations would be performed modulo N^2 for some RSA modulus N. For a reasonably sized N, the exponentiations are prohibitively expensive (millions of gas) on most EVM chains. To reduce this cost, Cicada instead uses exponential ElGamal – exponential ElGamal still provides additive homomorphism, but works over a much smaller modulus (N instead of N^2).

One downside of using ElGamal is that the last step of decrypting the tally requires brute-forcing a discrete log (note that this is done off-chain and efficiently verified on-chain). As such, it is only suitable if the expected final tally is reasonably small (e.g. less than 2^32, or about 4.3 billion votes). In the original Paillier-based scheme, the tally can be efficiently decrypted regardless of its size.
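
The snippet below sketches this idea in Python: exponential ElGamal puts the message “in the exponent,” so multiplying ciphertexts adds ballots, and decryption recovers g^tally and then brute-forces the small discrete log. (In Cicada the role of the decryption key is effectively played by the time-lock computation; here, for illustration, the key is simply known, and all parameters are toy values we picked.)

```python
# Exponential ElGamal: additively homomorphic, but decryption must brute-force
# a small discrete log -- hence the bound on the final tally. Toy parameters.
import random

q = 1019                         # toy prime-order subgroup (see sketch above)
p = 2 * q + 1
g = 4

x = random.randrange(1, q)       # secret key (known here, unlike in Cicada)
y = pow(g, x, p)                 # public key

def encrypt(m):
    r = random.randrange(1, q)
    return pow(g, r, p), (pow(g, m, p) * pow(y, r, p)) % p   # (g^r, g^m * y^r)

def add(c1, c2):
    """Component-wise product of ciphertexts encrypts m1 + m2."""
    return (c1[0] * c2[0]) % p, (c1[1] * c2[1]) % p

def decrypt(c, max_tally=1000):
    a, b = c
    gm = (b * pow(pow(a, x, p), -1, p)) % p      # recover g^m
    for m in range(max_tally + 1):               # brute-force the discrete log
        if pow(g, m, p) == gm:
            return m
    raise ValueError("tally exceeds max_tally")

ballots = [encrypt(b) for b in (1, 0, 1, 1)]     # three "yes", one "no"
tally_ct = ballots[0]
for c in ballots[1:]:
    tally_ct = add(tally_ct, c)
print(decrypt(tally_ct))                         # -> 3
```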

Selecting the RSA modulus N also involves a tradeoff. Our implementation uses a 1024-bit modulus for gas efficiency. While this is well above the largest RSA modulus ever publicly factored (which was 829 bits), it is below the normally recommended size of 2048 bits for use with RSA encryption or signatures. However, we don’t need long-term security in our application: once an election is finished there is no risk if N is factored in the future. The tally and ballots are assumed to become public after the time-lock expires, so it is reasonable to use a relatively small modulus. (This can also be easily updated in the future if factoring algorithms improve.)

Anonymity and voter eligibility

Cicada, as described above, provides running tally privacy – the time-lock puzzle property keeps the tally private for the duration of the vote. However, each individual ballot is also a time-lock puzzle, encrypted under the same public parameters. This means that just as the tally can be decrypted (by performing the requisite computation), so can each individual ballot. In other words, Cicada only guarantees ballot privacy for the duration of the vote – if a curious observer wishes to decrypt a particular voter’s ballot, they can do so. Decrypting any individual ballot is as expensive as decrypting the final tally, so naively it requires O(n) work to fully decrypt a vote with n voters. But all of these ballots can be decrypted in parallel (assuming enough machines), taking the same amount of wall-clock time as it takes to decrypt the final tally.

For some votes this may not be desirable. While we are satisfied with temporary running tally privacy, we may want indefinite ballot privacy. To accomplish this, we can combine Cicada with an anonymous voter eligibility protocol, instantiated by zero-knowledge group membership proofs. This way, even if a ballot is decrypted, all it would reveal is that someone voted that way – which we would already know from the tally.

In our repository we include an example contract that uses Semaphore for voter anonymity. Note, however, that the Cicada contract itself makes no assumptions on how voter eligibility is determined or enforced. In particular, you could replace Semaphore with e.g. Semacaulk or ZK state proofs (as proposed here and here). 

Tallying authorities

One of our priorities in designing Cicada was to avoid the need for tallying authorities: Many private voting constructions require a semi-trusted tallying authority (or committee of authorities, coordinating via secure multi-party computation) who receives and aggregates the ballots. In a blockchain context, this means that these schemes cannot be conducted by a smart contract alone and that some human intervention and trust is needed.

In most constructions, tallying authorities are not trusted for integrity (they cannot manipulate the vote count), but they are trusted for liveness – if they go offline, the final result cannot be computed, indefinitely stalling the outcome of the vote. In some constructions they are also trusted to maintain privacy – that is, they learn how each individual votes but are expected to publish the vote result without revealing this information.

Though tallying authorities are a reasonable (and necessary) assumption in many real-world scenarios, they are not ideal in a blockchain context, where our objective is to minimize trust and ensure censorship resistance.

***

Cicada explores one of many directions in the field of on-chain voting privacy, and complements much of the ongoing research being done by other teams. As mentioned above, Cicada goes hand-in-hand with anonymous group-membership technologies like Semaphore, ZK storage proofs, and rate-limiting nullifiers. Cicada could also integrate the optimistic proof checker proposed by the Nouns Vortex team to reduce the gas burden on voters.

There are also opportunities to adapt Cicada to support different voting schemes (e.g. token-weighted voting, quadratic voting) – more complex schemes may be too computationally expensive for Ethereum mainnet, but they could be practical on L2s. With that in mind, we welcome your contributions, forks, and suggestions on where to take Cicada next. 

Acknowledgements: Cicada was developed jointly with Joseph Bonneau. Thanks to Andrew Hall for supplying historical context on voting privacy. Thanks also to Robert Hackett for feedback on this post. Special thanks to Stephanie Zinn for editing.

***

Views expressed in “posts” (including articles, podcasts, videos, and social media) are those of the individuals quoted therein and are not necessarily the views of AH Capital Management, L.L.C. (“a16z”) or its respective affiliates. a16z is an investment adviser registered with the U.S. Securities and Exchange Commission. Registration as an investment adviser does not imply any special skill or training. Certain information contained in here has been obtained from third-party sources, including from portfolio companies of funds managed by a16z. While taken from sources believed to be reliable, a16z has not independently verified such information and makes no representations about the enduring accuracy of the information or its appropriateness for a given situation. In addition, this content may include third-party information; a16z has not reviewed such material and does not endorse any advertising content contained therein.

This content is provided for informational purposes only, and should not be relied upon as legal, business, investment, or tax advice. You should consult your own advisers as to those matters. References to any securities, digital assets, investment strategies or techniques are for illustrative purposes only, and do not constitute an investment recommendation or offer to provide investment advisory services. Furthermore, this content is not directed at nor intended for use by any investors or prospective investors, and may not under any circumstances be relied upon when making a decision to invest in any fund managed by a16z. (An offering to invest in an a16z fund will be made only by the private placement memorandum, subscription agreement, and other relevant documentation of any such fund and should be read in their entirety.) Any investments or portfolio companies mentioned, referred to, or described are not representative of all investments in vehicles managed by a16z, and there can be no assurance that the investments or investment strategies will be profitable or that other investments made in the future will have similar characteristics or results. A list of investments made by funds managed by Andreessen Horowitz (excluding investments for which the issuer has not provided permission for a16z to disclose publicly as well as unannounced investments in publicly traded digital assets) is available at https://a16z.com/investments/.

Additionally, this material is provided for informational purposes solely and should not be relied upon when making any investment decision. Past performance is not indicative of future results. Investing in pooled investment vehicles and/or digital assets includes many risks not fully discussed herein, including but not limited to, significant volatility, liquidity, technological, and regulatory risks. The content speaks only as of the date indicated. Any projections, estimates, forecasts, targets, prospects, and/or opinions expressed in these materials are subject to change without notice and may differ or be contrary to opinions expressed by others. Please see https://a16z.com/disclosures/ for additional important information.

Investing in PLAI Labs 1.19.23

https://a16zcrypto.com/posts/announcement/investing-in-plai-labs/

Within every era of technology, the most talented entrepreneurs have taken unsolved problems and open questions and turned them into businesses.

In the early 2000s, the biggest open questions were social: how do you take a read-only interface and allow people to write to it, making it interactive and personal? In the 2010s, the possibility space opened up as mobile proliferation grew: suddenly, the questions were around what you could do when everyone had a device you could use to communicate, make micropayments, and play games.

Today, we have two new computing technologies reaching production scale: AI and crypto. We’re already seeing some of the most talented entrepreneurs in the world beginning to explore what can be unlocked when you use the two technologies.

That’s why we’re thrilled to announce our investment in PLAI Labs, founded by Chris DeWolfe and Aber Whitcomb, technology veterans who previously co-founded both the social media platform MySpace and the game studio Jam City.

MySpace, founded in 2003, was one of the first social networks to reach a global audience of millions of users. Jam City is best known for mobile games like Cookie Jam, Harry Potter: Hogwarts Mystery, and Disney Emoji Blitz. With over 30 million monthly active users and billions of downloads, it is one of the most successful game studios in the industry.

Chris and Aber have been at the forefront of past eras of technology, and now they are laser-focused on the future.

PLAI Labs, Chris and Aber’s new venture, is focused on building the next generation of social platforms leveraging AI and web3. With their shared experience in social media, game design, and technology they are creating a new platform for users to play, talk, battle, trade, and adventure together.

Their first entertainment experience is a game built on the platform, Champions Ascension, a massively multiplayer online role-playing game (MMORPG) where players can port in their existing non-fungible token (NFT) characters, go on quests, trade items, fight in the colosseum, build their own custom dungeons, and more. They’re also building an AI protocol platform which will help with everything from user-generated content (UGC) to matchmaking to 2D-to-3D asset rendering.

We believe that the future of social networks begins with games, and the best games have strong communities, enticing core gameplay, and robust meta-progression systems that together define how players interact and socialize with each other. Web3 games in particular have the opportunity to create the decentralized metaverse of the future with their distributed infrastructure and focus on empowering players through asset ownership. Chris, Aber, and their team are one of the best equipped to go after that opportunity.

We’re excited to support their team going forward, and you can see a preview of what they’re building here: https://www.champions.io/

***

The views expressed here are those of the individual AH Capital Management, L.L.C. (“a16z”) personnel quoted and are not the views of a16z or its affiliates. Certain information contained in here has been obtained from third-party sources, including from portfolio companies of funds managed by a16z. While taken from sources believed to be reliable, a16z has not independently verified such information and makes no representations about the current or enduring accuracy of the information or its appropriateness for a given situation. In addition, this content may include third-party advertisements; a16z has not reviewed such advertisements and does not endorse any advertising content contained therein.

This content is provided for informational purposes only, and should not be relied upon as legal, business, investment, or tax advice. You should consult your own advisers as to those matters. References to any securities or digital assets are for illustrative purposes only, and do not constitute an investment recommendation or offer to provide investment advisory services. Furthermore, this content is not directed at nor intended for use by any investors or prospective investors, and may not under any circumstances be relied upon when making a decision to invest in any fund managed by a16z. (An offering to invest in an a16z fund will be made only by the private placement memorandum, subscription agreement, and other relevant documentation of any such fund and should be read in their entirety.) Any investments or portfolio companies mentioned, referred to, or described are not representative of all investments in vehicles managed by a16z, and there can be no assurance that the investments will be profitable or that other investments made in the future will have similar characteristics or results. A list of investments made by funds managed by Andreessen Horowitz (excluding investments for which the issuer has not provided permission for a16z to disclose publicly as well as unannounced investments in publicly traded digital assets) is available at https://a16z.com/investments/.

Charts and graphs provided within are for informational purposes solely and should not be relied upon when making any investment decision. Past performance is not indicative of future results. The content speaks only as of the date indicated. Any projections, estimates, forecasts, targets, prospects, and/or opinions expressed in these materials are subject to change without notice and may differ or be contrary to opinions expressed by others. Please see https://a16z.com/disclosures for additional important information.

A new mining tax on crypto?

https://a16zcrypto.com/posts/article/a-new-mining-tax-on-crypto/

The White House recently proposed a new mining tax. Their 30 percent tax will be imposed not on the mining of minerals but on the mining of cryptocurrencies. But the so-called DAME Tax seems unlikely to accomplish any of the goals that the administration has laid out to justify it. The intention is to reduce any negative impact from cryptocurrency mining on local electricity prices and global pollution. However, while a tax can reduce crypto-mining in the US, it is far from obvious, from an economics perspective, that the other goals will follow. 

Mining taxes can be popular amongst economists. Minerals are a pure gift from the land that no one who has been around for the last 100,000 years was responsible for. While it takes some effort to extract and then ship minerals, some great luck is involved in having the license to mine a particular tract of land. The economic term for the profits that flow from this is rent. Even if the government imposes a resource rent tax, the land will likely be mined anyway. All that changes is who gets the rent – the miner or the government – which makes the tax less popular amongst miners.

It seems perhaps unfair to compare mineral mining to crypto mining, but the crypto community chose the name for a reason. In Proof of Work blockchains, what legitimizes nodes – which propose and confirm blocks of transactions at any given time – is that they must demonstrate, at real cost, that they are there to behave well and not to obstruct or attack the underlying network. Satoshi Nakamoto went to considerable effort to outline what might constitute “work” for the Bitcoin blockchain and to design it in such a way that it could scale.

Nakamoto’s notion of work was imposed on everybody who wanted to be a node to help run the network – an entry fee if you will. To get the chance to confirm a block of new transactions, nodes would have to compete to solve a simple computational puzzle. The puzzle was meaningless – no one cared about the answer – but it was integrated neatly into the blockchain itself. To win this computational contest, miners had to find the answer before anyone else. While they couldn’t guarantee that outcome, the more resources – that is, compute – they applied to the problem, the more likely it was that they would win. 

And what did miners get for their trouble? Transaction fees paid by users, for starters. But, more significantly, they also received new bitcoins. Just how much they earned changed over time, but when bitcoins started to be worth some real money, the prize for ten minutes of computational effort became significant. They searched and dug (by offering resources and guessing puzzle answers), and then “extracted” (by receiving tokens if they solved it): mining. 

Like mineral mining, the total amount of resources (compute) devoted to Bitcoin mining was dictated by the value of the outcome. The more bitcoins were worth, the more intense the computational contest. If blocks were minted too quickly, each one would process fewer transactions per reward. Thus, to keep mining to an average of 10 minutes per block confirmation, the difficulty of the computational puzzle would adjust. The more compute added to the contest, the more difficult it became, and vice versa.

Given that the contest to mine bitcoins is open to anyone with a computer worldwide, how did miners make money? After all, the outcome of the broader game was driven by free entry. If there were profits to be had, it would pay someone somewhere to devote a processor and their electricity bill to the contest. They wouldn’t win often, but on average, their winnings would cover their costs. Indeed, Nakamoto outlined in his whitepaper a more democratic process. But economically, expected profits would be low – and not at the more predictable levels enjoyed by diamond mines.

Consequently, the mining business became big, and miners formed pools to provide a more certain stream of income. The miners also became more sophisticated, evolving from a person with a computer in their basement to big data centers with thousands of specialized ASIC processors devoted to crypto mining. The electricity bill from those data centers grew. Electricity companies (and chip makers) weren’t complaining, though – any more than shovel makers did during the Gold Rush.

One result is that, by some estimates, crypto mining was consuming as much electricity as a small country. And for what, skeptics (some might say cynics) asked? To play a computational game? For cryptocurrency that seemed to some like monopoly money or worse, casino chips? What did the rest of society get from this other than higher electricity bills and increased pollution locally and potentially globally? For the past decade or more, the crypto community – at least those focused on Proof of Work –  hasn’t really had a good answer to that question. 

Yet, despite this, if you were to ask an economist, they would be hard-pressed to condemn crypto mining-based electricity consumption relative to any other electricity consumption. Yes, crypto mining might look like a waste of resources – and if there is one thing economists don’t like, it is wasted resources. Many critiqued Bitcoin, for instance, for using the electricity generated by a reasonably sized country like Sweden. But you know who else is using the electricity generated by a reasonably sized country like Sweden? Sweden. And economists don’t particularly seem to mind Sweden. The point being that people are actually paying for the crypto-mining electricity and seemingly of their own free will. Who are we to judge?

Apparently, many governments are happy to judge. Some, like China, have prohibited crypto mining altogether (albeit for reasons beyond environmental concerns). The Biden proposal, called the DAME (Digital Asset Mining Energy) excise tax, stops short of prohibition but would mark up US crypto miners’ electricity bills by a flat 30 percent. The goals are, ostensibly, to lower electricity prices and, although this seems contradictory, to reduce both local and carbon pollution. 

The tax is not expected to be a big revenue generator – just a few billion over the next decade – because mining electricity bills aren’t actually that large, and because crypto mining is globally competitive. Raise costs by that much and, unlike minerals, crypto miners can relocate to anywhere with an internet or satellite connection.

Therein lies the issue. If the goal were to reduce what was regarded as sinful waste (in the same way one might tax tobacco to reduce health problems), it is unlikely to happen at a global level. Crypto mining exists in the U.S. because it is cheaper to mine there than anywhere else on the planet. If the tax causes some of those mines to shut down and others to open elsewhere, there would be more, not less, waste produced. 

But it is even worse than that. It is far from obvious how this change will actually reduce global pollution. Reducing local pollution in the U.S. may be possible, but that pollution will follow the miners somewhere else, so this is what we would call a “beggar-my-neighbour” outcome. As the name suggests, it’s kind of selfish. Moreover, the U.S. government’s massive climate policy that passed last year involved billions of dollars of investment in renewables and innovations to make energy production less climate-damaging. (Not to mention that many Proof-of-Work cryptocurrency miners have been locating their efforts in areas where there is latent capacity, or mixing in more renewable energy.) 

The DAME tax will cause some of the users of that cleaner energy to go elsewhere. Frankly, it is highly unlikely they will end up in a cleaner place.

Indeed, this seems counter to the moves by some in the proof-of-work crypto industry to try and promote more clean energy. While I am personally skeptical about some of the ways advocates claim this might be done, if mining demand, as a large potential user of electricity in a region, can underwrite new investment in renewable generation, then this is one way that this approach to crypto may spur cleaner energy in the long run. Such schemes are being proposed – and the DAME tax may threaten them. If you take a renewable project’s biggest customer and say that 30 percent of the bill must go to the government, much of the cost will likely come off the bottom line of the renewable electricity provider. That is not an investment we would want to deter.

The point here is that the DAME tax targets crypto mining for reasons that would otherwise apply to many electricity users – including those not involved in mining cryptocurrency. Given that mining is globally competitive, it is unlikely to move the needle on the environment and could, in fact, subvert it. A better approach would be to tax miners who rely on non-renewable electricity generation. But that sounds like what it is: a carbon tax. And some in the U.S. government are loath to impose such a measure, even though it would undoubtedly help the environment.

***

Joshua Gans is a Professor of Strategic Management at the Rotman School of Management, University of Toronto and Chief Economist of its Creative Destruction Lab. His forthcoming book is The Economics of Blockchain Consensus (to be published by Palgrave/Macmillan in July). Joshua Gans is not associated with a16z, nor has he referred to any work or persons at a16z. All views are his. 

***

Views expressed in “posts” (including articles, podcasts, videos, and social media) are those of the individuals quoted therein and are not necessarily the views of AH Capital Management, L.L.C. (“a16z”) or its respective affiliates. a16z is an investment adviser registered with the U.S. Securities and Exchange Commission. Registration as an investment adviser does not imply any special skill or training. Certain information contained in here has been obtained from third-party sources, including from portfolio companies of funds managed by a16z. While taken from sources believed to be reliable, a16z has not independently verified such information and makes no representations about the enduring accuracy of the information or its appropriateness for a given situation. In addition, this content may include third-party information; a16z has not reviewed such material and does not endorse any advertising content contained therein.

This content is provided for informational purposes only, and should not be relied upon as legal, business, investment, or tax advice. You should consult your own advisers as to those matters. References to any securities, digital assets, investment strategies or techniques are for illustrative purposes only, and do not constitute an investment recommendation or offer to provide investment advisory services. Furthermore, this content is not directed at nor intended for use by any investors or prospective investors, and may not under any circumstances be relied upon when making a decision to invest in any fund managed by a16z. (An offering to invest in an a16z fund will be made only by the private placement memorandum, subscription agreement, and other relevant documentation of any such fund and should be read in their entirety.) Any investments or portfolio companies mentioned, referred to, or described are not representative of all investments in vehicles managed by a16z, and there can be no assurance that the investments or investment strategies will be profitable or that other investments made in the future will have similar characteristics or results. A list of investments made by funds managed by Andreessen Horowitz (excluding investments for which the issuer has not provided permission for a16z to disclose publicly as well as unannounced investments in publicly traded digital assets) is available at https://a16z.com/investments/.

Additionally, this material is provided for informational purposes solely and should not be relied upon when making any investment decision. Past performance is not indicative of future results. Investing in pooled investment vehicles and/or digital assets includes many risks not fully discussed herein, including but not limited to, significant volatility, liquidity, technological, and regulatory risks. The content speaks only as of the date indicated. Any projections, estimates, forecasts, targets, prospects, and/or opinions expressed in these materials are subject to change without notice and may differ or be contrary to opinions expressed by others. Please see https://a16z.com/disclosures/ for additional important information.

Of data availability & danksharding

https://a16zcrypto.com/posts/podcast/data-availability-danksharding-web3-with-a16z-podcast/

Robert: Hi everyone, Robert Hackett back here again with another episode for web3 with a16z. I recently chatted live with some of our researchers on the topic of Data Availability Sampling and Danksharding – which is relevant to blockchain scaling, as well as paving the way for more advanced blockchain networks & user applications. While much of the discussion is especially relevant to Ethereum, some of the concepts we cover are also applicable to advances in computing and networking generally. 

This discussion involves some specific math, which we cover (and briefly explain) in the episode.

For quick context as you listen, “polynomial commitments”, which you’ll hear more about, are a tool that helps reduce the amount of data needed to verify complex computations. And “interpolation”, another term you’ll be hearing, is a way to reconstruct data from a limited set of data points.

Be sure to check out the paper referenced in this episode for a deeper explanation at a16zcrypto.com/das – that’s DAS for data availability sampling. As always, none of the following is investment, business, legal or tax advice. See a16z.com/disclosures for more important information, including a link to a list of our investments.

Robert: Okay, so today we’re going to be talking about data availability sampling and danksharding. Now, if you’re unfamiliar with those things, don’t be scared because I’m here with a couple of experts who are gonna break things down for you. We’ve got Dan Boneh, Stanford professor, eminent cryptographer, and a16z crypto research advisor, and Lera Nikolaenko, research partner at a16z crypto.

Lera: Great to be here. Thanks, Robert. 

Dan: Thanks, Robert. Great to be here. 

Robert: Thanks, Lera. Thanks, Dan. So, as mentioned, Dan and Lera recently wrote up a great post and have a proposal related to ways to improve protodanksharding, which is this upgrade that is planned for later this year in Ethereum. There is a lot of rich detail in that post, so we’ll dig in there.

But before we get into all the details, I thought maybe we could zoom out and back up and start with a bigger picture here. So let’s start with Dan. Help us understand the subject of today’s talk, data availability sampling. And maybe you could do it without breaking anyone’s minds here.

Maybe we could keep it as you know, succinct and accessible to a broader audience as you might be able to. 

Dan: Sure, so maybe we can zoom out a little bit before we talk about data availability sampling. Maybe let’s talk about the general goal here. So this is part of the efforts to scale Ethereum.

And so one of the ways to scale Ethereum is using rollups. And rollups actually need to push a lot of data on chain. So a rollup allows you to take, say, a hundred or a thousand transactions and process them all as a single transaction on the Ethereum Layer 1. And then all the data associated with those transactions basically gets pushed on chain.

And as a result, there’s a need to actually store quite a lot of data on chain. And the question is how to do that. And that’s exactly where danksharding comes into play. So it’s a really beautiful, beautiful idea. It came from the Ethereum Foundation, particularly from Dankrad Feist. It’s a really elegant construction, and basically Lera and I wanted to give an overview of how this construction works and potentially look at some options in which it can maybe be improved a little bit.

Lera: Yeah. Rollups essentially allow execution to scale for Ethereum, but the question is, How do you make the data scale? And danksharding/data availability sampling basically adds this missing piece to Ethereum that will allow it to achieve full scaling. This post is kind of quite technical. Some aspects we don’t explore, like networking, which is also interesting to research and to write about, but we are mainly focusing on the cryptographic aspects of danksharding.

Dan: And I would actually even add that there’s really beautiful open questions that are still left to think about for researchers if you are interested in this area. There are beautiful questions around coding and polynomial commitments. There are really quite interesting questions. If they could be resolved, we would end up with even more efficient systems.

Robert: That’s excellent context. I definitely want to ask you about those open areas for potential inquiry in a little bit. But before we do that, let’s talk more about this proposal that would free up space on the Ethereum blockchain. So the blockchain is basically this giant record of transactions, stretching all the way back to the Genesis block when the system launched. How are developers thinking about freeing up space, getting greater throughput, scalability, cheaper transactions – all of these things that sound great and would make the system much more usable? How do you actually get there? What is required to get these efficiency gains?

Lera: I think the main challenge is that you cannot just ask validators to store more data. It won’t get you too far. If you want to scale the blockchain to increase the block size by several orders of magnitude, you have to split your block and disperse it to validators so that each validator only stores some fragment of the block. That’s where the ideas from error correcting and erasure coding come in that allow you to do that.

So basically increasing the block size without putting too much burden on the validators. I would say that’s the main technical difficulty. That’s what danksharding is solving.

Robert: You mentioned erasure coding, and it sounds like that’s a key piece of this technology that enables it to work. Maybe you could provide some more detail there. What is erasure coding? How does it work, and how does it apply in this context? 

Lera: Sure, absolutely. So you basically take a block with users’ data and you expand it. You erasure code it, turn a smaller block into a larger block, and this larger block can tolerate some omissions from it. So you can lose some portion of the block.

In the case of danksharding, you can lose 25% of the block and still be able to reconstruct these missing pieces from what you have. So when you disperse this expanded block to the validators, and some of the validators go down because they are byzantine or faulty, that’s okay – even if they lose their pieces, the rest of the validators who are still online and still honest can recover those missing pieces. And that’s why the properties of erasure coding are useful here, to compensate for byzantine or faulty validators.

Dan: And maybe we can even explain it a bit more with an analogy. It’s like, you know, if you’re watching a Netflix movie and say, you know, the Netflix servers are sending packets to your computer and you are watching the movie, well imagine 10% of the packets actually don’t make it through and your computer only gets to see 90% of the packets.

So, normally you would start to see all sorts of lossy video and degradation in performance. With erasure coding, what happens is if the movie is coded using an erasure code, even if only 90% of the packets get through, the laptop has enough information to actually reconstruct the entire movie.

What’s interesting is erasure coding is used everywhere, like communication networks wouldn’t be able to work without erasure coding. And maybe again, just another example, when you have a deep space probe, it’s sending messages back to earth. There’s a lot of noise and a lot of them get either dropped or they get garbled, and yet on Earth we’re able to recover the signal and get those crisp images from Mars.

That actually also is done using a slightly stronger technique called error correcting codes, where we’re not only losing packets, but also we are recovering from packets being distorted, being changed in value. What’s interesting in the context of the blockchain is that all the data is being signed. So we actually don’t care so much about data corruption because the signature layer will detect data corruption.

So really all we care about, the only thing that a malicious node can do – someone who’s trying to prevent the data from being reconstructed – the only thing that that node can do is, in some sense, remove pieces that make up the data. So we don’t care so much about recovering from corruption of the data because that’s taken care of by the signature.

But we do worry quite a lot about pieces of the data simply missing, and as a result, maybe whoever’s trying to reconstruct the data is not able to do it. And so that’s exactly where erasure coding comes in, where we know that the only thing that can happen is that a certain piece of the data is missing. It can’t be garbled. So if we got it, we got the right one, and that’s because of the signature. But if it’s missing, we have to somehow recover. And that’s exactly as Lera was saying, that’s the idea of erasure coding. 

The way you do that is basically you take your original data – in the case of Ethereum, you would take a block and you would expand it a little bit.

Actually, you expand it by like a factor of four to get more data in the block, so that the data is now redundant in the block, and now you break it into little pieces. Now you can ask every validator, “Oh, you know, you don’t have to store the entire block – you only have to store this small piece of the block.”

And if enough validators do their job and store those little pieces – and when we need to reconstruct the block, they send those pieces back to us – if enough pieces get back, then we are able to reconstruct the entire block. In danksharding in particular – again, it’s a beautiful, beautiful proposal – the recovery rate is 75%. So if 75% of the validators respond and we’re able to recover 75% of the pieces, then we’re able to reconstruct the entire block. That’s kind of the core mechanism that’s used in danksharding.

Robert: That’s really useful. So it sounds like erasure coding is this kind of technique that enables you to apply some redundancy and backups so that you don’t lose all of the data so that you can still sort of assemble it, get access to it, see that it exists.

Dan, you mentioned that the reason for doing this data availability sampling is to prevent bad actors from doing certain things, like getting rid of some of the data. What exactly are we defending against here?

Dan: Yeah, that’s a great place to go next. So in fact, what happens is, with these proposals for solving the data problem on Ethereum, there’s gonna be a new transaction type that’s going to be introduced.

This is called a “blob-carrying transaction”, and what it would allow folks to do is basically embed blobs. Each blob is 128 kilobytes. So you embed blobs into blocks. So normally blocks are made up of transactions to do all sorts of things to the Ethereum state. So now, in addition to just the regular transactions that we know and love, there’s also going to be a blob-carrying transaction or a few blob-carrying transactions per block and as I said, each one is gonna be 128 kilobytes.

Now the long-term plan, and actually maybe Lera can talk about the transition to how we get there, but the long-term plan is that there could be quite a few data-carrying blobs in every block, which would actually make the block quite large, right? I mean, each blob is 128 kilobytes.

If you put a bunch of those together, you could end up, actually, you will end up with blocks that are 30 megabytes – and it’s just not reasonable to ask validators to store these huge blocks. Today, the blocks are only a hundred kilobytes or so, so these would be much, much larger blocks. And so the idea is to basically break up these large blocks into little pieces.

Every validator, every node will actually store only this little piece. And now the problem is, what happens if they say that they stored the piece, but in fact they didn’t? Right? What do we do then? And that’s exactly where data availability sampling comes in, which is a very efficient way to test at block creation time that, in fact, everybody received their pieces, everybody currently has their pieces, and currently the block can be reconstructed even though it was broken into many, many small pieces. And these pieces are stored, distributed across the network.

Robert: And just before Lera takes over, I just want to make sure that I understand something here. So you said blocks today are about 100 kilobytes. And the idea is that, after all these upgrades, they’re going to be about 30 megabytes. And part of the reason for that expansion of the block size is to accommodate this new type of data, this blob data, that has a different purpose than what you usually shove into blocks, which is just pure transaction data.

And this blob data is really related to helping these off-chain Layer 2 networks store some data in a more ephemeral manner. Did I get that right?

Dan: Yeah, I’m glad you reiterated that cause that’s important. So these blobs basically are gonna be used by these rollups, right? So the rollups have to store the rollup data.

And so instead today what they do is they store it as what’s called “call data”, which is somewhat expensive and not what call data was meant for. So instead, they’re gonna store this data as blobs in blocks. And what’s interesting is these blobs are actually not going to be available to the execution layer of Ethereum.

They’re just gonna be stored as blobs in blocks. The execution layer will only see hashes of these big blobs. They won’t be able to access individual bytes or elements in the blobs, which is the new mechanism. So today, the way this is stored is, as we said, in call data, and call data is all available to the execution layer of Ethereum.

So because of this simplification, storing blob data is going to be a lot cheaper than [storing] call data. So in principle, the goal of this is to reduce the costs of Layer 2 systems, right? Because today they have to pay quite a lot to store all their data as call data. In the future, once danksharding is deployed, or even once protodanksharding is deployed, the costs will be a lot lower.

So L2 systems will become much, much cheaper and easier to use.
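To make that structure concrete, here is a minimal, purely illustrative sketch in Python: the execution layer keeps only a small commitment per blob, and anyone who still holds a blob can later check it against that commitment. The real mechanism uses KZG polynomial commitments wrapped in versioned hashes; the plain SHA-256 hash below is just a stand-in for illustration.

```python
# Toy sketch (not the real EIP-4844 scheme) of the idea that the execution
# layer stores only a small commitment per blob, while the blob data itself
# lives outside the execution layer and can later be checked against it.
import hashlib

BLOB_SIZE = 128 * 1024  # 128 kilobytes per blob, as discussed in the episode

def commit(blob: bytes) -> bytes:
    """Stand-in commitment: a hash the execution layer can store cheaply."""
    assert len(blob) == BLOB_SIZE
    return hashlib.sha256(blob).digest()

def verify(blob: bytes, commitment: bytes) -> bool:
    """Check a resupplied blob against the commitment the chain still stores."""
    return commit(blob) == commitment

# A rollup posts a blob; only the 32-byte commitment is visible on the
# execution layer.
blob = bytes(BLOB_SIZE)              # placeholder rollup data
onchain_commitment = commit(blob)    # this is all the execution side ever sees

# Later (before the blob expires), anyone who downloaded the blob can show
# that it matches the commitment that persists on chain.
assert verify(blob, onchain_commitment)
```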

Robert: That’s great. I love that we’re having a sophisticated conversation about cryptography and very technical software, and yet we keep using the word “blob”. It reminds me of the ’80s sci-fi movie The Blob. But Lera, yeah, maybe you could chime in now and talk about this trend and how we get from where we are now to that vision in the future of expanded block sizes.

Lera: Yeah, for sure. Before I dive into that, just to add two comments to what Dan said, I want to mention that it’s important that the blobs are going to expire and the validators don’t give any guarantee that they’re gonna be storing these blobs forever. Right now the expiry is set roughly to 30 to 60 days – that’s what the Ethereum Foundation is thinking about.

So after this period – during which you will have an opportunity to download all of these blobs and store them locally – the network will drop them, but the commitments to these blobs will persist. And if you need to, if you have the blobs themselves, you can always resupply them to the execution layer as call data.

So as long as you store the blobs, you can prove that those are the correct blobs that you have by resupplying them, because the chain continues storing the hashes of those blobs, the commitments. Another important thing I want to mention is that the fee market for those blobs is gonna be different.

So there’s going to be another fee market. They’re gonna be priced a little differently. So Ethereum is gonna have these two pipes. One pipe, if it gets congested, you are paying larger fees, say for execution, and if the data pipe gets congested, you’re paying larger fees for storing the data. So we don’t yet know how expensive the storage is gonna be. The intuition is that it must be less expensive than call data today. But again, we have to experiment to see exactly how much cheaper it’s gonna be. And protodanksharding is actually a step towards full danksharding, but it’s like an experiment that we are gonna carry out to see how validators are gonna handle this additional load and how expensive the fees are gonna be for storing those data blobs.

So on the way to danksharding, we’re gonna do this experiment with protodanksharding. In protodanksharding basically, you don’t apply any erasure coding or error correcting. All you do is you add this special transaction type that carries data blobs. And those blobs are gonna have an expiry and that’s it.

So the block size is gonna be increased by a little bit in Ethereum. So right now, as Dan was saying, it’s around 100 kilobytes. With protodanksharding it’s gonna be around 500 kilobytes or so. So it’s not that big of an increase, but we’re still gonna test like all of the hypotheses that hopefully will check out and Ethereum will continue moving to full danksharding.

Robert: So Lera, you said a number of things in there. I wanna make sure that all those points came across. You mentioned that the blob data is gonna have a different fee market. So is the right way to think about this, like you have different highway systems, or perhaps you have a lane on a highway where like you have an E-ZPass or something and maybe it’s cheaper for someone in this HOV, E-ZPass-style lane? You get lower fees to put this kind of data on the blockchain versus someone who’s just a regular commuter having to pay a toll. I know I’m kind of mixing some metaphors here, but I’m wondering if that’s a physical analogy to describe these differences in fee markets for different types of data. 

Lera: Yeah, I would say that it’s hard to imagine this new data’s fees being more expensive than what we currently have, so the hypothesis is it’s always gonna be cheaper to put your data in those blobs if you don’t care about accessing this data from the execution layer. So it’s as if you’re opening another lane on your highway, which just increases the traffic it can carry – but for those transactions that are kind of data heavy, you’ll go into this lane, and for transactions that are execution heavy, you’ll continue going through the main lanes, if that makes sense.

Robert: Yes, it does. And it sounds like one of the reasons why it can be cheaper is because it has this expiration date, as you mentioned. I think you said that the current idea is that’s gonna last 30 to 60 days for this blob data, at which point it would simply vanish and there would just be a trace remaining – a sort of commitment as you described.

Lera: Yes, exactly. 

Dan: Maybe to add to that, basically what happens is when you submit transactions, there’s a fee market, as Lera was saying, in that if there are lots of transactions being submitted all at once, say there’s a big NFT mint, and everybody wants to issue transactions, then of course the price per transaction immediately goes up.

Well, blob data is gonna be a parallel fee market, so presumably there are fewer people submitting blobs than there are people submitting transactions. So hopefully there will be less congestion over blobs. But in principle, it could happen. Hopefully not, but it could happen that all of a sudden, for some reason, there’s huge congestion over blobs and then the fee market for blobs will actually go up.

But in principle, again, because there’s less demand for submitting blobs than there is for submitting transactions, the hope is that the cost for blobs will be lower than the cost for transactions. 

Robert: Okay. That’s really helpful to understand as well. So we’ve laid out a sort of timeline for these updates. Perhaps, people listening in, maybe you’re familiar with “The Merge”, which happened last year in the fall, this big Ethereum upgrade that basically eliminated the environmental impact of Ethereum in terms of its energy consumption. And now we’re entering this new period of upgrades, which Vitalik [Buterin], co-creator of Ethereum, has termed “The Surge”, meaning that all of a sudden, these updates are going to enable the blockchain to scale a lot more powerfully than it’s been able to before. So part of this update, one of the first ones, is protodanksharding. It’s happening later this year. 

What happens in between that upgrade and full on danksharding? What are the differences between the two and when do we get that fully formed vision of danksharding in the future?

Lera: That’s a great question. I think there are still some research problems to figure out along the way, especially around networking because when those validators are storing only fragments of the blob, they need to help each other reconstruct those fragments. If some validator falls asleep for a while, when it wakes up, it wants other validators to help it reconstruct its missing fragments.

So it’s quite an involved networking protocol that is still in the making for danksharding and there are other exciting research problems that potentially can improve the scheme, but I think so far it looks like quite a clear path to get from protodanksharding to danksharding. And the question is just maybe how to make it better, how to improve different aspects of it, make it more efficient.

Dan: Maybe it’s worthwhile adding that in the protodanksharding approach there are at most four blobs per block, and each blob is 128 kilobytes. Four times 128 gives us half a megabyte, which is where that limit on block sizes in protodanksharding comes from, and that’s actually imminent. That’s supposed to happen later this year.

And then, yeah, going all the way to danksharding still takes some work. In fact, there was a very big event recently with generating parameters jointly for danksharding. And so there’s a lot of work to do in preparation of getting to full danksharding.

Lera: Yeah, that’s quite exciting. I think they’re still accepting contributions to participate in the trusted setup ceremony. Yeah, so it’s still ongoing. It’s a large community effort that’s really fun to watch and people come up with lots of creative ideas to contribute. So check this out, definitely.

Robert: That’s super cool. I actually participated in some of the trusted setup ceremonies for some of the early privacy coins. Is this trusted setup ceremony as ostentatious as some of those, where you had people burning laptops and exploding hard drives and things like that?

Lera: From what I’ve seen, it’s just people coming up with different creative ways to generate entropy. Some use their pets, dogs and cats. Some are creating some sophisticated marble run machines and such. There was even one contribution done from a satellite, the highest altitude contribution there.

Dan: Actually we have to mention that. It was pretty cool actually. This company Cryptosat actually has satellites in orbit that sample noise up in space and then contribute and participate in the trusted setup protocol.

So that was pretty cool to see. 

Robert: Wow, that is awesome. Did not know there was crypto in space just yet. Dan, you said that protodanksharding is gonna have four blobs per block. What is danksharding gonna enable? How many blobs are we talking here? 

Dan: Yeah, so by the way, protodanksharding is up to four blobs per block.

They’re actually targeting two, but up to four. And then danksharding, as we said, they’re targeting at most 30 megabyte blocks. So just divide 30 megabytes by 128 kilobytes. And that tells you it’s on the order of a couple hundred blobs per block.

Lera: Yeah, I think the current plan is 128 blobs target and maybe up to 256 blobs per one block.
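For reference, a quick back-of-the-envelope check of the numbers mentioned in this conversation (the blob counts are targets as stated here and could still change before deployment):

```python
# Back-of-the-envelope check of the blob numbers mentioned in the episode.
BLOB_KB = 128

def blob_data_kb(num_blobs: int) -> int:
    """Total blob data per block, in kilobytes."""
    return num_blobs * BLOB_KB

# Protodanksharding (EIP-4844): target 2 blobs, at most 4 per block.
print(blob_data_kb(4))            # 512 KB, the "half a megabyte" figure

# Full danksharding: target 128 blobs, up to 256 per block.
print(blob_data_kb(128) / 1024)   # 16.0 MB
print(blob_data_kb(256) / 1024)   # 32.0 MB, roughly the ~30 MB figure above
```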

Robert: Great. 

Dan: And that’s exactly when the blocks become quite large. And then it becomes a little difficult for the validators to keep the entire blocks themselves. And that’s where we have to start breaking them up into pieces. And each validator will store one piece. 

Robert: Got it. I appreciate the basic division. Maybe some of the more technical math that you get into in your post would be a little bit harder to convey here, but that makes sense. Maybe we could talk a little bit about some of the proposals that you made in your recent post. So you did this research and you found that with some adjustments, you could potentially get even more efficiency out of EIP-4844, which is the more technical name for protodanksharding.

Lera: Yeah, the bulk of the work was just to understand the danksharding proposal in full, and then we observed some different ways to look into the mathematics – the cryptographic components of it – that hopefully open up new toolkits that we can use there. So the rough idea, not going too much in depth, is you fit a polynomial through your block.

It’s a bivariate polynomial because your block is rectangular. And then you evaluate this polynomial at more points, kind of expanding the block. And the idea is that if you use these bivariate polynomials – instead of, as danksharding was doing it, a list of univariate polynomials – you can then apply techniques for bivariate evaluation, bivariate interpolation, and possibly even try to apply bivariate error-correcting codes to that. But it’s very much an open door for more research and exploration. So in this post we try to explain where more research can be done to improve the scheme in that direction.

Dan: Yeah. Well maybe I can add to that. I mean, danksharding is such a beautiful idea. Really. The Ethereum Foundation deserves a huge amount of credit for it. And, you know, Dankrad [Feist] in particular. It’s really quite an elegant construction.

Initially, we were just trying to understand the details and it took some time to recover exactly all the details of how everything works. And we figured maybe it’ll help the world to have another writeup that explains how the mechanism works. Initially, I guess the original danksharding, the way it’s described, is all using univariate polynomial commitments, where we take each row, we view it as a polynomial. Erasure coding is all done using polynomials.

Maybe I can even teach a little bit of erasure coding here in one sentence. Suppose you have two points in a plane and you want to do erasure coding on them. What you can do is you can just pass a line through those two points, and now you can just publish, instead of just those two points, you can publish those two points plus maybe two additional points on the line.

So now you have four points total. And you know that if two out of those four points make it to the receiver, the receiver can use the two points that it received to recover the line and then recover the original two points. That’s the whole idea of erasure coding. So of course, instead of lines, we use higher degree polynomials to achieve different thresholds, but that’s the idea.
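Here is that one-sentence lesson as a minimal Python sketch: two values define a line, publishing four points on it adds redundancy, and any two surviving points recover the original values. Real schemes work over a finite field and with much higher-degree polynomials, but the shape of the idea is the same.

```python
# Minimal sketch of the "two points, pass a line, publish four points" example.
from fractions import Fraction

def encode(d0, d1, xs=(0, 1, 2, 3)):
    """Treat (d0, d1) as the line f(x) = d0 + (d1 - d0) * x and evaluate it."""
    return {x: Fraction(d0) + (Fraction(d1) - Fraction(d0)) * x for x in xs}

def recover(points):
    """Given any two (x, y) points on the line, recover (f(0), f(1))."""
    (x0, y0), (x1, y1) = list(points.items())[:2]
    slope = (y1 - y0) / (x1 - x0)

    def f(x):
        return y0 + slope * (x - x0)

    return f(0), f(1)

shares = encode(7, 11)                      # four points, at x = 0, 1, 2, 3
received = {2: shares[2], 3: shares[3]}     # the points at x = 0 and x = 1 were lost
assert recover(received) == (7, 11)         # the original two values come back
```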

Basically, we have two points. We pass a line. We get more points on the line. If only two points of the line make it to the receiver, the receiver can reconstruct the line and recover the original points. And so danksharding actually does this really, really quite elegantly by looking at the block as a matrix, as a rectangular set of data.

And then it basically extends, using these line ideas, both horizontally and vertically. And that actually gives the coded block – where then, pieces of that rectangle are sent to the different validators. What’s interesting is now that you have a rectangle, you can think of it as a two-dimensional object. That very naturally leads to thinking about this as a bivariate polynomial, exactly as Lera was saying.

What danksharding does is it gives a very interesting way to commit to these bivariate polynomials. So it builds a way to commit to bivariate polynomials by using commitments to univariate polynomials. And it turns out that the reconstruction mechanism in danksharding is also based on reconstruction along rows and columns – reconstruction done using univariate polynomials. And then as we were working through this, there was this realization that, hey, everything here really is about rectangles and bivariate polynomials. Maybe there’s a way in which the reconstruction can also be done by using interpolation of bivariate polynomials.

Lera: So in fact, you take your block of X points and you expand it by a factor of two in both directions. So you have 4X points as a result. But those 4X points only encode one quadrant’s worth of information.

So in principle, you only need one quadrant in order to interpolate and recover all the rest of this encoded block. So 25% of the points should be enough. But because danksharding works by doing univariate interpolations, it needs 75% of the block to do column and row-wise reconstruction.

If you do bivariate interpolation directly, you should be good with just 25% instead of 75%. So that will improve the number of elements you need in order to reconstruct, to recover the block. It will also improve communication and data availability sampling, but it’s all due to improved reconstruction.
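A toy sketch of that counting argument, using the same line trick on a hypothetical 2x2 block: extend by a factor of two in each direction, then rebuild the full 4x4 grid from a single quadrant, that is, from 25% of the coded data. Note that this particular recovery only succeeds because the surviving cells happen to form a neat quadrant; doing it efficiently from an arbitrary 25% of the cells is exactly the bivariate interpolation question raised here, and the real scheme works over a finite field with KZG commitments rather than toy integers.

```python
# Toy 2D extension and quadrant recovery, using lines (degree-1 polynomials).
from fractions import Fraction

def extend_line(v0, v1, xs):
    """Values at the given x's of the line through (0, v0) and (1, v1)."""
    return [Fraction(v0) + (Fraction(v1) - Fraction(v0)) * x for x in xs]

def recover_line(x0, y0, x1, y1, xs):
    """Recover that same line from any two of its points."""
    slope = (Fraction(y1) - Fraction(y0)) / (x1 - x0)
    return [Fraction(y0) + slope * (x - x0) for x in xs]

block = [[3, 5],
         [2, 9]]                                                 # original 2x2 data

# Extend every row, then every column, by 2x: a 4x4 coded grid (4x the points).
wide = [extend_line(r[0], r[1], range(4)) for r in block]        # 2 rows x 4 cols
cols = [extend_line(c[0], c[1], range(4)) for c in zip(*wide)]   # 4 cols x 4 rows
grid = [[cols[c][r] for c in range(4)] for r in range(4)]        # grid[row][col]

# Keep only the bottom-right quadrant: 4 of the 16 cells, i.e. 25% of the data.
quad = {(r, c): grid[r][c] for r in (2, 3) for c in (2, 3)}

# Rebuild everything: first the two known columns, then every row from them.
col_vals = {c: recover_line(2, quad[(2, c)], 3, quad[(3, c)], range(4)) for c in (2, 3)}
rebuilt = [recover_line(2, col_vals[2][r], 3, col_vals[3][r], range(4)) for r in range(4)]

assert rebuilt == grid   # the full coded block is recovered from one quadrant
```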

And now the question, it becomes kind of a mathematical question of how do you do efficient bivariate interpolation? 

There is an obvious need for a better algorithm there. And we were researching this a little bit and it appears so far to be a little bit underexplored. So maybe there were no applications for bivariate interpolation before. Maybe it’s just a hard problem, or we don’t know. But that’s definitely an interesting direction to basically try and improve bivariate interpolation algorithms.

Dan: So what I love about this is just like Lera was saying, for the more algorithms folks in the audience, is that there’s been a ton of work on doing univariate interpolation. If I give you points on a univariate polynomial, like points on a line, and I ask you to reconstruct that polynomial, there are very good algorithms for univariate polynomial interpolation.

And it turns out the bivariate interpolation problem, somehow it seems like it received less attention. And what’s really cool here is all of a sudden, the blockchain, Ethereum, danksharding is creating an application for this really natural algorithmic problem of bivariate polynomial interpolation. 

We really need it here. If we had a good algorithm for bivariate polynomial interpolation, we could make danksharding better because reconstruction, as Lera was saying, would go down from 75% to 25%. So to me, this is really quite beautiful in that the blockchain, Ethereum, danksharding is creating this new area of research, or at least prioritizing this new area of research showing, we really need better algorithms, new algorithms, efficient algorithms to do bivariate polynomial interpolation.

So hopefully that will encourage and spur a lot more research on these types of algorithms and presumably they will be very useful for this problem. 

Robert: So I love that we dug into the methodology here and didn’t shy away from the mathematics. I know some of it might sound a little complicated – bivariate polynomials and univariate polynomials. But I especially appreciate how you described these in terms of geometry and shapes, cause I think everybody here can really picture a line going through some points or how a rectangle functions. So I think that really helps ground the kind of work that you’re doing.

I want to double click on this statistic that you mentioned, where the current proposal as it exists would require 75% of samples in order to be reconstructed, whereas what you’re proposing would trim that down to 25%. So that’s a giant difference, 75% to 25%. But for a casual observer, if you’re only requiring 25% – the fact that it’s less than 50% – is that really enough to make assurances that this data is available and was made available?

When you go down to 25% it sounds like, I don’t know, you might be cutting some corners. So how do you assure people that in fact, just having 25% of data samples is actually enough and that things can work at that level? 

Lera: Yeah, that gets us to the topic of data availability sampling, and what it achieves, I guess, because this reconstruction threshold – 75% or 25% – basically determines how many samples you need to get high assurance that the data is there.

The way you do the sample is that you ask the network of validators to give you back one element, a random element of this encoded block. And when you get it back successfully – the validators also give you back a proof of the validity of this piece, which you can verify against the commitments that the chain persistently stores – that successful sample increases your confidence that the data is available. If the data were actually not available, a random sample would catch a missing piece with probability of at least one quarter or three quarters, depending on how your reconstruction algorithm works, whether it requires 75% of the data or 25% of the data.

So every time you do a random sample and it comes back successfully, basically your false positive [rate] – the probability that you think the data is available when it’s not – drops down exponentially. And the rate at which it drops down depends on how much data is required for the reconstruction. So if your reconstruction requires only 25% of the data, you do fewer samples and your assurance – the false positive rate – goes down quicker than if you have a reconstruction algorithm that requires 75% of the data.

So depending on how efficient your reconstruction is, you might need fewer samples in order to have the same assurance that the data is available. So that’s why you not only improve the reconstruction here, but you also improve the number of samples you need to do for your data availability sampling.

And data availability sampling is interesting because it’s probabilistic. So the more samples you do, the higher your assurance is that the data is available, right? And you can always amplify this assurance by doing more samples, just to make that clear.
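The arithmetic behind that, in a simplified model (uniformly random samples with replacement, validity proofs ignored): if reconstruction needs a fraction t of the pieces, then for the data to be unavailable more than 1 - t of the pieces must be missing, so each successful sample leaves at most a factor t of doubt, and n successful samples leave at most t^n.

```python
# Simplified sampling math: n successful samples leave a false-positive
# bound of at most t**n, where t is the fraction of pieces reconstruction needs.
import math

def samples_needed(t: float, target: float = 1e-9) -> int:
    """Samples needed for the bound t**n to drop below the target probability."""
    return math.ceil(math.log(target) / math.log(t))

print(samples_needed(0.75))   # 73 samples if reconstruction needs 75% of the pieces
print(samples_needed(0.25))   # 15 samples if reconstruction needs only 25%
```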

Dan: I think Lera, what you just explained is really, really important.

That’s the heart of danksharding and data availability sampling. So I’ll say it one more time, just so that the audience will hear it twice, cause that’s really the heart of it. So maybe think of the block. We said the block is gonna be encoded as this rectangle, right? So somehow we go from a block to a rectangle.

The specifics of how that is done is using this erasure coding. But let’s pretend we went from a block of data to a rectangle. So now imagine this rectangle is literally a rectangle of dots. Every dot corresponds to one piece of the data that’s gonna be distributed to one validator. So now to do data availability sampling, somebody wants to verify that enough of the dots are actually available.

We know that if more than 75% of the dots are available, then the block can be reconstructed using the erasure coding method. Or maybe, if what we’re proposing were used, then only 25% of the dots would be sufficient to reconstruct the original rectangle. But how do you know that 75% of the dots are available?

So that’s exactly the data availability sampling mechanism. What you do is you can kind of imagine you are throwing darts at this rectangle, right? So every time you throw a dart, you hit a random point in the rectangle and then the validator that holds that point has to prove, “Yes, I really have that point.”

Now you wanna verify that 75% of the dots are available. So imagine you have this rectangle. Maybe only 75% of the dots are there. Some of the dots disappeared for some reason. You wanna verify that 75% of the dots are available, cause if 75% are available, you can reconstruct the entire rectangle. And so what are you gonna do to verify that 75% are there?

You’re gonna throw a bunch of darts at the rectangle. And for every time the dart hits a dot, the validator that it hits has to prove that the dot really is there. And so if you throw a hundred darts and all hundred darts come back saying, “Yes, the data really is there”, that gives you a pretty good idea that more than 75% of the data is available.

Because you know, if less than 75% is available and you throw four darts, you expect one dart to hit a missing dot. If you throw a hundred darts, and all of them come back saying the data’s available, you have pretty good assurance that more than 75% of the dots are there. And so that’s the idea of data availability sampling.

You just try lots and lots and lots of random points, like a hundred of them. If all of them are there, then you have pretty good assurance that more than 75% are available and you can reconstruct the data. And you see, if you need 75% of the dots to be available for reconstruction, you need to throw something like a hundred darts. If you need only 25% to be available, you would need to throw far fewer darts than that for the same assurance.

So that would basically reduce the amount of data that’s needed to satisfy the sampling mechanism. Now, maybe it’s worthwhile saying that once the data availability sampling check succeeds – so all the hundred darts come back saying, “Yes, the dots really are there” – then the validator says, “Ah, the data’s available”.

And then [the validator] goes ahead and signs the block as saying, “This block passed data availability sampling”. Yeah, and that’s used later on in consensus. So that’s what the test is. Basically, data availability sampling is a very efficient way to test that enough data is available to reconstruct the block without actually reconstructing the block completely.

So I think it’s good to hear it twice and maybe even a third and a fourth time. But that’s kind of the core idea that makes this all work. 
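As a quick numeric check of the dart figures above, under the same simplified model of independent, uniformly random throws:

```python
# Sanity check of the dart figures, assuming exactly 75% of the dots are present.
available = 0.75

# "Throw four darts, expect one to hit a missing dot":
print(4 * (1 - available))   # 1.0 expected miss out of four darts

# "Throw a hundred darts and all of them come back":
print(available ** 100)      # ~3.2e-13 chance of that if only 75% were there
```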

Robert: I love that, and I especially love the physical analogies that you’re using with a dart board and throwing darts. I think that really brings it home for people. So we’re nearing the top of the hour here.

I wanna get ready to close out. But before we do that, just maybe I’ll throw it to you both in one sentence. What is the upshot of all of this? Like why does it matter to get these efficiency gains? 

Lera: Well, I would say the ultimate goal, of course, is to scale the blockchain and these new techniques would allow it to do that, for Ethereum to achieve full scaling.

That’s a really interesting approach, I would say, because in the beginning, Ethereum was thinking about doing sharding, full sharding, and arriving at quite complicated designs. But having rollups helps scale the execution layer of Ethereum, and this leaves it up to Ethereum to scale its data availability layer. Basically, increase the space while rollups increase the execution capacity and those pieces together will give us cheaper and faster blockchains.

Dan: Yeah, Lera said it perfectly. I mean, really the scaling story for Ethereum is rollups and the way to get rollups to be more efficient and cheaper to use is by solving the data problem. And danksharding is a very elegant and efficient way to do that. So the upshot is a scalable version of Ethereum where rollups are much cheaper to use than they are today.

Robert: That’s great. And if you get cheaper transactions and the ability to make more transactions, I think that opens up Ethereum to do all sorts of new applications that weren’t possible before with high gas costs and fees. So this has been great. Thank you all for joining us. I hope that you all learned a little bit about data availability sampling and danksharding.

If you’d like to learn more, as mentioned, you can check out Dan and Lera’s post. It’s a really, really great piece, so I highly recommend reading it and checking out all the resources contained in there.

Thank you all for joining us. I’m looking forward to this weekend. I’m gonna go to my local bar and teach everybody about erasure coding at the dartboard. So thank you all and take care.

Lera: Sounds great. Thank you.

Dan: Thank you. This has been fun. Bye bye.

Thank you for listening to web3 with a16z. You can find show notes with links to resources, books, or papers discussed; transcripts, and more at a16zcrypto.com. This episode was technically edited by our audio editor Justin Golden. Credit also to Moonshot Design for the art. And all thanks to support from a16z crypto.

To follow more of our work and get updates, resources from us, and from others, be sure to subscribe to our web3 weekly newsletter — you can find it on our website at a16zcrypto.com.

 Thank you for listening, and for subscribing. Let’s go!

Vampire attacks: Why “blood sucking” platform competition could be good for consumers

https://a16zcrypto.com/content/article/vampire-attacks-competition/

In a “vampire attack,” a platform tries to directly incentivize its competitor’s customers to switch. This isn’t a new idea — airlines have had “status match” for ages — but it’s much easier in the world of crypto, where transactions are stored publicly on blockchains. With public transaction histories, a new entrant can simply read its competitors’ transaction records and provide direct rewards or other incentives to top customers who switch.

What does this mean for consumers? The short version is that vampire attacks can result in virtuous platform competition that drives down prices. In short, “blood-sucking platform competition” can be good for consumers.

To arrive at this conclusion, in a new paper, we built a simple model in which platforms have some customers who are “captive,” and others who are mobile across platforms — and price-sensitive. In the real world, customers might be captive because of, for instance, search costs or lack of information about alternatives.

When platforms are only able to offer a single price to both types of consumers, the price curve declines in the share of the market that’s mobile, denoted by λ in the figure below. This makes sense, since the more of those consumers there are, the more firms want to compete for them.

Next, we consider what happens when firms introduce loyalty programs as a way of convincing mobile customers to stay put. Consumers who are part of a platform’s loyalty program receive a discount at that platform but have to pay the standard price at a competitor’s.

It turns out that loyalty programs are actually bad for all consumers — even the ones who get the discount. Why? First, firms now charge their captive consumers more, since they can protect their mobile consumers with a loyalty program. The resulting captive consumer price is the dashed green line in the figure below.

But if the platforms are monopolizing their captive customers, that means they have to drop their price a lot — and take a big loss — if they want to attract their competitors’ loyal customers… which reduces the likelihood that a platform will try to steal its competitor’s loyal customers. As a result, firms can charge even their loyal customers more; this is the solid green line in the figure below.

But when each firm can see who its competitors’ loyal customers are, vampire attacks are possible — that is, each firm can offer a competitor’s loyal customers the same price it’s giving its own loyal customers. This in turn leads to much more competition over those customers, driving the price they face down far below even the price before loyalty programs were introduced. This is the solid red line in the figure below.

In short: Vampire attacks restore competition for loyal customers.

This theory broadly accords with what we’ve seen among NFT trading platforms, with vampire attacks leading to intense competition for customer loyalty. A caveat, however, is that it’s not clear how sustainable the current vampire attack models are, especially since the legality and regulatory status of the strategies that many of them rely on are uncertain, and the relative success of such attacks may be different in scenarios where regulatory clarity and consistent enforcement exist.

Moreover, as Shai Bernstein and one of us (Kominers) discussed in a recent Harvard Business Review article, the competition induced by vampire attacks can only benefit consumers if it doesn’t incentivize firms to engage in malbehavior (such as with centralized crypto finance platforms taking on excessive risk in order to offer higher rates of return). 

That said, overall, our analysis suggests that vampire attacks may contribute to increased — and hopefully virtuous — platform competition. This has the potential to reshape the digital platform landscape in a way that’s better for everyone.

*   *   *

P.S. Any ideas for alternate names for the “vampire attack” concept? The name is evocative, but doesn’t on its face sound like something you’d necessarily want to help enable. 

P.P.S. There is in fact an economics literature on that other type of vampire attack.

This article is an adaptation of a recent Twitter thread summarizing our new economic theory paper, “A Simple Theory of Vampire Attacks.”

John William Hatfield is the Arthur Andersen & Co. Alumni Centennial Professor in Finance at the McCombs School of Business at the University of Texas at Austin

Scott Duke Kominers is a Professor of Business Administration at Harvard Business School, a Faculty Affiliate of the Harvard Department of Economics, and a Research Partner at a16z crypto. He also advises a number of companies on marketplace and incentive design; for further disclosures, see his website.

Special thanks to our editor, Tim Sullivan, as well as Shai Bernstein, Josh Bobrowsky, Christian Catalini, Sonal Chokshi, Lauren Cohen, Chris Dixon, Piotr Dworczak, Jad Esber, Joshua Gans, Zachary Gray, Hanna Halaburda, Mason Hall, Miles Jennings, Michele Korver, Sriram Krishnan, Eddy Lazzarin, Collin McCune, Kim Milosevich, Vivek Ravishanker, Natalia Rigol, April Roth, Ben Roth, Tim Roughgarden, Bryan Routledge, Scott Walker, Carra Wu, Ali Yahya, the Lab for Economic Design, and presentation audiences at the American Economic Association meetings, Harvard Business School, and a16z crypto for helpful comments.

The views expressed here are those of the individual AH Capital Management, L.L.C. (“a16z”) personnel quoted and are not necessarily the views of a16z or its affiliates. a16z is an investment adviser registered with the U.S. Securities and Exchange Commission. Registration as an investment adviser does not imply any special skill or training. Certain information contained in here has been obtained from third-party sources, including from portfolio companies of funds managed by a16z. While taken from sources believed to be reliable, a16z has not independently verified such information and makes no representations about the enduring accuracy of the information or its appropriateness for a given situation. In addition, this content may include third-party information; a16z has not reviewed such material and does not endorse any advertising content contained therein.

This content is provided for informational purposes only, and should not be relied upon as legal, business, investment, or tax advice. You should consult your own advisers as to those matters. References to any securities, digital assets, investment strategies or techniques are for illustrative purposes only, and do not constitute an investment recommendation or offer to provide investment advisory services. Furthermore, this content is not directed at nor intended for use by any investors or prospective investors, and may not under any circumstances be relied upon when making a decision to invest in any fund managed by a16z. (An offering to invest in an a16z fund will be made only by the private placement memorandum, subscription agreement, and other relevant documentation of any such fund and should be read in their entirety.) Any investments or portfolio companies mentioned, referred to, or described are not representative of all investments in vehicles managed by a16z, and there can be no assurance that the investments or investment strategies will be profitable or that other investments made in the future will have similar characteristics or results. A list of investments made by funds managed by Andreessen Horowitz (excluding investments for which the issuer has not provided permission for a16z to disclose publicly as well as unannounced investments in publicly traded digital assets) is available at https://a16z.com/investments/.

Additionally, this material is provided for informational purposes solely and should not be relied upon when making any investment decision. Past performance is not indicative of future results. Investing in pooled investment vehicles and/or digital assets includes many risks not fully discussed herein, including but not limited to, significant volatility, liquidity, technological, and regulatory risks. The content speaks only as of the date indicated. Any projections, estimates, forecasts, targets, prospects, and/or opinions expressed in these materials are subject to change without notice and may differ or be contrary to opinions expressed by others. Please see https://a16z.com/disclosures for additional important information.

Of data availability & danksharding

https://a16zcrypto.com/content/podcast/data-availability-danksharding-web3-with-a16z-podcast/

Robert: Hi everyone, Robert Hackett back here again with another episode for web3 with a16z. I recently chatted live with some of our researchers on the topic of Data Availability Sampling and Danksharding – which is relevant to blockchain scaling, as well as paving the way for more advanced blockchain networks & user applications. While much of the discussion is especially relevant to Ethereum, some of the concepts we cover are also applicable to advances in computing and networking generally. 

This discussion involves some specific math, which we cover (and briefly explain) in the episode.

For quick context as you listen, “polynomial commitments”, which you’ll hear more about, are a tool that helps reduce the amount of data needed to verify complex computations. And “interpolation”, another term you’ll be hearing, is a way to reconstruct data from a limited set of data points.

Be sure to check out the paper referenced in this episode for a deeper explanation at a16zcrypto.com/das – that’s DAS for data availability sampling. As always, none of the following is investment, business, legal or tax advice. See a16z.com/disclosures for more important information, including a link to a list of our investments.

Robert: Okay, so today we’re going to be talking about data availability sampling and danksharding. Now, if you’re unfamiliar with those things, don’t be scared because I’m here with a couple of experts who are gonna break things down for you. We’ve got Dan Boneh, Stanford professor, eminent cryptographer, and a16z crypto research advisor, and Lera Nikolaenko, research partner at a16z crypto.

Lera: Great to be here. Thanks, Robert. 

Dan: Thanks, Robert. Great to be here. 

Robert: Thanks, Lera. Thanks, Dan. So, as mentioned, Dan and Lera recently wrote up a great post and have a proposal related to ways to improve protodanksharding, which is this upgrade that is planned for later this year in Ethereum. There is a lot of rich detail in that post, so we’ll dig in there.

But before we get into all the details, I thought maybe we could zoom out and back up and start with a bigger picture here. So let’s start with Dan. Help us understand the subject of today’s talk, data availability sampling. And maybe you could do it without breaking anyone’s minds here.

Maybe we could keep it as you know, succinct and accessible to a broader audience as you might be able to. 

Dan: Sure, so maybe we can zoom out a little bit before we talk about data availability sampling. Maybe let’s talk about the general goal here. So this is part of the efforts to scale Ethereum.

And so one of the ways to scale Ethereum is using rollups. And rollups actually need to push a lot of data on chain. So a rollup allows you to take, say, a hundred or a thousand transactions and process them all as a single transaction on the Ethereum Layer 1. And then all the data associated with those transactions basically gets pushed on chain.

And as a result, there’s a need to actually store quite a lot of data on chain. And the question is how to do that. And that’s exactly where danksharding comes into play. So it’s a really beautiful, beautiful idea. It came from the Ethereum Foundation, particularly from Dankrad Feist. It’s a really elegant construction and basically Lera and I wanted to give an overview of how this construction works and potentially look at some options in which it may be, can be improved a little bit.

Lera: Yeah. Rollups essentially allow execution to scale for Ethereum, but the question is, How do you make the data scale? And danksharding/data availability sampling basically adds this missing piece to Ethereum that will allow it to achieve full scaling. This post is kind of quite technical. Some aspects we don’t explore, like networking, which is also interesting to research and to write about, but we are mainly focusing on the cryptographic aspects of danksharding.

Dan: And I would actually even add that there’s really beautiful open questions that are still left to think about for researchers if you are interested in this area. There are beautiful questions around coding and polynomial commitments. There are really quite interesting questions. If they could be resolved, we would end up with even more efficient systems.

Robert: That’s excellent context. I definitely want to ask you about those open areas for potential inquiry in a little bit. But before we do that, let’s talk more about this proposal that would free up space on the Ethereum blockchain. So the blockchain is basically this giant record of transactions happening from the time that this system launched way back when to the Genesis block. How are developers thinking about freeing up space, making it get greater throughput, scalability, cheaper transactions, all of these things that sound great and would make the system much more usable. How do you actually get there? What is required to get these efficiency gains? 

Lera: I think the main challenge is that you cannot just ask validators to store more data. It won’t get you too far. If you want to scale the blockchain to increase the block size by several orders of magnitude, you have to split your block and disperse it to validators so that each validator only stores some fragment of the block. There’s where the ideas from error correcting and erasure coding comes in that allows you to do that.

So basically increasing the block size without putting too much burden on the validators. I would say that’s the main technical difficulty. That’s what danksharding is solving.

Robert: You mentioned erasure coding, and it sounds like that’s a key piece of this technology that enables it to work. Maybe you could provide some more detail there. What is erasure coding? How does it work, and how does it apply in this context? 

Lera: Sure, absolutely. So you basically take a block with users’ data and you expand it. You erasure code it, turn a smaller block into a larger block, and this larger block can tolerate some omissions from it. So you can lose some portion of the block.

In the case of danksharding, you can lose 25% of the block and still be able to reconstruct these missing pieces from what you have. So when you disperse this expanded block to the validators – and the validators go down because some of them are byzantine or faulty – you can still reconstruct from those validators and that’s okay if they go down and they lose their pieces, the rest of the validators who are still online and still honest can recover those missing pieces. And that’s why the properties of erasure coding are useful here, just to substitute for byzantine or faulty validators. 

Dan: And maybe we can even explain it a bit more with an analogy. It’s like, you know, if you’re watching a Netflix movie and say, you know, the Netflix servers are sending packets to your computer and you are watching the movie, well imagine 10% of the packets actually don’t make it through and your computer only gets to see 90% of the packets.

So, normally you would start to see all sorts of lossy video and degradation in performance. With erasure coding, what happens is if the movie is coded using an erasure code, even if only 90% of the packets get through the laptop has enough information to actually reconstruct the entire movie.

What’s interesting is erasure coding is used everywhere, like communication networks wouldn’t be able to work without erasure coding. And maybe again, just another example, when you have a deep space probe, it’s sending messages back to earth. There’s a lot of noise and a lot of them get either dropped or they get garbled, and yet on Earth we’re able to recover the signal and get those crisp images from Mars.

That actually also is done using a slightly stronger technique called error correcting codes, where we’re not only losing packets, but also we are recovering from packets being distorted, being changed in value. What’s interesting in the context of the blockchain is that all the data is being signed. So we actually don’t care so much about data corruption because the signature layer will detect data corruption.

So really all we care about, the only thing that a malicious node can do – someone who’s trying to prevent the data from being reconstructed – the only thing that that node can do is, in some sense, remove pieces that make up the data. So we don’t care so much about recovering from corruption of the data because that’s taken care of by the signature.

But we do worry quite a lot about pieces of the data simply missing, and as a result, maybe whoever’s trying to reconstruct the data is not able to do it. And so that’s exactly where erasure coding comes in, where we know that the only thing that can happen is that a certain piece of the data is missing. It can’t be garbled. So if we got it, we got the right one, and that’s because of the signature. But if it’s missing, we have to somehow recover. And that’s exactly as Lera was saying, that’s the idea of erasure coding. 

The way you do that is basically you take your original data – in the case of Ethereum, you would take a block and you would expand it a little bit.

Actually, you expand it by like a factor of four to get more data in the block, so that the data is now redundant in the block, and now you break it into little pieces. Now you can ask every validator, “Oh, you know, you don’t have to store the entire block – you only have to store this small piece of the block.”

And if enough validators do their job and store those little pieces – and when we need to reconstruct the block, they send those pieces back to us – if enough pieces get back, then we are able to reconstruct the entire block. In danksharding in particular – again, it’s a beautiful, beautiful proposal – the recovery rate is 75%. So if 75% of the validators respond and we’re able to recover 75% of the pieces, then we’re able to reconstruct the entire block. That’s kind of the core mechanism that’s used in danksharding.

Robert: That’s really useful. So it sounds like erasure coding is this kind of technique that enables you to apply some redundancy and backups so that you don’t lose all of the data so that you can still sort of assemble it, get access to it, see that it exists.

Dan, you mentioned that the reason for doing this data availability sampling is to prevent bad actors from doing certain things, like getting rid of some of the data. What exactly are we defending against here?

Dan: Yeah, that’s a great place to go next. So in fact, what happens is, with these proposals for solving the data problem on Ethereum, there’s gonna be a new transaction type that’s going to be introduced.

This is called a “blob-carrying transaction”, and what it would allow folks to do is basically embed blobs. Each blob is 128 kilobytes. So you embed blobs into blocks. So normally blocks are made up of transactions to do all sorts of things to the Ethereum state. So now, in addition to just the regular transactions that we know and love, there’s also going to be a blob-carrying transaction or a few blob-carrying transactions per block and as I said, each one is gonna be 128 kilobytes.

Now the long-term plan, and actually maybe Lera can talk about the transition to how we get there, but the long-term plan is that there could be quite a few data-carrying blobs in every block, which would actually make the block quite large, right? I mean, each blob is 128 kilobytes.

If you put a bunch of those together, you could end up, actually, you will end up with blocks that are 30 megabytes – and it’s just not reasonable to ask validators to store these huge blocks. Today, the blocks are only a hundred kilobytes or so, so these would be much, much larger blocks. And so the idea is to basically break up these large blocks into little pieces.

Every validator, every node will actually store only this little piece. And now the problem is, what happens if they say that they stored the piece, but in fact they didn’t? Right? What do we do then? And that’s exactly where data availability sampling comes in, which is a very efficient way to test at block creation time that, in fact, everybody received their pieces, everybody currently has their pieces, and currently the block can be reconstructed even though it was broken into many, many small pieces. And these pieces are stored, distributed across the network.

Robert: And just before Lera takes over, I just want to make sure that I understand something here. So you said blocks today are about 100 kilobytes. And the idea is that, after all these upgrades, they’re going to be about 30 megabytes. And part of the reason for that expansion of the block size is to accommodate this new type of data, this blob data, that has a different purpose than what you usually shove into blocks, which is just pure transaction data.

And this blob data is really related to helping these off-chain Layer 2 networks store some data in a more ephemeral manner. Did I get that right?

Dan: Yeah, I’m glad you reiterated that cause that’s important. So these blobs basically are gonna be used by these rollups, right? So the rollups have to store the rollup data.

And so today what they do is they store it as what’s called “call data”, which is somewhat expensive and not what call data was meant for. Instead, they’re gonna store this data as blobs in blocks. And what’s interesting is these blobs are actually not going to be available to the execution layer of Ethereum.

They’re just gonna be stored as blobs in blocks. The execution layer will only see hashes of these big blobs. It won’t be able to access individual bytes or elements in the blobs – that’s the new mechanism. Today, as we said, this data is stored as call data, and call data is all available to the execution layer of Ethereum.

So because of this simplification, storing blob data is going to be a lot cheaper than [storing] call data. So in principle, the goal of this is to reduce the costs of Layer 2 systems, right? Because today they have to pay quite a lot to store all their data as call data. In the future, once danksharding is deployed, or even once protodanksharding is deployed, the costs will be a lot lower.

So L2 systems will become much, much cheaper and easier to use.

Robert: That’s great. I love that we’re having a sophisticated conversation about cryptography and very technical software, and yet we keep using the word “blob”. It reminds me of the ’80s sci-fi movie The Blob. But Lera, yeah, maybe you could chime in now and talk about this transition and how we get from where we are now to that vision in the future of expanded block sizes.

Lera: Yeah, for sure. Before I dive into that, just to add two comments to what Dan said, I want to mention that it’s important that the blobs are going to expire – the validators don’t give any guarantee that they’re gonna be storing these blobs forever. Right now the expiry is set to roughly 30 to 60 days; that’s what the Ethereum Foundation is thinking about.

So during this period you will have an opportunity to download all of these blobs and store them locally. After that, the network will drop them, but the commitments to these blobs will persist. And if you need to, as long as you have the blobs themselves, you can always resupply them to the execution layer as call data.

So as long as you store the blobs, you can prove that those are the correct blobs that you have by resupplying them, because the chain continues storing the hashes of those blobs, the commitments. Another important thing I want to mention is that the fee market for those blobs is gonna be different.

So there’s going to be another fee market. They’re gonna be priced a little differently. So Ethereum is gonna have these two pipes. If the execution pipe gets congested, you pay larger fees for execution, and if the data pipe gets congested, you pay larger fees for storing the data. So we don’t yet know how expensive the storage is gonna be. The intuition is that it must be less expensive than call data today. But again, we have to experiment to see exactly how much cheaper it’s gonna be. And protodanksharding is a step towards full danksharding, but it’s also an experiment that we’re gonna carry out to see how validators handle this additional load and how expensive the fees are gonna be for storing those data blobs.

So on the way to danksharding, we’re gonna do this experiment with protodanksharding. In protodanksharding basically, you don’t apply any erasure coding or error correcting. All you do is you add this special transaction type that carries data blobs. And those blobs are gonna have an expiry and that’s it.

So the block size is gonna be increased by a little bit in Ethereum. So right now, as Dan was saying, it’s around 100 kilobytes. With protodanksharding it’s gonna be around 500 kilobytes or so. So it’s not that big of an increase, but we’re still gonna test all of the hypotheses, which hopefully will check out, and Ethereum will continue moving to full danksharding.

Robert: So Lera, you said a number of things in there. I wanna make sure that all those points came across. You mentioned that the blob data is gonna have a different fee market. So is the right way to think about this like different highway systems – or perhaps a lane on a highway, like an HOV or E-ZPass-style lane, where it’s cheaper for someone in that lane? You get lower fees to put this kind of data on the blockchain versus someone who’s just a regular commuter having to pay a toll. I know I’m mixing some metaphors here, but I’m wondering if that’s a physical analogy to describe these differences in fee markets for different types of data.

Lera: Yeah, I would say that it’s hard to imagine this new data’s fees being more expensive than what we currently have, so the hypothesis is it’s always gonna be cheaper to put your data in those blobs if you don’t care about accessing this data from the execution layer. So it’s as if you’re opening another lane on your highway, which just increases capacity: transactions that are data heavy will go into this new lane, and transactions that are execution heavy will continue going through the main lanes, if that makes sense.

Robert: Yes, it does. And it sounds like one of the reasons why it can be cheaper is because it has this expiration date, as you mentioned. I think you said that the current idea is that’s gonna last 30 to 60 days for this blob data, at which point it would simply vanish and there would just be a trace remaining – a sort of commitment as you described.

Lera: Yes, exactly. 
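
To make the expire-but-keep-a-commitment idea concrete, here is a minimal sketch, assuming a plain hash as a stand-in for the protocol’s actual KZG-based commitments (the data here is made up):

```python
import hashlib

def commit(blob: bytes) -> bytes:
    # Stand-in commitment: the real protocol uses KZG polynomial commitments
    # (wrapped in versioned hashes); a plain hash is enough for this sketch.
    return hashlib.sha256(blob).digest()

# The chain keeps only the small commitment forever; the 128 KB blob itself expires.
on_chain_commitment = commit(b"rollup batch data ...")

def verify(blob: bytes, commitment: bytes) -> bool:
    # Anyone who archived the blob can later resupply it and prove it is the
    # right one by checking it against the persisted commitment.
    return commit(blob) == commitment

assert verify(b"rollup batch data ...", on_chain_commitment)
assert not verify(b"tampered data", on_chain_commitment)
```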

Dan: Maybe to add to that, basically what happens is when you submit transactions, there’s a fee market, as Lera was saying, in that if there are lots of transactions being submitted all at once, say there’s a big NFT mint, and everybody wants to issue transactions, then of course the price per transaction immediately goes up.

Well, blob data is gonna be a parallel fee market, so presumably there are fewer people submitting blobs than there are people submitting transactions. So hopefully there will be less congestion over blobs. But in principle, it could happen. Hopefully not, but it could happen that all of a sudden, for some reason, there’s huge congestion over blobs and then the fee market for blobs will actually go up.

But in principle, again, because there’s less demand for submitting blobs than there is for submitting transactions, the hope is that the cost for blobs will be lower than the cost for transactions. 
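
To make the “two pipes” picture concrete, here is a minimal sketch of two independent EIP-1559-style base fees, one for execution gas and one for blob data. The update rule and the numbers are simplified illustrations, not the actual EIP-4844 formula (which uses an exponential function of excess blob gas):

```python
def update_base_fee(base_fee: float, used: float, target: float, max_change: float = 0.125) -> float:
    # Simplified controller: the fee drifts up when usage is above target
    # and down when it is below (a sketch, not the exact protocol rule).
    delta = max_change * (used - target) / target
    return max(base_fee * (1 + delta), 1e-9)

# Two independent "lanes": execution gas and blob data are priced separately.
exec_fee, blob_fee = 10.0, 1.0
for _ in range(5):
    exec_fee = update_base_fee(exec_fee, used=14_000_000, target=15_000_000)  # quiet execution lane
    blob_fee = update_base_fee(blob_fee, used=4, target=2)                    # congested blob lane

print(round(exec_fee, 3), round(blob_fee, 3))  # execution fee drifts down, blob fee climbs
```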

Robert: Okay. That’s really helpful to understand as well. So we’ve laid out a sort of timeline for these updates. People listening in are maybe familiar with “The Merge”, which happened last fall – this big Ethereum upgrade that basically eliminated the environmental impact of Ethereum in terms of its energy consumption. And now we’re entering this new period of upgrades, which Vitalik [Buterin], co-creator of Ethereum, has termed “The Surge”, meaning that all of a sudden, these updates are going to enable the blockchain to scale a lot more powerfully than it’s been able to before. So part of this update, one of the first ones, is protodanksharding. It’s happening later this year.

What happens in between that upgrade and full on danksharding? What are the differences between the two and when do we get that fully formed vision of danksharding in the future?

Lera: That’s a great question. I think there are still some research problems to figure out along the way, especially around networking because when those validators are storing only fragments of the blob, they need to help each other reconstruct those fragments. If some validator falls asleep for a while, when it wakes up, it wants other validators to help it reconstruct its missing fragments.

So it’s quite an involved networking protocol that is still in the making for danksharding and there are other exciting research problems that potentially can improve the scheme, but I think so far it looks like quite a clear path to get from protodanksharding to danksharding. And the question is just maybe how to make it better, how to improve different aspects of it, make it more efficient.

Dan: Maybe it’s worthwhile adding that in the protodanksharding approach there are at most four blobs per block, and each blob is 128 kilobytes. Four times 128 kilobytes gives us half a megabyte, which is why that’s the limit on block size in protodanksharding – and that’s actually imminent. That’s supposed to happen later this year.

And then, yeah, going all the way to danksharding still takes some work. In fact, there was a very big event recently for jointly generating the parameters for danksharding. And so there’s a lot of work to do in preparation for getting to full danksharding.

Lera: Yeah, that’s quite exciting. I think they’re still accepting contributions to participate in the trusted setup ceremony. Yeah, so it’s still ongoing. It’s a large community effort that’s really fun to watch and people come up with lots of creative ideas to contribute. So check this out, definitely.

Robert: That’s super cool. I actually participated in some of the trusted setup ceremonies for some of the early privacy coins. Is this trusted setup ceremony as ostentatious as some of those, where you had people burning laptops and exploding hard drives and things like that?

Lera: From what I’ve seen, it’s just people coming up with different creative ways to generate entropy. Some use their pets, dogs and cats. Some are creating some sophisticated marble run machines and such. There was even one contribution done from a satellite, the highest altitude contribution there.

Dan: Actually we have to mention that. It was pretty cool actually. This company Cryptosat actually has satellites in orbit that sample noise up in space and then contribute and participate in the trusted setup protocol.

So that was pretty cool to see. 

Robert: Wow, that is awesome. Did not know there was crypto in space just yet. Dan, you said that protodanksharding is gonna have four blobs per block. What is danksharding gonna enable? How many blobs are we talking here? 

Dan: Yeah, so by the way, protodanksharding is up to four blobs per block.

They’re actually targeting two, but up to four. And then for danksharding, as we said, they’re targeting at most 30 megabyte blocks. So just divide 30 megabytes by 128 kilobytes, and that tells you it’s on the order of a couple hundred blobs per block.

Lera: Yeah, I think the current plan is a target of 128 blobs and maybe up to 256 blobs per block.

Robert: Great. 

Dan: And that’s exactly when the blocks become quite large. And then it becomes a little difficult for the validators to keep the entire blocks themselves. And that’s where we have to start breaking them up into pieces. And each validator will store one piece. 
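
For reference, the arithmetic behind those figures, using the 128-kilobyte blob size fixed by EIP-4844 (4096 field elements of 32 bytes each):

```python
FIELD_ELEMENTS_PER_BLOB = 4096
BYTES_PER_FIELD_ELEMENT = 32
BLOB_SIZE = FIELD_ELEMENTS_PER_BLOB * BYTES_PER_FIELD_ELEMENT   # 131,072 bytes = 128 KiB

proto_max_blobs = 4                  # protodanksharding numbers quoted above (target two, max four)
dank_target, dank_max = 128, 256     # full danksharding numbers quoted above

print(proto_max_blobs * BLOB_SIZE / 1024)     # 512.0 KiB -> "around half a megabyte"
print(dank_target * BLOB_SIZE / 1024 ** 2)    # 16.0 MiB at the target
print(dank_max * BLOB_SIZE / 1024 ** 2)       # 32.0 MiB at the cap, i.e. the ~30 megabyte blocks mentioned
```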

Robert: Got it. I appreciate the basic division. Maybe some of the more technical math that you get into in your post would be a little bit harder to convey here, but that makes sense. Maybe we could talk a little bit about some of the proposals that you made in your recent post. So you did this research and you found that with some adjustments, you could potentially get even more efficiency out of EIP-4844, which is the more technical name for protodanksharding.

Lera: Yeah, the bulk of the work was just to understand the danksharding proposal in full, and then we observed some different ways to look into the mathematics – the cryptographic components of it – that hopefully open up new toolkits that we can use there. So the rough idea, not going too much in depth, is you fit a polynomial through your block.

It’s a bivariate polynomial because your block is rectangular. And then you evaluate this polynomial at more points, kind of expanding the block. And the idea is that if you use these bivariate polynomials – instead of, as danksharding does it, a list of univariate polynomials – you can then apply techniques for bivariate evaluation, bivariate interpolation, and possibly even try to apply bivariate error-correcting codes to that. But it’s very much an open door for more research and exploration. So in this post we try to explain where more research can be done to improve the scheme in that direction.

Dan: Yeah. Well maybe I can add to that. I mean, danksharding is such a beautiful idea. Really. The Ethereum Foundation deserves a huge amount of credit for it. And, you know, Dankrad [Feist] in particular. It’s really quite an elegant construction.

Initially, we were just trying to understand the details and it took some time to recover exactly all the details of how everything works. And we figured maybe it’ll help the world to have another writeup that explains how the mechanism works. Initially, I guess the original danksharding, the way it’s described, is all using univariate polynomial commitments, where we take each row, we view it as a polynomial. Erasure coding is all done using polynomials.

Maybe I can even teach a little bit of erasure coding here in one sentence. Suppose you have two points in a plane and you want to do erasure coding on them. What you can do is you can just pass a line through those two points, and now you can just publish, instead of just those two points, you can publish those two points plus maybe two additional points on the line.

So now you have four points total. And you know that if any two out of those four points make it to the receiver, the receiver can use the two points that it received to recover the line and then recover the original two points. That’s the whole idea of erasure coding. So of course, instead of lines, we use higher degree polynomials to achieve different thresholds, but that’s the idea.
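
Here is that line example as a toy erasure code – a minimal sketch over the rationals, whereas a real system would use Reed–Solomon-style codes over a finite field with higher-degree polynomials:

```python
from fractions import Fraction
from itertools import combinations

def encode(d0: int, d1: int, n: int = 4):
    # Treat (d0, d1) as f(0), f(1) of a line f(x) = a + b*x and publish n points on it.
    a, b = Fraction(d0), Fraction(d1 - d0)
    return [(x, a + b * x) for x in range(n)]

def decode(two_points):
    # Any two surviving points determine the line, and hence f(0) and f(1).
    (x1, y1), (x2, y2) = two_points
    b = (y2 - y1) / Fraction(x2 - x1)
    a = y1 - b * x1
    return a, a + b

shares = encode(7, 42)                   # four points; any two of them suffice
for pair in combinations(shares, 2):
    assert decode(pair) == (7, 42)
print("recovered (7, 42) from every pair of shares")
```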

Basically, we have two points. We pass a line. We get more points on the line. Even if only two points of the line make it to the receiver, the receiver can reconstruct the line and recover the original points. And so danksharding actually does this really, really quite elegantly by looking at the block as a matrix, as a rectangular set of data.

And then it basically extends, using these line ideas, both horizontally and vertically. And that actually gives the coded block – where then, pieces of that rectangle are sent to the different validators. What’s interesting is now that you have a rectangle, you can think of it as a two-dimensional object. That very naturally leads to thinking about this as a bivariate polynomial, exactly as Lera was saying.

What danksharding does is it gives a very interesting way to commit to these bivariate polynomials. So it builds a way to commit to bivariate polynomials by using commitments to univariate polynomials. And it turns out that the reconstruction mechanism in danksharding is also based on reconstruction along rows and columns – again done using univariate polynomials. And then as we were working through this, there was this realization that, hey, everything here really is about rectangles and bivariate polynomials. Maybe there’s a way in which the reconstruction can also be done by using interpolation of bivariate polynomials.

Lera: So in fact, you take your block of, say, X points and you expand it by a factor of two in both directions. So you have 4X points as a result. But those 4X points only encode the one small quadrant of original data.

So in principle, you only need one quadrant in order to interpolate and recover all the rest of this encoded block. So 25% of the points should be enough. But because danksharding works by doing univariate interpolations, it needs 75% of the block to do column- and row-wise reconstruction.

If you do bivariate interpolation directly, you should be good with just 25% instead of 75%. So that will improve the number of elements you need in order to reconstruct, to recover the block. It will also improve communication and data availability sampling, but it’s all due to improved reconstruction.
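
A small numeric demo of that quadrant claim – a sketch that uses floating-point polynomials and a generic linear solve, rather than the finite-field arithmetic and efficient interpolation a real implementation would need:

```python
import numpy as np
from numpy.polynomial import polynomial as P

rng = np.random.default_rng(0)
m, n = 4, 4                                            # toy block: m*n original values
c = rng.integers(0, 97, size=(m, n)).astype(float)     # bivariate poly, deg_x < m, deg_y < n

# "Coded" rectangle: evaluate the polynomial on a grid extended 2x in both directions.
X, Y = np.meshgrid(np.arange(2 * m, dtype=float), np.arange(2 * n, dtype=float), indexing="ij")
coded = P.polyval2d(X, Y, c)

# Keep only one quadrant (25% of the coded points) and interpolate the polynomial back.
V = P.polyvander2d(X[:m, :n].ravel(), Y[:m, :n].ravel(), [m - 1, n - 1])
c_rec = np.linalg.solve(V, coded[:m, :n].ravel()).reshape(m, n)

assert np.allclose(c_rec, c)
assert np.allclose(P.polyval2d(X, Y, c_rec), coded)    # the whole rectangle is recovered
print("recovered the full coded block from one quadrant (25% of the points)")
```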

And now it becomes kind of a mathematical question: how do you do efficient bivariate interpolation?

There is an obvious need for a better algorithm there. And we were researching this a little bit and it appears so far to be a little bit underexplored. So maybe there were no applications for bivariate interpolation before. Maybe it’s just a hard problem, or we don’t know. But that’s definitely an interesting direction – to basically try and improve bivariate interpolation algorithms.

Dan: So what I love about this is just like Lera was saying, for the more algorithms folks in the audience, is that there’s been a ton of work on doing univariate interpolation. If I give you points on a univariate polynomial, like points on a line, and I ask you to reconstruct that polynomial, there are very good algorithms for univariate polynomial interpolation.

And it turns out the bivariate interpolation problem, somehow it seems like it received less attention. And what’s really cool here is all of a sudden, the blockchain, Ethereum, danksharding is creating an application for this really natural algorithmic problem of bivariate polynomial interpolation. 

We really need it here. If we had a good algorithm for bivariate polynomial interpolation, we could make danksharding better because the reconstruction threshold, as Lera was saying, would go down from 75% to 25%. So to me, this is really quite beautiful in that the blockchain, Ethereum, danksharding is creating this new area of research, or at least prioritizing this new area of research, showing that we really need better algorithms – new, efficient algorithms – to do bivariate polynomial interpolation.

So hopefully that will encourage and spur a lot more research on these types of algorithms and presumably they will be very useful for this problem. 

Robert: So I love that we dug into the methodology here and didn’t shy away from the mathematics. I know some of it might sound a little complicated – bivariate polynomials and univariate polynomials. But I especially appreciate how you described these in terms of geometry and shapes, cause I think everybody here can really picture a line going through some points or how a rectangle functions. So I think that really helps ground the kind of work that you’re doing.

I want to double click on this statistic that you mentioned, where the current proposal as it exists would require 75% of the pieces in order to reconstruct, whereas what you’re proposing would trim that down to 25%. So that’s a giant difference, 75% to 25%. But to a casual observer, if you only need 25% – less than half – is that really enough to make assurances that this data is available and was made available?

When you go down to 25% it sounds like, I don’t know, you might be cutting some corners. So how do you assure people that in fact, just having 25% of data samples is actually enough and that things can work at that level? 

Lera: Yeah, that gets us to the topic of data availability sampling, and what it achieves, I guess, because this reconstruction threshold – 75% or 25% – basically determines how many samples you need to get high assurance that the data is there.

The way you do the sample is that you ask the network of validators to give you back one element, a random element of this encoded block. The validators also give you back a proof of the validity of this piece, which you can verify against the commitments that the chain persistently stores. When a sample comes back successfully, that assures you the data is available with probability one minus either one quarter or three quarters, depending on how your reconstruction algorithm works – whether it requires 25% of the data or 75% of the data.

So every time you do a random sample and it comes back successfully, basically your false positive rate – the probability that you think the data is available when it’s not – drops exponentially. And the rate at which it drops depends on how much data the reconstruction requires. So if your reconstruction requires only 25% of the data, you do fewer samples, and the false positive rate goes down quicker than if you have a reconstruction algorithm that requires 75% of the data.

So depending on how efficient your reconstruction is, you might need fewer samples in order to have the same assurance that the data is available. So that’s why you not only improve the reconstruction here, but you also improve the number of samples you need to do for your data availability sampling.

And data availability sampling is interesting because it’s probabilistic. So the more samples you do, the higher your assurance is that the data is available, right? And you can always amplify this probability by doing more samples to make it clear. 

Dan: I think Lera, what you just explained is really, really important.

That’s the heart of danksharding and data availability sampling. So I’ll say it one more time, just so that the audience will hear it twice, cause that’s really the heart of it. So maybe think of the block. We said the block is gonna be encoded as this rectangle, right? So somehow we go from a block to a rectangle.

The specifics of how that is done is using this erasure coding. But let’s pretend we went from a block of data to a rectangle. So now imagine this rectangle is literally a rectangle of dots. Every dot corresponds to one piece of the data that’s gonna be distributed to one validator. So now to do data availability sampling, somebody wants to verify that enough of the dots are actually available.

We know that if more than 75% of the dots are available, then the block can be reconstructed using the erasure coding method. Or maybe, if what we’re saying would be used, then only 25% of the dots are sufficient to reconstruct the original rectangle. But how do you know that 75% of the dots are available?

So that’s exactly the data availability sampling mechanism. What you do is you can kind of imagine you are throwing darts at this rectangle, right? So every time you throw a dart, you hit a random point in the rectangle and then the validator that holds that point has to prove, “Yes, I really have that point.”

Now you wanna verify that 75% of the dots are available. So imagine you have this rectangle. Maybe only 75% of the dots are there. Some of the dots disappeared for some reason. You wanna verify that 75% of the dots are available, cause if 75% are available, you can reconstruct the entire rectangle. And so what are you gonna do to verify that 75% are there?

You’re gonna throw a bunch of darts at the rectangle. And for every time the dart hits a dot, the validator that it hits has to prove that the dot really is there. And so if you throw a hundred darts and all hundred darts come back saying, “Yes, the data really is there”, that gives you a pretty good idea that more than 75% of the data is available.

Because you know, if less than 75% is available and you throw four darts, you expect one dart to hit a missing dot. If you throw a hundred darts, and all of them come back saying the data’s available, you have pretty good assurance that more than 75% of the dots are there. And so that’s the idea of data availability sampling.

You just try lots and lots and lots of random points, like a hundred of them. If all of them are there, then you have pretty good assurance that more than 75% are available and you can reconstruct the data. And you see, if reconstruction needs 75% of the dots, you need to throw something like a hundred darts. If reconstruction needs only 25% of the dots, you can throw far fewer darts than that.

So that would basically reduce the number of samples needed to satisfy the sampling mechanism. Now, maybe it’s worthwhile saying that once the data availability sampling check succeeds – so all the hundred darts come back saying, “Yes, the dots really are there” – then the validator says, “Ah, the data’s available”.

And then [the validator] goes ahead and signs the block as saying, “This block passed data availability sampling”. Yeah, and that’s used later on in consensus. So that’s what the test is. Basically, data availability sampling is a very efficient way to test that enough data is available to reconstruct the block without actually reconstructing the block completely.

So I think it’s good to hear it twice and maybe even a third and a fourth time. But that’s kind of the core idea that makes this all work. 
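
The back-of-the-envelope math behind the “hundred darts” intuition, as a sketch (the one-in-a-trillion false-positive target here is just an illustrative security level, not a protocol constant):

```python
import math

def samples_needed(reconstruction_fraction: float, target_false_positive: float = 1e-12) -> int:
    # If the block is NOT reconstructible, strictly less than `reconstruction_fraction`
    # of the points are available, so each uniformly random sample succeeds with
    # probability below that fraction; k successful samples therefore bound the
    # false-positive probability by reconstruction_fraction ** k.
    return math.ceil(math.log(target_false_positive) / math.log(reconstruction_fraction))

print(samples_needed(0.75))   # 97 "darts" when reconstruction needs 75% of the dots
print(samples_needed(0.25))   # 20 darts when 25% of the dots would suffice
```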

Robert: I love that, and I especially love the physical analogies that you’re using with a dart board and throwing darts. I think that really brings it home for people. So we’re nearing the top of the hour here.

I wanna get ready to close out. But before we do that, just maybe I’ll throw it to you both in one sentence. What is the upshot of all of this? Like why does it matter to get these efficiency gains? 

Lera: Well, I would say the ultimate goal, of course, is to scale the blockchain and these new techniques would allow it to do that, for Ethereum to achieve full scaling.

That’s a really interesting approach, I would say, because in the beginning, Ethereum was thinking about doing sharding, full sharding, and arriving at quite complicated designs. But having rollups helps scale the execution layer of Ethereum, and this leaves it up to Ethereum to scale its data availability layer. Basically, Ethereum increases the data space while rollups increase the execution capacity, and those pieces together will give us cheaper and faster blockchains.

Dan: Yeah, Lera said it perfectly. I mean, really the scaling story for Ethereum is rollups and the way to get rollups to be more efficient and cheaper to use is by solving the data problem. And danksharding is a very elegant and efficient way to do that. So the upshot is a scalable version of Ethereum where rollups are much cheaper to use than they are today.

Robert: That’s great. And if you get cheaper transactions and the ability to make more transactions, I think that opens up Ethereum to do all sorts of new applications that weren’t possible before with high gas costs and fees. So this has been great. Thank you all for joining us. I hope that you all learned a little bit about data availability sampling and danksharding.

If you’d like to learn more, as mentioned, you can check out Dan and Lera’s post. It’s a really, really great piece, so I highly recommend reading it and checking out all the resources contained in there.

Thank you all for joining us. I’m looking forward to this weekend. I’m gonna go to my local bar and teach everybody about erasure coding at the dartboard. So thank you all and take care.

Lera: Sounds great. Thank you.

Dan: Thank you. This has been fun. Bye bye.

Thank you for listening to web3 with a16z. You can find show notes with links to resources, books, or papers discussed; transcripts, and more at a16zcrypto.com. This episode was technically edited by our audio editor Justin Golden. Credit also to Moonshot Design for the art. And all thanks to support from a16z crypto.

To follow more of our work and get updates, resources from us, and from others, be sure to subscribe to our web3 weekly newsletter — you can find it on our website at a16zcrypto.com.

 Thank you for listening, and for subscribing. Let’s go!

How to build in web3: New talks from Crypto Startup School ’23

https://a16zcrypto.com/content/videos/how-to-build-in-web3-new-talks-from-crypto-startup-school-23/

Web3 technologies have made rapid progress in recent years, with exciting research, maturing infrastructure, and new protocols now in the wild. But all of the industry-specific know-how, emerging tools, and rapidly evolving trends can be difficult for founders to navigate. Many are excited by the promise of web3, but aren’t sure how to get started or what to explore next – whether they’re new to crypto or more seasoned web3 founders.

That’s why we launched Crypto Startup School in 2020 – and recently relaunched it this year – with the goal of helping builders get started on new crypto projects. From go-to-market, community, and product-market fit to technical deep dives and the latest research, we’ve covered a wide range of topics relevant to all kinds of builders in the space.  

And now we’re publicly sharing the curriculum – 30+ new talks – with all of you. To start, here are five videos covering protocol design, go-to-market strategies for pricing, the basics of cryptography for blockchains, and more. We’ll share new talks in near-weekly “drops”, so be sure to subscribe to our channel and newsletter for the latest updates, resources, posts, and more. 

We hope that sharing these will continue to accelerate learning (we got so much wonderful feedback from so many of you on the 2020 talks, which you can watch here) and spark ideas as we all continue to grow this important ecosystem. 

Watch the first five talks

  1. Web3 and how we got here, with Chris Dixon (Founder, Managing Partner, a16z crypto)
  2. Protocol design: why and how, with Eddy Lazzarin (CTO, a16z crypto)
  3. Cryptography for blockchains: Avoiding common mistakes, with Dan Boneh (Professor in Applied Cryptography and Computer Security, Stanford University, and a16z crypto Senior Advisor)
  4. Web3 pricing and business models, with Maggie Hsu (Partner and Go-to-market Lead, a16z crypto) and Jason Rosenthal (Operating Partner, a16z crypto)
  5. A conversation with Mary-Catherine Lader (COO, Uniswap Labs), moderated by Sonal Chokshi (Editor-in-Chief, a16z crypto)

Web3 and how we got here

Founder and Managing Partner Chris Dixon shares a sweeping origin story of the internet as we know it, from first protocols to monolithic social networks, and the potential of a truly open and decentralized web. So how did we get here? What were the compromises we made along the way? And how do we lay a new path toward a better internet?

Protocol design: Why and how

The internet is, and has been since its inception, a network of protocols – formal systems for interaction that facilitate complex group behaviors. Protocols are the foundation of web3, but we are still early in our journey. With tools and technologies (and the words and taxonomies we use to describe them) still emerging, how can builders create decentralized protocols that last? In this talk, a16z crypto CTO Eddy Lazzarin shares mental models for reasoning about protocol design, along with guidelines for builders for designing economically sustainable protocols that resist centralization.

Cryptography for blockchains: Avoiding common mistakes

Cryptography underpins everything we do, in crypto and beyond – but not everyone has taken a cryptography course. So in this talk, Dan Boneh, a leading researcher in applied cryptography and a16z senior advisor, starts from the beginning, and focuses on key concepts web3 builders need to understand. Boneh defines cryptographic primitives, shares how to apply them correctly to blockchains, and, along the way, traces the evolution of blockchain cryptography, as exciting use cases emerge and new and improved techniques rise to the occasion.

Workshop: Web3 pricing and business models

a16z crypto’s Jason Rosenthal and Maggie Hsu share decades of experience coaching web2 and web3 companies of all sizes. Their Q&A-driven workshop explores practical frameworks, examples, and heuristics for web3 startups thinking through common questions about pricing; and shares popular pricing models, common pitfalls to avoid, and case studies relevant to any builder wondering how to approach their business model.

Leadership talk with Uniswap COO Mary-Catherine Lader

Uniswap Labs COO Mary-Catherine (MC) Lader joins us for an overview of everything she’s learned while leading Ethereum’s largest decentralized crypto exchange. Moderated by a16z crypto Editor-in-Chief Sonal Chokshi, the conversation tours Lader’s top-of-mind advice, from web3 vs web2 go-to-market strategies, to the unique nuances, challenges, and opportunities of building critical infrastructure in public.

Crypto news & regulatory update: April 7, 2023 – April 28, 2023

https://a16zcrypto.com/content/article/crypto-news-regulatory-update-april-7-2023-april-28-2023/

Editor’s note: The a16z crypto Regulatory Update is a series that highlights the latest regulatory and policy happenings relevant to builders in web3 and crypto, as tracked and curated by the a16z crypto regulatory team. The roundups are based on recent news, the latest updates, new guidance, ongoing legislation, and frameworks released by regulatory agencies/bodies, industry consortia and professional associations, banks, governments, and other entities as they impact the crypto industry (or applications) around the world. We also occasionally include select other resources such as talks, posts, or other commentary – from us or from others – with the updates.

 

🧠 tl;dr

  1. The SEC charged Bittrex and its co-founder and former CEO William Shihara with operating an unregistered national securities exchange, broker, and clearing agency, while also alleging that the following tokens are securities: OMG, DASH, ALGO, TKN, NGC, and IHT.
  2. SEC Chair Gary Gensler testified before the U.S. House of Representatives Committee on Financial Services.
  3. The European Parliament passed the Markets in Crypto Assets (MiCA) regulation, making the European Union the first major market to regulate crypto asset transfers.

🏜️ Arizona

  • Arizona Governor Katie Hobbs vetoed legislation that would have prevented municipalities from taxing residential cryptocurrency mining operations.

🏞️ Arkansas

  • The Arkansas House of Representatives and Senate approved a bill to regulate bitcoin mining in Arkansas. The legislation will go into effect if the governor signs it.

🐻 California

  • The California Department of Financial Protection and Innovation issued desist and refrain orders against five entities that solicited funds from investors by falsely claiming to offer high yield investment programs that use artificial intelligence to trade crypto assets.

🌽 Commodity Futures Trading Commission

  • The CFTC filed a civil enforcement action against a New York resident, alleging that he fraudulently solicited retail investments in a digital asset trading fund and misappropriated at least $1 million in assets. The Department of Justice filed a parallel criminal action.
  • The CFTC charged 14 entities, including crypto companies, for fraudulently claiming to be CFTC-registered futures commission merchants and retail foreign exchange dealers.

🦅 Congress

  • SEC Chair Gary Gensler testified before the U.S. House of Representatives Committee on Financial Services. He faced questions about ether’s status as a security or commodity, the SEC’s fast-paced rulemaking agenda, and many other issues.
  • The U.S. House of Representatives Committee on Financial Services published a draft stablecoin bill.
  • Republican House Representatives sent a letter to SEC Chair Gary Gensler, calling his push to have crypto firms “come in and register” a “willful misrepresentation of the SEC’s non-existent registration process” and blaming the SEC for the lack of crypto registrants.
  • Senator Elizabeth Warren (D-Mass.) and House Representative Alexandria Ocasio-Cortez (D-N.Y.) sent letters to BlockFi, Circle, and twelve non-crypto firms, asking about their companies’ decisions to “bank with and maintain large, uninsured deposits at Silicon Valley Bank.”
  • Discussing Operation Choke Point 2.0, House Representative Byron Donalds (R-Fla.) said that “nothing is a coincidence” with government agencies, and that agencies are “finding various ways to squeeze an outcome that they want.”
  • House Representatives Patrick McHenry (R-N.C.) and Bill Huizenga (R-Mich.) wrote to SEC Chair Gary Gensler reiterating their request for documents relating to the SEC’s charges against SBF and noting the SEC’s failure to comply with previous requests.
  • House Representative Patrick McHenry (R-N.C.) said that Europe is “ahead of the game” in terms of web3 regulation in light of the passage of the Markets in Crypto Assets Regulation, and the United States is “behind.”
  • House Representative Warren Davidson (R-Ohio) released plans to introduce legislation that would remove SEC Chair Gary Gensler and replace his role with an Executive Director that reports to a board.

⚖️ Department of Justice

  • An individual involved in unlawfully obtaining approximately 50,000 bitcoins from the Silk Road dark web internet marketplace was sentenced to one year and one day in prison for committing wire fraud.
  • The DOJ charged two U.S. citizens and a South African national with conspiring to manipulate the market for HYDRO, a crypto asset issued by the Hydrogen Technology Corporation.
  • An individual who led a scheme to defraud banks and a cryptocurrency exchange of more than $4 million pled guilty to one count of conspiracy to commit wire fraud.

💵 Department of the Treasury

  • The Internal Revenue Service is sending four agents who specialize in investigating cybercrime to Australia, Singapore, Colombia, and Germany starting this summer as part of a pilot program “to help combat the use of cryptocurrency, decentralized finance and mixing services in international financial and tax crimes.”
  • The Office of Foreign Assets Control sanctioned three individuals for providing support to North Korea through illicit financing and malicious cyber activity. Two of the sanctioned individuals assisted North Korea’s Lazarus Group in converting stolen crypto into fiat currency. The Department of Justice charged one of the sanctioned individuals as well.

🏦 Federal Reserve

  • Federal Reserve Board Governor Michelle W. Bowman said that there could be “some promise for wholesale CBDCs” (central bank digital currencies), but that it would be “difficult to imagine a world” where the benefits of a direct access CBDC would outweigh the unintended consequences.
  • Federal Reserve Board Governor Christopher J. Waller said that there is “considerable promise” associated with tokenization and the use of smart contracts.

🏔️ Montana

  • Montana’s House of Representatives passed legislation that protects cryptocurrency mining operations from certain actions, such as prohibitions on at-home mining and discriminatory utility rates for miners. The legislation will go into effect if the governor signs it.

🗽 New York

  • The New York Department of Financial Services adopted a final regulation establishing how companies holding a BitLicense will be assessed for the costs of their supervision and examination.

📈 Securities and Exchange Commission

  • The SEC charged Bittrex and its co-founder and former CEO William Shihara with operating an unregistered national securities exchange, broker, and clearing agency, while also alleging that the following tokens are securities: OMG, DASH, ALGO, TKN, NGC, and IHT.
  • Chair Gary Gensler released an investor-education video relating to digital assets.
  • The SEC listed new job postings for its crypto enforcement unit in New York, San Francisco, and Washington D.C.

🌍 International

🇦🇷 Argentina

  • Argentina’s National Securities Commission authorized the launch of a bitcoin index-based futures contract.

🇧🇲 Bermuda

  • Coinbase received a regulatory license to operate in Bermuda.

🇸🇻 El Salvador

  • Bitfinex announced that it has received El Salvador’s first license for digital asset service providers.

🇪🇺 European Union

  • The European Parliament passed the Markets in Crypto Assets regulation, making the European Union the first major market to regulate crypto asset transfers.
  • European Central Bank Board member Fabio Panetta said that a digital euro will aim to “complement” cash and mimic its best features, but that it may not afford the same level of privacy.
  • European Banking Authority Chairman José Manuel Campa wrote that “the EBA will be paying special attention to diversification” of stablecoin reserves.

🇫🇷 France

  • A French Central Bank report acknowledged that regulations must take into account the specific features of decentralized finance (DeFi) and not “replicate the systems that currently govern traditional finance.”
  • The French Ministry of Economics and Finance published a public consultation regarding metaverse strategies, noting that its purpose is to “propose an alternative to the virtual online worlds today put forward by international giants.”

🇭🇰 Hong Kong

  • Hong Kong’s Financial Secretary Paul Chan said that now is the “right time” to push web3 adoption forward, despite volatility in crypto markets.
  • A Hong Kong court declared for the first time that cryptocurrencies are property and capable of being held in trust.

🇮🇳 India

  • Indian Finance Minister Nirmala Sitharaman said that effective regulation of crypto assets would require every nation’s consent and global consensus.

💶 International Monetary Fund

  • An IMF report said that the “collapse of multiple entities in the crypto asset ecosystem has again made the call more urgent for comprehensive and consistent regulation and adequate supervision.”

🇮🇪 Ireland

  • Kraken received regulatory approval from the Central Bank of Ireland to operate as a virtual asset service provider in the country.

🇮🇱 Israel

  • The Bank of Israel published a report outlining potential scenarios that would influence its decision on whether to issue a digital shekel. One such scenario involves a significant increase in stablecoin usage.

🇯🇵 Japan

  • Japan’s ruling Liberal Democratic Party’s web3 project team published a white paper that sets forth recommendations for boosting the country’s crypto industry.

🇲🇪 Montenegro

  • Ripple announced an agreement with the Central Bank of Montenegro to develop a pilot program for a CBDC.
  • Montenegrin prosecutors submitted an indictment proposal against Kwon Do-Hyung (“Do Kwon”), co-founder of Terraform Labs, the company behind cryptocurrencies TerraUSD and Luna, for forging personal documents.

🇳🇱 Netherlands

  • A Dutch court released, with electronic monitoring conditions, Alexey Pertsev, a Russian developer who worked on code for the Tornado Cash protocol, pending trial.

🇰🇷 South Korea

  • South Korean prosecutors indicted Terra co-founder Daniel Shin and nine others in connection with the collapse of the TerraUSD and Luna cryptocurrencies.

🇦🇪 United Arab Emirates

  • The UAE’s Securities and Commodities Authority announced that it will start accepting licensing applications from companies that wish to provide virtual asset services in the country.

🇬🇧 United Kingdom

  • Bank of England Governor Andrew Bailey called for stablecoins to be regulated like commercial bank money, and he said that the U.K. will likely need a CBDC to “anchor the value of all forms of money, including new digital ones.”
  • Bank of England Deputy Governor Jon Cunliffe said that the potential tokenization of money has “major implications” for the Bank of England, and that it should consider whether to put limits on stablecoins used for payments, among other things.
  • The Bank of England successfully tested the use of distributed ledger technology to run interbank transactions through the Bank of International Settlements.
  • Financial Conduct Authority (FCA) Executive Director Sarah Pritchard said that the FCA wants to work with crypto firms to shape regulation.

🇿🇲 Zambia

  • Zambia’s Science and Technology Minister Felix Mutati told Reuters that Zambia plans on finishing tests that simulate real-world cryptocurrency usage by June.

🇿🇼 Zimbabwe

  • Reserve Bank of Zimbabwe Governor Dr. John Mangudya told the Sunday Mail that Zimbabwe plans to introduce gold-backed digital tokens.

***

The views expressed here are those of the individual AH Capital Management, L.L.C. (“a16z”) personnel quoted and are not the views of a16z or its affiliates. Certain information contained in here has been obtained from third-party sources, including from portfolio companies of funds managed by a16z. While taken from sources believed to be reliable, a16z has not independently verified such information and makes no representations about the current or enduring accuracy of the information or its appropriateness for a given situation. In addition, this content may include third-party advertisements; a16z has not reviewed such advertisements and does not endorse any advertising content contained therein.

This content is provided for informational purposes only, and should not be relied upon as legal, business, investment, or tax advice. You should consult your own advisers as to those matters. References to any securities or digital assets are for illustrative purposes only, and do not constitute an investment recommendation or offer to provide investment advisory services. Furthermore, this content is not directed at nor intended for use by any investors or prospective investors, and may not under any circumstances be relied upon when making a decision to invest in any fund managed by a16z. (An offering to invest in an a16z fund will be made only by the private placement memorandum, subscription agreement, and other relevant documentation of any such fund and should be read in their entirety.) Any investments or portfolio companies mentioned, referred to, or described are not representative of all investments in vehicles managed by a16z, and there can be no assurance that the investments will be profitable or that other investments made in the future will have similar characteristics or results. A list of investments made by funds managed by Andreessen Horowitz (excluding investments for which the issuer has not provided permission for a16z to disclose publicly as well as unannounced investments in publicly traded digital assets) is available at https://a16z.com/investments/.

Charts and graphs provided within are for informational purposes solely and should not be relied upon when making any investment decision. Past performance is not indicative of future results. The content speaks only as of the date indicated. Any projections, estimates, forecasts, targets, prospects, and/or opinions expressed in these materials are subject to change without notice and may differ or be contrary to opinions expressed by others. Please see https://a16z.com/disclosures for additional important information.

6 questions every founder should ask about pricing strategy

https://a16zcrypto.com/content/article/pricing-strategy-questions-every-founder-should-ask/

Deciding how to price a new product or service is one of the key challenges that founders need to tackle early in the product development lifecycle. In a relatively new market, like crypto, it can be difficult to know even where to start. 

The strategy you choose will depend on your circumstances. Fully decentralized projects which have their own tokens – like many decentralized apps and web3 protocols – tend to have their own unique logic and design considerations. But for founders building other crypto-related businesses – like infrastructure providers, services-based startups, and such – some universal principles provide a template for thinking through product pricing.

Getting one’s approach to pricing right can have a dramatic impact on a company or project’s early customer wins, go-to-market strategy, and long term success. The key is ensuring that a product’s capabilities and features align with a philosophy that allows for optimal pricing. Conversely, once a product is in market, early customer expectations become set. After that, changing prices can be hard.

One of the coauthors learned this lesson the hard way while he was CEO of Ning, an early social company. In 2005, Ning began as an ad-supported platform that allowed anyone to create their own vertical social network. Several years into the company, it became clear, however, that a subscription-based business model was a better approach for Ning’s target market. While the transition ultimately succeeded, the path to get there was not easy.

Determining the right pricing for a new product in a new market doesn’t have to feel so overwhelming. Below, we pose six questions every founder should ask that can make the process more manageable… and dare we say satisfying?

1. What are all of the different ways I could price, and which approach makes the most sense for my market?

The first step is to identify all of the different possible approaches to pricing. By weighing each option, you can better find the approach – or combination of approaches – most suited to your circumstances. This exercise can unlock new insights or hybrid pricing approaches that better align with the needs of customers or the state of the market.

Here are some common pricing models along with brief descriptions of how they apply, plus a few traditional and crypto-specific examples for each. (It’s worth reiterating that these pricing frameworks largely apply to web3-adjacent projects that don’t have a token and aren’t fully decentralized.)

Now that you know the most common pricing methods, you can begin to apply this framework in practice.

Consider some examples from the crypto industry. While not every customer action is on-chain, you could use on-chain data to better forecast the differences between each pricing approach. For an insurance product, you could price based on the net asset value in a customer’s wallet. For a crypto security service, you could price based on an NFT collection’s floor price multiplied by the number of NFT holders.
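
As a rough sketch of how those two examples might translate into formulas – the function names and rates below are hypothetical, not taken from any real product:

```python
def insurance_premium(wallet_net_asset_value_usd: float, annual_rate: float = 0.01) -> float:
    # Price an insurance product as a percentage of the assets held in the customer's wallet.
    return wallet_net_asset_value_usd * annual_rate

def security_service_fee(floor_price_usd: float, num_holders: int, rate: float = 0.001) -> float:
    # Price a security service on collection value: floor price x number of holders x a small rate.
    return floor_price_usd * num_holders * rate

print(insurance_premium(250_000))           # $2,500 per year on a $250k wallet
print(security_service_fee(1_200, 5_000))   # $6,000 for a 5,000-holder collection with a $1,200 floor
```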

You may find compelling ways to mix and match pricing methods. While examples above might be listed in a single category, they often cover multiple pricing model types. Alchemy uses both usage- and feature-based approaches, for example, and The Block uses both usage- and subscription-based ones. Choose the combination that works best for your purposes.

2. Are there opportunities for me to price differentiate within my product line?

Most technology products appeal to a range of customers. The gamut spans from casual first-timers with basic needs to self-sufficient pros seeking more advanced features to enterprise customers who often want the highest level of service and capability that a company can offer. This range of needs and desires creates an opportunity to tailor product offerings to different customer segments through a Good, Better, Best framework.

Here are some examples of the framework in action:

Product | Good (least expensive) | Better | Best (most expensive)
GM car brands | Chevrolet | Buick | Cadillac
Amex credit cards | Green | Gold | Platinum/Black
Lyft ride types | Wait & save | Preferred | Lux
Apple phones | iPhone SE | iPhone [latest] | iPhone [latest] Pro
Netflix subscriptions | Standard with ads | Basic/Standard | Premium
Alchemy plans | Free | Growth | Enterprise
Bored Ape Yacht Club NFTs | Bored Ape Kennel Club | Mutant Ape Yacht Club | Bored Ape Yacht Club

The first offering – Good – provides a solid set of capabilities at an attractive price point. This typically least expensive option gets the job done for a large swath of the customer base. Better builds on Good and usually adds a few additional core capabilities that cater to customers who see the offering as a key tool in their daily workflow. Best appeals to customers who require a specialized feature-set to accomplish their goals or who simply want to know that they are purchasing the most sophisticated offering available. 

Within this framework, founders will often want to experiment and iterate on the product capabilities that go into each bucket. 

Pro tip: When selling to enterprises, ensuring that the Best offering includes an appropriate level of support, security, and onboarding and deployment assistance is key. Without this level of product robustness and focus on customer success, implementations will often fail to gain traction. In these cases, you may lose a customer or reference, which can make it significantly harder to sign the next major customer.

Think about the customer you’re trying to target. Who is this person? What does the person need or desire? One helpful exercise is to map the various stakeholders within a prospective customer’s organization to understand all the places where value accrues. For example, a marketing automation tool might directly benefit the marketing team, but it might also free up analytic resources on adjacent teams, like a data science group, that can now be used on other projects. Focus not just on the benefits for a single stakeholder, but also on differentiated selling points for all types of stakeholders.

3. Am I charging the right price?

While founders often worry about overpricing their product, a more common mistake is to underprice.

For a new technology product to gain market traction, it has become conventional wisdom that the new product needs to improve on what preceded it by 10X. In today’s tech market – which has increasingly sophisticated customers, a plethora of competing products and services, and entrenched providers who enjoy large moats and lock-in – a 10X improvement is likely the low bar of what new entrants need to offer to break through. An improvement of this magnitude is basically table stakes.

When a team builds a new product that really does deliver at this level and beyond, the company has the opportunity to capture meaningful economic value in exchange for the productivity improvement, growth acceleration, or cost savings they are delivering to their customers. Discovering where the efficient frontier of this value exchange lies is a key factor in successfully defining pricing for a new product. 

One effective and simple way to conduct this exploration is to separate prospective customers into two cohorts – Cohort A and Cohort B – and to share dramatically different price points with each of them. Imagine that for Cohort A, a startup offers their Good offering for X dollars per month, Better for Y dollars per month, and Best for Z dollars per month. Simultaneously, the startup pitches Cohort B the same offering at 5X for Good, 5Y for Better, and 5Z for Best. In this example, Cohort B’s pricing is literally five times that of Cohort A. What the startup should observe, in this case, is the variation in win rate between the two cohorts. If four out of five prospects in Cohort B still sign up for the service, this suggests that the pricing shown to Cohort A undervalues the offering.
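
A minimal sketch of that comparison, using hypothetical prices and win rates:

```python
# Hypothetical results from the two-cohort price test described above.
cohort_a = {"price": 100, "prospects": 20, "wins": 15}   # 75% win rate at the base price
cohort_b = {"price": 500, "prospects": 20, "wins": 16}   # 80% win rate at five times the price

for name, cohort in (("A", cohort_a), ("B", cohort_b)):
    win_rate = cohort["wins"] / cohort["prospects"]
    expected_revenue = cohort["price"] * win_rate        # expected revenue per prospect
    print(name, f"{win_rate:.0%}", expected_revenue)

# If the win rate barely drops (or doesn't drop) at five times the price,
# Cohort A's pricing is probably leaving money on the table.
```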

You might be surprised by how much people are willing to dish out on your product or service!

Another consideration is the denomination of your pricing. In other words, you need to decide what combination of fiat, stablecoins, and/or tokens you accept. While it may be more customer-friendly to accept a wide variety of payment methods, this may also increase your exposure to market fluctuations. Consider that, if your expenses are primarily in fiat, it might make sense only to accept fiat and stablecoins. You must be able to meet your expenses.

4. How do other products or services in adjacent markets price?

When thinking through how to price a new product or service, a good place to start is to look at the approach taken by peers in adjacent markets. Surveying comparable products can provide a helpful starting point for your own pricing strategy. 

For founders building in web3-related categories such as infrastructure, gaming, developer tooling, and so on, look at how analogous web2 products price. A number of web3 startups are innovating in the security and scam prevention space, for example. By looking at the pricing pages of the many companies that offer similar services for web2 consumers and enterprises – like Aura, Bitwarden and CrowdStrike – founders can quickly identify a number of potential pricing models that are already well understood in the market.

5. Do my unit economics work?

Losing money on every customer but making it up in volume is a strategy that doesn't work for very long, particularly in an economic environment where there has been a broad pullback in startup funding, like the one in which we find ourselves currently. This is a recipe for bankruptcy.

Founders should understand the unit economics of their company from early on in the startup's life. They should also develop a strong point of view about how these unit economics will change, evolve, and improve over time. With a little preparation, founders can significantly increase their chances for success.

In the early dotcom era, founders pursued a number of ambitious ideas, such as grocery delivery and one-hour delivery of convenience items. Many of these startups went bust – but then, years later, businesses with strikingly similar models, like Instacart and DoorDash, succeeded. A combination of at least four factors made the unit economics of these newer businesses work where prior attempts had failed. These differences included:

  • A 10x+ increase in the number of internet users
  • The advent of the ubiquitous, always-connected smartphone
  • The development of sophisticated routing and logistics software
  • The willingness of suppliers, like grocery stores and restaurants, to integrate with this software (based on a realization that doing so could drive incremental business)

Having the foresight, tenacity, and ability to change the trajectory of one's unit economics can lead to a long-term, sustained advantage that others will struggle to emulate. The superior margins that Apple enjoys in its iPhone business are, for example, a function of at least three factors: 1) enormous volume, which enables a superior cost structure for the business, 2) a supply chain that the company has built, refined, and heavily negotiated over more than a decade, and 3) a premium brand. While gross margins don't determine the overall fate of a startup, they can have a long-lasting impact on the degree to which companies can invest in R&D, marketing, and other critical functions that determine long-term growth rate and success.

6. If I imagined that my market was going to grow 10X-100X and I captured 100% market share, would I end up with an interesting business?

As a founder/CEO, it's normal to spend countless hours and sleepless nights obsessing over the strategy and details of your company in an effort to find product-market fit, overcome numerous obstacles, and achieve long-term success. In the back (and frequently the front) of every founder's brain is the ever-present fear of failure. 

However, in our experience, there are outcomes much more painful than failure. What's even worse is to spend years building a product and company, make just about every decision within one's control correctly, and only then discover that the market opportunity isn't large enough to justify the incredible investment of time, talent, and capital that the founder and team poured into the pursuit. Beware the Pyrrhic victory. 

A simple way to test whether this failure mode is likely (and hopefully to avoid it!) is to do a simple top-down spreadsheet exercise early in the history of the company. Imagine for a moment that the market you are pursuing grows 10X to 100X and your company captures 100% share of this future market.

Next, make an assumption about the average revenue per customer and multiply: future market size * average revenue per customer * market share (in this case 100%). How does the math look? Does the imaginable market opportunity look large enough that one can build an interesting company? If so, excellent! If the market doesn't look large enough to be interesting, perhaps there are changes to be made in the market-sizing or average-revenue assumptions. As Disney CEO Bob Iger so aptly put it, the best business advice he ever received was to avoid getting into the business of manufacturing "trombone oil" – a limited market, to be sure. While it might be possible to become the best trombone oil manufacturer in the world, the world only consumes a few quarts of it per year.

Very few founders take the time to do this math – but this simple exercise can prevent heartache, pain, and missed opportunity down the road.

***

Jason Rosenthal is an Operating Partner at a16z crypto, where he works closely with the CEOs and founders in our portfolio to provide leadership guidance, help them plan for and react to the aspects of running companies in rapidly-changing markets, and generally help them become the best version of their professional selves. Jason has spent more than 25 years as an internet entrepreneur and executive, including more than 10 years as a startup CEO, and his career has been dedicated to the development of transformational new platforms. 

Maggie Hsu is a partner at Andreessen Horowitz leading go-to-market for the crypto portfolio. She previously led go-to-market for Amazon Managed Blockchain at Amazon Web Services, and before that led business development for AirSwap, a decentralized exchange. Maggie has held executive positions at Zappos.com and Hilton Worldwide. She was also a consultant at McKinsey and Company. Maggie is also a cofounder of Gold House, a nonprofit collective of pioneering Asian leaders. 

***

The views expressed here are those of the individual AH Capital Management, L.L.C. (“a16z”) personnel quoted and are not necessarily the views of a16z or its affiliates. a16z is an investment adviser registered with the U.S. Securities and Exchange Commission. Registration as an investment adviser does not imply any special skill or training. Certain information contained in here has been obtained from third-party sources, including from portfolio companies of funds managed by a16z. While taken from sources believed to be reliable, a16z has not independently verified such information and makes no representations about the enduring accuracy of the information or its appropriateness for a given situation. In addition, this content may include third-party information; a16z has not reviewed such material and does not endorse any advertising content contained therein.

This content is provided for informational purposes only, and should not be relied upon as legal, business, investment, or tax advice. You should consult your own advisers as to those matters. References to any securities, digital assets, investment strategies or techniques are for illustrative purposes only, and do not constitute an investment recommendation or offer to provide investment advisory services. Furthermore, this content is not directed at nor intended for use by any investors or prospective investors, and may not under any circumstances be relied upon when making a decision to invest in any fund managed by a16z. (An offering to invest in an a16z fund will be made only by the private placement memorandum, subscription agreement, and other relevant documentation of any such fund and should be read in their entirety.) Any investments or portfolio companies mentioned, referred to, or described are not representative of all investments in vehicles managed by a16z, and there can be no assurance that the investments or investment strategies will be profitable or that other investments made in the future will have similar characteristics or results. A list of investments made by funds managed by Andreessen Horowitz (excluding investments for which the issuer has not provided permission for a16z to disclose publicly as well as unannounced investments in publicly traded digital assets) is available at https://a16z.com/investments/.

Additionally, this material is provided for informational purposes solely and should not be relied upon when making any investment decision. Past performance is not indicative of future results. Investing in pooled investment vehicles and/or digital assets includes many risks not fully discussed herein, including but not limited to, significant volatility, liquidity, technological, and regulatory risks. The content speaks only as of the date indicated. Any projections, estimates, forecasts, targets, prospects, and/or opinions expressed in these materials are subject to change without notice and may differ or be contrary to opinions expressed by others. Please see https://a16z.com/disclosures/ for additional important information.

Data availability sampling and danksharding: An overview and a proposal for improvements

https://a16zcrypto.com/content/article/an-overview-of-danksharding-and-a-proposal-for-improvement-of-das/

Danksharding is an approach to scaling the amount of data on chain for a future version of Ethereum. The goal of this upgrade is to ensure that the data on chain was made available to archiving parties when it was first posted. It does so using a technique called Data Availability Sampling, or simply DAS.

In this post we survey how data availability in danksharding works and propose some modifications to the underlying technique. In particular, we explore a small modification that may improve data recovery: the current proposal requires 75% of the shares to recover a block, whereas the modification may reduce this bound to 25%.

Protodanksharding

Danksharding is scheduled to follow protodanksharding (EIP-4844). Protodanksharding will allow clients to write more data to the blockchain by introducing a new transaction type called a "blob-carrying transaction." Initially, this new transaction type will carry a list of up to four data blobs, where each blob is at most 128 KB (4096 field elements of 32 bytes each), adding up to 512 KB of additional data per block (targeting 256 KB on average), compared to Ethereum's current block size of around 100 KB on average.

These data blobs will be handled differently: (i) they will only be stored for a limited time, for example, 30-60 days, and (ii) the data will not be directly accessible to smart contracts despite being part of the transaction data. Instead, only a short commitment to the blob data, called DATAHASH, will be available to smart contracts. The additional burden on the validators seems tolerable: validators currently store less than 100 GB to maintain the blockchain’s state. After protodanksharding they will have to store an additional 50-100 GB.

Danksharding will follow. It will increase the data available to clients by 60x over protodanksharding by increasing the maximum number of blobs per block. Blocks will grow from 0.5 MB per block to 30 MB. But because validators cannot be made to store 60x more data, the data will be dispersed among them such that each validator will store only a small subset of data. Yet they can come to agreement on whether they collectively store all of the data through a data-availability sampling (DAS) protocol.

The blob-data will be priced via a mechanism similar to EIP-1559 and will target about 1 data-gas per byte. Calldata, which is currently the cheapest alternative, is priced at 16 gas per byte. But since there will be two different fee markets, these costs are not directly comparable. Roll-ups will benefit from these upgrades because, currently, more than 90% of the fees their clients pay go toward Ethereum data fees.

Other projects, such as Celestia and EigenLayer, are also adopting DAS techniques to increase the available data space. These designs are much simpler than fully sharding the Ethereum network.

The goal of data availability sampling

We describe the scheme assuming a proposer-builder separation (PBS) design: 

  • Clients submit their blob-carrying transactions to block builders.
  • A block builder forms a block B by selecting N client data-blobs. Data-blob number i comes with a short commitment Ci signed by the client that sent it. Let C = (C1, . . . , CN) be the list of all N signed commitments in the block B.
  • Block builders submit their proposed blocks to the current block proposer (one of the validators). The block proposer selects one of the blocks and posts it to the network as is.

The challenge is to ensure that the block B can be reconstructed at a later time. To do so, the builder replicates the block across a large network of V validators. One could ask each and every validator to store the entire block, but that is considered too expensive. Instead, the block builder

  1. encodes the block B into a larger block E using erasure coding;
  2. breaks the block E into V overlapping fragments P1, . . . , PV;
  3. sends to validator number i the pair (Pi, C).

Every validator checks that the fragment Pi that it received is consistent with the list C of signed commitments. The block builder provides proofs for the validators to facilitate these checks.

With this setup, a Data Availability Sampling scheme has two protocols: 

  1. A sampling protocol runs between a sampling verifier and the validator set. The sampling verifier takes the list C as input and requests randomly selected elements of the block E from the validator set. The sampling verifier outputs success if it receives all the requested elements and all of them are consistent with C.
  2. A reconstruction protocol runs between a reconstruction agent and the validator set. The reconstruction agent takes C as input and requests elements of block E from the validator set. Once it collects more than 75% of the elements, and all of them are valid, the reconstruction agent computes and outputs the block B.

(We discuss an approach below that may reduce the number of elements needed for reconstruction.)

The requirement is that if the sampling verifier outputs success, then the reconstruction agent will output the block B, provided it receives over three quarters of the elements as input. Reconstruction should succeed as long as enough elements are provided, even if the provided elements are selected adversarially.

To recap, the following parties participate in danksharding: 

  • Client: sends data blobs (which are either transactions or bundles) to a builder.
  • Builder: makes a block and sends fragments of this block to validators.
  • Block proposer (one of the validators): posts the block to the network.
  • Sampling verifier (any of the validators): runs the sampling protocol and signs the block header if the protocol outputs success. 
  • Reconstruction agent: reconstructs a previously posted block when needed by interacting with the entire validator set. Reconstruction succeeds if the validators respond with more than three-quarters of valid elements.

Erasure coding and polynomial commitments 

There are two building blocks to the scheme that we explain next: erasure coding and polynomial commitments.

Building block #1: Erasure-coding

Erasure coding dates back to the 1960s, motivated by the need to transmit information over a lossy channel. It is used in danksharding to protect against validators that lose their data fragments. The technique expands the data from N elements to M elements (M > N) such that the original data can be reconstructed from any N intact elements out of the M elements of the expanded data. Imagine taking N elements (the original data), encoding them into M = 2N elements, and giving one encoded element to each of 2N validators. If a majority of the validators are honest, they can always jointly reconstruct the original data. This technique protects against crash faults of any half of the validators. It can be extended to protect against byzantine behavior of half of the validators using the polynomial commitments discussed in the next section.

Here is how expansion works in more detail. To expand the data from N field elements d1, d2, . . . , dN ∈ 𝔽p to M encoded elements e1, e2, . . . , eM ∈ 𝔽p, we interpolate the unique polynomial p(x) of degree at most N – 1 that satisfies p(i) = di for i = 1, . . . , N. The polynomial is then evaluated at M points: ei = (i, p(i)) for i = 1, . . . , M. The original polynomial, p(x), can be reconstructed from any N of the M encoded elements using Lagrange interpolation. This expansion method is called Reed-Solomon encoding or low-degree polynomial expansion.
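
To make this concrete, here is a minimal Python sketch of Reed-Solomon expansion and recovery by Lagrange interpolation over a prime field. The prime, data size, and evaluation points (1 through M) are toy choices for readability; danksharding works over a 256-bit field and uses roots of unity with FFT-based interpolation for efficiency.

    P = 2**31 - 1  # toy prime modulus

    def lagrange_eval(points, x):
        """Evaluate at x the unique polynomial of degree < len(points) passing
        through the given (xi, yi) pairs, with all arithmetic mod P."""
        total = 0
        for xi, yi in points:
            num, den = 1, 1
            for xj, _ in points:
                if xj != xi:
                    num = num * (x - xj) % P
                    den = den * (xi - xj) % P
            total = (total + yi * num * pow(den, -1, P)) % P
        return total

    def rs_encode(data, m):
        """Expand n data elements d_1..d_n, viewed as p(1)..p(n) for a polynomial p
        of degree < n, into m >= n encoded elements p(1)..p(m)."""
        points = list(enumerate(data, start=1))
        return [lagrange_eval(points, x) for x in range(1, m + 1)]

    data = [7, 11, 13, 17]                            # N = 4 original elements
    encoded = rs_encode(data, 8)                      # M = 2N encoded elements
    # Any 4 of the 8 encoded elements recover the original data:
    subset = [(5, encoded[4]), (2, encoded[1]), (8, encoded[7]), (3, encoded[2])]
    assert [lagrange_eval(subset, x) for x in range(1, 5)] == data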

Building block #2: Polynomial commitments

A polynomial commitment is a building block of a DAS scheme. It allows the scheme to do two things:

  1. commit to a univariate polynomial p in 𝔽p[X] of bounded degree D using a short commitment string, C.
  2. open the committed polynomial at a given point x ∈ 𝔽p.

More precisely, given x,y ∈ 𝔽p, the committer can create a short proof π that the committed polynomial p satisfies p(x) = y and that its degree is at most D. The proof π is verified with respect to the commitment string C and the values x, y. This π is called an evaluation proof.

Security guarantees that the commitment is binding to the polynomial and that an adversary cannot create a false proof.

A number of practical polynomial commitment schemes can be used here. Danksharding uses the KZG commitment scheme that requires a trusted setup ceremony to generate public parameters (called an SRS) but has constant-size commitments and constant-size evaluation proofs. KZG commitments have a number of properties that make them especially well suited for danksharding:

  • the commitment is homomorphic: if C1 is a commitment to p1 and C2 is a commitment to p2, then C1 + C2 is a commitment to p1 + p2;
  • the commitment is unique: if two people independently compute a commitment to a polynomial p, they obtain the same commitment;
  • evaluation proofs are homomorphic: for a given x, if π1 is a proof that p1(x) = y1 and π2 is a proof that p2(x) = y2, then π1 + π2 is a proof that (p1 + p2)(x) = y1 + y2.

We can now explain how a client commits to its data-blob. First, the client interprets the data-blob as a vector of m field elements d1, . . . , dm ∈ 𝔽p, for m ≤ 4096. Next, it interpolates a univariate polynomial p ∈ 𝔽p[X] of degree at most m – 1 such that p(i) = di for i = 1, . . . , m. (Technically, danksharding uses 1, ω, ω^2, . . . , ω^(m–1) ∈ 𝔽p as the evaluation points, where ω ∈ 𝔽p is an mth root of unity, and a reverse-bit order; this is done for efficiency, but we will not consider it here for simplicity.) Finally, the client constructs the KZG polynomial commitment to the polynomial p. It signs the commitment and sends the commitment-signature pair to the builder. This process requires public parameters (an SRS) containing 4096 group elements. 
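
Under the same simplification (evaluation points 1 through m rather than roots of unity), the client-side interpolation step might look like the sketch below, reusing lagrange_eval from the earlier sketch. The commitment step itself needs an SRS and a pairing-friendly curve library, so it appears only as a hypothetical, commented-out call.

    blob = [3, 1, 4, 1, 5, 9, 2, 6]              # toy blob of m = 8 field elements
    points = list(enumerate(blob, start=1))      # defines p of degree <= m-1 with p(i) = d_i
    assert lagrange_eval(points, 2) == blob[1]   # p indeed interpolates the blob
    # commitment = kzg_commit(points, srs)       # hypothetical: real KZG commitment to p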

Data Dispersal

Next, we explain how the block builder encodes a block and splits it into fragments to send to the validators. Fix some 256-bit prime p. The block builder does the following:

input: A block B in danksharding can contain up to 256 data blobs (64 times more blobs than in protodanksharding), where each data blob is a vector of 4096 elements in 𝔽p. Thus, we can represent a block as a 256 × 4096 matrix of elements in 𝔽p. Each row in this matrix corresponds to one data-blob from a client. Recall that each client sends to the builder one row of B along with a signed KZG polynomial commitment for that row. The builder collects 256 signed polynomial commitments C0, . . . , C255, one commitment per row.

step 1: The builder interpolates a bivariate polynomial d(X,Y) such that d(i,j) = B[i,j] for i = 0, . . . , 255 and j = 0, . . . , 4095. This bivariate polynomial has degree at most 255 in X and degree at most 4095 in Y.

step 2: The builder uses the erasure-coding method described above to expand the block by a factor of two in each direction. That is, it forms a 512 × 8192 matrix E of field elements by setting E[i,j] ← d(i,j) for i = 0, . . . , 511 and j = 0, . . . , 8191.
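
Because d(X,Y) has low degree in each variable separately, this 2x-in-each-direction expansion can be carried out with two passes of univariate Reed-Solomon coding: first along rows, then along columns. Below is a toy sketch reusing P and rs_encode from the earlier sketch, with a 4 × 8 block expanded to 8 × 16 (and 1-based evaluation points rather than the 0-based indices in the text) so it runs instantly.

    rows, cols = 4, 8
    B = [[(31 * i + 7 * j) % P for j in range(cols)] for i in range(rows)]    # toy block

    row_extended = [rs_encode(row, 2 * cols) for row in B]                    # extend rows
    E = list(zip(*(rs_encode(col, 2 * rows) for col in zip(*row_extended))))  # extend columns
    assert len(E) == 2 * rows and len(E[0]) == 2 * cols
    assert all(E[i][j] == B[i][j] for i in range(rows) for j in range(cols))  # E extends B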

step 3: The builder verifies that each signed Ci is a KZG commitment to the univariate polynomial di(Y) := d(i,Y), for all i = 0, . . . , 255. Observe that the polynomial di(Y) is an interpolation of row i of B, and therefore must be the same as the polynomial committed to by client i. The builder rejects all data-blobs for which Ci is malformed.

Now the builder uses C = (C0, . . . , C255) as a commitment to the block B, or more precisely, to the bivariate polynomial d(X,Y). 

Let’s show that C = (C0, . . . , C255) is indeed a polynomial commitment to the polynomial d(X,Y). For some given x,y,z ∈ 𝔽p, let us construct an evaluation proof that convinces the verifier that d(x,y) = z with respect to the commitment C. Since the degree of X in d(X,Y) is at most 255, there are constants λ0, . . . , λ255 ∈ 𝔽p that depend only on x such that 

d(x,Y) = λ0 · d(0,Y) + . . . + λ255 · d(255,Y).

Then, by the homomorphic property of KZG commitments it follows that 

Cx := λ0 · C0 + . . . + λ255 · C255 

is a KZG commitment to the univariate polynomial dx(Y) := d(x,Y). Hence, the verifier can construct Cx from C on its own. Let π be the KZG evaluation proof for the polynomial dx(Y) that convinces the verifier that dx(y) = z with respect to the commitment Cx. This π is the required evaluation proof for d(X,Y) at the point (x,y).

This argument shows that C is a polynomial commitment to d(X,Y). It is worth noting that while each client signs a row of B independently of the other clients, the collection of all client signatures functions as a signature on the polynomial commitment to d(X,Y).
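
The λi above are simply Lagrange basis coefficients for the row indices 0, . . . , 255 evaluated at x. Here is a toy sketch (8 rows instead of 256, same toy prime as before) that computes them and checks the interpolation identity over the field; applying the same coefficients to C0, . . . , C255 by group scalar multiplication is what yields Cx, and that step is omitted since it requires an actual elliptic-curve/KZG library.

    P = 2**31 - 1                               # same toy prime as in the earlier sketches

    def lagrange_coeffs(nodes, x):
        """The coefficients lambda_i(x) for the given interpolation nodes, mod P."""
        coeffs = []
        for xi in nodes:
            num, den = 1, 1
            for xj in nodes:
                if xj != xi:
                    num = num * (x - xj) % P
                    den = den * (xi - xj) % P
            coeffs.append(num * pow(den, -1, P) % P)
        return coeffs

    def q(t):
        return (5 * t**3 + 2 * t + 9) % P       # any polynomial of degree < 8 will do

    nodes = list(range(8))                      # stand-in for the row indices 0..255
    lams = lagrange_coeffs(nodes, 123)          # here x = 123
    assert sum(lam * q(i) for lam, i in zip(lams, nodes)) % P == q(123)
    # Applying the same lambdas to C0, ..., C255 via scalar multiplication gives Cx (omitted).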

step 4: The minimum unit of communication in this DAS scheme is a sample, which is a 16-element row vector. The matrix E, with 512 × 8192 elements, is treated as a square matrix of 512 × 512 samples. Let V be the number of validators. The block builder breaks the matrix E into V overlapping fragments P1, . . . , PV, where Pi comprises exactly two rows and two columns of samples of E, chosen at random from the 512 rows and 512 columns of samples. Thus, Pi contains 2 × 512 × 16 + 2 × 8192 = 32768 field elements in 𝔽p. This is much smaller than the full block B, which is about a million field elements. The minimum number of validators in Ethereum is 128 (and there are currently around 500,000), so enough validators exist to ensure that the whole block is sufficiently well covered. 

step 5: The block builder sends the triple (Pi, C, πi) to validator number i, where πi is a list of evaluation proofs for all the elements in Pi: one proof for each cell d(x,y) in the two rows and two columns of samples in Pi. The proofs in πi enable the validator to verify that every field element in Pi is consistent with the commitment C.

In danksharding the number of validators can greatly exceed the number of columns or rows in the block. Therefore, some columns and rows might be stored by multiple validators. As such, danksharding uses both replication and Reed-Solomon erasure coding to ensure that the data can be reconstructed. 

Full block’s reconstruction

When a reconstruction agent needs to reconstruct the entire block, it has C, and asks the validator set to send it their fragments. In response, an honest validator i sends (Pi, πi). The reconstruction agent checks the opening proofs in πi, and if valid, it accepts the values in Pi. Hence, byzantine validators cannot send false data. However, Byzantine validators may refuse to send their data and not respond to the reconstruction agent. 

When some data is missing, danksharding needs at least 75% of the matrix E to reconstruct the block. Since the validators only store complete rows and columns, the worst-case scenario is one in which the erased elements form the intersection of just over half of the rows with just over half of the columns: 75% – ε of the block's elements are present, yet it is impossible to reconstruct the missing ones.

To see why the data cannot be reconstructed in this case, observe that there is a non-zero bivariate polynomial δ(X,Y) that evaluates to zero on the entire “present” region and satisfies the required degree bounds in X and Y. Therefore, during reconstruction it is not possible to tell whether the missing region comes from the correct polynomial d(X,Y) or from d(X,Y) + δ(X,Y); both polynomials agree on the present data. As a result, when the missing data follows this pattern (up to a permutation of rows and columns), the missing data cannot be reconstructed. Note that in this model, if a validator drops some of its data, thereby becoming “byzantine,” it drops all of it, both rows and both columns.

So, in the worst case, having less than 75% of the block is not enough. When 75% or more of the block is present, however, reconstruction is guaranteed to succeed via a simple greedy algorithm.

The reconstruction algorithm: Reconstruction works by iteratively finding an incomplete row (or column) with at least 50% of its elements available and using univariate interpolation to reconstruct that row's (or column's) missing elements. To see why this procedure eventually reconstructs the whole block, assume the block is still missing some elements yet no row or column can be reconstructed. Then every row that is missing elements has more than 50% of its elements missing, so more than half of the columns contain missing elements; each of those columns, in turn, must also have more than 50% of its elements missing. Multiplying the two fractions, more than 25% of the block is unavailable, which contradicts the assumption that at least 75% is present. (A sketch of this greedy loop follows below.)
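
Here is a toy sketch of that greedy loop, reusing lagrange_eval and the expanded matrix E from the earlier sketches. Rows and columns of E lie on polynomials of degree less than half their length, so any incomplete row or column with at least 50% of its entries present can be filled in by univariate interpolation.

    def fill_line(line):
        known = [(i + 1, v) for i, v in enumerate(line) if v is not None]
        if len(line) > len(known) >= len(line) // 2:        # incomplete but recoverable
            return [v if v is not None else lagrange_eval(known, i + 1)
                    for i, v in enumerate(line)], True
        return list(line), False

    def greedy_reconstruct(grid):
        progress = True
        while progress:
            progress = False
            for r in range(len(grid)):                      # try to complete rows
                grid[r], changed = fill_line(grid[r])
                progress = progress or changed
            cols = []
            for col in zip(*grid):                          # try to complete columns
                filled, changed = fill_line(col)
                cols.append(filled)
                progress = progress or changed
            grid = [list(row) for row in zip(*cols)]
        return grid

    damaged = [list(row) for row in E]                      # E from the sketch above
    damaged[0][0] = damaged[0][1] = damaged[3][5] = None    # erase a few elements
    assert greedy_reconstruct(damaged) == [list(row) for row in E]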

To render one block unreconstructable, the adversary needs to attack at least 15/16 of the validator set. The attackers in this case would pick a quadrant to erase, and attack those validators that have at least one of their two rows or one of their two columns in that quadrant. 

Local reconstruction of validator’s data

Suppose that an honest validator i crashes and loses its data. When it comes back online it wants to reconstruct its two rows and two columns Pi as well as the opening proofs πi. Danksharding enables a validator to do that without reconstructing the entire block. The validator can download its data from another validator that stores the same exact set of points, or by obtaining 50% of the elements of its row (or column) from other validators, and reconstructing the rest of the row through univariate interpolation. This process does not have to go through full reconstruction.

For example, imagine a validator stores at least 50% of column number 42, but less than 100% and it needs to reconstruct the missing elements. That is, the validator holds pairs (E(i,42), πi) ∈ (𝔽, 𝔾) for all i ∈ S, for a set S where 256 ≤ |S| < 512. To reconstruct the missing pairs, the validator does the following:

  1. Using Lagrange interpolation over the field 𝔽, it constructs a polynomial p(x) of degree at most 255 such that p(i) = E(i,42) for all i ∈ S.
  2. It evaluates the polynomial to obtain the missing elements: E(i,42) := p(i) for all i ∈ {0..511} \ S. Note that the points are guaranteed to lie on a polynomial of degree at most 255 because the validator has checked them against the commitments C.
  3. It obtains the missing proofs via a multi-exponentiation, doing polynomial interpolation "in the exponent": for all i ∈ {0..511} \ S, compute πi := ∑j∈S πj · Λj,S(i), where Λj,S(i) is the Lagrange coefficient Λj,S(i) := ∏k∈S,k≠j (i – k)/(j – k).

Step 3 uses the fact that KZG polynomial commitments have homomorphic proofs. To the best of our knowledge, KZG is the only polynomial commitment scheme with this property.
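
Over the field, the coefficients Λj,S(i) are ordinary Lagrange coefficients for the node set S. The toy sketch below checks the interpolation identity on stand-in column values; the actual proof interpolation of step 3 is the same linear combination with group scalar multiplications in place of field multiplications, which is omitted here since it needs an elliptic-curve library.

    P = 2**31 - 1                             # same toy prime as in the earlier sketches
    S = [0, 2, 5, 7]                          # indices the validator already holds

    def col(i):
        # Stand-in for the column values E(i, 42): any polynomial of degree <= 3 works
        # here, mirroring the degree-255 guarantee in the real scheme.
        return (9 * i**3 + 4 * i + 1) % P

    def lam(j, nodes, i):
        """Lagrange coefficient Lambda_{j,S}(i) for the node set `nodes`, mod P."""
        num = den = 1
        for k in nodes:
            if k != j:
                num = num * (i - k) % P
                den = den * (j - k) % P
        return num * pow(den, -1, P) % P

    for i in sorted(set(range(8)) - set(S)):  # the indices the validator is missing
        value = sum(lam(j, S, i) * col(j) for j in S) % P
        assert value == col(i)                # missing field elements recovered
        # Missing proofs are the same combination, taken in the group:
        #   pi_i = sum over j in S of Lambda_{j,S}(i) * pi_j   (scalar mults, omitted)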

Sampling for data availability in danksharding

To determine whether enough of the expanded block E is available for reconstructing the original block B, the sampling client queries for random samples of E. In response to each query, it gets a sample (a 16-element row vector) and an evaluation proof to check the sample against the commitment C. If all its queries are successfully answered, the client accepts the data as available. If the client makes a total of Q queries, then it falsely accepts unavailable data with probability at most (3/4)^Q, which drops exponentially with the number of queries Q. The rate 3/4 corresponds to the reconstruction threshold of 75% from the previous section. 
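
As a quick sanity check on the (3/4)^Q bound, a few lines of Python show how many queries push the false-acceptance probability below a given target; the targets are arbitrary illustrative values.

    import math

    for target in (1e-6, 1e-9, 1e-12):
        q = math.ceil(math.log(target) / math.log(3 / 4))
        print(f"target {target:g}: Q = {q} queries, (3/4)^Q ≈ {(3 / 4) ** q:.1e}")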

In the Ethereum network, sampling will be done by three types of parties: (i) full nodes that can't afford to store the blob data in full, (ii) light clients, and (iii) the validators themselves. The validators have to certify that the data is available before casting any votes on a block and its predecessors. Each epoch has a fixed set of validators, which is split randomly into 32 committees. Each committee is assigned a 12-second slot of the epoch (there are 32 slots per epoch). One block is expected to be confirmed per slot. Each validator receives the fragments (2 rows and 2 columns of samples) of every block, even for blocks where the validator is not part of the attesting committee. In addition, each validator samples the current block and all the previous blocks to ensure that all the blocks are both valid and available before selecting a block for attestation according to a fork-choice rule.

The protocol guarantees that the data will be available for an epoch of 32 slots, that is, for about 6.4 minutes. Additional approaches, such as proofs of retrievability or incentive mechanisms, can help ensure the data remains available for longer periods of time, e.g., the 30-60 days during which blobs should be retained before expiring.

A proposal for 25% reconstruction

In this section we explain how to make reconstruction succeed if only 25% of the data is available (compared to 75% as in current danksharding explained above).

When the fraction of available points is below 75%, the greedy column-wise and row-wise univariate interpolation can fail. Instead, we propose doing reconstruction directly through bivariate polynomial interpolation. In this case, full reconstruction is possible even if only 25% of the points of E are available. However, this requires a small modification to the way that fragments are assigned to validators. Instead of giving each validator two complete rows and two complete columns of samples, we propose that each validator be assigned to store 

  • one complete row (512 samples), one complete column (512 samples), and 
  • a collection of 1024 samples randomly (or pseudorandomly) spread around the matrix E.

The overall storage requirement per validator is unchanged, but now reconstruction is possible even if fewer than 75% of the samples are available. Moreover, multiple validators store the same set of points, so that if any validator needs to reconstruct its fragment, it can request it from any other validator that stores the exact same fragment. If no such validator is available, it can do local or full reconstruction to obtain its fragment.

This hybrid proposal allows cheap reconstruction in the happy case, when the number of byzantine validators is small and more than 75% of the samples of E are available. In the unhappy case, when the number of byzantine validators is too high, the data can be reconstructed using full bivariate interpolation from only 25% of the samples, thanks to the random dispersion of samples throughout the matrix E.

Bivariate interpolation can naively be done by solving a linear system of equations on the coefficients of the polynomial. This simple approach requires constructing an interpolation matrix with 2^20 × 2^20 field elements (32 TB). This is quite large but not infeasible. However, there are better methods that can be used. (See, for example, the survey by P. J. Olver, 2016.) While the cost of bivariate interpolation is non-trivial, it is only needed when the fraction of recovered points is below 75%, and it can be treated as a safety measure. To enable this safety measure, danksharding needs to be slightly modified to assign fragments to validators as outlined above. 

Danksharding with IPA commitments

The main drawback of the previous construction is the need for a trusted setup for the KZG polynomial commitment scheme. Otherwise the scheme is blazingly fast. However, the trusted setup typically needs a lot of work and coordination. (Our work on on-chain KZG setups can help simplify the ceremony for future projects.) Some other polynomial commitment schemes do not need a trusted setup (e.g., Bulletproofs), although they lack the homomorphic proofs that let validators efficiently reconstruct the data they need to store (as pointed out by Vitalik). 

The construction can be altered, however, to avoid the need for homomorphic proofs and still have lightweight validators. The high-level idea is to have block builders compute commitments to the columns of the block matrix E. With this approach, the validators won't need to reconstruct the column proofs; they will simply reconstruct their columns in full and recompute the proofs against the column commitments from scratch. Through consensus, the validators make sure they agree on the same set of column commitments.

More precisely, steps 1-4 are the same as in danksharding, as explained above. Step 5, however, is different.

step 5: The block builder computes polynomial commitments to the columns of B: denote them by T = (T0, T1, . . . , T4095), where each Tj is a commitment to d(X, j) (concretely, it could be a Pedersen vector commitment to the vector of coefficients of d(X, j)). Next, it creates a proof that C and T commit to the same matrix, as follows. The block builder chooses a (pseudo)random point (x̂, ŷ), uses the homomorphism of the commitments to interpolate and compute the column commitment Tŷ to the univariate polynomial d(X, ŷ) and the row commitment Cx̂ to the univariate polynomial d(x̂, Y), and creates two proofs: πx̂ – the polynomial evaluation proof of d(x̂, ŷ) against Cx̂, and πŷ – the polynomial evaluation proof of d(x̂, ŷ) against Tŷ. The point (x̂, ŷ) is universal for all of the validators and can be obtained, for example, from a random beacon output or generated pseudorandomly with a Fiat-Shamir transform. The block builder then sends (Pi, C, T, πx̂, πŷ, d(x̂, ŷ)) to validator number i, where Pi is the two rows and two columns of E. Note that, aside from πx̂ and πŷ, the builder computes no per-element evaluation proofs in this construction.

The validator verifies the proofs πx̂, πŷ to catch a malicious block builder that generates the column commitments T incorrectly. The validator then recommits to the two rows and two columns in Pi and verifies that those commitments match the corresponding entries of C and T. If a row of Pi, with index x, is within the range of the original matrix B (x ∈ {0..255}), the validator simply verifies that its commitment matches the corresponding element Cx of C; if the row is in the expanded portion (x ∈ {256..511}), the validator first interpolates through the commitments in C to obtain Cx:

Cx := λ0 · C0 + . . . + λ255 · C255

Note that this interpolation is possible because the IPA commitments (though not their proofs) are homomorphic. The validator verifies the columns of Pi in a similar way against T, interpolating if needed.

In this construction, the block’s builder does not have to compute the proofs for the validators. The validators can compute all the proofs themselves. It’s an interesting research problem, however, to figure out an efficient algorithm to compute a large number of proofs at once (similar to how it’s done for KZG, or using Vitalik’s ideas for IPA).

The main bottleneck of this approach, as we see it, is the loss in efficiency for sampling clients, as the proofs for IPA-based schemes are logarithmic in size (in contrast to constant-size for KZG). Moreover, the sampling clients might need to download column commitments in addition to the row commitments, incurring an overhead in communication. However, we believe this to be a promising direction for further exploration.

Danksharding with Merkle commitments

Another plausible approach Vitalik explored recently to avoid the trusted setup is to use Merkle commitments instead of polynomial commitments. A Merkle commitment is not a polynomial commitment scheme, so it is more challenging to make it work with erasure coding. The block builder would erasure-code the data and commit to the expanded data with Merkle-tree roots. The main challenge is to detect incorrectly expanded data without downloading the full data. Fraud proofs can be used to get around this issue, but they rely on the existence of a client that downloads the full data, checks that it was erasure-coded and committed to correctly, and raises an alarm by providing a fraud proof if it was not.

Alternatively, a FRI proof can be used to check that the leaves of the Merkle tree are close to a Reed-Solomon codeword (i.e., check that the data underlying the Merkle commitment was erasure-coded correctly). The sampling clients would check the FRI proof and sample a sufficient fraction of the leaves by requesting their Merkle proofs to ensure the data is available and can be reconstructed.

***

Data availability sampling, and danksharding as one of its concrete instantiations, would bring the cost of data storage down, leading to more scalable and cheaper blockchains. The coding aspects of DAS are a potentially rich area for research, with many possible directions to explore. We suggested one possible avenue: improving the reconstruction protocol to use fewer samples (25% instead of 75%). Another exciting direction is to explore alternative commitment schemes that do not require a trusted setup.

***

Read more

Protodanksharding

DAS for Ethereum and danksharding

Research on DAS

KZG commitments

***

Thanks to Ansgar Dietrichs for providing many insightful comments that informed the post.

***

Valeria Nikolaenko is a Research Partner at a16z crypto. Her research focuses on cryptography and blockchain security. She has also worked on topics such as long-range attacks in PoS consensus protocols, signature schemes, post-quantum security, and multi-party computation. She holds a PhD in cryptography from Stanford University, where she was advised by Professor Dan Boneh, and she worked on the Diem blockchain as part of the core research team.

Dan Boneh leads the applied cryptography group at Stanford University and co-directs the computer security lab. Dan's research focuses on applications of cryptography to computer security and includes cryptosystems with novel properties, web security, security for mobile devices, and cryptanalysis. He is the author of over one hundred publications and is a recipient of the 2014 ACM prize, the 2013 Gödel Prize, and the 2011 Ishii award for industry education innovation.

***

The views expressed here are those of the individual AH Capital Management, L.L.C. (“a16z”) personnel quoted and are not necessarily the views of a16z or its affiliates. a16z is an investment adviser registered with the U.S. Securities and Exchange Commission. Registration as an investment adviser does not imply any special skill or training. Certain information contained in here has been obtained from third-party sources, including from portfolio companies of funds managed by a16z. While taken from sources believed to be reliable, a16z has not independently verified such information and makes no representations about the enduring accuracy of the information or its appropriateness for a given situation. In addition, this content may include third-party information; a16z has not reviewed such material and does not endorse any advertising content contained therein.

This content is provided for informational purposes only, and should not be relied upon as legal, business, investment, or tax advice. You should consult your own advisers as to those matters. References to any securities, digital assets, investment strategies or techniques are for illustrative purposes only, and do not constitute an investment recommendation or offer to provide investment advisory services. Furthermore, this content is not directed at nor intended for use by any investors or prospective investors, and may not under any circumstances be relied upon when making a decision to invest in any fund managed by a16z. (An offering to invest in an a16z fund will be made only by the private placement memorandum, subscription agreement, and other relevant documentation of any such fund and should be read in their entirety.) Any investments or portfolio companies mentioned, referred to, or described are not representative of all investments in vehicles managed by a16z, and there can be no assurance that the investments or investment strategies will be profitable or that other investments made in the future will have similar characteristics or results. A list of investments made by funds managed by Andreessen Horowitz (excluding investments for which the issuer has not provided permission for a16z to disclose publicly as well as unannounced investments in publicly traded digital assets) is available at https://a16z.com/investments/.

Additionally, this material is provided for informational purposes solely and should not be relied upon when making any investment decision. Past performance is not indicative of future results. Investing in pooled investment vehicles and/or digital assets includes many risks not fully discussed herein, including but not limited to, significant volatility, liquidity, technological, and regulatory risks. The content speaks only as of the date indicated. Any projections, estimates, forecasts, targets, prospects, and/or opinions expressed in these materials are subject to change without notice and may differ or be contrary to opinions expressed by others. Please see https://a16z.com/disclosures for additional important information.

Hiring in web3: 4 principles for finding talent in volatile times

https://a16zcrypto.com/content/article/hiring-in-web3-in-volatile-times/

Crypto markets can be volatile – but crypto innovation follows an underlying order. Builders brought in when prices were high have stuck around, resulting in a steady flow of new ideas, code, and projects. A new generation of web3 startups is working on the next wave of advances, and many are actively hiring.

Meanwhile, the tech talent landscape has shifted dramatically in recent months. Layoffs across all sectors, but especially in larger tech companies, have left hundreds of thousands of workers looking for new challenges and opportunities. And, as a result, web3 startups with cash on hand and an optimistic outlook are seeing a very different talent pool than they were a year ago. But how can companies set themselves up to make smart, well-timed hires in periods of volatility? No matter the season, hiring the right people at the right time is critical to growing a resilient team.

In this post, we go over a few principles and best practices for navigating this new talent landscape as a web3 startup. As former leaders at high-growth web2 and web3 organizations, we have seen a wide variety of scale, talent needs, and market fluctuations. So here are our thoughts on how teams can make the most of their headcount (and budget) as they transform the proverbial hiring abyss into a functional and efficient hiring funnel.

Do the work up front

Hiring fast takes foresight. Without some thorough planning, it can take longer to fill a role that a team already (maybe even desperately) needs. A few principles for getting started: 

  • Be realistic about hiring needs. A full-time hire isn’t necessarily a cure-all, especially when teams aren’t sprinting through a bull run. Teams may want to consider starting with agencies, freelancers, or other contingent workers in order to be able to scale up and down as budget and workload ebbs and flows. 
  • Work backwards from business needs to define the role. If it isn’t clear whether to hire a senior individual contributor or a senior director, then take a moment to unpack the company’s needs. Some questions to ask: What will the person in this role do in their first week? What will they do in six months, or a year? And will they need to build a team? Or build out the nuts and bolts of their discipline?
  • Define responsibilities and map them to skill sets. Can one person do it all, or does the team need to make multiple hires? For example, a token economist (or mechanism designer) may have the analytical and economic acumen to create a token program but you may want to consider hiring a software engineer to deploy and maintain these models in production.
  • Avoid over-hiring by staying focused on immediate needs, now and in the near future. Members of smaller teams often end up doing more than what’s outlined in their job descriptions – trends, technologies, and market conditions can move fast in web3. Focus can help organizations stay nimble and goal-oriented, while identifying obvious skill gaps and opportunities for growth.
  • Consult an expert on unfamiliar skills. Small companies must make a lot of “first” hires – particularly in newer, more niche web3 roles that didn’t exist a few years ago. Hiring managers seeking out skill sets that they don’t have (whether that’s writing Solidity or managing NFT communities) might opt to find an external advisor.
  • Reconsider “web3 native” as a required skill. Many hiring managers ask candidates to come in with web3 experience, a qualification that can limit teams to a select cohort of candidates (a pool small enough to challenge many companies, despite recent layoffs). Worst case, teams may be seeking a combination of skills that doesn’t actually exist in one person. Instead, consider which roles are appropriate for deeply experienced web2 professionals, or for enthusiastic new talent looking for challenging, career-defining experiences.

These are just a few best practices, but there are, of course, many more to keep in mind before you post that job description. It’s also okay if pieces of this process aren’t quite working — hiring managers should always debrief and make adjustments as they go.

Quality, not quantity

A year ago, many companies (in web3 and beyond) were breathlessly filling seats to keep pace with market pressures. Now the same companies are reducing headcount, or slowing their hiring. Teams looking to fill key roles might need to make some hard choices, and prioritize their hiring plans accordingly. When there are fewer roles to fill, hiring the right person is even more important.

At the same time, the talent pool has deepened, and excellent candidates who weren’t available last year may now be interested in exploring new roles, opportunities, and risks. But despite the influx of talent, a good principle to keep in mind isn’t just finding the best of the best – but also finding people who are in it for the long run. If someone is willing to join a project in rockier times, they’re likely to stick around in good times, too.

Connecting over closing

As recruiters, we sometimes like to think of ourselves as salespeople (and we can be!) – so it’s easy to slip into the habit of treating the time between receiving and signing an offer like a hard sell. This mindset falls short when it focuses more on rattling off the selling points of a company, and less on identifying a person’s unique needs, wants, and expectations. (Also: talent is a long game. You want to keep people, not just companies, at the center, because as your relationships with the talent pool deepen over time, those people will change jobs multiple times.)

Closing actually starts with that first phone call and continues throughout the interview process, as you learn about the candidate. This is especially important in more volatile times: a focus on the quality of connection will help close candidates faster (read: less time spent hiring) and result in less churn and attrition after employees join (read: less time spent backfilling). When big market shifts happen, you may even have to “re-close” a candidate, giving them a call to help them separate the perception created by media headlines from reality. 

Closing should focus on two areas in particular:

  • Professional fulfillment: What makes a candidate passionate? And what problems are they trying to solve? With so many smart, idea-driven candidates in web3, there’s plenty to talk about. 
  • Work-life balance: Every company likes to pitch stellar work-life balance to potential hires; but in reality, startups can come with longer hours, greater uncertainty, and a slew of tasks that fall outside of a candidate’s job description. How do candidates rank work-life balance among other benefits? And what can your company realistically deliver?

And also include a few key considerations that are even more important in more volatile times:

  • Risk appetite: Is the candidate bought into the space and opportunity? This will matter more than usual to ensure there aren’t too many distractions in the new hire’s journey. Take the time to connect them to the role: Why them? Why here? Why now?
  • Experience with ups and downs: Has the candidate experienced a volatile market before? A candidate with less experience may need to hear how a company deals with market cycles: How do you do more with less, and focus on the work vs. external noise?
  • Base pay vs. equity: Make sure candidates understand the nuances to their compensation packages. Candidates may rightfully scrutinize packages that include tokens or equity, which can shift with the market. How can you provide information and mental models that help the candidate apply these to different outcomes? 

There are many ways to build trust throughout the interview process, and make the best possible impression on potential hires (get to know them, stay transparent about the hiring process, regularly communicate updates and expectations, and more). These insights can also paint a picture of why this role at this company at this time is the best fit. Not every offer will work out, but the right candidates will.

Focus on the future state

Speaking generally, the earlier a company is in its journey, the more work there is to do to help candidates “see the future,” especially if a candidate is more easily swayed by brand recognition. What is the company looking to accomplish? How does this hire fit into that vision? In an industry where change and innovation move quickly, it’s key that leaders keep their candidates (and employees) focused on the problem they are solving.

A prospective hire should be invested in the work a team is looking to do; they should also understand what success looks like, and what path the company is taking to be successful. Building in web3, for example, is dynamic and ever-changing: this can mean long stretches spent solving hard problems, along with the need to pivot quickly as the market evolves. 

Often (and especially in web3) this may require taking a step back and explaining a broader world view or ecosystem. Set the stage, and then dive into what matters: What needs to happen for there to be a successful exit here? Maybe a candidate would trade liquid compensation for more long-term incentives like equity. What does this growth story look like? 

To paint this picture, meet candidates where they are. Do they truly grasp the mission of web3 and understand where this company fits in? Consider trying a mental model that maps the company’s growth to their personal growth (how their role can grow with the company over the long term).  

***

Although every company, candidate, and hiring journey is unique, focusing on building a consistent process and hiring the right candidates will prevent churn in a time when teams need stability and precision the most. Being diligent when hiring is slow can be like building a muscle: The more teams practice, the stronger they’ll be when hiring speeds back up.

***

The views expressed here are those of the individual AH Capital Management, L.L.C. (“a16z”) personnel quoted and are not the views of a16z or its affiliates. Certain information contained in here has been obtained from third-party sources, including from portfolio companies of funds managed by a16z. While taken from sources believed to be reliable, a16z has not independently verified such information and makes no representations about the current or enduring accuracy of the information or its appropriateness for a given situation. In addition, this content may include third-party advertisements; a16z has not reviewed such advertisements and does not endorse any advertising content contained therein.

This content is provided for informational purposes only, and should not be relied upon as legal, business, investment, or tax advice. You should consult your own advisers as to those matters. References to any securities or digital assets are for illustrative purposes only, and do not constitute an investment recommendation or offer to provide investment advisory services. Furthermore, this content is not directed at nor intended for use by any investors or prospective investors, and may not under any circumstances be relied upon when making a decision to invest in any fund managed by a16z. (An offering to invest in an a16z fund will be made only by the private placement memorandum, subscription agreement, and other relevant documentation of any such fund and should be read in their entirety.) Any investments or portfolio companies mentioned, referred to, or described are not representative of all investments in vehicles managed by a16z, and there can be no assurance that the investments will be profitable or that other investments made in the future will have similar characteristics or results. A list of investments made by funds managed by Andreessen Horowitz (excluding investments for which the issuer has not provided permission for a16z to disclose publicly as well as unannounced investments in publicly traded digital assets) is available at https://a16z.com/investments/.

Charts and graphs provided within are for informational purposes solely and should not be relied upon when making any investment decision. Past performance is not indicative of future results. The content speaks only as of the date indicated. Any projections, estimates, forecasts, targets, prospects, and/or opinions expressed in these materials are subject to change without notice and may differ or be contrary to opinions expressed by others. Please see https://a16z.com/disclosures for additional important information.

Tokenology: Moving beyond ‘tokenomics’

https://a16zcrypto.com/content/article/beyond-tokenomics-tokenology/

“Tokens” are a hot-button topic for those in crypto and web3. Even beyond web3, tokens are capturing mindshare from anyone interested in art, cryptography, design, economics, gaming, math, psychology, and more.

Tokens have been described as everything from a “breakthrough in open network design” to a “new digital primitive” (analogous to the website). We can generally define a token as “an internet-native unit of value”. But here’s the key point: tokens represent not just monetary value, but also social value, reputational value, and more. There are many forms of value.

Tokens represent multidimensional value; they are a vector, not a scalar. Borrowing from mathematical notation, scalars have only magnitude — vectors have both magnitude and direction. A specific token can represent ownership, membership, identity, and more. Crucially, tokens matter because they allow builders to keep the high-dimensionality inherent in any representation of value, and thus open up a rich new design space.

This is also why I believe the term “tokenomics” — for describing the study and design of tokens — is very limited and inherently constraining. It fails to capture and convey the full dimensionality of that rich design space. Tokens can be used to coordinate and organize people not only within, but beyond, a purely economic context.

Token design is still in a nascent phase, so reducing the dimensionality of tokens to a purely economic context limits what’s possible to build here. That’s why I think we need another term. I propose “Tokenology”: the study of how to coordinate people, organizations, and/or computation toward a common goal, primarily through the use of cryptography and mechanism design.

But first, a bit more about what tokens are, and why they matter

To put tokens in the broader context of blockchains & crypto: Blockchains are a new computational paradigm that has created a novel way to organize people, societies, and capital. Two key benefits of blockchains are composability and tokens. Here, I’m focusing on tokens.

The terms people use in the crypto industry also include “token model”, “token mechanism”, and “token design” (we use that last one, too; see here). But they all refer to how a token interacts with an associated protocol, system, or mechanism. For example: Ethereum’s token model specifies how ether works in the protocol; the token model is a subset of the protocol as a whole.

So, more concretely, how can we use tokens? There are many ways, but to summarize a few current use cases, we can use tokens:

For ownership. Blockchains are the first way to launch user-owned-and-operated open source services at scale; Ethereum is an excellent example of a user-owned-and-operated “world computer”. Tokens also give users digital “property rights”, which is another important concept here. Finally, tokens allow for the ownership of hyperstructures, defined as “crypto protocols that can run for free and forever, without maintenance, interruption or intermediaries.” In this context, they also create value that is accessible to, and destructible by, the owners; that value need not be only monetary, and it can be significant in many other ways.

For alignment. Creator coins, social tokens, and NFTs allow fans to directly interact with the artists they love the most, and to prove their fandom — whether as an early adopter, to show intensity of support, or for community and meaning. Dogecoin’s strength, for instance, is the meme, community, and “religion” it represents. Tokens can also be extended to not only represent membership in this community, but to establish a digital cultural identity; in this context, holders of such tokens can also vote on creative decisions in decentralized collaboration between creator and community.

For incentive structures. Incentive design has often been described as the key to understanding and motivating human behavior, but it can also align systems, organizations, and networks. Tokens help coordinate validators and miners in Ethereum and Bitcoin, respectively. They also enable decentralized governance in DeFi protocols like Uniswap, Compound, and others. Tokens can assist in membership growth and derivative creation for NFT projects like BAYC or digitally native DAOs and communities like FWB.

For accessing goods and services. Smart contract platforms like Ethereum sell the service of computation; Helium allows anyone to purchase LoRaWAN and 5G transit; and Filecoin allows anyone to pay for data storage… to name just a few examples. Many NFTs are also used to “gate” digital and physical experiences. Token gating can be used to prioritize early or more active community members, or to apply other criteria, as a way to distinguish between casual and more dedicated participants and to ensure a richer experience overall.

This list of use cases for tokens is nowhere near comprehensive, of course; it’s still just the beginning. But as you can see, this is an extremely rich design space — one that slices across the arts, economics, and much, much more.

The case for a new term

Tokens clearly matter — and not just to the crypto industry, but beyond. Once they are combined with the other key feature and benefit of blockchains — composability — one can understand more deeply how tokens, through their direction and not just their magnitude, can represent so much more.

Tokens, again, represent value as a vector, not a scalar. It is only when we recognize tokens as the native way of representing value vectors that one can begin to convey the rich design space here.

“Value” is an abstraction, and most folks conflate “value” with “money”. Tokens let you make value explicit without it being money. I’d argue that because the modern western economy denominates value almost entirely in U.S. dollars, it loses information by reducing a high-dimensional vector into a scalar. For the most part today, value is either (1) implicit, or (2) explicit only in the form of U.S. dollars. Value can be represented in several other ways, including ways built around the U.S. dollar. All interactions actually transfer value, most easily seen in the form of time and information.

The key is that developers should be able to make implicit value explicit using new token designs.

So whether we call it “Tokenology”, as I suggest here (simply, the study of tokens), or something else — I’m curious what you’d propose as an alternative! — we need to move beyond just tokenomics = tokens + economics. It’s tokenology = tokens x economics x art x …. A new term may also help us usher in a new and richer era of token design.

 

Editors: Sonal Chokshi, Steph Zinn

The views expressed here are those of the individual AH Capital Management, L.L.C. (“a16z”) personnel quoted and are not the views of a16z or its affiliates. Certain information contained in here has been obtained from third-party sources, including from portfolio companies of funds managed by a16z. While taken from sources believed to be reliable, a16z has not independently verified such information and makes no representations about the current or enduring accuracy of the information or its appropriateness for a given situation. In addition, this content may include third-party advertisements; a16z has not reviewed such advertisements and does not endorse any advertising content contained therein.

This content is provided for informational purposes only, and should not be relied upon as legal, business, investment, or tax advice. You should consult your own advisers as to those matters. References to any securities or digital assets are for illustrative purposes only, and do not constitute an investment recommendation or offer to provide investment advisory services. Furthermore, this content is not directed at nor intended for use by any investors or prospective investors, and may not under any circumstances be relied upon when making a decision to invest in any fund managed by a16z. (An offering to invest in an a16z fund will be made only by the private placement memorandum, subscription agreement, and other relevant documentation of any such fund and should be read in their entirety.) Any investments or portfolio companies mentioned, referred to, or described are not representative of all investments in vehicles managed by a16z, and there can be no assurance that the investments will be profitable or that other investments made in the future will have similar characteristics or results. A list of investments made by funds managed by Andreessen Horowitz (excluding investments for which the issuer has not provided permission for a16z to disclose publicly as well as unannounced investments in publicly traded digital assets) is available at https://a16z.com/investments/.

Charts and graphs provided within are for informational purposes solely and should not be relied upon when making any investment decision. Past performance is not indicative of future results. The content speaks only as of the date indicated. Any projections, estimates, forecasts, targets, prospects, and/or opinions expressed in these materials are subject to change without notice and may differ or be contrary to opinions expressed by others. Please see https://a16z.com/disclosures for additional important information.

 

7 Sanity Checks Before Designing a Token

https://a16zcrypto.com/content/article/designing-tokens-sanity-checks-principles-guidance/

Tokens are a powerful new primitive that can be defined in many ways; I’ve argued why we ought to consider the study and design of tokens as much broader than just “tokenomics” here.

Tokens clearly allow for a very rich design space. But we’re still in the early stages of exploring, let alone improving, token design. The holy grail to attain here would be the modern equivalent of what computer scientists commonly refer to as “the Dragon Book”. That nickname refers to Compilers: Principles, Techniques, and Tools (by Alfred Aho, Monica Lam, Ravi Sethi, and Jeffrey Ullman; it is sometimes also applied to earlier editions of that book, or to Aho and Ullman’s older Principles of Compiler Design). The book unified, defined, and influenced the study of compiler design for generations of computer scientists — so much so that two of the authors were named ACM Turing Award winners just a few years ago “for fundamental algorithms and theory underlying programming language implementation” and “for synthesizing these results and those of others in their highly influential books, which educated generations of computer scientists”.

We are far from a Dragon Book for token design — it is too early to produce a definitive text on tokens. Our Head of Research, Tim Roughgarden, notes that we are likely a little under a decade away from one; it’s an ongoing body of work. The comparison matters because the Dragon Book helped turn the “impossibly messy”, big computer science problem of the 1950s — compiler design — into a well-understood problem that could be tackled in stages, applying rigorous principles at each stage.

But some of the early opportunities, and pitfalls, are already becoming clear — so I thought it would be helpful to builders out there if I curated a list of some of the sanity checks our team often discusses when working with others on token design. I also encourage you to watch this recent talk on token design by Eddy Lazzarin, which covers mental models, common patterns and pitfalls, current token capabilities, and many design spaces left to explore.

The practical reality is that many teams endeavoring to find the “right” token design for their projects are often working without a tested framework for design — and thus run into the same challenges others have encountered before them. Fortunately, there are also early successes and examples of “good” token design. Most effective token models will have elements unique to their objective, but most flawed token designs share a number of common pitfalls. So here’s a list of instructive tips to avoid the most common failure modes.

#1 Have a clear objective  

The greatest pain in token design comes from building a complex model before explicitly stating an objective. There is no such thing as a good token model or a bad token model — there is only a token model that achieves your objective, or a token model that doesn’t.

The first step should always be to rigorously interrogate the objective, and ensure you (and your team) fully understand it: what is it, why does it matter, and what are you really trying to accomplish? Failure to rigorously define an objective usually results in a redesign and lost time. Clearly defining an objective also helps avoid the “tokenomics for the sake of tokenomics” issue that’s a common (and not unfair) criticism of some token designs.

Furthermore, the objective should be specific to the token. This may seem obvious, but is often overlooked. Examples of such objectives could include:

  • A game that wants to design a token model that best enables extensibility and supports modding.
  • A DeFi protocol that wants to design a token model that optimally distributes risk amongst participants.
  • A reputation protocol that wants to guarantee money is not directly fungible for reputation (for instance, by separating liquidity from reputation signaling).
  • A storage network that wants to guarantee that files are available with low latency.
  • A staking network that wants to provide the maximum economic security.
  • A governance mechanism that wants to elicit true preferences or maximum participation.

…the list could go on and on. Tokens can support any use case and objective — not the other way around.

So how do you start to define a clear objective? Well-defined objectives often arise from a mission statement. While a mission statement tends to be high level and abstract, an objective should be concrete and reduced to the most elemental form.

Let’s consider EIP-1559 as an example. One clear objective of EIP-1559 is perhaps best stated by Roughgarden as: “EIP-1559 should improve the user experience through easy fee estimation, in the form of an ‘obvious optimal bid,’ outside of periods of rapidly increasing demand.”

He continues by codifying another clear objective: “Could we redesign Ethereum’s transaction fee mechanism so that setting a transaction’s gas price is more like shopping on Amazon? Ideal would be a posted-price mechanism, meaning a mechanism that offers each user a take-it-or-leave-it gas price for inclusion in the next block. We’ll see … that the transaction fee mechanism proposed in EIP-1559 acts like a posted-price mechanism except when there is a large and sudden increase in demand…”

What both of these examples share in common is stating a high-level objective; offering a relatable analogy (possible in this case) to help others understand that objective; and then proceeding to outline the design that best supports that objective.
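For readers who want to see how such an objective becomes a concrete mechanism, here is a minimal sketch of EIP-1559’s base-fee update rule (the constants follow the EIP: the gas target is half the block gas limit, and the base fee moves by at most 1/8 per block); the tip logic and everything else around it is omitted.

```python
# A minimal sketch of the EIP-1559 base-fee update rule. The user's
# "obvious optimal bid" is then roughly the base fee plus a small tip.
ELASTICITY_MULTIPLIER = 2            # block gas limit = 2x the gas target
BASE_FEE_MAX_CHANGE_DENOMINATOR = 8  # base fee moves by at most 12.5% per block

def next_base_fee(parent_base_fee: int, parent_gas_used: int, parent_gas_limit: int) -> int:
    gas_target = parent_gas_limit // ELASTICITY_MULTIPLIER
    if parent_gas_used == gas_target:
        return parent_base_fee
    delta = (parent_base_fee * abs(parent_gas_used - gas_target)
             // gas_target // BASE_FEE_MAX_CHANGE_DENOMINATOR)
    if parent_gas_used > gas_target:
        return parent_base_fee + max(delta, 1)  # demand above target: fee rises
    return parent_base_fee - delta              # demand below target: fee falls

# A completely full block (30M gas used, 15M target) raises a 100 gwei base fee by 12.5%:
print(next_base_fee(100_000_000_000, 30_000_000, 30_000_000))  # 112_500_000_000 wei
```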

#2 Evaluate existing work from first principles

When creating something new it’s always a good idea to study what already exists. As you evaluate incumbent protocols and existing literature, you should evaluate them objectively on the basis of their technical merits.

Token models are often evaluated based on the price of the token or the popularity of the associated project. These factors can be unrelated to the ability of the token model to meet its stated objectives. Valuation, popularity, or other naive ways of evaluating a token model can lead builders astray.

If you assume other token models function correctly when they don’t, you can create a broken token model. If you repurpose a token model with a different objective, you can implicitly inherit assumptions that don’t make sense for your token model.

#3 Articulate your assumptions

Be explicit about articulating your assumptions. It’s easy to take basic assumptions for granted when you are focused on building a token. It’s also easy to incorrectly articulate the assumptions you’re really making.

Let’s take the example of a new protocol that assumes that its hardware bottleneck is compute speed. Using that assumption as part of the token model — by bounding the hardware cost required to participate in the protocol, for instance — could help align the design to the desired behavior.

But if the protocol and token designers don’t state their assumptions — or are wrong in their stated assumptions — then it’s possible that participants who realize that mismatch will be able to extract value from the protocol. A “hacker” is often someone who simply understands a system better than the people who built it in the first place.

Articulating your assumptions makes it simpler to understand your token design and ensure it works properly. Without articulating your assumptions, you also can’t validate your assumptions…

#4 Validate your assumptions

As the popular saying goes, “It ain’t what you don’t know that gets you into trouble. It’s what you know for sure that just ain’t so.” [Often attributed to Mark Twain and others, this quote evolved incrementally over time.]

Token models often make a set of assumptions. This approach comes in part from the history of Byzantine fault-tolerant system design, an inspiration for blockchains: the system makes an assumption and builds a mechanism that guarantees some property if the assumption holds. For example, Bitcoin guarantees liveness in the synchronous network model, and consistency if a majority of the network’s hashpower is honest. Several smaller blockchains have been 51%-attacked, violating the honest-majority assumption that Nakamoto consensus requires for a blockchain to function correctly.
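To see why validating that assumption matters so much, it helps to quantify how the guarantee degrades as the assumption erodes. The sketch below uses the simple catch-up probability from the Bitcoin whitepaper’s gambler’s-ruin analysis (it ignores the whitepaper’s fuller Poisson treatment): an attacker with hashpower share q trying to overtake a chain that is z blocks ahead.

```python
# Probability that an attacker with hashpower share q ever catches up from
# z blocks behind (honest share p = 1 - q), per the simplified analysis in
# the Bitcoin whitepaper. Once q reaches 1/2, the guarantee collapses entirely.
def catch_up_probability(q: float, z: int) -> float:
    p = 1.0 - q
    if q >= p:
        return 1.0
    return (q / p) ** z

for q in (0.10, 0.30, 0.45, 0.51):
    print(f"attacker share {q:.0%}: P(catch up from 6 behind) = {catch_up_probability(q, 6):.6f}")
```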

Token designers can validate their assumptions in a variety of ways. Rigorous statistical modeling, often in the form of an agent-based model, can help test those assumptions. Assumptions about human behavior can also often be validated by talking with users, and better yet, by observing what people actually do (versus what they say they do). This is especially possible through incentivized testnets, which generate empirical results in a sandbox environment.
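As a flavor of what agent-based validation can look like, here is a toy sketch: a population of agents with heterogeneous hurdle rates decides whether to stake given a fixed reward budget, and we check whether steady-state participation lands in an acceptable band. Every name and number here (the hurdle-rate range, the reward budget, the decision rule) is a hypothetical assumption to be tested, not a recommendation.

```python
import random

# Toy agent-based model (illustrative only): each agent stakes when the current
# yield clears its personal hurdle rate; we test whether the assumed reward
# budget produces participation within an acceptable band.
def simulate_staking(n_agents=1_000, reward_budget=50.0, rounds=200, seed=0):
    rng = random.Random(seed)
    hurdles = [rng.uniform(0.01, 0.15) for _ in range(n_agents)]  # behavioral assumption
    staked = [False] * n_agents
    for _ in range(rounds):
        yield_per_staker = reward_budget / max(sum(staked), 1)
        for i in range(n_agents):
            if rng.random() < 0.1:  # damped updates: ~10% of agents reconsider per round
                staked[i] = yield_per_staker >= hurdles[i]
    return sum(staked) / n_agents

participation = simulate_staking()
print(f"steady-state participation: {participation:.0%}")
assert 0.2 <= participation <= 0.9, "assumption violated: participation out of band"
```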

Formal verification or intensive audits will also help ensure that a codebase acts as intended.

#5 Define clear abstraction barriers

An “abstraction barrier” is an interface between different levels of a system or protocol. It is used to separate the different components of a system, allowing each component to be designed, implemented, and modified independently. Clear abstraction barriers are useful in all fields of engineering, and especially in software design, but they are even more of a requirement for decentralized development and for large teams building complex systems that no single person can grok.

In token design, the goal of clear abstraction barriers is to minimize complexity. Reducing the (inter)dependencies between different components of a token model results in cleaner code, fewer bugs, and better token design.

Here’s an example: Many blockchains are built by large engineering teams. One team might make an assumption about the cost of hardware over time, and use that to determine how many miners contribute hardware to the blockchain for a given token price. If another team relies on token price as a parameter, but isn’t aware of the first team’s assumptions about hardware cost, they could easily make conflicting assumptions.
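Here is a minimal sketch of what a clean abstraction barrier for that example might look like (the class names, numbers, and cost curve are hypothetical): the hardware-cost assumption lives behind one explicit, documented interface, so the team modeling miner participation depends on that interface rather than on the other team’s private assumptions.

```python
from typing import Protocol

class HardwareCostModel(Protocol):
    """The abstraction barrier: one team owns this assumption and documents it here."""
    def cost_per_unit(self, year: int) -> float: ...

class DecliningCostModel:
    # Team A's stated assumption (hypothetical): hardware cost falls ~20% per year.
    def cost_per_unit(self, year: int) -> float:
        return 1_000.0 * (0.8 ** (year - 2024))

def expected_miners(token_price: float, annual_reward_tokens: float,
                    costs: HardwareCostModel, year: int) -> int:
    # Team B reasons only against the interface, never against Team A's internals:
    # miners keep joining while the reward budget covers another unit of hardware.
    revenue = token_price * annual_reward_tokens
    return int(revenue // costs.cost_per_unit(year))

print(expected_miners(token_price=2.0, annual_reward_tokens=500_000.0,
                      costs=DecliningCostModel(), year=2026))
```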

At the application layer, clear abstraction barriers are essential for enabling composability. The ability to adapt, build on top of, extend, and remix will only grow more important as more protocols compose with each other. Greater composition leads to greater possibility but also greater complexity. When applications want to compose, they have to understand the details of the protocol they compose with.

Opaque assumptions and interfaces have occasionally led to obscure bugs, particularly in early DeFi protocols. Murky abstraction barriers also extend development times by increasing the required communication between teams working on different components of the protocol. Murky abstraction barriers also increase the complexity of the protocol overall, making it difficult for any single person to fully understand the mechanism.

By creating clear abstraction barriers, token designers are making it easier to anticipate how specific changes will affect each part of the token design. Clear abstraction barriers also make it easier to extend one’s token or protocol, and to create a more inclusive and expansive community of builders.

#6 Reduce dependence on exogenous parameters

Exogenous parameters that are not inherent to the system, but that affect overall performance and success — such as the cost, throughput, or latency of a computing resource — are often used in the creation of token models.

Dangerously, unexpected behavior can arise when a token model only functions while a parameter remains in a limited range. For example, consider a protocol that sells a service and provides a rebate in the form of a fixed token reward: If the price of the token is unexpectedly high, the value of the token reward could be greater than the cost of the service. In this case it’s profitable to purchase an infinite amount of service from the protocol, leading to either the reward being exhausted or the service being fully utilized.
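The failure condition in that example is simple arithmetic, so it is worth writing down explicitly (the numbers below are hypothetical): the model only holds while the market value of the fixed reward stays below the cost of the service.

```python
# Hypothetical numbers: a $10 service that rebates a fixed 5 tokens per purchase.
def rebate_is_exploitable(service_cost_usd: float, reward_tokens: float,
                          token_price_usd: float) -> bool:
    # If the rebate is worth more than the service, buying the service is a money pump.
    return reward_tokens * token_price_usd > service_cost_usd

print(rebate_is_exploitable(10.0, 5.0, token_price_usd=1.0))  # False: the model holds
print(rebate_is_exploitable(10.0, 5.0, token_price_usd=3.0))  # True: infinite demand
```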

Or to take another example: Decentralized networks are often reliant on cryptographic or computational puzzles that are very hard, but not impossible, to solve. The difficulty of these puzzles is generally dependent on an exogenous variable — like how fast a computer can compute a hash function or a zero knowledge proof. Imagine a protocol that makes an assumption about how fast it’s possible to compute a given hash function and pays out token rewards accordingly. If someone invents a new way to compute that hash function more quickly, or simply has outsized resources to throw at the problem disproportionate to their actual work in the system, they can earn an unexpectedly large token reward.
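The same pattern shows up in the puzzle example. If rewards are calibrated to an assumed solving speed, a participant whose effective speed exceeds that assumption captures an outsized share of the budget; a minimal, hypothetical sketch:

```python
# Hypothetical calibration: rewards assume a solver computes ~1M hashes per second,
# and the protocol budgets 10 tokens per day per expected participant.
ASSUMED_HASHES_PER_SEC = 1_000_000
BUDGETED_TOKENS_PER_DAY = 10.0

def expected_daily_reward(actual_hashes_per_sec: float) -> float:
    # Rewards scale with how far reality outruns the calibration assumption.
    return BUDGETED_TOKENS_PER_DAY * (actual_hashes_per_sec / ASSUMED_HASHES_PER_SEC)

print(expected_daily_reward(1_000_000))   # 10.0  -- in line with the assumption
print(expected_daily_reward(50_000_000))  # 500.0 -- a faster solver blows the budget
```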

#7 Re-validate assumptions

Designing a token should be approached like designing an adversarial system. Assume Byzantine behavior. Users’ behavior will change with changes to how the token works.

A common mistake is to adjust one’s token model without ensuring that arbitrary user behavior still results in an acceptable outcome. Do not assume user behavior will remain constant across variations in your token model. Usually this mistake happens late in the design process: Someone has spent a lot of time defining the objectives for the token, defining its function, and validating that it works as intended. They then identify an edge case and shift the token design to accommodate it… but forget to re-validate the overall token model. By fixing one edge case, they create another (or several other) unintended consequences.

Don’t let the hard work be for naught: Anytime a project changes its token model, re-validate that it works as intended.
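One lightweight way to make re-validation routine (a sketch, reusing the hypothetical rebate example from the previous section): encode the model’s invariants as tests that sweep the relevant parameters, and re-run them on every change to the token model.

```python
# Re-run invariant checks across a range of conditions after *every* model change.
def rebate_tokens(service_cost_usd: float, token_price_usd: float) -> float:
    # Revised (hypothetical) model: rebate 20% of the cost, denominated in tokens
    # at the current price, rather than a fixed token amount.
    return 0.2 * service_cost_usd / token_price_usd

def test_no_money_pump() -> None:
    for token_price_usd in (0.01, 0.1, 1.0, 10.0, 100.0):  # sweep the exogenous parameter
        rebate_value = rebate_tokens(10.0, token_price_usd) * token_price_usd
        assert rebate_value < 10.0, f"money pump at token price ${token_price_usd}"

test_no_money_pump()
print("invariant holds across the sweep")
```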

*   *   *

If you find a better way to design tokens or like to think creatively and solve problems, I’d like to talk with you.


A big thank you to Scott Kominers, Tim Roughgarden, Sam Ragsdale, Ali Yahya, Eddy Lazzarin, Elena Burger, Carra Wu, Michael Blau, and especially to Sonal Chokshi and Stephanie Zinn for their feedback on this piece.

Editors: Sonal Chokshi & Steph Zinn

The views expressed here are those of the individual AH Capital Management, L.L.C. (“a16z”) personnel quoted and are not the views of a16z or its affiliates. Certain information contained in here has been obtained from third-party sources, including from portfolio companies of funds managed by a16z. While taken from sources believed to be reliable, a16z has not independently verified such information and makes no representations about the current or enduring accuracy of the information or its appropriateness for a given situation. In addition, this content may include third-party advertisements; a16z has not reviewed such advertisements and does not endorse any advertising content contained therein.

This content is provided for informational purposes only, and should not be relied upon as legal, business, investment, or tax advice. You should consult your own advisers as to those matters. References to any securities or digital assets are for illustrative purposes only, and do not constitute an investment recommendation or offer to provide investment advisory services. Furthermore, this content is not directed at nor intended for use by any investors or prospective investors, and may not under any circumstances be relied upon when making a decision to invest in any fund managed by a16z. (An offering to invest in an a16z fund will be made only by the private placement memorandum, subscription agreement, and other relevant documentation of any such fund and should be read in their entirety.) Any investments or portfolio companies mentioned, referred to, or described are not representative of all investments in vehicles managed by a16z, and there can be no assurance that the investments will be profitable or that other investments made in the future will have similar characteristics or results. A list of investments made by funds managed by Andreessen Horowitz (excluding investments for which the issuer has not provided permission for a16z to disclose publicly as well as unannounced investments in publicly traded digital assets) is available at https://a16z.com/investments/.

Charts and graphs provided within are for informational purposes solely and should not be relied upon when making any investment decision. Past performance is not indicative of future results. The content speaks only as of the date indicated. Any projections, estimates, forecasts, targets, prospects, and/or opinions expressed in these materials are subject to change without notice and may differ or be contrary to opinions expressed by others. Please see https://a16z.com/disclosures for additional important information.

 

Building Magi: A new rollup client for Optimism

https://a16zcrypto.com/content/article/building-magi-a-new-rollup-client-for-optimism/

Decentralization is crypto’s defining promise: Networks that have no single point of failure are more secure and more resilient than their centralized counterparts. In practice, we often see emerging points of centralization as potentially vulnerable – their failures can compromise an entire network. Ethereum’s switch to Proof of Stake, for example, drew attention to the potential risk of relying on a single client at any layer. And fortunately, the openness of the ecosystem, along with a push for greater client diversity, has resulted in more clients and an increasingly secure network.

This effort isn’t limited to Ethereum L1. Just as client diversity is important for Ethereum, it’s critically important for rollups. Multiple independent client implementations can help ensure the safety and liveness of the network. They’re also a whole lot of fun to work on (thanks to some very detailed specifications).

That’s why we’re excited to take our very first step into the Optimism Collective by releasing Magi, a blazing fast OP Stack rollup client written in Rust. Magi acts as the consensus client (often called a rollup client in the context of the OP Stack) in the traditional execution/consensus split of Ethereum – it feeds new blocks to the execution client in order to advance the chain. Magi performs the same core functionality as the reference implementation (op-node) and works alongside an execution node (such as op-geth) to sync to any OP Stack chain, including Optimism and Base.
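For readers less familiar with that split, here is a deliberately simplified, self-contained sketch of the control flow a rollup client drives. It is illustrative Python rather than Magi’s actual Rust, and the class and function names are hypothetical; the real OP Stack derivation pipeline and Engine API exchange are far more involved.

```python
# Illustrative only: a rollup (consensus) client derives L2 payloads from L1 data
# and feeds them to an execution client, which executes them and advances the head.

class MockExecutionClient:
    """Stands in for an execution client such as op-geth."""
    def __init__(self):
        self.head = 0

    def new_payload(self, payload):        # analogous in spirit to engine_newPayload
        print(f"executing L2 block {payload['number']}")

    def forkchoice_updated(self, head):    # analogous in spirit to engine_forkchoiceUpdated
        self.head = head
        print(f"head is now L2 block {head}")

def derive_l2_payloads(l1_block_number):
    # Hypothetical derivation step: in the OP Stack, L2 blocks are derived from
    # batch data and deposits posted to L1.
    return [{"number": l1_block_number * 2 + i} for i in range(2)]

def drive_chain(l1_blocks, execution_client):
    # The rollup client's core loop: derive payloads, then feed the execution client.
    for l1_block_number in l1_blocks:
        for payload in derive_l2_payloads(l1_block_number):
            execution_client.new_payload(payload)
            execution_client.forkchoice_updated(payload["number"])

drive_chain(l1_blocks=range(3), execution_client=MockExecutionClient())
```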

In this post, we share why we built Magi, and what’s in store for the future. Magi is still in its very early stages, so we welcome your ideas, feedback, and contributions, as we shape it into a production-ready client.

Bringing diversity to rollup clients

Client diversity is needed on both the execution and consensus sides, but most of the development has so far been focused on execution clients.

Any existing Ethereum execution client can be compatible with Optimism by implementing a modest set of changes, and several new projects are already adapting these clients to help achieve client diversity. OP Labs modified Geth to build op-geth; and other groups are currently building op-erigon and op-reth.

This is tougher on the rollup client side, since the rollup client is a brand new piece of software. So far, just one implementation exists: op-node, which is maintained by OP Labs and written in Go. Magi aims to be an independently developed, drop-in replacement for op-node, adding to the rollup’s client diversity. We hope that building out this new, Rust-based client will encourage greater safety and liveness throughout the OP Stack, and bring more contributors into the ecosystem.

What’s next: Welcoming contributors for new features, fixes, and more

Magi is very new, and likely months of development away from being a feasible alternative to op-node. Some of the features and improvements we plan to add in the near future:

  • Tracking the unsafe head (unconfirmed blocks) to lower latency.
  • New sync mechanisms to improve initial sync speed.
  • Alternative data availability layer support.
  • Better frameworks for testing Magi, op-node, and any future clients.

There’s a long way to go, but we’re committed to getting there. We would greatly appreciate contributors, so check out the client or reach out with your ideas, feedback, and code – we’re excited to work together to keep moving Magi and the OP Stack ecosystem forward.

Acknowledgments

None of this would be possible without the hard work of the OP Labs team: I want to thank Refcell, who’s been contributing to Magi since nearly the beginning, as well as Protolambda, Joshua, and Vex, who have provided valuable help in understanding some of the complexities of the Bedrock specification.