Inside Lightning Cat: A Generative AI Framework for Smart Contract Audits

One of the most comprehensive studies of applying generative AI to smart contract security.

Created Using DALL-E

The rapid rise of generative AI has sparked the imagination of web3 developers about its many applications to the blockchain space. Smart contract audits via AI agents are among the most prominent use cases mentioned at the intersection of AI and web3. Recently, a group of AI researchers published a paper in Nature detailing Lightning Cat, a framework that uses generative AI models for smart contract audits.

Traditional threat detection methods for smart contracts include manual reviews, static analysis, fuzz testing, and formal verification. Tools such as Oyente, Mythril, Securify, Slither, and Smartcheck are widely used for this purpose. They scan contract code for common security flaws like reentrancy issues, authorization errors using tx.origin, dependencies on timestamps, and unhandled exceptions. Yet, these tools are not foolproof, often generating false positives or missing vulnerabilities due to their reliance on preset rules and a limited understanding of complex code.

Lightning Cat relies on generative AI to enhance smart contract vulnerability detection. It incorporates three advanced deep learning models: an optimized version of CodeBERT, an Optimized-LSTM, and an Optimized-CNN. These models are specifically trained to identify vulnerabilities within smart contracts. The process involves analyzing code snippets containing vulnerabilities to pinpoint critical features.

The CodeBERT model excels in its ability to understand the nuances of programming languages. It bridges the gap between natural language and programming syntax, showing significant promise in detecting software vulnerabilities. In contrast to other models like Word2Vec, FastText, and GloVe, CodeBERT has shown higher accuracy rates in this field. For smart contracts written in Solidity, an optimized version of CodeBERT is employed in this research. Alongside, CNN and LSTM models, known for their proficiency in processing text and image data and their capability to handle long text sequences, are used for comparison. Previous studies have demonstrated their effectiveness in identifying code vulnerabilities.

Lightning Cat’s development process comprises three key stages. The initial phase involves compiling and preparing a dataset of vulnerable Solidity code. The next stage is dedicated to training the three models and comparing their effectiveness. The final stage tests the chosen model against the SolidiFI-benchmark dataset to evaluate its ability to accurately detect vulnerabilities in smart contracts.

The Data Collection Process

The data used in this study is derived from three primary sources, combining to form a comprehensive training set. This dataset includes 10,000 contracts from the Slither Audited Smart Contracts Dataset, 20,000 from smartbugs-wild, and an additional 1,000 contracts known for their vulnerabilities as identified by expert audits. In total, the dataset encompasses 31,000 smart contracts.

In processing this data, the research focuses on the SolidiFI-benchmark test set, which includes three static detection tools: Slither, Mythril, and Smartcheck. This set also covers four prevalent types of vulnerabilities found in smart contracts: Re-entrancy, Timestamp-Dependency, Unhandled-Exception, and tx.origin.

One challenge in handling this data is the variable length of smart contracts, which is often influenced by their complexity and functionality. Some of the more complex contracts can span several thousand tokens.

To manage this variability, the dataset is segmented into smaller parts. The approach involves dividing the data into blocks of 510 tokens each. These segments are then uniformly labeled. For instance, if a section of code demonstrating a Re-entrancy vulnerability is 2,000 tokens long, it is split into four blocks of 510 tokens each, with the final block padded to length.
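The paper's exact preprocessing pipeline isn't reproduced in this article, but the segmentation step described above can be sketched as follows (the helper name and padding token are illustrative assumptions, not the authors' code):

```python
# Minimal sketch of the segmentation step described above: split a
# tokenized contract into fixed-size blocks of 510 tokens and assign
# every block the label of the whole vulnerable snippet.

BLOCK_SIZE = 510  # leaves room for BERT-style [CLS]/[SEP] specials in a 512 window

def segment_tokens(tokens, label, block_size=BLOCK_SIZE):
    """Split a token list into uniformly labeled, fixed-size blocks."""
    blocks = []
    for start in range(0, len(tokens), block_size):
        block = tokens[start:start + block_size]
        # pad the final block so every sample has the same length
        block = block + ["<pad>"] * (block_size - len(block))
        blocks.append((block, label))
    return blocks

# a 2,000-token re-entrancy snippet becomes four uniformly labeled blocks
samples = segment_tokens(["tok"] * 2000, "re-entrancy")
```

With 2,000 input tokens this yields four blocks, the last one carrying 470 real tokens plus padding.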

The Models

The current phase of this research involves the application of three distinct machine learning models: Optimized-CodeBERT, Optimized-LSTM, and Optimized-CNN. The CodeBERT model has been specifically adjusted to better suit the task of detecting vulnerabilities in smart contracts. It takes preprocessed input IDs and attention masks as its input. On the other hand, the Optimized-LSTM and Optimized-CNN models do not use the CodeBERT model for data preprocessing.

The first model, Optimized-CodeBERT, leverages the Transformer model to learn representations in code-related tasks. This study focuses on adapting CodeBERT for smart contract vulnerability detection. Based on the Transformer architecture, which includes multiple encoder layers, the model processes input data through an embedding stage before it reaches these encoders. After encoding, fully connected layers are added for classification.
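The paper's model weights aren't discussed here, but purely as an illustration of the "encoder output, then fully connected layers for classification" structure described above, a toy sketch might look like this (the pooling choice, dimensions, and random weights are all assumptions for demonstration):

```python
import numpy as np

# Illustrative-only sketch of the classification stage: take encoder
# hidden states, pool them using the attention mask, and run a fully
# connected layer over the pooled vector to score vulnerability classes.

rng = np.random.default_rng(0)

def classify(hidden_states, attention_mask, W, b):
    """hidden_states: (seq_len, dim) encoder outputs;
    attention_mask: (seq_len,) with 1 for real tokens, 0 for padding."""
    mask = attention_mask[:, None]                       # (seq_len, 1)
    pooled = (hidden_states * mask).sum(0) / mask.sum()  # masked mean pooling
    logits = pooled @ W + b                              # fully connected layer
    exp = np.exp(logits - logits.max())                  # numerically stable softmax
    return exp / exp.sum()

seq_len, dim, n_classes = 510, 768, 4   # four vulnerability classes
hidden = rng.normal(size=(seq_len, dim))
mask = np.ones(seq_len)
mask[470:] = 0                          # pretend the last 40 tokens are padding
W = rng.normal(size=(dim, n_classes)) * 0.01
b = np.zeros(n_classes)
probs = classify(hidden, mask, W, b)
```

In the real model the hidden states would come from CodeBERT's Transformer encoder rather than random numbers; the sketch only shows how the classification head sits on top.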

The second model, Optimized-LSTM, is adept at handling sequential data, recognizing temporal dependencies and syntactic-semantic information. For detecting vulnerabilities in smart contracts, this model serializes Solidity source code, taking into account the order of statements and function calls. It understands the code’s syntax, semantics, and dependencies, providing insight into its logical structure and flow. The Optimized-LSTM model, with its gated cell mechanism, effectively addresses the challenges of vanishing or exploding gradients in long sequences, a common issue in traditional RNNs.

Finally, the third model, Optimized-CNN, is a convolutional neural network well-suited for processing two-dimensional data. In this case, the code token sequence is transformed into a matrix format. The CNN efficiently extracts local features and captures the spatial structure of the code, including syntax, relationships between code blocks, and key patterns.

The Results

The provided figure offers a comparative analysis of the recall results from different classification models. These results measure each model’s ability to correctly identify true positive samples. The comparison includes six methods: Mythril, Smartcheck, Slither, Optimized-CodeBERT, Optimized-LSTM, and Optimized-CNN. Among these, the Optimized-CodeBERT model stands out with the highest recall rate of 93.55%, which is 11.85% higher than that of Slither. This superior recall rate underlines the Optimized-CodeBERT model’s effectiveness and reliability in accurately detecting true positive samples.

In contrast, the Optimized-LSTM and Optimized-CNN models show lower recall rates, at 64.06% and 71.36% respectively. This indicates that they might face challenges or have limitations in consistently recognizing true positive samples.

Significantly, the Optimized-CodeBERT model also excels over traditional static detection tools. It achieves an impressive F1-score of 93.53%, demonstrating its strong ability to understand both the syntax and semantics of the code. This performance solidifies its position as an effective tool for auditing blockchain code.

Inside Lightning Cat: A Generative AI Framework for Smart Contract Audits was originally published in IntoTheBlock on Medium, where people are continuing the conversation by highlighting and responding to this story.

Meet ITB Perspectives: Our Most Ambitious Crypto Analytics Release

Highly curated interactive research dashboards that provide insights about the latest trends in the crypto market.

Image Credit: IntoTheBlock

Today, IntoTheBlock (ITB) is announcing the general availability of Perspectives, arguably the most important release of our analytics suite. ITB has been one of the pioneers in the digital assets analytics space, powering hundreds of organizations in the crypto market to help both retail and institutional investors become more informed. However, despite our progress, we’ve regularly struggled with the fundamental mismatch between the rapid pace of the crypto market and the challenges of providing rapid, actionable, easy-to-understand, data-centric perspectives on current trends.

Let me explain. Suppose you are trying to understand a specific trend in crypto, such as memecoins or Ethereum L2 activity. Typically, you’ll pull data from multiple sources, consult various blogs, or, if you’re lucky, use dashboards created by third parties, which are difficult to evaluate in terms of their quality and maintenance over time. We experience this friction within our organization. When discussing a market thesis in our research calls about a specific trend, it can take us days to formulate a data-centric perspective on the argument. This contrasts with our institutional DeFi division, where a trading thesis needs to be validated in a matter of hours to capitalize on the market opportunity.

Essentially, in the current crypto analytics landscape, there’s a fundamental challenge in finding high-quality, actionable analytics that reflect current market trends and are trustworthy.

The Five Pillars of Effective Crypto Analytics

What differentiates great crypto analytics from mediocre ones? When it comes to crypto, the answer to this question is far from straightforward. From my experience, there are some fundamental characteristics of highly effective crypto analytics:

  1. Current: Effective crypto analytics should mirror present market trends.
  2. Deep: While simplicity is commendable, the reality is that most crucial insights in crypto markets aren’t immediately apparent and demand detailed analysis.
  3. Thematic: The most effective analyses in crypto markets are thesis-driven rather than asset-driven. This is because most significant movements in crypto markets are propelled by trends (e.g., DeFi, NFTs, ZKs) rather than individual assets.
  4. Path to Actionability: The best analytics should pave the way for an investment decision. This doesn’t necessarily mean they have to serve as a trading signal, but they should offer insights that can be converted into actionable market trends.
  5. Trusted Sources: In my experience, the most effective crypto analytics originate from research teams that can robustly support their thesis, version it, and maintain its relevancy over time.

ITB Perspectives is our initial step toward reinforcing these pillars by offering a meticulously curated catalog of dashboards that shed light on current trends in the crypto market.

ITB Perspectives

ITB Perspectives is a suite of analytic dashboards that offer insights into the latest trends in the crypto market. The name originates from the solution’s ability to allow the ITB research team to formulate a thesis, source the data, articulate their analytical perspective, and transition it to production in an entirely self-service manner. This process enables us to move from research to production in just hours. As a result, every few days you will discover new perspectives offering a sophisticated, research-first, actionable view of the prevailing trends in the crypto market.

The inaugural catalog of Perspectives is truly noteworthy, encompassing analyses of trends such as L2s, stablecoins, memecoins, and many more.

Image Credit: IntoTheBlock

Each perspective is presented as a dashboard that consists of a textual description accompanied by a series of charts that shed light on that particular trend. Consider, for instance, the following dashboard centered on stablecoin activity:

Image Credit: IntoTheBlock

Why Should You Use ITB Perspectives?

Before developing Perspectives, we engaged in extensive discussions about whether its value proposition was sufficiently distinct in a market teeming with various analytic solutions, including those from ITB. Perspectives emerged from our own challenges in swiftly formulating actionable theses about diverse market trends. In my view, there are several distinguishing factors that set Perspectives apart:

  • Depth: The insights within the Perspectives catalog are directly derived from the ITB research team, and soon, from other research entities as well.
  • Current: Perspectives dashboards accurately represent the ongoing trends in the crypto market.
  • Growing Catalog: Every few days, new dashboards are introduced to the catalog.
  • Trusted: The ITB research team consistently reviews and refines the various dashboards.
  • Simple to Understand: It’s not a paradox — while our goal is in-depth analysis, we strive to ensure that the visualizations are straightforward and accompanied by clarifications, enabling investors to grasp the various perspectives.

Perspectives ranks as one of our most audacious analytic launches to date. It represents our inaugural effort to bridge the gap between research insights and actionable analytics for investors of all magnitudes. We invite you to explore it and welcome your feedback.

Meet ITB Perspectives: Our Most Ambitious Crypto Analytics Release was originally published in IntoTheBlock on Medium, where people are continuing the conversation by highlighting and responding to this story.

Why Celestia is a Blockchain You Should Know About: Part I

One of the most innovative blockchain runtimes ever created.

Created Using Midjourney

The space of blockchain runtimes hasn’t stopped innovating during the bear market. We have seen relative success with trends such as L2s that improve the scalability of the Ethereum ecosystem, zk-blockchains that make privacy and scalability core building blocks, or even completely new ecosystems such as the Move blockchains. Seeing this trend, two key questions jump off the page:

1) How many blockchains are going to be relevant to enable the mainstream adoption of Web3 architectures?

2) What is the next fundamental architecture improvement in blockchain runtimes?

The two points are somewhat related, as we need to keep evolving the core architecture of blockchains to unlock new use cases. Of the newcomer projects I have seen in the market, Celestia has been one of the most fascinating in terms of the boldness of its vision and the quality of its execution. In principle, Celestia changes the core architecture of blockchains by decoupling the consensus and execution layers, making the stack more modular.

When it comes to blockchain runtimes, the pioneering monolithic approach was the initial blueprint for crafting these digital ledgers. It championed the idea that a blockchain should serve as a Swiss Army knife, capable of executing myriad tasks. This all-encompassing approach includes transaction processing, verification of correctness, and, of course, consensus. However, this monolithic architecture ushered in its own set of challenges, most notably concerning scalability while maintaining the sacred tenet of decentralization.

The Modular Blockchain Approach

Contrasting with the traditional blockchain architectures, modular blockchains represent a paradigm shift away from the monolithic norm. Instead of expecting a single blockchain to bear the weight of all responsibilities, modular blockchains are designed to specialize in specific functions. Notably, these forward-thinking modular systems introduce the concept of disentangling consensus from transaction execution. In practical terms, one blockchain focuses on executing transactions, while another takes on the mantle of consensus.

Monolithic blockchains grapple with a host of issues stemming from their catch-all nature:

1. Demanding Hardware: Monolithic chains can indeed scale up their transaction throughput, but at a considerable hardware cost. This elevated demand for processing power imposes a significant burden on network nodes.

2. Validator Bootstrapping: Introducing a new monolithic blockchain necessitates the cumbersome process of bootstrapping a secure set of validators, adding to the complexity of maintaining a reliable consensus network.

3. Restricted Autonomy: Applications deployed on monolithic chains must adhere to the predefined rules governing the chain itself. These rules extend to programming models, forking capabilities, and adherence to the prevailing community culture, among other constraints.

Enter Celestia

Enter modular blockchains, offering a solution by disentangling these functions across a multi-layered modular framework. This separation of concerns within the stack unleashes newfound flexibility, allowing for various configurations. For instance, one plausible arrangement splits the four core functions (execution, settlement, consensus, and data availability) across three distinct layers.

The foundation layer, encompassing Data Availability (DA) and consensus, aptly earns its title as the “Consensus and DA Layer” (or simply, the DA layer). Meanwhile, transaction settlement and execution each find their own dedicated layers higher up in the hierarchy. This approach empowers each layer to specialize in performing its core function optimally, thereby augmenting the system’s overall throughput. Furthermore, this modular model facilitates the integration of multiple execution layers, such as rollups, which can leverage the same settlement and DA layers.

Celestia’s Data Availability Layer

Celestia shines as a pioneering data availability (DA) layer, providing a scalable antidote to the data availability conundrum. Given the permissionless nature of blockchain networks, Celestia’s DA layer must furnish a mechanism for the execution and settlement layers to verify transaction data availability in a trust-minimized fashion.

Two pivotal features underpin Celestia’s DA layer: Data Availability Sampling (DAS) and Namespaced Merkle Trees (NMTs). DAS empowers lightweight nodes to verify data availability without the onerous task of downloading entire blocks, while NMTs enable the execution and settlement layers of Celestia to selectively access transactions that pertain solely to their operations.
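A back-of-envelope sketch helps show why DAS lets light nodes gain confidence without downloading full blocks. Assuming 2x erasure coding (a standard DAS assumption, not a parameter stated in this article), a block can be reconstructed from any half of its chunks, so a withholding attacker must hide more than 50% of them, and each uniformly random sample then hits a missing chunk with probability above one half:

```python
# Toy illustration of Data Availability Sampling (illustrative numbers,
# not Celestia's actual parameters). Each independent random chunk query
# misses the withheld data with probability (1 - withheld_fraction), so
# the chance an attacker goes undetected decays exponentially in the
# number of samples.

def undetected_probability(samples: int, withheld_fraction: float = 0.5) -> float:
    """Chance that `samples` independent random chunk queries all succeed
    even though `withheld_fraction` of the chunks are being withheld."""
    return (1 - withheld_fraction) ** samples

# After 20 samples, a light node is fooled with probability below one in a million.
p = undetected_probability(20)
```

This exponential decay is what makes sampling practical: a handful of kilobyte-sized queries substitutes for downloading the entire block.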

We will continue diving into Celestia’s components in future posts.

Why Celestia is a Blockchain You Should Know About: Part I was originally published in IntoTheBlock on Medium, where people are continuing the conversation by highlighting and responding to this story.

AI Should Be Decentralized, But How?

The intersection of Web3 and artificial intelligence (AI), specifically in the form of generative AI, has become one of the hottest topics of debate within the crypto community. After all, generative AI is revolutionizing all areas of traditional software stacks, and Web3 is no exception. Given that decentralization is the core value proposition of Web3, many of the emergent Web3-generative-AI projects and scenarios project some form of decentralized generative AI value proposition.

Jesus Rodriguez is the CEO of IntoTheBlock.

In Web3, we have a long history of looking at every domain through a decentralization lens, but the reality is that not all domains can benefit from decentralization, and for every domain, there is a spectrum of decentralization scenarios. Breaking down that idea from a first principles standpoint leads us to three key questions:

  1. Does generative AI deserve to be decentralized?

  2. Why hasn’t decentralized AI worked at scale before, and what’s different with generative AI?

  3. What are the different dimensions of decentralization in generative AI?

These questions are far from trivial, and each one can spark passionate debates. However, I believe that thinking through these questions is essential to develop a comprehensive thesis about the opportunities and challenges at the intersection of Web3 and generative AI.

Does AI Deserve to be Decentralized?

The philosophical case for decentralizing AI is simple. AI is digital knowledge, and knowledge might be the number one construct of the digital world that deserves to be decentralized. Throughout the history of Web3, we have made many attempts to decentralize things that work extremely well in a centralized architecture, and where decentralization didn’t provide obvious benefits. Knowledge, by contrast, is one of the natural candidates for decentralization from both the technical and economic standpoints.

The level of control being accumulated by the big AI providers is creating a massive gap with the rest of the competition to the point that it is becoming scary. AI does not evolve linearly or even exponentially; it follows a multi-exponential curve.

GPT-4 represents a massive improvement over GPT-3.5 across many dimensions, and that trajectory is likely to continue. At some point, it becomes unfeasible to try to compete with centralized AI providers. A well-designed decentralized network model could enable an ecosystem in which different parties collaborate to improve the quality of models, which enables democratic access to knowledge and sharing of the benefits.

Transparency is the second factor that can be considered when evaluating the merits of decentralization in AI. Foundation model architectures involve millions of interconnected neurons across several layers, making it impractical to understand using traditional monitoring practices. Nobody really understands what happens inside GPT-4, and OpenAI has no incentives to be more transparent in that area. Decentralized AI networks could enable open testing benchmarks and guardrails that provide visibility into the functioning of foundation models without requiring trust in a specific provider.

Why Hasn’t Decentralized AI Worked Until Now?

If the case for decentralized AI is so clear, then why haven’t we seen any successful attempts in this area? After all, decentralized AI is not a new idea, and many of its principles date back to the early 1990s. Without getting into technicalities, the main reason for the lack of success of decentralized AI approaches is that the value proposition was questionable at best.

Before large foundation models came into the scene, the dominant architecture paradigm was different forms of supervised learning that required highly curated and labeled datasets, which resided mostly within corporate boundaries. Additionally, the models were small enough to be easily interpretable using mainstream tools. Finally, the case for control was also very weak, as no models were strong enough to cause any level of concern.

In a somewhat paradoxical twist, the prominence of large-scale generative AI and foundation models in a centralized manner helped make the case for decentralized AI viable for the first time in history.

Now that we understand that AI deserves to be decentralized and that this time is somewhat different from previous attempts, we can start thinking about which specific elements require decentralization.

The Dimensions of Decentralization in AI

When it comes to generative AI, there is no single approach to decentralization. Instead, decentralization should be considered in the context of the different phases of the lifecycle of foundation models. Here are three main stages in the operational lifespan of foundation models that are relevant to decentralization:

  1. Pre-training is the stage in which a model is trained on large volumes of unlabeled and labeled datasets.

  2. Fine-tuning, which is typically optional, is the phase in which a model is “retrained” on domain-specific datasets to optimize its performance on different tasks.

  3. Inference is the stage in which a model outputs predictions based on specific inputs.

Throughout these three phases, there are different dimensions that are good candidates for decentralization.

The Compute Decentralization Dimension

Decentralized computing can be incredibly relevant during pre-training and finetuning and may be less relevant during inference. Foundation models notoriously require large cycles of GPU compute, which are typically executed in centralized data centers. The notion of a decentralized GPU compute network in which different parties can supply compute for the pre-training and finetuning of models could help remove the control that large cloud providers have over the creation of foundation models.

The Data Decentralization Dimension

Data decentralization could play an incredibly important role during the pre-training and fine-tuning phases. Currently, there is very little transparency around the concrete composition of datasets used to pretrain and finetune foundation models. A decentralized data network could incentivize different parties to supply datasets with appropriate disclosures and track their usage in pretraining and fine-tuning foundation models.

The Optimization Decentralization Dimension

Many phases during the lifecycle of foundation models require validations, often in the form of human intervention. Notably, techniques such as reinforcement learning from human feedback (RLHF) enabled the transition from GPT-3 to ChatGPT by having humans validate the outputs of the model to provide better alignment with human interests. This level of validation is particularly relevant during the fine-tuning phases, and currently, there is very little transparency around it. A decentralized network of human and AI validators that perform specific tasks, whose results are immediately traceable, could be a significant improvement in this area.

The Evaluation Decentralization Dimension

If I were to ask you to select the best language model for a specific task, you would have to guess the answer. AI benchmarks are fundamentally broken, there is very little transparency around them, and they require quite a bit of trust in the parties who created them. Decentralizing the evaluation of foundation models for different tasks is an incredibly important task to increase transparency in the space. This dimension is particularly relevant during the inference phase.

The Model Execution Decentralization Dimension

Finally, the most obvious area of decentralization. Using foundation models today requires trust in infrastructures controlled by a centralized party. Providing a network in which inference workloads can be distributed across different parties is quite an interesting challenge that can bring a tremendous amount of value to the adoption of foundation models.

Foundation models propelled AI to mainstream adoption and also accelerated all the challenges that come with the rapidly increasing capabilities of these models. Among these challenges, the case for decentralization has never been stronger.

Digital knowledge deserves to be decentralized across all its dimensions: data, compute, optimization, evaluation, and execution. No centralized entity deserves to have that much power over the future of intelligence. The case for decentralized AI is clear, but the technical challenges are tremendous. Decentralizing AI is going to require more than one technical breakthrough, but the goal is certainly achievable. In the era of foundation models, decentralized AI is the right way to approach AI.

Inside Privacy Pools: Vitalik Buterin’s New Proposal to Balance Blockchain Privacy and Regulatory Compliance

The protocol combines some of the ideas of Tornado Cash with zk-SNARKs to provide a native, non-invasive approach to enable regulatory checkpoints in blockchain protocols.

Created Using Midjourney

The friction between the anonymity of blockchain protocols, especially DeFi, and regulatory requirements has been one of those endless debates in the digital assets space. There is very little doubt that some level of regulatory checkpoints in the form of KYC/AML is needed to unlock the mainstream adoption of crypto. By the same token, privacy and anonymity are some of the most desired properties at the core of the crypto ethos.

One of the most creative angles of the privacy-vs-regulation debate is to explore novel ways to enable regulatory checkpoints without the need for brute-force KYC/AML. A few months ago, I wrote an article about these ideas in CoinDesk. Recently, Ethereum creator Vitalik Buterin collaborated with early Tornado Cash contributor Ameen Soleimani, Chainalysis’ Jacob Illum, and two researchers from the University of Basel on a paper that outlines an interesting new idea for balancing privacy and regulatory compliance in blockchain protocols. Under the catchy name of Privacy Pools, the approach borrows a few ideas from Tornado Cash and combines them with zk-SNARKs into a very intuitive protocol.

Let’s explore.

Inside Privacy Pools

At its essence, Privacy Pools operate based on the principle of enabling a user to demonstrate membership in a predetermined association set rather than simply presenting zero-knowledge evidence of a link between a withdrawal and a prior deposit.

The conceptual framework of these pools encompasses various forms of association sets: from the entirety of previous deposits to an isolated, user-specific deposit, or any intermediary range. Crucially, the chosen set is indicated through the public input of a Merkle root.

In the interest of clarity, it’s important to note that a direct proof verifying that the association set is a genuine subset of preceding deposits isn’t provided. Instead, two zero-knowledge proofs of Merkle branches are required, both of which use the same coin ID as a leaf:

I. A Merkle branch tied to R, signifying the complete set of coin IDs.

II. A Merkle branch connected to the provided association set root, RA.

The underlying intention mandates the entire association set to be accessible, potentially on-chain or through an alternative medium. This foundational principle enables users to offer a potential source range for their funds without stipulating a precise deposit or, at the opposite end, without presenting any information except a non-double-spending proof.
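The two-branch construction above can be sketched in plain code. In the real protocol these branches are verified inside a zk-SNARK so the coin ID stays hidden; the sketch below drops the zero-knowledge layer entirely and just shows the two Merkle memberships over the same leaf (the tree construction and participant names are illustrative assumptions):

```python
import hashlib

# Toy sketch of the two Merkle membership proofs described above:
# one branch against R (root of all coin IDs) and one against RA
# (root of the user's chosen association set), both for the SAME leaf.

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def _pad(level):
    # duplicate the last node on odd-sized levels
    return level + [level[-1]] if len(level) % 2 else level

def merkle_root(leaves):
    level = [h(x) for x in leaves]
    while len(level) > 1:
        level = _pad(level)
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def merkle_branch(leaves, index):
    """Return the sibling path for leaves[index]."""
    level, path = [h(x) for x in leaves], []
    while len(level) > 1:
        level = _pad(level)
        sibling = index + 1 if index % 2 == 0 else index - 1
        path.append((level[sibling], index % 2 == 0))
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return path

def verify(leaf, path, root):
    node = h(leaf)
    for sibling, leaf_on_left in path:
        node = h(node + sibling) if leaf_on_left else h(sibling + node)
    return node == root

deposits = [b"alice", b"bob", b"carl", b"david", b"eve"]   # all coin IDs (set R)
association = [b"alice", b"bob", b"carl", b"david"]        # a chosen subset (RA)

# A withdrawal carries two branches for the same coin ID leaf:
proof_R = merkle_branch(deposits, deposits.index(b"alice"))
proof_RA = merkle_branch(association, association.index(b"alice"))
ok = (verify(b"alice", proof_R, merkle_root(deposits))
      and verify(b"alice", proof_RA, merkle_root(association)))
```

The key property is that both proofs bind to the same leaf, so a user cannot claim membership in an association set their deposit doesn't actually belong to.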

Methods of Association Set Formulation

Conceptually, there exist two primary techniques to generate these sets:

I. Inclusion (or membership): This involves identifying a distinct collection of deposits perceived as low-risk based on concrete evidence and subsequently crafting an association set limited to these deposits.

II. Exclusion: Here, specific deposits deemed high-risk based on tangible evidence are identified. The resultant association set encompasses all deposits excluding the high-risk ones.

A Practical Illustration

To further elucidate, consider a scenario involving five participants: Alice, Bob, Carl, David, and Eve. Alice, Bob, Carl, and David are recognized as genuine, privacy-valuing users, while Eve is known for her illicit activities. Though her real identity remains a mystery, there’s substantial evidence linking “Eve” to stolen assets, a situation reminiscent of illicit funds identified in Tornado Cash, which frequently originate from DeFi protocol exploits visible on public blockchains.

Upon withdrawing, every user possesses the autonomy to designate their association set. It’s mandatory to encompass their individual deposit, but the inclusion of other addresses remains discretionary. Considering the motivations of Alice, Bob, Carl, and David, the dual objectives are clear: maximizing privacy while ensuring their assets aren’t deemed dubious. Their optimal strategy becomes apparent: exclude Eve from their association set, resulting in an association set comprising {Alice, Bob, Carl, David}.

Eve, motivated by a similar desire to augment her association set, finds herself in a quandary. Excluding her deposit isn’t an option, thereby compelling her to adopt an association set that includes all five deposits.
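The incentive logic of the example above is simple enough to state as code. The rule is that your own deposit must be in your association set, and honest users drop anything publicly flagged as illicit (the names and the flagging mechanism are illustrative assumptions):

```python
# Tiny sketch of the association-set strategy described above
# (illustrative only): honest users exclude flagged deposits, while a
# flagged user cannot exclude herself and so falls back to the full set.

all_deposits = {"alice", "bob", "carl", "david", "eve"}
flagged = {"eve"}  # deposits linked to stolen funds by public evidence

def choose_association_set(me: str) -> set:
    honest_set = all_deposits - flagged
    # your own deposit is mandatory; if you are flagged, excluding
    # yourself is impossible, so the best you can do is use the full set
    return honest_set if me in honest_set else all_deposits

alice_set = choose_association_set("alice")
eve_set = choose_association_set("eve")
```

The equilibrium the paper describes falls out directly: the honest users converge on {Alice, Bob, Carl, David}, while Eve is forced into a set that includes her own tainted deposit.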

Privacy Pools is one of the most interesting approaches to balancing privacy and regulatory compliance in blockchain protocols. The idea of association sets is non-invasive and can easily be added to most blockchain protocols, including DeFi. Compared to brute-force KYC/AML, Privacy Pools are a native on-chain mechanism that can start bridging the gap between regulation and the crypto world.

Inside Privacy Pools: Vitalik Buterin's New Proposal to Balance Blockchain Privacy and Regulatory… was originally published in IntoTheBlock on Medium, where people are continuing the conversation by highlighting and responding to this story.

DeFi Risk in Different Levels

A taxonomy to evaluate risk in DeFi protocols.

Created Using Midjourney

Understanding risk in DeFi requires analysis across different dimensions. Obviously, people associate risk with technical exploits given that those are very binary events. However, technical vulnerabilities are far from providing a complete picture of the risks faced when interacting with a DeFi protocol.

Risk in DeFi should be seen across two main dimensions:

I. Protocol Lifecycle: From development time to production vulnerabilities.

II. Functional: From technical to governance to the economic behavior of the protocol.

For each one of those areas, there are different solutions that can help mitigate risk in DeFi protocols.

Breaking down the risk dimensions into more granular levels gives us a perspective on the different risk vectors across the lifecycle of DeFi protocols, as well as mitigation strategies.

The previous taxonomy is far from exhaustive but provides a solid baseline for understanding the different risk dimensions in DeFi protocols.

DeFi Risk in Different Levels was originally published in IntoTheBlock on Medium, where people are continuing the conversation by highlighting and responding to this story.

The Emergence of DeFi Micro-Primitives: Slidedeck and Video

The materials explore one of the most interesting trends in DeFi.

Yesterday, I had the opportunity to present a webinar about one of my new favorite topics: DeFi micro-primitives. By micro-primitives we refer to a consistent trend in the DeFi space in which protocols are becoming more granular, more modular, and more fragmented. While micro-primitives is not an official trend, it's happening all over the DeFi space. The webinar discusses the unique characteristics of micro-primitives, opportunities, challenges, and practical examples of marquee protocols that are embracing this trend.

The slide deck and recording can be found below:

The Emergence of DeFi Micro-Primitives: Slidedeck and Video was originally published in IntoTheBlock on Medium, where people are continuing the conversation by highlighting and responding to this story.

A Deep Dive Into Flashbot’s SUAVE Centauri, the First Implementation of the New Vision for MEV

The release is the first step towards Flashbots’ SUAVE vision.

Created Using Midjourney

A few months ago, I wrote about Flashbots' Single Unifying Auctions for Value Expression (SUAVE), a new architecture that addresses some of the important challenges in the maximal extractable value (MEV) space. Conceptually, SUAVE introduces a new architecture based on a plug-and-play mempool and a decentralized block builder. Steadily, Flashbots has been getting ready for Centauri, the first major release toward the SUAVE vision. Recently, they unveiled some details about the Centauri release, which I would like to explore in more detail.

To enable the Centauri release, Flashbots focused on two key components:

1. The MEVM chain.

2. The Centauri release itself.

To understand Flashbots’ SUAVE, it might be good to approach it from the use case angle. SUAVE has set its focus on a range of utilization scenarios tied to the MEV supply chain, encompassing:

· Deployments necessitating confidential data access, such as auctions and block construction.

· Situations mandating synchronized operations within block intervals, as seen in block construction, trade pathing, and execution.

· Applications reliant on real-time off-chain data, like trading strategies predicated on centralized exchange valuations or transactions contingent on prior transactions.

· Operations deemed impractical to conduct on-chain due to resource-intensive computations, for instance, block construction.

SUAVE constitutes a decentralized framework meticulously crafted to facilitate the development of these applications, each tailored to their requisite attributes: low latency, confidentiality, verifiable computation, and composability. This framework offers developers within the existing MEV supply chain an avenue to restructure their applications as smart contracts on SUAVE, enabling them to harness its decentralized nature and verifiable computation capacities. Central to this framework is the MEVM, which presents a straightforward, adaptable, and familiar interface for developers: the EVM, along with its accompanying toolset (Solidity, Foundry, etc.).


The MEVM, a potent adaptation of the EVM with newly integrated precompiles catering to MEV use cases, is an innovation that merits attention. Through the MEVM, developers are empowered to encode MEV applications as smart contracts, all within a flexible, expressive programming milieu, akin to the conventional EVM. The MEVM's aspiration lies in rendering every component of the MEV supply chain as a precompile, effectively transforming any existing centralized MEV infrastructure into a smart contract on a decentralized blockchain.

The MEVM seamlessly advances several of Flashbots’ overarching objectives. By substantially lowering the hurdles to engender novel MEV applications, it fosters heightened competition across diverse mechanisms. This, in turn, envisions a proliferation of varied block construction paradigms (e.g., PROF) and orderflow auction strategies (e.g., MEVBlocker) on SUAVE. This culture of unbridled innovation holds the promise of enhancing end-user outcomes and refining block proposals. The MEVM’s comprehensive expressiveness empowers the utilization of every facet within the MEV supply chain within smart contracts. Ultimately, it plays a pivotal role in decentralizing the MEV supply chain, paving the way for centralized infrastructure entities (builders, relays, centralized RFQ routing, etc.) to be encoded as smart contracts on a decentralized blockchain.


Centauri can be seen as the first release of the SUAVE Devnet, scheduled for the forthcoming quarter of this year (2023), introducing the initial rendition of the MEVM. This development network serves the purpose of accommodating community-driven experimentation and rigorous stress testing, preceding the mainnet launch as an integral facet of the ensuing SUAVE Andromeda release.

From a macroscopic perspective, the preliminary Centauri release is underpinned by the following architectural framework:

Image Credit: Flashbots

The following components are part of the architecture:

· Execution Node: This node functions as a purveyor of credible and confidential off-chain computation, extendable to smart contracts within SUAVE via specialized precompiles. Initially, the responsibility of running an execution node will be assumed by Flashbots or an alternate third-party entity. In due course, the landscape envisions the incorporation of trusted execution environments and cryptographic mechanisms, thereby dispensing with the reliance on centralized entities.

· Confidential Data Repository: A designated store that safeguards confidential data earmarked for utilization within execution nodes.

· Bids: Representing a novel transaction type within SUAVE, bids encompass confidential data that users seek to be executed, along with an index of contracts granted permission to interact with the said confidential data.

· SUAVE Chain: An adapted EVM chain, thoughtfully tailored with MEV-specific precompiles, catering to concurrent credible off-chain execution enabled by execution nodes. This chain assumes the role of a conduit for smart contract deployments and cooperative interactions among stakeholders. In the days ahead, the chain’s scope will extend to encompass fund escrow for payment preferences, oracle updates, and data availability.
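The bid structure described above can be sketched as a simple data type. This is an illustrative sketch; the field names and the access check are assumptions based on the description, not Flashbots' actual types.

```python
from dataclasses import dataclass

# Illustrative sketch of a SUAVE-style bid. Field names are assumptions
# derived from the description above, not Flashbots' actual interfaces.

@dataclass
class Bid:
    bidder: str
    confidential_data: bytes      # payload kept in the off-chain confidential data store
    allowed_contracts: list       # contracts permitted to interact with the payload

def may_access(bid: Bid, contract: str) -> bool:
    """An execution node only reveals the confidential payload to
    contracts the user explicitly listed in the bid."""
    return contract in bid.allowed_contracts

bid = Bid(
    bidder="0xuser",
    confidential_data=b"signed swap tx",
    allowed_contracts=["0xOFA", "0xBlockBuilder"],  # hypothetical contract addresses
)
```

The key design point is that permissioning travels with the bid itself: the user, not the execution node, decides which contracts can touch the confidential data.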

From the role perspective, Centauri includes the following participants:

1. Developers: These architects of innovation contribute to the creation of smart contracts atop the SUAVE Chain, delineating the regulations underpinning MEV applications like Order Flow Auctions (OFAs) and block construction. This endeavor empowers developers with a streamlined and efficient avenue for crafting applications, coupled with access to user order flow that would otherwise remain inaccessible.

2. Users: Participants within this category engage in the submission of bids to SUAVE. Each bid encapsulates confidential data alongside an array of contracts, affording the user authority to interact with their respective bids. Confidential data finds its abode within an off-chain confidential data repository. Users stand to benefit from SUAVE’s market of competitive mechanisms, vying to deliver optimal execution. Moreover, this ecosystem allows users to engage with MEV applications sans the need to navigate RPC switches as is customary.

3. Executors: Tasked with the execution of bids, Executors operate via smart contracts that prescribe the permissible interactions with users' confidential data. This engagement can involve employing a backrun contract to facilitate arbitrage transactions aligned with users' preferences. Executors reap dividends through bid execution, alongside the strategic leverage of order flow access that might otherwise remain elusive.

4. Proposers: This contingent monitors SUAVE for fully-fledged blocks, deriving value from their access to valuable blocks within the system.

The Centauri release is the first step toward the full SUAVE vision and is definitely going to push the frontiers of the MEV space.

A Deep Dive Into Flashbot’s SUAVE Centauri, the First Implementation of the New Vision for MEV was originally published in IntoTheBlock on Medium, where people are continuing the conversation by highlighting and responding to this story.

The Emergence of DeFi Micro-Primitives: How DeFi is Going Smaller to Get Bigger

An interesting trend is shaping up in the DeFi space.

Created Using Midjourney

In a recent article in CoinDesk, I touched upon one of the most fascinating trends in the DeFi space which, for lack of a better term, I chose to call DeFi micro-primitives. Today, I would like to expand on this idea and its potential implications for the space.

In recent months, we have witnessed several attempts within the DeFi ecosystem that signal an important architectural trend of decomposing existing protocols into smaller, more granular, and extensible functionalities. Take examples like Uniswap v4 with its Hooks feature, EigenLayer with restaking primitives, as well as some headlines about the new versions of protocols like Euler. All these examples signal an unspoken trend to partition protocols into smaller primitives that can be used to build new forms of functionality.

The fascinating thing about DeFi micro-primitives is that it is a systematic phenomenon across the entire space and, yet, it hasn't been triggered by a common event or coined as an official trend. It's as if all these DeFi protocols somehow agreed that going smaller was definitely the way to go. This could seem quite puzzling if you consider that the DeFi space spent the last couple of years building what we defined as core financial primitives in areas such as lending or market making. Shouldn't the natural next step be to build new applications and higher-level functionalities on top of those primitives?

While, conceptually, the idea of building higher-order functionality on top of core DeFi primitives makes sense, it seems that the current generation of DeFi protocols is too big, complex, and monolithic to build complete financial applications. Many protocols (this is particularly visible in the lending space) combine the core financial primitive with application-level functionality in the same protocol. For instance, protocols like Aave or Compound mixed the lending functionality with the parameter tuning of each market at the same architecture level. Similarly, AMMs like Uniswap constrained users to a single way of expressing trading preferences. Are those really DeFi primitives or complete financial applications?

Expanding that line of thinking takes us down the path that DeFi needs to get smaller to get bigger. Functionalities like Uniswap Hooks or smaller lending primitives are required to build a new generation of financial applications.
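As a rough illustration of what a hook-style micro-primitive looks like, the sketch below separates a core constant-product swap from pluggable extension points. All names are illustrative and do not correspond to Uniswap's actual interface.

```python
# Toy illustration of the "hooks" idea: a minimal swap primitive that
# exposes extension points instead of baking application logic into the
# core. Names are illustrative, not Uniswap's actual interface.

class Pool:
    def __init__(self, hooks=None):
        self.hooks = hooks or {}
        self.reserves = {"X": 1000.0, "Y": 1000.0}

    def swap(self, amount_in: float) -> float:
        # extension point: a hook can transform the input (e.g., take a fee)
        if "before_swap" in self.hooks:
            amount_in = self.hooks["before_swap"](amount_in)
        # core constant-product swap X -> Y: x * y stays invariant
        x, y = self.reserves["X"], self.reserves["Y"]
        amount_out = y - (x * y) / (x + amount_in)
        self.reserves["X"], self.reserves["Y"] = x + amount_in, y - amount_out
        # extension point: a hook can observe the result (e.g., for limit orders)
        if "after_swap" in self.hooks:
            self.hooks["after_swap"](amount_in, amount_out)
        return amount_out

# A user of the micro-primitive plugs in a 0.3% fee hook without
# touching the core swap logic:
fees = []
pool = Pool(hooks={"before_swap": lambda a: (fees.append(a * 0.003) or a * 0.997)})
```

The core primitive stays tiny and auditable, while application-level behavior (fees, oracles, limit orders) lives in the hooks, which is exactly the decomposition the micro-primitives trend points at.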

Micro-Services: An Analogy from Distributed Programming

For the last few decades, the distributed programming space evolved from so-called service-oriented architectures to REST-based APIs. This trend created entirely new sets of platforms such as API gateways or enterprise service buses (ESBs). However, many organizations discovered that they were building monolithic applications that were quite difficult to extend. As a result, the industry transitioned to a new architecture pattern known as micro-services, which consists of smaller, atomic, programmable functionalities that can be combined to build more complex applications. Every major tech company has jumped on the microservices trend, and new companies have been created to build platforms in this area.


Even though the DeFi micro-primitives trend is still very nascent, the analogy with micro-services is a very interesting one.

The Challenges: Security and Complexity

The idea of building simpler, more granular architectures via DeFi micro-primitives is quite compelling, but it comes with very critical challenges. Among those, security and complexity jump off the page. Partitioning a DeFi protocol into smaller units drastically increases the attack surface and vulnerabilities. From that perspective, the risk vectors of DeFi protocols are only going to get bigger. By the same token, it is possible that vulnerabilities might be more isolated to specific micro-primitives.

Complexity would be the second factor to consider when thinking about DeFi micro-primitives. DeFi protocols following this trend are definitely going to be an order of magnitude more complex than their predecessors. By the same token, building applications that combine different DeFi micro-primitives is going to become more complex.

Like it or not, the DeFi micro-primitives trend is real and is likely to play a pivotal role in the next phase of DeFi. In the next part of this article, we will try to outline some more technical principles about micro-primitives.

The Emergence of DeFi Micro-Primitives: How DeFi is Going Smaller to Get Bigger was originally published in IntoTheBlock on Medium, where people are continuing the conversation by highlighting and responding to this story.

How DeFi Protocols are Building More Granular and Extensible Capabilities

In the coming months, we can expect more DeFi protocols to divide their functionalities into extensible, programmable micro-primitives. This will lead to a DeFi landscape that is more granular, flexible, and developer-friendly, but also more complex. If DeFi is to become a true parallel alternative to traditional finance, primitives such as lending and AMMs won’t be sufficient to build sophisticated financial services. Smaller, more targeted, and programmable micro-primitives are necessary. As DeFi continues to evolve, the adoption of these smaller, focused building blocks will likely play a significant role in shaping the future of decentralized finance.

A New Blockchain for Generative AI?

Generative artificial intelligence (AI) has quickly become one of the hottest and arguably the most transformational technology trends of the last few decades. The impact of generative AI is evident in all areas of the technology stack, ranging from infrastructure to applications.

Since the release of ChatGPT and the subsequent GPT-4, the Web3 community has been speculating about the potential intersection of generative AI and Web3. While there are many obvious use cases, such as conversational wallets or language exploration, there are more sophisticated theses worth exploring.

Jesus Rodriguez is the CEO of IntoTheBlock.

What if generative AI deserves its own blockchain?

Open-source momentum versus centralized control

To analyze the viability of a blockchain for generative AI, it is important to understand the current state of affairs regarding foundation models, particularly the emergence of open-source alternatives to API-based tech like GPT-4, and the increasing concerns surrounding centralized control of those foundation models.

Until a few months ago, the gap between API-based and open-source foundation models was significant. Models such as OpenAI's GPT-4 and Anthropic's Claude in the language space, and DALL-E and Midjourney in the computer vision space, seemed significantly more advanced than open-source alternatives. However, a change started to occur late last year with the surprising open-source release of Stable Diffusion, which provided a viable alternative to API-based text-to-image models. Despite this, large language models (LLMs) continued to be the focal point of generative AI, and in that domain, open-source models paled in comparison to API-based alternatives in terms of quality.

Earlier this year, Meta AI Research published a paper introducing LLaMA, an LLM that matched the performance of GPT-3 while being significantly smaller. Initially, the model was not intended to be open-sourced, but something unexpected happened. A week after its publication, the model was leaked on 4chan and rapidly downloaded by thousands of people. The LLaMA “accident” made a foundation LLM available to anyone and sparked an unexpected momentum in open-source innovation.

Shortly after the leak, new open-source foundation models with amusing animal names started to emerge everywhere. Stanford University released Alpaca, Databricks unveiled Dolly, UC Berkeley open-sourced Koala, UC Berkeley and Carnegie Mellon University collaborated on the release of Vicuna, Together announced the RedPajama project, and the list goes on. Stable Diffusion and LLaMA have helped shift the scales of open-source generative AI and have generated significant momentum. Moreover, open-source foundation models are rapidly closing the gap with commercial incumbents in terms of quality.

Another factor contributing to the emergence of a generative AI blockchain is the concern surrounding the lack of transparency and centralized control of foundation models. The size and complexity of the neural architectures powering foundation models make exact interpretability nearly impossible. As a result, the industry must rely on intermediate steps such as more open architectures and thoughtful regulation. The fact that a few centralized entities control the most powerful models in the market adds another layer of concern regarding the feasibility of achieving real accountability, transparency, and interpretability in generative AI.

The combination of open-source innovation in foundation models and growing concerns about centralized control in the field creates a unique window of opportunity for Web3 architectures. The abundance of high-quality open-source models reduces the barriers to adoption in Web3 platforms. Solving the transparency and control risks in generative AI is far from trivial, but there is little doubt that blockchain architectures possess key properties that can help in this area.

Building a generative AI foundation in Web3

The explosion of innovation in open-source foundation models has significantly lowered the barrier of entry for Web3 platforms to incorporate generative AI capabilities. The adoption of foundation models in Web3 platforms can follow two fundamental, and likely sequential, paths:

  1. Building DApps that enable intelligent capabilities powered by generative AI.

  2. Constructing new Web3 platforms designed with generative AI as a foundational component.

In the first scenario, we are likely to witness tools like exchanges, explorers, or wallets incorporating conversational capabilities powered by large language models. Additionally, a new generation of DApps will be built with generative models as their cornerstone. In this scenario, Web3 primarily acts as a consumer of generative AI capabilities, with models running on traditional Web2 cloud infrastructures.

More intriguing alternatives emerge when considering Web3 platforms that can inherently support generative AI models. Imagine open-source foundation models like LLaMA, Dolly, or Alpaca running on nodes within a distributed blockchain. The ultimate realization of this vision is a blockchain specifically designed for generative AI.

The concept of a new blockchain optimized for a technology paradigm like generative AI may sound appealing, but it is undeniably controversial. After all, there were no new blockchains created solely for DeFi or NFTs. So, what makes generative AI so different?

The answer lies in the architectural mismatch between the requirements to run foundation models and blockchain runtimes. A typical pre-trained foundation model consists of millions of neurons spread across tens of thousands of interconnected layers, executing on clusters of GPUs or specialized deep learning hardware topologies. No smart contract in the history of Web3 even comes close to that level of complexity. Thus, it is logical to conclude that a new type of architecture is needed. Even Web2 infrastructures are evolving to support large-scale generative AI models, illustrating the magnitude of the required changes in Web3 architectures.

When contemplating a new blockchain for generative AI, the possibilities appear endless. But the simplest iteration of this idea should encompass a set of core capabilities. The ability to run nodes that execute foundation models is paramount for a blockchain dedicated to generative AI. The same applies to the ability to execute pretraining, fine-tuning, and inference workflows, which are the three primary stages in the life cycle of foundation models. Publishing and sharing datasets used for pretraining or fine-tuning models is also a desired feature. Once we establish a blockchain runtime as the foundational layer, numerous capabilities in the areas of transparency and interpretability can be enabled. For instance, we can envision a proof-of-knowledge protocol that offers transparency regarding the specific weights of a model, validating that non-toxic, unbiased datasets were used for pretraining.
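As a toy illustration of the transparency idea above, the sketch below commits hashes of a model's weights and training datasets to a simulated on-chain registry so that a disclosed dataset can later be checked against the published commitments. A real proof-of-knowledge protocol would rely on zero-knowledge proofs rather than plain hash commitments; every name here is hypothetical.

```python
import hashlib

# Hypothetical sketch: hash commitments as the simplest stand-in for the
# proof-of-knowledge idea described above. A production protocol would
# use zero-knowledge proofs; plain SHA-256 commitments only illustrate
# the auditing workflow.

def commit(artifact: bytes) -> str:
    """Commitment to an artifact (weights file, dataset) as a hex digest."""
    return hashlib.sha256(artifact).hexdigest()

class ModelRegistry:
    """Simulated on-chain registry mapping model ids to commitments."""

    def __init__(self):
        self.records = {}

    def register(self, model_id: str, weights: bytes, datasets: list):
        # published at training time, immutable thereafter
        self.records[model_id] = {
            "weights": commit(weights),
            "datasets": [commit(d) for d in datasets],
        }

    def verify_dataset(self, model_id: str, dataset: bytes) -> bool:
        """Check that a disclosed dataset matches one of the commitments
        published when the model was registered."""
        return commit(dataset) in self.records[model_id]["datasets"]
```

Anyone can later demand the disclosure of a dataset and check it against the on-chain commitment, which is the minimal form of the accountability the paragraph above argues for.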

The concept of a specialized blockchain for generative AI is enticing, but is it truly necessary? There is a valid value proposition in integrating generative AI capabilities into existing blockchain runtimes. However, the history of software demonstrates a recurring trend of new architecture paradigms influencing infrastructure technologies. Recent trends like cloud computing or big data serve as examples. Foundation models represent fundamentally different architecture paradigms that likely necessitate more specialized blockchain infrastructures to operate effectively.

Furthermore, we cannot overlook the potential for generative AI to transform the lower layers of the blockchain stack. It is not far-fetched to envision a proof-of-stake blockchain where validators process transactions based on natural language. Similarly, smart contracts could utilize language as the fundamental means of exchanging messages.

Generative AI has the potential to drive changes throughout the entire blockchain stack. From this perspective, it seems logical to adopt a first principles approach by enabling a new runtime with the flexibility to incorporate these changes.

The risk of ignoring generative AI in Web3

The idea of a generative AI blockchain can indeed be controversial and not without its challenges. However, I encourage exploring this idea using a via negativa argument.

What could happen if we neglect to build new blockchains for generative AI?

Currently, generative AI has created a significant technological gap between Web2 and Web3 architectures. This gap continues to widen in the absence of native generative AI capabilities in Web3. Generative AI is reshaping fundamental aspects of software development, and new frameworks and platforms are rapidly emerging to support this paradigm shift.

Developing native generative AI capabilities is nothing short of an existential challenge for Web3, as it is crucial to enable new waves of innovation in the field. A native generative AI blockchain represents just one of the many approaches that can facilitate this transition into the world of foundation models. Building a new blockchain comes with numerous challenges, but the rapid evolution of L2 runtimes, platforms like Cosmos, and the emergence of high-performance L1 ecosystems like Aptos or Sui make the possibility of a generative AI blockchain much more achievable than in previous years.

Understanding Polygon’s zkEVM in Four Simple Components

The Consensus Contract, zkNode, zkProver, and the zkEVM Bridge constitute the foundation of the architecture.


Polygon’s zkEVM is rapidly becoming one of the most interesting zero-knowledge blockchains in the current market. Since its mainnet release a few weeks ago, Polygon’s zkEVM has been positioned to become one of the more mainstream options for scaling blockchain applications. At first glance, understanding zero-knowledge blockchains might seem intimidating, but Polygon’s zkEVM challenges that assumption with a very straightforward architecture.

Make no mistake, Polygon’s zkEVM is an incredibly complex platform. However, its architecture can be summarized in four fundamental components:

· Consensus Contract

· zkNodes

· zkProver

· Bridge

Image Credit: Polygon

1) Consensus Contract

Earlier designs relied on the Proof of Donation (PoD) consensus mechanism, which enabled the participation of multiple coordinators in the production of batches in L2. These batches are created from the rolled-up transactions of L1. The Consensus Contract (PolygonZkEVM.sol) employs a simpler technique, which is preferred due to its greater efficiency in resolving the challenges associated with PoD.

The strategic implementation of contract-based consensus ensures that the network remains permissionless for producing L2 batches while achieving a high level of efficiency, a critical criterion for overall network performance. The model also achieves an acceptable degree of decentralization and protects the network from malicious attacks, particularly by validators. Additionally, the Consensus Contract model maintains a fair balance between overall validation effort and network value.

2) zkNode

Running a zkEVM node requires a piece of software called zkNode. The network mandates the use of this client to synchronize and manage the roles of the participants, namely Sequencers and Aggregators. Participants in Polygon’s zkEVM have two options for their involvement:

I. To function as a node to acquire the network’s state.

II. To take part in the process of batch production, with the option to choose either of the two roles — Sequencer or Aggregator.

When operating as a Sequencer, one would receive L2 transactions from users, carry out the necessary preprocessing to create a new L2 batch, and subsequently suggest the batch as a valid L2 transaction to the PoE smart contract. The Sequencer obtains transaction fees from all published transactions, making it financially motivated to post legitimate transactions to maximize profit.

In contrast, an Aggregator receives all the transaction information from a Sequencer and transmits it to the Prover (or zkProver), which then delivers a compact zk-proof produced via complex polynomial computations. The smart contract authenticates this proof. Thus, an Aggregator gathers the data, forwards it to the Prover, receives its output, and eventually transmits the information to the smart contract for validation of the Prover’s Validity Proof.
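The division of labor between the two roles can be sketched as follows. All class and function names are illustrative assumptions; the real zkNode client is far more involved.

```python
# Illustrative sketch of the two zkNode roles described above. Names are
# assumptions for exposition, not Polygon's actual interfaces.

class Sequencer:
    """Collects L2 transactions from users and proposes them as a batch."""

    def __init__(self):
        self.pending = []

    def receive(self, tx: str):
        self.pending.append(tx)

    def produce_batch(self) -> list:
        # preprocess pending transactions into a new L2 batch
        batch, self.pending = self.pending, []
        return batch

class Aggregator:
    """Forwards batch data to the prover and submits the proof on-chain."""

    def __init__(self, prover):
        self.prover = prover

    def submit(self, batch: list) -> dict:
        proof = self.prover(batch)  # the zkProver does the heavy lifting
        # in reality this pair goes to the smart contract for verification
        return {"batch": batch, "proof": proof}

# toy stand-in for the zkProver's complex polynomial machinery
def toy_prover(batch):
    return f"proof({len(batch)} txs)"
```

The point of the sketch is the data flow: the Sequencer never talks to the prover directly, and the Aggregator never originates transactions; each role is financially incentivized for its own step.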

Image Credit: Polygon

3) zkProver

The complex mathematical computations, involving polynomials and assembly language, are performed by the zkProver. These computations are later verified on a smart contract and can be perceived as constraints that a transaction must comply with in order to modify the state tree or the exit tree. The zkProver’s workflow comprises four fundamental steps:

I. The content of Merkle trees is sent by the Node to the Database to be stored there.

II. Subsequently, the input transactions are transmitted by the Node to the zkProver.

III. The zkProver acquires the necessary information from the Database to produce verifiable proofs of the transactions sent by the Node. This information includes the Merkle roots, keys, hashes of relevant siblings, and more.

IV. Finally, the zkProver generates proofs of transactions and transmits these proofs back to the Node.
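To make step III concrete, the sketch below builds a toy Merkle root and membership proof of the kind of data the zkProver consumes (roots and sibling hashes). The real zkProver wraps this data in polynomial commitments, which are omitted here; all names are illustrative.

```python
import hashlib

# Toy Merkle tree to illustrate the roots and sibling hashes mentioned
# in step III above. Illustrative only: the real zkProver operates over
# polynomial commitments, not bare SHA-256 trees.

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def _next_level(level):
    if len(level) % 2:
        level = level + [level[-1]]  # duplicate last node on odd levels
    return [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]

def merkle_root(leaves) -> bytes:
    level = [h(l) for l in leaves]
    while len(level) > 1:
        level = _next_level(level)
    return level[0]

def merkle_proof(leaves, index):
    """Sibling hashes (with position flags) needed to recompute the
    root starting from leaves[index]."""
    level = [h(l) for l in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))  # True if sibling is on the left
        level = _next_level(level)
        index //= 2
    return proof

def verify(leaf, proof, root) -> bool:
    node = h(leaf)
    for sibling, is_left in proof:
        node = h(sibling + node) if is_left else h(node + sibling)
    return node == root
```

This is the shape of the information flowing from the Database to the zkProver: given a root, a leaf, and the sibling hashes, anyone can cheaply check membership without seeing the whole tree.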

4) zkEVM Bridge

The zkEVM Bridge serves as a component that facilitates communication and asset migration between the Polygon zkEVM network and other networks, including L1 (Ethereum mainnet) or any L2 built on top of Ethereum.

The functionality of the zkEVM Bridge can be depicted in a simple workflow involving two networks, N1 and N2. To bridge an asset from N1 to N2, a user must first lock the asset in the origin network (N1). The Bridge smart contract then mints a representative asset of equivalent value in the destination network, N2, which is referred to as a Wrapped Token.

Once the minting process is complete, the user or recipient can claim the asset in the destination network (N2).

Conversely, it is possible to execute the reverse operation. Following the burning of the Wrapped Token, the Bridge smart contract unlocks the original asset in the origin network.

Furthermore, in addition to bridging and claiming assets, the Bridge smart contract can also be employed for cross-chain messaging, permitting data payloads to be sent from one network to another via the Bridge and Claim operations.
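The lock, mint, burn, unlock workflow above can be simulated in a few lines. This is a toy sketch under the simplifying assumption of a single bridge operator; the real contract also maintains Merkle exit trees and proofs, which are omitted here, and all names are illustrative.

```python
# Toy simulation of the bridge workflow: lock on N1 -> mint a Wrapped
# Token on N2, then burn on N2 -> unlock on N1. Illustrative only; the
# real Bridge contract relies on exit-tree proofs rather than a single
# trusted operator.

class Bridge:
    def __init__(self):
        self.locked = {}   # asset -> amount locked on the origin network (N1)
        self.wrapped = {}  # asset -> Wrapped Token supply on the destination (N2)

    def bridge(self, asset: str, amount: float):
        """Lock on N1 and mint an equivalent Wrapped Token on N2."""
        self.locked[asset] = self.locked.get(asset, 0.0) + amount
        self.wrapped[asset] = self.wrapped.get(asset, 0.0) + amount

    def claim_back(self, asset: str, amount: float):
        """Burn the Wrapped Token on N2 and unlock the original on N1."""
        assert self.wrapped.get(asset, 0.0) >= amount, "insufficient wrapped supply"
        self.wrapped[asset] -= amount
        self.locked[asset] -= amount
```

The invariant worth noticing is that the locked amount on N1 always equals the Wrapped Token supply on N2, which is what makes the wrapped asset fully backed.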

Image Credit: Polygon

There are many other components relevant to Polygon’s zkEVM architecture, but the aforementioned four represent the core foundation of the runtime. We will dive deeper into other aspects of the architecture in future posts.

Understanding Polygon’s zkEVM in Four Simple Components was originally published in IntoTheBlock on Medium, where people are continuing the conversation by highlighting and responding to this story.

The Next ChatGPT Won’t Be in Web3 Unless Some Things Change

The core idea behind the potential negative impact of generative AI in the Web3 space is relatively simple. Generative AI has the potential to change every aspect of how software and content are developed and consumed, from the infrastructure to the application layer. These days we are seeing every major technology and content provider incorporating generative AI into their platforms. If the core of that revolution is taking place outside Web3, it is likely to have an impact on the innovation, talent, and funding gap between Web2 and Web3 technologies. Furthermore, if not addressed quickly, this gap is likely to continue expanding at an exponential rate. The solutions to this problem are certainly far from trivial, but there are some first-principles ideas that can be explored to start addressing that gap.

Inside OPStack and Bedrock: The Infrastructure Powering Coinbase’s New Base Blockchain

Some details about the architecture powering Coinbase’s new blockchain.

Continue reading on IntoTheBlock »

Monitoring Economic Risk in DeFi: IntoTheBlock Releases an Alpha Preview of DeFi Risk Radar

ITB DeFi Risk Radar enables real time monitoring of economic risk conditions in DeFi protocols. Today is just a preview of what’s coming….

Continue reading on IntoTheBlock »