Lightweight and Flexible Data Access for Algorand

Algorand has released a new tool for blockchain data access: Conduit. Conduit is a modular, plugin-based tool and a powerful upgrade to the one-size-fits-all Indexer. Conduit lets dapps collect exactly the data they need in an affordable deployment.

Useful, but bulky: the Indexer

The Indexer is a ready-to-go open-source tool that pulls the data from the blockchain, stores it in a database, and offers an API to serve that data. The existence of the Indexer has been a significant boon for the Algorand ecosystem, allowing anybody to easily read the Algorand blockchain.

However, the Indexer has historically had one major drawback: it is expensive to run. There are two main reasons for this:

  1. Running an Indexer requires also running an archival node that stores every block since the beginning of the blockchain.
  2. The Indexer collects the entire blockchain history (every transaction since block zero) in a Postgres database.

These facts make the Indexer a multi-terabyte deployment. A typical Indexer requires a number of expensive resources, and these multiply for production deployments that need redundancy, load balancing, and coverage across multiple regions.

The scale of the Indexer also makes it slow to initialize, and only capable of serving the specific queries for which it is indexed. As the Algorand blockchain has grown, it has become impractical for smaller projects to maintain their own Indexers.

Consequently, the ecosystem mostly relies on a few API/data providers. These providers run Indexers and charge dapps for their API calls. This is more economical and practical than each group running its own Indexer, but it introduces inflexibilities of its own.

Dapps should have an accessible option to own their own data access infrastructure. This is what Conduit was built for.

Conduit, the Basics

Conduit is a new solution with several major advantages:

  1. Conduit does not require running an archival algod node.
  2. Conduit lets users filter incoming blockchain data, allowing them to collect strictly the data they need for their applications.
  3. Conduit offers a data pruning feature that, when enabled, automatically deletes old transactions.
  4. With Conduit, users can build custom data exporters that use the data destination of their choice.
  5. Conduit is designed around an extensible plugin architecture. Any community-contributed plugin can be integrated by anyone.

Conduit allows users to configure their own data pipelines for filtering, aggregation, and storage of transactions and accounts on any Algorand network.

A Conduit pipeline is composed of an importer, optional processor(s), and exporter plugins. Along with the Conduit release, the following noteworthy plugins are made available.

  • Algod importer — fetches blocks from an algod REST API.
  • Filter processor — filters data based on transaction fields.
  • Postgres exporter — writes the data to a Postgres database.
  • File writer exporter — writes the data to a file.

Configuring a Conduit pipeline requires defining which plugins to use, and if necessary, configuring the plugins. For example, the filter processor requires a definition of what to filter.
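As a sketch, a minimal pipeline configuration tying these plugins together might look like the following. The plugin names match those listed above, but the individual config fields (addresses, tokens, connection strings) are illustrative assumptions; check the Conduit documentation for the exact schema.

```yaml
# conduit.yml -- illustrative pipeline: algod importer -> filter -> Postgres.
importer:
  name: algod
  config:
    netaddr: "http://localhost:8080"   # assumed local node address
    token: "<algod api token>"
processors:
  - name: filter_processor
    config:
      filters:
        - any:
            - tag: txn.type
              expression-type: exact
              expression: "axfer"      # keep only asset transfers
exporter:
  name: postgresql
  config:
    connection-string: "host=localhost port=5432 user=conduit dbname=conduit"
```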

This is best demonstrated with an example. See a basic walkthrough here.

Conduit’s Filter Processor

The filter processor is a key new feature introduced with Conduit. It allows users to filter the transaction data based on any transaction field — transaction type, app ID, asset ID, sender, receiver, amount, etc. These filters can also be combined.

Since many transactions are submitted as part of transaction groups, the filter processor lets users choose whether to include the entire group when the filter conditions are met.

The filter processor will always include inner transactions for transactions that match the specified filter conditions.
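Concretely, a filter that keeps only calls to a particular application might be configured like the sketch below. The tag paths (txn.type, txn.apid) follow the transaction field names, but the option names here are assumptions based on Conduit's published examples, and the app ID is a hypothetical placeholder.

```yaml
processors:
  - name: filter_processor
    config:
      # When false, a match pulls in the matched transaction's whole group
      # (assumed option name).
      omit-group-transactions: false
      filters:
        - all:
            - tag: txn.type
              expression-type: exact
              expression: "appl"       # application calls only
            - tag: txn.apid
              expression-type: exact
              expression: "1234567"    # hypothetical app ID
```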

Full details on the filter processor are here.

A New Node Configuration for Conduit: Follow Mode

Conduit is used to track data from the blockchain and make it available to the off-chain world. Every time a new block is created on-chain, Conduit is informed about every change to every piece of state since the prior block, such as new accounts created, app states updated, boxes deleted, etc.

Some dapps use an object called ApplyData to track certain kinds of state changes; however, this approach is technically limited. Not all changes are reflected in this object, and ApplyData is only cached for 4 rounds on non-archival nodes, meaning that delaying the handling of ApplyData updates by more than 15 or so seconds results in an unrecoverable state error.

The old Indexer architecture solved these challenges by requiring access to an archival algod node. Indexer used a “local ledger” to track the state changes from round to round, and thus avoided the incomplete ApplyData object. The drawback of this design is the need for an expensive archival node.

Conduit instead requires access to a node running in a new lightweight “follow mode” configuration, which replaces the need for the archival configuration. Conduit can pause and unpause this node’s round updates as required; this pause functionality ensures that the Conduit process will not miss any blockchain state updates. Conduit also uses a new “state delta” endpoint introduced in the node to eliminate the need for a large local ledger.
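The follow-mode interaction boils down to two algod REST endpoints: one to set the “sync round” (the pause point) and one to fetch a round's state delta. Here is a minimal sketch of the paths involved, assuming a local node; verify the exact routes against the algod v2 API documentation.

```python
# Sketch of the algod REST calls a Conduit-style consumer makes against a
# follow-mode node. Endpoint paths follow the algod v2 API; treat them as
# assumptions and check your node's API docs.

BASE = "http://localhost:8080"  # assumed local algod address

def set_sync_round_url(round_num: int) -> str:
    # POST here to "pause" the node at round_num: the node will not advance
    # past this round until the sync round is moved forward.
    return f"{BASE}/v2/ledger/sync/{round_num}"

def state_delta_url(round_num: int) -> str:
    # GET here to fetch the state delta (account, app, and box changes)
    # produced by round_num, instead of rebuilding it from a local ledger.
    return f"{BASE}/v2/deltas/{round_num}"

# A consumer loop would: fetch the delta for round N, process it, then
# advance the sync round to N+1 so the node can fetch the next block.
```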

A node with follow mode enabled cannot participate in consensus, as votes based on paused state information would be rejected. Similarly, submitting transactions to such a node is not possible, as acceptance based on paused, outdated state information might be judged invalid by the rest of the blockchain.

Conduit as an Extensible Tool

Focusing on open-source principles and decentralization, Conduit’s design encourages custom-built solutions, setting it apart from the Indexer. In this initial release, we encourage new plugin submissions via PRs to the Conduit repository. We aim for the plugin framework to inspire community involvement, allowing everyone to benefit from shared efforts. Currently, we’re engaging the community to identify the best long-term management model for externally-supported plugins (join the conversation in the Discord #conduit channel!).

We have already seen the development of a Kafka plugin by a community member (Iridium#4127 on Discord), who has this to say about Conduit:

“… it [Conduit] allows [you] to choose your … targeted product (e.g. Kafka) to quickly build a plugin and let the data flow. Mainly it’s just importing the correct library — configure your connection and use the library to send messages to your system. Receiving is already handled by Conduit.”

Comparing Deployments: Legacy Indexer vs. Conduit Architecture

Indexer, legacy architecture

  • Requires an archival algod node, which requires at least 1.1 TB of storage.
  • Requires a Postgres database with full historical data, around 1.5 TB of storage.

Conduit architecture

  • Requires a node with “follow mode” enabled, which requires 40 GB of storage (like other non-archival nodes).
  • Conduit can use a Postgres database, or a different data store. The user can store full historical data, or a subset. This is at most 1.5 TB if storing the full history, and could be as little as a few GB.

The costs of these deployments will vary depending on whether users are self-hosted or using cloud providers (and vary greatly by provider). However, the storage costs will be strictly less for a Conduit-backed deployment.

Note that storage will likely be the major cost factor, and bandwidth and compute requirements are similar across both architectures.

Continued Indexer Support

We are continuing, at this time, to support the existing Indexer releases that run the old architecture (using an archival node). If users would like to continue using the Indexer but also want to save costs by removing the need for an archival node, they have the option to run an Indexer backed by Conduit. The Indexer interface remains the same. See our migration guide here.

Conduit Builds Better Apps

Conduit was designed to be flexible and extensible, intended to allow developers to build whatever data solution fits their needs. As such, Conduit has countless applications.

Want to run Conduit to support your dapp reporting needs?

Want to extend the Indexer API?

Want to power an event-driven system based on on-chain events?

Want to scale your API Provider service by using CockroachDB?

Want to dump everything to S3 and just query that?

The limitations imposed by the Indexer’s rigidity no longer apply. While Conduit doesn’t provide everything for free, it offers users the flexibility to build what they need.

Lightweight and Flexible Data Access for Algorand was originally published in Algorand on Medium, where people are continuing the conversation by highlighting and responding to this story.

Try Before You Buy on Algorand

Simulate Smart Contract Evaluation on Algorand

The release of go-algorand 3.15 introduces a new mode of interaction for Algorand: simulate. Simulate allows transactions to be evaluated by a node without altering the state of the blockchain.

Simulate has a myriad of uses, including free read access to contract state, efficient testing, and easier debugging.

Simulate Basics

Simulate is a new endpoint that mirrors the transaction submission endpoint. It can be called using the exact same format on any network, including Mainnet.

Simulate evaluates the transaction group submitted using the network’s current state. Simulate’s response will include:

  • would-succeed: a boolean indicating whether the transaction group is valid.
  • failure-message: if the transaction group is not valid, simulate returns the error.
  • missing-signatures: simulate will complete the evaluation even if the transaction group is missing signatures. This field will indicate whether any signatures are missing.
  • txn-results: simulate will return partial information about what effects this transaction group would have had on the blockchain state: any new assets or apps created, global/local state changes, etc. In future versions of simulate, this will also explicitly include changes to Algo balances and box states.
    If the evaluation ran into an error, simulate will return the partial information to help debugging efforts: any changes that were already calculated up to the point when the evaluation failed.
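As a sketch of how a client might consume these fields, here is a tiny helper that triages a simulate response. The field names mirror the list above; the sample payload is fabricated for illustration, not captured from a real node.

```python
# Triage a simulate response using the documented field names
# (would-succeed, failure-message, missing-signatures).

def summarize_simulation(resp: dict) -> str:
    if resp.get("would-succeed"):
        # Evaluation completes even without signatures; flag that case.
        note = " (unsigned)" if resp.get("missing-signatures") else ""
        return "ok" + note
    return "failed: " + resp.get("failure-message", "unknown error")

# Fabricated sample response for an unsigned but valid group:
sample = {
    "would-succeed": True,
    "missing-signatures": True,
    "txn-results": [{"txn-result": {}}],
}
print(summarize_simulation(sample))  # -> ok (unsigned)
```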

Reading State for Free with Simulate

Imagine that you want to know a smart contract’s state (global, local, or box state).

For example, you’re part of a DAO and you want to know whether alice.algo has voted on the latest proposal yet. How can you find out?

The voting smart contract will store voting information in alice.algo’s local state. Using the API for algod or for an indexer, you can get the local state data. However, the data will likely not be very useful to you — it’ll be encoded however the smart contract organized it, so it’ll be gobbledygook.
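To make the “gobbledygook” concrete: algod returns local state as base64-encoded keys with typed values (type 1 for bytes, type 2 for uints). Below is a decoding sketch, with a hypothetical voting app that stores a `voted` flag; the key name is an assumption for illustration.

```python
import base64

def decode_local_state(kv_list):
    """Decode algod's TEAL key/value format into a Python dict.

    Each entry looks like:
      {"key": <b64>, "value": {"type": 1|2, "bytes": ..., "uint": ...}}
    where type 1 is a byte slice and type 2 is a uint.
    """
    out = {}
    for kv in kv_list:
        key = base64.b64decode(kv["key"]).decode("utf-8", errors="replace")
        val = kv["value"]
        if val["type"] == 2:
            out[key] = val["uint"]
        else:
            out[key] = base64.b64decode(val.get("bytes", ""))
    return out

# Hypothetical local state for alice.algo in the voting app:
raw = [{"key": base64.b64encode(b"voted").decode(), "value": {"type": 2, "uint": 1}}]
print(decode_local_state(raw))  # -> {'voted': 1}
```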

You can read the smart contract to understand the encoding, making it possible to decode Alice’s voting status. But can you realistically read smart contract code on chain?

Rather than trying to read the smart contract code, why can’t the smart contract just tell us the answer, since it knows how to decode its own data? It could have a method did_they_vote(account)bool that returns a boolean representing the account’s voting status. That makes it easy: just issue a transaction to call that method to get your answer. Unfortunately, calling the app costs a network fee, unlike the read-and-decode solution.

Simulate offers us the best of both worlds. You can call did_they_vote(alice.algo) using simulate, which will give you the answer without charging a network fee.

A more complex example, where the smart contract is not only decoding its state for you but also doing some calculation, is calling simulate on get_current_slippage(trade_size, pair) against an AMM.

Another expected usage is to run a transaction using simulate right before submitting it to the network. This allows you to verify that the transaction resulted in the expected changes to the blockchain. However, there is an important subtlety here to be aware of: state could change between the time when you call simulate and when your transaction is accepted by the network.

Streamlined Testing with Simulate

There are many ways to test Algorand Smart Contracts. One of the most common is to spin up a local network, issue transactions to set up a known blockchain state for the test, call the app, and then assert against the resulting state. This is a powerful test that is quite faithful to how the app would behave on Mainnet, but it is heavy. It can take time (too much time) to write and run each test, which causes some developers to write and run fewer tests than they might otherwise.

Perhaps the most time-consuming part of this test is setting up the state for the test. This state-setting can consist of: creating and funding accounts, creating assets, opting accounts into the assets, deploying apps, opting accounts into the apps, and making a series of initial app calls to set up global and local states. All this before the actual single test app call is made and the resulting state can be verified.

Running the next test requires resetting the local network to zero and starting over again, repeating the whole state-setting process.

How does simulate help us with this costly and inefficient testing process? Simulate allows you to run several tests against the same state. After setting up the state, instead of making the app call, thereby altering the state and needing to reset it for further tests, you can use simulate to see what would happen if the app call were submitted. You can thus run many tests against the same state — calling different contract methods, with many different arguments, even fuzz testing. You will also get richer output from simulate than you would from a normal execution, so you can run more detailed dapp-specific test assertions.

For example, imagine that Maria is running an auction dapp and wants to test it. Maria sets up her blockchain state such that the auction is in progress. Maria’s test then runs several simulate calls:

  • One call that bids on the item for auction with a price above the previous bid plus the minimum bid increment (happy path, should succeed).
  • One call that bids on the item with a price above the previous bid but not higher than the previous bid plus the minimum bid increment (should fail).
  • One call that bids on the item with a price that is lower than the previous bid (should fail).
  • One call that attempts to claim the item (even though the auction is still in progress, so should fail).

These are just a few examples. A thorough developer would likely include many more cases, but hopefully it demonstrates how tests can now be more efficiently run.
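The rules behind Maria’s four cases can be modeled as a plain predicate. The names and semantics below are assumptions for illustration only; on chain, the real checks live in her smart contract, and simulate exercises them without committing state.

```python
# Toy model of the auction rules behind Maria's four test cases.

def bid_allowed(bid: int, prev_bid: int, min_increment: int) -> bool:
    # A new bid must exceed the previous bid plus the minimum increment.
    return bid > prev_bid + min_increment

def claim_allowed(auction_in_progress: bool) -> bool:
    # The item can only be claimed once the auction has ended.
    return not auction_in_progress

prev, inc = 100, 10
assert bid_allowed(120, prev, inc)        # above prev + increment: succeeds
assert not bid_allowed(105, prev, inc)    # above prev, below increment: fails
assert not bid_allowed(90, prev, inc)     # below the previous bid: fails
assert not claim_allowed(True)            # auction still running: claim fails
```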

As John Clarke of Algofi says, “Simulate will dramatically improve the efficiency of … development, enabling more robust test suites to be built for AVM smart contracts.”

Improved Debugging with Simulate

As an external GitHub contributor once commented, “debugging is always urgent.” Indeed, debugging is at the center of every developer’s workflow (I sometimes hear of mythical programmers who write perfect code on the first try, but I don’t believe in fairytales). Debugging Algorand smart contracts has been, let’s be honest, a middling experience so far. It’s time to get good — simulate is the first, and key, step.

From Dryrun to Simulate

Before simulate, a core tool used for debugging was “dryrun”. On the surface, dryrun seems similar to simulate: it will evaluate your program without committing anything to the blockchain.

However, dryrun is built with a different architecture which severely limits it. Dryrun does not use the real ledger from the blockchain, nor the real block evaluator. It uses a thin version of the evaluation logic against external state passed in through the endpoint.

Dryrun does fine for evaluating a single app call, but cannot evaluate a transaction group properly. It cannot update the state from one transaction to the next in the transaction group, since it is just using the passed-in state as-is.

Most dapps use transaction groups, so this limitation is quite problematic. Simulate solves it by handling transactions either individually or as transaction groups. When evaluating a transaction group, simulate properly tracks the state changes from one transaction to the next in the group.
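The difference can be illustrated with a toy evaluator. Balances and payment semantics here are simplified stand-ins, not AVM logic; the point is only how threading state through the group changes the verdict.

```python
# Toy illustration: a fixed-snapshot evaluator (dryrun-style) rejects a
# group that a state-threading evaluator (simulate-style) accepts.

def eval_group(balances: dict, group: list, thread_state: bool) -> bool:
    state = dict(balances)  # working copy; the input snapshot is untouched
    for sender, receiver, amount in group:
        # Threaded mode reads the updated state; snapshot mode always
        # reads the original, pre-group balances.
        working = state if thread_state else balances
        if working.get(sender, 0) < amount:
            return False  # overspend: the whole group is rejected
        state[sender] = state.get(sender, 0) - amount
        state[receiver] = state.get(receiver, 0) + amount
    return True

# Txn 1 funds B; txn 2 spends from B. The group is only valid if txn 2
# sees the balance that txn 1 created.
group = [("A", "B", 50), ("B", "C", 40)]
balances = {"A": 100, "B": 0}

print(eval_group(balances, group, thread_state=True))   # True  (simulate-style)
print(eval_group(balances, group, thread_state=False))  # False (dryrun-style)
```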

Another limitation of dryrun’s architecture is that it needs to be updated each time the AVM is updated. Dryrun fell out of date when inner transactions were introduced, again with contract-to-contract calls, and again when boxes were added. Simulate is built alongside the evaluation logic itself, so it requires no such maintenance and should not fall out of date.

The Future with Simulate

Simulate lays the foundation for a slew of useful features. Now that the core architecture is in place, the field is open to suggestions of what will be most useful for the developer community. Here are some features we are hoping to add in the near future:

  • Allow simulate to run without log limitations.
  • Report opcode budget used by app and by smart signature.
  • Report fee credit (assuming no congestion, how much excess fee was paid for this transaction group).
  • Report the execution trace: for each step of the evaluation, details the stack, scratch slots, changes to local/global/box state, etc.
  • Report foreign resources used.
  • Allow simulate to run without any foreign resource limits (without checking foreign arrays).
  • Use different suggested params in a simulate run.

We seek your suggestions! Reach us on GitHub or Discord (algoanne#5743), as usual.

Prominent API providers in the ecosystem, AlgoNode and PureStake, will both have the simulate endpoint available soon after release.

For technical details of how to use simulate, see our technical overview article.

Try Before You Buy on Algorand was originally published in Algorand on Medium, where people are continuing the conversation by highlighting and responding to this story.