Under the Hood: Efficient Storage for Fork-Aware Blockchains

Learn about how blockchain software efficiently represents, queries, and authenticates data.

The analogy of a blockchain to a distributed ledger is a common one. First, blockchain miners compete to win a leader election. On becoming a leader, the miner packages transactions into a block, which they can then append to any previous block. Other participants in the network receive the block, and can verify if the transactions are valid as compared to their local copy of the ledger (for example, they can ensure Alice doesn’t send more cryptocurrency than she owns to Bob).

What might be unclear, however, is exactly how blockchain software efficiently keeps track of and queries data. For example, Alice may transfer a token to Bob at a block height of 400. If Bob wants to transfer that token to Charlie at a block height of 10000, the blockchain must be able to quickly determine (and verify) whether Bob owns that token in order to verify that transaction. It also must be possible to efficiently query state in different forks, each of which may have different transactions. This article will provide a high-level overview of what is happening under the hood to enable this efficient querying in the Stacks blockchain.

If you are curious about the internal workings of the blockchain, read on! After reading this post, you will understand what properties the view of the global state must have for the blockchain to operate efficiently and correctly.

Blockchains & state

As we just discussed, a blockchain is a series of blocks, containing transactions. Let’s call the final state of all data — accounts, balances, assets —“global state”, which is the end result of applying those transactions in a particular order. The materialized view is a snapshot of this global state at a particular block in the blockchain. Blockchain software needs to query the materialized view in its operation. Some examples of queries are validating transactions and checking the current value of a field.

Blockchains must support special properties that traditional databases do not. They must be able to support conflicting transaction histories (forks), as well as be able to authenticate any state on any fork.

What is a materialized view?

To reiterate, a materialized view is a snapshot of the state of the blockchain at a particular chain tip - it is the state of the world having applied all the transactions. In the Stacks blockchain, this data is represented as a simple key-value store. Examples of the type of data that would be stored are the total supply of a particular fungible token, or a mapping from a block height to the hash of the block at that height. To understand why we should care about the materialized view, let’s discuss what properties were most important during the design process.

1. Needs to be authenticated

We should be able to prove that a key corresponds to a certain value in a given fork, and the proof should be short.  To explain the general idea behind this, let’s describe a system to authenticate a singular key value pair. Instead of storing the value directly (since that could be space-intensive), our system would instead map a hash of the key to a hash of the value, which is generated by feeding the key (and value) through a hashing function, such as SHA256. A client could then verify they have the correct value locally by applying the same hash function, and comparing the hash they obtain to the hash the system stores at the key hash.

With this property, a client with a singular known-good block hash would be able to verify all key-value pairs at that block.

2. Able to cheaply handle multiple forks

Forks in blockchains are common, and each will have a different materialized view (this is because each fork may have different transactions, leading to a different blockchain state). Moreover, at any given point, a miner can build a block off of any previous block in the blockchain. We must be able to cheaply generate the materialized view for any new block as well as for any new fork. If we did not have this property, a bad actor could attempt to mine blocks in “unfavorable” forks (forks that take longer to process than average). This could lead to a denial-of-service attack, due to higher block processing times.

3. Adding new data should be cheap

Adding new data (aka, new key-value pairs) to the materialized view must be cheap. In other words, processing block number 10000000 should be just as cheap as processing block 1.

4. Needs to be history-dependent

The materialized view needs to be deterministic, or history dependent. Switching the order of block N and block M anywhere in the blockchain’s history should yield a different final state. In practice, doing this would lead to a different final block header hash.


The structure of the materialized view

As I said before, the materialized view is represented by a simple key-value store. There are two components to this storage:

  • The first component is the raw data itself, which can be stored in any data store or database. This component would store the actual key and value.
  • The second component is the authenticated index, for which we use the Merkelized Adaptive Radix Forest (MARF) data structure. The MARF is responsible for linking hashes of keys to a hash of the corresponding value in a way that allows us to cryptographically prove that a specific key maps to a specific value at a particular block in a particular fork. This data structure doesn’t store the raw data, but rather hashes of that data. It allows us to efficiently query and authenticate blockchain state. The MARF achieves each of the design goals laid out in the section above.

Summary

A blockchain is a peer-to-peer database. The materialized view is a snapshot of that database at a specific block, and it allows us to efficiently represent, query, and authenticate state. In contrast to conventional databases, the materialized view of a blockchain needs to handle conflicting histories efficiently (aka forks), and be able to authenticate key-value pairs. The MARF is the data structure that allows us to produce succinct cryptographic proofs for key-value pairs.

In a future post, I will cover the basics of the internals of the MARF, as well as how the design of the MARF helps achieve the design goals we laid out for the materialized view. Stay tuned! In the meanwhile, you can check out SIP 004, authored by Jude Nelson from the Stacks Foundation, for further reading or listen to the SIP 004 Reading Club discussion.

Get more of Hiro
Important updates from key Stacks Ecosystem projects and conversations about building on Stacks, delivered weekly.