At a high level, those three categories of improvements for Clarity are:
- Basic language and reliability improvements for the Clarity language and runtime
- Support for new functionality and better Bitcoin interactivity
- Language analysis tools for non-consensus breaking features
To be clear, these are not improvements that Hiro is actively working on, and we encourage anyone to pick these projects up. I will address the latter two categories in greater detail in future posts, but for this post today, I will focus on language and reliability improvements.
This category encompasses efforts that:
- Increase the reliability and performance of the Clarity virtual machine (VM)
- Standardize a language specification in a way that humans can understand and verify it
- Make improvements or changes in places where language irregularities exist
- Formalize the language specification and type system in a way that computers can understand and verify it
Some Background on Clarity
Compared to almost all production-use programming languages, the Clarity language and virtual machine are very naive: the virtual machine implementation favors simplicity of implementation over performance. The VM itself was written by a very small team of contributors, and the language specification and implementation both suffer somewhat as a result.
A formal specification for the syntax does not exist, and there are resultant peculiarities (e.g., type names are not reserved words). A formal specification for a language would specify, in a declarative manner, exactly when different terms are legal in the language syntax. These specifications can be used to validate parser implementations, automatically construct test suites, and surface any strange or unusual syntax.
In practice, without a formal specification, developers aren’t always sure what is, and what isn’t, a legal Clarity smart contract until they try running it. Additionally, without this formal specification, it’s hard to develop tools: for example, a developer building a syntax highlighter wouldn’t know whether the highlighter is correct because they don’t know exactly what the syntax of Clarity is without digging through the source code.
The VM implementation is a naive recursive interpreter. While this design is easy to work with and reason about, it places a burden on contract stack-depth and also results in an inefficient interpreter.
Implementing a Formally Verified Interpreter
There is a large possibility of improvements to the Clarity language and implementation that would come from determining a formal specification of the language and producing some kind of formally verified interpreter. Such an interpreter does not necessarily need to be the interpreter used by the Stacks blockchain to provide a benefit — the interpreter used by the Stacks blockchain could be compared against this “benchmark” interpreter using fuzzing, and any revealed discrepancies could be remedied in a future version of Clarity (i.e., Stacks 2.2.) or, if the revealed discrepancy rises to the level of a SIP-11 fork, then proposing and advocating for that fork.
Let’s take a look at some of the projects for improvement that fall inside of this work.
Projects for Improvement
- Formal syntax specification. As discussed above, formal specifications for languages specify, in a declarative manner, exactly when different terms are legal in the language syntax. Without such a specification, developers who want to build language tooling for the Clarity language have to consult the Clarity source code for guidance on the Clarity syntax. A specification would make it easier for people to build things like syntax checkers and IDEs.
- Formal type specification. We already have a specification for type admission and type inference, but these should be formalized, as well as providing formal typing rules for all native functions. A formal type specification tells contract authors exactly what kind of Clarity values can be used at different times in a contract. It would also enable developers that are building language tools (like IDEs, automated testing platforms, etc) to provide correct error checking and suggestions in their tools.
- Separation of Clarity VM from stacks-blockchain implementation. Currently, the stacks-blockchain implementation builds with the Clarity virtual machine in the same cargo build process, using whichever version of Clarity is implemented in the repository. Separating these two components means not only separating them into two different binaries, but allowing the stacks-blockchain to select the Clarity language version that it wants to use, rather than just implicitly using whichever version it is built with. Doing this is helpful across a number of fronts. It will improve the build process and maintainability of developer tooling that depends on the VM, like Clarinet, but possibly other future tools. It will allow for more community involvement and interaction from other parties. It will enable a stricter relationship between the Stacks blockchain and Clarity. And it will free the Clarity build system from the Stacks node build system, which is important if projects in this workstream result in new or alternative implementations of the Clarity VM.
- Rewrite the current recursive VM as an instruction stack interpreter. A stack-based interpreter is an interpreter that evaluates Clarity programs by translating them into linear instructions that operate on a single memory stack. It's a relatively common approach to implementing interpreters, and is sort of one level of sophistication above the recursive interpreter we have today. This project has the potential to improve the memory footprint of the stacks-node as well as the performance of evaluating contracts (however, it is still the case that MARF interactions would be the primary bottleneck).
- MARF improvements. The stacks-node MARF implementation, which handles storing all of the materialized state of the Stacks blockchain (e.g. account balances and smart contract variable state), is a bottleneck for a wide array of operations. It’s the slowest component of block evaluation, and is therefore the limiting factor in our block limits. Synchronization behavior of the MARF also leads to locking across network requests, block processing, bitcoin block downloads, and evaluating sortitions. Factoring out the MARF implementation as a library and working to improve its synchrony requirements and speed of lookups would be massively beneficial and increase reliability and speed of RPC endpoints, make mining behavior more predictable, and reduce the hardware requirements for running nodes. With these performance improvements, future network upgrades could increase the throughput of the blockchain.
- Formally verifiable interpreter implementation. This would start with something like a Haskell implementation of the Clarity VM, which is easier than it sounds — Haskell has lots of tooling for implementing LISP-dialects as well as type systems. With the formal syntax specification and type specification, this should be even easier. Once that implementation is in place, anyone can incrementally add new kinds of proofs to the implementation which assert various properties they are interested in proving. These are properties like the impossibility of altering a user account balance without a transaction from that user, the impossibility of executing a single smart contract multiple times in the same transaction (this is called reentrancy, and it has been the cause of many of the biggest smart contract hacks in other blockchains), and the impossibility of triggering unbounded execution (this is Turing completeness, and it is what has prevented other blockchains from successfully executing soft forks).
These are just a few ideas on how the Stacks community can improve the Clarity language. Hiro itself is not actively pursuing this work currently, so we’d love to see some of you take up the initiative! We’re happy to support however we can.