Last fall and into the spring, we ran into an interesting problem. When profiling the API’s CPU usage, we found that over 50% of CPU time was spent encoding/decoding raw Clarity values.
Clarity values are the representations of data that Clarity understands (for example strings, tuples, and integers), and the API works with them constantly. For example, the API encodes and decodes Clarity values when generating repr strings for contract-call transaction arguments, parsing “length-prefixed” strings from raw transaction segments, and more.
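To show the kind of byte-level work involved, here is a minimal TypeScript sketch that decodes a few Clarity value types from serialized bytes. It assumes the consensus serialization as we understand it: a one-byte type ID (0x01 for uint, 0x03 and 0x04 for true and false, 0x0d for string-ascii), a 16-byte big-endian payload for uint, and a 4-byte big-endian length prefix for string-ascii. The type and function names are hypothetical, and this is not the API's actual decoder.

```typescript
// A small subset of Clarity values (hypothetical types, not the API's own).
type ClarityValue =
  | { type: "uint"; value: bigint }
  | { type: "bool"; value: boolean }
  | { type: "string-ascii"; value: string };

// Type ID prefixes as we understand Clarity's consensus serialization (assumption).
const TYPE_UINT = 0x01;
const TYPE_TRUE = 0x03;
const TYPE_FALSE = 0x04;
const TYPE_STRING_ASCII = 0x0d;

// Decodes one value starting at `offset` and returns it with the offset just past it.
function decodeClarityValue(
  bytes: Uint8Array,
  offset = 0,
): { value: ClarityValue; next: number } {
  if (offset >= bytes.length) throw new RangeError("no bytes left to decode");
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const typeId = bytes[offset];
  let pos = offset + 1;
  switch (typeId) {
    case TYPE_UINT: {
      // 128-bit unsigned integer, big-endian, read as two 64-bit halves.
      // DataView throws a RangeError if fewer than 16 bytes remain.
      const hi = view.getBigUint64(pos);
      const lo = view.getBigUint64(pos + 8);
      return { value: { type: "uint", value: (hi << 64n) | lo }, next: pos + 16 };
    }
    case TYPE_TRUE:
      return { value: { type: "bool", value: true }, next: pos };
    case TYPE_FALSE:
      return { value: { type: "bool", value: false }, next: pos };
    case TYPE_STRING_ASCII: {
      // 4-byte big-endian length prefix, then that many ASCII bytes.
      const len = view.getUint32(pos);
      pos += 4;
      if (pos + len > bytes.length) throw new RangeError("string-ascii runs past end of input");
      let s = "";
      for (let i = pos; i < pos + len; i++) s += String.fromCharCode(bytes[i]);
      return { value: { type: "string-ascii", value: s }, next: pos + len };
    }
    default:
      throw new Error(`unsupported type ID 0x${typeId.toString(16)}`);
  }
}
```

For example, the bytes `0d 00 00 00 02 68 69` decode to the string-ascii value "hi". Repeating this kind of per-byte work for every value the API touches adds up quickly in a JavaScript runtime.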
Ideally, the API’s CPU time should go to handling API requests and responses and to database interactions, not to encoding and decoding these values.
The API now uses a dedicated library to speed up decoding of Clarity values, binary transaction blobs, post-condition binary blobs, and Stacks address blobs. That led to a 75% reduction in CPU time spent on encoding/decoding.
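For a rough sense of the end-to-end effect, assume the rest of the API’s CPU work stayed the same. Let f be the share of total CPU time spent encoding/decoding (over 0.5, per the profiling above) and r = 0.75 the reduction in that work. The overall drop in CPU time is then:

```latex
\Delta_{\text{total}} = r \cdot f > 0.75 \times 0.50 = 0.375
```

Under that assumption, the change removes more than a third of the API’s total CPU time. This is an estimate derived from the two percentages, not a separately measured figure.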
Event Replay Optimizations
When an instance of the API is deployed for the first time, it needs to “sync” with the chain history from block 0 to the current block. Event replay is an API feature that lets developers record Stacks node events to a text file and read them back later. Importantly, event replay gives us the entire blockchain history at once, rather than having the API spend days waiting for a Stacks node to sync from scratch. That speed is especially helpful when managing migrations and breaking changes.
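As a rough illustration of the record-and-replay idea, the sketch below writes node events as tab-separated lines and reads them back in order. The column layout (an ID, the event path, and a JSON payload), the example path, and all names are assumptions for illustration, not the API’s actual TSV format.

```typescript
// Hypothetical shape of one recorded node event (not the API's actual schema).
interface NodeEvent {
  id: number; // position in the recording, used to replay in order
  path: string; // the endpoint the node posted to, e.g. "/new_block" (assumed)
  payload: unknown; // the JSON body the node sent; must be JSON-serializable
}

// One event per line: id, path, and JSON payload separated by tabs.
// JSON.stringify (without indentation) escapes tabs and newlines inside strings,
// so a serialized payload never contains a raw tab or line break.
function writeEventsTsv(events: NodeEvent[]): string {
  return events.map((e) => `${e.id}\t${e.path}\t${JSON.stringify(e.payload)}`).join("\n");
}

// Reads the TSV back and hands each event to `handle` in recorded order.
function replayEventsTsv(tsv: string, handle: (e: NodeEvent) => void): number {
  let count = 0;
  for (const line of tsv.split("\n")) {
    if (line.trim() === "") continue; // tolerate blank or trailing lines
    const t1 = line.indexOf("\t");
    const t2 = line.indexOf("\t", t1 + 1);
    if (t1 < 0 || t2 < 0) throw new Error(`malformed TSV line: ${line}`);
    handle({
      id: Number(line.slice(0, t1)),
      path: line.slice(t1 + 1, t2),
      payload: JSON.parse(line.slice(t2 + 1)),
    });
    count++;
  }
  return count;
}
```

Because the whole history sits in one file, replaying it is bounded by how fast the handler can process events rather than by how fast a node can sync.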
We know that developers want to ship fast and with confidence, so the more we can reduce event import time, the better. Previously, a full event replay (archival mode) could take around 18 hours to complete. The process required importing the complete node event file into the API database while calculating re-orgs and the complete subdomain data.
These are CPU-intensive processes, given the sheer volume of data involved and the need to account for chain re-orgs, which happen when the blockchain forks for a few minutes and then reorganizes itself once miners settle on the correct order of blocks.
We recently released “pruned event import” mode, which cut import time by around 40% (saving roughly 7 hours) by letting the import ignore some events outside a specific block-height window. However, that only gets you so far.
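The core of the pruned idea, skipping certain events that fall outside a block-height window, can be sketched as a simple predicate. The post does not say which events are prunable or how large the window is, so the `prunable` flag, the `windowSize` parameter, and the assumption that the window covers the most recent blocks before the chain tip are all hypothetical.

```typescript
// Hypothetical event record: its block height and whether its kind may be skipped.
interface ImportEvent {
  blockHeight: number;
  prunable: boolean; // assumption: the importer knows which event kinds are non-essential
}

// Essential events are always imported. Prunable events are imported only if
// they fall within the last `windowSize` blocks, i.e. heights
// (tipHeight - windowSize + 1) through tipHeight.
function shouldImport(e: ImportEvent, tipHeight: number, windowSize: number): boolean {
  if (!e.prunable) return true;
  return e.blockHeight > tipHeight - windowSize;
}
```

An importer would apply this as a filter, e.g. `events.filter((e) => shouldImport(e, tip, windowSize))`, so skipped events never reach the database at all.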
To get import times that are an order of magnitude faster, we released a new event-replay mode, preorg. This mode performs a fast “re-org” pass over a TSV file, generating a new TSV that contains only canonical data. In other words, instead of ingesting the complete chain history, with all its imperfections and re-orgs, the API ingests only an “ironed-out” version of the chain history containing the final, real data. This saves the API a lot of work: it can simply load the data into its database instead of processing re-org data to determine the final chain state.
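Conceptually, a re-org pass only needs each block’s hash and parent hash: walk back from the chain tip through parent links to find the canonical blocks, then drop every event that belongs to an orphaned block. The TypeScript sketch below shows that idea. The record shapes are hypothetical, and picking the highest block as the tip is a simplification of real fork choice, so this is not the API’s preorg implementation.

```typescript
// Hypothetical minimal records; the real TSV carries full node event payloads.
interface BlockRecord {
  hash: string;
  parentHash: string;
  height: number;
}
interface BlockEvent {
  blockHash: string;
  data: string;
}

// Walks parent links back from the tip and returns the hashes of canonical blocks.
function canonicalHashes(blocks: BlockRecord[]): Set<string> {
  const byHash = new Map<string, BlockRecord>();
  for (const b of blocks) byHash.set(b.hash, b);
  // Simplification: the highest block seen is treated as the canonical tip
  // (ties go to the first one encountered).
  let tip: BlockRecord | undefined;
  for (const b of blocks) if (tip === undefined || b.height > tip.height) tip = b;
  const canonical = new Set<string>();
  // Stop at the genesis block (parent not in the map) or if a hash repeats.
  for (
    let cur: BlockRecord | undefined = tip;
    cur !== undefined && !canonical.has(cur.hash);
    cur = byHash.get(cur.parentHash)
  ) {
    canonical.add(cur.hash);
  }
  return canonical;
}

// Keeps only events from canonical blocks, preserving their original order.
function pruneOrphanedEvents(blocks: BlockRecord[], events: BlockEvent[]): BlockEvent[] {
  const canonical = canonicalHashes(blocks);
  return events.filter((e) => canonical.has(e.blockHash));
}
```

The output contains no orphaned data, so a later import can insert it directly without any re-org bookkeeping.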
With a full archival 4.7 GB TSV file, an event import in this mode takes around 7 to 8 minutes. That’s a lot of data ingested very quickly.
We reduced event import time from 18 hours to just 8 minutes.
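Putting the figures above side by side (using the upper 8-minute estimate and taking the pruned mode’s 40% reduction at face value), the preorg import is roughly 135 times faster than a full archival replay and roughly 81 times faster than a pruned import:

```latex
\frac{18\,\text{h}}{8\,\text{min}} = \frac{1080\,\text{min}}{8\,\text{min}} = 135,
\qquad
\frac{0.60 \times 18\,\text{h}}{8\,\text{min}} = \frac{648\,\text{min}}{8\,\text{min}} = 81
```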
Better FT/NFT Metadata Processing
In the Stacks API, we offer metadata processing for fungible tokens. For example, when the Hiro Wallet displays the tokens a user owns, it shows information about those tokens (their name, logo, correct decimals, and more), and that data comes from the metadata the API serves. However, offering this does not guarantee the metadata is available when it’s needed. In most cases, the metadata lives in files outside the Stacks chain (e.g. a JSON file hosted elsewhere). Sometimes, when we try to fetch this data, the servers hosting it are unavailable, and we end up returning “blank” data to users who request it.
To solve this, we made several changes to how we handle metadata errors, including:
- Splitting tokens-contract-handler.ts into separate queue, handler, and helper files under src/token-metadata. This makes the code much more maintainable and easier to isolate from the rest of the API.
- Creating a RetryableTokenMetadataError class and tagging each retryable error inside the metadata processor with it, so we can decide later what to do in each case. This lets us choose which errors we can recover from and which we should report right away.
- Adding STACKS_API_TOKEN_METADATA_STRICT_MODE, an environment variable that controls how we handle retryable errors. If enabled, contracts with retryable errors are retried indefinitely. If disabled, processing for that contract is retried until STACKS_API_TOKEN_METADATA_MAX_RETRIES is reached.
- Listening to blockUpdate events inside the metadata processor, so we can try to drain the queue and retry failed contracts with every new block. This lets us keep retrying metadata errors as new blocks arrive, so we can serve the best possible data.
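The retry behavior described in the list above can be sketched roughly as follows. Only `RetryableTokenMetadataError`, `STACKS_API_TOKEN_METADATA_STRICT_MODE`, and `STACKS_API_TOKEN_METADATA_MAX_RETRIES` come from the API; the `TokenMetadataQueue` class, the `decideOnFailure` and `retryConfigFromEnv` helpers, the accepted env values, and the default of 5 retries are hypothetical. The `fetchMetadata` callback stands in for whatever actually downloads and parses off-chain metadata.

```typescript
// Tags errors that may succeed on a later attempt (e.g. a metadata host that is temporarily down).
class RetryableTokenMetadataError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "RetryableTokenMetadataError";
  }
}

interface RetryConfig {
  strictMode: boolean; // from STACKS_API_TOKEN_METADATA_STRICT_MODE
  maxRetries: number; // from STACKS_API_TOKEN_METADATA_MAX_RETRIES
}

// Hypothetical env parsing: accepted values and the default of 5 retries are assumptions.
function retryConfigFromEnv(env: Record<string, string | undefined>): RetryConfig {
  const strict = env.STACKS_API_TOKEN_METADATA_STRICT_MODE;
  return {
    strictMode: strict === "1" || strict === "true",
    maxRetries: Number(env.STACKS_API_TOKEN_METADATA_MAX_RETRIES ?? 5),
  };
}

// After a failed attempt: retry on a later block, or give up and report the error now.
function decideOnFailure(err: unknown, retriesSoFar: number, config: RetryConfig): "retry" | "report" {
  if (!(err instanceof RetryableTokenMetadataError)) return "report"; // not recoverable
  if (config.strictMode) return "retry"; // strict mode: retry indefinitely
  return retriesSoFar < config.maxRetries ? "retry" : "report";
}

// Hypothetical queue of contracts awaiting metadata, drained once per new block.
class TokenMetadataQueue {
  private retryCounts = new Map<string, number>(); // contract ID -> retries made so far
  readonly metadata = new Map<string, unknown>();
  readonly reported: string[] = [];

  constructor(
    private readonly config: RetryConfig,
    // Stand-in for the real fetch-and-parse step; may throw RetryableTokenMetadataError.
    private readonly fetchMetadata: (contractId: string) => Promise<unknown>,
  ) {}

  enqueue(contractId: string): void {
    if (!this.retryCounts.has(contractId)) this.retryCounts.set(contractId, 0);
  }

  // Intended to run on each blockUpdate event: attempts every pending contract once.
  async drain(): Promise<void> {
    for (const [contractId, retriesSoFar] of [...this.retryCounts]) {
      try {
        this.metadata.set(contractId, await this.fetchMetadata(contractId));
        this.retryCounts.delete(contractId);
      } catch (err) {
        if (decideOnFailure(err, retriesSoFar, this.config) === "retry") {
          this.retryCounts.set(contractId, retriesSoFar + 1); // stays queued for the next block
        } else {
          this.retryCounts.delete(contractId);
          this.reported.push(contractId);
        }
      }
    }
  }
}
```

In Node, the config would typically come from `retryConfigFromEnv(process.env)`, and `drain()` would be wired to whatever emitter delivers new blocks. Keeping the retry decision in a pure function makes the strict and non-strict policies easy to test in isolation.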
Those are just a few of the recent changes we’ve made to the API, and we have a number of exciting things in the pipeline, including sending NFT updates through WebSockets. If you're curious to learn more about the history of the Stacks API, check out this deep dive on scaling the Stacks API.