StreamingFast Substreams fundamental knowledge

Fundamentals overview

Substreams development draws on several pieces of technology: the Substreams engine, the substreams command-line interface (CLI), modules, protobufs, and several kinds of configuration files. The documentation explains how these pieces fit together.

Substreams in Action

The process to use Substreams includes:

  • Choose the blockchain to capture and process data from.
  • Identify the smart contract addresses of interest (such as DEXs or specific wallet addresses).
  • Identify the data of interest, then define and create protobufs for it.
  • Find already-built Substreams modules and consume their streams, or:
  • Write Rust Substreams module handler functions.
  • Update the Substreams manifest to reference the protobufs and module handlers.
  • Use the substreams CLI to send commands and view results.

The Substreams engine

The Substreams engine serves as the CPU or brain of the Substreams system: it handles requests and communication, and it orchestrates the transformation of blockchain data.
Note: The Substreams engine is responsible for running developer-defined data transformations to process blockchain data.
Developers use the substreams CLI to send commands, flags, and a reference to the manifest configuration file to the Substreams engine. They write their data transformation strategies as Substreams "module handlers" in the Rust programming language. The handlers act on protobuf-based data models referenced from within the Substreams manifest.

Substreams module communication

The Substreams engine runs the code defined by developers in Rust-based module handlers.
Note: Substreams modules have unidirectional data flow, meaning data is passed from one module to another in a single direction.
The data flow is defined in the Substreams manifest in two ways. The "inputs" and "output" fields of each module reference the protobuf definitions for the blockchain data the module reads and produces. The "inputs" field can also name another module, which sends that module's data directly to the module declaring the input.

Substreams DAG

Substreams modules are composed through a directed acyclic graph (DAG).
Note: In DAGs, data flows from one module to another in one direction only, with no cycles, similar to Git's model of commits and branches.
The Substreams manifest references the modules and the handlers defined within them, declaring how the Substreams engine is meant to use them.
Directed acyclic graphs contain nodes, which in this case are modules. The modules communicate in only one direction, passing data from one node or module to another.
The Substreams engine creates the "compute graph" or "dependency graph" at run time, in response to commands sent through the substreams CLI, using the code in the modules referenced by the manifest.
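To make the dependency graph concrete, the following is a minimal sketch in plain Rust of how an execution order can be derived from module inputs. It is not the engine's actual implementation. The Module struct, the execution_order function, and the module names (map_transfers, store_balances, map_summary) are all hypothetical and chosen only for illustration. The sketch uses Kahn's topological sort and does not model raw block sources.

```rust
use std::collections::{BTreeMap, VecDeque};

/// Hypothetical stand-in for a manifest module entry:
/// a name plus the names of the modules it takes as inputs.
struct Module {
    name: &'static str,
    inputs: Vec<&'static str>,
}

/// Kahn's algorithm. Returns the module names ordered so that every module
/// appears after all of its inputs. Returns None if the graph has a cycle
/// or an input names a module that is not in the list.
fn execution_order(modules: &[Module]) -> Option<Vec<&'static str>> {
    // Number of not-yet-produced inputs per module.
    let mut pending: BTreeMap<&'static str, usize> = BTreeMap::new();
    // For each module, the modules that consume its output.
    let mut dependents: BTreeMap<&'static str, Vec<&'static str>> = BTreeMap::new();
    for m in modules {
        pending.insert(m.name, m.inputs.len());
        for input in &m.inputs {
            dependents.entry(*input).or_default().push(m.name);
        }
    }

    // Start with the modules that have no module inputs.
    let mut ready: VecDeque<&'static str> = pending
        .iter()
        .filter(|(_, n)| **n == 0)
        .map(|(name, _)| *name)
        .collect();

    let mut order = Vec::new();
    while let Some(name) = ready.pop_front() {
        order.push(name);
        for dep in dependents.get(name).into_iter().flatten() {
            let n = pending.get_mut(dep)?;
            *n -= 1;
            if *n == 0 {
                ready.push_back(*dep);
            }
        }
    }

    // Any module left unscheduled means a cycle or an unknown input.
    if order.len() == modules.len() {
        Some(order)
    } else {
        None
    }
}

/// Hypothetical module set, deliberately listed out of dependency order.
fn example_modules() -> Vec<Module> {
    vec![
        Module { name: "map_summary", inputs: vec!["map_transfers", "store_balances"] },
        Module { name: "store_balances", inputs: vec!["map_transfers"] },
        Module { name: "map_transfers", inputs: vec![] },
    ]
}

fn main() {
    match execution_order(&example_modules()) {
        Some(order) => println!("execution order: {:?}", order),
        None => println!("the module graph is not a valid DAG"),
    }
}
```

Because data only flows from inputs to consumers, a valid order always exists for an acyclic graph. In this sketch, two modules that take each other as inputs yield None.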

Protobufs for Substreams

Substreams module handlers linked to protobuf
Protocol buffers, or protobufs, are the data models operated on by the Rust-based module handler functions. Developers define and outline these data models in protobuf definition files.
Note: Protobufs include the names of the data objects and the fields contained and accessible within them.
Many protobuf definitions have already been created, such as the erc721 token model, for use by developers creating Substreams data transformation strategies.
Custom smart contracts, like Uniswap, also have protobuf definitions that are referenced in the Substreams manifest and made available to module handler functions. Protobufs provide an API to the data for smart contract addresses.
In object-oriented programming terminology, protobufs are the objects or object models. In front-end web development, they are similar to REST or other data APIs.
Tip: Firehose and Substreams treat the data as the API.
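As a rough illustration of what "the data as the API" looks like from inside a handler, the sketch below shows the kind of Rust types a protobuf message might correspond to. In a real project these types are generated from the protobuf definitions rather than written by hand. The Transfer and Transfers messages, their fields, and the received_by helper are hypothetical and are not copied from any published definition.

```rust
/// Hypothetical message describing one token transfer.
/// A protobuf definition would name the message and each of these fields.
#[derive(Debug, Clone, PartialEq, Default)]
struct Transfer {
    from: Vec<u8>,  // sender address bytes
    to: Vec<u8>,    // recipient address bytes
    token_id: u64,  // identifier of the transferred token
}

/// Hypothetical wrapper message holding a repeated field of transfers.
#[derive(Debug, Clone, PartialEq, Default)]
struct Transfers {
    transfers: Vec<Transfer>,
}

/// Handlers read the model through its named fields; here, all transfers
/// received by a given address.
fn received_by<'a>(batch: &'a Transfers, owner: &[u8]) -> Vec<&'a Transfer> {
    batch
        .transfers
        .iter()
        .filter(|t| t.to.as_slice() == owner)
        .collect()
}

/// Made-up sample data for the usage example.
fn sample_batch() -> Transfers {
    Transfers {
        transfers: vec![
            Transfer { from: vec![0xaa], to: vec![0xbb], token_id: 1 },
            Transfer { from: vec![0xbb], to: vec![0xcc], token_id: 2 },
            Transfer { from: vec![0xaa], to: vec![0xbb], token_id: 3 },
        ],
    }
}

fn main() {
    let batch = sample_batch();
    for t in received_by(&batch, &[0xbb]) {
        println!("token {} received from {:?}", t.token_id, t.from);
    }
}
```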

Substreams Rust modules

Writing Rust Modules for Substreams
The first step in Substreams development is to design an overall strategy for managing and transforming data. The Substreams engine processes modules by using the relationships defined in the manifest.
Note: Substreams modules work together by passing data from one module to another until they finally return an output transformed according to the rules in the manifest, modules, and module handler functions.
Modules define two types of module handlers: map and store. These two types work together to sort, sift, temporarily store, and transform blockchain data from Block objects and smart contracts for use in data sinks such as databases or subgraphs.
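The division of labor between the two handler types can be sketched in plain Rust. This is a simplified model under stated assumptions, not the Substreams handler API. The Block, Log, and Transfer types, the TRACKED_CONTRACT address, and the function names are hypothetical, and a HashMap stands in for the engine-managed store. The map handler is a stateless transformation of one block. The store handler accumulates the map output across blocks.

```rust
use std::collections::HashMap;

/// Hypothetical, heavily simplified block and log types.
struct Log {
    address: &'static str, // contract that emitted the log
    from: &'static str,
    to: &'static str,
}

struct Block {
    number: u64,
    logs: Vec<Log>,
}

/// Hypothetical output model of the map handler.
#[derive(Debug, PartialEq)]
struct Transfer {
    from: String,
    to: String,
}

/// Made-up contract address used only in this sketch.
const TRACKED_CONTRACT: &str = "0xcontract";

/// Map handler sketch: extracts transfers for the tracked contract from a
/// single block and keeps no state between calls.
fn map_transfers(block: &Block) -> Vec<Transfer> {
    block
        .logs
        .iter()
        .filter(|log| log.address == TRACKED_CONTRACT)
        .map(|log| Transfer { from: log.from.to_string(), to: log.to.to_string() })
        .collect()
}

/// Store handler sketch: folds the map output into a key-value store,
/// counting the transfers received per address.
fn store_transfer_counts(transfers: &[Transfer], store: &mut HashMap<String, u64>) {
    for t in transfers {
        *store.entry(t.to.clone()).or_insert(0) += 1;
    }
}

/// Feeds two made-up blocks through the map handler, then the store handler.
fn run_example() -> HashMap<String, u64> {
    let blocks = vec![
        Block {
            number: 1,
            logs: vec![
                Log { address: "0xcontract", from: "alice", to: "bob" },
                Log { address: "0xother", from: "alice", to: "carol" },
            ],
        },
        Block {
            number: 2,
            logs: vec![
                Log { address: "0xcontract", from: "carol", to: "bob" },
                Log { address: "0xcontract", from: "bob", to: "alice" },
            ],
        },
    ];

    let mut store = HashMap::new();
    for block in &blocks {
        let transfers = map_transfers(block);
        println!("block {}: {} transfer(s)", block.number, transfers.len());
        store_transfer_counts(&transfers, &mut store);
    }
    store
}

fn main() {
    let counts = run_example();
    println!("transfers received: {:?}", counts);
}
```

Keeping the map step free of state and confining accumulation to the store step mirrors the split described above: one handler type transforms data, the other holds it for downstream modules.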