StreamingFast Substreams modules basics
Modules are an important part of Substreams, offering hooks into the execution of the Substreams compute engine. You can create Substreams data manipulation and transformation strategies within modules.
Modules are small pieces of Rust code running in a WebAssembly (WASM) virtual machine. They coexist within the stream of blocks sent by the Substreams compute engine, which arrives from a blockchain node.
Modules have one or more inputs, which can be in the form of a `map` or `store`, or a `Block` or `Clock` object received from the blockchain's data source.
Substreams modules data interaction diagram
The diagram shows how the `transfer_map` module extracts the transfers in a `Block` and tracks the total number of transfers.
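The flow in the diagram can be sketched in plain Rust. This is a minimal illustration under stated assumptions: `Block`, `Event`, and `Transfer` are hypothetical stand-ins rather than a real chain's protobuf model, `count_transfers` is an invented name for the counting step, and none of this uses the Substreams crate.

```rust
// Plain-Rust sketch. All types here are hypothetical stand-ins,
// not the Substreams crate API or a real chain's Block model.
#[derive(Debug, Clone, PartialEq)]
pub struct Transfer {
    pub from: String,
    pub to: String,
    pub amount: u64,
}

#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    Transfer(Transfer),
    Other,
}

pub struct Block {
    pub number: u64,
    pub events: Vec<Event>,
}

/// Map step: extract the transfers found in one block.
pub fn transfer_map(block: &Block) -> Vec<Transfer> {
    block
        .events
        .iter()
        .filter_map(|event| match event {
            Event::Transfer(transfer) => Some(transfer.clone()),
            Event::Other => None,
        })
        .collect()
}

/// Counting step: add this block's transfer count to the running total.
pub fn count_transfers(total: &mut u64, transfers: &[Transfer]) {
    *total += transfers.len() as u64;
}
```

Calling `transfer_map` once per block and feeding its result to `count_transfers` keeps extraction and aggregation separate, which mirrors the split between the two steps in the diagram.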
Note: You can use multiple inputs in blockchains because they are clocked, which allows for synchronization between multiple execution streams and improved performance compared to conventional streaming engines.
As seen in the `counters` `store` example diagram, modules can also take in multiple inputs. In this case, two modules feed into a `store`, effectively tracking multiple `counters`.
Multiple module inputs diagram
Every time a new `Block` is processed, all of the modules are executed as a directed acyclic graph (DAG).
Note: The protocol's Block protobuf model always serves as the top-level data source and executes deterministically.
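To make the DAG execution concrete, the sketch below computes an order in which every module runs after the modules it reads from. It is an illustration only: the `Module` struct and `execution_order` function are hypothetical, the block source is treated as implicit, and the sort used here is Kahn's algorithm, which is one common way to order a DAG and not necessarily what the Substreams engine does internally.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical module description: a name plus the names of the
/// modules it reads from. The block source itself is implicit.
pub struct Module {
    pub name: &'static str,
    pub inputs: Vec<&'static str>,
}

/// Returns an order in which every module runs after its inputs,
/// or None if the graph has a cycle or names an unknown input.
pub fn execution_order(modules: &[Module]) -> Option<Vec<&'static str>> {
    // Count unresolved inputs per module and record who depends on whom.
    let mut pending: HashMap<&'static str, usize> = HashMap::new();
    let mut dependents: HashMap<&'static str, Vec<&'static str>> = HashMap::new();
    for module in modules {
        pending.insert(module.name, module.inputs.len());
        for input in &module.inputs {
            dependents.entry(*input).or_default().push(module.name);
        }
    }

    // Modules without inputs read only from the block and can run first.
    let mut ready: VecDeque<&'static str> = modules
        .iter()
        .filter(|module| module.inputs.is_empty())
        .map(|module| module.name)
        .collect();

    let mut order = Vec::with_capacity(modules.len());
    while let Some(name) = ready.pop_front() {
        order.push(name);
        if let Some(deps) = dependents.get(name) {
            for dep in deps {
                let count = pending.get_mut(dep).expect("dependent is a known module");
                *count -= 1;
                if *count == 0 {
                    ready.push_back(*dep);
                }
            }
        }
    }

    if order.len() == modules.len() {
        Some(order)
    } else {
        None
    }
}
```

With two input-free modules feeding a third, the third is always scheduled last. A cyclic definition yields `None`, which is why the module graph has to be acyclic.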
Modules have a single typed output. The type tells consumers what kind of data to expect and how to interpret the bytes being sent.
Tip: The data output of one module can be used as the input of a subsequent module, forming a chain of data flow from module to module.
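The role of the output type can be illustrated with a small sketch. Everything here is an assumption made for illustration: `ModuleOutput`, the type name `example.v1.TransferCount`, and the little-endian `u64` encoding are hypothetical stand-ins for a real protobuf message and its type name.

```rust
/// Hypothetical envelope: raw bytes plus the name of the type they encode.
pub struct ModuleOutput {
    pub type_name: &'static str,
    pub bytes: Vec<u8>,
}

/// Invented type name, standing in for a protobuf message name.
pub const TRANSFER_COUNT_TYPE: &str = "example.v1.TransferCount";

/// Producer side: encode a count and label it with its type.
pub fn encode_transfer_count(count: u64) -> ModuleOutput {
    ModuleOutput {
        type_name: TRANSFER_COUNT_TYPE,
        bytes: count.to_le_bytes().to_vec(),
    }
}

/// Consumer side: decode only if the declared type and length match.
pub fn decode_transfer_count(output: &ModuleOutput) -> Option<u64> {
    if output.type_name != TRANSFER_COUNT_TYPE || output.bytes.len() != 8 {
        return None;
    }
    let mut raw = [0u8; 8];
    raw.copy_from_slice(&output.bytes);
    Some(u64::from_le_bytes(raw))
}
```

A consumer that checks the declared type before decoding can reject bytes it does not know how to interpret instead of misreading them.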
To develop most non-trivial Substreams, you will need to use multiple `map` and `store` modules. The specific number, responsibilities, and communication methods for these modules depend on the developer's goals for the Substreams development effort.
The two module types are commonly used together to construct the directed acyclic graph (DAG) outlined in the Substreams manifest. The two module types are very different in their use and how they work. Understanding these differences is vital for harnessing the full power of Substreams.
`map` modules are used for data extraction, filtering, and transformation. Use them when direct extraction is needed, which avoids the need to reuse them later in the DAG.
To optimize performance, you should use a single `map` module instead of multiple `map` modules to extract single events or functions. It is more efficient to perform the maximum amount of extraction in a single top-level `map` module and then pass the data to other Substreams modules for consumption. This is the recommended, simplest approach for both backend and consumer development experiences.
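The single top-level extraction described above might look like the following plain-Rust sketch. The `Log` and `Events` types and both function names are hypothetical. The point is only that one pass over the block collects every event kind, and downstream consumers read the part they need.

```rust
/// Hypothetical decoded log entries found in a block.
#[derive(Debug, Clone, PartialEq)]
pub enum Log {
    Transfer { from: String, to: String, amount: u64 },
    Approval { owner: String, spender: String, amount: u64 },
    Unknown,
}

/// Combined output of the single top-level map step.
#[derive(Debug, Default, PartialEq)]
pub struct Events {
    pub transfers: Vec<(String, String, u64)>,
    pub approvals: Vec<(String, String, u64)>,
}

/// One pass over the block's logs collects every event kind of interest.
pub fn map_events(logs: &[Log]) -> Events {
    let mut events = Events::default();
    for log in logs {
        match log {
            Log::Transfer { from, to, amount } => {
                events.transfers.push((from.clone(), to.clone(), *amount));
            }
            Log::Approval { owner, spender, amount } => {
                events.approvals.push((owner.clone(), spender.clone(), *amount));
            }
            Log::Unknown => {}
        }
    }
    events
}

/// Downstream consumer: reads only the part of the shared output it needs.
pub fn total_transferred(events: &Events) -> u64 {
    events.transfers.iter().map(|(_, _, amount)| *amount).sum()
}
```

Compared with one extraction function per event kind, this walks the logs once per block, and every consumer shares the same output shape.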
`map` modules have several important use cases and facts to consider, including:
- Extracting model data from an event or function's inputs.
- Reading data from a block and transforming it into a custom protobuf structure.
- Filtering out events or functions for any given number of contracts.
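As an illustration of the filtering use case, the sketch below keeps only the logs emitted by a given set of contract addresses. The `Log` struct and `filter_by_contracts` function are hypothetical, and addresses are compared as exact strings for simplicity.

```rust
use std::collections::HashSet;

/// Hypothetical log entry: the emitting contract's address plus raw data.
#[derive(Debug, Clone, PartialEq)]
pub struct Log {
    pub address: String,
    pub data: Vec<u8>,
}

/// Keep only the logs emitted by one of the tracked contracts.
pub fn filter_by_contracts<'a>(logs: &'a [Log], tracked: &HashSet<String>) -> Vec<&'a Log> {
    logs.iter()
        .filter(|log| tracked.contains(&log.address))
        .collect()
}
```

Because the tracked addresses live in a set, the same function works unchanged for one contract or for many.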
`store` modules are used for the aggregation of values and to persist state that temporarily exists across a block.
Important: Stores should not be used for temporary, free-form data persistence.
Unbounded `store` modules are discouraged. `store` modules shouldn't be used as an infinite bucket to dump data into.
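One way to picture a bounded aggregation store is the sketch below. It is a plain-Rust model, not the Substreams store API: `CounterStore`, its methods, and the `day:<n>:` key scheme are assumptions chosen for illustration. Values are aggregated per key, and keys for periods that are no longer needed are removed so the store does not grow without limit.

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory model of an aggregating key-value store.
#[derive(Default)]
pub struct CounterStore {
    values: BTreeMap<String, i64>,
}

impl CounterStore {
    /// Aggregate: add `delta` to the value at `key`, starting from zero.
    pub fn add(&mut self, key: &str, delta: i64) {
        *self.values.entry(key.to_string()).or_insert(0) += delta;
    }

    /// Read the current value at `key`, or zero if the key is absent.
    pub fn get(&self, key: &str) -> i64 {
        self.values.get(key).copied().unwrap_or(0)
    }

    /// Bound the store: drop every key that starts with `prefix`.
    pub fn delete_prefix(&mut self, prefix: &str) {
        self.values.retain(|key, _| !key.starts_with(prefix));
    }

    /// Number of keys currently held.
    pub fn len(&self) -> usize {
        self.values.len()
    }
}
```

For example, counting transfers under `day:1:transfers` and deleting the `day:1:` prefix once that day is no longer needed keeps the number of keys proportional to the active window.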
Notable facts and use cases for working with `store` modules include:
- `store` modules should only be used when reading data from another downstream Substreams module.
- `store` modules cannot be output as a stream, except in development mode.
- `store` modules are used to implement the Dynamic Data Sources pattern from Subgraphs, keeping track of contracts created to filter the next block with that information.
- Downstream of the Substreams output, do not use `store` modules to query anything from them. Instead, use a sink to shape the data for proper querying.
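The Dynamic Data Sources pattern mentioned in the list can be sketched as follows. This is a simplified plain-Rust model under stated assumptions: the `Event` and `Block` types, the `HashSet` standing in for the store, and both function names are hypothetical.

```rust
use std::collections::HashSet;

/// Hypothetical events: a factory creating a contract, or a tracked
/// contract emitting a swap.
pub enum Event {
    ContractCreated { address: String },
    Swap { contract: String, amount: u64 },
}

pub struct Block {
    pub events: Vec<Event>,
}

/// Store step: remember every contract created in this block.
pub fn track_created(known: &mut HashSet<String>, block: &Block) {
    for event in &block.events {
        if let Event::ContractCreated { address } = event {
            known.insert(address.clone());
        }
    }
}

/// Map step: keep only swaps emitted by contracts already known to the store.
pub fn map_tracked_swaps(known: &HashSet<String>, block: &Block) -> Vec<(String, u64)> {
    block
        .events
        .iter()
        .filter_map(|event| match event {
            Event::Swap { contract, amount } if known.contains(contract) => {
                Some((contract.clone(), *amount))
            }
            _ => None,
        })
        .collect()
}
```

Running `map_tracked_swaps` before `track_created` for each block means a contract created in one block starts being matched from the next block onward, which follows the "filter the next block" wording above.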