


Substreams

Substreams is a powerful indexing technology, which allows you to:
  1. Extract data from several blockchains (Ethereum, Polygon, BNB, Solana...).
  2. Apply custom transformations to the data.
  3. Send the data to a place of your choice (for example, a Postgres database or a file).
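
The three steps above can be sketched in plain Rust. This is only an illustration of the extract, transform, and send flow, not the Substreams API: the `RawBlock` type, the in-memory "extraction", and the `Vec<String>` destination are all assumptions made for the example.

```rust
// Hypothetical, simplified block type; real chains expose far richer data.
struct RawBlock {
    number: u64,
    tx_values: Vec<u64>,
}

// Step 1: "extract" blocks. Here they are generated in memory (assumption).
fn extract(start: u64, count: u64) -> Vec<RawBlock> {
    (start..start + count)
        .map(|n| RawBlock { number: n, tx_values: vec![n, n * 2] })
        .collect()
}

// Step 2: apply a custom transformation (sum the values in each block).
fn transform(block: &RawBlock) -> (u64, u64) {
    (block.number, block.tx_values.iter().sum())
}

// Step 3: send the result somewhere. A Vec of text rows stands in for a
// database table or a file.
fn load(rows: &mut Vec<String>, record: (u64, u64)) {
    rows.push(format!("{},{}", record.0, record.1));
}

// Runs the three steps in order for a range of blocks.
fn run_pipeline(start: u64, count: u64) -> Vec<String> {
    let mut rows = Vec::new();
    for block in extract(start, count) {
        load(&mut rows, transform(&block));
    }
    rows
}
```

Calling `run_pipeline(1, 3)` processes blocks 1 to 3 and returns one row per block.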


Firehose

Firehose is the extraction layer of Substreams (i.e. step number one of the previous glossary entry). Although Firehose is a different project, it is tightly related to Substreams.


CLI

CLI, which stands for command-line interface, is a text-based interface that allows you to input commands to interact with a computer. The Substreams CLI allows you to deploy and manage your Substreams.


Module

Modules are small pieces of Rust code running in a WebAssembly (WASM) virtual machine. Modules have one or more inputs and an output. For example, a module could receive an Ethereum block as input and emit a list of transfers for that block as output.
There are two types of modules: map and store.

map Module

map modules receive an input and emit an output (i.e. they perform a transformation).
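
As a rough sketch of the idea, a map module behaves like a pure function from an input to an output. The types below (`EthBlock`, `Log`, `TransferOut`) are hypothetical stand-ins for the Protobuf-generated types a real module would use, and the function is plain Rust rather than an actual Substreams handler.

```rust
// Hypothetical stand-ins for Protobuf-generated types (assumption).
struct Log {
    topic: String,
    from: String,
    to: String,
    amount: u64,
}

struct EthBlock {
    number: u64,
    logs: Vec<Log>,
}

#[derive(Debug, Clone, PartialEq)]
struct TransferOut {
    block_number: u64,
    from: String,
    to: String,
    amount: u64,
}

// A map-style function: the output depends only on the input block,
// and nothing is remembered between calls.
fn map_transfers(block: &EthBlock) -> Vec<TransferOut> {
    block
        .logs
        .iter()
        .filter(|log| log.topic == "Transfer")
        .map(|log| TransferOut {
            block_number: block.number,
            from: log.from.clone(),
            to: log.to.clone(),
            amount: log.amount,
        })
        .collect()
}
```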

store Module

store modules write to key-value stores and are stateful. They are useful in combination with map modules to keep track of past data.
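
The stateful behaviour can be illustrated with an in-memory key-value store. This is a sketch under assumptions: `BalanceStore` and `store_received` are hypothetical names, and a real store module writes to a store managed by Substreams rather than to a `HashMap`.

```rust
use std::collections::HashMap;

// Hypothetical in-memory key-value store (assumption, not the Substreams API).
#[derive(Default)]
struct BalanceStore {
    totals: HashMap<String, u64>,
}

impl BalanceStore {
    // Accumulate a value under a key; state persists across calls.
    fn add(&mut self, key: &str, amount: u64) {
        *self.totals.entry(key.to_string()).or_insert(0) += amount;
    }

    // Read the accumulated value, defaulting to 0 for unknown keys.
    fn get(&self, key: &str) -> u64 {
        self.totals.get(key).copied().unwrap_or(0)
    }
}

// Called once per block with the (recipient, amount) pairs that a map step
// produced for that block.
fn store_received(store: &mut BalanceStore, transfers: &[(String, u64)]) {
    for (to, amount) in transfers {
        store.add(to, *amount);
    }
}
```

Unlike the map case, calling `store_received` for a second block builds on what the first block already wrote.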

Directed Acyclic Graph (DAG)

DAGs are data structures used in many computational models. In Substreams, DAGs are used to define module data flows.
A DAG is a directed graph with no cycles: every edge points one way, and following the edges never leads back to the starting node. DAGs are used in a variety of software, such as Git or IPFS.
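
To make the idea concrete, the sketch below orders modules so that each one runs after its inputs, using Kahn's topological sort. It is an illustration only: the `execution_order` function and the module names used with it are hypothetical, and this is not how Substreams itself is documented to schedule modules.

```rust
use std::collections::{BTreeMap, VecDeque};

// `deps` maps each module name to the modules it takes as input.
// Returns the modules in a valid execution order, or None if the graph
// contains a cycle (and is therefore not a DAG).
fn execution_order<'a>(deps: &BTreeMap<&'a str, Vec<&'a str>>) -> Option<Vec<&'a str>> {
    // Number of inputs each module is still waiting for.
    let mut pending: BTreeMap<&'a str, usize> = BTreeMap::new();
    for (&module, inputs) in deps {
        *pending.entry(module).or_insert(0) += inputs.len();
        for &input in inputs {
            pending.entry(input).or_insert(0);
        }
    }

    // Modules with no unresolved inputs can run immediately.
    let mut ready: VecDeque<&'a str> = pending
        .iter()
        .filter(|(_, count)| **count == 0)
        .map(|(&name, _)| name)
        .collect();

    let mut order = Vec::new();
    while let Some(done) = ready.pop_front() {
        order.push(done);
        // Every module that consumes `done` has one input fewer to wait for.
        for (&module, inputs) in deps {
            let uses = inputs.iter().filter(|&&input| input == done).count();
            if uses > 0 {
                let count = pending.get_mut(module).unwrap();
                *count -= uses;
                if *count == 0 {
                    ready.push_back(module);
                }
            }
        }
    }

    // If some module never became ready, the graph has a cycle.
    if order.len() == pending.len() {
        Some(order)
    } else {
        None
    }
}
```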


Composability

Modules make Substreams composable. Being composable means that each Substreams can run independently, but several can also be combined to create more powerful streams.
For example, consider that you have two map modules: one emitting Transfer objects and another one emitting AccountInformation objects. You could create another module that receives the previous two modules as input and merges the information from both.
That is why Substreams is so powerful!
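
The merging example above might look roughly like this in plain Rust. The field layouts of `Transfer` and `AccountInformation`, the `LabeledTransfer` output type, and the `map_labeled_transfers` function are assumptions made for illustration; the source names only the two input objects.

```rust
use std::collections::HashMap;

// Output of a first, hypothetical map module.
struct Transfer {
    from: String,
    to: String,
    amount: u64,
}

// Output of a second, hypothetical map module.
struct AccountInformation {
    address: String,
    label: String,
}

// Output of the module that combines the two (assumed shape).
#[derive(Debug, Clone, PartialEq)]
struct LabeledTransfer {
    from_label: String,
    to_label: String,
    amount: u64,
}

// A third module that takes the outputs of the previous two as inputs
// and merges them.
fn map_labeled_transfers(
    transfers: &[Transfer],
    accounts: &[AccountInformation],
) -> Vec<LabeledTransfer> {
    // Index account labels by address for quick lookup.
    let labels: HashMap<&str, &str> = accounts
        .iter()
        .map(|a| (a.address.as_str(), a.label.as_str()))
        .collect();
    let label_of =
        |address: &str| labels.get(address).copied().unwrap_or("unknown").to_string();

    transfers
        .iter()
        .map(|t| LabeledTransfer {
            from_label: label_of(&t.from),
            to_label: label_of(&t.to),
            amount: t.amount,
        })
        .collect()
}
```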

Protocol Buffers (Protobuf)

Protocol Buffers is a serialization format used to define module inputs and outputs in Substreams. For example, a manifest might define a module called map_transfers with an input object, Transfer (representing an Ethereum transaction), and an output object, MyTransfer (representing a reduced version of an Ethereum transaction).


Manifest

The Substreams manifest (called substreams.yaml) is a YAML file where you define all the configuration your Substreams needs, for example the modules (along with their inputs and outputs) and the Protobuf definitions used.

WebAssembly (WASM)

WebAssembly (WASM) is a binary-code format used to run a Substreams. The Rust code used to define your Substreams transformations is packed into a WASM module, which you can use as an independent executable.


Block

The Block Protobuf object contains all the blockchain information for a specific block number. EVM-compatible chains share the same Block object, but non-EVM-compatible chains must use their corresponding Block Protobuf definition.

SPKG (.spkg)

SPKG files contain Substreams definitions. You can create an .spkg file from a Substreams manifest using the substreams pack command. Then, you can use this file to share or run the Substreams independently. The .spkg file contains everything needed to run a Substreams: Rust code, Protobuf definitions and the manifest.


GUI

The CLI includes two commands to run a Substreams: run and gui. The substreams run command prints the output of the execution linearly for every block, while the substreams gui command allows you to easily jump to the output of a specific block.


Subgraph

Subgraphs are another indexing mechanism developed by The Graph. In Subgraphs, data is indexed and available through a GraphQL endpoint.
One of the main differences between Subgraphs and Substreams is that Subgraphs rely on polling, while Substreams relies on streaming.


Triggers

In Subgraphs, you define triggers to index your data. These triggers are events that happen in the blockchain (for example, AccountCreated). Subgraphs listen for those events, and index the data accordingly.


Sink

Substreams allows you to extract blockchain data and apply transformations to it. After that, you choose a place to send your transformed data; this destination is called a sink. A sink can be a SQL database, a file, or a custom solution of your choice.
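
One way to picture interchangeable sinks is as implementations of a common interface. The `Sink` trait, `MemorySink`, `CsvSink`, and `send_all` below are all hypothetical names chosen for this sketch; the official sinks are separate tools and do not necessarily follow this shape.

```rust
// Hypothetical interface for "a place to send transformed data".
trait Sink {
    fn write(&mut self, key: &str, value: u64);
}

// Stands in for a database table: keeps records as rows in memory.
struct MemorySink {
    rows: Vec<(String, u64)>,
}

impl Sink for MemorySink {
    fn write(&mut self, key: &str, value: u64) {
        self.rows.push((key.to_string(), value));
    }
}

// Stands in for a file: appends records as CSV lines to a text buffer.
struct CsvSink {
    buffer: String,
}

impl Sink for CsvSink {
    fn write(&mut self, key: &str, value: u64) {
        self.buffer.push_str(&format!("{key},{value}\n"));
    }
}

// The sending code does not need to know which sink it is writing to.
fn send_all(sink: &mut dyn Sink, records: &[(&str, u64)]) {
    for (key, value) in records {
        sink.write(key, *value);
    }
}
```

Swapping the destination then only means passing a different `Sink` implementation to `send_all`; the extraction and transformation code stays the same.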

Deployable Unit

A deployable unit is a Substreams manifest or package (.spkg) that contains all the information needed to run it from a sink service. In the manifest, it corresponds to the network and sink fields. See Working with deployable units.

Substreams-powered Subgraph

When a Subgraph acts as a sink for your Substreams, you call it a Substreams-powered Subgraph.
The Subgraph Sink is one of the official sinks supported by Substreams, and can help you index your Subgraph much faster.

Parallel execution

Parallel execution is the process by which a Substreams module's code runs over multiple segments of blockchain data simultaneously, in a forward or backward direction.


Workers

Workers are the fundamental unit of parallelization in Substreams. Workers are computer processes that run in parallel to speed up the Substreams computations.
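
The idea of splitting a block range into segments and handing each segment to a worker can be sketched with standard-library threads. This is a simplified model under assumptions: `split_segments`, `process_segment`, and `process_in_parallel` are hypothetical names, threads stand in for worker processes, and the per-segment work is a placeholder sum.

```rust
use std::thread;

// Split the half-open block range [start, end) into consecutive segments
// of at most `size` blocks each.
fn split_segments(start: u64, end: u64, size: u64) -> Vec<(u64, u64)> {
    assert!(size > 0, "segment size must be positive");
    let mut segments = Vec::new();
    let mut from = start;
    while from < end {
        let to = (from + size).min(end);
        segments.push((from, to));
        from = to;
    }
    segments
}

// Placeholder for the module code run over one segment (assumption):
// here it simply sums the block numbers in the segment.
fn process_segment(from: u64, to: u64) -> u64 {
    (from..to).sum()
}

// Run one worker thread per segment and combine the partial results.
fn process_in_parallel(start: u64, end: u64, size: u64) -> u64 {
    let handles: Vec<_> = split_segments(start, end, size)
        .into_iter()
        .map(|(from, to)| thread::spawn(move || process_segment(from, to)))
        .collect();
    handles
        .into_iter()
        .map(|handle| handle.join().expect("worker panicked"))
        .sum()
}
```

Because each segment is processed independently, the combined result matches what a single sequential pass over the whole range would produce.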