Change log
All notable changes to the Substreams project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
Substreams builds upon Firehose. Keep track of Firehose releases and Data model updates in the Firehose documentation.
v1.3.5
Code generation
Added
substreams init
support for creating a substreams with data from fully-decoded Calls instead of only extracting events.
v1.3.4
Code generation
Added
substreams init support for creating a substreams with the "Dynamic DataSources" pattern (ex: a Factory contract creating pool contracts through the PoolCreated event)
Changed
substreams init to always prefix the tables and entities with the project name
Fixed
substreams init support for unnamed params and topics on log events
v1.3.3
Fixed
substreams init generated code when dealing with Ethereum ABI events containing array types.
[!NOTE] For now, the generated code only works with Postgres; an upcoming revision is going to lift that constraint.
v1.3.2
Fixed
store.has_at Wazero signature, which was defined as has_at(storeIdx: i32, ord: i32, key_ptr: i32, key_len: i32) but should have been has_at(storeIdx: i32, ord: i64, key_ptr: i32, key_len: i32).
Fixed the local substreams alpha service serve ClickHouse deployment, which was failing with a message regarding fork handling.
Catch more cases of WASM deterministic errors as InvalidArgument.
Added some output-stream info to logs.
v1.3.1
Server
Fixed error-passing between tier2 and tier1 (tier1 will not retry sending requests that fail deterministically to tier2)
Tier1 will now schedule a single job on tier2, quickly ramping up to the requested number of workers after 4 seconds of delay, to catch early exceptions
"store became too big" is now considered a deterministic error and returns code "InvalidArgument"
v1.3.0
Highlights
Support new networks configuration block in substreams.yaml to override modules' params and initial_block. Network can be specified at run-time, avoiding the need for separate spkg files for each chain.
[BREAKING CHANGE] Remove the support for the deriveFrom overrides. The imports, along with the new networks feature, should provide a better mechanism to cover the use cases that deriveFrom tried to address.
[!NOTE] These changes are all handled in the substreams CLI, applying the necessary changes to the package before sending the requests. The Substreams server endpoints do not need to be upgraded to support it.
Added
Added networks field at the top level of the manifest definition, with initialBlock and params overrides for each module. See the substreams.yaml.example file in the repository or https://substreams.streamingfast.io/reference-and-specs/manifests for more details and example usage.
The networks params and initialBlock overrides for the chosen network are applied to the module directly before being sent to the server. All network configurations are kept when packing an .spkg file.
Added the --network flag for choosing the network on run, gui and alpha service deploy commands. Default behavior is to use the one defined as network in the manifest.
Added the --endpoint flag to substreams alpha service serve to specify the substreams endpoint to connect to
Added endpoints for Antelope chains
Command 'substreams info' now shows the params
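As an illustration of how the per-network overrides described above could be applied client-side, here is a minimal sketch. The dictionary layout (networks mapping a network name to initialBlock and params tables keyed by module name) and the function name apply_network are assumptions made for this example, not the CLI's actual code.

```python
import copy


def apply_network(manifest, network=None):
    """Return a copy of the modules with the chosen network's overrides applied.

    Assumed (hypothetical) layout:
      manifest["network"]  -> default network name
      manifest["networks"] -> {name: {"initialBlock": {module: int},
                                      "params": {module: str}}}
    """
    chosen = network or manifest.get("network")
    overrides = manifest.get("networks", {}).get(chosen, {})
    modules = copy.deepcopy(manifest["modules"])
    for module in modules:
        name = module["name"]
        # Only modules listed in the override tables are touched.
        if name in overrides.get("initialBlock", {}):
            module["initialBlock"] = overrides["initialBlock"][name]
        if name in overrides.get("params", {}):
            module["params"] = overrides["params"][name]
    return modules
```

Because the original manifest is left untouched, every network configuration is still available afterwards, which matches the note that all of them are kept when packing.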
Removed
Removed the handling of the DeriveFrom keyword in manifest, this override feature is going away.
Removed the --skip-package-validation option only on run/gui/inspect/info
Changed
Added the --params flag to alpha service deploy to apply per-module parameters to the substreams before pushing it.
Renamed the --parameters flag to --deployment-params in alpha service deploy, to clarify the intent of those parameters (given to the endpoint, not applied to the substreams modules)
Small improvement on substreams gui command: no longer reads the .spkg multiple times with different behavior during its process.
v1.2.0
Client
Fixed bug in substreams init with numbers in ABI types
Backend
Return the correct GRPC code instead of wrapping it under an "Unknown" error. "Clean shutdown" now returns CodeUnavailable. This is compatible with previous substreams clients like substreams-sql which should retry automatically.
Upgraded components to manage the new block encapsulation format in merged-blocks and on the wire required for firehose-core v1.0.0
v1.1.22
alpha service deployments
Fix fuzzy matching when endpoint requires auth headers
Fix panic in "serve" when trying to delete a non-existing deployment
Add validation check of substreams package before sending deploy request to server
v1.1.21
Changed
Codegen: substreams-database-change to v1.3, properly generates primary key to support chain reorgs in postgres sink.
Sink server commands all moved from substreams alpha sink-* to substreams alpha service *
Sink server: support for deploying sinks with DBT configuration, so that users can deploy their own DBT models (supported on postgres and clickhouse sinks). Example manifest file segment:
where "./dbt" is a folder containing the dbt project.
Sink server: added REST interface support for clickhouse sinks. Example manifest file segment:
Fixed
Fix substreams info cli doc field which wasn't printing any doc output
v1.1.20
Optimized start of output stream in developer mode when start block is in reversible segment and output module does not have any stores in its dependencies.
Fixed bug where the first streamable block of a chain was not processed correctly when the start block was set to the default zero value.
v1.1.19
Changed
Codegen: Now generates separate substreams.{target}.yaml files for sql, clickhouse and graphql sink targets.
Added
Codegen: Added support for clickhouse in schema.sql
Fixed
Fixed metrics for time spent in eth_calls within modules stats (server and GUI)
Fixed undo json message in 'run' command
Fixed stream ending immediately in dev mode when start/end blocks are both 0.
Sink-serve: fix missing output details on docker-compose apply errors
Codegen: Fixed pluralized entity created for db_out and graph_out
v1.1.18
Fixed
Fixed a regression where start block was not resolved correctly when it was in the reversible segment of the chain, causing the substreams to reprocess a segment in tier 2 instead of linearly in tier 1.
v1.1.17
Fixed
Missing decrement on metrics substreams_active_requests
v1.1.16
Added
substreams_active_requests and substreams_counter metrics to substreams-tier1
Changed
evt_block_time in ms to timestamp in lib.rs, proto definition and schema.sql
v1.1.15
Highlights
See those two new features in action in this tutorial
Added
Sink configs can now use protobuf annotations (aka Field Options) to determine how the field will be interpreted in substreams.yaml:
load_from_file will put the content of the file directly in the field (string and bytes contents are supported).
zip_from_folder will create a zip archive and put its content in the field (field type must be bytes).
Example protobuf definition:
Example manifest file:
substreams info command now properly displays the content of sink configs, optionally writing the fields that were bundled from files to disk with --output-sinkconfig-files-path=</some/path>
Changed
substreams alpha init renamed to substreams init. It now includes db_out module and schema.sql to support the substreams-sql-sink directly.
The override feature has been overhauled. Users may now override an existing substreams by pointing to an override file in run or gui command. This override manifest will have a deriveFrom field which points to the original substreams which is to be overridden. This is useful to port a substreams from one network to another. Example of an override manifest:
The substreams run and substreams gui commands now determine the endpoint from the 'network' field in the manifest if no value is passed in the --substreams-endpoint flag.
The endpoint for each network can be set by using an environment variable SUBSTREAMS_ENDPOINTS_CONFIG_<network_name>, ex: SUBSTREAMS_ENDPOINTS_CONFIG_MAINNET=my-endpoint:443
The substreams alpha init has been moved to substreams init
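The endpoint lookup order described above (explicit flag first, then the per-network environment variable) can be sketched as follows. Upper-casing the network name is an assumption inferred from the MAINNET example, resolve_endpoint is a hypothetical helper, and any built-in defaults the CLI may have are not modeled.

```python
import os


def resolve_endpoint(flag_value, network, env=None):
    """Pick the endpoint for a run: the flag wins, else the per-network env var."""
    env = os.environ if env is None else env
    if flag_value:
        return flag_value
    # Assumption: the network name is upper-cased to build the variable name.
    key = "SUBSTREAMS_ENDPOINTS_CONFIG_" + network.upper()
    return env.get(key)  # None when nothing is configured for this network
```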
Fixed
Fixed the substreams gui command to correctly compute the stop-block when given a relative value (ex: '-t +10')
v1.1.14
Bug fixes
Fixed (bumped) substreams protobuf definitions that get embedded in spkg to match the new progress messages from v1.1.12.
Regression fix: fixed a bug where negative start blocks would not be resolved correctly when using substreams run or substreams gui.
In the request plan, the process previously panicked when errors related to block number validation occurred. Now the error will be returned to the client.
v1.1.13
Bug fixes
If the initial block or start block is less than the first block in the chain, the substreams will now start from the first block in the chain. Previously, setting the initial block to a block before the first block in the chain would cause the substreams to hang.
Fixed a bug where the substreams would fail if the start block was set to a future block. The substreams will now wait for the block to be produced before starting.
v1.1.12
Highlights
Complete redesign of the progress messages:
Tier2 internal stats are aggregated on Tier1 and sent out every 500ms (no more bursts)
No need to collect events on client: a single message now represents the current state
Message now includes list of running jobs and information about execution stages
Performance metrics have been added to show which modules are executing slowly and where the time is spent (eth calls, store operations, etc.)
Upgrading client and server
[!IMPORTANT] The client and servers will both need to be upgraded at the same time for the new progress messages to be parsed:
The new Substreams servers will NOT send the old modules field as part of its progress message, only the new running_jobs, modules_stats, stages.
The new Substreams clients will NOT be able to decode the old progress information when connecting to older servers.
However, the actual data (and cursor) will work correctly between versions. Only incompatible progress information will be ignored.
CLI
Changed
Bumped substreams and substreams-ethereum to latest in substreams alpha init.
Improved error message when <module_name> is not received. Previously this led to a confusing error message; now, if the input is likely a manifest, the error message says so clearly.
Fixed
Fixed compilation errors when tracking some contracts when using substreams alpha init.
Added
substreams info now takes an optional second parameter <output-module> to show how the substreams modules can be divided into stages
Pack command: added -c flag to allow overriding of certain substreams.yaml values by passing in the path of a yaml file. example yaml contents:
Backend
Removed
Removed Config.RequestStats, stats are now always enabled.
v1.1.11
Fixes
Added metering of live blocks
v1.1.10
Backend changes
Fixed/Removed: jobs would hang when config parameter StateBundleSize was different from SubrequestsSize. The latter has been removed completely: Subrequests size will now always be aligned with bundle size.
Auth: added support for continuous authentication via the grpc auth plugin (allowing cutoff triggered by the auth system).
CLI changes
Fixed params handling in gui mode
v1.1.9
Backend changes
Massive refactoring of the scheduler: prevent excessive splitting of jobs, grouping them into stages when they have the same dependencies. This should reduce the required number of tier2 workers (2x to 3x, depending on the substreams).
The tier1 and tier2 config have a new configuration StateStoreDefaultTag, which will be appended to the StateStoreURL value to form the final state store URL, ex: StateStoreURL="/data/states" and StateStoreDefaultTag="v2" will make /data/states/v2 the default state store location, while allowing users to provide a X-Sf-Substreams-Cache-Tag header (gated by auth module) to point to /data/states/v1, and so on.
Authentication plugin trust can now specify an exclusive list of allowed headers (all lowercase), ex: trust://?allowed=x-sf-user-id,x-sf-api-key-id,x-real-ip,x-sf-substreams-cache-tag
The tier2 app no longer has customizable auth plugin (or any Modules), trust will always be used, so that tier1 can pass down its headers (e.g. X-Sf-Substreams-Cache-Tag). The tier2 instances should not be accessible publicly.
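To make the interplay between the allowed-headers list and the cache tag concrete, here is a rough sketch. All three function names are hypothetical, and joining the URL and the tag with a single slash is an assumption based on the /data/states/v2 example; the real servers may differ.

```python
from urllib.parse import parse_qs, urlparse


def allowed_from_trust_url(url):
    """Extract the allowed header names from a trust://?allowed=a,b,c URL."""
    query = parse_qs(urlparse(url).query)
    return [h for h in query.get("allowed", [""])[0].split(",") if h]


def filter_headers(headers, allowed):
    """Keep only allowed headers; names are compared and returned in lowercase."""
    allowed_set = {h.lower() for h in allowed}
    return {k.lower(): v for k, v in headers.items() if k.lower() in allowed_set}


def state_store_location(state_store_url, default_tag, headers):
    """Append the cache tag from the (already filtered) headers, or the default."""
    tag = headers.get("x-sf-substreams-cache-tag", default_tag)
    return state_store_url.rstrip("/") + "/" + tag
```

Filtering before resolving the location means a client-supplied cache tag is only honored when the header is explicitly allowed.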
GUI changes
Color theme is now adapted to the terminal background (fixes readability on 'light' background)
Provided parameters are now shown in the 'Request' tab.
CLI changes
Added
alpha init command: replace initialBlock for generated manifest based on contract creation block.
alpha init prompt Ethereum chain. Added: Mainnet, BNB, Polygon, Goerli, Mumbai.
Fixed
alpha init reports better progress, especially when performing ABI & creation block retrieval.
alpha init command without contracts fixed Protogen command invocation.
v1.1.8
Backend changes
Added
Max-subrequests can now be overridden by auth header X-Sf-Substreams-Parallel-Jobs (note: if your auth plugin is 'trust', make sure that you filter out this header from public access)
Request Stats logging. When enabled, it will log metrics associated with a Tier1 and Tier2 request
On request, save "substreams.partial.spkg" file to the state cache for debugging purposes.
Manifest reader can now read 'partial' spkg files (without protobuf and metadata) with an option.
Fixed
Fixed a bug which caused "live" blocks to be sent while the stream previously received block(s) were historic.
CLI changes
Fixed
In GUI, module output now shows fields with default values, i.e. 0, "", false
v1.1.7 (https://github.com/streamingfast/substreams/releases/tag/v1.1.7)
Highlights
Now using plugin: buf.build/community/neoeinstein-prost-crate:v0.3.1 when generating the Protobuf Rust mod.rs which fixes the warning that remote plugins are deprecated.
Previously we were using remote: buf.build/prost/plugins/crate:v0.3.1-1. But remote plugins when using https://buf.build (which we use to generate the Protobuf) are now deprecated and will cease to function on July 10th, 2023.
The net effect of this is that if you don't update your Substreams CLI to 1.1.7, on July 10th 2023 and after, the substreams protogen will not work anymore.
v1.1.6 (https://github.com/streamingfast/substreams/releases/tag/v1.1.6)
Backend changes
substreams-tier1 and substreams-tier2 are now standalone Apps, to be used as such by server implementations (firehose-ethereum, etc.)
substreams-tier1 now listens to Connect protocol, enabling browser-based substreams clients
Authentication has been overhauled to take advantage of https://github.com/streamingfast/dauth, allowing the use of a GRPC-based sidecar or reverse-proxy to provide authentication.
Metering has been overhauled to take advantage of https://github.com/streamingfast/dmetering plugins, allowing the use of a GRPC sidecar or logs to expose usage metrics.
The tier2 logs no longer show a parent_trace_id: the trace_id is now the same as tier1 jobs. Unique tier2 jobs can be distinguished by their stage and segment, corresponding to the output_module_name and startblock:stopblock
CLI changes
The substreams protogen command now uses this Buf plugin https://buf.build/community/neoeinstein-prost to generate the Rust code for your Substreams definitions.
The substreams protogen command no longer generates the FILE_DESCRIPTOR_SET constant, which generated an unused warning in Rust. We don't think anybody relied on having the FILE_DESCRIPTOR_SET constant generated, but if it's the case, you can provide your own buf.gen.yaml that will be used instead of the generated one when doing substreams protogen.
Added -H flag on the substreams run command, to set HTTP Headers in the Substreams request.
Fixed
Fixed generated buf.gen.yaml not being deleted when an error occurs while generating the Rust code.
Highlights
This release fixes data determinism issues. This comes at a 20% performance cost but is necessary for integration with The Graph ecosystem.
Operators
When upgrading a substreams server to this version, you should delete all existing module caches to benefit from deterministic output
Added
Tier1 now records deterministic failures in wasm, "blacklists" identical requests for 10 minutes (by serving them the same InvalidArgument error) with a forced incremental backoff. This prevents accidental bad actors from hogging tier2 resources when their substreams cannot go past a certain block.
Tier1 now sends the ResolvedStartBlock, LinearHandoffBlock and MaxJobWorkers in SessionInit message for the client and gui to show
Substreams CLI can now read manifests/spkg directly from an IPFS address (subgraph deployment or the spkg itself), using ipfs://Qm... notation
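The 10-minute "blacklist" of deterministic failures mentioned above behaves like a small cache with an expiry. The sketch below is only an illustration: FailureCache and its methods are hypothetical names, the request key is left abstract, and the forced incremental backoff is not modeled.

```python
import time


class FailureCache:
    """Remember deterministic failures and replay them until they expire."""

    def __init__(self, ttl_seconds=600, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so the expiry can be tested
        self._entries = {}  # request key -> (expiry time, recorded error)

    def record(self, request_key, error):
        self._entries[request_key] = (self.clock() + self.ttl, error)

    def lookup(self, request_key):
        """Return the recorded error while it is still fresh, else None."""
        entry = self._entries.get(request_key)
        if entry is None:
            return None
        expires_at, error = entry
        if self.clock() >= expires_at:
            del self._entries[request_key]
            return None
        return error
```

A server would call lookup before scheduling any work and, on a hit, return the stored error immediately instead of spending tier2 resources.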
Fixed
When talking to an updated server, the gui will not overflow on a negative start block, using the newly available resolvedStartBlock instead.
When running in development mode with a start-block in the future on a cold cache, you would sometimes get invalid "updates" from the store passed down to your modules that depend on them. It did not impact the caches but caused invalid output.
The WASM engine was incorrectly reusing memory, preventing deterministic output. It made things go faster, but at the cost of determinism. Memory is now reset between WASM executions on each block.
The GUI no longer panics when an invalid output-module is given as argument
Changed
Changed default WASM engine from wasmtime to wazero, use SUBSTREAMS_WASM_RUNTIME=wasmtime to revert to prior engine. Note that wasmtime will now run a lot slower than before because resetting the memory in wasmtime is more expensive than in wazero.
Execution of modules is now done in parallel within a single instance, based on a tree of module dependencies.
The substreams gui and substreams run now accept commas inside a param value. For example: substreams run --param=p1=bar,baz,qux --param=p2=foo,baz. However, you can no longer pass multiple parameters using an ENV variable, or a .yaml config file.
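One way to read the new --param behavior is that each flag occurrence carries exactly one module=value pair, so commas no longer need to act as separators. A minimal sketch, assuming that interpretation (parse_params is a hypothetical helper, not the CLI's parser):

```python
def parse_params(values):
    """Turn repeated --param values into a {module: value} mapping.

    Each entry is split on the first '=' only, so commas and further
    '=' characters stay part of the value.
    """
    params = {}
    for raw in values:
        module, sep, value = raw.partition("=")
        if not sep or not module:
            raise ValueError("expected module=value, got %r" % raw)
        params[module] = value
    return params
```

With the example from the entry above, parse_params(["p1=bar,baz,qux", "p2=foo,baz"]) keeps "bar,baz,qux" as the single value for p1.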
HIGHLIGHTS
Module hashing changed to fix cache reuse on substreams that use imported modules
Memory leak fixed on rpc-enabled servers
GUI more responsive
Fixed
BREAKING: The module hashing algorithm wrongfully changed the hash for imported modules, which made it impossible to leverage caches when composing new substreams off of imported ones.
Operationally, if you want to keep your caches, you will need to copy or move the old hashes to the new ones.
You can obtain the prior hashes for a given spkg with: substreams info my.spkg, using a prior release of substreams
With a more recent substreams release, you can obtain the new hashes with the same command.
You can then cp or mv the caches for each module hash.
You can also ignore this change. This will simply invalidate your cache.
Fixed a memory leak where "PostJobHooks" were not always called. These are used to hook in rpc calls in Ethereum chain. They are now always called, even if no block has been processed (can be called with nil value for the clock)
Jobs that fail deterministically (during WASM execution) on tier2 will fail faster, without retries from tier1.
substreams gui command now handles params flag (it was ignored)
Substreams GUI responsiveness improved significantly when handling large payloads
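The "fail faster, without retries" behavior for deterministic failures amounts to classifying errors before retrying. The following is a simplified sketch; JobError, the code strings and run_with_retries are hypothetical stand-ins for illustration, not the scheduler's real types.

```python
# Hypothetical status codes that mirror gRPC code names.
INVALID_ARGUMENT = "InvalidArgument"
UNAVAILABLE = "Unavailable"


class JobError(Exception):
    def __init__(self, code, message):
        super().__init__(message)
        self.code = code


def run_with_retries(job, max_attempts=3):
    """Run job(), retrying only failures that might succeed on a retry."""
    attempts = 0
    while True:
        attempts += 1
        try:
            return job()
        except JobError as err:
            # A deterministic failure would fail identically on every retry,
            # so it is surfaced at once; other errors get up to max_attempts.
            if err.code == INVALID_ARGUMENT or attempts >= max_attempts:
                raise
```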
Added
Added Tracing capabilities, using https://github.com/streamingfast/sf-tracing . See repository for details on how to enable.
Known issues
If the cached substreams states are missing a 'full-kv' file in its sequence (not a normal scenario), requests will fail with opening file: not found https://github.com/streamingfast/substreams/issues/222
Highlights
This release contains fixes for race conditions that happen when multiple requests try to sync the same range using the same .spkg. Those fixes will avoid weird state errors at the cost of duplicating work in some circumstances. A future refactor of the Substreams engine scheduler will come later to fix those inefficiencies.
Operators, please read the operators section for upgrade instructions.
Operators
Note: This upgrade procedure applies if your Substreams deployment topology includes both tier1 and tier2 processes. If you have defined somewhere the config value substreams-tier2: true, then this applies to you; otherwise, you can ignore the upgrade procedure.
This release includes a small change in the internal RPC layer between tier1 processes and tier2 processes. This change requires an ordered upgrade of the processes to avoid errors.
The components should be deployed in this order:
Deploy and roll out tier1 processes first
Deploy and roll out tier2 processes second
If you upgrade in the wrong order or if somehow tier2 processes start using the new protocol without tier1 being aware, users will end up with backend error(s) saying that some partial files are not found. Those will be resolved only when tier1 processes have been upgraded successfully.
Fixed
Fixed a race when multiple Substreams requests execute on the same .spkg, it was causing races between the two executors.
GUI: fixed an issue which would slow down message consumption when progress page was shown in ascii art "bars" mode
GUI: fixed the display of blocks per second to represent actual blocks, not messages count
Changed
[binary]: Commands substreams <...> that fail now correctly return an exit code 1.
[library]: The manifest.NewReader signature changed and will now return a *Reader, error (previously *Reader).
Added
[library]: The manifest.Reader gained the ability to infer the path if provided with input "" based on the current working directory.
[library]: The manifest.Reader gained the ability to infer the path if provided with input that is a directory.
Highlights
This release contains bug fixes and speed/scaling improvements around the Substreams engine. It also contains a few small enhancements for substreams gui.
This release fixes an important bug that could have generated corrupted store state files. This is important for developers and operators.
Sinkers & Developers
The store state files will be fully deleted on the Substreams server to start fresh again. The impact for you as a developer is that Substreams that were fully synced will now need to re-generate the store's state from the initial block. So you might see long delays before getting new block data while the Substreams engine is re-computing the store states from scratch.
Operators
You need to clear the state store and remove all the files that are stored under the location given by the substreams-state-store-url flag. You can also make it point to a brand new folder and delete the old one after the rollout.
Fixed
Fix a bug where not all extra modules would be sent back on debug mode
Fixed a bug in tier1 that could result in corrupted state files when getting close to chain HEAD
Fixed some performance and stalling issues when using GCS for blocks
Fixed storage logs not being shown properly
GUI: Fixed panic race condition
GUI: Cosmetic changes
Added
GUI: Added traceID
Highlights
This release introduces a new RPC protocol and the old one has been removed. The new RPC protocol is in a new Protobuf package sf.substreams.rpc.v2 and it drastically changes how chain re-orgs are signaled to the user. Here are the highlights of this release:
Getting rid of undo payload during re-org
substreams gui Improvements
Substreams integration testing
Substreams Protobuf definitions updated
Getting rid of undo payload during re-org
Previously, the GRPC endpoint sf.substreams.v1.Stream/Blocks would send a payload with the corresponding "step", NEW or UNDO.
Unfortunately, this led to some cases where the payload could not be deterministically generated for old blocks that had been forked out, resulting in a stalling request, a failure, or in some worst cases, incomplete data.
The new design, under sf.substreams.rpc.v2.Stream/Blocks, takes care of these situations by removing the 'step' component and using these two message types:
sf.substreams.rpc.v2.BlockScopedData when chain progresses, with the payload
sf.substreams.rpc.v2.BlockUndoSignal during a reorg, with the last valid block number + block hash
The client now has the burden of keeping the necessary means of performing the undo actions (ex: a map of previous values for each block). The BlockScopedData message now includes the final_block_height to let you know when this "undo data" can be discarded.
With these changes, a substreams server can even handle a cursor for a block that it has never seen, provided that it is a valid cursor, by signaling the client to revert up to the last known final block, trading efficiency for resilience in these extreme cases.
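The client-side bookkeeping described above (a map of previous values per block, pruned at final_block_height) could look roughly like the sketch below. It is deliberately simplified: blocks are keyed by number only, the block hash and cursor are ignored, the payload is assumed to be a flat key/value mapping, and UndoBuffer with its method names is hypothetical rather than part of any Substreams client library.

```python
_MISSING = object()  # marks keys that did not exist before a block


class UndoBuffer:
    """Key/value state that can be rolled back until blocks become final."""

    def __init__(self):
        self.state = {}
        self._undo = {}  # block number -> {key: previous value or _MISSING}

    def on_block_scoped_data(self, block_num, changes, final_block_height):
        previous = {}
        for key, value in changes.items():
            previous[key] = self.state.get(key, _MISSING)
            self.state[key] = value
        self._undo[block_num] = previous
        # Final blocks can never be reverted, so their undo data is dropped.
        for num in [n for n in self._undo if n <= final_block_height]:
            del self._undo[num]

    def on_block_undo_signal(self, last_valid_block_num):
        # Revert newest-first every block above the last valid one.
        stale = sorted((n for n in self._undo if n > last_valid_block_num), reverse=True)
        for num in stale:
            for key, old in self._undo.pop(num).items():
                if old is _MISSING:
                    self.state.pop(key, None)
                else:
                    self.state[key] = old

    def pending_blocks(self):
        return sorted(self._undo)
```

Memory use is bounded by the number of non-final blocks, since undo data is discarded as soon as final_block_height passes a block.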
substreams gui Improvements
Added key 'f' shortcut for changing display encoding of bytes value (hex, pruned string, base64)
Added jq search mode (hit / twice). Filters the output with the jq expression, and applies the search to match all blocks.
Added search history (with up/down), similar to less.
Running a search now applies it to all blocks, and highlights the matching ones in the blocks bar (in red).
Added O and P, to jump to prev/next block with matching search results.
Added module search with m, to quickly switch from module to module.
Substreams integration testing
Added a basic Substreams testing framework that validates module outputs against expected values. The testing framework currently runs on substreams run command, where you can specify the following flags:
test-file Points to a file that contains your test specs
test-verbose Enables verbose mode while testing.
The test file specifies the expected output for a given substreams module at a given block.
Substreams Protobuf definitions updated
We changed the Substreams Protobuf definitions making a major overhaul of the RPC communication. This is a breaking change for those consuming Substreams through gRPC.
Note: There are no breaking changes for Substreams developers regarding your Rust code, Substreams manifest and Substreams package.
Removed the Request and Response messages (and related) from sf.substreams.v1, they have been moved to sf.substreams.rpc.v2. You will need to update your usage if you were consuming Substreams through gRPC.
The new Request excludes fields and usages that were already deprecated, like using multiple module_outputs.
The Response now contains a single module output
In development mode, the additional modules output can be inspected under debug_map_outputs and debug_store_outputs.
Separating Tier1 vs Tier2 gRPC protocol (for Substreams server operators)
Now that the Blocks request has been moved from sf.substreams.v1 to sf.substreams.rpc.v2, the communication between a substreams instance acting as tier1 and a tier2 instance that performs the background processing has also been reworked, and put under sf.substreams.internal.v2.Stream/ProcessRange. It has also been stripped of parameters that were not used for that level of communication (ex: cursor, logs...)
Fixed
The final_blocks_only: true on the Request was not honored on the server. It now correctly sends only blocks that are final/irreversible (according to Firehose rules).
Prevent substreams panic when requested module has unknown value for "type"
Added
The substreams run command now has flag --final-blocks-only
This should be the last release before a breaking change in the API and handling of the reorgs and UNDO messages.
Highlights
Added support for resolving a negative start-block on server
CHANGED: The run command now resolves a start-block=-1 from the head of the chain (as supported by the servers now). Prior to this change, the -1 value meant the 'initialBlock' of the requested module. The empty string is now used for this purpose.
GUI: Added support for search, similar to less, with /.
GUI: Search and output offset is conserved when switching module/block number in the "Output" tab.
Library: protobuf message descriptors now exposed in the manifest/ package. This is something useful to any sink that would need to interpret the protobuf messages inside a Package.
Added support for resolving a negative start-block on server (also added to run command)
The run and gui command no longer resolve a start-block=-1 to the 'initialBlock' of the requested module. To get this behavior, simply assign an empty string value to the flag start-block instead.
Added support for search within the Substreams gui output view. Usage of search within output behaves similar to the less command, and can be toggled with "/".
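The start-block rules above can be summarized in a few lines. This is only a sketch of the described behavior: resolve_start_block is a hypothetical helper, and the assumption that a negative value is added to the chain head number (so -1 means one block behind head) is an interpretation of "from the head of the chain", not confirmed server logic.

```python
def resolve_start_block(start_block, initial_block, head_block):
    """Resolve the start-block flag value (a string) to an absolute block number."""
    if start_block == "":
        # Empty string: fall back to the module's initialBlock.
        return initial_block
    value = int(start_block)
    if value < 0:
        # Assumption: negative values are offsets relative to the chain head.
        return max(head_block + value, 0)
    return value
```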
Release was retracted because it contained the refactoring expected for 1.1.0 by mistake, check https://github.com/streamingfast/substreams/releases/tag/v1.0.3 instead.
Fixed
Fixed "undo" messages incorrectly contained too many module outputs (all modules, with some duplicates).
Fixed status bar message cutoff bug
Fixed substreams run when manifest contains unknown attributes
Fixed bubble tea program error when exiting the run command
Highlights
Added command substreams gui, providing a terminal-based GUI to inspect the streamed data. Also adds --replay support, to save a stream to replay.log and load it back in the UI later. You can use it as you would substreams run. Feedback welcome.
Modified command substreams protogen, defaulting to generating the mod.rs file alongside the rust bindings. Also added --generate-mod-rs flag to toggle mod.rs generation.
Added support for module parameterization. Defined in the manifest as:
and on the command-line as:
substreams run -p module=value -p "module2=other value" ...
Servers need to be updated for packages to be able to be consumed this way.
This change keeps backwards compatibility. Old Substreams Packages will still work the same, with no changes to module hashes.
Added
Added support for {version} template in --output-file flag value on substreams pack.
Added fuel limit to wasm execution as a server-side option, preventing wasm process from running forever.
Added 'Network' and 'Sink{Type, Module, Config}' fields in the manifest and protobuf definition for future bundling of substreams sink definitions within a substreams package.
Highlights
Improved execution speed and module loading speed by bumping WASM Time to version 4.0.
Improved developer experience on the CLI by making the <manifest> argument optional.
When the <manifest> argument is not provided, the CLI will now look in the current directory for a substreams.yaml file and use it if present. So if you are in your Substreams project and your file is named substreams.yaml, you can simply do substreams pack, substreams protogen, etc.
Moreover, we added the possibility to pass a directory containing a substreams.yaml directly, so substreams pack path/to/project would work as long as path/to/project contains a file named substreams.yaml.
Fixed a bug that was preventing production mode from completing properly when using a bounded block range.
Improved overall stability of the Substreams engine.
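The manifest lookup rules described above (no argument, a directory, or a direct file path) can be sketched as follows. resolve_manifest_path is a hypothetical helper written for this illustration; the real CLI may perform additional checks, such as verifying that the file exists.

```python
import os


def resolve_manifest_path(manifest="", cwd=None):
    """Map the optional <manifest> argument to a concrete file path."""
    cwd = os.getcwd() if cwd is None else cwd
    if manifest == "":
        # No argument: use substreams.yaml from the current directory.
        return os.path.join(cwd, "substreams.yaml")
    if os.path.isdir(manifest):
        # A directory: use the substreams.yaml it contains.
        return os.path.join(manifest, "substreams.yaml")
    # Anything else is treated as a direct path to a manifest or package.
    return manifest
```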
Operators Notes
Breaking Config values
substreams-stores-save-interval and substreams-output-cache-save-interval have been merged together into substreams-cache-save-interval in the firehose-<chain> repositories. Refer to the chain-specific firehose-<chain> repository for further details.
Added
The <manifest> can point to a directory that contains a substreams.yaml file instead of having to point to the file directly.
The <manifest> parameter is now optional in all commands requiring it.
Fixed
Fixed valuetype mismatch for stores
Fixed production mode not completing when block range was specified
Fixed tier1 crashing due to missing context canceled check.
Fixed some code paths where locking could have happened due to incorrect checking of context cancellation.
Request validation for blockchain's input type is now made only against the requested module and its transitive dependencies.
Updated
Updated WASM Time library to 4.0.0 leading to improved execution speed.
Changed
Remove distinction between output-save-interval and store-save-interval.
substreams init has been moved under substreams alpha init as this is a feature included by mistake in latest release that should not have been displayed in the main list of commands.
substreams codegen has been moved under substreams alpha codegen as this is a feature included by mistake in latest release that should not have been displayed in the main list of commands.
This upcoming release is going to bring significant changes to how Substreams are developed and consumed, and to their speed of execution. Note that there are no breaking changes related to your Substreams' Rust code; the only breaking changes concern how Substreams are run and which features/flags are available.
Here are the highlights of the elements that will change in the next release:
Production vs development mode
Single module output
Output module must be of type map
InitialSnapshots is now a development mode feature only
Enhanced parallel execution
In the rest of this post, we go through each of them in greater detail and the implications they have for you. The full changelog is available after.
Warning: Operators, refer to the Operators Notes section for specific instructions on deploying this new version.
Production vs development mode
We introduce an execution mode when running Substreams, either production mode or development mode. The execution mode impacts how the Substreams get executed, specifically:
The time to first byte
The module logs and outputs sent back to the client
How parallel execution is applied through the requested range
The differences between the modes are:
In development mode, the client receives all the logs of the executed modules. In production mode, logs are not available at all.
In development mode, modules are always re-executed from the request's start block, meaning that logs will always be visible to the user. In production mode, if a module's output is found in cache, module execution is skipped completely and data is returned directly.
In development mode, only backward parallel execution can be effective. In production mode, both backward parallel execution and forward parallel execution can be effective. See the Enhanced parallel execution section for further details about parallel execution.
In development mode, every module's output is returned in the response but only the root module is displayed by default in the substreams CLI (configurable via a flag). In production mode, only the root module's output is returned.
In development mode, you may request specific store snapshots that are in the execution tree via the substreams CLI --debug-modules-initial-snapshots flag. In production mode, this feature is not available.
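The differences listed above can be condensed into a small lookup, sketched here in Rust. The ExecutionMode type and its method names are hypothetical and exist only to restate the list; they are not part of the Substreams API.

```rust
/// Hypothetical summary type; the names are illustrative only.
#[derive(Clone, Copy, Debug, PartialEq, Default)]
enum ExecutionMode {
    /// Development is the default mode of a request.
    #[default]
    Development,
    Production,
}

impl ExecutionMode {
    /// Module logs are sent back to the client only in development mode.
    fn logs_available(self) -> bool {
        matches!(self, ExecutionMode::Development)
    }

    /// In production mode, a module whose output is cached is not re-executed.
    fn skips_cached_modules(self) -> bool {
        matches!(self, ExecutionMode::Production)
    }

    /// Forward parallel execution is effective only in production mode.
    fn forward_parallel_execution(self) -> bool {
        matches!(self, ExecutionMode::Production)
    }

    /// Development mode returns the output of every executed module.
    fn returns_all_module_outputs(self) -> bool {
        matches!(self, ExecutionMode::Development)
    }

    /// Store snapshots can be requested only in development mode.
    fn initial_snapshots_allowed(self) -> bool {
        matches!(self, ExecutionMode::Development)
    }
}
```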
The execution mode is specified at the gRPC request level and the default mode is development. The substreams CLI tool being a development tool foremost, we do not expect people to activate production mode (-p) when using it, except perhaps for testing purposes.
If today you have sink code making the gRPC request yourself and are using that for production consumption, ensure that the production_mode field in your Substreams request is set to true. StreamingFast-provided sinks like substreams-sink-postgres, substreams-sink-files and others have already been updated to use production_mode by default.
As a final note, we recommend running production mode against a compiled .spkg file that should ideally be released and versioned. This ensures stable module hashes and proper use of cached output.
Single module output
We now only support 1 output module when running a Substreams, while prior to this release, it was possible to have multiple ones.
Only a single module can now be requested; previous versions allowed requesting N modules.
Only map modules can now be requested; previous versions allowed both map and store modules to be requested.
InitialSnapshots is now forbidden in production mode and still allowed in development mode.
In development mode, the server sends back output for all executed modules (by default the CLI displays only the requested module's output).
Note: We added output_module to the Substreams request and kept output_modules to remain backwards compatible for a while. If an output_module is specified, we honor that module. If not, we check output_modules to ensure there is only 1 output module. In a future release, we are going to remove output_modules altogether.
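The backwards-compatible selection described in the note can be sketched as below. This is a minimal illustration, assuming an empty string means output_module was not set; the function name and the error messages are hypothetical and not taken from the server code.

```rust
/// Hypothetical sketch of the selection rule: prefer `output_module`,
/// otherwise accept `output_modules` only when it names exactly one module.
fn resolve_output_module(
    output_module: &str,
    output_modules: &[&str],
) -> Result<String, String> {
    // A non-empty `output_module` always wins.
    if !output_module.is_empty() {
        return Ok(output_module.to_string());
    }

    // Fall back to the legacy field, which must contain a single entry.
    match output_modules {
        [single] => Ok(single.to_string()),
        [] => Err("no output module specified".to_string()),
        many => Err(format!(
            "expected exactly 1 output module, got {}",
            many.len()
        )),
    }
}
```

With this sketch, a request that still lists two entries in output_modules and leaves output_module empty is rejected, which matches the single module restriction above.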
With the introduction of development vs production mode, we added a change in behavior to reduce the friction this change has on debugging. Indeed, in development mode, all executed modules' output will be sent back to the user. This includes the requested output module as well as all its dependencies. The substreams CLI has been adjusted to show only the output of the requested output module by default. The new substreams CLI flag --debug-modules-output can be used to control which modules' output is actually displayed by the CLI.
Migration Path: If you are currently requesting more than one module, refactor your Substreams code so that a single map module aggregates all the required information from your different dependencies in one output.
Output module must be of type map
It is now forbidden to request a store module as the output module of the Substreams request; the requested output module must now be of kind map. Different factors have motivated this change:
Recently we have seen incorrect usage of store modules. A store module was not intended to be used as persistent long-term storage; store modules were conceived as a place to aggregate data for later steps in computation. Using it as persistent storage makes the store unmanageable.
We had always expected users to consume a map module which would return data formatted according to a final sink spec, which would then permanently store the extracted data. We never envisioned store to act as long-term storage.
Forward parallel execution does not support a store as its last step.
Migration Path: If you are currently using a store module as your output module, you will need to create a map module that takes the deltas of said store module as input and returns the deltas.
Examples
Let's assume a Substreams with these dependencies: [block] --> [map_pools] --> [store_pools] --> [map_transfers]
Running substreams run substreams.yaml map_transfers will only print the outputs and logs from the map_transfers module.
Running substreams run substreams.yaml map_transfers --debug-modules-output=map_pools,map_transfers,store_pools will print the outputs of those 3 modules.
InitialSnapshots is now a development mode feature only
Now that a store cannot be requested as the output module, it no longer made sense for InitialSnapshots to be available. Moreover, we have seen people using it to retrieve the initial state and then continue syncing. While it's a fair use case, we always wanted people to perform the synchronization using the streaming primitive and not by using store as long-term storage.
However, InitialSnapshots is a useful tool for debugging what a store contains at a given block. So we decided to keep it in development mode only, where you can request the snapshot of a store module when making your request. In the Substreams request/response, initial_store_snapshot_for_modules has been renamed to debug_initial_store_snapshot_for_modules, snapshot_data to debug_snapshot_data and snapshot_complete to debug_snapshot_complete.
Migration Path: If you were relying on the InitialSnapshots feature in production, you will need to create a map module that takes the deltas of said store module as input, and then synchronize the full state on the consuming side.
Examples
Let's assume a Substreams with these dependencies: [block] --> [map_pools] --> [store_pools] --> [map_transfers]
Running substreams run substreams.yaml map_transfers -s 1000 -t +5 --debug-modules-initial-snapshot=store_pools will print all the entries in store_pools at block 999, then continue with outputs and logs from map_transfers in blocks 1000 to 1004.
Enhanced parallel execution
There are 2 ways parallel execution can happen: backward or forward.
Backward parallel execution consists of executing in parallel block ranges from the module's start block up to the start block of the request. If the start block of the request matches the module's start block, there is no backward parallel execution to perform. Also, this happens only for dependencies of type store, which means that if you depend only on other map modules, no backward parallel execution happens.
Forward parallel execution consists of executing in parallel block ranges from the start block of the request up to the last known final block (a.k.a. the irreversible block) or the stop block of the request, whichever is smaller. Forward parallel execution significantly improves the performance of the Substreams, as we execute your module in advance through the chain history in parallel. What we stream back to you is the cached output of your module's execution, which essentially means that we stream back data written in flat files. This gives a major performance boost because in almost all cases, the data will already be ready for you to consume.
Forward parallel execution happens only in production mode and is always disabled in development mode. Moreover, since we read data back from cache, logs of your modules will never be accessible, as we do not store them.
Backward parallel execution still occurs in both development and production mode. The diagram below gives details about when parallel execution happens.
You can see that in production mode, parallel execution happens before the Substreams request range as well as within the requested range. In development mode, parallel execution happens only before the Substreams request range, that is, between the module's start block and the start block of the requested range (backward parallel execution only).
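The two kinds of ranges can be illustrated with a small planning function. This is a sketch of the rules as stated in this section, not the engine's scheduler: the type and function names are hypothetical, ranges are assumed to be half-open (end excluded), and the depends_on_store flag stands in for "the requested module has at least one store dependency".

```rust
/// Hypothetical result type: each range is (start, end) with `end` excluded.
#[derive(Debug, PartialEq)]
struct ParallelPlan {
    backward: Option<(u64, u64)>,
    forward: Option<(u64, u64)>,
}

/// Sketch of which block ranges can be executed in parallel for a request.
fn plan_parallel_execution(
    module_start: u64,
    request_start: u64,
    request_stop: u64,
    last_final_block: u64,
    production_mode: bool,
    depends_on_store: bool,
) -> ParallelPlan {
    // Backward: from the module's start block up to the request's start
    // block, and only when there are store dependencies to fill.
    let backward = if depends_on_store && module_start < request_start {
        Some((module_start, request_start))
    } else {
        None
    };

    // Forward: from the request's start block up to the smaller of the
    // last final block and the stop block, in production mode only.
    let forward_end = request_stop.min(last_final_block);
    let forward = if production_mode && request_start < forward_end {
        Some((request_start, forward_end))
    } else {
        None
    };

    ParallelPlan { backward, forward }
}
```

For example, with a module starting at block 100, a request for blocks 1000 to 2000 and a last final block of 1500, the sketch yields a backward range of 100 to 1000 in both modes and a forward range of 1000 to 1500 in production mode only.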
Operators Notes
The state output format for map and store modules has changed internally to be more compact in Protobuf format. When deploying this new version, previously existing state files should be deleted or the deployment updated to point to a new store location. The state output store is defined by the --substreams-state-store-url flag on the chain-specific binary (e.g. fireeth).
Library
Added production_mode to Substreams Request
Added output_module to Substreams Request
CLI
Fixed Ctrl-C not working directly when in TUI mode.
Added Trace ID printing once available.
Added command substreams tools analytics store-stats to get statistics for a given store.
Added --debug-modules-output (comma-separated module names) (unavailable in production mode).
Breaking: Renamed flag --initial-snapshots to --debug-modules-initial-snapshots (comma-separated module names) (unavailable in production mode).
Moved Rust modules to github.com/streamingfast/substreams-rs
Library
Gained significant execution time improvement when saving and loading stores, during the squashing process by leveraging vtprotobuf
Added XDS support for tier 2s
Added intrinsic support for type bigdecimal, will deprecate bigfloat
Significant improvements in code-coverage and full integration tests.
CLI
Added substreams tools proxy <package> subcommand to allow calling substreams with a pre-defined package easily from a web browser using bufbuild/connect-web
Lowered GRPC client keep alive frequency, to prevent "Too Many Pings" disconnection issue.
Added a fast failure when attempting to connect to an unreachable substreams endpoint.
CLI is now able to read .spkg from gs://, s3:// and az:// URLs (the URL format must be supported by our dstore library).
Command substreams pack is now restricted to local manifest file.
Added command substreams tools module to introspect a store state in storage.
Made changes to allow for substreams CLI to run on Windows OS (thanks @robinbernon).
Added flag --output-file <template> to substreams pack command to control where the .spkg is written; {manifestDir} and {spkgDefaultName} can be used in the template value, where {manifestDir} resolves to the manifest's directory and {spkgDefaultName} is the pre-computed default name in the form <name>-<version>, where <name> is the manifest's "package.name" value (_ values in the name are replaced by -) and <version> is the package.version value.
Fixed relative path not resolved correctly against manifest's location in protobuf.files list.
Fixed relative path not resolved correctly against manifest's location in binaries list.
substreams protogen <package> --output-path <path> flag is now relative to <package> if <package> is a local manifest file ending with .yaml.
Endpoint's port is now validated; otherwise, when unspecified, it creates an infinite 'Connecting...' message that never resolves.
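The placeholder substitution for --output-file can be sketched as follows. The function name is hypothetical and this is not the CLI's implementation. These notes do not say whether the real default name already carries the .spkg extension; the sketch assumes it does not and takes the form <name>-<version> literally.

```rust
/// Hypothetical sketch of expanding an --output-file template.
fn expand_output_file(
    template: &str,
    manifest_dir: &str,
    package_name: &str,
    package_version: &str,
) -> String {
    // Default name is <name>-<version>, with `_` in the name replaced by `-`.
    let default_name = format!("{}-{}", package_name.replace('_', "-"), package_version);

    // Substitute both placeholders wherever they appear in the template.
    template
        .replace("{manifestDir}", manifest_dir)
        .replace("{spkgDefaultName}", &default_name)
}
```

Under these assumptions, the template {manifestDir}/build/{spkgDefaultName}.spkg with a manifest in /proj, a package named my_package and version v0.1.0 expands to /proj/build/my-package-v0.1.0.spkg.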
CLI
Fixed error when importing http/https .spkg files in imports section.
New updatePolicy append, allows one to build a store that concatenates values and supports parallelism. This affects the server, the manifest format (additive only), the substreams crate and the generated code therein.
Rust API
Store API methods now accept key of type AsRef<str>, which means for example that both String and &str are accepted as inputs in:
StoreSet::set
StoreSet::set_many
StoreSet::set_if_not_exists
StoreSet::set_if_not_exists_many
StoreAddInt64::add
StoreAddInt64::add_many
StoreAddFloat64::add
StoreAddFloat64::add_many
StoreAddBigFloat::add
StoreAddBigFloat::add_many
StoreAddBigInt::add
StoreAddBigInt::add_many
StoreMaxInt64::max
StoreMaxFloat64::max
StoreMaxBigInt::max
StoreMaxBigFloat::max
StoreMinInt64::min
StoreMinFloat64::min
StoreMinBigInt::min
StoreMinBigFloat::min
StoreAppend::append
StoreAppend::append_bytes
StoreGet::get_at
StoreGet::get_last
StoreGet::get_first
Low-level state methods now accept key of type AsRef<str>, which means for example that both String and &str are accepted as inputs in:
state::get_at
state::get_last
state::get_first
state::set
state::set_if_not_exists
state::append
state::delete_prefix
state::add_bigint
state::add_int64
state::add_float64
state::add_bigfloat
state::set_min_int64
state::set_min_bigint
state::set_min_float64
state::set_min_bigfloat
state::set_max_int64
state::set_max_bigint
state::set_max_float64
state::set_max_bigfloat
Bumped prost (and related dependencies) to ^0.11.0
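To illustrate what accepting a key of type AsRef<str> means for callers, here is a standalone sketch. MemoryStore is a hypothetical in-memory stand-in and not the substreams crate's store types; only the generic bound on the key mirrors the change described above.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory stand-in used only to show the `AsRef<str>` bound.
struct MemoryStore {
    values: HashMap<String, String>,
}

impl MemoryStore {
    fn new() -> Self {
        MemoryStore { values: HashMap::new() }
    }

    /// Accepts any key that can be viewed as a `&str`.
    fn set<K: AsRef<str>>(&mut self, key: K, value: &str) {
        self.values.insert(key.as_ref().to_string(), value.to_string());
    }

    /// Looks a key up with the same flexible key type.
    fn get_last<K: AsRef<str>>(&self, key: K) -> Option<&str> {
        self.values.get(key.as_ref()).map(String::as_str)
    }
}
```

With this bound, a string literal, an owned String and a reference to a String can all be passed as the key without calling as_str() or cloning at the call site.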
CLI
Environment variables are now accepted in manifest's imports list.
Environment variables are now accepted in manifest's protobuf.importPaths list.
Fixed relative path not resolved correctly against manifest's location in imports list.
Changed the output modes: module-* modes are gone and become the format for jsonl and json. This means all printed outputs are wrapped to provide the module name, and other metadata.
Added --initial-snapshots (or -i) to the run command, which will dump the stores specified as output modules.
Added color for ui output mode under a tty.
Added some request validation on both client and server (validate that output modules are present in the modules graph)
Service
Added support to serve the initial snapshot
CLI
Changed substreams manifest info -> substreams info
Changed substreams manifest graph -> substreams graph
Updated usage
Service
Multiple fixes to boundaries
substreams server
Various bug fixes around store and parallel execution.
substreams CLI
Fixed null pointer exception at the end of CLI run in some cases.
Do not log the last error when the CLI exits with an error, as the error is already printed to the user and logging it again creates a weird behavior.
substreams Docker
Ensure arguments can be passed to Docker built image.
substreams server
Various bug fixes around store and parallel execution.
Fixed logs being repeated on module with inputs that was receiving nothing.
substreams crate
Added substreams::hex wrapper around hex_literal::hex macro
substreams CLI
Added substreams run -o ui|json|jsonl|module-json|module-jsonl.
Server
Fixed a whole bunch of issues, in parallel processing. More stable caching. See chain-specific releases.
Fixed substreams crate usage from tagged version published on crates.io.
Changed startBlock to initialBlock in substreams.yaml manifests.
code: is now defined in the binaries section of the manifest, instead of in each module. A module can select which binary with the binary: field on the Module definition.
Added substreams inspect ./substreams.yaml or inspect some.spkg to see what's inside. Requires protoc to be installed (which you should have anyway).
Added command substreams protogen that writes a temporary buf.gen.yaml and generates Rust structs based on the contents of the provided manifest or package.
Added substreams::handlers macros to reduce boilerplate when creating substream modules.
substreams::handlers::map is used for the handlers corresponding to modules of type map. Modules of type map should return a Result where the error is of type Error
substreams::handlers::store is used for the handlers corresponding to modules of type store. Modules of type store should have no return value.
Implemented packages (see docs).
Added substreams::Hex wrapper type to more easily deal with printing and encoding bytes to hexadecimal string.
Added substreams::log::info!(...) and substreams::log::debug!(...) supporting formatting arguments (acts like println!() macro).
Added new field logs_truncated that can be used to determine if logs were truncated.
Augmented logs truncation limit to 128 KiB per module per block.
Updated substreams run to properly report module progress error.
When a module's WASM execution errors out, progress with failure logs is now returned before closing the substreams connection.
The API token is not passed anymore if the connection is using plain text option --plaintext.
The -c (or --compact-output) flag can be used to print JSON as a single compact line.
The --stop-block flag on substreams run can be defined as +1000 to stream up to start block + 1000.
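The relative form of --stop-block can be sketched as a small parsing function. This is an illustration only: the function name is hypothetical, and it assumes the stop block is exclusive, which is consistent with the earlier example where -s 1000 -t +5 yields blocks 1000 to 1004.

```rust
/// Hypothetical sketch: turns a --stop-block value into an absolute block
/// number, treating a leading `+` as an offset from the start block.
fn resolve_stop_block(
    start_block: u64,
    stop_block: &str,
) -> Result<u64, std::num::ParseIntError> {
    match stop_block.strip_prefix('+') {
        // Relative form: start block plus the given number of blocks.
        Some(offset) => Ok(start_block.saturating_add(offset.parse::<u64>()?)),
        // Absolute form: the value is the stop block itself.
        None => stop_block.parse::<u64>(),
    }
}
```

Under these assumptions, a start block of 1000 with +5 resolves to a stop block of 1005, while a plain 2000 is used as is.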
Added Dockerfile support.
Client
Improved defaults for --proto-path and --proto, using globs.
WASM file paths in substreams.yaml manifests now resolve relative to the location of the yaml file.
Added substreams manifest package to create .pb packages to simplify querying using other languages. See the python example.
Added substreams manifest graph to show the Mermaid graph alone.
Improved mermaid graph layout.
Removed native Go code support for now.
Server
Always writes store snapshots, each 10,000 blocks.
A few tools to manage partial snapshots under substreams tools
First chain-agnostic release. THIS IS BETA SOFTWARE. USE AT YOUR OWN RISK. WE PROVIDE NO BACKWARDS COMPATIBILITY GUARANTEES FOR THIS RELEASE.
See https://github.com/streamingfast/substreams for usage docs.
Removed local command. See README.md for instructions on how to run locally now. Builds feth from source for now.
Changed the remote command to run.
Changed run command's --substreams-api-key-envvar flag to --substreams-api-token-envvar, and its default value is changed from SUBSTREAMS_API_KEY to SUBSTREAMS_API_TOKEN. See README.md to learn how to obtain such tokens.