Sumatra tables provide a way to enrich events with externally loaded batch data. A Lookup feature fetches the table row for the given key and returns the requested field values.
Tables are static, meaning that the contents of the table are not updated by some ongoing process. Instead, a table may be updated by uploading a new version of the table, then publishing a Scowl change to reference the new version. For dynamic table functionality, see Latest Queries.
The primary method for uploading a table is the
Lookup returns one or more fields from the table for the specified key:
Lookup<table_name>( feature(s) by feature )
For example:
Lookup<geoip>(lat, lng by zip)
Lookup<region_to_iso2>(iso2 by billing_state)
Lookup<product_dim>(sku, category by product_id)
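Conceptually, a Lookup behaves like a keyed fetch from a map of rows. The Python sketch below only illustrates those semantics and is not how Sumatra implements them. The table contents are made up, and returning None for each field when the key is absent is an assumption, since this page does not describe missing-key behavior.

```python
from typing import Any, Dict, Hashable, Optional, Tuple

# Illustrative stand-in for an uploaded table: key -> row of field values.
Row = Dict[str, Any]
Table = Dict[Hashable, Row]


def lookup(table: Table, fields: Tuple[str, ...], key: Hashable) -> Tuple[Optional[Any], ...]:
    """Mimic Lookup<table>(fields by key): fetch the row for `key`, return the requested fields.

    Assumption: a missing key (or missing field) yields None for that field.
    """
    row = table.get(key, {})
    return tuple(row.get(field) for field in fields)


# Made-up contents for a table keyed by zip code.
geoip: Table = {
    "94107": {"lat": 37.77, "lng": -122.39},
    "10001": {"lat": 40.75, "lng": -73.99},
}

# Roughly analogous to Lookup<geoip>(lat, lng by zip) with zip = "94107".
lat, lng = lookup(geoip, ("lat", "lng"), "94107")
print(lat, lng)
```

Because one Lookup can return several fields, the sketch returns a tuple that unpacks into one value per requested feature.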
To reference a table in a Lookup feature, you must first add it to a special Scowl file named deps.scowl. This file contains only require statements, which tell the topology to import a particular version of a named resource (i.e. a table).
require imports a named, versioned resource (i.e. a table). You may reference a single table, or multiple tables using the group syntax:
require table name version
require table ( name version name version ... )
For example:
require table (
    geoip v20220927202524
    region_to_iso2 v20220927202928
)
require table product_dim v20220927220048
The require statement is only valid Scowl within your
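If you pin versions from a release script rather than with the CLI, a small helper can render a deps.scowl body in the group syntax. This is a hypothetical helper, not part of the Sumatra SDK, and it assumes Scowl treats newlines and indentation inside the parentheses as ordinary whitespace.

```python
from typing import Mapping


def render_deps(tables: Mapping[str, str]) -> str:
    """Render a deps.scowl body that requires each table at a pinned version.

    `tables` maps table name -> version identifier (e.g. "v20220927202524").
    Hypothetical helper: assumes whitespace inside the group is insignificant.
    """
    if not tables:
        raise ValueError("at least one table is required")
    lines = ["require table ("]
    # Sort by name so the output is stable and diffs cleanly in code review.
    for name in sorted(tables):
        lines.append(f"    {name} {tables[name]}")
    lines.append(")")
    return "\n".join(lines) + "\n"


print(render_deps({
    "region_to_iso2": "v20220927202928",
    "geoip": "v20220927202524",
    "product_dim": "v20220927220048",
}))
```

Sorting by table name is a design choice: it keeps the file's diff minimal when only one version changes, which makes the code review step easier to follow.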
Version identifiers are automatically generated during the table upload process. The format is the letter v, followed by the UTC timestamp of the upload (e.g. v20220927202524).
A new table version is created every time a table is uploaded under a given table name. Old versions are retained and may still be referenced in the LIVE topology or in materialization experiments.
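Because a version encodes its upload time, versions can be decoded and ordered. The examples on this page (such as v20220927202524) suggest 14 digits laid out as YYYYMMDDHHMMSS; that layout is inferred from the examples rather than documented here, so treat this Python sketch as resting on that assumption.

```python
import re
from datetime import datetime, timezone
from typing import Iterable

# Assumed layout, inferred from example versions: "v" + YYYYMMDDHHMMSS (UTC).
_VERSION_RE = re.compile(r"v(\d{14})")


def version_timestamp(version: str) -> datetime:
    """Decode a table version identifier into a timezone-aware UTC datetime."""
    m = _VERSION_RE.fullmatch(version)
    if m is None:
        raise ValueError(f"unrecognized table version: {version!r}")
    return datetime.strptime(m.group(1), "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)


def latest_version(versions: Iterable[str]) -> str:
    """Return the most recently uploaded version from a collection of identifiers."""
    return max(versions, key=version_timestamp)


print(version_timestamp("v20220927202524"))
print(latest_version(["v20220927202524", "v20220927220048", "v20220927202928"]))
```

Since the digits are fixed-width and most significant first, plain string sorting yields the same order; parsing mainly helps when you want to display or compare upload times.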
The Sumatra CLI includes a
sumatra deps command to help you manage the versions in your
To fetch the latest versions of all tables and save to your local
sumatra deps update
To preview the deps without saving them:
sumatra deps list
Additionally, the Sumatra CLI includes a sumatra table command to inspect table versions.
To list all tables:
sumatra table list
To list all versions of a particular table:
sumatra table history my_table
To create a table (or a new version of an existing table), use the create_table_from_dataframe method in the Python SDK.
In addition to the table name and the Pandas dataframe, you must specify which column to use as the key (primary index) of the table.
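# `sumatra` refers to your Sumatra SDK handle (setup not shown)
df = ...  # query your data warehouse into a Pandas dataframe
# arguments: table name, dataframe, key column
tbl = sumatra.create_table_from_dataframe('geozip', df, 'zip_code')
tbl.wait()  # block until the validate-and-load job finishes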
The method does the following:
- Saves the dataframe to Parquet
- Uploads the Parquet file to Sumatra (S3)
- Kicks off a job to validate and load the data
Because the last step may take a while, the method returns a handle that you can .wait() on. The handle's .status property indicates the status of the job.
The table's schema is inferred from the dtypes of the Pandas dataframe. Currently, the supported types are Scowl's basic types.
A time column should appear in the dataframe as a pandas.Timestamp. Note that, due to a Pandas limitation, int columns may not contain null values and will be cast to float if any nulls are present. Null values are supported for all other types.
To inspect the schema of an uploaded table, use the following CLI command:
sumatra table schema my_table
For a dataframe to be valid for upload, it must meet all of the following criteria:
- All column names must match the pattern
- The key column must appear in the list of columns
- The values in the key column must be non-null and unique
- The dtypes of all columns must be a supported type (see previous section)
- Row count must not exceed the maximum
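Checking these criteria locally before uploading gives faster feedback than waiting for the load job to fail. The Python sketch below is hypothetical: this page does not spell out the required column-name pattern or the maximum row count, so both are parameters (the default pattern is a placeholder, not Sumatra's rule), and the dtype check is omitted. It works on a plain column-to-values mapping, such as the output of DataFrame.to_dict("list"), so it runs without pandas.

```python
import math
import re
from typing import Any, Dict, List, Sequence

# Placeholder pattern, NOT Sumatra's documented rule; pass the real one in.
PLACEHOLDER_NAME_PATTERN = r"[A-Za-z_][A-Za-z0-9_]*"


def _is_null(value: Any) -> bool:
    """Treat None and float NaN (how pandas represents most nulls) as null."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def check_table(
    columns: Dict[str, Sequence[Any]],
    key: str,
    max_rows: int,
    name_pattern: str = PLACEHOLDER_NAME_PATTERN,
) -> List[str]:
    """Return a list of problems found; an empty list means every check passed."""
    problems: List[str] = []

    # Column names must match the required pattern.
    pattern = re.compile(name_pattern)
    for name in columns:
        if not pattern.fullmatch(name):
            problems.append(f"column name {name!r} does not match {name_pattern!r}")

    # The key column must exist, and its values must be non-null and unique.
    if key not in columns:
        problems.append(f"key column {key!r} is missing")
    else:
        values = list(columns[key])
        if any(_is_null(v) for v in values):
            problems.append(f"key column {key!r} contains nulls")
        non_null = [v for v in values if not _is_null(v)]
        if len(set(non_null)) != len(non_null):
            problems.append(f"key column {key!r} contains duplicate values")

    # Row count must not exceed the maximum.
    row_count = max((len(v) for v in columns.values()), default=0)
    if row_count > max_rows:
        problems.append(f"{row_count} rows exceeds the maximum of {max_rows}")

    return problems
```

With a real dataframe you might call check_table(df.to_dict("list"), "zip_code", max_rows=MAX_ROWS), where MAX_ROWS is a hypothetical constant holding your instance's limit. Returning every problem at once, rather than raising on the first, lets you fix a dataframe in one pass.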
Sumatra's table capability was designed to meet two important change management requirements:
- Users can upload tables and experiment freely, without any worry that they will impact the LIVE topology.
- Going live with a new table version requires publishing an updated Scowl topology, and all the oversight that entails.
The typical workflow for deploying a table update is:
- Upload the table from a dataframe
- Run sumatra deps update in your branch
- Run sumatra push to push your branch
- Run a materialization to validate the new table version
- git commit the change(s) to your
- Open a PR, get it code reviewed, and publish the updated Scowl
Dev / Prod
Each Sumatra instance will have its own versions of resources. If, for example, the same table data is loaded into Dev and Prod, the table will be assigned a different version number in each, based on upload timestamp.
Therefore, if you plan to deploy the same folder of Scowl files to multiple instances, you will need to keep a different deps.scowl file per instance. The convention is to use the local deps.scowl file for Prod and to store the deps files for other instances elsewhere.
There are two options:
- Store Dev's deps.scowl file in a separate folder, e.g.
- Maintain a deps.scowl.dev file in the primary folder. Note that the filename must have an extension other than .scowl to avoid conflicts.
To reference Dev's deps file, whether in CI/CD or when using the CLI manually, pass the --deps-file parameter, e.g.:
sumatra pull --deps-file deps.scowl.dev
sumatra push --deps-file deps.scowl.dev
sumatra deps update --deps-file ../dev/deps.scowl
sumatra plan --deps-file ../dev/deps.scowl
sumatra apply --deps-file ../dev/deps.scowl
Note that, regardless of the local name used, the deps are maintained as deps.scowl in the server-side branch.
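In a CI/CD script, the choice of deps file can be centralized so every job passes --deps-file consistently. The helper below is a hypothetical sketch: the instance names and the DEPS_FILES mapping are illustrative, it follows the convention described above (the local deps.scowl for Prod, deps.scowl.dev for Dev), and it only builds and runs the argument list; how the CLI is pointed at a particular Sumatra instance is not covered on this page.

```python
import subprocess
from typing import Dict, List, Optional

# Illustrative mapping: None means "use the default local deps.scowl" (Prod convention).
DEPS_FILES: Dict[str, Optional[str]] = {
    "prod": None,
    "dev": "deps.scowl.dev",
}


def sumatra_argv(subcommand: List[str], instance: str) -> List[str]:
    """Build a sumatra CLI argv that uses the deps file for `instance`."""
    if instance not in DEPS_FILES:
        raise ValueError(f"unknown instance: {instance!r}")
    argv = ["sumatra", *subcommand]
    deps_file = DEPS_FILES[instance]
    if deps_file is not None:
        argv += ["--deps-file", deps_file]
    return argv


def run_sumatra(subcommand: List[str], instance: str) -> None:
    """Run the command, raising CalledProcessError if it exits non-zero."""
    subprocess.run(sumatra_argv(subcommand, instance), check=True)


if __name__ == "__main__":
    print(sumatra_argv(["push"], "dev"))
    print(sumatra_argv(["deps", "update"], "prod"))
```

Keeping the mapping in one place means adding another instance (or moving Dev's deps to a separate folder) is a one-line change rather than an edit to every pipeline step.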
The sumatra table delete command allows you to delete a table, including its complete version history.
To avoid impacting live decisions, Sumatra will not allow a table to be deleted if it is referenced in the LIVE topology.