
class Client

Bases: BaseClient

Client to connect to Sumatra GraphQL API

Humans: First, log in via the CLI: sumatra login

Bots: Set the SUMATRA_INSTANCE and SUMATRA_SDK_KEY environment variables

Attributes

boto: boto3.Session property

Boto3 session object

branch: str property writable

Default branch name

instance: str property

Instance name from client config, e.g. 'yourco.sumatra.ai'

workspace: Optional[str] property

User's current workspace slug, e.g. my-workspace

workspace_id: Optional[str] property

User's current workspace id, e.g. 01ee8330-edf4-07ae-ae19-3ab915f227c8

Functions

__init__(instance=None, branch=None, workspace=None)

Create connection object.

Parameters:

Name Type Description Default
instance Optional[str]

Sumatra instance URL, e.g. yourco.sumatra.ai. If unspecified, your config default will be used.

None
branch Optional[str]

Set default branch. If unspecified, your config default will be used.

None
workspace Optional[str]

Sumatra workspace name to connect to.

None
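The bot credentials named in the class description can be checked before a client is constructed. The following is a minimal sketch using only the standard library; the helper name `missing_bot_vars` is hypothetical and not part of the SDK, and only the two variable names come from this page.

```python
import os

# Environment variables this page says bots must set.
REQUIRED_BOT_VARS = ("SUMATRA_INSTANCE", "SUMATRA_SDK_KEY")


def missing_bot_vars(env=None):
    """Return the names of required bot variables that are unset or empty.

    Hypothetical helper: `env` defaults to the process environment, but any
    mapping can be passed in, which keeps the check easy to test.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_BOT_VARS if not env.get(name)]
```

A script could call `missing_bot_vars()` at startup and exit with a clear message if the returned list is non-empty, rather than failing later during the first API call.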

api_key()

Return the API key for the connected workspace

Returns:

Type Description
str

API key

clone_branch(dest, branch=None)

Copy branch to another branch name.

Parameters:

Name Type Description Default
dest str

Name of branch to be created or overwritten.

required
branch Optional[str]

Specify a source branch other than the client default.

None

create_branch_from_dir(scowl_dir=None, branch=None, deps_file=None)

Create (or overwrite) branch with local scowl files.

Parameters:

Name Type Description Default
scowl_dir Optional[str]

Path to local .scowl files.

None
branch Optional[str]

Specify a destination branch other than the client default.

None
deps_file Optional[str]

Path to deps file [default: /deps.scowl]

None

Returns:

Type Description
str

Name of branch created

create_branch_from_scowl(scowl, branch=None)

Create (or overwrite) branch with single file of scowl source code.

Parameters:

Name Type Description Default
scowl str

Scowl source code as string.

required
branch Optional[str]

Specify a destination branch other than the client default.

None

Returns:

Type Description
str

Name of branch created

create_model_from_pmml(model, filename, comment=None)

Create (or overwrite) model from PMML file.

Parameters:

Name Type Description Default
model str

Model name, e.g. "churn_predictor".

required
filename str

Local PMML file, e.g. "my_model.xml"

required
comment Optional[str]

An optional comment string to store with the model version. Max 60 characters.

None

Returns:

Type Description
ModelVersion

A ModelVersion handle to the upload job.

create_table_from_dataframe(table, df, key_column, include_index=False)

Create (or overwrite) table from a DataFrame

Parameters:

Name Type Description Default
table str

Table name.

required
df DataFrame

DataFrame to upload as table

required
key_column str

Name of column containing the primary index for the table

required
include_index bool

Include the DataFrame's index as a column named index?

False

Returns:

Type Description
TableVersion

A TableVersion handle to the upload job.

create_timeline_from_dataframes(timeline, df_dict)

Create (or overwrite) timeline from a collection of DataFrames—one per event type.

Parameters:

Name Type Description Default
timeline str

Timeline name.

required
df_dict dict

Dictionary from event type name to DataFrame of events.

required

create_timeline_from_file(timeline, filename)

Create (or overwrite) timeline from events stored in a file.

Supported file types: .jsonl, .jsonl.gz

Parameters:

Name Type Description Default
timeline str

Timeline name.

required
filename str

Name of events file to upload.

required

create_timeline_from_jsonl(timeline, jsonl)

Create (or overwrite) timeline from JSON events passed in as a string.

Parameters:

Name Type Description Default
timeline str

Timeline name.

required
jsonl str

JSON event data, one JSON dict per line.

required
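The jsonl argument is one JSON dict per line, so a list of Python dicts has to be serialized before it is passed in. A minimal sketch follows; the helper `events_to_jsonl` is hypothetical, and the `_type` and `_time` field names in the sample events are assumptions borrowed from the JSON path examples elsewhere on this page, not a confirmed input schema.

```python
import json


def events_to_jsonl(events):
    """Serialize an iterable of event dicts to one compact JSON object per line."""
    return "\n".join(json.dumps(event, separators=(",", ":")) for event in events)


# Sample events; the field names are illustrative assumptions.
events = [
    {"_type": "login", "_time": "2024-01-01T00:00:00Z", "email": "a@example.com"},
    {"_type": "purchase", "_time": "2024-01-01T00:05:00Z", "amount": 12.5},
]
jsonl = events_to_jsonl(events)
```

With a connected client, the resulting string would be passed as `client.create_timeline_from_jsonl("my_timeline", jsonl)`, where the timeline name is a placeholder.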

create_timeline_from_log(timeline, start_ts, end_ts, event_types=None)

Create (or overwrite) timeline from the Event Log

Parameters:

Name Type Description Default
timeline str

Timeline name.

required
start_ts Union[DateTime, str]

Earliest event timestamp to fetch (local client timezone).

required
end_ts Union[DateTime, str]

Latest event timestamp to fetch (local client timezone).

required
event_types Optional[list[str]]

Event types to include (default: all).

None

create_timeline_from_s3(timeline, s3_uri, time_path, data_path, id_path=None, type_path=None, default_type=None)

Create (or overwrite) timeline from a JSON file on S3

Parameters:

Name Type Description Default
timeline str

Timeline name.

required
s3_uri str

S3 bucket URI.

required
time_path str

JSON path where event timestamp is found (e.g. $._time)

required
data_path str

JSON path where event payload is found (e.g. $)

required
id_path Optional[str]

JSON path where event ID is found (e.g. $.event_id)

None
type_path Optional[str]

JSON path where event type is found (e.g. $._type)

None
default_type Optional[str]

Event type to use in case none found at type_path

None
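To make the roles of time_path, data_path, type_path, and default_type concrete, the sketch below resolves simple paths against a single record on the client side. It is an illustration only: it supports just `$` and dotted `$.a.b` paths, the helper names are hypothetical, and it is not a description of how the server evaluates JSON paths.

```python
def resolve_path(event, path):
    """Resolve '$' or a dotted '$.a.b' path against a dict; return None if absent."""
    if path == "$":
        return event
    if not path.startswith("$."):
        raise ValueError(f"unsupported path: {path!r}")
    value = event
    for key in path[2:].split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def describe_event(event, time_path, data_path, type_path=None, default_type=None):
    """Show which parts of a record each path argument would select."""
    event_type = resolve_path(event, type_path) if type_path else None
    return {
        "time": resolve_path(event, time_path),
        # Fall back to default_type when nothing is found at type_path.
        "type": event_type if event_type is not None else default_type,
        "data": resolve_path(event, data_path),
    }
```

Running `describe_event` on one sample record from the file before uploading is a cheap way to confirm that the chosen paths point at the intended fields.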

delete_branch(branch=None)

Delete server-side branch

Parameters:

Name Type Description Default
branch Optional[str]

Specify a branch other than the client default.

None

delete_model(model)

Delete a model permanently.

If the model is referenced in the LIVE topology, it cannot be deleted.

Parameters:

Name Type Description Default
model str

Model name.

required

delete_model_version(model, version)

Delete a specific version of a model permanently.

If the model version is referenced in the LIVE topology, it cannot be deleted.

Parameters:

Name Type Description Default
model str

Model name.

required
version str

Version identifier.

required

delete_openai_config()

Delete the current OpenAI configuration

delete_table(table)

Delete a table permanently.

If the table is referenced in the LIVE topology, it cannot be deleted.

Parameters:

Name Type Description Default
table str

Table name.

required

delete_table_version(table, version)

Delete a specific version of a table permanently.

If the table version is referenced in the LIVE topology, it cannot be deleted.

Parameters:

Name Type Description Default
table str

Table name.

required
version str

Version identifier.

required

delete_timeline(timeline)

Delete timeline

Parameters:

Name Type Description Default
timeline str

Timeline name.

required

diff_branch_with_live(branch=None)

Compare branch to LIVE topology and return diff.

Parameters:

Name Type Description Default
branch Optional[str]

Specify a source branch other than the client default.

None

Returns:

Type Description
dict[str, list[str]]

Events and features added, redefined, and deleted.
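Because the diff is returned as a dict[str, list[str]], it can be summarized without knowing the exact category names. The sketch below assumes nothing about the keys; both helper names are hypothetical.

```python
def summarize_diff(diff):
    """Count the entries in each category of a dict[str, list[str]] diff."""
    return {category: len(names) for category, names in diff.items()}


def has_changes(diff):
    """Return True if any category of the diff is non-empty."""
    return any(diff.values())
```

One possible use is a pre-publish guard: call `client.diff_branch_with_live()` on a connected client, print `summarize_diff(...)`, and skip `publish_branch` when `has_changes(...)` is False.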

execute_athena(sql)

Execute a SQL query against the Athena backend

Parameters:

Name Type Description Default
sql str

SQL query (e.g. "select * from event_log where event_type='login' limit 10")

required

Returns:

Type Description
DataFrame

Query execution id

get_branch(branch=None)

Return metadata about the branch.

Parameters:

Name Type Description Default
branch Optional[str]

Specify a branch other than the client default.

None

Returns:

Type Description
dict

Branch metadata

get_branches()

Return all branches and their metadata.

Returns:

Type Description
list[dict]

Branch metadata.

get_deps(live=False)

Fetch latest dependencies from server as Scowl source require statements.

Parameters:

Name Type Description Default
live bool

Return the LIVE versions of dependencies instead of latest.

False

Returns:

Type Description
str

Scowl source code as string.

get_error_counts(start_ts, end_ts, event_types=None)

Return the number of errors from CloudWatch logs in the given time range.

Parameters:

Name Type Description Default
start_ts Union[DateTime, str]

Earliest event timestamp to count (local client timezone).

required
end_ts Union[DateTime, str]

Latest event timestamp to count (local client timezone).

required
event_types Optional[list[str]]

List of event types to include. If None, include all event types in LIVE topology.

None

get_event_counts(start_ts, end_ts, event_types=None)

Return the number of events from CloudWatch logs in the given time range.

Parameters:

Name Type Description Default
start_ts Union[DateTime, str]

Earliest event timestamp to count (local client timezone).

required
end_ts Union[DateTime, str]

Latest event timestamp to count (local client timezone).

required
event_types Optional[list[str]]

List of event types to include. If None, include all event types in LIVE topology.

None

get_features_from_feed(event_type, start_ts=None, end_ts=None, count=None, where={}, batch_size=10000, ascending=False)

For a given event type, return the feature values as they were calculated at event time.

Fetches events in descending time order from end_ts. May specify count or start_ts, but not both.

Parameters:

Name Type Description Default
event_type str

Event type name.

required
start_ts Optional[Union[DateTime, str]]

Earliest event timestamp to fetch (local client timezone). If not specified, count will be used instead.

None
end_ts Optional[Union[DateTime, str]]

Latest event timestamp to fetch (local client timezone) [default: now].

None
count Optional[int]

Number of rows to return (if start_ts not specified) [default: 10].

None
where dict[str, str]

Dictionary of equality conditions (all must be true for a match), e.g. {"zipcode": "90210", "email_domain": "gmail.com"}.

{}
batch_size int

Maximum number of records to fetch per GraphQL call.

10000
ascending bool

Sort results in ascending chronological order instead of descending.

False

Returns:

Name Type Description
rows list[dict]

_id, _time, [features...] (in descending time order).
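The rule that count and start_ts are mutually exclusive can be enforced before the request is made. The sketch below builds keyword arguments for the call; `feed_window_kwargs` is a hypothetical helper, and whether the SDK itself raises on conflicting arguments is not stated on this page.

```python
def feed_window_kwargs(start_ts=None, end_ts=None, count=None, where=None):
    """Build keyword arguments for a feed query, rejecting count plus start_ts."""
    if start_ts is not None and count is not None:
        raise ValueError("specify count or start_ts, but not both")
    # Copy the conditions so the caller's dict is never shared or mutated.
    kwargs = {"where": dict(where or {})}
    for name, value in (("start_ts", start_ts), ("end_ts", end_ts), ("count", count)):
        if value is not None:
            kwargs[name] = value
    return kwargs
```

With a connected client this would be used as `client.get_features_from_feed("login", **feed_window_kwargs(count=100, where={"zipcode": "90210"}))`. Passing a fresh dict on each call also sidesteps any surprises from the `where={}` default shown in the signature.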

get_features_from_log(event_type, start_ts=None, end_ts=None, features=None, include_inputs=False, where=None, deserialize_json=True)

For a given event type, fetch the historical values for features, as calculated in the LIVE environment.

Parameters:

Name Type Description Default
event_type str

Event type name.

required
start_ts Optional[Union[DateTime, str]]

Earliest event timestamp to fetch (local client timezone). If not specified, will start from beginning of log.

None
end_ts Optional[Union[DateTime, str]]

Latest event timestamp to fetch (local client timezone) [default: now].

None
features Optional[list[str]]

Subset of features to fetch. [default: all].

None
include_inputs bool

Include request json as "_inputs" column.

False
where Optional[str]

SQL clauses (not including "where" keyword), e.g. "col1 is not null"

None
deserialize_json bool

Deserialize complex data types from JSON strings to Python objects.

True

Returns:

Name Type Description
rows DataFrame

_id, _time, [features...] (in ascending time order).
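Unlike the feed functions, where here is a SQL fragment rather than a dict. A small sketch of composing one from column names follows; `not_null_clause` is a hypothetical helper, and it does no quoting or escaping, so it should only be given trusted column names.

```python
def not_null_clause(columns):
    """Join 'col is not null' conditions with 'and'; return None for no columns."""
    clause = " and ".join(f"{column} is not null" for column in columns)
    # None matches the documented default for `where`.
    return clause or None
```

With a connected client, the result could be passed as `client.get_features_from_log("login", where=not_null_clause(["col1", "col2"]))`, where the event type and column names are placeholders.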

get_inputs_from_feed(start_ts=None, end_ts=None, count=None, event_types=None, where={}, batch_size=10000, ascending=False)

Return the raw input events from the Event Feed.

Fetches events in descending time order from end_ts. May specify count or start_ts, but not both.

Parameters:

Name Type Description Default
start_ts Optional[Union[DateTime, str]]

Earliest event timestamp to fetch (local client timezone). If not specified, count will be used instead.

None
end_ts Optional[Union[DateTime, str]]

Latest event timestamp to fetch (local client timezone) [default: now].

None
count Optional[int]

Number of rows to return (if start_ts not specified) [default: 10].

None
event_types Optional[list[str]]

Subset of event types to fetch. [default: all]

None
where dict[str, str]

Dictionary of equality conditions (all must be true for a match), e.g. {"zipcode": "90210", "email_domain": "gmail.com"}.

{}
batch_size int

Maximum number of records to fetch per GraphQL call.

10000
ascending bool

Sort results in ascending chronological order instead of descending.

False

Returns:

Type Description
list[dict]

list of events: [{"_id": , "_type": , "_time": , [inputs...]}] (in descending time order).

get_live_schema()

Return the feature names and types for every event in the LIVE topology

Returns:

Type Description
dict[str, dict[str, str]]

dictionary {'event_name': {'f1': 'int', 'f2': 'bool', ...} ...}
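The nested schema dict can be flattened into qualified 'event.feature' names, the same form used for feature lists such as ['login.email', 'purchase.*'] elsewhere on this page. A minimal sketch, with a hypothetical helper name and made-up sample data:

```python
def qualified_features(schema):
    """Flatten {'event': {'feature': 'type'}} into sorted 'event.feature' names."""
    return sorted(
        f"{event}.{feature}"
        for event, features in schema.items()
        for feature in features
    )
```

For example, `qualified_features(client.get_live_schema())` on a connected client would list every feature in the LIVE topology in a form that can be filtered and passed on to other calls.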

get_live_scowl()

Return scowl source code for LIVE topology as single cleansed string.

Returns:

Type Description
str

Scowl source code as string.

get_model_history(name)

Return list of versions for the given model along with their metadata.

Parameters:

Name Type Description Default
name str

Model name.

required

Returns:

Type Description
list[dict]

Model version metadata.

get_model_version(name, version)

Return handle to a specific model version.

Parameters:

Name Type Description Default
name str

Model name.

required
version str

Model version.

required

Returns:

Type Description
ModelVersion

Model version future object.

get_models()

Return all models and their metadata.

Returns:

Type Description
list[dict]

Model metadata.

get_models_openai()

Return all OpenAI models and their metadata.

Returns:

Type Description
list[dict]

OpenAI Model metadata.

get_openai_config()

Return the current OpenAI model configuration, if any

Returns:

Type Description
dict

OpenAI Model configuration state.

get_settings()

Return settings metadata about the current workspace.

Returns:

Type Description
dict

Workspace settings

get_table_history(name)

Return list of versions for the given table along with their metadata.

Parameters:

Name Type Description Default
name str

Table name.

required

Returns:

Type Description
list[dict]

Table version metadata.

get_table_version(name, version)

Return handle to a specific table version.

Parameters:

Name Type Description Default
name str

Table name.

required
version str

Table version.

required

Returns:

Type Description
TableVersion

Table version future object.

get_tables()

Return all tables and their metadata.

Returns:

Type Description
list[dict]

Table metadata.

get_timeline(timeline)

Return metadata about the timeline.

Parameters:

Name Type Description Default
timeline str

Timeline name.

required

Returns:

Type Description
dict

Timeline metadata.

get_timelines()

Return all timelines and their metadata.

Returns:

Type Description
list[dict]

Timeline metadata.

infer_schema_from_timeline(timeline)

Attempt to infer the paths and data types of all fields in the timeline's input data. Generate the scowl to parse all JSON paths.

This function helps bootstrap scowl code for new event types, with the expectation that most feature names will need to be modified.

e.g.

    account_id := $.account.id as int
    purchase_items_0_amount := $.purchase.items[0].amount as float

Parameters:

Name Type Description Default
timeline str

Timeline name.

required

Returns:

Type Description
str

Scowl source code as string.
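Since most inferred feature names are expected to need renaming, it can help to load the generated lines into a structure that is easy to review. The sketch below parses only lines shaped like the two examples above (`name := $.path as type`) and silently skips anything else; that the generated scowl contains only such lines is an assumption, and `parse_inferred_scowl` is a hypothetical helper.

```python
import re

# Matches lines of the form: feature_name := $.json.path as type
ASSIGNMENT = re.compile(r"^\s*(\w+)\s*:=\s*(\$\S*)\s+as\s+(\w+)\s*$")


def parse_inferred_scowl(scowl):
    """Return (feature_name, json_path, type) tuples for each matching line."""
    fields = []
    for line in scowl.splitlines():
        match = ASSIGNMENT.match(line)
        if match:
            fields.append(match.groups())
    return fields
```

The resulting tuples can be sorted by path or grouped by type to decide which features to keep and rename before committing the scowl to a branch.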

invite_user(email, role, resend_email=True, app=None)

Invite a user to this workspace, with the given role.

Parameters:

Name Type Description Default
email str

The user's email address

required
role str

The desired role for the user. One of {'owner', 'publisher', 'writer', 'reader'}

required
resend_email bool

If True, resend the invitation email if the user has already been invited to Sumatra

True
app Optional[str]

The name of the app to invite the user to ('optimize' or None)

None

Returns:

Type Description
dict

A dict of the user's metadata
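The set of accepted roles is small enough to validate locally before sending an invitation. A minimal sketch; `check_role` is a hypothetical helper, and how the server responds to an unknown role is not documented here.

```python
# Roles listed in the parameter description above.
VALID_ROLES = frozenset({"owner", "publisher", "writer", "reader"})


def check_role(role):
    """Return the role unchanged if it is valid; otherwise raise ValueError."""
    if role not in VALID_ROLES:
        raise ValueError(f"role must be one of {sorted(VALID_ROLES)}, got {role!r}")
    return role
```

With a connected client, a call might look like `client.invite_user("new.user@example.com", check_role("reader"))`, where the email address is a placeholder.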

list_users()

List all of the users in this workspace.

Returns:

Type Description
list[dict]

A list of the users and their metadata

materialize(timelines, features=None, start_ts=None, end_ts=None, branch=None)

Enrich collection of timelines using topology at branch. Timelines are merged based on timestamp.

Parameters:

Name Type Description Default
timelines list[str]

Timeline names.

required
features list[str]

list of features to materialize, e.g. ['login.email', 'purchase.*']

None
start_ts Optional[Union[DateTime, str]]

Earliest event timestamp to materialize (local client timezone).

None
end_ts Optional[Union[DateTime, str]]

Latest event timestamp to materialize (local client timezone).

None
branch Optional[str]

Specify a source branch other than the client default.

None

Returns:

Type Description
Materialization

Handle to Materialization job

publish_branch(branch=None)

Promote branch to LIVE.

Parameters:

Name Type Description Default
branch Optional[str]

Specify a branch other than the client default.

None

publish_dir(scowl_dir=None, deps_file=None)

Push local scowl dir to branch and promote to LIVE.

Parameters:

Name Type Description Default
scowl_dir Optional[str]

Path to .scowl files. Default: '.'

None
deps_file Optional[str]

Path to deps file [default: /deps.scowl]

None

publish_scowl(scowl)

Push local scowl source to branch and promote to LIVE.

Parameters:

Name Type Description Default
scowl str

Scowl source code as string.

required

query_athena(sql)

Execute a SQL query against the Athena backend and return the result as a dataframe.

Parameters:

Name Type Description Default
sql str

SQL query (e.g. "select * from event_log where event_type='login' limit 10")

required

Returns:

Type Description
DataFrame

Result of query

refresh_openai_config()

Refresh the OpenAI model list using the existing configuration

Returns:

Type Description
dict

OpenAI Model configuration state.

remove_user(email)

Remove a user from this workspace

Parameters:

Name Type Description Default
email str

The user's email address

required

Returns:

Type Description
dict

A dict of the user's metadata

replay(features, start_ts, end_ts, extra_timelines=None, branch=None)

Recompute historical feature values from LIVE event log on given topology branch.

This is the primary function of the SDK.

Parameters:

Name Type Description Default
features list[str]

list of features to materialize, e.g. ['login.email', 'purchase.*']

required
start_ts Union[DateTime, str]

Earliest event timestamp to materialize (local client timezone).

required
end_ts Union[DateTime, str]

Latest event timestamp to materialize (local client timezone).

required
extra_timelines Optional[list[str]]

Names of supplemental timelines.

None
branch Optional[str]

Specify a source branch other than the client default.

None

Returns:

Type Description
Materialization

Handle to Materialization job
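Both timestamps are required and are interpreted in the local client timezone. The sketch below computes a trailing window as naive local datetimes; it assumes that the `DateTime` type in the signature accepts standard `datetime.datetime` objects, which this page does not confirm, and `trailing_window` is a hypothetical helper.

```python
from datetime import datetime, timedelta


def trailing_window(days, now=None):
    """Return (start_ts, end_ts) covering the `days` days before `now`.

    `now` defaults to the current naive local time, matching the
    "local client timezone" wording in the parameter descriptions.
    """
    end = now if now is not None else datetime.now()
    return end - timedelta(days=days), end
```

With a connected client, a seven-day replay might then be started as `start_ts, end_ts = trailing_window(7)` followed by `client.replay(["login.*"], start_ts, end_ts)`, where the feature pattern is a placeholder.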

resolve_deps(requires)

Return the resolved resources (i.e. table schemas) from the given requires statements.

Parameters:

Name Type Description Default
requires str

Scowl requires statement as code blob

required

Returns:

Type Description
str

Resolved resource definitions (table schemas) as scowl code.

resolve_deps_from_file(deps_file=None)

Return the resolved resources (i.e. table schemas) from the local deps.scowl file.

Parameters:

Name Type Description Default
deps_file Optional[str]

Path to deps file [default: ./deps.scowl]

None

Returns:

Type Description
str

Resolved resource definitions (table schemas) as scowl code.

save_branch_to_dir(scowl_dir=None, branch=None, deps_file=None)

Save remote branch scowl files to local dir.

Parameters:

Name Type Description Default
scowl_dir Optional[str]

Path to save .scowl files.

None
branch Optional[str]

Specify a source branch other than the client default.

None
deps_file Optional[str]

Path to deps file [default: /deps.scowl]

None

Returns:

Type Description
str

Name of branch saved

save_deps(live=False, deps_file=None)

Fetch latest dependencies from server and save to file

Parameters:

Name Type Description Default
live bool

Return the LIVE versions of dependencies instead of latest.

False
deps_file Optional[str]

Path to save deps file [default: ./deps.scowl]

None

Returns:

Type Description
str

Full path to saved dependency file.

sdk_key()

Return the SDK key for the connected workspace

Returns:

Type Description
str

SDK key

set_openai_config(api_key, timeout_ms=None, retry_limit=None, max_tokens=None)

Create or update OpenAI model configuration

Parameters:

Name Type Description Default
api_key str

OpenAI API key

required
timeout_ms int

Timeout in milliseconds. Default 5000

None
retry_limit int

Number of retries to perform on API error. Default 3

None
max_tokens int

Maximum number of tokens to generate in a single request. Default 8192

None

Returns:

Type Description
dict

OpenAI Model configuration state.

set_user_role(email, role)

Set a user's role within this workspace.

Note that the user must already be a member of the workspace. You can use invite_user to add a new user.

Parameters:

Name Type Description Default
email str

The user's email address

required
role str

The desired role for the user. One of {'owner', 'publisher', 'writer', 'reader'}

required

Returns:

Type Description
dict

A dict of the user's metadata

update_settings(slug=None, nickname=None, billing_email=None, icon=None)

Update workspace settings metadata.

Parameters:

Name Type Description Default
slug Optional[str]

Desired slug of the new workspace. Must consist only of letters, numbers, '-', and '_'. If this slug is taken, a random one will be generated instead, which may be changed later.

None
nickname Optional[str]

A human readable name for the new workspace

None
billing_email Optional[str]

Billing email address on the account

None
icon Optional[bytes]

Binary encoding of a PNG image to use as the workspace icon. Max size 50kb

None

Returns:

Type Description
dict

A dict of the updated workspace settings
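The slug and icon constraints above can be checked before the request is sent. This is a sketch under two stated assumptions: "letters" is taken to mean ASCII letters, and "50kb" is taken to mean 50 * 1024 bytes. The helper name `settings_problems` is hypothetical.

```python
import re

# Assumption: "letters, numbers, '-', and '_'" means ASCII characters only.
SLUG_PATTERN = re.compile(r"[A-Za-z0-9_-]+")
# Assumption: "Max size 50kb" is interpreted as 50 * 1024 bytes.
MAX_ICON_BYTES = 50 * 1024


def settings_problems(slug=None, icon=None):
    """Return a list of human-readable problems with the proposed settings."""
    problems = []
    if slug is not None and not SLUG_PATTERN.fullmatch(slug):
        problems.append("slug must consist only of letters, numbers, '-', and '_'")
    if icon is not None and len(icon) > MAX_ICON_BYTES:
        problems.append("icon exceeds the assumed 50 KiB limit")
    return problems
```

An empty list means the values pass these local checks; the server may still substitute a random slug if the requested one is taken, as described above.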

user_email()

Return the email address of the connected user.

Returns:

Type Description
str

Email address

version()

Return the server-side version number.

Returns:

Type Description
str

Version identifier