class Client
Client to connect to Sumatra GraphQL API
Humans: First, log in via the CLI: sumatra login
Bots: Set the SUMATRA_INSTANCE and SUMATRA_SDK_KEY environment variables
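For unattended use, it can help to confirm that both variables are set before constructing a client. The following is a minimal sketch using only the standard library; the helper name `missing_sumatra_env` is hypothetical and not part of the SDK.

```python
import os

# The two environment variables this page names for non-interactive (bot) use.
REQUIRED_VARS = ("SUMATRA_INSTANCE", "SUMATRA_SDK_KEY")


def missing_sumatra_env(env=None):
    """Return the names of required variables that are unset or empty.

    Hypothetical helper for illustration; it is not part of the SDK.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

An empty list means both variables are present; otherwise the list names what still needs to be exported.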
Attributes
branch: str
property
writable
Default branch name
instance: str
property
readonly
Instance name from client config, e.g. 'yourco.sumatra.ai'
Methods
__init__(self, instance=None, branch=None)
special
Create connection object.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
instance |
Optional[str] |
Sumatra instance url, e.g. |
None |
branch |
Optional[str] |
Set default branch. If unspecified, your config default will be used. |
None |
clone_branch(self, dest, branch=None)
Copy branch to another branch name.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dest |
str |
Name of branch to be created or overwritten. |
required |
branch |
Optional[str] |
Specify a source branch other than the client default. |
None |
create_branch_from_dir(self, scowl_dir=None, branch=None, deps_file=None)
Create (or overwrite) branch with local scowl files.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
scowl_dir |
Optional[str] |
Path to local .scowl files. |
None |
branch |
Optional[str] |
Specify a source branch other than the client default. |
None |
deps_file |
Optional[str] |
Path to deps file [default: |
None |
Returns:
Type | Description |
---|---|
str |
Name of branch created |
create_branch_from_scowl(self, scowl, branch=None)
Create (or overwrite) branch with single file of scowl source code.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
scowl |
str |
Scowl source code as string. |
required |
branch |
Optional[str] |
Specify a source branch other than the client default. |
None |
Returns:
Type | Description |
---|---|
str |
Name of branch created |
create_model_from_pmml(self, model, filename, comment=None)
Create (or overwrite) model from PMML file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model |
str |
Model name, e.g. "churn_predictor". |
required |
filename |
str |
Local PMML file, e.g. "my_model.xml" |
required |
comment |
Optional[str] |
Optional comment string to store with the model version (max 60 characters). |
None |
Returns:
Type | Description |
---|---|
str |
A |
create_table_from_dataframe(self, table, df, key_column, include_index=False)
Create (or overwrite) table from a DataFrame
Parameters:
Name | Type | Description | Default |
---|---|---|---|
table |
str |
Table name. |
required |
df |
pd.DataFrame |
DataFrame to upload as table |
required |
key_column |
str |
Name of column containing the primary index for the table |
required |
include_index |
bool |
Include the DataFrame's index as a column named |
False |
Returns:
Type | Description |
---|---|
TableVersion |
A |
create_table_from_s3(self, table, s3_uri, key_column, expected_row_count, version=None)
Create (or overwrite) table from data stored on S3
Parameters:
Name | Type | Description | Default |
---|---|---|---|
table |
str |
Table name. |
required |
s3_uri |
str |
S3 bucket URI. |
required |
key_column |
str |
Name of column containing the primary index for the table |
required |
expected_row_count |
int |
Number of rows. Validated against what is ingested |
required |
version |
Optional[str] |
Use this existing empty table version. If unspecified, create a new table version. |
None |
Returns:
Type | Description |
---|---|
TableVersion |
A |
create_timeline_from_dataframes(self, timeline, df_dict, timestamp_column=None)
Create (or overwrite) timeline from a collection of DataFrames—one per event type.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
timeline |
str |
Timeline name. |
required |
df_dict |
Dict[str, pd.DataFrame] |
Dictionary from event type name to DataFrame of events. |
required |
timestamp_column |
Optional[str] |
Name of timestamp column, default |
None |
create_timeline_from_feed(self, timeline, start_ts, end_ts, event_types=None)
Create (or overwrite) timeline from the Event Feed
Parameters:
Name | Type | Description | Default |
---|---|---|---|
timeline |
str |
Timeline name. |
required |
start_ts |
Union[pd.Timestamp, str] |
Earliest event timestamp to fetch (local client timezone). |
required |
end_ts |
Union[pd.Timestamp, str] |
Latest event timestamp to fetch (local client timezone). |
required |
event_types |
Optional[List[str]] |
Event types to include (default: all). |
None |
create_timeline_from_file(self, timeline, filename)
Create (or overwrite) timeline from events stored in a file.
Supported file types: .jsonl, .jsonl.gz
Parameters:
Name | Type | Description | Default |
---|---|---|---|
timeline |
str |
Timeline name. |
required |
filename |
str |
Name of events file to upload. |
required |
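An events file in either supported format can be produced with the standard library alone. The sketch below assumes each event is a JSON-serializable dict; the helper name `write_events_file` is hypothetical, and the fields an event must contain are not specified here.

```python
import gzip
import json


def write_events_file(events, path):
    """Write one JSON object per line, gzip-compressed for .jsonl.gz names.

    Hypothetical helper for illustration; it is not part of the SDK.
    """
    if path.endswith(".jsonl.gz"):
        opener = gzip.open
    elif path.endswith(".jsonl"):
        opener = open
    else:
        raise ValueError("expected a .jsonl or .jsonl.gz file name")
    # Text mode ("wt") works for both the built-in open and gzip.open.
    with opener(path, "wt", encoding="utf-8") as f:
        for event in events:
            f.write(json.dumps(event) + "\n")
    return path
```

The returned path could then be passed as `filename` to `create_timeline_from_file`.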
create_timeline_from_jsonl(self, timeline, jsonl)
Create (or overwrite) timeline from JSON events passed in as a string.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
timeline |
str |
Timeline name. |
required |
jsonl |
str |
JSON event data, one JSON dict per line. |
required |
create_timeline_from_log(self, timeline, start_ts, end_ts, event_types=None)
Create (or overwrite) timeline from the Event Log
Parameters:
Name | Type | Description | Default |
---|---|---|---|
timeline |
str |
Timeline name. |
required |
start_ts |
Union[pd.Timestamp, str] |
Earliest event timestamp to fetch (local client timezone). |
required |
end_ts |
Union[pd.Timestamp, str] |
Latest event timestamp to fetch (local client timezone). |
required |
event_types |
Optional[List[str]] |
Event types to include (default: all). |
None |
create_timeline_from_s3(self, timeline, s3_uri, time_path, data_path, id_path=None, type_path=None, default_type=None)
Create (or overwrite) timeline from a JSON file on S3
Parameters:
Name | Type | Description | Default |
---|---|---|---|
timeline |
str |
Timeline name. |
required |
s3_uri |
str |
S3 bucket URI. |
required |
time_path |
str |
JSON path where event timestamp is found (e.g. $._time) |
required |
data_path |
str |
JSON path where event payload is found (e.g. $) |
required |
id_path |
Optional[str] |
JSON path where event ID is found (e.g. $.event_id) |
None |
type_path |
Optional[str] |
JSON path where event type is found (e.g. $._type) |
None |
default_type |
Optional[str] |
Event type to use in case none found at |
None |
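To see how the path parameters relate to a stored record, the sketch below resolves the example paths against one sample JSON line. It is a simplified local resolver that handles only dotted keys and integer indexes, not the server's implementation; the sample record and the helper name `resolve_simple_path` are hypothetical.

```python
import json
import re

# One path step: either ".key" or "[index]".
_STEP = re.compile(r"\.([A-Za-z_][A-Za-z0-9_]*)|\[(\d+)\]")


def resolve_simple_path(record, path):
    """Resolve paths such as '$', '$._time', or '$.items[0].amount'.

    Illustration only: supports dotted keys and integer indexes, nothing else.
    """
    if not path.startswith("$"):
        raise ValueError("path must start with '$'")
    rest = path[1:]
    value = record
    pos = 0
    while pos < len(rest):
        m = _STEP.match(rest, pos)
        if m is None:
            raise ValueError(f"unsupported path syntax: {path!r}")
        if m.group(1) is not None:
            value = value[m.group(1)]       # object key
        else:
            value = value[int(m.group(2))]  # array index
        pos = m.end()
    return value


# Hypothetical sample record, shaped to match the example paths above.
line = '{"_time": "2024-01-01T00:00:00Z", "_type": "login", "event_id": "e1"}'
record = json.loads(line)
event_time = resolve_simple_path(record, "$._time")    # time_path
event_type = resolve_simple_path(record, "$._type")    # type_path
event_id = resolve_simple_path(record, "$.event_id")   # id_path
payload = resolve_simple_path(record, "$")             # data_path
```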
delete_branch(self, branch=None)
Delete server-side branch
Parameters:
Name | Type | Description | Default |
---|---|---|---|
branch |
Optional[str] |
Specify a branch other than the client default. |
None |
delete_model(self, model)
Delete a model permanently.
If the model is referenced in the LIVE topology, it cannot be deleted.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model |
str |
Model name. |
required |
delete_model_version(self, model, version)
Delete a specific version of a model permanently.
If the model version is referenced in the LIVE topology, it cannot be deleted.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model |
str |
Model name. |
required |
version |
str |
Version identifier. |
required |
delete_table(self, table)
Delete a table permanently.
If the table is referenced in the LIVE topology, it cannot be deleted.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
table |
str |
Table name. |
required |
delete_table_version(self, table, version)
Delete a specific version of a table permanently.
If the table version is referenced in the LIVE topology, it cannot be deleted.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
table |
str |
Table name. |
required |
version |
str |
Version identifier. |
required |
delete_timeline(self, timeline)
Delete timeline
Parameters:
Name | Type | Description | Default |
---|---|---|---|
timeline |
str |
Timeline name. |
required |
diff_branch_with_live(self, branch=None)
Compare branch to LIVE topology and return diff.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
branch |
Optional[str] |
Specify a source branch other than the client default. |
None |
Returns:
Type | Description |
---|---|
Dict[str, List[str]] |
Events and features added, redefined, and deleted. |
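One possible use of the diff is to guard a publish so that nothing is promoted when the branch matches LIVE. The sketch below takes an already connected client as an argument and uses only `diff_branch_with_live` and `publish_branch` as documented on this page; the helper name `publish_if_changed` is hypothetical, and the code deliberately avoids depending on the exact key names in the returned dictionary, which are not listed here.

```python
def publish_if_changed(client, branch=None):
    """Publish `branch` only when its diff against LIVE is non-empty.

    Hypothetical helper for illustration. `client` is expected to be a
    connected Client. Returns the diff so the caller can log it.
    """
    diff = client.diff_branch_with_live(branch=branch)
    # The documented return type is Dict[str, List[str]]; any non-empty
    # list means something was added, redefined, or deleted.
    if any(diff.values()):
        client.publish_branch(branch=branch)
    return diff
```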
distributed_materialize_many(self, timelines, features=None, start_ts=None, end_ts=None, branch=None)
Enrich collection of timelines using topology at branch. Timelines are merged based on timestamp.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
timelines |
List[str] |
Timeline names. |
required |
branch |
Optional[str] |
Specify a source branch other than the client default. |
None |
start_ts |
Optional[Union[pd.Timestamp, str]] |
Earliest event timestamp to materialize (local client timezone). |
None |
end_ts |
Optional[Union[pd.Timestamp, str]] |
Latest event timestamp to materialize (local client timezone). |
None |
features |
List[str] |
List of features to materialize, e.g. |
None |
Returns:
Type | Description |
---|---|
Materialization |
Handle to Materialization job |
get_branch(self, branch=None)
Return metadata about the branch.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
branch |
Optional[str] |
Specify a branch other than the client default. |
None |
Returns:
Type | Description |
---|---|
Dict |
Branch metadata |
get_branches(self)
Return all branches and their metadata.
Returns:
Type | Description |
---|---|
pd.DataFrame |
Branch metadata. |
get_deps(self, live=False)
Fetch latest dependencies from server as Scowl source require statements.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
live |
bool |
Return the LIVE versions of dependencies instead of latest. |
False |
Returns:
Type | Description |
---|---|
str |
Scowl source code as string. |
get_features_from_feed(self, event_type, start_ts=None, end_ts=None, count=None, where={}, batch_size=10000, ascending=False)
For a given event type, return the feature values as they were calculated at event time.
Fetches events in descending time order from end_ts. May specify count or start_ts, but not both.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
event_type |
str |
Event type name. |
required |
start_ts |
Optional[Union[pd.Timestamp, str]] |
Earliest event timestamp to fetch (local client timezone). If not specified, |
None |
end_ts |
Optional[Union[pd.Timestamp, str]] |
Latest event timestamp to fetch (local client timezone) [default: now]. |
None |
count |
Optional[int] |
Number of rows to return (if start_ts not specified) [default: 10]. |
None |
where |
Dict[str, str] |
Dictionary of equality conditions (all must be true for a match), e.g. {"zipcode": "90210", "email_domain": "gmail.com"}. |
{} |
batch_size |
int |
Maximum number of records to fetch per GraphQL call. |
10000 |
ascending |
bool |
Sort results in ascending chronological order instead of descending. |
False |
Returns:
Type | Description |
---|---|
pd.DataFrame |
_id, _time, [features...] (in descending time order). |
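Because `count` and `start_ts` are mutually exclusive, a caller may want to reject the invalid combination before making a request. The sketch below is one way to do that; the helper name `feed_window_kwargs` is hypothetical, and how the SDK itself reacts to both arguments being set is not stated here.

```python
def feed_window_kwargs(start_ts=None, end_ts=None, count=None):
    """Build keyword arguments for a feed query, rejecting count + start_ts.

    Hypothetical helper for illustration; it is not part of the SDK.
    """
    if count is not None and start_ts is not None:
        raise ValueError("specify count or start_ts, not both")
    kwargs = {}
    if start_ts is not None:
        kwargs["start_ts"] = start_ts
    if end_ts is not None:
        kwargs["end_ts"] = end_ts
    if count is not None:
        kwargs["count"] = count
    return kwargs
```

With a connected client, the result could be unpacked into the call, for example `client.get_features_from_feed("login", **feed_window_kwargs(count=100))`, where the event type name `login` is only an example.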
get_features_from_log(self, event_type, start_ts=None, end_ts=None, features=None, include_inputs=False, where=None, deserialize_json=True)
For a given event type, fetch the historical values for features, as calculated in the LIVE environment.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
event_type |
str |
Event type name. |
required |
start_ts |
Optional[Union[pd.Timestamp, str]] |
Earliest event timestamp to fetch (local client timezone). If not specified, will start from beginning of log. |
None |
end_ts |
Optional[Union[pd.Timestamp, str]] |
Latest event timestamp to fetch (local client timezone) [default: now]. |
None |
features |
Optional[List[str]] |
Subset of features to fetch. [default: all]. |
None |
include_inputs |
bool |
Include request json as "_inputs" column. |
False |
where |
Optional[str] |
SQL clauses (not including "where" keyword), e.g. "col1 is not null" |
None |
deserialize_json |
bool |
Deserialize complex data types from JSON strings to Python objects. |
True |
Returns:
Type | Description |
---|---|
pd.DataFrame |
_id, _time, [features...] (in ascending time order). |
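Note that `where` here is a SQL string, whereas the feed methods take a dictionary of equality conditions. The sketch below converts such a dictionary into a clause of the kind shown in the `where` description. It assumes the compared columns hold string values and that standard SQL single-quote escaping applies; the helper name `equality_where_clause` is hypothetical.

```python
import re

# Accept only plain identifiers as column names to avoid injecting SQL.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def equality_where_clause(conditions):
    """Turn {"col": "value", ...} into "col = 'value' and ...".

    Hypothetical helper for illustration. Returns None for an empty dict,
    matching the documented default of `where`.
    """
    parts = []
    for column, value in conditions.items():
        if not _IDENT.match(column):
            raise ValueError(f"unsupported column name: {column!r}")
        escaped = str(value).replace("'", "''")  # double any single quotes
        parts.append(f"{column} = '{escaped}'")
    return " and ".join(parts) or None
```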
get_inputs_from_feed(self, start_ts=None, end_ts=None, count=None, event_types=None, where={}, batch_size=10000, ascending=False)
Return the raw input events from the Event Feed.
Fetches events in descending time order from end_ts. May specify count or start_ts, but not both.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
start_ts |
Optional[Union[pd.Timestamp, str]] |
Earliest event timestamp to fetch (local client timezone). If not specified, |
None |
end_ts |
Optional[Union[pd.Timestamp, str]] |
Latest event timestamp to fetch (local client timezone) [default: now]. |
None |
count |
Optional[int] |
Number of rows to return (if start_ts not specified) [default: 10]. |
None |
event_types |
Optional[List[str]] |
Subset of event types to fetch. [default: all] |
None |
where |
Dict[str, str] |
Dictionary of equality conditions (all must be true for a match), e.g. {"zipcode": "90210", "email_domain": "gmail.com"}. |
{} |
batch_size |
int |
Maximum number of records to fetch per GraphQL call. |
10000 |
ascending |
bool |
Sort results in ascending chronological order instead of descending. |
False |
Returns:
Type | Description |
---|---|
List of events |
[{"_id": , "_type": , "_time": , [inputs...]}] (in descending time order). |
get_key_usage(self, start_date=None, end_date=None)
Get API Key usage for tenants
get_keys(self)
Get API Keys for tenants
get_live_schema(self)
Return the feature names and types for every event in the LIVE topology
Returns:
Type | Description |
---|---|
Dict |
{'event_name': {'f1': 'int', 'f2': 'bool', ...}, ...} |
get_live_scowl(self)
Return scowl source code for LIVE topology as single cleansed string.
Returns:
Type | Description |
---|---|
str |
Scowl source code as string. |
get_model_history(self, name)
Return list of versions for the given model along with their metadata.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
name |
str |
Model name. |
required |
Returns:
Type | Description |
---|---|
pd.DataFrame |
DataFrame of version metadata. |
get_model_version(self, name, version)
Return handle to a specific model version.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
name |
str |
Model name. |
required |
version |
str |
Model version. |
required |
Returns:
Type | Description |
---|---|
ModelVersion |
Model version future object. |
get_models(self)
Return all models and their metadata.
Returns:
Type | Description |
---|---|
pd.DataFrame |
Model metadata. |
get_table_history(self, name)
Return list of versions for the given table along with their metadata.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
name |
str |
Table name. |
required |
Returns:
Type | Description |
---|---|
pd.DataFrame |
DataFrame of version metadata. |
get_table_version(self, name, version)
Return handle to a specific table version.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
name |
str |
Table name. |
required |
version |
str |
Table version. |
required |
Returns:
Type | Description |
---|---|
TableVersion |
Table version future object. |
get_tables(self)
Return all tables and their metadata.
Returns:
Type | Description |
---|---|
pd.DataFrame |
Table metadata. |
get_timeline(self, timeline)
Return metadata about the timeline.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
timeline |
str |
Timeline name. |
required |
Returns:
Type | Description |
---|---|
pd.Series |
Timeline metadata. |
get_timelines(self)
Return all timelines and their metadata.
Returns:
Type | Description |
---|---|
pd.DataFrame |
Timeline metadata. |
get_user(self, username)
Get User.
infer_schema_from_timeline(self, timeline)
Attempt to infer the paths and data types of all fields in the timeline's input data. Generate the scowl to parse all JSON paths.
This function helps bootstrap scowl code for new event types, with the expectation that most feature names will need to be modified.
e.g.
account_id := $.account.id as int
purchase_items_0_amount := $.purchase.items[0].amount as float
Parameters:
Name | Type | Description | Default |
---|---|---|---|
timeline |
str |
Timeline name. |
required |
Returns:
Type | Description |
---|---|
str |
Scowl source code as string. |
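The two example lines above suggest a naming pattern in which the JSON path segments are joined with underscores. The sketch below reproduces that pattern locally. It is inferred from those two examples only, so the server's actual inference may differ; the helper names `path_to_feature_name` and `scowl_line` are hypothetical.

```python
import re


def path_to_feature_name(path):
    """Join the segments of a JSON path with underscores.

    Hypothetical helper inferred from the two documented examples.
    """
    # Split on dots and brackets, dropping the leading '$' and empty pieces.
    tokens = [t for t in re.split(r"[.\[\]]", path) if t and t != "$"]
    return "_".join(tokens)


def scowl_line(path, type_name):
    """Format one feature definition in the style of the examples above."""
    return f"{path_to_feature_name(path)} := {path} as {type_name}"
```

Since most generated names are expected to need editing, a helper like this is mainly useful for predicting which names to look for in the generated scowl.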
list_tenants(self)
List tenants.
list_users(self)
List users.
materialize(self, timeline, branch=None)
Enrich timeline using topology at branch.
This is the primary function of the SDK.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
timeline |
str |
Timeline name. |
required |
branch |
Optional[str] |
Specify a source branch other than the client default. |
None |
Returns:
Type | Description |
---|---|
Materialization |
Handle to Materialization job |
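A typical sequence is to create a timeline and then materialize it. The sketch below chains `create_timeline_from_jsonl` and `materialize` as documented on this page, taking an already connected client as an argument. The helper name `enrich_events` is hypothetical, and the methods available on the returned Materialization handle are not covered here, so the handle is simply returned.

```python
import json


def enrich_events(client, timeline, events, branch=None):
    """Upload events as a timeline, then materialize it on `branch`.

    Hypothetical helper for illustration. `client` is expected to be a
    connected Client and `events` an iterable of JSON-serializable dicts.
    """
    # One JSON dict per line, as create_timeline_from_jsonl expects.
    jsonl = "\n".join(json.dumps(event) for event in events)
    client.create_timeline_from_jsonl(timeline, jsonl)
    return client.materialize(timeline, branch=branch)
```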
materialize_many(self, timelines, branch=None)
Enrich collection of timelines using topology at branch. Timelines are merged based on timestamp.
This is the primary function of the SDK.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
timelines |
List[str] |
Timeline names. |
required |
branch |
Optional[str] |
Specify a source branch other than the client default. |
None |
Returns:
Type | Description |
---|---|
Materialization |
Handle to Materialization job |
publish_branch(self, branch=None)
Promote branch to LIVE.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
branch |
Optional[str] |
Specify a branch other than the client default. |
None |
publish_dir(self, scowl_dir=None, deps_file=None)
Push local scowl dir to branch and promote to LIVE.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
scowl_dir |
Optional[str] |
Path to .scowl files. Default: |
None |
deps_file |
Optional[str] |
Path to deps file [default: |
None |
publish_scowl(self, scowl)
Push local scowl source to branch and promote to LIVE.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
scowl |
str |
Scowl source code as string. |
required |
query_athena(self, sql)
Execute a SQL query against the Athena backend and return the result as a dataframe.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
sql |
str |
SQL query (e.g. "select * from event_log where event_type='login' limit 10") |
required |
Returns:
Type | Description |
---|---|
pd.DataFrame |
Result of query as a Pandas dataframe |
replay(self, features, start_ts, end_ts, extra_timelines=None, branch=None)
Recompute historical feature values from LIVE event log on given topology branch.
This is the primary function of the SDK.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
features |
List[str] |
List of features to materialize, e.g. |
required |
start_ts |
Union[pd.Timestamp, str] |
Earliest event timestamp to materialize (local client timezone). |
required |
end_ts |
Union[pd.Timestamp, str] |
Latest event timestamp to materialize (local client timezone). |
required |
extra_timelines |
Optional[List[str]] |
Names of supplemental timelines. |
None |
branch |
Optional[str] |
Specify a source branch other than the client default. |
None |
Returns:
Type | Description |
---|---|
Materialization |
Handle to Materialization job |
resolve_deps(self, requires)
Return the resolved resources (i.e. table schemas) from the given requires statements.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
requires |
str |
Scowl requires statement as code blob |
required |
Returns:
Type | Description |
---|---|
str |
Resolved resource definitions (table schemas) as scowl code. |
resolve_deps_from_file(self, deps_file=None)
Return the resolved resources (i.e. table schemas) from the local deps.scowl file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
deps_file |
Optional[str] |
Path to deps file [default: ./deps.scowl] |
None |
Returns:
Type | Description |
---|---|
str |
Resolved resource definitions (table schemas) as scowl code. |
save_branch_to_dir(self, scowl_dir=None, branch=None, deps_file=None)
Save remote branch scowl files to local dir.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
scowl_dir |
Optional[str] |
Path to save .scowl files. |
None |
branch |
Optional[str] |
Specify a source branch other than the client default. |
None |
deps_file |
Optional[str] |
Path to deps file [default: |
None |
Returns:
Type | Description |
---|---|
str |
Name of branch created |
save_deps(self, live=False, deps_file=None)
Fetch latest dependencies from server and save to file
Parameters:
Name | Type | Description | Default |
---|---|---|---|
live |
bool |
Return the LIVE versions of dependencies instead of latest. |
False |
deps_file |
Optional[str] |
Path to save deps file [default: ./deps.scowl] |
None |
Returns:
Type | Description |
---|---|
str |
Full path to saved dependency file. |
set_role(self, username, role)
Set role for user
set_tenant(self, username, tenant)
Set tenant for user
tenant(self)
Return the assigned tenant name for the connected user.
Returns:
Type | Description |
---|---|
str |
Tenant name |
version(self)
Return the server-side version number.
Returns:
Type | Description |
---|---|
str |
Version identifier |