
0.10.0

Released: Sep 29, 2022

Tables

This release introduces Tables, a new core capability that allows you to enrich events with externally-loaded batch data.

Sumatra has supported dynamic lookup features for some time via Latest queries, but that approach is cumbersome for simple static lookup tables.

With this release, you can push a Pandas dataframe to Sumatra yourself and make it available instantly, with no ongoing maintenance required. The syntax for the new Lookup feature is:

shipping_geo := Lookup<geoip>(lat, lng by shipping_zip)

Check out the details in the Tables reference for how to upload and use tables.
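Conceptually, Lookup behaves like a keyed join: each event's shipping_zip selects a row of the geoip table, and the requested columns (lat, lng) become the feature value. The sketch below models that in plain Python, with a dict standing in for the uploaded dataframe. The sample rows and the `lookup` helper are hypothetical, and returning nulls for a missing key is an assumption, not documented Sumatra behavior.

```python
from typing import Optional

# Hypothetical sample rows standing in for an uploaded "geoip" table, keyed by zip code.
GEOIP: dict[str, dict[str, float]] = {
    "94105": {"lat": 37.79, "lng": -122.39},
    "10001": {"lat": 40.75, "lng": -74.00},
}


def lookup(
    table: dict[str, dict[str, float]], key: str, columns: list[str]
) -> dict[str, Optional[float]]:
    """Return the requested columns for `key`; assumed to yield None per column if the key is absent."""
    row = table.get(key)
    if row is None:
        return {col: None for col in columns}
    return {col: row[col] for col in columns}


event = {"shipping_zip": "94105"}
# Rough analog of: shipping_geo := Lookup<geoip>(lat, lng by shipping_zip)
shipping_geo = lookup(GEOIP, event["shipping_zip"], ["lat", "lng"])
print(shipping_geo)  # {'lat': 37.79, 'lng': -122.39}
```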

Dependencies

In support of the new table functionality, we introduce a new concept in this release: dependencies. Dependencies are resources (such as tables), present in a particular Sumatra instance, that may be incorporated into your topology.

A special deps.scowl file may now be included in your branch, containing require statements that import these resources, e.g.:

require table (
    geoip v20220927202524
    region_to_iso2 v20220927202928
)

See Dependencies for details on how to manage dependencies for tables.
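If you track table versions in Python, the require block can be generated rather than edited by hand. A minimal sketch: `render_deps` and `table_version` are hypothetical helpers, the four-space indentation simply copies the example above, and reading versions such as v20220927202524 as "v" plus a YYYYMMDDHHMMSS upload timestamp is an inference from their shape, not documented behavior.

```python
from datetime import datetime


def table_version(uploaded_at: datetime) -> str:
    """Assumption: a version is "v" followed by the upload timestamp as YYYYMMDDHHMMSS."""
    return "v" + uploaded_at.strftime("%Y%m%d%H%M%S")


def render_deps(tables: dict[str, str]) -> str:
    """Render a `require table ( ... )` block, one `name version` line per table."""
    lines = ["require table ("]
    for name, version in tables.items():
        lines.append(f"    {name} {version}")
    lines.append(")")
    return "\n".join(lines) + "\n"


deps = render_deps({
    "geoip": table_version(datetime(2022, 9, 27, 20, 25, 24)),
    "region_to_iso2": "v20220927202928",
})
print(deps, end="")
```

Writing the returned string to deps.scowl in your branch (for example with pathlib's `Path.write_text`) would reproduce the require block shown above.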

Extended Latest Windows

Prior to this release, the longest window supported in a Latest query was 30 days, which is still the default if no window is specified. You can now extend the window by specifying a longer duration, e.g.:

signup_ip := Latest<signup>(ip by email last 365 days)

This enhancement is particularly useful for implementing dynamic batch features with slow-changing dimensions.

The longest supported duration is now 2 years. Note that extending the window of an existing feature does not extend the life of data written before the change, because the TTL is set at write time. To extend the TTL of existing data, reload it after the Scowl window extension has been published to LIVE.
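To see why existing data keeps its old lifetime, it helps to model the TTL as an expiry timestamp computed once, when each record is written. The Python sketch below is an illustrative model, not Sumatra's storage implementation; the `expiry` helper and the 730-day reading of "2 years" are assumptions.

```python
from datetime import datetime, timedelta

# Assumption for this model: "2 years" approximated as 730 days.
MAX_WINDOW = timedelta(days=730)


def expiry(written_at: datetime, window: timedelta) -> datetime:
    """Expiry is fixed at write time: the write timestamp plus the window in effect then."""
    if window > MAX_WINDOW:
        raise ValueError("window exceeds the 2-year maximum")
    return written_at + window


# Written while the feature still used the 30-day default window.
old_row_expiry = expiry(datetime(2022, 9, 1), timedelta(days=30))
# Written (or reloaded) after `last 365 days` was published to LIVE.
new_row_expiry = expiry(datetime(2022, 10, 1), timedelta(days=365))

print(old_row_expiry)  # 2022-10-01 00:00:00
print(new_row_expiry)  # 2023-10-01 00:00:00
```

Publishing the longer window changes only the `window` used for future writes; `old_row_expiry` stays where it was, which is why a reload is needed.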

Warning

Consult your internal data retention policies to ensure that features containing customer information remain in compliance.

SQL Access (alpha)

Sumatra maintains what is effectively a data lake of all data that has been ingested into the platform or generated by it. The data include:

  • The raw log of JSON events sent to the /event API
  • The log of enriched features computed by the LIVE topology
  • Timelines
  • Materializations
  • Tables

With this release, users now have query access to this data in the Athena SQL dialect. To query the data lake and return a dataframe in Python, use the new query_athena method, e.g.:

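# `sumatra` is assumed to be an authenticated Sumatra Python client instance.
sql = """
    SELECT *
    FROM purchase
    WHERE _time > timestamp '2022-09-01 01:00'
"""
df = sumatra.query_athena(sql)  # returns the result as a dataframe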

In this alpha release, tables and views are configured manually. To access a particular dataset, reach out to Sumatra support, who will provide the requisite DDL statement.
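When the cutoff varies from run to run, it can be convenient to generate the query text from a Python datetime. A minimal sketch, assuming the purchase table and _time column from the example above; `purchases_since` is a hypothetical helper, and only the SQL string is built here, so no Sumatra client is needed to run it.

```python
from datetime import datetime


def purchases_since(cutoff: datetime) -> str:
    """Build the example purchase query for an arbitrary cutoff (hypothetical helper)."""
    # Athena timestamp literals accept the form 'YYYY-MM-DD HH:MM:SS'.
    literal = cutoff.strftime("%Y-%m-%d %H:%M:%S")
    return f"SELECT * FROM purchase WHERE _time > timestamp '{literal}'"


sql = purchases_since(datetime(2022, 9, 1, 1, 0))
print(sql)  # SELECT * FROM purchase WHERE _time > timestamp '2022-09-01 01:00:00'
```

The resulting string can be passed to `sumatra.query_athena(sql)` as in the example above. Because the value is interpolated directly into the SQL text, build cutoffs from trusted datetime values rather than raw user input.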

Dashboard (alpha)

The console Dashboard now exposes operational metrics previously available only to admins via CloudWatch.

The dashboard reports event counts, latencies, and error rates, with markers at publication times to allow for easy correlation of LIVE changes and impact.


Note that the dashboard is under active development and will change in look and feel in upcoming minor releases.

Standard Library Additions

A few new functions and constants have been added to the Scowl standard library:

Fine-grained Durations

While fine-grained durations such as milliseconds are typically too small to be useful as aggregate windows, they can help when measuring the time between events, e.g.:

fast_click := click_time - view_time < 500 milliseconds
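The same comparison can be written with Python's `datetime.timedelta`, which may help when checking expected feature values offline. This is an analog for illustration only: the timestamps are made up, and it says nothing about how Scowl represents times internally.

```python
from datetime import datetime, timedelta

view_time = datetime(2022, 9, 29, 12, 0, 0)
click_time = view_time + timedelta(milliseconds=350)  # click 350 ms after the view

# Python analog of: fast_click := click_time - view_time < 500 milliseconds
fast_click = click_time - view_time < timedelta(milliseconds=500)
print(fast_click)  # True
```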

Trig Functions

Trig functions can be useful for modeling cyclical features like hour of day, e.g.:

hour_cos := Cos(2 * Pi() * hour_of_day / 24)
hour_sin := Sin(2 * Pi() * hour_of_day / 24)
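The effect of this encoding is easiest to see numerically. The Python sketch below mirrors the two Scowl expressions using `math.cos` and `math.sin`; `hour_features` and `hour_distance` are illustrative helpers, not Scowl functions. It shows that 23:00 and 00:00 end up exactly as close as 00:00 and 01:00, whereas the raw hour_of_day values would be 23 apart.

```python
import math


def hour_features(hour_of_day: float) -> tuple[float, float]:
    """Map an hour in [0, 24) onto the unit circle, like hour_cos and hour_sin above."""
    angle = 2 * math.pi * hour_of_day / 24
    return math.cos(angle), math.sin(angle)


def hour_distance(a: float, b: float) -> float:
    """Euclidean distance between two hours in (cos, sin) space."""
    return math.dist(hour_features(a), hour_features(b))


print(hour_features(0))  # (1.0, 0.0)
print(round(hour_distance(23, 0), 3), round(hour_distance(0, 1), 3))  # 0.261 0.261
```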

Split

The new Split function tokenizes a string, given a delimiter.
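For intuition, Python's `str.split` performs the same kind of delimiter-based tokenization. This is an analog only: the Scowl call signature of Split and its handling of edge cases such as empty strings or consecutive delimiters are not described here, and the category string is a made-up example.

```python
categories = "electronics|books|toys"
tokens = categories.split("|")
print(tokens)  # ['electronics', 'books', 'toys']

# Python keeps an empty token between consecutive delimiters.
print("a,,b".split(","))  # ['a', '', 'b']
```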

Other Improvements

  • JSON encoder: encode NaN and +/- Inf floats as nulls
  • Update CloudWatch dashboard to correctly report OpenSearch free space
  • Finish migration to v2 Dynamo client
  • Expose user-defined types and table defs in Workshop
  • Fix bug that threw materialization error when feature names were SQL keywords
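The JSON encoder change matters because NaN and Infinity are not valid JSON values. Python's `json.dumps`, for example, emits bare NaN and Infinity tokens by default and raises ValueError when called with `allow_nan=False`. The sketch below shows the null-substitution behavior in Python; it is an illustration, not Sumatra's encoder, and `sanitize` is a hypothetical helper.

```python
import json
import math


def sanitize(value):
    """Recursively replace NaN and +/-Inf floats with None so they encode as JSON null."""
    if isinstance(value, float) and not math.isfinite(value):
        return None
    if isinstance(value, dict):
        return {k: sanitize(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [sanitize(v) for v in value]
    return value


features = {"score": float("nan"), "ratio": float("inf"), "count": 3, "hist": [1.5, float("-inf")]}
# allow_nan=False guarantees strict JSON; it would raise if any NaN/Inf slipped through.
encoded = json.dumps(sanitize(features), allow_nan=False)
print(encoded)  # {"score": null, "ratio": null, "count": 3, "hist": [1.5, null]}
```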