Models

Models are a convenient way to incorporate machine learning into your Sumatra real-time data services. Like any other Sumatra feature, a model transforms input features into new output features; in this case, the transformation is performed by a machine learning model.

Benefits

You can always fetch Sumatra features from a standalone model prediction service, if that best suits your needs.

However, the key benefits of performing model inference directly in Sumatra are:

Fast, self-service deployment, without the need to manage your own microservices
Post-scoring business rules, such as thresholds and exception lists, written directly in your Scowl code
Model inference that also works in replay, for consistent online and offline predictions
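
As a rough illustration of what "post-scoring business rules" means, the sketch below applies an exception list and then a threshold to a model score. It is written in Python purely to show the logic; in Sumatra these rules would be written in Scowl. The function name `decide`, the threshold value, and the exception list contents are hypothetical.

```python
# Illustrative only: in Sumatra this logic would live in Scowl, not Python.
# The threshold and the exception list below are made-up example values.
FRAUD_THRESHOLD = 0.8
TRUSTED_EMAILS = frozenset({"ops@example.com"})


def decide(risk_score: float, email: str) -> str:
    """Apply an exception list first, then a score threshold."""
    if email in TRUSTED_EMAILS:
        return "approve"  # the exception list overrides the model score
    if risk_score >= FRAUD_THRESHOLD:
        return "review"   # scores at or above the threshold are flagged
    return "approve"
```

Keeping rules like these next to the model call means the score and the decision are computed from the same feature values in one place.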

Workflow

To build and deploy a Sumatra model:

:one: Train your model in any package that supports PMML

:two: Upload your model to Sumatra from the CLI:

sumatra model put "my_model_name" "my_model_v1.0.pmml" --comment "Initial xgboost"

Or equivalently from the Python Client:

create_model_from_pmml("my_model_name",
                       "my_model_v1.0.pmml",
                       comment="Initial xgboost")

:three: Import the latest model version in your deps.scowl file:

require model cash_out v20230120212732

Tip

Run sumatra deps update to fetch and save the latest versions of all resources.

:four: Add a ModelPredict call to your topology:

risk_score := ModelPredict<cash_out>({
  amount,
  dollars_in_out_1h,
  dollars_out_by_email,
  emails_per_bank,
  emails_per_device
}).probability_fraud

:five: Publish your branch, as usual, to go Live.

Schema

When you upload a model artifact, Sumatra infers both the input and output schemas, including the name and data type of each field.

Both the input and output are represented as Scowl Structs.

The Models UI presents the schema in a Scowl-like format, with the input and output structs separated by the -> operator, e.g.:

model cash_out v20230120212732 {
 amount: float,
 dollars_in_out_1h: float,
 dollars_out_by_email: float,
 emails_per_bank: float,
 emails_per_device: float
} -> {
 probability_fraud: float,
 probability_good: float
}

Warning

At model-training time, you must choose feature names that follow the strict requirements of Scowl feature names: a lowercase letter followed by any number of lowercase letters, digits, or underscores (i.e. [a-z][_a-z0-9]*).
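
One way to catch naming problems before training is to check column names against the pattern quoted in the warning. The following is a minimal sketch, not part of the Sumatra client: `invalid_feature_names` and `to_scowl_name` are hypothetical helpers, and the renaming strategy (including the `f_` prefix) is only one possible choice.

```python
import re

# Pattern quoted in the warning: a lowercase letter, then lowercase
# letters, digits, or underscores.
SCOWL_NAME = re.compile(r"[a-z][_a-z0-9]*")


def invalid_feature_names(names):
    """Return the names that do not fully match the Scowl pattern."""
    return [n for n in names if not SCOWL_NAME.fullmatch(n)]


def to_scowl_name(name):
    """Hypothetical helper: best-effort rename of a training column."""
    # Lowercase, then collapse every run of disallowed characters to "_".
    cleaned = re.sub(r"[^a-z0-9_]+", "_", name.strip().lower()).strip("_")
    # Names must start with a letter, so prefix anything that does not.
    if not cleaned or not cleaned[0].isalpha():
        cleaned = "f_" + cleaned
    return cleaned
```

For example, `invalid_feature_names(["amount", "SepalLength", "1h_total"])` flags the last two names, and `to_scowl_name("Dollars Out (1h)")` produces `dollars_out_1h`. Automatic renaming can map two different columns to the same name, so check the result for duplicates.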

ModelPredict

To invoke the model on a particular set of features, call the ModelPredict function with the name of the model, e.g.:

ModelPredict<iris>({
 sepal_length: 5.1,
 sepal_width: 3.5,
 petal_length: 1.4,
 petal_width: 0.2
})
= {probability_setosa: 0.3, probability_versicolor: 0.25, probability_virginica: 0.45}

See the ModelPredict docs for full details.

PMML

Sumatra currently supports models serialized to the PMML format.

The Predictive Model Markup Language (PMML) is an XML-based standard for describing models. It allows models trained in a variety of tools to be executed in a variety of environments, without requiring a specific implementation for each combination.
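
Because PMML is plain XML, a model file can be inspected with standard tooling before uploading it. The sketch below reads the field names and data types from a PMML `DataDictionary` using only the Python standard library. The embedded fragment is hand-written and heavily trimmed (it is not a complete, valid model), `data_fields` is a hypothetical helper, and this is not a description of how Sumatra itself infers schemas.

```python
import xml.etree.ElementTree as ET

# A hand-written, heavily trimmed PMML fragment (not a complete model).
PMML_SNIPPET = """\
<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.4">
  <DataDictionary>
    <DataField name="amount" optype="continuous" dataType="double"/>
    <DataField name="emails_per_device" optype="continuous" dataType="double"/>
    <DataField name="is_fraud" optype="categorical" dataType="string"/>
  </DataDictionary>
</PMML>
"""


def data_fields(pmml_text):
    """Return {field name: dataType} from the PMML DataDictionary."""
    root = ET.fromstring(pmml_text)
    fields = {}
    for elem in root.iter():
        # Compare local tag names so any PMML namespace version works.
        if elem.tag.rsplit("}", 1)[-1] == "DataField":
            fields[elem.get("name")] = elem.get("dataType")
    return fields
```

Calling `data_fields(PMML_SNIPPET)` returns a dictionary mapping `amount` and `emails_per_device` to `double` and `is_fraud` to `string`, which is a quick way to confirm that the exported names are the ones you intended.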

Supported Tools

Many machine learning packages, across a variety of languages, can export models to PMML.

Supported Models

The full PMML standard covers a broad range of classification models, regression models, and preprocessing stages. Sumatra implements a core subset, which includes:

Preprocessing

Missing value replacement (Impute)
One-hot encoding

Classification

XGBoost
Random Forest
Decision Tree
Naive Bayes

Regression

Linear
Logistic