Approximate Aggregates
Approximate aggregates are cheaper to compute than exact aggregates, in both run-time latency and cloud compute cost. They are a good option for high-volume data streams, where the savings are substantial and the approximation error is typically small.
DecayedCount
Estimates the total number of events. Samples are weighted with exponential decay based on window duration.
Warning
May return inaccurate estimates when data volume is low. Use for high-volume streams only.
Syntax:
DecayedCount([feature(s)]
[by feature(s)]
[where condition]
[last duration]
[exclusive])
Code Examples:
DecayedCount()
DecayedCount(by account_id)
DecayedCount(by email last week)
DecayedCount(by ip, user_agent where country!="us" last 10 minutes)
DecayedCount(by ip where verdict="Block" exclusive)
DecayedCount(session_id by email)
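To make the decay weighting concrete, here is a minimal Python sketch of an exponentially decayed counter. It assumes that an event of age a contributes exp(-a / tau), with tau set to the window duration. The class name DecayedCounter and this choice of tau are illustrative assumptions, not the product's documented internals. Under this assumption, a stream arriving at a steady rate r settles near r × tau, which is roughly what a sliding window of length tau would hold.

```python
import math


class DecayedCounter:
    """Exponentially decayed event count (illustrative sketch only).

    Assumption: an event of age a contributes exp(-a / tau), where tau is
    the window duration in seconds. Timestamps must be non-decreasing.
    """

    def __init__(self, window_seconds: float) -> None:
        self.tau = window_seconds
        self.value = 0.0      # decayed count as of self.last_ts
        self.last_ts = None   # timestamp of the most recent event

    def add(self, ts: float) -> None:
        # Age the stored total to time ts, then count the new event at weight 1.
        if self.last_ts is not None:
            self.value *= math.exp(-(ts - self.last_ts) / self.tau)
        self.value += 1.0
        self.last_ts = ts

    def estimate(self, now: float) -> float:
        # Read-only: decay the stored total to `now` without updating state.
        if self.last_ts is None:
            return 0.0
        return self.value * math.exp(-(now - self.last_ts) / self.tau)


# Three events one minute apart, with tau set to a 10-minute window.
counter = DecayedCounter(window_seconds=600)
for t in (0, 60, 120):
    counter.add(t)
print(counter.estimate(now=120))
```

In this sketch each key needs only one float and one timestamp, because old events fade out multiplicatively instead of being stored and expired individually. That constant per-key state is one plausible source of the cost savings described at the top of this page.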
DecayedSum
Estimates the sum of values for a given feature. Samples are weighted with exponential decay based on window duration.
Warning
May return inaccurate estimates when data volume is low. Use for high-volume streams only.
Syntax:
DecayedSum(feature
[by feature(s)]
[where condition]
[last duration]
[exclusive])
Code Examples:
DecayedSum(amount by user)
DecayedSum(price by device_id last week)
DecayedSum(bad_word_count by commenter where not is_admin last week)
DecayedSum(price by email where price > 9.99)
DecayedSum(price by email exclusive)
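The same lazy-decay idea extends to sums. The by and where clauses map naturally onto a per-key table and a filter applied before each update. The sketch below assumes tau equals the window duration. DecayedSumByKey, the event dictionaries, and the example emails are hypothetical and only illustrate the shape of DecayedSum(price by email where price > 9.99).

```python
import math


class DecayedSumByKey:
    """Per-key exponentially decayed sum, loosely mirroring
    DecayedSum(value by key). Illustrative sketch only: tau equal to the
    window duration is an assumption. Timestamps must be non-decreasing.
    """

    def __init__(self, window_seconds: float) -> None:
        self.tau = window_seconds
        self.state = {}  # key -> (decayed_sum, timestamp_of_last_update)

    def add(self, key, value: float, ts: float) -> None:
        total, last = self.state.get(key, (0.0, ts))
        total *= math.exp(-(ts - last) / self.tau)  # age the old total to ts
        self.state[key] = (total + value, ts)

    def estimate(self, key, now: float) -> float:
        if key not in self.state:
            return 0.0
        total, last = self.state[key]
        return total * math.exp(-(now - last) / self.tau)


# Hypothetical events; the filter plays the role of `where price > 9.99`.
events = [
    {"email": "a@example.com", "price": 5.00, "ts": 0},
    {"email": "a@example.com", "price": 20.00, "ts": 3600},
    {"email": "b@example.com", "price": 12.50, "ts": 7200},
]
sums = DecayedSumByKey(window_seconds=7 * 24 * 3600)  # roughly "last week"
for e in events:
    if e["price"] > 9.99:
        sums.add(e["email"], e["price"], e["ts"])
print(sums.estimate("a@example.com", now=7200))
```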
DecayedAverage
Estimates the average value of a given feature as an exponentially weighted moving average (EWMA). Samples are weighted with exponential decay based on window duration.
Results are expected to be accurate for both high- and low-volume streams.
Tip
When traffic is bursty, the ratio DecayedSum(...) / DecayedCount(...) typically yields better results than DecayedAverage.
Syntax:
DecayedAverage(feature
[by feature(s)]
[where condition]
[last duration]
[exclusive])
Code Examples:
DecayedAverage(amount by user)
DecayedAverage(risk_score by device_id last week)
DecayedAverage(price by email where risk_score > 75 last 4 hours)
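The tip about bursty traffic can be illustrated by comparing two ways of averaging under time decay. This Python sketch assumes a common irregular-time EWMA update, avg = w·avg + (1 − w)·x with w = exp(−Δt / tau); the product's exact EWMA is not documented here, so treat time_ewma as a hypothetical stand-in. decayed_ratio computes DecayedSum / DecayedCount directly, weighting every sample by exp(−age / tau).

```python
import math


def time_ewma(samples, tau):
    """Irregular-time EWMA: avg <- w*avg + (1-w)*x, w = exp(-dt/tau).
    Hypothetical form, not the product's documented update rule."""
    avg, last = None, None
    for ts, x in samples:
        if avg is None:
            avg = x  # seed with the first sample
        else:
            w = math.exp(-(ts - last) / tau)
            avg = w * avg + (1.0 - w) * x
        last = ts
    return avg


def decayed_ratio(samples, tau, now):
    """DecayedSum / DecayedCount: each sample weighted by exp(-age/tau)."""
    num = sum(x * math.exp(-(now - ts) / tau) for ts, x in samples)
    den = sum(math.exp(-(now - ts) / tau) for ts, _ in samples)
    return num / den


# One quiet sample, then a burst of five samples at the same instant.
samples = [(0, 10.0)] + [(3600, 100.0)] * 5
tau = 3600.0
print(time_ewma(samples, tau))
print(decayed_ratio(samples, tau, now=3600))
```

With this particular EWMA form, samples that arrive at the same instant get w = 1, so only the first sample in the burst moves the average. The ratio gives all five burst samples full weight. This is one plausible reason the ratio behaves better on bursty traffic, under the stated assumption.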
HyperLogLog
Estimates the number of unique values for a given feature. Samples are weighted with exponential decay based on window duration.
Warning
May return inaccurate estimates when data volume is low. Use for high-volume streams only.
Syntax:
HyperLogLog(feature(s)
[by feature(s)]
[where condition]
[last duration]
[exclusive])
Code Examples:
HyperLogLog(device_id by user)
HyperLogLog(email by ip last week)
HyperLogLog(ip, user_agent by device_id where is_signed_in last 2 hours)
HyperLogLog(email by ip where verdict="Allow" exclusive)
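For readers unfamiliar with the underlying sketch, here is a minimal plain HyperLogLog in Python. It shows only the standard distinct-count mechanism: hash each value, use the top p bits to pick a register, and keep the longest run of leading zeros seen per register. It does not model the exponential decay weighting described above. The precision p = 12 and the BLAKE2b hash are illustrative choices, not the product's settings. A by clause would correspond to keeping one such sketch per key.

```python
import hashlib
import math


class HyperLogLog:
    """Plain HyperLogLog distinct-count sketch (no decay weighting)."""

    def __init__(self, p: int = 12) -> None:
        self.p = p                 # number of index bits
        self.m = 1 << p            # number of registers
        self.registers = [0] * self.m

    @staticmethod
    def _hash64(value: str) -> int:
        digest = hashlib.blake2b(value.encode("utf-8"), digest_size=8).digest()
        return int.from_bytes(digest, "big")

    def add(self, value: str) -> None:
        h = self._hash64(value)
        width = 64 - self.p
        idx = h >> width                  # top p bits pick a register
        rest = h & ((1 << width) - 1)     # remaining 64-p bits
        # rank = 1-based position of the leftmost 1-bit in `rest`
        rank = width - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def estimate(self) -> float:
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)  # bias constant, valid for m >= 128
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            return m * math.log(m / zeros)  # small-range (linear counting) correction
        return raw


hll = HyperLogLog(p=12)
for i in range(10_000):
    hll.add(f"device-{i}")
    hll.add(f"device-{i}")  # duplicates leave the registers unchanged
print(round(hll.estimate()))
```

Memory stays at 2^p small registers regardless of how many distinct values arrive, which is the trade-off that makes approximate distinct counts cheaper than exact ones.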