
Approximate Aggregates

Approximate aggregates are cheaper to compute than exact aggregates, in both run-time latency and cloud compute cost. They are a good option for high-volume data streams, where the savings are substantial and the approximation error is typically small.

DecayedCount

Estimates the total number of events. Samples are weighted with exponential decay based on window duration.

Warning

May return inaccurate estimates when data volume is low. Use for high-volume streams only.
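The decay schedule is described only as "based on window duration". As a rough illustration, here is a minimal Python sketch of an exponentially decayed counter. It assumes the decay time constant equals the window duration, so an event one full window old carries weight e^-1 ≈ 0.37 relative to a fresh one. The class `DecayedCounter` and its `window_seconds` parameter are hypothetical names, not part of this query language.

```python
import math


class DecayedCounter:
    """Exponentially decayed event count (hypothetical sketch).

    Assumption: the decay time constant tau equals the window duration.
    """

    def __init__(self, window_seconds: float) -> None:
        self.tau = window_seconds  # decay time constant (assumed = window)
        self.value = 0.0           # decayed count as of self.last_ts
        self.last_ts = None        # timestamp of the most recent event

    def add(self, ts: float) -> None:
        # Decay the stored count to time ts, then count the new event at full weight.
        # Timestamps are assumed to arrive in non-decreasing order.
        if self.last_ts is not None:
            self.value *= math.exp(-(ts - self.last_ts) / self.tau)
        self.last_ts = ts
        self.value += 1.0

    def estimate(self, now: float) -> float:
        # Read the decayed count as of `now` without mutating state.
        if self.last_ts is None:
            return 0.0
        return self.value * math.exp(-(now - self.last_ts) / self.tau)


# Usage: three events one minute apart, with a 10-minute window.
counter = DecayedCounter(window_seconds=600)
for ts in (0, 60, 120):
    counter.add(ts)
print(counter.estimate(120))
```

In this sketch each key needs only two numbers of state (the decayed value and the last timestamp), whereas an exact windowed count has to remember every event, or at least per-bucket counts, inside the window.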

Syntax:

DecayedCount<event>(query)

Code Examples:

DecayedCount()
DecayedCount(by account_id)
DecayedCount(by email last week)
DecayedCount(by ip, user_agent where country!="us" last 10 minutes)
DecayedCount(by ip where verdict="Block" exclusive)
DecayedCount<page>(session_id by email)

DecayedSum

Estimates the sum of values for a given feature. Samples are weighted with exponential decay based on window duration.

Warning

May return inaccurate estimates when data volume is low. Use for high-volume streams only.
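A decayed sum works like a decayed count, except that each sample adds its value rather than 1. The Python sketch below assumes the decay time constant equals the window duration and models the `by` clause as one accumulator per key in a dictionary. `DecayedSumAccumulator`, `window_seconds`, and the sample data are hypothetical and only illustrate the technique.

```python
import math


class DecayedSumAccumulator:
    """Exponentially decayed running sum (hypothetical sketch).

    Assumption: the decay time constant tau equals the window duration.
    """

    def __init__(self, window_seconds: float) -> None:
        self.tau = window_seconds
        self.value = 0.0
        self.last_ts = None

    def add(self, ts: float, x: float) -> None:
        # Decay the existing sum to time ts, then add the new sample at full weight.
        if self.last_ts is not None:
            self.value *= math.exp(-(ts - self.last_ts) / self.tau)
        self.last_ts = ts
        self.value += x

    def estimate(self, now: float) -> float:
        # Decayed sum as of `now`; 0.0 if no samples have been seen.
        if self.last_ts is None:
            return 0.0
        return self.value * math.exp(-(now - self.last_ts) / self.tau)


# Loosely analogous to DecayedSum(price by email last week): one accumulator per email.
WEEK = 7 * 24 * 3600
sums = {}
for ts, email, price in [(0, "a@x.com", 10.0), (3600, "a@x.com", 5.0), (3600, "b@y.com", 20.0)]:
    sums.setdefault(email, DecayedSumAccumulator(window_seconds=WEEK)).add(ts, price)

for email, acc in sums.items():
    print(email, acc.estimate(3600))
```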

Syntax:

DecayedSum<event>(query)

Code Examples:

DecayedSum(amount by user)
DecayedSum(bad_word_count by commenter where not is_admin last week)
DecayedSum(price by email where price > 9.99)
DecayedSum(price by email exclusive)
DecayedSum<purchase>(price by device_id last week)

DecayedAverage

Estimates the average of values for a given feature as an exponentially weighted moving average (EWMA). Samples are weighted with exponential decay based on window duration.

Results are expected to be accurate for both high- and low-volume streams.
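One common way to build an EWMA over irregularly spaced events is to derive the smoothing factor from the elapsed time, alpha = 1 - exp(-dt / tau). The Python sketch below assumes that form, with tau equal to the window duration; the exact weighting used by DecayedAverage is not documented here, and `TimeEwma` is a hypothetical name.

```python
import math


class TimeEwma:
    """Time-aware EWMA (hypothetical sketch).

    Assumption: alpha = 1 - exp(-dt / tau), with tau equal to the window duration.
    """

    def __init__(self, window_seconds: float) -> None:
        self.tau = window_seconds
        self.mean = None     # current average; None until the first sample
        self.last_ts = None

    def add(self, ts: float, x: float) -> None:
        if self.mean is None:
            self.mean = x  # the first sample seeds the average
        else:
            # The longer since the previous sample, the more weight the new one gets.
            alpha = 1.0 - math.exp(-(ts - self.last_ts) / self.tau)
            self.mean = alpha * x + (1.0 - alpha) * self.mean
        self.last_ts = ts

    def estimate(self):
        return self.mean


# Usage: prices for one key over a 4-hour window.
avg = TimeEwma(window_seconds=4 * 3600)
for ts, price in [(0, 12.0), (1800, 8.0), (7200, 15.0)]:
    avg.add(ts, price)
print(avg.estimate())
```

Because the result is a weighted mean of observed values rather than a running total, it stays on the scale of the feature even when only a few samples have arrived.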

Tip

When traffic is bursty, the ratio DecayedSum(...) / DecayedCount(...) typically yields better results.
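The reason the ratio helps with bursts is not stated. One plausible mechanism, assuming DecayedAverage uses a time-derived smoothing factor such as alpha = 1 - exp(-dt / tau): events that arrive at nearly the same instant get alpha close to 0, so a burst barely moves the average. A decayed sum divided by a decayed count instead weights each event only by its age. The helpers `decayed_ratio` and `time_ewma` below are hypothetical and compare the two on a synthetic burst; this is an illustration, not a description of the actual implementation.

```python
import math


def decayed_ratio(events, now, tau):
    """Decayed sum / decayed count over (timestamp, value) pairs (sketch).

    Each event is weighted by exp(-(now - ts) / tau), so events that share a
    timestamp all receive the same weight.
    """
    num = 0.0
    den = 0.0
    for ts, x in events:
        w = math.exp(-(now - ts) / tau)
        num += w * x
        den += w
    return num / den if den > 0 else None


def time_ewma(events, tau):
    """Time-aware EWMA for comparison: alpha = 1 - exp(-dt / tau)."""
    mean, last = None, None
    for ts, x in events:
        if mean is None:
            mean = x
        else:
            alpha = 1.0 - math.exp(-(ts - last) / tau)
            mean = alpha * x + (1.0 - alpha) * mean
        last = ts
    return mean


# One quiet event, then a burst of nine events at the same instant.
events = [(0, 10.0)] + [(60, 100.0)] * 9
tau = 3600.0
print(decayed_ratio(events, now=60, tau=tau))
print(time_ewma(events, tau=tau))
```

On this input the ratio lands near 91, close to the burst value of 100, while the time-based EWMA stays near 11.5 because the nine burst events share one timestamp and each gets alpha = 0.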

Syntax:

DecayedAverage<event>(query)

Code Examples:

DecayedAverage(amount by user)
DecayedAverage(risk_score by device_id last week)
DecayedAverage(price by email where risk_score > 75 last 4 hours)
DecayedAverage<purchase>(price by account_id)

HyperLogLog

Estimates the number of unique values for a given feature. Samples are weighted with exponential decay based on window duration.

Warning

May return inaccurate estimates when data volume is low. Use for high-volume streams only.
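HyperLogLog is a well-known distinct-count sketch: each value is hashed, the first p bits of the hash select one of m = 2^p registers, and each register keeps the largest "leading zeros plus one" rank seen in the remaining bits. The Python sketch below implements the classic estimator with the linear-counting correction for small cardinalities. It omits the exponential decay described above, which would need a time-aware variant. The `HyperLogLog` class and its `p` parameter are illustrative and are not this language's API; for a multi-feature call such as HyperLogLog(ip, user_agent by device_id), one hypothetical approach is to hash the joined values as a single string.

```python
import hashlib
import math


class HyperLogLog:
    """Classic HyperLogLog distinct-count sketch (no time decay)."""

    def __init__(self, p: int = 10) -> None:
        self.p = p                # number of index bits
        self.m = 1 << p           # number of registers
        self.registers = [0] * self.m

    def add(self, item: str) -> None:
        # 64-bit hash of the item.
        digest = hashlib.blake2b(item.encode("utf-8"), digest_size=8).digest()
        h = int.from_bytes(digest, "big")
        w = 64 - self.p
        idx = h >> w                     # top p bits pick a register
        rest = h & ((1 << w) - 1)        # remaining w bits
        # Rank = 1-based position of the leftmost 1-bit in `rest`; all zeros -> w + 1.
        rank = w - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def estimate(self) -> float:
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)  # bias-correction constant for m >= 128
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros > 0:
            # Small-range correction: fall back to linear counting.
            return m * math.log(m / zeros)
        return raw


# Usage: count distinct device IDs.
hll = HyperLogLog(p=10)
for i in range(10_000):
    hll.add(f"device-{i}")
print(round(hll.estimate()))  # close to 10,000; exact value depends on the hash
```

With p = 10 (1,024 registers) the standard error is about 1.04 / sqrt(1024) ≈ 3.3%, and re-adding a value never changes the estimate, since a register only changes when a larger rank arrives.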

Syntax:

HyperLogLog<event>(query)

Code Examples:

HyperLogLog(device_id by user)
HyperLogLog(email by ip last week)
HyperLogLog(ip, user_agent by device_id where is_signed_in last 2 hours)
HyperLogLog(email by ip where verdict="Allow" exclusive)
HyperLogLog<login>(ip by email_domain last week)