September 11, 2025

Introducing New Statistical Aggregations: Average, Percentile, Variance, and More

We’re excited to announce new statistical aggregation functions in Scanner’s query language. They let you compute averages, percentiles, variances, and other summaries directly over your logs.

Introducing `stats` queries

Scanner now supports `stats` queries, which compute statistical aggregations over the events matching a query. You can optionally group the results by one or more columns. The general form is:

* | stats <functors> by col1, col2, ...
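Conceptually, a `stats` query groups the matching events by the `by` columns and then applies each function to every group, producing one output row per group. The Python sketch below models that idea on in-memory data. The `stats` helper, the event dicts, and the column names are hypothetical illustrations of the semantics, not how Scanner actually executes queries.

```python
from collections import defaultdict
from typing import Any, Callable

# Hypothetical model: events are plain dicts, and each aggregation takes the
# list of events in one group and returns a single value.
Aggregation = Callable[[list[dict[str, Any]]], Any]


def stats(events: list[dict[str, Any]],
          aggs: dict[str, Aggregation],
          by: list[str]) -> list[dict[str, Any]]:
    """Group events by the `by` columns, then apply each aggregation per group."""
    groups: dict[tuple, list[dict[str, Any]]] = defaultdict(list)
    for event in events:
        key = tuple(event.get(col) for col in by)
        groups[key].append(event)

    rows = []
    for key, members in groups.items():
        row = dict(zip(by, key))          # the group-by column values
        for name, fn in aggs.items():     # one output column per aggregation
            row[name] = fn(members)
        rows.append(row)
    return rows


events = [
    {"user": "alice", "bytes": 10},
    {"user": "bob", "bytes": 5},
    {"user": "alice", "bytes": 7},
]
result = stats(
    events,
    {"count": len, "sum_bytes": lambda g: sum(e["bytes"] for e in g)},
    by=["user"],
)
print(result)
```

Each distinct `user` value becomes one row holding its event count and byte total, mirroring `stats count(), sum(bytes) by user`.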

For example, say you want to know whether any employees are querying your company’s S3 buckets at unusually high rates. This could indicate that an employee’s user identity has been compromised and is being used to steal data from S3.

Here is a stats query in Scanner that retrieves all of the S3 requests made by employee IAM user identities, counts them per user, and then computes the average, median (50th percentile), and 90th percentile of those per-user request counts.

userIdentity.type: "IAMUser" and eventSource: "s3.amazonaws.com"
| stats count() as numReqs, userIdentity.arn by userIdentity.arn
| stats avg(numReqs), percentile(50, numReqs),
  percentile(90, numReqs)

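Users at or above the 90th percentile might be suspect, so you can drill down into their activity and check for malicious behavior. For example, if the query above reported a 90th percentile of 158 requests, the following query lists the users at or above that threshold: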

userIdentity.type: "IAMUser" and eventSource: "s3.amazonaws.com"
| stats count() as numReqs, userIdentity.arn by userIdentity.arn
| where numReqs >= 158
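To make the two-stage logic concrete, here is a rough Python model of the queries above on made-up data: count requests per user (the first `stats`), summarize those counts (the second `stats`), then keep users at or above the 90th percentile (the `where` step). The user names and counts are invented, and the `percentile` helper uses linear interpolation between ranks, which is our assumption. Scanner's exact percentile definition, and its approximation on large result sets, may differ.

```python
from collections import Counter
from statistics import mean


def percentile(n: float, values: list[int]) -> float:
    """n-th percentile using linear interpolation between closest ranks (assumed definition)."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("percentile() of empty data")
    pos = (len(ordered) - 1) * n / 100   # fractional rank into the sorted list
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


# Hypothetical S3 request events from IAM users, reduced to the caller's name.
events = ["alice"] * 12 + ["bob"] * 3 + ["carol"] * 5 + ["dave"] * 40 + ["erin"] * 8

# Stage 1: stats count() as numReqs by user
num_reqs = Counter(events)

# Stage 2: stats avg(numReqs), percentile(50, numReqs), percentile(90, numReqs)
counts = list(num_reqs.values())
avg = mean(counts)
p50 = percentile(50, counts)
p90 = percentile(90, counts)

# Drill-down: where numReqs >= <90th percentile from stage 2>
suspects = sorted(user for user, count in num_reqs.items() if count >= p90)
print(avg, p50, p90, suspects)
```

Because the first stage collapses raw events into one row per user, the second stage's statistics describe users, not individual requests. That is why the 90th percentile here identifies heavy users.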

New visualizations

When you execute a stats query, Scanner lets you visualize the results in a few ways. You can view a simple bar chart that shows the overall aggregation breakdown, or you can view time-binned bar charts and line charts that show how the aggregations have evolved over time.

Statistical functions available with `stats`

The following statistical functions are available in `stats` queries:

count() – compute the total count of hits per group
countdistinct(col, ...) – compute the number of distinct values in the given column(s)
avg(col) – compute the average (mean) of a numeric column
var(col) – compute the sample variance of a numeric column
percentile(n, col) – compute the n-th percentile of a column
sum(col) – compute the sum of a numeric column
min(col) – compute the minimum value of a column
max(col) – compute the maximum value of a column
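To make these definitions concrete, here is how several of them behave on a small list of numbers in plain Python. The list is made-up sample data. Note that `var` is described as the *sample* variance, which divides the sum of squared deviations by n - 1 rather than n. Python's `statistics.variance` uses the same definition, while `statistics.pvariance` computes the population variance.

```python
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]

# Sample variance: sum of squared deviations from the mean, divided by (n - 1).
mean_value = sum(values) / len(values)
sample_var = sum((v - mean_value) ** 2 for v in values) / (len(values) - 1)

# Population variance divides by n instead; shown only for contrast.
population_var = sum((v - mean_value) ** 2 for v in values) / len(values)

print(sample_var, statistics.variance(values))      # both use n - 1
print(population_var, statistics.pvariance(values))  # both use n

# countdistinct, sum, min, and max on the same data.
print(len(set(values)), sum(values), min(values), max(values))
```

With only eight values, the choice of denominator matters (32/7 versus 4.0). With large groups, the two variances converge.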

Powerful, fast data exploration

We want every query, including stats queries, to be fast so that investigations move as quickly as possible. When result sets get large, Scanner uses probabilistic algorithms and data structures to produce approximate answers with low error. For more details, check out our docs: https://docs.scanner.dev/scanner/using-scanner/aggregations.
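This post does not say which probabilistic structures Scanner uses, and the linked docs are the authority on that. As a generic illustration of the idea, the sketch below estimates a distinct count with a K-minimum-values (KMV) sketch. It hashes each value into [0, 1), keeps only the k smallest distinct hashes, and estimates the count as (k - 1) divided by the k-th smallest hash. Memory stays at k hashes no matter how many rows are scanned. This is our own example of the technique, with an arbitrary choice of hash and k, not Scanner's implementation.

```python
import hashlib
import heapq


class KMVSketch:
    """K-minimum-values sketch for approximate distinct counts (illustrative only)."""

    def __init__(self, k: int = 256) -> None:
        self.k = k
        self._heap: list[float] = []    # negated hashes: a max-heap of the k smallest
        self._members: set[float] = set()

    @staticmethod
    def _hash(value: str) -> float:
        # Map a value to a pseudo-uniform float in [0, 1) using 8 bytes of SHA-1.
        digest = hashlib.sha1(value.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") / 2**64

    def add(self, value: str) -> None:
        h = self._hash(value)
        if h in self._members:
            return  # duplicate value: its hash is already tracked
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, -h)
            self._members.add(h)
        elif h < -self._heap[0]:
            # New hash beats the largest of the k smallest: swap it in.
            evicted = -heapq.heapreplace(self._heap, -h)
            self._members.discard(evicted)
            self._members.add(h)

    def estimate(self) -> float:
        if len(self._heap) < self.k:
            # Fewer than k distinct hashes seen: the count is exact (barring collisions).
            return float(len(self._heap))
        kth_smallest = -self._heap[0]
        return (self.k - 1) / kth_smallest


sketch = KMVSketch(k=1024)
for i in range(20_000):
    sketch.add(f"user-{i % 5_000}")  # 5,000 distinct values, each seen 4 times
est = sketch.estimate()
print(round(est))
```

The relative standard error of a KMV estimate is roughly 1/sqrt(k), so k = 1024 gives an error of about 3% while storing only 1,024 floats, regardless of how many rows are scanned.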

Now that these statistical query functions are in place, we are building several cool features on top of them. Stay tuned!


Cliff Crosland

CEO, Co-founder

Scanner, Inc.

Cliff is the CEO and co-founder of Scanner.dev, which provides fast search and threat detections for log data in S3. Prior to founding Scanner, he was a Principal Engineer at Cisco where he led the backend infrastructure team for the Webex People Graph. He was also the engineering lead for the data platform team at Accompany before its acquisition by Cisco. He has a love-hate relationship with Rust, but it's mostly love these days.
