September 13, 2023

Faster Querying, Basic Aggregations, and Saved Queries

We’re excited to announce the release of a few new features our customers have been asking for.

Even faster querying

Queries are now powered by a new monoid data structure server we built in Rust. The monoid server is about 2x faster than Redis for our specific use case, and we’ll share more on that in a future blog post. We will likely cover interesting lessons learned while tackling some hefty systems engineering problems, including why it is important to experiment with alternative memory allocators like jemalloc.

Performance on a 25TB data set

Needle-in-haystack: Searching for a UUID takes ~3 seconds.

Worst case query: Wildcard * query that matches all 25TB of logs takes a little less than one minute.

25TB Query
Scanner worst-case query performance: Less than 1 minute to scan 25TB

Compare with worst case query performance in AWS CloudWatch, which would take 300 minutes to scan 25TB, a factor of 300x slower than Scanner.

AWS CloudWatch worst-case query: 300+ minutes to scan 25TB
Basic aggregations

Scanner now supports these basic query aggregations:

  • countdistinct: Return the number of distinct values there are for a field. Uses a hyper-log-log-plus sketch to give a low-error estimate for high cardinality data sets.
  • groupbycount: Group results by a key and display the number of results for each key. Also uses a hyper-log-log-plus sketch to give a low-error estimate for high cardinality data sets.
  • max: Return the maximum value of a field across all hits.
  • count: Return the number hits.
Basic aggregation- `groupbycount`
Saved queries

Users can now save queries and share with their team. This should reduce the need to tab-switch between internal wiki pages and the search tool, and it should also help new team members become productive faster.

Scanner Saved Query

You can learn more about our Query Syntax in our documentation here.

Coming soon

Over the next few months we’ll be rolling out the following new features:

Advanced aggregations
The new monoid data structure server will allow us to support more advanced aggregations in the coming months, including the ability to build more sophisticated result tables with many columns. This will make it easier to create advanced detection rules.
Real-time detection rules engine
Our new monoid data structures allow us to build a detection rules engine that runs on logs as they are indexed in real-time. Alerts will be sent to custom webhooks, Slack, or PagerDuty.
Querying across multiple AWS accounts
We’ve been getting requests to support querying logs from multiple AWS accounts from one Scanner instance. Logs and index files will be stored locally in the S3 buckets in each account, and the instance will read from them all and aggregate search results.
We would love to hear about your security data lake use cases
If you have security data lake use cases you would like to see a tool support, please reach out. We think Scanner is on its way to being the most useful security data lake tool in the world, and we want to hear from you so we can improve it even more.

We believe that traditional log architectures are broken for modern log volumes. Scanner enables fast search and detections for log data lakes – directly in your S3 buckets. Reduce the total cost of ownership of logs by 80-90%.
Photo of Cliff Crosland
Cliff Crosland
CEO, Co-founder
Scanner, Inc.
Cliff is the CEO and co-founder of Scanner.dev, which provides fast search and threat detections for log data in S3. Prior to founding Scanner, he was a Principal Engineer at Cisco where he led the backend infrastructure team for the Webex People Graph. He was also the engineering lead for the data platform team at Accompany before its acquisition by Cisco. He has a love-hate relationship with Rust, but it's mostly love these days.