Not Another Demo Podcast: Modern SIEM Without Busting Budgets

In E47 of Not Another Demo, Scanner CEO Cliff Crosland sat down with host Matt Nelson to discuss Scanner’s modern SIEM technology, which can cost one-tenth as much as a traditional SIEM while being much faster and more efficient. Matt and Cliff talked through some of the technical details of Scanner, how the company has set itself apart, and where it goes from here with its technology.
Watch the full episode on YouTube here or read the full transcript below.
Matt Nelson
Hey everyone, and welcome to your favorite YouTube series. I mean, it’s my favorite YouTube series as well, and I know what you’re thinking. You’re thinking, Matt, normally you say it is our favorite YouTube series as well. But alas, today is just me. Aaron, Anthony: real jobs got in the way of them recording this. So you’re just stuck with me today, and I know what else you’re thinking. You’re thinking, Matt, how can you be so smiley, so happy, so upbeat when I am buried in logs, when I am spending a fortune on logging and have no great way to even sort through all of them? To which I say, it’s because I may have a solution for you. And with that, I would like to introduce Cliff Crosland from Scanner, Scanner.dev. Did you say Scanner?
Scanner.dev, one of those, or both of those. We’re gonna spend the next, you, you know, half an hour or so, like always.
And we are gonna talk about that exact thing. Now, you know the premise of Not Another Demo: no PowerPoints, no marketing jargon, no acronyms. Cliff, if you use one, I might ask you what it means. If I use one, please ask me what it means, and then we can go back to using ’em because we don’t wanna say all those terrible words because they’re long. And that’s why we have acronyms in the first place. So anyways, with that, please introduce yourself to the audience. And I’ll start with the first question: everyone’s calendars are packed, we’re in tons of meetings, we’re busier than ever, right?
People aren’t here today because they’re too busy.
Why should someone carve out some time to, to talk with Scanner about what you guys are doing?
Cliff Crosland
Yeah, if you don’t wanna spend like half a million dollars on logs this year, that might be a good reason to chat with us. If, instead of spending half a million or a million dollars on logs, you wanna hire a couple of amazing people on your team, that tends to be one of the main reasons that people chat with Scanner: to basically take some really high-volume, really expensive log source in their current tools and move it to something far more cost-effective. So if you like saving money and saving time (and we can talk about the query speed on historical data and what we did at Scanner, too), then that might be a good reason to chat.
Matt Nelson
That might be the best answer I’ve ever had to that question in the 46 other episodes that we’ve done of this. So I guess tell us a little bit about yourself and your background with the company and you know, what, what your thought processes were when you were starting the company up and getting things up and running.
Cliff Crosland
Yeah, absolutely. So my co-founder and I were early engineers, on the founding team, at this startup called Accompany, because it accompanies you everywhere. It is a challenging name for a startup, I must say. But yeah, Accompany really scaled up in its log volume, and with the tool that we were using before, our log bill blew up from $10,000 a year to a million dollars a year. And this is not uncommon once you start generating terabytes of logs per day.
And we, we felt like, okay, well let’s, let’s not spend a million dollars a year on these logs. Like what, what should we do? We ended up at that startup, we ended up dumping logs to S3 because we thought like maybe we can, maybe we can investigate later, maybe we can dig into these later.
There are some tools that allow you to query that data in cloud storage. But then it was almost impossible to get value out of that data, because searching through cloud storage is unbelievably slow. So it was almost like a black hole. We lost a lot of ability to look at logs by doing that. But it was cheap. It was so much cheaper. So at Scanner we thought, there are SQL-based tools that are designed for cloud storage, like Snowflake and others, like Presto and Trino. A lot of security tools are built on top of Snowflake, for example.
But we thought like, where’s the search index that’s built on top of cloud storage? You know, like where’s the equivalent of like Elastic or Splunk? But it’s built and designed for cloud storage so that the cost is way, way lower.
We could talk about exactly how much it costs, but it’s like 10 times, sometimes 20 times more expensive just to run the hard drives to store all these logs on an expensive cluster. But cloud storage is so cheap. We thought, why hasn’t someone designed a search index for cloud storage? So that’s where Scanner was born: we wanted to make it really fast to search through tons of logs that are stored in cloud storage and really dramatically reduce the cost. We felt like, you know, logging shouldn’t be the third-highest line item on a budget for a security team or a DevOps team that has observability needs. And it’s often crazy how the cost keeps blowing up and the value keeps getting lower, because you’re either retaining less or you can’t really search that far into the past. We just thought that was wildly expensive.
So that’s why we we started Scanner.
Matt Nelson
Yeah, I I mean I I think right, right from the start, you guys are, I mean one, so this is interesting to me ’cause you, you’re jumping into a space that’s been around quite a long time, very developed, certainly in need, I think of some, some fresh blood and a new way of doing it, especially as a lot of workloads moved from on-prem to the, to the cloud and a lot of logs were, were happening out there. I guess in terms of like going up against some behemoth companies that you’re, you’re competing against as you’re getting started, like what’s your, what’s your mindset, your game plan? Like what’s the go to market strategy?
And if that’s like I already told you we’re a lot cheaper, that’s pretty good market strategy, but like what are some of the other benefits, I guess?
Cliff Crosland
Yeah, for sure. So I think one of the things that’s really challenging for teams with logs in general is just getting things set up, just the infrastructure to get logs going. And so it may not make very much sense to switch wholesale to another log search tool, another SIEM tool like ours. I guess that is an acronym. Yeah, Security Information and Event Management.
Matt Nelson
Actually I paused a minute ago because I flipped over to my browser. I was like, God, am I gonna forget what SIEM stands for? So yeah, Security Information and Event Management. Now we can just go back to SIEM because again, nobody wants to say all those words put together, we get it.
Cliff Crosland
Awesome. So it can be a challenge just to get things centralized, and it is really hard, if you’re using one of these behemoth companies for SIEM, for observability on your logs, to completely just pull the trigger and switch everything over. But one of the cool things that we have found is that we love to let Scanner be an augmenting tool and a companion to those tools. And often what happens is people will take a really high-volume log source that they have, which they wish they got more value from, and that’s being piped to an expensive tool, and they keep making their log retention shorter and shorter to keep the cost down.
And then they’re like, well I need to go investigate on this log source and I can’t like look back beyond 15 days.
This is getting really bad. And then what they can do is archive that off to cloud storage, and then Scanner picks it up from there and can index that data and search it. So basically you can free up a lot of budget and reduce costs for your expensive SIEM tool by moving some of the high-volume log sources over to Scanner. And you don’t have to switch wholesale. You might still have super critical, lower-volume logs going to your SIEM, but then for these high-volume logs that you would love to be able to look at, but they’re too expensive, Scanner can give you that visibility and help you reduce that cost by switching it over.
Matt Nelson
So a couple of questions there. One, I guess, so if you had say like you said you had some of these high volume logs going to Scanner and some others staying with your, your other platform, when you need to correlate all of those back together in order to do an investigation or some sort of threat hunting, where does that sit? Is that something that you guys are capable of? Is there something else that people would, would need there to, to be able to do that? Because you may need to pull all those logs back together, right?
Cliff Crosland
Yeah, that’s a really great point. There are a couple of different paths that we’ve seen our users take. One is to use our Splunk app so that you can run a Scanner query from within Splunk, pull the results into Splunk, and then cross-correlate with your other Splunk data.
So you can install Scanner that way. But also Grafana, we have a plugin there, so if people are using Grafana Loki or other log search stuff in Grafana. I don’t think Grafana is as big among security teams, but we have seen a bunch of people switch to Grafana over time. I’ve been kind of surprised by how much that’s happening. But yeah, so if you are using Grafana, you can run searches for multiple sources there. But we’ve also seen people just use our API and use a Jupyter Notebook
with our Python SDK, where you go and query your Splunk, your Elastic, your other tool, you also go query Scanner, and then you look at the results together. But yeah, that’s definitely a challenge, where it’s like, I do now have two tools, what do I do about that?
And it seems like, for a lot of the users that we talked with, they were already kind of trying to use two tools anyways, where you have your SIEM and then you have all these logs in S3 that you’re trying to query with Athena or something, and they’re really hard to get down. We at least make that much better; you can get answers much, much faster than Athena. But yeah, I think in an ideal world, it would be really fun to have just one tool. And not all of our users have done this yet, but we’ve had several users who started with Scanner on one big high-volume log source and then gradually moved all of their log sources over and stopped using what they were using before, just because it really helped reduce the cost for them. So yeah, but that is definitely a hard challenge.
Yep.
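Editor’s note (not part of the conversation): the notebook approach Cliff describes amounts to running one query per tool and interleaving the results by time. The Python below is a minimal sketch under stated assumptions. The two input lists stand in for results already fetched from each tool, and the field names (`ts`, `source`, `msg`) are hypothetical; they are not the actual response schema of Scanner, Splunk, or any SDK.

```python
import heapq
from datetime import datetime

def merge_by_time(*result_sets):
    """Merge several time-sorted result lists into a single timeline."""
    # heapq.merge assumes each input list is already sorted by the key.
    return list(heapq.merge(*result_sets, key=lambda event: event["ts"]))

# Hypothetical, hand-written events standing in for real query results.
siem_events = [
    {"ts": datetime(2024, 1, 1, 12, 0), "source": "siem", "msg": "login failure"},
    {"ts": datetime(2024, 1, 1, 12, 5), "source": "siem", "msg": "login success"},
]
archive_events = [
    {"ts": datetime(2024, 1, 1, 12, 2), "source": "archive", "msg": "AssumeRole call"},
]

timeline = merge_by_time(siem_events, archive_events)
for event in timeline:
    print(event["ts"].isoformat(), event["source"], event["msg"])
```

Tagging each event with its source keeps it clear which tool a line came from once the two result sets are read together.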
Matt Nelson
Yeah, and I mean I, I am seeing that as well because like you said, either people are shortening the amount of time that they’re keeping logs to keep the cost down. Another thing I’m I’m seeing is people just actually like, okay, well we’ll get logs from here, but we won’t get logs from here and we won’t get logs from here. And then, you know, as like an architect, I think, well, are you actually doing security now or are you just collecting logs because you have to collect certain logs because I feel like you’re leaving these huge gaps in your environment.
So talk a little bit then about the infrastructure behind this. Like I go, okay, I’ve got some big log sources that are super expensive, I would love to move them over to Scanner. What does that entail, either from a storage perspective or from your platform? Talk a little bit about the nuts and bolts of the platform.
Cliff Crosland
Yeah, for sure. So the way that it works is that Scanner keeps all of your data in your S3 bucket. The idea is you give permission for Scanner to read your S3 bucket where your logs are. And so we’ll have teams who will be using a typical log search tool, and they’ll have maybe 30 days or 90 days in the tool, but then they’ll archive the rest off to S3, and those archives can get analyzed by Scanner and suddenly become much more visible. So anyways, the way that it works is that we launch an AWS account specifically for every single user; everyone gets their own unique AWS account.
And then you just run, you know, there are a couple of options for infrastructure as code, but like CloudFormation or Pulumi or Terraform, to give IAM permission in your AWS account for Scanner to read those buckets.
And then Scanner will read the raw logs in those buckets and create these index files. And those index files are stored in another S3 bucket in the customer’s account. So all of the data lives with the customer; only the compute lives with Scanner. But there is a private beta test we’re doing for self-hosting with a couple of users who want to pilot this, where they can run Scanner in their AWS account as well, so everything, the compute and the data, is in their account. But if you were to come to Scanner today, unless you wanna be part of that pilot, the compute lives in Scanner’s environment and the S3 buckets are in the customer’s environment.
And then, to make it so data transfer cost is zero between the S3 buckets and Scanner’s compute, Scanner will launch the compute in the same region alongside your buckets.
So that way there is a cost for GET and PUT requests against S3, but that’s so cheap because the GETs and PUTs are all doing a lot of work. You’re not doing tons and tons of little GET requests or tons of little PUTs. They’re kinda large, meaty log files and index files that we create. And so the data transfer cost between the compute and the buckets, when Scanner is analyzing and creating the index files and when Scanner is querying those index files, is free, which is pretty cool.
Yeah. So you’re not shipping your logs off somewhere. A lot of people are already collecting a lot of logs, like CloudTrail from AWS, or GitHub has audit logs you can configure to pipe to an S3 bucket, or CrowdStrike can use Falcon Data Replicator to push that into buckets. So there are many ways to get the logs into S3, but once they’re there, Scanner will analyze them and provide fast search on them by creating these index files. So yeah, that’s at a high level what that looks like. Yeah, sorry, go ahead.
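Editor’s note (not part of the conversation): the read permission Cliff mentions is the kind of thing an infrastructure-as-code template would express as an IAM policy. The Python below is only a sketch of what a read-only S3 policy document generally looks like. The bucket name is hypothetical, and the exact permissions Scanner’s own templates request are not stated in the episode and may differ (for example, the index bucket would also need write access).

```python
import json

def read_only_bucket_policy(bucket_names):
    """Build an IAM policy document granting read-only access to the given S3 buckets."""
    bucket_arns = [f"arn:aws:s3:::{name}" for name in bucket_names]
    object_arns = [f"{arn}/*" for arn in bucket_arns]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing is granted on the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": bucket_arns,
            },
            {
                # Reading is granted on the objects inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": object_arns,
            },
        ],
    }

# "example-log-archive" is a made-up bucket name for illustration.
policy = read_only_bucket_policy(["example-log-archive"])
print(json.dumps(policy, indent=2))
```

In practice a policy like this would be attached to a role that the vendor’s account is allowed to assume, which is what the CloudFormation, Pulumi, or Terraform step sets up.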
Matt Nelson
It’s hard to, I mean, you know, with us not using any sort of visuals, it’s hard to get too deep without having a picture on the screen of some sort. If anybody wants to see more, though, of course you can just get ahold of me or get ahold of Cliff, and I’m sure they’ll be happy to demo that for someone.
So a little bit more, I guess, just so that I’m understanding: if I can get my logs to an S3 bucket, Scanner can sit on top of them, however I get them there. So we’re really talking about logs from pretty much any source then, right? On-prem, cloud, because what we’re seeing is a lot of people that are still kind of stuck in limbo between having stuff on-prem and having stuff in the cloud, but either through an API or some other thing, they would be able to get those logs into S3, and then you guys can connect into the S3 bucket from there. Did I get that right?
Cliff Crosland
That’s totally right. Yeah, so any logs, as long as they match a couple of different file formats, like JSON, CSV, Parquet, even plain text and syslog, those we’ll pick up in their raw format for you. You don’t have to go and transform ’em or insert them into a SQL schema or anything like that. You just point Scanner at them, and then you specify the file type, and then it will take it from there. So yeah.
Matt Nelson
Okay. See, Anthony, Aaron, see how smart I am? I didn’t even need you guys today to have that conversation.
Let’s talk a little bit, because we started the conversation talking about cost savings, and honestly that is a huge conversation for everyone. Right now, you know, budgets are strained, there’s some uncertainty in the economics, and other things. So I guess, in general terms, when you guys see someone migrating from one of the established SIEM platforms over to Scanner, what’s a typical cost savings? Is there a way that folks could calculate what the savings would look like? I guess talk through that a little bit.
Cliff Crosland
Yeah, for sure. So it, it’s interesting, a lot of the, the traditional log tools, if you look at how much they charge, they, they often charge based on volume. You can also like, I don’t know, some of ’em have different pricing plans where you can pay for like compute workload versus ingestion volume, but ingestion volume is pretty common everywhere. And the wild thing to us is that ingestion volume pricing is typically like measured in dollars per gigabyte. And that is, that is way too high in our opinion like that, you know, it’s so easy to get to the point where you’re generating like hundreds of gigabytes or terabytes per day.
And we’ve talked to companies who are generating dozens of terabytes per day of logs and then the cost is just unbelievably high in these tools that are, you know, multiple dollars per per gigabyte means like millions, tens of millions a year.
Yeah, yeah. So, so the way that Scanner’s pricing works is we think the price of of log volume should be tens of cents per gigabyte. That feels way more fair and it, it really dramatically reduces the cost down. And the reason we can do that is because like Snowflake and other tools that use cloud storage as their storage layer, the, we think that that is where massive log volumes should go. Like they should go in the cheapest possible storage location and the, instead of building a giant indexing cluster with really expensive hard drives, tons of your log data is in-memory, like, I don’t know, there’s the old tools that was fine when you were generating, you know, gigabytes of logs per day, but now that everyone generates hundreds of gigabytes or terabytes per day, yeah, yeah, it’s gotta go somewhere cheaper.
So yeah, it’s, it, the cost saving will be like, it’s, it’s a five to 10 x reduction in like cost per gigabyte for sure compared to the traditional tools.
And, and we can, it’s, it’s easy to do that because of the new storage medium that that’s being used for, for the search.
The, the interesting thing there is you can do that, like if you, if you use like a Amazon Athena or something like that to go and like run a data lake query or whatever on, on your logs in cloud storage, that’s also, you know, tens of cents per gigabyte to store that data. That’s, that’s, it’s still, it’s cheap. But the problem that we found with those is that just the querying was slow and we thought like, is there a cool way we can change things up so that we can keep the querying really fast even though we’re using cloud storage? And that’s where like our index files and stuff came in, in into play.
But anyways, yeah, we’ve had users who were looking at using things that would cost like a quarter of a million, a half a million dollars a year, and they went with us and it’s more like 25K, 50K a year. And smaller and bigger, it depends on how much log volume you have, but it was like, whoa, this is a giant step down in cost compared to some of the other tools, just because of the new architecture.
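Editor’s note (not part of the conversation): the gap between “dollars per gigabyte” and “tens of cents per gigabyte” is easy to work out for a given log volume. The Python below is a back-of-the-envelope sketch. The volume and both per-gigabyte rates are illustrative assumptions chosen to match the orders of magnitude Cliff mentions; they are not quoted prices from Scanner or any other vendor.

```python
def annual_ingest_cost(gb_per_day, dollars_per_gb, days=365):
    """Yearly cost of volume-based pricing for a steady daily log volume."""
    return gb_per_day * dollars_per_gb * days

GB_PER_DAY = 1_000  # roughly one terabyte of logs per day (assumed)

# Assumed rates: "multiple dollars per gigabyte" vs. "tens of cents per gigabyte".
traditional = annual_ingest_cost(GB_PER_DAY, 2.00)
cloud_storage_based = annual_ingest_cost(GB_PER_DAY, 0.20)

print(f"traditional:         ${traditional:,.0f} per year")
print(f"cloud-storage based: ${cloud_storage_based:,.0f} per year")
print(f"ratio:               {traditional / cloud_storage_based:.0f}x")
```

Because the cost scales linearly with volume, the ratio between the two bills depends only on the two per-gigabyte rates, so plugging in your own rates and daily volume gives a first estimate of the savings.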
Matt Nelson
Yeah. And the size of company that you’re working with, does it matter, or do you sit in a certain place? I guess talk through what a typical customer looks like.
Cliff Crosland
Yeah, our typical customers are mid-market enterprise probably, I guess might be a way to say it, which is like a couple hundred employees, maybe like a thousand employees max. I think we, we also have like some really small teams using Scanner very happily, but they, they also probably could use an expensive tool happily because they, their log volume is so small, it’s still not that expensive. But it seems to us once you have like hundreds of employees to like maybe a thousand or a few thousand employees that your security team might be five to 20 people and your log volume’s absurd and like it’s preventing you from hiring people.
Like that’s when it, it’s really intense and the pain is really strong. Whereas I think like you go to even bigger companies, they might all be on-prem and might not actually have much in cloud storage.
So it depends. Some, some really big companies we’ve talked to do have like a lot of data in the cloud. But I think it’s like the people, our, our typical customer is like, I guess, I don’t know, is it upper mid-market? I don’t know exactly what it is, but it’s like a couple hundred to like a few thousand employees probably.
Matt Nelson
Yeah. Okay. And so to tag onto that question, I feel like looking at a platform like Scanner gives people a way to, or an opportunity to kind of re-architect how they’re even doing logging and all of that. So, but some even I, I think to some security people, some of the, some of the names that you threw out may not be as familiar.
So I guess talk a little bit about working with Scanner. I’m a company, I’m interested, I’d like to do this, I’d like to take this opportunity kind of re-architect or rethink about logging in my organization.
Walk us through, like, do you help people kind of put those pieces together, have those conversations, what a POC looks like? I guess talk a little bit about the pre-sales motion that you guys have.
Cliff Crosland
Yeah, for sure. So one of the things that we really care about is just helping people solve problems. Like, it might be the case that as we meet with them, that Scanner isn’t the right thing, but we’ll recommend like, oh, like you, if you have this particular thing and all of your data is like perfectly tabular and, and stuff, like maybe you want to put that into Snowflake or like consider that. Or if you already have Snowflake running, maybe that’s your solution or something. So what we do is we meet with people to talk about, but I think the, the reason people meet with us in the first place is like, oh wow, yeah, this is expensive for us.
This is like, this is like insane. Can you help me here? And then we’ll talk through, okay, cool. Like what are the log sources you have, what are the features that you need that, that are really critical to you to use on these different log sources that you have?
Okay, some of these might make sense to move into an S3 bucket and here’s how you can do it. There’s like, we recommend, you know, a bunch of different tools of different approaches. A lot of, a lot of things already have like a, a built in way. A lot of, you know, a a lot of tools have a way to to archive stuff to S3 on their own. That’s one approach. You can also spin up like log pipelines and stuff. But anyways, we help people solve the problem to get certain kinds of really high volume log sources off to, into S3. And then from there we, we do a trial with them to say, okay, cool, over the next 30 days, let’s see, like is this meeting your needs?
Scanner is young; it doesn’t have 20 years of features like some of the incumbents do, for sure.
And so like, but like is Scanner search and the like stats aggregations we have, is that, is that good enough for you for these? And how about the detection rules and like the detections as code where you can like, I don’t know, write the YAML files and sync ’em with GitHub. Is this meeting your needs or do you need something additionally?
Yeah, so we just try to figure out what do, what do people need and what makes sense and we will, we’ll work with them to like sit down with them and, and they’ll share their screen sometimes and we’ll walk through like, here’s how you can set up this log source to move over here and over there and just help ’em solve problems with, with their logs and try to get those costs down, but, but get the costs down and still get the kind of coverage that they want.
We are excited to have a cool roadmap with lots of features on it, and machine learning detections are on the roadmap. That’d be sweet. But if you really, really need that, and your SIEM provides that, then maybe for some of those log sources, yeah, keep it over there. But for others, where it’s like, oh, a simpler, more lightweight, straightforward system where I can write any query I want in Scanner to do a detection, and that’s enough for me, then you can move that, and then suddenly you have like $400,000 next year to play with, you know? Like, yeah.
So yeah,
Matt Nelson
Which I mean the, those sorts of numbers can make such a huge impact for, you know, for any company or security organization inside of a company. For sure. So a couple of things there. One post-sales, right, as a company that is now part of the, the Scanner family. Do you continue that sort of like assistance and, and as you’re releasing new features, do you have a regular cadence with your customers? Talk a little bit through the post-sales process
Cliff Crosland
Yeah, for sure. We create private Slack channels for every single customer, and we hang out in there all the time, and we meet with people. For some teams it’s as frequent as every other week. For other teams it’s maybe monthly, or for some that are like, I got this covered, quarterly. But yeah, and sometimes it spikes. It’s like, actually, we have a brand new initiative and we’re thinking about adding this tool and using that as a destination for your detection alerts. How can they play well together? Then we’ll meet a ton, like weekly, for a while. So yeah, we really love to just be a resource to help people solve problems with their log infrastructure.
So, but it tends to be, yeah, some, it’s a regular cadence.
It’s, I think on average it’s probably monthly. Like we will meet up, we’ll look through the problems that they have, but there are some teams that are like, actually I’m good and like, you know, we, but I I think for everyone that we, like right after they start to become a customer, it’s quite regular until they’re like comfortable and then it’ll spike up when they have like a new initiative or they wanna try something new or they’re like, oh, you, you released a cool new feature. Like let’s play with that and let’s meet about it. So yeah, we really love to, we really love hanging out with our, our customers and making sure that they’re successful and like, whether it’s using Scanner to solve a problem or other tools, we, we love to, to brainstorm with them and figure that out.
Matt Nelson
I would just imagine, given how your technology is, that you get some real wizards as customers, like some super smart people that are probably in there doing stuff and whatever. So I am sure of that. And also, as a young company, I know you’re probably taking requests from customers and incorporating those features and functionalities into the platform. So obviously we can’t talk roadmap stuff ’cause that’s super secret, but give us one recent add-on or option or feature that you got from one of your customers that you have implemented into the system.
Cliff Crosland
Yeah, for sure. I think the most recent one was our detections as code, like a continuous integration, continuous deployment integration with GitHub. Basically, in the early version of detections, you can just jump into the Scanner UI and create a Scanner query, specify the time range you want to run it on, what you want to have happen when the detection goes off, and where the alerts go, all in the UI. But we had people say, I really wanna figure out a good way to do change management and auditing. Like, figure out who’s changing what. Can I test these? Can I write unit tests for them?
So what we did is we built a command line tool that you can run, and you can write code, or these YAML files.
I actually don’t know what YAML stands for. I think it’s yet another markup language or something. Sorry, that’s another acronym.
Matt Nelson
You know what, I’ll put it in the description because I dunno what it means either.
Cliff Crosland
Yeah. But yeah, you can write up basically your detections as code in a GitHub repository, and then people collaborate on them. You can write unit tests in there. You can run the tests on your laptop as you’re making the detection, and then you can connect Scanner to your GitHub repository and it will sync to Scanner. It will run all the tests, and if all the tests pass, then it will update all your detection rules. So that was one thing where people really wanted a clean way to collaborate on detections and make sure that they had good detection review and a good detection lifecycle.
Like, well, I can, I can change this, I can review it, I can say this is not good anymore, so let’s delete it. And I, I have a full history in GitHub of, of how that works. And so that was a recent feature we added a bunch of users asked about it, wanted that.
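Editor’s note (not part of the conversation): the “tests must pass before rules update” flow Cliff describes can be illustrated with a toy example. The Python below is a sketch only. The rule structure, field names, and matching logic are hypothetical and are not Scanner’s actual YAML schema or query language; a real detection file would be parsed into something broadly similar before its tests are run.

```python
# Hypothetical rule: a name, a set of field values to match, and its own test cases.
RULE = {
    "name": "root_console_login",
    "match": {"eventName": "ConsoleLogin", "userType": "Root"},
    "tests": [
        {"event": {"eventName": "ConsoleLogin", "userType": "Root"}, "expect_alert": True},
        {"event": {"eventName": "ConsoleLogin", "userType": "IAMUser"}, "expect_alert": False},
    ],
}

def rule_matches(rule, event):
    """Return True when every field in the rule's match block equals the event's value."""
    return all(event.get(field) == value for field, value in rule["match"].items())

def run_rule_tests(rule):
    """Return the failing test cases; an empty list means the rule is safe to deploy."""
    return [
        case for case in rule["tests"]
        if rule_matches(rule, case["event"]) != case["expect_alert"]
    ]

failures = run_rule_tests(RULE)
print("deploy" if not failures else f"blocked: {len(failures)} failing test(s)")
```

Keeping the test cases next to the rule means a reviewer sees the intended behavior in the same change as the detection logic, which is the collaboration benefit described above.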
Matt Nelson
Nice, nice. Well I love it. We’ve been talking for a while now. You know, I’ve been a fan of what you guys are doing over there for, gosh, we probably first met like a year or so ago. I would imagine.
So. Hey folks out there, if you are spending too much money on logging, and that’s everybody, and you’d like to learn better ways to store more logs, do better security, save your organization some money, work with a great group of people, and have some input into a platform as it continues to build and grow, talk with the folks over at Scanner. Cliff, thank you so much for being a part of this. Thanks for putting up with only me today, but I’m very happy that we finally got this episode out. I think this is a really important topic, and I’m excited for everyone’s feedback. And yeah, with that, everyone, oh yes, as I have to do: like, subscribe, all that YouTubey kind of stuff. We appreciate every one of you, and we will see you again for episode 48. Thanks, everybody.