Believe in Serverless Podcast: Serverless Speed: Rust vs. Go, Java, and Python in AWS Lambda functions

Our CEO and Co-founder Cliff Crosland joins Danielle Heberling and the Believe in Serverless community for a discussion about innovative ways Scanner uses AWS Lambda functions to perform fast full-text searches over large volumes of log data stored in S3.
Through experimenting with different programming languages, including Rust, Go, Java, and Python, we found Rust and Go to be the most efficient for our high-performance needs, particularly for handling massive data sets while minimizing compute costs. We also highlighted the importance of choosing the right data formats and parsing libraries to optimize performance.
Watch the full discussion on YouTube here and you can find the full transcript below.
Danielle Heberling
Good morning, good afternoon, good evening, wherever you're at in the world. Welcome to our stream today. Today we have a really exciting talk. We have Cliff Crosland, the CEO and co-founder of a company called Scanner, and he's going to share innovative ways they've used Lambda functions with various programming languages to perform fast full-text search over insanely large volumes of logs in data lakes. So let me bring Cliff up here. Hey Cliff, hello. How are you doing today?
Cliff Crosland
I’m doing great. How are you?
Danielle Heberling
Really good. Yeah, really excited for the talk. A lot of times with Lambda, people talk about smaller use cases, so this seems really awesome with data lakes and such.
Cliff Crosland
Yeah, we were just trying to get creative with solving this problem, which can get extremely expensive if you have a bunch of compute sitting there idle all the time. With massive log data sets, if you're using Lambdas and you only need them for a short period of time, you can get a lot of compute suddenly in one big burst, then drop it back down to zero rapidly and not pay very much for it. So it's a really cool way to do big data stuff too.
Danielle Heberling
Super excited to hear about it. Folks listening along, feel free to ask questions in the chat as we go. Cliff's gonna do his best to answer them on the fly. I might also interrupt you, but otherwise we'll have plenty of time for questions at the end too. So I'm gonna bring up your slides here.
I think we’re all set. Yeah, I’ll be here if you need me. Cliff, I’m gonna go off screen now.
Cliff Crosland
Great, thank you so much. Cool. Today I want to talk about something that was really fun to play with: trying to get the best performance possible for our use case by trying out different programming languages with Lambda functions in AWS to do really massive scans and analysis of data in S3. We compared a few different languages, and I'm going to go through the details of what the performance looked like and what the takeaways were for us. So if you're going to do something like what we do, crazy stuff on massive data sets in cloud storage, we have some recommendations.
Cool. Just to give a little bit of background on me, I'm the co-founder of a company called Scanner, and what we do is provide fast search over massive log data sets for security teams, and also DevOps teams and other people who need observability on that data.
At our prior startup, my co-founder and I really struggled with the cost of logs, and we felt like there was a much cheaper and more efficient way to do it. I'll talk about how we decided to tackle this problem, but one thing I'll note is that I do love Rust a lot, and sometimes I hate it; maybe I hate it because it's forcing me to do things for my own good, and that might be a little bit of foreshadowing about which language we decided to go with for our Lambda functions. But that's a little bit about me, and I really enjoy using Rust.
But let's talk about the problem we need to solve. It's kind of breathtaking if you look at the Common Crawl of the web, which is what a lot of large language models are trained on.
It's basically this open public data set that tries, every month, to give you a dump of all the new websites, an up-to-date view of the publicly accessible internet. And that can be hundreds of terabytes a month. The wild thing is that pretty routinely an individual enterprise will generate about as much log data as the internet produces web pages every month. The entire internet is on the order of petabytes in size, maybe tens of petabytes, but there are users we work with who have tens of petabytes of logs, which is wild. We tend to focus on the security use case, so we tend to have users who have either hundreds of terabytes or single-digit petabytes of logs that they want to analyze, and they really need fast search on that.
So the problem we have to solve is a crazy-scale problem: we want to let people search through hundreds of terabytes to single-digit petabytes of logs in cloud storage in seconds, not wait hours or sometimes days for queries to finish. I'll talk about what we do and how we use Lambda functions for this. The idea is that we generate really efficient indexes for logs, and those indexes are stored in S3. We connect to users' S3 buckets, they have massive data sets, and it can be really painful to search through that data.
We could talk about different alternatives, but we'll focus on how we decided to solve it. We try to make these indexes very friendly to cloud storage and to really massive MapReduce-type jobs, where you can spin up a lot of compute all together with Lambdas for the query and then reduce that down to results you show to the user.
Here's how the indexing and querying works. Users upload new logs, and there are lots of different kinds, mostly security logs: cloud audit logs, or XDR and EDR, which are basically agents running on your employees' laptops tracking all the process activity and sending it to a centralized place to be analyzed. There are massive log volumes from these. We index this data and save the index files in the user's S3 bucket, and we merge these index files together over time to make them faster to search. Basically, the entire data layer lives in S3, and queries need to finish rapidly.
One way to do this would be to have a really massive idle query cluster with tons of CPUs available to go do a massive analysis of lots of data in parallel, but then you're paying for all that idle query time.
The beauty of Lambda functions is that you can burst pretty rapidly to massive scale, then bring it back down again and not pay for idle compute all the time. If you were going to query continuously for 24 hours, our architecture also allows you to run our queriers on something like an EC2 instance that hangs around all the time. But in the most common case, people want to jump in, investigate a few things, dig into the data, get answers rapidly, and then back off. Lambdas are a really cool fit for that particular problem.
So in the early days of Scanner, we were trying to decide how to use Lambdas and whether this was a feasible solution, so we tried a bunch of different languages.
I'll show you some results for Python, Java, Go, and Rust. I think JavaScript would have been good to experiment with here as well, but as you can probably tell, we went with more systems-level languages, since we really need to be unbelievably efficient. JavaScript can be really crazy efficient, but the ability to process massive amounts of data and handle concurrency was really important to us, so we went with those. In this talk I'll go through our experience running experiments with each of these languages, the performance trade-offs, the details of running them in Lambda functions, and how fast they were. I'm going to highlight the key takeaways real quick.
There are six takeaways. The first is that Rust and Go are very fast for this use case, analyzing a lot of S3 data in parallel. I would say we could have gone with either Go or Rust to solve this problem, and both would have been good choices from a performance perspective. We went with Rust because we really enjoyed the memory-safety and thread-safety guarantees, because there's a lot of parallelization; even within the Lambdas we're running a bunch of threads in parallel to try to eke out as much performance as possible.
Anyway, Rust and Go are fast and nearly equivalent for this use case. Java was slower than we expected, even when you enable SnapStart. Our use case is: we need to burst up a lot of compute from a cold start and then let it all disappear again.
If you're waiting for Java to spin up, you don't want to add half a second, sometimes multiple seconds, to each query just to get started; you want things to boot up immediately. So Java was slower than we thought it was going to be. Python was the slowest, and maybe that's not a surprise. Python has a lot of cool libraries for managing data, but it's not the most optimized systems-level language, so it was perhaps very clear it would be the slowest. The fourth takeaway was interesting, though: a lot of the bottleneck comes down to how quickly you can parse data.
The parsing library you're using for your data format, and the data format you choose itself, make an unbelievable difference.
The fifth is kind of interesting, and it may change over time; I don't know exactly what the backend architecture evolution looks like for Lambda functions. But we found that if you want optimal S3 network performance, you should make your Lambda memory allocation at least 640 megabytes. You plateau at about 85 to 90 megabytes per second of read throughput from S3.
Which is pretty good. If you have massive data sets but you have a lot of Lambdas working on them all together, that actually allows you to make progress quickly.
The last takeaway is that there are some interesting trade-offs between the two CPU architectures you can choose for Lambda functions, x86 and ARM. ARM is cheaper, and it also sometimes has better performance.
For many use cases I think it is better, but for our use case it wasn't always better than x86. There are some interesting things you might want to measure, though, because the cost does drop quite a bit if you're using ARM.
Cool. I'll talk about the language performance experiment here. We wanted to take a really standard data format for our first experiment comparing languages, something where each language had probably evolved enough and built enough libraries to handle it efficiently, so JSON in this case; maybe there's a highly optimized JSON library in every language. We wanted to see: if I had to read one gigabyte of JSON logs, compressed with zstandard, from S3, how quickly can I do that?
That's not exactly the way Scanner works; we don't just read JSON, we've built our own data format. But this is where we started: how do all of these languages perform with this common data format? The languages we tested were Python, Java, Go, and Rust. For each of those, we tested the two CPU architectures supported by Lambda, ARM and x86, and we tried a bunch of different JSON parsing libraries in each language to find the one that was optimal. Then we ran 10 cold starts at each memory allocation, starting at 128 megabytes, which is the smallest Lambda allocation you can have, then 256, adding 256 each time, all the way up to about 3 gigabytes.
Then we jumped from that all the way up to 10 gigabytes. We wanted to see: does memory make a difference? Do you get better hardware, better network throughput, better CPUs or more CPU capacity? What does it look like performance-wise if you increase the memory of the Lambda functions? One of the things I'll show is Scanner itself, just to give a feel for what needs to happen and how much data there is to process.
You can search and so on inside of Scanner, zoom in on different histogram bars; there are a lot of cool features here, but I'll just zoom in on a particular query that can happen a lot.
This data set is really large: 250 terabytes of CloudTrail logs. If you were to run this in Amazon Athena, it might take a few hours; we had cases where it would run for an hour and a half and then crash. What's happening here is that Scanner has found roughly 25,000 index files that are relevant, and the Lambda functions are analyzing those index files. Instead of reading all the data, the index files guide it to the regions that contain hits for this token associated with this field. So here, the data searched is 250 terabytes, and we had to read about 3, almost 4, terabytes, and that happens quite rapidly, in tens of seconds instead of hours. That's why the Lambdas are beautiful, and why the ability to get data out of S3, parse it, and understand it as rapidly as possible is the critical thing to get right.
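To put rough numbers on that, as a back-of-envelope rather than exact figures from this query: at the roughly 85 to 90 megabytes per second of S3 read throughput we see per Lambda, reading about 3.5 terabytes in around 30 seconds works out to something on the order of a thousand Lambdas running in parallel. That's exactly the kind of short burst of compute that would be painful to keep around as an idle cluster.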
Anyway, I'll go back to the experiment here. Before the overall results, I'll actually check comments real quick.
I see, oh yeah, I love that from Daryl. There is definitely a love-hate relationship with Rust, where it makes you take your medicine and not do dangerous things, but it can take a long time to agree with the compiler. Eventually you understand what the compiler is going to get mad at you for, and you have patterns that are workarounds. But I've always been shocked at what happens when you're trying to do something like a graph data structure in Rust, where it just falls apart and it becomes really challenging to get the borrow checker to be happy with you. Anyway, feel free to add more questions or chats; happy to answer.
Here are the high-level performance results. These charts are kind of small, but the idea is that blue is Python, green is Java, purple is Go, and orange is Rust, and this is how long it took to read one gigabyte of JSON logs from S3 from a cold start. This shows the best-case scenario for each language, using its best CPU architecture and JSON parsing library. You can see that Python is always quite a bit slower; it's hundreds of seconds to read that one gigabyte of data.
And as I just showed in that query example, we have to read a couple of terabytes to get through some of these queries, so it really isn't going to work if it takes over a hundred seconds for one Lambda to do its small part of the job and read a gigabyte of logs. Java does better, and Go and Rust are extremely good.
Go was just marginally slower than Rust. I'll bet we could probably eke out even better performance, and I'd say they're both really good for this use case, but the purple and orange bars are very small, and I'll get into the details of exactly how much time it took for them to read a gigabyte; we were very pleased with how fast it was. This slide shows some additional information on what the configuration had to look like for each language to reach peak performance. Python required a lot of memory, and Java too, to get the fastest network throughput and the best CPU resources, which gave us the best performance.
But yeah, it was a little more than 10 seconds for Python, while for Go and Rust we were able to get to the point where you could read a gigabyte of this JSON data in two seconds, which was really cool. And there wasn't much variance; they were all pretty consistent.
Then this is interesting, a look at cold start time. At first Java was really bad, because we weren't using SnapStart, we were just trying to launch it from a cold start. I think everyone kind of understands this with Java, and it's a battle people have with Lambda functions.
SnapStart makes a big difference here, and I'll talk about that, but the cold start time for Java without SnapStart enabled is really slow.
It can be like 500 to 600 milliseconds, and it gets maybe a little bit better as you allocate a lot of memory, but it takes a long time just for the Lambda to wake up and start to work. Rust and Go cold start times were like 30 to 60 milliseconds, so much better. Now, SnapStart is really important. I'd have to check whether SnapStart is usable now with ARM64 CPUs, but at the time we ran this experiment, SnapStart wasn't available for ARM. That's a bit of a bummer, because I think if you're using ARM CPUs with Java you get good cost savings, and I think the performance is better in general, but the cold start time was not very good.
Maybe someone in the chat knows whether Java SnapStart is available now for ARM64 chips; I'd be super curious, but at the time it wasn't. So, cool.
Cold start improves dramatically when you're using SnapStart. SnapStart takes a snapshot of the JVM after it's warmed up; I'm not intimately familiar with everything that goes on during JVM boot-up. But the cold start time is way better than Python's once you have SnapStart enabled, and it's pretty close to Go at low memory allocation levels. Go and Rust are pretty much tied, with really fast cold start times; Rust is a little bit faster at cold start than Go everywhere. So once you have SnapStart it's good, but Java is still considerably slower than Rust at a cold start.
So SnapStart doesn't quite get you all the way to Go and Rust performance, but it does make a difference.
I think SnapStart is great for things like API request handlers, where you have traffic coming in all the time. For us, it's: I want to burst from nothing, get answers as fast as possible, and then remove that compute entirely. It all depends on your use case, but for us, SnapStart helps Java's cold start, and Java still couldn't overcome being too slow compared to Go and Rust. The cold starts were actually pretty good for Java, but if I go back here, even this green bar, which is kind of hard to see, shows that Java is still dramatically slower, like four times slower than Go and Rust, at finishing the task of reading and processing this JSON data.
So okay, I’m gonna go through some performance details for each language and the trade-offs between the two CPU architectures.
Cool. Roughly, this is what the Python experimental code looked like. There are some changes; this is maybe a little simplified compared to the experiment. But the idea is: initiate a GetObject request, stream the body down and decompress it, and as we're decompressing chunks, take each line and parse the JSON. In every language we did basically the same thing; the data was all JSON lines, compressed.
The interesting thing here is that x86 and ARM were pretty similar; ARM was a little bit slower than x86 for Python. We reached peak performance at almost two gigabytes of memory, and it still took like 10, 11, 12 seconds to complete the task. So still too slow. Java was better, but not good enough.
We were kind of hoping to use Java, because Java has a more mature ecosystem than Rust and Go in some ways. I think Rust and Go are both extremely mature, but Java has a lot of AWS support, and Rust is kind of new there.
Anyway, with Java we really tried to get it to be as fast as possible by experimenting with multiple JSON parsing libraries, and I'll talk about that as well. Here's what the code roughly looks like: we get an object, we decompress on the fly, and as we're decompressing chunks and reading lines, we parse the data.
And this is a breakdown of what that looks like. The parsing library made a pretty big difference. jsoniter was the fastest; org.json, which is maybe one of the most popular JSON parsing libraries in Java, was slower, and a lot slower on x86. Amazon talks about this a lot: if you switch to ARM64, it'll be cheaper and in many cases faster, and I'll bet that for Java programs ARM64 is often faster than x86. That's this blue bar here: across the different Lambda memory allocations, the standard-ish JSON parsing library was a bit slow on x86, but all of the libraries were quite fast on ARM64.
Then the duration results are kind of interesting: whether you use cold starts or SnapStart, the total duration of the task was roughly the same. Even though SnapStart really helped the Java programs get started quickly, they were still quite slow to finish the task of reading all this data from S3.
These bars all look pretty similar all the way along, and in fact the SnapStart runs, which have a much smaller cold start, were often a little bit slower overall, a few seconds slower to finish the job than just running from a cold start.
So Java was interesting; there are interesting differences between the parsing libraries, but it was just too slow for this job. We were excited by Go. One surprising finding from using Go is that the JSON library matters a lot. There's a huge performance difference between the standard library's JSON parser and the highly optimized JSON parsers you'll find out there. Here's what the Go code roughly looked like: you pull in the object, you decompress it on the fly, and then this shows how you can use fastjson to parse the data, which makes a huge difference. So here are Go's performance results.
If you use the fastjson parser, the entire task goes about 10 times faster, and Go becomes very comparable to Rust in terms of performance.
I think the ergonomics of using the fastjson parser are worse than using the standard library, but if you really, really care about performance, we found you have to do a lot of research and try a lot of different libraries and techniques for parsing data to make Go as fast as possible. If you look here, the difference is huge: it's like 20 seconds to complete with the standard library JSON parser versus two to three seconds to parse a gigabyte of JSON logs with fastjson.
So really pay attention to the choice of library. It can be a lot of work, but finding the right libraries to handle your use case and squeeze out as much performance as possible can make a huge difference.
I'm curious what people in the Go community think, but I feel like Go often makes this trade-off where the standard library is really focused on ease of use, and performance is important, but if you really want to go crazy with performance, the way people writing C or C++ would, you have to go outside the standard library. This may have changed; we ran this a while ago, and maybe the standard library JSON parser is getting better, I'm not sure. Anyway, this was a surprising finding: we had completely written off Go at first and then realized that Go is actually a good choice for this use case as well.
Cool. And then finally I'll chat about Rust. This is a similar kind of structure again: we're just trying to read data on the fly and parse it on the fly. The two libraries we used for this experiment were serde_json, which isn't actually in the standard library but is the most frequently used JSON parsing library in the community, and simd-json. SIMD is single instruction, multiple data; the library is extremely optimized around vectorized CPU operations.
So it's kind of insane JSON parsing, and it worked really, really well. The serde_json parser was pretty good, but the simd-json parser was still three times faster. And this is kind of interesting as well: a lot of libraries have really cool optimizations for x86 and don't have the same for ARM64, or they come much later.
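Just to give a feel for the shape of it, the Rust version looked roughly like this. This is a simplified sketch, not our exact experiment code; it assumes the aws-sdk-s3, zstd, serde_json, and tokio crates, and the bucket and key names are placeholders:

```rust
use std::io::{BufRead, BufReader};

use aws_sdk_s3::Client;
use serde_json::Value;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // S3 client from the Lambda's environment configuration.
    let config = aws_config::load_from_env().await;
    let client = Client::new(&config);

    // Placeholder object holding zstd-compressed JSON lines.
    let resp = client
        .get_object()
        .bucket("example-bucket")
        .key("logs/example.json.zst")
        .send()
        .await?;

    // Simplification: buffer the whole body, then decompress and parse line by line.
    // (The real code streams and decompresses chunks as they arrive.)
    let compressed = resp.body.collect().await?.into_bytes();
    let decoder = zstd::stream::read::Decoder::new(&compressed[..])?;
    let reader = BufReader::new(decoder);

    let mut count = 0u64;
    for line in reader.lines() {
        // serde_json here; simd-json is the faster alternative we compared against.
        let _event: Value = serde_json::from_str(&line?)?;
        count += 1;
    }
    println!("parsed {count} log events");
    Ok(())
}
```

Swapping serde_json for simd-json in the parsing step is essentially the difference between the two configurations we compared.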
Looking at the chart here... oh, I guess this chart is actually wrong; I think I accidentally copied the Go one again. I could find the Rust one, but the numbers are much, much lower. Okay, maybe in a second I'll see if I can find the original chart for the Rust data. But basically, there was a huge difference between simd-json and serde_json, where simd-json was much faster, especially on x86, which can be a bummer because you might want to use ARM64.
Oh cool. I see, yeah, we can post this in Discord, and I'll maybe put these in a PDF. This is what I get for trying to use Figma slides: it's a cool design, but I'm kind of new to Figma.
Cool. But here's another interesting point: if you're trying to build a system that needs to parse a huge amount of data, then even if you do the optimal thing with an inefficient data format like JSON, you're never going to reach optimal performance. Even though Rust's simd-json is extremely fast, we decided to write our own proprietary format and use Mozilla's bincode serialization and deserialization library to parse our index files. We really tried to make JSON work because it's open, and bincode is pretty much optimized for Rust, but we had to go with bincode: it was a massive improvement, something like a 4x improvement in speed, to use a really highly optimized data format for this particular use case.
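Just to illustrate the idea, here's a minimal sketch, not our actual index format; it assumes the serde derive macros and the bincode 1.x-style API:

```rust
use serde::{Deserialize, Serialize};

// Hypothetical index entry; Scanner's real index structures are more involved.
#[derive(Serialize, Deserialize, Debug)]
struct IndexEntry {
    token: String,
    file_offset: u64,
    length: u32,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let entry = IndexEntry {
        token: "AssumeRole".to_string(),
        file_offset: 1_048_576,
        length: 4096,
    };

    // Serialize to a compact binary representation.
    let bytes: Vec<u8> = bincode::serialize(&entry)?;

    // Deserialize; unlike JSON, there's no text to tokenize, so this is very cheap.
    let decoded: IndexEntry = bincode::deserialize(&bytes)?;
    println!("{decoded:?}");
    Ok(())
}
```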
This slide is scanning a few gigabytes of Scanner index format data. We tried using JSON for the format, and that was okay, but using bincode for the format was amazing; it gets even better. So definitely, if you're dealing with massive amounts of data, spend a lot of time on which libraries you use to parse the data, but also on which data format you use for the underlying data. The next frontier for us is actually changing our data format again. We're using bincode with our own proprietary format, which is optimized for Rust, but there are new libraries all the time and a lot of really cool data format innovation happening in the Rust ecosystem.
There's a library called rkyv (pronounced "archive"), and what it does is zero-copy deserialization.
You basically load the data into a buffer that's contiguous in memory, and you can still operate on it like a normal Rust data structure, but without pulling it apart and rebuilding the structure at parse time. You just load it into memory directly, there's one additional pass that confirms the data is valid, and then it's unbelievably fast. This is our next step, and it may improve our query speed by another 2x; some experiments show a 3x improvement. So when you're dealing with massive data sets, I think it really pays to spend a lot of time figuring out what your language supports, what data formats to use, and what parsing libraries are available.
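Roughly, the idea looks like this. It's a simplified, hypothetical sketch rather than our actual index structures, and the exact API differs between rkyv versions; this assumes the 0.7-style API:

```rust
use rkyv::{Archive, Deserialize, Serialize};

// Hypothetical index entry; the real Scanner index format is more involved.
#[derive(Archive, Serialize, Deserialize)]
struct IndexEntry {
    token: String,
    file_offset: u64,
    length: u32,
}

fn main() {
    let entry = IndexEntry {
        token: "AssumeRole".to_string(),
        file_offset: 1_048_576,
        length: 4096,
    };

    // Serialize into one contiguous buffer.
    let bytes = rkyv::to_bytes::<_, 256>(&entry).expect("serialization failed");

    // "Deserialize" by reading the buffer in place: no copies, no per-field
    // allocation. Real code would validate the buffer first (rkyv has a
    // validation feature for that); unsafe keeps this sketch short.
    let archived = unsafe { rkyv::archived_root::<IndexEntry>(&bytes[..]) };
    println!(
        "token = {}, offset = {}",
        archived.token.as_str(),
        archived.file_offset
    );
}
```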
And now we're going a little bit crazy, because rkyv also gets you CPU-cache-locality performance boosts, since the data is contiguous.
If you have a data structure with a bunch of different pointers and strings and hash maps, they're all together in the same chunk of memory, and that can sometimes fit well in the CPU cache, so you can get crazy performance by being very careful about your data format. In general, though, the really cool thing about Lambda functions is that you can run many different languages there. We were very excited that Rust works so well, even though it doesn't always seem officially supported in the documentation; you get a little link somewhere, but it doesn't show up in the list of supported runtimes. The support is great, though, and we think everyone should give Rust a chance there. Go is awesome in Lambdas too, if you need to do crazy amounts of data processing against S3.
The last little bit, which I thought was really interesting: we wanted to determine what network performance looks like inside Lambdas, and whether it changes based on which CPU architecture you're using and how much memory you allocate. We basically need our Rust Lambda functions to have essentially perfect bandwidth to S3 to read as much data as possible, and we wanted to know how little memory we could get away with.
So for each CPU architecture, x86 and ARM, we tested various memory allocation settings, just like before. This test, instead of reading JSON from S3, just reads a really big plain-text file from S3 that is not compressed.
We were just trying to get an understanding of the network bandwidth. This is Rust again, getting an S3 object, streaming it down, and just counting lines; that's all it's doing. We wanted to see which configuration settings gave the best performance. It was really interesting: there was this plateau. It seems like small Lambda functions with low memory allocations get worse hardware, or just not as much CPU time or network time, but as soon as you reach 640 megabytes of memory allocation, you hit a plateau where you seem to be reaching optimal S3 performance.
Here one line is x86 and the other is ARM64, and they look very similar to each other. The bigger your Lambda function gets in terms of memory, the faster your network bandwidth to S3 looks, and then it plateaus around the high eighties, about 90 megabytes per second. So if you really need fast interaction with S3, you don't need a ton of memory. I was actually expecting to need gigabytes of memory allocated to my Lambda functions to get the fastest speed, but really you start to hit the plateau once you get to 640 megabytes; you don't need any more than that.
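The test program was roughly this shape. Again a simplified sketch, assuming the aws-sdk-s3 and tokio crates, with placeholder bucket and key names:

```rust
use aws_sdk_s3::Client;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load AWS config from the Lambda environment and build an S3 client.
    let config = aws_config::load_from_env().await;
    let client = Client::new(&config);

    // Placeholder bucket/key for the large, uncompressed plain-text object.
    let resp = client
        .get_object()
        .bucket("example-bucket")
        .key("big-plaintext-file.log")
        .send()
        .await?;

    // Stream the body chunk by chunk and count newline bytes.
    let mut body = resp.body;
    let mut line_count: u64 = 0;
    let mut byte_count: u64 = 0;
    while let Some(chunk) = body.try_next().await? {
        byte_count += chunk.len() as u64;
        line_count += chunk.iter().filter(|&&b| b == b'\n').count() as u64;
    }

    println!("read {byte_count} bytes, {line_count} lines");
    Ok(())
}
```

Timing how long that loop takes at each memory allocation is what produced the throughput curves on this slide.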
Cool. And a couple of interesting differences between the two kinds of CPU architectures; some of the takeaways. If you have a wild use case like ours and you really need CPU-specific instructions for highly optimized SIMD-related things you're doing with data, libraries tend to support x86 more often than they support ARM. This example might be out of date; I feel like someone mentioned that this library now supports ARM64 for these vectorized operations. But this particular string search library, which can search through multiple strings at the same time, written in Rust, really awesome library, only supported SIMD operations for x86 and not ARM at the time we did this experiment, and that was kind of a bummer.
We wanted to just use ARM everywhere to reduce costs, but sometimes, if you really have this performance need, you have to go with x86 because it has better support for crazy optimizations.
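I won't name the exact library here, but just as an illustration of what that kind of multi-pattern search looks like in Rust, the widely used aho-corasick crate is one example of this style of matcher; treat the crate choice as illustrative rather than what's on the slide. A minimal sketch, assuming the aho-corasick 1.x API:

```rust
use aho_corasick::AhoCorasick;

fn main() {
    // Patterns we want to find in one pass over the haystack.
    let patterns = &["AssumeRole", "ConsoleLogin", "DeleteTrail"];
    let haystack = "eventName=ConsoleLogin sourceIPAddress=203.0.113.7";

    // Build the automaton once, then scan for all patterns simultaneously.
    let ac = AhoCorasick::new(patterns).expect("failed to build matcher");
    for m in ac.find_iter(haystack) {
        println!("match at {}..{}", m.start(), m.end());
    }
}
```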
And at the time we ran the experiment, SnapStart was only supported on x86. So even though ARM is cheaper, and often faster for certain use cases, it wasn't for ours, and we had to stick with x86 more often than we wanted to.
ARM is great, and it's a lot cheaper. If the performance for you is the same as or better than what you're getting on x86, go with ARM: it's 20% cheaper per unit of compute time, and AWS does tout ARM as being faster, which probably is the case for a lot of web applications, APIs, and server-based workloads. Unfortunately it wasn't for ours, and at the time it didn't support SnapStart, so the cold starts can be slow if you really need a lot of compute bursting from nowhere.
Cool. Okay, let's take a step back and talk about the high-level takeaways from our experiment comparing these languages.
Again, I'm not trying to say that Rust and Go are what you should always use in Lambdas. Honestly, from a productivity standpoint, Rust can be painful and slow things down depending on what you're trying to do, and using JavaScript or Python or even Java might be perfectly great for many use cases. But if you have this use case where you need to spin up a significant amount of compute from a cold start to scan S3 data, then you probably need to go with Rust or Go.
I have heard of people doing stuff with C and C++ and trying to get crazy things going in Lambdas, and I wish them all the best, but Rust and Go have great Lambda support.
We use Cargo Lambda, which could be another talk in itself; it makes it very easy for us to build Lambdas and deploy them. I would definitely say that if you have a use case like ours, try Rust and Go. Java was unfortunately slower than we wanted.
Perhaps that's not surprising, but given how much AWS supports Java, and how many cool higher-level libraries for interacting with AWS APIs are available in Java that aren't available elsewhere, it's a little bit of a bummer that Java wasn't fast enough. But we're very happy with Rust.
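To give a sense of what Rust on Lambda looks like in practice, here's a minimal handler sketch. It's simplified, with a hypothetical JSON event shape, and assumes the lambda_runtime, serde_json, and tokio crates:

```rust
use lambda_runtime::{service_fn, Error, LambdaEvent};
use serde_json::{json, Value};

// The handler receives a JSON event; a real query worker would read index
// files from S3 here instead of just echoing the query back.
async fn handler(event: LambdaEvent<Value>) -> Result<Value, Error> {
    let (payload, _context) = event.into_parts();
    let query = payload
        .get("query")
        .and_then(Value::as_str)
        .unwrap_or_default();
    Ok(json!({ "query": query, "status": "ok" }))
}

#[tokio::main]
async fn main() -> Result<(), Error> {
    lambda_runtime::run(service_fn(handler)).await
}
```

With Cargo Lambda, commands like cargo lambda build --release and cargo lambda deploy handle compiling and shipping something like this, which is roughly the workflow we use.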
Obviously the choice of data format is critical; it makes a big difference in how quickly you can download and parse data. But we were also surprised at how big the variance was between different parsing libraries. The JSON libraries had wildly different performance, especially in Go, where there was an order-of-magnitude difference depending on which library you were using.
So if you're disappointed with Go's performance and you're using it for a use case like ours, try new libraries and new formats; you might find a way through. Another takeaway: if you're choosing between x86 and ARM64 for your Lambda CPU architecture, you may see pretty big performance differences, because SIMD support differs between libraries.
Go with ARM if you can, because it's cheaper, and for some of the languages ARM and x86 perform similarly, so go with ARM in that case.
But in our case, ARM64 didn't have all of the functionality we needed in some of these libraries.
So, cool, that's it. This went super deep on languages, CPU architectures, network performance, et cetera. I'll check the chat real quick and see if anyone has any questions. We're having a lot of fun processing tons of data in S3 over at Scanner, and we're really grateful for Lambda functions, which make it possible not to burn a lot of cost on idle compute nodes; you can spin them up from scratch and get really good cold start times. Maybe that's another interesting takeaway: we were expecting cold starts in Lambdas to be so bad that they would really harm user experience, but with Go and Rust it was 30 to 50 milliseconds, which is quite good for our use case, not really noticeable.
I think if it were a hundred milliseconds or more, you could maybe start to notice, so we were very glad that Lambdas are very fast to boot up.
Cool. Yeah, just checking the chat.
Danielle Heberling
Looks like we got one comment here from Daryl.
Cliff Crosland
Thanks Daryl.
Danielle Heberling
Yeah, this is a super interesting use case; I always love to see stuff like this. Last call to type in any questions before I ask mine. All right, good, I get to ask my questions now. Nice. You had mentioned that when you were doing the tests, you were testing against cold starts. Usually we want to do the opposite, but I'm curious: how did you force a cold start?
Cliff Crosland
Oh yeah, that's a great question. We wanted to make sure every run really was a cold start. If you dig into a Lambda function's output logs and metrics in the UI, you can see whether there was a cold start time; the blog post may say exactly where to find the cold start information. We wanted to make sure that value was present every time. So what we would do is launch using one language, then switch to another language, run those experiments, and switch over and over again.
But you're right, sometimes we wouldn't get a cold start when we expected one, even after waiting.
But yeah, every single time we wanted to make sure the cold start log entry was there. I can't remember the exact wording, maybe you know what I'm talking about, but there's a little bit of information in the logs and also in the output.
Danielle Heberling
Yeah, yeah. I forget the wording for it, but I know what you're talking about.
Cliff Crosland
Yeah
Danielle Heberling
It’s funny
Cliff Crosland
Was probably, yeah, sorry,
Danielle Heberling
It's usually the opposite; you're like, "I don't want cold starts," but for your case you did, so that's really interesting. And then I was wondering, if you have the information and can share it, and if the answer's no that's fine, but how much did running the experiment cost for your team?
Cliff Crosland
Oh yeah. Okay, so this one was cheap. We have other cases where we spin up hundreds to thousands of Lambda functions suddenly and then make them go away. In this case, we only had to run one at a time: given this one Lambda, it's going to read this one gigabyte as quickly as possible, with the idea that eventually we'd have a bunch of Lambdas all reading a ton of data in parallel, and each one would have to be fast. So this one, I'll bet it was tens of dollars, maybe.
Yeah, it wasn't too bad.
It was basically, I guess, 10 times all of the stuff we did; at each of these memory allocations we ran 10 cold starts. Some of these ran for a while, a couple hundred seconds, and then we did that 10 times.
So maybe it was like a hundred dollars. I'd have to check, but it wasn't too bad.
We did once run a separate experiment where we loaded 250 terabytes of logs to create that data set, and that one did cost a few thousand dollars. But one of the things that's cool is that because we're focused on S3, it was much cheaper than doing this in another hosted log tool, or in CloudWatch, which might have cost $20,000 or so. So the most expensive experiment we ran was a few grand, to index and query a giant, giant data set, but this one was not bad.
Danielle Heberling
Awesome, yeah, very interesting use case: Lambda, not just for tiny projects. I'm trying to think if I have anything else. Oh yeah, since you and your team were testing a lot of different things, I was curious about how long it took you all to do the whole experiment.
Cliff Crosland
Yeah, so this whole experiment was maybe a week long. We tried a bunch of different things. It would be fun to test even more data formats beyond just JSON; there's Parquet or CSV that people often need to run through.
But we wanted it to be amenable to the full-text search we're doing. Anyway, it wasn't too bad; it was about a week of playing with a couple of different ideas. We could probably add JavaScript in here. I'm really curious, because JavaScript is probably the most widely used language for Lambdas; maybe it would actually do better in this use case than we thought. We assumed it would be like Python, but we haven't really tried JavaScript much.
Yeah. But yeah, it’d be interesting.
Danielle Heberling
That would be super interesting to see for sure. Yeah, at my company we're all TypeScript and CDK, but it's a web app, so that makes sense. Awesome, very cool. I guess the last thing I wanted to mention, or maybe more of a question for you: if people want to learn more about Scanner, where should they go?
Cliff Crosland
Yeah, for sure. Check out our website, Scanner.dev; you can see a little bit more, some videos, performance comparisons, and a link to our docs. And I'm on Twitter as Clifton Crosland, so you can bug me there. We often go to security conferences, so BSides New York is coming up, and AWS re:Invent, which probably a bunch of people here might be going to. If you're in town for those conferences, I'd love to chat. But feel free to come check us out at Scanner.dev if you want to know more about crazy log searching.
Danielle Heberling
Awesome. Well, thanks so much for sharing your knowledge and all your experience with us today. This has been really great, and we can follow up with the missing chart in Discord later for folks who are interested.
Cliff Crosland
Yeah, sorry about that. A wrong copy-paste, so Go showed up twice. That was my favorite chart, too, so, oh well.
Danielle Heberling
Oh man.
Cliff Crosland
I’ll make sure to get that to you.
Danielle Heberling
Yeah, we can take care of that. Awesome. Well thanks for tuning in everyone. I am gonna end this stream, so see y’all later.
Cliff Crosland
Thank you.