BrandGhost

FASTER Than IndexOf Or Total Flop? - Benchmarking in C# DotNet

This isn't exactly the video I wanted to provide you folks but... I think it's still worth sharing! In this video, I examine an implementation of the Aho-Corasick algorithm and compare it with the IndexOf() method in C#. But there's a catch to my benchmarks... Let's find out!
View Transcript
So, this isn't the video that I wanted to put together for today, but I still think the content is going to be interesting. I was hoping to focus on string search algorithms: specifically, what's built into C# to do something like an IndexOf for looking for substrings, compared to something like the Aho-Corasick algorithm (I'm not even sure if I'm pronouncing that properly). Unfortunately, what I failed to realize is that it would not be an apples-to-apples comparison. It's really apples to oranges, because my implementation of the Aho-Corasick algorithm is focused on going through binary data, not strings. So with that said, I want to jump over to the code and look at some benchmark setups, and really this is going to be framing for how much better I need to make my Aho-Corasick algorithm to be comparable to what's built into C# for strings.

Here in Visual Studio I am using BenchmarkDotNet. If you're not familiar with it, it's a very popular benchmarking framework; there's a NuGet package available for it and tons of examples online. I'm just going to walk through what we have for a setup here. In the beginning you can see that I have some parameters, and these drive the different combinations of benchmarks that we're able to produce. What I'm looking at is comparing the length of the signature (the substring that we want to be looking for) and different counts of how many signatures we have. For example, a very simple string IndexOf would just be one signature, and the length of that signature (the length of the substring, in that case) is the second parameter we have here. The next one, the source length, is how long the string data is going to be. And then, because I have to populate what this content looks like, I'm not using a variable configuration for that last parameter; I'm just using a single five percent frequency. I figured it might be interesting later to play around with what happens if this is greater or smaller, but for now I'll keep it pegged at five percent and we'll see what happens with that.

The next part of the code I have on the screen is the global setup, which means that before each benchmark is run, this code sets up the data that we're looking at. That way we can use the parameters defined above, because as it goes through the different iterations, those parameters will be updated for us for the next run. This looks a little more complicated than it might need to, only because I am pulling in some code from the other project I'm working on; I'm using Autofac to pull in some of the classes I'll be working with, but you don't need to focus on that for now. Let's talk about some of the other setup we have. This part here just sets us up with some different signatures that we can go look up. I am using a Random class, but I am seeding it with the same numbers so that between runs we should end up with the same data. Something I need to call out, as I mentioned at the beginning of the video: I'm going to be comparing string IndexOf to look for substrings, but the Aho-Corasick algorithm that I've written actually operates on binary data, so these signatures get defined as byte arrays, but I also need them available as strings. To try to make it as comparable as possible, I am using a 1252 encoding, which should give us a one-to-one byte-to-character mapping. The next section is just so that I can create the random byte array and then later convert it to a string, because as mentioned I need both a string version and a byte array version in order for these benchmarks to be run. Now, this part isn't perfect, in that it's not going to give us a statistically perfect signature frequency in
this data, but I figured it would be as close as possible. Of course, this part is probably not done in an optimal way, but it's not part of the benchmark, and it's fast enough that I don't really care, because I can just go run the benchmarks and leave it waiting. In these last three lines, I've already mentioned that in the second part I'm converting the bytes to a string, but these lines here on 83 and 85 are related to the implementation I have in my main project: they allow me to create my Aho-Corasick searcher with the signatures that I want to find, as well as this array data iterator that will go over the bytes to search. In the original project I am operating on very large files, and that's why I need this interface to work with. So I'm already at a bit of a disadvantage compared to just the built-in string IndexOf, but I wanted to call this out because this is the constraint I have to work with. If we jump to this implementation, it is a bit of a dummy implementation, in that the API here requires that I have an IEnumerable of slices that I can work on, and they're just ReadOnlyMemory; however, because I just have this byte array that's passed in, I'm going to return it as a single slice. The actual implementation inside my project has to loop over a very large file stream, but in this particular case that's just not how it's going to operate.

Let's go look at the different benchmarks that we have. For the string IndexOf, the first one does a sequential lookup of each of the signatures: all that we're doing is looking for the signature at a particular offset. If we don't find it, we move on to the next signature; if we do find one, we increase where we're at by one and then see if we can find the next substring. The reason I'm not jumping the entire length of the substring here is that, in theory, you could have a substring that overlaps with itself. The next benchmark is very similar, except we run it in parallel, so if you look at the entire body here, it's almost exactly the same; it's really just wrapped in a Parallel.ForEach. And finally, the Aho-Corasick implementation here is just calling this Search method, and it has a callback where we operate on each match; the return value just says whether or not we should continue. So as you can see, it's already a bit of an apples-to-oranges comparison, because it's not truly just a loop here: I do have a callback, and that's going to have some overhead, so that's a bit unfair in comparison. However, like I said, I've changed the goal of this video, because I was totally wrong. I was hoping to be able to say "look how fast my Aho-Corasick implementation is," and the reality is "look how slow it is." So instead of focusing on the negative, I now have some targets that I want to work towards.

Before jumping over to the benchmarks, I do want to show you a little bit about my Aho-Corasick implementation, but what I'm not going to do is spend a lot of time explaining what the Aho-Corasick algorithm actually does in detail. The basis of the Aho-Corasick algorithm is a trie structure, and in my particular case I'm using a binary trie. This is a particular type of data structure (pronounced like "try"; it's not quite a tree in the usual sense), and you can go look up more information on it. Essentially, using a trie we're able to create a mapping of the different signatures that we want to look for. If I scroll down a little to the implementation of the actual lookup, the way this works is that it walks through the data and tries to see how much of a particular signature it can match, and the algorithm itself is essentially optimized to be able to step through until it finds a
match and then progress from there. You can go look up plenty of different implementations of the Aho-Corasick algorithm; in fact, this one that I have here is actually pulled from GitHub. I do want to go look at a different implementation that uses something like a double-array for the trie structure, because right now, on this particular line here on 59, even though I'm using a Dictionary, this ContainsKey method call is actually quite expensive when I look at where the time is spent. The last thing I wanted to mention, especially if you're thinking about the fact that I'm comparing this to a string IndexOf, is that I am using a foreach loop here, and I'm walking through the different slices that we pull out of it. In my particular benchmark this is only ever one iteration, because I'm only ever pulling out one slice for the full length of the data, so really it's just this inner loop that goes over the data. The challenge with this, even compared to IndexOf for the string, is that ultimately I do have this callback that I need to invoke, and the overhead associated with that method call, plus creating a parameter to pass back, is going to be a little more expensive than what we would otherwise have in a string IndexOf. So yes, of course I'm being defensive about why my algorithm is slow, but in hindsight I probably should have realized it's an apples-and-oranges comparison. Not totally fair, but I think it's still going to be interesting. With that said, let's go look at some of the data.

All right, so first off, I apologize to anyone watching this on mobile; I understand the numbers are probably hard to see. I did try my best to manipulate this data a little bit in Paint, so it's a little messy (you can see that in the top left here), but I did my best to pull in as much as I could so I could zoom in for you. Okay, let's get started looking at what we see here. In the first three lines we have the sequential IndexOf, the parallel IndexOf, and then Aho-Corasick. If we jump over to the mean column, we can see that in the most basic case it takes ten times as long to run Aho-Corasick as the built-in IndexOf. This is a pretty terrible showing in comparison, and I figured okay: maybe as the number of signatures grows (because this is only for one), and given a variable length of those signatures coupled with how long the source is, we would start to see something change, so let's keep looking to see if that happens. One more thing we should double-check here: if we look at the amount of memory that we allocate, it's interesting that the base IndexOf that we do sequentially allocates very little, just 32 bytes; the Aho-Corasick one is only a little bit more, but the Parallel.ForEach version is quite a bit more, especially compared to the sequential IndexOf. One final thing to note is that in this most basic case, the parallel version is actually slower than the sequential one. So, all kind of interesting in the beginning, but it kind of sucks to see that the Aho-Corasick algorithm I have is that much slower.

As we start going down and increasing the source length, the next six lines are the ones that indicate a longer source, and we can see that the Aho-Corasick algorithm actually gets a lot slower. The amount of memory used is comparable across all of these top benchmarks; however, the Aho-Corasick algorithm definitely slows down. It's now two orders of magnitude worse with respect to performance. This is pretty crappy to see, but again, maybe we'll see a difference as we vary the number of signatures and the length of the signatures. If we progress down a little further, what we notice is that this length column is changing (that's the signature length), but if we scan across these benchmark numbers, the number of signatures stays the same while the length and the source length change, and the mean always shows that the Aho-Corasick implementation is two orders of magnitude slower across the board. For me, this is probably the thing that initially made me not want to make this video, because I was so excited to show you how fast it was; but if I reframe it, this just tells me that I have a lot more performance to squeeze out of my algorithm. It's also interesting to note that with one signature we're not getting any additional performance out of the parallel version, and if you think about it, if we go back to the code, we'd see that the parallel part is only over each signature; because we only have one signature, there isn't actually any benefit to going parallel, it's literally just overhead. That's what we see for one signature.

Let's go to the ten-signature case. I'm just going to scroll down a little here and try to get it all on screen for you. If we start with ten and look across, we can see that the parallel version has actually caught up to the sequential one in performance; it still has more memory use, but if we look at Aho-Corasick in this third line, especially the mean, we can see that yes, it's still two orders of magnitude slower. If we continue to scan through most of this page, we'll see that my Aho-Corasick is almost always two orders of magnitude slower. It's just awful in comparison, and I was pretty disappointed in that. But as we get a little higher, we can start to see, especially right around this range here (that's one million in terms of source length, with ten signatures that are sixteen characters long), for the first time, that the parallel version actually starts to beat out the sequential version
. Actually, if I scroll up a little bit higher, that's not quite true; we can see it happening even above here, where the parallel is beating out the sequential. So that's kind of cool: the parallel benefit, as you might expect, is definitely measurable. Something interesting to consider is whether I can run my Aho-Corasick in parallel; we'll look at that in a future video, because I have done it, I just haven't benchmarked it here.

Let's go to the final case, where we have a hundred signatures, and we're looking at the different lengths of those. If we start at this fourth line down here, this is the first case where we have 100 signatures: parallel is definitely beating out the sequential, and Aho-Corasick is one order of magnitude slower. If we scan down, we're about one order of magnitude slower than the sequential with Aho-Corasick, but the parallel IndexOf is really flying in comparison; there's something interesting to be said there about trying to do this stuff in parallel. In terms of the amount of memory we're using, we're not talking very many bytes at all, but what's interesting is that if we watch this rightmost column, the parallel version is always using more memory for allocations, while Aho-Corasick compared to IndexOf is actually quite comparable; we're talking about a hundred bytes' difference, so really not all that much.

All right, so those are definitely some disappointing benchmark results for my algorithm, but that's okay. Like I said at the beginning of this video, I thought it was still worth putting this content together, because instead of comparing myself to string IndexOf and being disappointed, I can say: hey, if string IndexOf can go that fast, why can't mine? I have done some other improvements to my Aho-Corasick algorithm on the side. I've been benchmarking that kind of work, not in isolation exactly (that's probably the wrong word), but in comparison to itself rather than to string IndexOf, so I can see it getting faster. I will make another follow-up video with one more optimization that I've made; I've talked about it a little bit, or hinted at it, in this video. Actually, you'll notice that if you look at the benchmarks I used for IndexOf on strings, I had two variations: one was parallel, one was not. Why didn't I do that with Aho-Corasick? Well, that's because it will be in the follow-up video to this one. When I have it done, I will link it right here for you; until then, it will be another placeholder video. Thanks so much for watching, and let's see what the next results show.
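The sequential search described in the transcript — look for each signature with IndexOf, and on a hit advance the offset by only one so self-overlapping matches aren't skipped — can be sketched roughly like this. This is illustrative code, not the author's actual benchmark:

```csharp
using System;

class OverlappingSearch
{
    // Counts occurrences of a signature, advancing the start offset by one
    // after each hit so self-overlapping patterns (e.g. "aa" inside "aaaa")
    // are all counted -- the same reason the video advances by one instead
    // of jumping the full signature length.
    public static int CountOccurrences(string source, string signature)
    {
        int count = 0;
        int index = source.IndexOf(signature, StringComparison.Ordinal);
        while (index >= 0)
        {
            count++;
            index = source.IndexOf(signature, index + 1, StringComparison.Ordinal);
        }
        return count;
    }

    static void Main()
    {
        Console.WriteLine(CountOccurrences("aaaa", "aa"));   // 3 overlapping matches
        Console.WriteLine(CountOccurrences("ushers", "sh")); // 1
    }
}
```

The parallel variant in the video simply wraps the per-signature loop in a Parallel.ForEach over the signatures, which is why it can only help once there is more than one signature to search for.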

Frequently Asked Questions

What is the main focus of the video?

In this video, I focus on comparing string search algorithms in C#, specifically the built-in IndexOf method versus my implementation of the Aho-Corasick algorithm. I aim to benchmark their performance, especially in the context of searching for substrings and binary data.
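For readers who want a concrete picture of the second algorithm, a minimal Aho-Corasick over byte data might look like the following: a trie of the signatures plus failure links, walked once over the input, with a callback per match. This is an illustrative sketch, not the implementation benchmarked in the video:

```csharp
using System;
using System.Collections.Generic;
using System.Text;

class AhoCorasick
{
    class Node
    {
        public Dictionary<byte, Node> Next = new();
        public Node? Fail;
        public List<byte[]> Matches = new();
    }

    readonly Node _root = new();

    public AhoCorasick(IEnumerable<byte[]> signatures)
    {
        // Build the trie of signatures.
        foreach (var sig in signatures)
        {
            var node = _root;
            foreach (var b in sig)
            {
                if (!node.Next.TryGetValue(b, out var child))
                    node.Next[b] = child = new Node();
                node = child;
            }
            node.Matches.Add(sig);
        }
        // Breadth-first pass to build failure links.
        var queue = new Queue<Node>();
        foreach (var child in _root.Next.Values) { child.Fail = _root; queue.Enqueue(child); }
        while (queue.Count > 0)
        {
            var node = queue.Dequeue();
            foreach (var (b, child) in node.Next)
            {
                var fail = node.Fail;
                while (fail != null && !fail.Next.ContainsKey(b)) fail = fail.Fail;
                child.Fail = fail?.Next[b] ?? _root;
                // A node also matches everything its failure target matches.
                child.Matches.AddRange(child.Fail.Matches);
                queue.Enqueue(child);
            }
        }
    }

    // Invokes onMatch(endIndex, signature) for every match found;
    // returning false from the callback stops the search early.
    public void Search(ReadOnlySpan<byte> data, Func<int, byte[], bool> onMatch)
    {
        var node = _root;
        for (int i = 0; i < data.Length; i++)
        {
            while (node != _root && !node.Next.ContainsKey(data[i])) node = node.Fail!;
            node = node.Next.TryGetValue(data[i], out var next) ? next : _root;
            foreach (var sig in node.Matches)
                if (!onMatch(i, sig)) return;
        }
    }

    static void Main()
    {
        var enc = Encoding.ASCII;
        var ac = new AhoCorasick(new[] { enc.GetBytes("he"), enc.GetBytes("she"), enc.GetBytes("his") });
        int count = 0;
        ac.Search(enc.GetBytes("ushers"), (i, sig) => { count++; return true; });
        Console.WriteLine(count); // "she" and "he" both end at index 3 -> 2
    }
}
```

The callback shape mirrors what the video describes: the cost of invoking it per match (versus IndexOf's bare loop) is part of the overhead discussed in the benchmarks.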

Why is the comparison between IndexOf and Aho-Corasick considered apples to oranges?

The comparison is considered apples to oranges because my Aho-Corasick implementation is designed for binary data, while IndexOf is built for string data. This fundamental difference in data types makes it challenging to compare their performances directly.
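The video bridges that gap by using Windows-1252, a single-byte encoding, so that each byte of the binary data corresponds to exactly one character of the string version. A small sketch of that idea (note: on .NET Core/.NET 5+, code page 1252 requires the System.Text.Encoding.CodePages package and the RegisterProvider call shown):

```csharp
using System;
using System.Text;

class EncodingDemo
{
    static void Main()
    {
        // Windows-1252 maps each defined code point to a single byte, so a
        // byte[] and the string decoded from it have equal lengths and can be
        // searched comparably. On modern .NET, legacy code pages must be
        // registered first (System.Text.Encoding.CodePages package).
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        var win1252 = Encoding.GetEncoding(1252);

        byte[] bytes = { 0x48, 0x65, 0x6C, 0x6C, 0x6F };
        string text = win1252.GetString(bytes);

        Console.WriteLine(text);                        // Hello
        Console.WriteLine(text.Length == bytes.Length); // True: one char per byte
    }
}
```

A multi-byte encoding such as UTF-8 would break this one-to-one mapping for non-ASCII bytes, which is why a single-byte code page makes the string and binary benchmarks as comparable as possible.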

What are the results of the benchmarks presented in the video?

The benchmarks show that my Aho-Corasick implementation is significantly slower than the built-in IndexOf method, often by two orders of magnitude. This was disappointing, but it highlights the need for further optimization in my algorithm.

These FAQs were generated by AI from the video transcript.