Using Spans for Substring in C# DotNet With Benchmark Results
Want to see how you can speed up your string-based algorithms in C# by using spans? Check out this video!
The code maze article can be found here:
https://code-maze.com/csharp-span-to-improve-application-performance/
View Transcript
today we're going to have a quick look at the span type in C sharp and specifically we're going to look at how you might be able to use that in place of strings and using substring this is actually a video topic that was requested multiple times so this is just a reminder that if you leave comments on the videos and the content that I post I'm happy to put that content together for you so without spending too much more time let's jump over to visual studio and have a quick introductory look at how you can use spans in c-sharp okay so what is a span well I thought that this codemaze.com article on spans I actually did a pretty good job explaining the basics so I'm basically borrowing a lot of the examples that I'll have today from this site so this code will be
available online and then if you want you can go click this URL and I'll try to put it in the description as well so you can go visit the site so what is a span well it's an allocation free representation of contiguous memory and you might read that or hear that and say well what the heck does that mean well a way to think about what it's not and what what it's replacing is say a substring it's actually going to require new allocation of memory to represent a smaller portion of a string right so for example if I highlight with my cursor here let's pretend this is a string if I said I want a substring and I want to take this part just to represent that substring we'd actually have to go create a whole copy of it whereas a span actually just lets
us sort of point at this data and then we can give it a length so the other way to look at it too is not just with strings but with a arrays of data you can actually do the same concept where you can point at the array and then say I want this portion of the array the other point to mention here is that a span is a ref struct and this actually has a lot of constraints for where we can use them so you might start noticing that as you're getting more familiar with spans and using them you can't have them in async methods you can't have them as fields of classes and that's because a span is actually allocated on the stack so I should add that here spans are allocated On The Stack so that's actually sort of the the limitation and
actually one of the the reasons why it's uh beneficial in terms of performance another constraint is that we can't use them in lambdas so if you have Anonymous delegates and things like that um you're unable to reference spans on those so again you may start to notice that you know you're like oh I'm gonna go optimize this algorithm I have it's going to be way faster um and then because of the complexity of it you might start to realize well oops like spans don't work here but that's okay because there's another sort of flavor of this span goodness as I put it here but there's this memory type and read-only memory and I should mention too that span also has a read-only span but these memory and read-only memory work very much like spans with respect to Performance and the abilities that a span has
however the difference is that we can use them in a lot of these other constrained places where spans do not work that's a brief introduction but you might still be wondering well I don't really get it I want to see some code I want to see how it can be used so we're going to walk through the example that's from this codemaze.com website and specifically the c-sharp post they have on spans and I'm going to also Benchmark it for you so that you can see the benefit so let's start by just calling out that I am going to be using benchmark.net as you can see here but I'm going to come back to the actual Benchmark itself in just a moment so let's not focus on The Benchmark setup just yet let's get rid of that let's walk through a substring example where we're going
to be counting the number of empty lines in a particular body of text so again this is borrowed from the code maze website and what we're going to be doing is essentially looping through the text that we want to search character by character and we're going to be looking for if you're on a Windows machine it's a backslash R backslash N is a new line right the two characters make up new line and we're actually going to be looking for the backslash n and then what we're able to do is try and see if we can get the um the portion of that string so we're using a substring to do that to represent the line that we're looking at and essentially if the line that we're looking at so the substring that we take out is exactly equal to environment new line so again
in Windows backslash R backslash n then we know that we have an empty line and we can count it and then this is just going to keep track and same with this variable the bottom for the index that we're at so this is a simple algorithm that just looks for empty lines but it's using substring to do it and as we'll see in just a moment when we compare the two algorithms um this is a totally functioning way to go look for empty lines but let's go look at the span variation so one of the first things we do in this method is actually take the text to search and get it as a span which you can see here this is just an extension method that gives us back a read-only span of this string so this span now allocated on the stack is
really just allocating a pointer to this string and a length which is the entire length of that string so this span itself is not a substring it's really representing the entire string otherwise the algorithm is almost identical but where it's going to be different if I scroll down a little bit lower you'll see that instead of calling substring we're actually calling slice now what's really cool about slice is that unlike substring slice is just going to give us back another span that we can look at and because a span is just a pointer plus a length we're not actually allocating an entire other substring and you might be saying to yourself well if we're only looking at it line by line like that's only going to be a small amount of text it can't really be that bad right and to be honest it's not
really that bad in our current example however when you start doing this with a slice and then using spans you'll see that it's dramatically better for performance so like I said the rest of the algorithm is almost identical and you can think about slice in this case just being like a substring except it's the equivalent to a substring of a span without allocating more memory for that substring so hopefully that sort of analogy makes sense and let's go now check out The Benchmark setup and then we'll go look at the results of that Benchmark all right so I'm not going to spend too much time explaining all the details of benchmark.net but in this setup method we have we're able to I'm creating a random with a seed so that between our benchmarks we have the same data and then we have this length property
that will dynamically be changed between our benchmarks so I have a thousand ten thousand and a hundred thousand text length for us to work with and that's going to be the text that we're looking over next this whole block of code in this while loop looks pretty ugly but all that we're going to be doing is because the first part of this just fills our bite buffer with random bytes I'm just going to walk back over that byte buffer and I I set up basically a 10 chance that will drop in some new line characters as we're walking through and then 50 of the time will double up the new lines and that way we get empty lines this I can also pull out and make a parameter if we want to see if there's a difference based on the frequency and we can talk
about this at the end but right now I've just kind of picked these two completely arbitrarily and if you think about it if you multiply these two numbers together so 0.1 times 0.5 that's going to give us the frequency that we actually see empty lines otherwise the only thing that we're doing at the end is taking that byte buffer and then making it the text that we want to search so if you were to go look at this text it's going to be completely jumbled up it's not really going to make any sense it's just random you know characters that we're putting in sprinkled with new lines in between and the whole point of this is just that I wanted to have a big body of text that I could work with alright so that's the setup let's go Press Play I will edit this
video so you don't have to sit through this and let's go look at the results alright so on my screen I have the results if you're not familiar with benchmark.net this is just the output that we see from that program and let's have a quick look at the columns that we have so the method is going to be the different Benchmark that we're looking at remember that text length property I mentioned well we're going to see the variations of that here in the second column and then we're going to focus on the mean which is going to be our run time and then the allocated column which is the number of bytes that we're allocating during that Benchmark so with that said let's start right at the top with the first two lines that are talking about our first Benchmark for substrings and then with
span at a text length of 1000. if we look at the mean column for these two the runtime for substrings was basically 2 000 nanoseconds so that compared to parse with span the span was less than half the time that's pretty impressive and I mean we're only looking at a thousand characters and we're talking about nanoseconds here but it's half the time so that's quite impressive we'll see if that scales up as we start increasing the size of the string that we're searching if we jump over to the allocated column we quite literally allocated zero bytes to the Heap which is quite impressive that means that we were able to do that substring variation using spans and allocating no bytes to be able to perform the same algorithm let's increase that text length by a factor of 10 so now we're looking at the third
and fourth Lines within this Benchmark output and we can see that of course the runtime for substrings has gone up by about an order of magnitude so multiplied it by 10 right and we can still see that parse with span is half the time of that so quite interesting right as we increase the size of the text that we're searching span still ends up coming out ahead in half the time that it takes the original algorithm with substrings if we go to the allocated column as we might expect we get about 10 times the amount of memory allocated for the substrings but we're still at zero bytes allocated for the span implementation this is really cool now the final example I just put it at a hundred thousand you could obviously continue this pattern if you wanted to see it in more extreme cases but
decided to stop here but as we see another order of magnitude for the run time but we're still at roughly half the time when we're using a span versus a substring again jumping to the final column spans still allocate zero bytes to the Heap and that is super impressive for us to be able to do a little algorithm like this and then not have to have garbage collection trigger on the things that we're operating on so that's just a super quick look at spans and how you might be able to start using spans in place of substrings in some of your algorithms and you can actually do the same type of things with byte arrays or other types of arrays as well and I think maybe we can have a follow-up video looking at something like that but the idea is very much the same
when you're using arrays versus strings so hopefully this was helpful hopefully got to see some of the benefits of using spans in terms of the memory allocated and also the performance thanks for watching and we'll see you next time foreign
Frequently Asked Questions
What is a span in C# and how does it differ from a substring?
A span in C# is an allocation-free representation of contiguous memory, which means it allows you to reference a portion of data without creating a new copy, unlike a substring that requires new memory allocation. Essentially, a span points to the original data and specifies a length, making it more efficient for performance.
Why can't spans be used in certain contexts like async methods or class fields?
Spans are allocated on the stack, which imposes certain constraints. Because of this stack allocation, spans cannot be used in async methods or as fields of classes. This limitation is in place to ensure the performance benefits of spans are maintained.
What were the benchmark results comparing spans and substrings?
In the benchmarks I conducted, spans consistently performed better than substrings. For example, at a text length of 1,000 characters, the runtime for substrings was about 2,000 nanoseconds, while spans took less than half that time. Additionally, spans allocated zero bytes to the heap, which is impressive compared to the memory allocated by substrings.
These FAQs were generated by AI from the video transcript.