INSANE Speed Unlocked On Segmented Stream Design in C#
May 10, 2023
This video walks through an optimization of our initial design for a segmented stream in C#. One small change can REALLY improve performance! But that's not all... Stay to the end, because when the follow-up video is published I'll link you directly to ANOTHER really cool optimization!
For more videos on programming with detailed examples, check this out:
https://www.youtube.com/playlist?list=PLzATctVhnsghN6XlmOvRzwh4JSpkRkj2T
Check out more Dev Leader content (including full in-depth articles...
Transcript
Alright, so this video is a quick follow-up to one of my previous videos, where we were looking at loading a really large file into C#. When we don't have enough RAM to hold the entire contents of something that's multiple gigabytes, or even terabytes, how do we do that? In that past video, which I'll link (I'll put a card up here somewhere; go watch it and then come back if you haven't seen it already), I talked about the implementation I had for segmenting that data. I had something functional, and as I was recording and showing it, I made the point that I knew it wasn't going to be optimal, but it was working.

So in this video we're going to jump over to the code and look at a little optimization I made, and if you stay right to the end, I'll link the follow-up to this one, which is yet another optimization that I'm super excited about and that has some benchmarks.

Alright, here in Visual Studio I have my segmented stream class. In the previous video I called out some potential performance implications: I had observed that it was slower, almost twice as slow as a normal stream, and I figured it had something to do with the Read method in particular that I'm overriding. Specifically, I saw this bit
of code here. I almost jumped right to the juicy part, but when I read what this code was doing, I thought: this has to be the thing that's going super slow. So let's walk through from a few lines above to see what's going on.

To explain a little: as we iterate through, we take the current location within the stream and go look up a segment. This is a segmented stream, so I have a bunch of chunks in memory representing the underlying portions of the file, plus a caching mechanism, so that you can seek around and it will keep a cache of the most recently used pieces. As we navigate through and stream the data, this method looks up the segment and then asks that segment for its sequence, which is backed by the array holding the data it represents from disk. Nothing too fancy here.

My goal was to start using things like spans and read-only sequences. These are things I haven't had a ton of exposure to in C#, and I'm really excited to learn about them and use them more. So I exposed the sequence here; if we look at its type, it's a ReadOnlySequence of bytes. Now that we have that sequence, depending on where we're reading, we don't necessarily need to start at the beginning of the segment; we might be partway through it. If your segment is, say, 4 KB or 4 MB, and you're reading the hundredth byte, you're starting a hundred bytes into that segment. So we can slice that sequence up.

What's cool is that the Slice method here is not a copy, where you copy the bytes out and put them into another array (kind of like how Substring works for strings). We're just getting a view of the data, from the segment offset for the length of the slice. We get this view without actually copying any data going from line 67 to 68, so that's awesome. Then I wrapped a reader around this, and what I was doing was actually using this reader and
then saying: while the reader can read, get the next byte, add it to the buffer, and keep moving. If you're familiar with reading data, even when it's already sitting in an array, doing things one byte at a time is going to be way slower than other methods we have available. I wasn't exactly sure what tools I had access to to make this better, but I figured: this is going to be crappy, but it's working.

Let's look at what I did. I'll just go ahead and delete that old code. What I was able to do is get the buffer (this buffer is what we're writing our data into) and wrap it in a span. Just like when we're slicing, we get a new view over this buffer starting from the offset we want to write into. The buffer and offset (I'll scroll up a little) are passed into the stream's Read method. So we take the buffer, and we use the length of the slice we want to write, so that the destination span is exactly the length and location we want to copy to.

The sequence reader has this method, TryCopyTo, that reads from where it currently is in the sequence and tries to copy that data so that it fills the destination span. What's cool about that is, instead of doing one single byte at a time in a while loop, I'm able to use all of the optimized code that .NET has behind TryCopyTo to do the entire length of the span.

This alone put me right at par between reading a normal stream and my segmented stream. I'm not going to show benchmarks for it, so you can maybe take my word for it: it was slightly slower, maybe a couple of percent, less than five percent. But this change, between what I had on screen before and what's here now, was enough to go from twice as slow to within that margin of error. So again, just to repeat: it went from copying one byte at a time into the array to using the reader's TryCopyTo for the entire span. That's awesome.

Now, when I went to implement this whole thing (and this ties into the follow-up video that I'm going to link you to at the
end of this), I realized something. The concept we just talked about, having these read-only views of the data and trying to avoid copies, is really how we're squeezing a lot of performance out of what we're looking at. But when we look at the actual API for streams, something about it doesn't sit well with me: the Read method literally requires that we pass in a buffer of bytes, which means that every read has to do a copy. If we're streaming across a data set, the API we're given requires that we provide a byte array and start copying things around.

Yes, of course you can use the Seek method to jump to the location you want, but if you're just trying to read through and scan across the data, the stream API forces you to copy bytes into the buffer. That part really bothers me, especially for my use case, where I'm trying to go across a lot of data really quickly and I don't want to copy anything I don't need to. I'll leave that as a bit of a spoiler for what's coming up.

Just to reflect on what we looked at here: we were able to squeeze out way more performance by moving from single-byte copies, which I knew were going to be bad in the first place, to a different API that lets us copy a whole buffer at a time. So thanks for watching; if you stay and watch the video linked right here, it will show you the benchmarks for the other optimization, which I think is way better than using streams.
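The optimization described in the transcript can be sketched roughly as follows. This is a minimal, self-contained illustration, not the video's actual class: the names `ReadFromSegment`, `segmentData`, and `segmentOffset` are assumptions for the sake of the example. It shows the two views at play (a `Slice` of the segment's `ReadOnlySequence<byte>` and a `Span<byte>` over the caller's buffer) and the bulk `TryCopyTo` that replaces the byte-at-a-time loop.

```csharp
using System;
using System.Buffers;

// Copy up to `count` bytes from an in-memory segment into the caller's
// buffer, the way Stream.Read(byte[], int, int) expects.
static int ReadFromSegment(
    ReadOnlySequence<byte> segmentData, // the segment's in-memory bytes
    long segmentOffset,                 // how far into the segment we are
    byte[] buffer, int offset, int count)
{
    long available = segmentData.Length - segmentOffset;
    long toRead = Math.Min(available, count);
    if (toRead <= 0)
    {
        return 0;
    }

    // Slice is a view over the same memory -- no bytes are copied here.
    ReadOnlySequence<byte> slice = segmentData.Slice(segmentOffset, toRead);

    // Destination span: a view over the caller's buffer at the right offset.
    Span<byte> destination = buffer.AsSpan(offset, (int)toRead);

    // One bulk copy instead of a byte-at-a-time while loop.
    var reader = new SequenceReader<byte>(slice);
    return reader.TryCopyTo(destination) ? (int)toRead : 0;
}

// Tiny usage example.
var data = new ReadOnlySequence<byte>(new byte[] { 10, 20, 30, 40, 50 });
var buf = new byte[8];
int n = ReadFromSegment(data, 1, buf, 2, 3);
Console.WriteLine($"{n}: {buf[2]},{buf[3]},{buf[4]}"); // 3: 20,30,40
```

The key point is that the only copy happens once, inside `TryCopyTo`, directly into the caller-supplied buffer; the slicing and span creation beforehand are just views.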
Frequently Asked Questions
What is the main focus of this video?
In this video, I'm focusing on optimizing the performance of a segmented stream design in C#. I discuss how I improved the overridden Read method to be more efficient when dealing with large files.
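A standalone sketch of the zero-copy slicing idea the video relies on (this is illustrative code, not the video's class): `ReadOnlySequence<byte>.Slice` returns a view over the same memory rather than copying it.

```csharp
using System;
using System.Buffers;

var bytes = new byte[4096];
bytes[100] = 42;

var sequence = new ReadOnlySequence<byte>(bytes);

// Slice from the hundredth byte: no copy happens, we just get a
// narrower view over the same underlying array.
ReadOnlySequence<byte> view = sequence.Slice(100, 16);

Console.WriteLine(view.Length);       // 16
Console.WriteLine(view.FirstSpan[0]); // 42

// Mutating the original array is visible through the view,
// which demonstrates that Slice did not copy the data.
bytes[100] = 7;
Console.WriteLine(view.FirstSpan[0]); // 7
```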
How did you improve the performance of the segmented stream?
I improved performance by replacing the one-byte-at-a-time reading approach with the sequence reader's TryCopyTo method, which copies an entire span of data at once. This change significantly reduced the time it takes to read data and brought the performance close to that of a normal stream.
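To make the before/after concrete, here is a hedged sketch of the two approaches over a `SequenceReader<byte>` (illustrative only; the video's actual class is not shown). Both produce the same bytes, but the second does one bulk copy instead of a per-byte loop.

```csharp
using System;
using System.Buffers;

var data = new ReadOnlySequence<byte>(new byte[] { 1, 2, 3, 4, 5 });

// Before: one byte at a time through the reader (the slow path).
var slowReader = new SequenceReader<byte>(data);
var byByte = new byte[5];
int i = 0;
while (slowReader.TryRead(out byte b))
{
    byByte[i++] = b;
}

// After: a single bulk copy for the whole span (the fast path).
var fastReader = new SequenceReader<byte>(data);
var bulk = new byte[5];
fastReader.TryCopyTo(bulk);

Console.WriteLine(string.Join(",", byByte)); // 1,2,3,4,5
Console.WriteLine(string.Join(",", bulk));   // 1,2,3,4,5
```

Note that `TryCopyTo` does not advance the reader; in a real Read implementation you would follow a successful copy with `reader.Advance(destination.Length)` if you keep using the same reader.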
What can I expect in the follow-up video you mentioned?
In the follow-up video, I will share even more optimizations and benchmarks related to the segmented stream design. I'm really excited about these improvements, and I think they will provide even better performance than what we discussed in this video.
