This C# Change OBLITERATED My Memory Allocation Benchmarks
May 12, 2023
• 822 views
This is the third video in the series looking at a segmented stream in C#. We examine yet another optimization of the initial design, and the benchmarks are WILD! How many orders of magnitude?! Check it out!
Transcript
Today we're going to be looking at a follow-up to two previous videos I did about trying to read large files into memory in C#. When I say large files, I'm talking in particular about files that aren't going to fit into memory even if we wanted them to, so things that might be terabytes in size, and how we can stream through those effectively, offer the ability to seek around effectively as well, and have multiple readers across that stream. If you haven't seen the two previous videos, I'll be linking them up here so that you can go back and watch them. Please go check those out first, come right back here, and continue along so you can get the full context. I had mentioned that I found an even better approach, and it all came back to streams and my opinion on avoiding streams.

So in this video we're going to look at some of the code for that, I'll show you some benchmarks, and let's just jump over to Visual Studio. Alright, on my screen we have Visual Studio, and this is actually my segmented stream. We're looking at the Read method, and what I was mentioning in the previous video was that my optimization had worked successfully, but the concept of using streams in particular is kind of crappy for what I'm trying to do. Why is it crappy? Well, if I scroll up to the actual signature for the Read method, which comes from Stream, we have to provide a byte array buffer that we're going to copy data into, and the key word for what I don't like is that we have to copy data into it.
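As a quick illustration of the copy that the Read signature forces, here's a minimal sketch (using a plain MemoryStream as a stand-in, not the segmented stream from the video) contrasting Stream.Read, which copies bytes into a caller-supplied buffer, with ReadOnlySequence&lt;byte&gt;.Slice, which just produces a view over memory that already exists:

```csharp
using System;
using System.Buffers;
using System.IO;

class ReadCopyDemo
{
    static void Main()
    {
        byte[] backing = { 1, 2, 3, 4, 5, 6, 7, 8 };

        // Stream.Read: the caller supplies a buffer and the stream
        // copies bytes into it -- the copy being complained about.
        using var stream = new MemoryStream(backing);
        byte[] buffer = new byte[4];
        int read = stream.Read(buffer, 0, buffer.Length);
        Console.WriteLine(read); // 4

        // ReadOnlySequence<byte>: a view over existing memory;
        // Slice produces another view, and no bytes are copied.
        var sequence = new ReadOnlySequence<byte>(backing);
        ReadOnlySequence<byte> slice = sequence.Slice(2, 4);
        Console.WriteLine(slice.Length);      // 4
        Console.WriteLine(slice.FirstSpan[0]); // 3
    }
}
```

The difference in the two halves is exactly the distinction the video draws: one API hands data back by filling your buffer, the other hands back a window onto data that is already resident.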
So yes, we can seek across a stream and jump to any point we want, but if we want to actually examine what we have in the stream, we have to copy bytes. In my particular implementation, the segment map that backs my segmented stream already has this stuff pulled into memory, so why do I need to go ahead and copy that memory around just to be able to read it? I figured there had to be a better, more efficient way, and what I came up with is something that I think is in the spirit of using things like spans and read-only sequences. I was able to add an extension method to my segment map, and it lets me use my segment map as if I had spans over that segmented data. Awesome.

So here's my API for my extension method. What I have is an IEnumerable of tuples (or "tuh-ples", if you pronounce it that way), each holding a ReadOnlySequence of bytes that I just called Slice, plus a boolean, so basically two values come back for each item in this IEnumerable. Because it's an extension method, we have this syntax where `this` precedes the ISegmentMap parameter, and it lives on a static class as a static method. If you're not familiar with extension methods, instead of passing in a segment map, it looks like you're calling segmentMap.IterateSlices, so it's nice syntactic sugar that makes it look kind of cool. Really the only other parameters we have are offset and length. If you think of a segment map as something that resembles an entire disk, or in this case a large file, what we're able to do is address any offset within it and any length within those bounds. So in theory, if you wanted to start at zero and read all two terabytes of some image, you'd be able to do that just by providing zero as the offset and the equivalent of two terabytes as the length. Of course we can't pull a whole two terabytes into RAM; that's what the segment map is supposed to help us with. But what this API should let us do is read without doing all of the additional copies. I still have a FIXME in here because I'd like to clean this up and maybe add some validation for some of these inputs, but that's to follow.

The rest of this code is actually oddly similar to my segmented stream's Read method; in fact, I copied and pasted a lot of it over and then changed the part that does the actual array copying. In particular, I might be able to get away with removing some of these variables; I just have them left over because, like I said, I copied and pasted and didn't want to break too much. Let's scroll down a little lower and see what's going on. It's the same logic that I had in the other place for actually getting segments pulled out. In fact, I already changed the other one to use debug asserts, and I'll probably do the same with some of these; they were just for sanity checking as I was testing it out. We pull in the right segment for what we're trying to load. For example, if we're at a particular offset, these two methods together get us the right segment for that offset, and within that segment we then know which offset we want to be at. The part after that gets the slice out of that segment, and we have it as a subsequence. Beyond that, it's just moving all those other counters along, so like I said, we might not actually need a bunch of these; I just had them copied over because they were working before in my stream. The last little part that's different here is that I yield return the subsequence, and then I have this other piece, which is just whether or not there's more data. From my perspective I don't actually need that second piece; I just figured it might be nice if you're inside a foreach loop and want to know if there's more data still coming. So this method is really simple and really similar to the stream implementation I had, and the nice thing is that the stream version was already working, so I don't have to try too hard to prove that this one works. But I am going to follow up with writing some tests.

Okay, now that we've seen that, let's go look at some of the benchmarks I have for this. To briefly explain my benchmarks: I'm using BenchmarkDotNet, and I'm pointing at an image of a hard drive that I have on my desktop. This file, image.dd, is just a 40 gig hard drive image. I don't have any setup or cleanup; I've actually just left those in, and I can remove them since I don't really need them right now. Let's go look at the two methods we're benchmarking. The first uses the segment stream, and what I do in this benchmark is make a new segment map. I'm trying it out with a 4 meg segment size and the ability to hold up to 50 of those segments in memory, so that would be 50 times 4 megs, or 200 megabytes of RAM in total for this stream. Something interesting about these particular benchmarks is that because I'm not doing anything to seek back and forth, the value of having more and more segments cached isn't really noticeable, because I'm only going to be reading forward.
In fact, holding a cache of segments that we're never going to look at again is kind of just a waste of RAM, but it's the same across both of our benchmarks, so I'm not too worried about it skewing anything right now. The next part is that we just have to wrap our segment map in a stream, which is the segment stream that I showed at the beginning of this video and that the two previous videos focus on. Finally, I have a 4K buffer that I'm going to be reading into, and we read from the start to the end. Like I said, it's going to be a 40 gig image, so we're basically doing 4K copies of the data in that image, buffer by buffer. Pretty simple, and it runs with no issues. I'll show you the output after, but let's go look at the other benchmark, which uses my slice iterator extension method.

Again, we make a segment map. I could refactor this and pull it into one spot so that when I make a new segment map it has all the same parameters; I haven't done that yet. One thing I want to point out is that I cannot use the same instance of a segment map between these two benchmarks, and I can't use the same one across runs of a benchmark. The reason is that the segment map caches data: the second time it's used, it will pull all of that stuff from memory, or at least as much of it as fit into memory. Initially, when I was working on this over just a one gigabyte file, I could have accidentally allocated enough cached segments to hold literally the entire file in memory. The subsequent benchmarks would then be skewed, because they'd be measuring something different: not reading from disk, but pulling from memory again. And of course, keen viewers are going to say, well, it's not really fair to benchmark this stuff when it's reading from your disk; there could be other things going on in your computer at the time. Yep, I totally get it. I'm just trying to show what will actually happen in real-world scenarios. If I wanted better benchmarks that prove the nitty-gritty details more accurately, I could of course make some more contrived data sets that don't load from disk, but what I'm trying to do is show my different pieces working together, and I just wanted to see whether, across several runs, I'd be able to notice a difference.

The final part of this benchmark is really just this foreach loop, and as you can see, I'm just calling the segment map's IterateSlices. Recall that I said it's an extension method, which is why I get this nice syntax here. I'm starting at offset zero and then just going to long.MaxValue, and the nice thing is that this gets trimmed at the actual length of my segment map. My segment map is not long.MaxValue bytes long; this is just a safe way to read to the end. Now that I'm reading this and talking about it in the video, I could actually have different overloads for this: maybe no parameters just does the whole thing, or there could be an offset-only version, or a length-only version. Maybe the length-only scenario doesn't really make sense, so I'm going to play around with the different APIs afterward, but I just wanted to prove it out first.

Alright, that's probably enough of me blabbing about code. I know you're probably interested in the benchmarks, so get that drum roll going and let's go see what the output is. The first row we have here is for the segment stream; recall that this is for reading a 40 gig file and streaming across it. You'll notice that the mean is about 24 and a half seconds. That's not too bad; I'm not really concerned about that amount of time, I just want to be able to compare the two. When we look at the memory allocated, though, there is a ton of it, because we're doing 4K sections at a time: the total amount we ended up allocating is almost three gigs of small allocations. If we jump down to the slice iterator, it's at 21 seconds, so it's faster, which is nice. It's not like it was half the time or something ludicrous, but a little bit faster; I'll take all the performance I can get. If we go look at the memory allocated, though: boom, we have multiple orders of magnitude less memory allocated. This is actually mind-boggling to me. I probably could have assumed it would be a lot less, like it is, but just seeing the result is kind of wild. The API itself gives me the same benefit, where I'm able to start scanning across the different parts of that file with pieces of it already pulled into memory, but without having to do all those little allocations. The amount of memory it ends up using is so much better that I'm just stunned.

Alright, those are pretty awesome benchmark results, but I still have my work cut out for me. Everything I've written so far that actually uses this segment map goes through the stream approach, and that means everything I've coded so far is used to this stream API
and now I have to go change everything over to this foreach iterator approach, so I'm going to have to revamp a lot of code. I still haven't run this through a profiler to see if there's any other performance I can squeeze out of it; that's still to follow, and I'm excited to follow up with more optimizations. Personally, I'm really happy with the progress so far: moving away from the stream API and being able to use things like read-only sequences and spans a little more effectively. So thanks for following along with these videos. I'm still hoping to find some other optimizations that I can make follow-up videos on, and if I get things working the way I expect and hope, I might also follow up with some of my use cases for consuming this, so that I can show you why I'm actually trying to scan across large sets of data. Again, thanks for all your support, and you might want to watch these videos over here.
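Since the video describes the extension method but doesn't paste it, here's a rough sketch of what an IterateSlices-style API could look like. The ISegmentMap shape, the names, and the in-memory segment store are my own stand-ins for illustration, not the author's actual implementation:

```csharp
using System;
using System.Buffers;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the video's segment map: fixed-size
// segments assumed to already be cached in memory.
interface ISegmentMap
{
    long Length { get; }
    int SegmentSize { get; }
    ReadOnlyMemory<byte> GetSegment(int index);
}

static class SegmentMapExtensions
{
    // Sketch of the iterate-slices idea: yield zero-copy views over
    // cached segments, plus a flag saying whether more data follows.
    // The range is clamped to Length, so long.MaxValue safely means
    // "read to the end".
    public static IEnumerable<(ReadOnlySequence<byte> Slice, bool HasMore)>
        IterateSlices(this ISegmentMap map, long offset, long length)
    {
        long remaining = Math.Max(0, map.Length - offset);
        long end = offset + Math.Min(length, remaining);
        while (offset < end)
        {
            int segIndex = (int)(offset / map.SegmentSize);
            int segOffset = (int)(offset % map.SegmentSize);
            ReadOnlyMemory<byte> segment = map.GetSegment(segIndex);
            int take = (int)Math.Min(segment.Length - segOffset, end - offset);
            // A view into the cached segment -- no bytes are copied.
            var slice = new ReadOnlySequence<byte>(segment.Slice(segOffset, take));
            offset += take;
            yield return (slice, offset < end);
        }
    }
}

// Tiny in-memory implementation, just to exercise the extension method.
class InMemorySegmentMap : ISegmentMap
{
    private readonly byte[][] _segments;
    public InMemorySegmentMap(byte[][] segments) => _segments = segments;
    public long Length => _segments.Sum(s => (long)s.Length);
    public int SegmentSize => _segments[0].Length;
    public ReadOnlyMemory<byte> GetSegment(int index) => _segments[index];
}

class Program
{
    static void Main()
    {
        var map = new InMemorySegmentMap(new[]
        {
            new byte[] { 0, 1, 2, 3 },
            new byte[] { 4, 5, 6, 7 },
        });

        // Read from offset 2 to the end, long.MaxValue style.
        foreach (var (slice, hasMore) in map.IterateSlices(2, long.MaxValue))
            Console.WriteLine($"{slice.Length} bytes, more: {hasMore}");
        // 2 bytes, more: True
        // 4 bytes, more: False
    }
}
```

Note how the first yielded slice covers only the tail of the first segment, which matches the video's description of computing the right segment for an offset and then the offset within that segment.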
Frequently Asked Questions
What is the main focus of this video regarding memory allocation in C#?
In this video, I'm focusing on how to efficiently read large files in C# that don't fit into memory, particularly by replacing the stream-based reads in my segmented stream implementation with a new, copy-free approach.
How does the new slice iterator extension method improve memory allocation compared to the traditional stream approach?
The slice iterator extension method allows me to read data without making multiple small memory allocations, resulting in significantly less memory being used compared to the traditional stream approach, which required copying data into a buffer.
What are the benchmark results comparing the segmented stream and the slice iterator?
The benchmarks showed that the slice iterator was faster, taking about 21 seconds compared to the segmented stream's roughly 24.5 seconds, and it also allocated multiple orders of magnitude less memory, which was quite surprising to me.

These FAQs were generated by AI from the video transcript.
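For reference, a BenchmarkDotNet comparison of the two read styles generally takes this shape; the [MemoryDiagnoser] attribute is what produces the Allocated column compared above. This sketch runs over an in-memory array rather than the video's 40 GB disk image, and the method bodies are my own simplified stand-ins, not the author's benchmark code:

```csharp
using System;
using System.Buffers;
using System.IO;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

[MemoryDiagnoser] // adds the Allocated column to the results table
public class ReadBenchmarks
{
    private byte[] _data;

    [GlobalSetup]
    public void Setup() => _data = new byte[16 * 1024 * 1024];

    [Benchmark]
    public long StreamRead()
    {
        // Copy-based read: bytes are copied 4K at a time into a buffer.
        using var stream = new MemoryStream(_data);
        var buffer = new byte[4096];
        long total = 0;
        int read;
        while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
            total += read;
        return total;
    }

    [Benchmark]
    public long SliceRead()
    {
        // Slice-based read: each step is a view over existing memory,
        // so the loop itself allocates no new byte arrays.
        var sequence = new ReadOnlySequence<byte>(_data);
        long total = 0;
        for (long pos = 0; pos < sequence.Length; pos += 4096)
            total += sequence.Slice(pos, Math.Min(4096, sequence.Length - pos)).Length;
        return total;
    }
}

class Program
{
    static void Main() => BenchmarkRunner.Run<ReadBenchmarks>();
}
```

Running this requires the BenchmarkDotNet NuGet package; the allocation gap it reports between the two methods is the same effect the video observed, just on a much smaller data set.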
