ONLY 0.7% Of The Memory Allocations When Downloading In C#!

Name: ONLY 0.7% Of The Memory Allocations When Downloading In C#!
Uploaded: 2024-06-06T12:00:24.0000000+00:00
Duration: 14 min 12 s

June 6, 2024

A 20MB file isn't all that big -- unless you're stuck with dial-up internet. Remember that? But what IS big is the difference in memory allocation that we can have when we use a couple of helpful tips when downloading files in C#.

By default, HttpClient in C# buffers the entire response into a memory stream before handing it back. However, using the techniques from my previous videos on this topic, we can scale that WAY back to less than 1% of the memory allocation to download the very same file.
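As a minimal sketch of the two approaches described above (this is not the code from the video, and the class and method names are hypothetical), the only difference is the HttpCompletionOption passed to GetAsync. The caller is assumed to supply the HttpClient, the URL, and the destination stream.

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper for illustration only.
public static class DownloadSketch
{
    // Default behavior: GetAsync uses HttpCompletionOption.ResponseContentRead,
    // so the whole body is buffered in memory before the call completes.
    public static async Task BufferedAsync(HttpClient client, Uri uri, Stream destination)
    {
        using HttpResponseMessage response = await client.GetAsync(uri);
        response.EnsureSuccessStatusCode();
        await using Stream body = await response.Content.ReadAsStreamAsync();
        await body.CopyToAsync(destination);
    }

    // Streaming behavior: the call completes once the headers arrive,
    // and the body is read from the network as it is copied.
    public static async Task StreamedAsync(HttpClient client, Uri uri, Stream destination)
    {
        using HttpResponseMessage response =
            await client.GetAsync(uri, HttpCompletionOption.ResponseHeadersRead);
        response.EnsureSuccessStatusCode();
        await using Stream body = await response.Content.ReadAsStreamAsync();
        await body.CopyToAsync(destination);
    }
}
```

Passing Stream.Null as the destination discards the bytes, while passing a FileStream writes them to disk; either way, the second method avoids holding the whole file in memory.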

Transcript

Warning: don't try this at home. Seriously, I'm a professional. Hi, my name is Nick Cosentino, and I'm a Principal Software Engineering Manager at Microsoft. But seriously, we're going to look at some benchmarks today, and it's not the recommended way that I would get you to go do benchmarks in general. But they're interesting, and they're interesting because we're not going to care about performance in these benchmarks; we're going to care about memory usage.

In this video we're going to look at downloading files from the internet in a streaming approach versus getting it all up front, and we're going to compare the memory usage of both of these patterns. If you haven't seen my previous videos on this, I'll make sure to have a pin up at the top here so you can go check that out and come back, but we're going to look at BenchmarkDotNet. Remember to subscribe to the channel and check out that pinned comment for my courses on Dometrain. With that said, let's jump over to Visual Studio and look at these very interesting benchmarks.

Okay, to start things off, I've added a new project into my solution called StreamingData.Benchmarks, and what we're going to do is add BenchmarkDotNet, a NuGet package for doing benchmarks in your C# applications. It standardizes absolutely everything for you, including the warm-up period and all the stuff that you probably wouldn't even think about if you were just trying to stick a timer around the code that you're trying to run, and it's truly awesome and simple to use. So we're going to do that, and then I've added a reference to my other project. Again, if you've watched the previous videos, this is where the code lives that we're going to try and benchmark.

Let's jump over to the code itself and see what's up. At the very top here, if I just clean this up a little bit, you can see that the single-line entry point that we have is BenchmarkRunner, to run all of the benchmarks in the current assembly. But we only have one class for benchmarks; I've just called it Benchmarks. It's not very creative naming. From here we're going to see the four different benchmark methods that we're interested in.

The important point that I want to keep reiterating as I go through this is that we're not interested in the runtime characteristics of these benchmarks. This can't be overstated: we're downloading files from the internet, and there are too many variables that we have no control over here to have reliable, consistent benchmarks. I could go run this and get one set of results, and I don't even know where the server is physically in the world that I'm downloading these files from. Who knows what happened in between my machine and that machine? So I don't want you to go looking at the performance part when we go through these benchmarks. Just completely ignore it. It's not going to be valuable, and it's not the point of this video. The point that we're comparing is memory usage in these two patterns.

As for the two patterns that we are looking at (again, I recommend that you check out the prior videos on this), when we go to download files from the internet in C# we have at least two
different ways that we can do it, but the two that we're exploring are these. If I scroll a little bit lower, you can see that we're going to either use this class that I created that's called StreamDownloader, and this one will use some header settings that allow us to basically start downloading once we get the headers received in the web request. And the default way that this works in C#, if you're not using any other special parameters, is that when you ask the HttpClient to go download the file, it will essentially go download the entire file into a memory stream before continuing on.

Now, in previous videos people left comments and they said, "Hey, it would be really good if you could kind of prove this to us, right? Like, we get that there's a memory stream, we get there's a network stream, but how big is the performance difference with respect to memory usage?" You might be shocked by this and you might not be, but the difference is truly dramatic when we go to look at the results. It's kind of wild if you're just using the built-in stuff as a shortcut and not really thinking about it. It's pretty intense.

Like I said, we're going to look at four different benchmarks here, and it's really two versions of doing it with the network stream (so the web request is backed by the network stream) and two versions where we're using a memory stream. The other consideration that I had here, which makes up the other two benchmarks, is that I wanted to show writing this to a null stream. A null stream is really just an empty stream that doesn't have an implementation, so it's not writing any bytes anywhere; it just allows you to write to it, and no memory is used. And then I wanted to also show that we could use a file stream, because this might be more representative of a real-world application, and I didn't know if we would see memory differences this way. I just figured I
would show you two different variations in case anything interesting came up between the two. Of course, if I just use a null stream, someone's going to ask, "Well, what happens if you use a real stream versus a null stream?" I think that's totally valid; that's why I wanted to include it.

Something that I did not include in these benchmarks, and that will be interesting to think about when we see the results, is this: we are talking about downloading one single large file. "Large" in this context can have a lot of different meanings. I'm talking about a 20-megabyte file in this case, but you could extend this to much larger files, and you will quickly see why it's really important to use the one variation of this, because the benchmarks will definitely show you the answer to that. The other thing that we could look at doing is, instead of one relatively big file, we could download many small files. We're not going to look at many small files in this example, but it's something that I want you to think about.

So let's have a quick step through these benchmarks. I'm going to start by going through the setup. I wanted to show you that what we're able to do is leverage the dependency injection stuff that I put in the previous video. I said I want my StreamDownloader class to work well with dependency injection, so we're able to use AddHttpClient here to get an IHttpClientFactory, which is great, and then that gets injected into the StreamDownloader. Really, just by doing these lines here, we can set up the dependency injection and get our StreamDownloader nice and easy. From there, I'm also pulling out the IHttpClientFactory itself, because we are going to use it directly when we're trying to bypass the StreamDownloader, because we're going to basically force the memory stream call.

If we go through these, you can see the network stream variation here, going to a null stream. I wanted to make sure I could pull out any of the dependency injection setup stuff into the setup.
That way, as much as possible, within the benchmarks themselves we're only focused on what we're trying to measure. That's always a goal with benchmarking. Again, we're not focused on performance here, but even for memory allocation I just want to be extra safe that we're only looking at what we're trying to exercise.

We start by doing the download, and again, doing a download in a benchmark is not a good idea. I just wanted to have a consistent way that I could show how the memory is being used. Yes, I probably could have used a profiler. I have an Enterprise edition of Visual Studio, so yes, I could have done that to compare and contrast it, but I figured BenchmarkDotNet was a good thing to use, because I wanted to show you the results in a familiar format. So don't do this, please.

The rest of what this is doing is essentially trying to make sure that we can get our stream with length, because this is the recommended pattern that I have for using my StreamDownloader, and then we're going to copy this to a null stream. You can see Stream.Null. I don't mean the actual null value, like the constant; I mean Stream.Null, which is a stream implementation that doesn't do anything.

Next we'll see the memory stream variation of this exact same thing, and you'll notice that there is no StreamDownloader here. I've already got that IHttpClientFactory created for us ahead of time in the setup, but to make it an apples-to-apples comparison I did want to say, hey, look, the IHttpClientFactory still needs to make an HttpClient for us. That's only fair, because the StreamDownloader is also doing this every time you make a request, so that does make it comparable. From there, these two things are basically done in a similar way under the hood of the StreamDownloader, but the big difference that we're going to see is that we have ResponseContentRead here, whereas ResponseHeadersRead is what the other one uses. ResponseContentRead is the default behavior, and it will give you a memory stream when you call it this way. What comes back is a memory stream, and then we do the same thing, where we take that memory stream and copy it to a null stream.

Now, the other two variations are almost exactly the same, so I will gloss over them super quickly. In these other two, we're just writing that information out to a file stream instead. This is a little bit weird: I'm basically going to be writing out to the same file, so I'm using OpenOrCreate here, so it doesn't matter if the file's there or not. We're just going to blast bytes into it, and that's the same in this other one as well. That's the big difference between the two: it's just going to be using a file stream instead of a null stream.

With that said, we have all of the benchmarks that we're going to be looking at, because I did say we're not going to do many small files, but we can talk about what that would look like once we see some of the results.

Quick note before we go ahead and run these benchmarks: you do need to make sure that you are running in Release. That's important. It's going to make sure that there's
optimizations that can be turned on. The other thing is that you want to run this without the debugger attached. A good, easy way to do that might seem kind of counterintuitive, and you'll see in just a second. If you right-click on your project in Solution Explorer and you wanted to run this without debugging, which menu item do you think you would pick? Well, probably not this Debug menu that's here, but that's exactly where you need to go, and then you can select Start Without Debugging. Kind of weird. I feel like this menu item should be named something like "Run", or something else that's not "Debug", and then you pick something that literally says, hey, do this without debugging. But that doesn't matter, because that's where it is.

This is just a brief interruption to remind you that I do have courses available on Dometrain focused on C#. So whether you're interested in getting started in C#, looking for a little bit more of an intermediate course focused on object-oriented programming and some async programming, or just looking to update your refactoring skills and see some examples that we can walk through together, you can go ahead and check them out by visiting the links in the description and the comment below. Thanks, and back to the video.

All right, and here are the results. It's going to be quite intense. Again, we don't want to look at the Mean here. We don't care about the runtime in these examples, because there's too much variability when we're downloading files from the internet. That's why I don't want you to get in the habit of downloading files in your benchmarks; it's just not a good idea. What we are interested in, though, is this final column that says Allocated, because this is where the intense results of the benchmarks are. We can see that using the network stream allocates significantly less memory than using a memory stream. The file that we're downloading from the internet is roughly 21 megabytes, and you can see that in the memory stream situation
it is fully allocating 21 megabytes to hold that entire file in memory. This is the proof; this is where we see that happening. In the other case, with the network stream, we're only using 150 kilobytes. So this is a dramatic difference: it's two orders of magnitude less memory being used.

Think about a server that you might be running where you want to be able to go download files from the internet. Again, we're not looking at the runtime of this, because it's not fair to measure it this way, but at least for memory allocation we are truly able to just stream it without using tons of memory. I don't know what the ratio is there, I'll have to calculate it after, but it's two orders of magnitude less, which is just incredible. I would highly recommend that if you're trying to download large files from the internet, you consider using a network stream, something like this StreamDownloader implementation that I've been able to show in these videos, which uses that different header option. You don't want to wait for all of the content and have it pulled into a memory stream; you only want to wait for the headers to come back, and then you can stream the rest.

Now, I did mention this idea of having many small files, and this is something else you could consider. This is why I don't want to say, "Hey, as a rule of thumb, you must use this and you should never use the other one," because I don't have data that shows whether the other one could be valuable. But here's the exercise that I want you to think about. We see that for the network stream it's 150 kilobytes, and this is a 21-megabyte file that we're downloading. So let's consider that maybe we had 100 files that we wanted to download, and they were 75 kilobytes each. That 75 kilobytes is well within this number here. So what would happen? Would it be more effective to go do a hundred of them and just download them all to a memory stream, have them in memory where they're easy to work with, and it's not a
big memory footprint? Or would it make more sense to download 100 of them at 75 kilobytes and stream them on the fly? I don't know the answer to that. It might be that the memory allocation is well within your accepted range if you were doing a hundred of them and pulling them all into memory. I don't know; that might be totally cool. Maybe we would see that the runtime difference starts to show up there. Again, you wouldn't want to measure with benchmarking this way unless you had complete control over how that stuff was working, and in my opinion, as soon as you start putting a network in between things, it's going to get a little bit dicey. You could probably simulate it with just normal streams, having a fixed source and doing a bunch of copying. But I wanted you to think about this scenario, because you may encounter something like that in the projects that you're working on. So if it's many small files versus one big file, you may want to reconsider what you're doing.

If you thought this video was interesting and you want to see other performance benchmarks, you can go ahead and check out this video next. Thanks, and I'll see you next time.
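For readers who want to picture the benchmark class walked through in the transcript, here is a rough sketch under stated assumptions. It is not the code from the video: it calls HttpClient directly instead of the StreamDownloader class (whose API is not shown here), and the namespace, class name, method names, URL, and output path are all placeholders. It assumes the BenchmarkDotNet and Microsoft.Extensions.Http NuGet packages are referenced, and it should be run in Release without the debugger attached.

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using Microsoft.Extensions.DependencyInjection;

namespace StreamingSketch
{
    public static class Program
    {
        // Runs every benchmark class found in this assembly.
        public static void Main() => BenchmarkRunner.Run(typeof(Program).Assembly);
    }

    [MemoryDiagnoser] // adds the Allocated column to the results table
    public class DownloadBenchmarks
    {
        // Placeholder (assumption): replace with a large file you are allowed to download.
        private static readonly Uri FileUri = new Uri("https://example.com/large-file.bin");
        private const string OutputPath = "download.bin"; // hypothetical output file

        private IHttpClientFactory _factory = null!; // assigned in Setup

        [GlobalSetup]
        public void Setup()
        {
            // Keep dependency injection wiring out of the measured methods.
            var services = new ServiceCollection();
            services.AddHttpClient();
            _factory = services.BuildServiceProvider().GetRequiredService<IHttpClientFactory>();
        }

        [Benchmark]
        public Task NetworkStream_ToNullStream() =>
            DownloadAsync(HttpCompletionOption.ResponseHeadersRead, Stream.Null);

        [Benchmark]
        public Task MemoryStream_ToNullStream() =>
            DownloadAsync(HttpCompletionOption.ResponseContentRead, Stream.Null);

        [Benchmark]
        public async Task NetworkStream_ToFileStream()
        {
            await using var file = new FileStream(OutputPath, FileMode.OpenOrCreate, FileAccess.Write);
            await DownloadAsync(HttpCompletionOption.ResponseHeadersRead, file);
        }

        [Benchmark]
        public async Task MemoryStream_ToFileStream()
        {
            await using var file = new FileStream(OutputPath, FileMode.OpenOrCreate, FileAccess.Write);
            await DownloadAsync(HttpCompletionOption.ResponseContentRead, file);
        }

        // Shared body: only the completion option and the destination differ.
        private async Task DownloadAsync(HttpCompletionOption option, Stream destination)
        {
            using HttpClient client = _factory.CreateClient();
            using HttpResponseMessage response = await client.GetAsync(FileUri, option);
            response.EnsureSuccessStatusCode();
            await using Stream body = await response.Content.ReadAsStreamAsync();
            await body.CopyToAsync(destination);
        }
    }
}
```

As the transcript stresses, the Mean column from a run like this depends on the network and should be ignored; only the Allocated column is the point of the comparison.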

Frequently Asked Questions

What are the two methods of downloading files discussed in the video?

In the video, I discuss two methods of downloading files: using a network stream and using a memory stream. The network stream allows us to start downloading once we receive the headers, while the memory stream downloads the entire file into memory before processing.

Why is it not recommended to benchmark file downloads in this way?

It's not recommended to benchmark file downloads in this way because there are too many variables outside of our control, such as server location and network conditions. This can lead to inconsistent and unreliable performance results, which is why I emphasize focusing on memory usage instead.
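The transcript suggests that the same comparison could be simulated with ordinary streams and a fixed source, which removes the network from the picture. The sketch below is one way that might look; it is an illustration, not the author's code. The type and method names are hypothetical, and GC.GetAllocatedBytesForCurrentThread gives only an approximate allocation count.

```csharp
using System;
using System.IO;

// Hypothetical local simulation: no network involved.
public static class LocalStreamSimulation
{
    // Read-only source that yields a fixed number of zero bytes without storing them.
    private sealed class FixedLengthSource : Stream
    {
        private long _remaining;
        public FixedLengthSource(long length) => _remaining = length;
        public override bool CanRead => true;
        public override bool CanSeek => false;
        public override bool CanWrite => false;
        public override long Length => throw new NotSupportedException();
        public override long Position
        {
            get => throw new NotSupportedException();
            set => throw new NotSupportedException();
        }
        public override int Read(byte[] buffer, int offset, int count)
        {
            int n = (int)Math.Min(count, _remaining);
            Array.Clear(buffer, offset, n);
            _remaining -= n;
            return n; // returns 0 once the source is exhausted
        }
        public override void Flush() { }
        public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
        public override void SetLength(long value) => throw new NotSupportedException();
        public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();
    }

    // Approximate bytes allocated on this thread while the action runs.
    public static long MeasureAllocatedBytes(Action action)
    {
        long before = GC.GetAllocatedBytesForCurrentThread();
        action();
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }

    // Mimics the buffered pattern: the whole payload lands in a MemoryStream first.
    public static long BufferEverything(long length) => MeasureAllocatedBytes(() =>
    {
        using var source = new FixedLengthSource(length);
        using var buffer = new MemoryStream();
        source.CopyTo(buffer);
        buffer.Position = 0;
        buffer.CopyTo(Stream.Null);
    });

    // Mimics the streaming pattern: bytes pass through a small copy buffer only.
    public static long StreamThrough(long length) => MeasureAllocatedBytes(() =>
    {
        using var source = new FixedLengthSource(length);
        source.CopyTo(Stream.Null);
    });
}
```

Calling BufferEverything and StreamThrough with 20 * 1024 * 1024 approximates the single large file, and calling them 100 times with 75 * 1024 approximates the many-small-files exercise raised in the video. The absolute numbers will differ from a real HttpClient download, so treat them as a directional check only.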

What is the significant difference in memory usage between the two downloading methods?

The difference in memory usage is dramatic. When using a memory stream, the entire file, which is about 21 megabytes, is allocated in memory. In contrast, using a network stream only allocates about 150 kilobytes, which is roughly 0.7% of the allocation, or two orders of magnitude less memory.
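The ratio behind the 0.7% figure in the title can be checked from the two rounded numbers quoted above. This is a small sketch with a hypothetical helper name, assuming 1 MB = 1024 KB and using the approximate values from the video rather than exact benchmark output.

```csharp
using System;
using System.Globalization;

// Hypothetical helper for illustration only.
public static class AllocationRatio
{
    // Returns the smaller allocation as a percentage of the larger one.
    public static double Percent(double smallerKb, double largerKb) =>
        smallerKb / largerKb * 100.0;

    public static void Main()
    {
        double networkStreamKb = 150;      // approximate figure quoted in the video
        double memoryStreamKb = 21 * 1024; // roughly 21 MB expressed in KB
        double percent = Percent(networkStreamKb, memoryStreamKb);

        // prints 0.7%
        Console.WriteLine(percent.ToString("F1", CultureInfo.InvariantCulture) + "%");
    }
}
```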

These FAQs were generated by AI from the video transcript.