BrandGhost

Watch Out For THIS When Downloading Large Files in C#

Downloading files is a common thing that we have to do in applications and we can download files in C# very easily! However, we have to pay special attention when we're downloading big files because it's easy to make a common mistake that could cause BIG issues in your application. The first thing we'll look at is how to ensure we don't grab the WHOLE file all at once and jam it into memory. Not a great time to be had by doing that with very big files. The next thing we'll look at is just how silly the Stream API is in C#. It's easy to make mistakes when assuming too much about the streams we have!
View Transcript
In C#, one of the APIs that I have to work with the most is probably the Stream class. The Stream class is probably also one of the APIs that I hate the most. Hi, my name is Nick Cosentino and I'm a Principal Software Engineering Manager at Microsoft. In this video, I want to walk through some oddities with the Stream class when it comes to downloading things off the internet. We're going to look at an example where we try to download a video, and this could be any large file that you might want to consider, and some of the nuances that come into play when we go do this. A quick reminder that if you like this kind of content, subscribe to the channel and check out that pinned comment for my courses on Dometrain. With that said, let's head over to Visual Studio and look at this simple example for downloading a video.

Okay, on my screen right now I have an HttpClient that I'm creating at the top, and we're going to download this video. It's about 20 megabytes from sample.com. Nothing too fancy here, just a free sample of a video file that we could check if we wanted to, to make sure that it's the right content. And it's 20 megabytes, so it's not like a trivial couple of kilobytes or a couple of bytes that we're downloading; we really have something that we can go check afterward to make sure it's okay. We're going to download this, and what I'm going to do is compare a couple of different methods. So the traditional way, and I've already included this extra parameter that we don't necessarily need to see because it's the default, but what we're going to do is look at this first method where we ask our HttpClient to go to this URL. So we send a GET request, we go to this URL and ask for the content, and we read it back as a stream. I have a stopwatch around here, and I have a little disclaimer because I've had some comments on previous videos where people are like, "Hey, you can't use that, that's not good for benchmarking." Surprise: I'm not benchmarking it. It's going to be a quick comparison, and what I want to do
after is look at some of the details for the stream that we get back. I'm going to start with this one, and we'll check it out in the debugger as well. I'll put a breakpoint here so we can look at the details in the debugger before looking at them in the console. But pretty simple, right? Download a video from this URL. Let's see what's up.

All right, so the video editor probably did a good job and trimmed that out, because it was boring while it sat there waiting for the whole video to download. I have pretty good internet too, so it should have gone pretty fast, but surprise, it took a little while. How long did it take? Well, it took about 12 and a half seconds, right? 12.59 seconds. Yeah, I know it's a stopwatch, and it's not the most accurate thing in the world for us to be using in situations like this, but it's going to get the point across when we go to compare things.

So we got the stream. If I hover over it, stream1 here, it's a MemoryStream. This is something I want you to pay attention to as we go forward. What's cool about this, though, is that we have a seekable stream, so a stream that we can seek through front to back. We can start wherever we want, and we have a length. What's important about that length is that, depending on what you're trying to do in your application when you're downloading something from the internet, you may care about how big that target is that you're about to try downloading.

There's something interesting about this. If you're not familiar with streams and the different stream APIs that we have to work with, a MemoryStream is a stream where all of the data is, well, in memory. That means we downloaded this video and we technically have a MemoryStream that's wrapping a byte array holding the full 20 megabytes of the video. It's not downloaded to my disk somewhere, it didn't do something else, it's just downloaded into an array wrapped in this MemoryStream, which is interesting. When we
did this, we got a stream that we can seek through; that's one of the properties of a MemoryStream. The reason I'm calling this out, and the length as well, is that when you go to leverage streams there are a couple of interesting things you can do with them in terms of copying data and moving through them. The API allows you to make those calls, but it doesn't enforce that the stream you're working with must be able to do those behaviors. This is the first example: like I said, we get the full stream downloaded into memory, it's seekable, and we have the full length. All great things for us to work with, but the weird thing is that it truly is all in memory. 20 megabytes for a small application like this is not a big deal, right? But if you're running a web service and you need to be able to stream things in and stream them back out, and you have many concurrent requests, you could imagine finding yourself in a situation where there's a ton of memory being used.

The other thing that was important was the size. We need to be able to look at the size of the content we're dealing with. So when I go to look at this next example, I want to be looking at the header that we're using, or one of the properties for working with the header. I had a little bit of a spoiler: this HttpCompletionOption, and it's ResponseHeadersRead versus ResponseContentRead. ResponseContentRead is the default. This is what I had originally, and if I include it here, this is technically the same thing that's going on.

But before continuing on, just a quick note from this video's sponsor, which is Packt Publishing. Packt has lots of great books on C# and .NET development, and in particular I wanted to talk about this one: Web Development with Blazor. If you're a C# and .NET developer and you've been looking for a front-end technology where you can leverage C#, this book is going to be an awesome fit for you. You can learn how to build Blazor applications so you can leverage C#
in the front end. You can check it out in the links below in the description and the comments. Thanks, and back to the video.

All right, now I want us to look at this next example that we have here. This one's going to use ResponseHeadersRead instead of ResponseContentRead. Everything else about this example is set up to be exactly the same; I've just given it a 2 as the suffix instead of a 1. So we're going to have both of these to compare, and you'll see that in the output here I'm doing the same thing, right? We're going to get the stopwatch time that it took, we're going to get the length, and we're going to see whether or not we can seek. So we're going to compare these two variations as soon as we put this in instead of the former.

And look, this is weird: it doesn't work. Why wouldn't this work, though? It's the same thing. All we've done is change it so that we're not waiting for the content to be fully read, we're just reading in the headers. We're trying to say, "Hey look, as soon as we get the headers, we can start working with the response," versus waiting for all of that content. But this isn't working; we get a NotSupportedException. Well, what's not supported? If I hover over this, we can see that stream2.Length is not supported, so reading Length on this particular stream will throw an exception. Also, if we check whether we can seek, it's false. For the first one, if I go back up and move this out of the way, we can see CanSeek is true.

So what kind of stream is this? If we check it out, it's an HttpConnection ContentLengthReadStream. It's not a MemoryStream. Something that's important to note about this particular stream, when we compare it to the memory one, is that the entire content is not preloaded in an array. When we go to use this, it's a little bit more lazy in terms of functionality. But we can't see the result of that until we address this length issue. We don't have a stream with a length on it, and that might
be a problem if you really needed to use the length on your stream. What can we do about that? Well, there is an HTTP header that should be set. It's not forced to be, but it should be set. If you're writing a client that downloads stuff from the internet, those requests hopefully get a response with an HTTP header for the content length, so we can go check that out.

All right, so something we can do is write some code that looks a little bit like this. You can do this in different ways, but we're going to ask the response for the headers, specifically the content length. This is going to come back as an IEnumerable of strings, or null if it doesn't get those headers. From there we pick the first one, and we can then do something like saying it's not found; it depends on what you want to be able to do with this, right? At the end, I just want to get the numeric representation of that header. Here I have length, which should be a numeric representation if the header was present, and we can go ahead and replace that right here. Now, I do keep the stopwatch just around the stream part here because, yeah, this is potentially going to take a little bit of extra time. It shouldn't be noticeable at all, but I want to talk about the time it takes around the stream part and getting it from the internet specifically. Hopefully when we go to run this we'll see more details. I'm not going to debug it; instead, we're going to check it out in the console.

Okay, now we have some data coming back, and we can see that the duration is about 5 seconds. That's a little bit faster than we saw the first time; it's about half the time it took. I don't know what's going on with that, but I am connecting out to the internet, so who knows exactly, right? We can see that we get the full length back; because this one is a MemoryStream, it will have the full thing. We can see that with the MemoryStream we can seek as well, so that allows us to go back and forth through
the stream. When we look at the second example, using ResponseHeadersRead, you can see the duration is significantly faster, right? An entire order of magnitude. The first time we ever ran the first one at the top it was about 12 seconds, and this one's 132 milliseconds.

What's kind of crappy is we can see that the length here is zero bytes. I was checking for the header, and I did go examine the headers coming back on this particular file request, and we're not getting the Content-Length header, so this is not going to work all the time. If you wanted to build something around this, you would have to check for the header and build in some resiliency around that, because we can't guarantee that the server we're requesting the file from will put that header in place, unfortunately.

The other thing we see here is that we cannot seek on this stream, right? That's important. What you need to make sure you're doing when you're writing code that works with these streams is that, if you need to be able to seek, you check the CanSeek property. Generally I find, and again it's not enforced, that if you cannot seek, you usually will not have a length either. Those usually go hand in hand. Again, it's not enforced in code; you can go implement your own stream and do whatever the heck you want with it. I would advise that you check that. There is unfortunately no HasLength, as far as I know, which is kind of crappy, because it would be nice if you could check that before it throws an exception. That's why I'm suggesting that, usually, if you check CanSeek and it's false, you probably do not have a length on the stream either.

However, like I was trying to show, if we go back to the code, asking for the content length is a workaround that could potentially give you a length to work with. What you could do from there is, if this did come back with a length, right, so let's say it came back and gave us the length of the original stream, like we saw in the first example with the MemoryStream, now we
know the full length of the content. Awesome. What we could do from there is build a stream that would essentially wrap that and have the length for us. I'm not going to walk through all the details of this, but you could go make your own class here, so we'll do class StreamWithLength, and then we could make it inherit from Stream, which is an abstract class. I believe it's marked as abstract. It is, good. Okay, I'm not making stuff up, that's what we like to see. But you'll notice that there is a lot of stuff that you have to go implement on a stream, unfortunately, right? That's why I'm not going to go through all the details of this.

What you could do is have a stream that you're wrapping. Maybe we'll call it inner, for the inner stream, and then we can ask Visual Studio to slap a constructor on there for us and pass it in. And since this is a stream with length, what you could do, and I'm not trying to give you the best API for this, is pass in a length specifically. Here, let me go make this a field as well. I'm just using some shortcuts, by the way. So this might say "initialize property Length". Where is the field? Is there no... that's kind of silly, I thought Visual Studio had a... oh, that's what it's doing: it sees that there is a property already, and that's why it's suggesting I use it.

So you could do this instead, where all of these go to inner, like inner.CanWrite and inner.CanSeek, and you just do these forwarding properties and forwarding methods. For the length itself, you might say, "Hey look, I'm overriding whatever the inner one says, because we know it's going to throw exceptions." And if you were able to get the length off of the header itself, you could essentially go wrap your network stream inside of this one and pass in the length from the content. This unfortunately doesn't fix your issue with seeking; you're still wrapping a stream that you cannot seek through,
unfortunately, but you could at least get a little bit further with the length.

The entire point of this is that I wanted to walk you through the fact that the Stream API can be challenging to navigate. If you're expecting that you can always seek, or that you'll always have a length, especially when you're working with network resources and things like that, it's not always going to line up well for you. If you enjoyed this video and you want to see some coverage on benchmarks, you can go ahead and check out this video next. Thanks, and I'll see you next time.
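
The wrapper described near the end of the transcript might look roughly like the sketch below. The class name StreamWithLength and the field name inner come from the video; everything else (the constructor shape, the argument checks, which members simply forward) is an assumption made to get something that compiles, not the code shown on screen.

```csharp
using System;
using System.IO;

// Sketch only: forwards every call to an inner stream but reports a length
// supplied by the caller (for example, one parsed from a Content-Length header).
public sealed class StreamWithLength : Stream
{
    private readonly Stream _inner;
    private readonly long _length;

    public StreamWithLength(Stream inner, long length)
    {
        _inner = inner ?? throw new ArgumentNullException(nameof(inner));
        if (length < 0) throw new ArgumentOutOfRangeException(nameof(length));
        _length = length;
    }

    // Overrides whatever the inner stream would say (it may throw NotSupportedException).
    public override long Length => _length;

    // Capabilities are forwarded as-is, so a non-seekable inner stream stays non-seekable.
    public override bool CanRead => _inner.CanRead;
    public override bool CanSeek => _inner.CanSeek;
    public override bool CanWrite => _inner.CanWrite;

    public override long Position
    {
        get => _inner.Position;
        set => _inner.Position = value;
    }

    public override int Read(byte[] buffer, int offset, int count) => _inner.Read(buffer, offset, count);
    public override void Write(byte[] buffer, int offset, int count) => _inner.Write(buffer, offset, count);
    public override long Seek(long offset, SeekOrigin origin) => _inner.Seek(offset, origin);
    public override void Flush() => _inner.Flush();

    // The reported length is fixed at construction time in this sketch.
    public override void SetLength(long value) => throw new NotSupportedException();

    protected override void Dispose(bool disposing)
    {
        if (disposing) _inner.Dispose();
        base.Dispose(disposing);
    }
}
```

As the transcript points out, this only helps with Length. CanSeek, Seek and Position still behave like the wrapped stream, and a production version would probably also forward the async read methods rather than rely on the base class defaults.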

Frequently Asked Questions

What are the main issues with using the Stream class for downloading large files in C#?

The main issues with the Stream class when downloading large files are that not all streams are seekable and that you may not always have access to the content length. With the default ResponseContentRead option you get back a MemoryStream, which means all of the data is loaded into memory and can lead to high memory usage for large files. With ResponseHeadersRead you get a network-backed stream (an HttpConnection ContentLengthReadStream in the video) that does not support seeking and throws a NotSupportedException when you read its Length, which can complicate handling the data.
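
Since there is no HasLength property to ask, one way to follow the video's advice is to treat CanSeek as a proxy before touching Length. The helper below is a minimal sketch under that assumption; the names StreamInspection and TryGetLength are made up for illustration and are not part of .NET.

```csharp
using System;
using System.IO;

public static class StreamInspection
{
    // Hypothetical helper: returns the length when the stream can report one, otherwise null.
    public static long? TryGetLength(Stream stream)
    {
        if (stream is null) throw new ArgumentNullException(nameof(stream));

        // Convention, not a guarantee: non-seekable streams usually cannot report a length.
        if (!stream.CanSeek) return null;

        try
        {
            return stream.Length;
        }
        catch (NotSupportedException)
        {
            // A custom stream can claim to be seekable and still refuse to report a length.
            return null;
        }
    }
}
```

With this shape, a MemoryStream yields its byte count while a network or compression stream yields null instead of an exception. A custom stream that is not seekable but does know its length would also yield null here, which is the trade-off of using CanSeek as the check.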

Why does the method using ResponseHeadersRead perform faster than the one using ResponseContentRead?

The call using ResponseHeadersRead returns faster because it completes as soon as the headers have been received, rather than waiting for the entire content to download. The body is then read from the network as you consume the stream, so the whole file does not have to be loaded into memory before you can start working with it. In the video, the default ResponseContentRead call took roughly 5 to 12 seconds for a 20 MB file, while the ResponseHeadersRead call returned in about 132 milliseconds.
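
To make that concrete, here is a rough sketch of streaming a download straight to disk so the body never sits in memory as one array. It assumes .NET 5 or later; the names LargeFileDownloader and DownloadToFileAsync, the destination path parameter and the 80 KB buffer size are illustrative choices, not something shown in the video.

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

public static class LargeFileDownloader
{
    // Sketch only: copies the response body to a file in chunks and returns the bytes written.
    public static async Task<long> DownloadToFileAsync(
        HttpClient client,
        Uri source,
        string destinationPath,
        CancellationToken cancellationToken = default)
    {
        // ResponseHeadersRead: GetAsync completes once headers arrive, before the body is downloaded.
        using HttpResponseMessage response = await client.GetAsync(
            source, HttpCompletionOption.ResponseHeadersRead, cancellationToken);
        response.EnsureSuccessStatusCode();

        // This stream reads from the network on demand; do not assume CanSeek or Length work on it.
        await using Stream body = await response.Content.ReadAsStreamAsync(cancellationToken);

        await using var file = new FileStream(
            destinationPath, FileMode.Create, FileAccess.Write, FileShare.None,
            bufferSize: 81920, useAsync: true);

        // CopyToAsync moves the data through a bounded buffer instead of one large array.
        await body.CopyToAsync(file, cancellationToken);
        await file.FlushAsync(cancellationToken);
        return file.Length;
    }
}
```

The total transfer still takes as long as the network needs. What changes is when the call returns and how much of the file is held in memory at once.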

How can I handle situations where the content length header is not present when downloading a file?

If the Content-Length header is not present, the server has not told you how large the file is, so your code needs some resiliency around that: check for the header and have a fallback for when it is missing. When the header is present, you can read the length from the response and use it to manage your stream, for example by wrapping the network stream in your own Stream subclass that reports that length. Either way, the network stream is still not seekable, so check the CanSeek property before attempting any operations that require seeking.
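
One way to do that check is sketched below. The video reads the header values as strings and parses them. This sketch instead uses the typed ContentLength property that HttpClient exposes on the content headers, which is a nullable long. The names ContentLengthReader, GetDeclaredLength and Describe are made up for illustration.

```csharp
using System.Net.Http;

public static class ContentLengthReader
{
    // Returns the size the response declares for its body, or null when none is declared.
    public static long? GetDeclaredLength(HttpResponseMessage response)
    {
        // Content-Length is exposed on the content headers, not on response.Headers.
        return response.Content?.Headers.ContentLength;
    }

    // Example of building in a fallback instead of assuming the header is always there.
    public static string Describe(HttpResponseMessage response)
    {
        long? declared = GetDeclaredLength(response);
        return declared is long bytes ? $"{bytes} bytes" : "unknown size";
    }
}
```

A caller could use the returned value for progress reporting or pass it to a length-reporting wrapper stream, and fall back to reading until the end of the stream when the result is null.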

These FAQs were generated by AI from the video transcript.