Against The MSFT Rules! - Refactor Large File Downloads in C#
You folks asked for it, and I delivered! When I put together content about challenges downloading large files in C#, you wanted to see how we could take this a bit further and make helper classes out of it.
The benefit of doing this is that we can encapsulate some of the challenging behavior with downloading large files. Namely:
- We're not getting a stream with a length
- We don't want to pull the whole file into memory
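Both points can be seen in a minimal sketch, which is not the exact code from the video. It assumes .NET 6 or later, and the names `LargeFileDownloadSketch` and `DownloadToAsync` are hypothetical. The body is copied in chunks rather than buffered, and the length comes from the `Content-Length` header rather than from the stream.

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

public static class LargeFileDownloadSketch
{
    // Hypothetical helper: copies the response body to `destination` in chunks and
    // returns the Content-Length header value, or null when the server sent none.
    public static async Task<long?> DownloadToAsync(
        HttpClient client,
        Uri uri,
        Stream destination,
        CancellationToken cancellationToken = default)
    {
        // ResponseHeadersRead completes as soon as the headers arrive, so the
        // client does not buffer the whole body before handing us the response.
        using HttpResponseMessage response = await client.GetAsync(
            uri,
            HttpCompletionOption.ResponseHeadersRead,
            cancellationToken);
        response.EnsureSuccessStatusCode();

        // The length lives on the content headers, not on the response stream.
        long? contentLength = response.Content.Headers.ContentLength;

        await using Stream body = await response.Content.ReadAsStreamAsync(cancellationToken);
        await body.CopyToAsync(destination, cancellationToken);
        return contentLength;
    }
}
```

A caller would typically pass a `FileStream` as the destination so the download goes straight to disk.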
In this video, I walk you through my design decisions between two different APIs we could explore. If you want to see a more in-depth view of this, make sure to check out my livestreams where I explain how I am putting this content together for YouTube!
Transcript
You guys asked for it and I'm here to deliver: you want to see helper classes for downloading files from the internet in C#. Hi, my name is Nick Cosentino and I'm a principal software engineering manager at Microsoft. In one of my previous videos I was showing how we can download large files from the internet and how we might manage that with streams inside of C#. There's a few gotchas that we have to look out for. Now, some of the comments on that video were that people wanted to see more work that we could do with that: how we could take it a little bit further, and how we might leverage that inside of some helper classes so that we could reuse it in the rest of our code. I walked through a live streaming exercise where I went ahead and refactored all this and talked through
my API design. So in this video I'm going to walk you through the succinct version of all of those API design decisions and talk about why I went particular directions. A quick reminder that if you like this kind of content, subscribe to the channel; you can also pay attention to those live streams when they come up, and check that pinned comment for my courses on Dometrain. Let's jump over to Visual Studio and look at some of these helper classes. I'm going to start us off with the sample application that we're going to be looking at. On my screen I have a sample application, and if you pay attention to the beginning of the application, I am dealing with a service collection. I wanted to make sure that from the start we could think about building this class that we could use with dependency
containers. In particular, I'm going to end up needing an IHttpClientFactory, and that's because in the previous video I was just newing up an HttpClient. That's not really the direction we want to go, so you can see that I'm doing AddHttpClient here. This does come from a NuGet package; if I go ahead and check out the project here, it's Microsoft.Extensions.Http. So if you're building stuff in ASP.NET Core I believe this will already be available for you, but you'll want to add this into your dependency container. It's just an extension method we get. Then you'll see that I'm adding a singleton, which is our stream downloader, and this is the class that we're going to be building together. From there I'm going to go make the service provider, and then that allows us to go resolve our dependencies
in that magic way, right? It feels almost magical how automatic it ends up becoming for the dependency resolution. From there I have this URL; someone on the stream suggested that I go to het.com and go grab a sample file that we can go download. From there, basically what I'm going to do is ask our downloader to get a stream, and part of what we're implementing here is a stream where we can also get the length. We'll talk about some nuances with that. Then we're going to pass it into this other system component that I have. This is a totally contrived class; it's just to sort of mimic the situation that I had in real code. That's going to bring us into one of the big decisions for this API design that we're going to look at here. The challenge is that
stream itself, the Stream class, has a Length property on it. If I F12 and go into the Stream class itself, this is, you know, the Stream class, and if we check, we have a Length property here. The challenge with the Length property is that we can't check whether it's available: there's no dedicated property that says "has length". I'm going to get into something in a little bit that describes some other characteristics or assumptions about these classes, but there is no "has length" property. The challenge with that is that because you can directly access Length on a stream, you might be inclined to say, "Well, I'm writing some algorithm or some bit of code that needs the stream. It has a Length property, so I'm just going to ask for it." But if it doesn't have it implemented, it's going to throw an exception. And one of the
challenging things is that when we're dealing with network streams and trying to download large files, we get a stream back that does not have a length on it. That means that if we're passing around this stream that happens to be something that we're downloading from the internet, we can't access the length. So what can we do about that? That was all explained in the previous video, but you'll see some of the API design here as we go into our stream downloader. As I mentioned, it is going to take in this IHttpClientFactory, and that's because when we call get stream, one of the first things we need to do is go create that HttpClient. From there we're going to call GetAsync, and we're using this parameter here, ResponseHeadersRead. This is an important takeaway from the previous
video: this flag allows us to essentially start streaming data as soon as we get the headers, and we're not going to wait for all of the content to be downloaded before continuing. This is important because of the difference between these two variations, which is using this or using the other variation, the default, ResponseContentRead. When you use ResponseContentRead it will pull the whole stream into memory. It will put it into a memory stream backed by an array where all of the data is. The problem with that is with large files you're not going to want to do that. But before we move on, this is just a reminder that I do have courses available on Dometrain if you want to level up in your C# programming. If you head over to Dometrain you can see that I have a
course bundle that has my getting started and deep dive courses on C#. Between the two of these, that's 11 hours of programming in the C# language, taking you from absolutely no programming experience to being able to build basic applications. You'll learn everything about variables, loops, a bit of async programming, and object-oriented programming as well. Make sure to check it out. We're going to use ResponseHeadersRead. This is going to allow us to get a stream that is not a memory stream, and from there we're going to try to essentially get the length, and that's where the other complication comes up. This just gets us a stream; if we were to check this at debug time, doing these two things together, we'll have a network stream. But TryGetContentLength is something that we wrote in the previous video, and that's going to check
the headers off of the content and make sure we have a content length. We will parse out the long value that is representing the number of bytes. That way, once we get to this part here, line 88, we're at a point where we have the length and a stream. Now this is where I want to talk about some of the API design and why I'm returning, if we scroll back up here, this StreamResult type. I made StreamResult here, and it has a stream and a nullable long length. One of the reasons that I'm doing this, and not just using a stream that has a length sort of already on it, is because there's a bit of a complication in what Microsoft recommends for the Stream type, apparently, and I found this out on the stream. I'm going to switch over here to my browser, and there's a
Stack Overflow answer here that is basically answering someone's question about Stream.Length and CanSeek when you have CanSeek set to false. So you have a stream that's not seekable. For comparison, a memory stream is seekable: because we have the whole thing in memory, we can go to any point in it. But a network stream, and various other streams, are not seekable, so you can't just start downloading something and say, "Oh wait, after two gigabytes of downloading, let me just kind of scroll back a little bit in this stream." You can't seek around, so it's a non-seekable stream. But here is what Microsoft recommends, from MSDN: "If a class derived from Stream does not support seeking, calls to Length, SetLength, Position, and Seek throw a NotSupportedException." The reason that I don't like this is because I have a particular situation
where I do not care about seeking whatsoever. The situation I have in my own code, which is why I'm creating all of this, is that I need a stream and I just want to know the length. But because of this type of stream and the recommendation from Microsoft, we are not supposed to use the length of the stream if we cannot seek. If I go back to the code now, that's the reason why on the StreamResult I am returning a length separately. It's also nullable. So let's go back down and look at the return part here. When we're returning, we're always going to pass the stream back, but when it comes to the length, we're checking: if the stream is seekable then we'll return the stream's length, otherwise it's a stream where we don't have the length. If we look a little
bit above here, we have a return for when we did get the content length, and we'll pass that back separately. So we either have the length from the content length, or we'll check if it's seekable and give the length of the stream back; otherwise there's no length. When it comes to using this API, it feels a little bit clunky for a few reasons, so let's go up and explore the usability of this. When I say usability, I want to talk about the ergonomics of how it feels to call this API, use the result, and kind of what the code feels like afterwards. If we look at the stream downloader here, this is where we're calling GetStreamAsync. We get this stream result object; if I hover over it you can see that it is a StreamResult, which is great. Now we have this
component, and I didn't really explain it earlier, but DoSomething would have to take in a stream and a long length if we wanted to do something with the length, and this should read something like this. The reason that we would have to do this is that according to the Stack Overflow answer that we saw, where it was quoting some Microsoft MSDN documentation, we can't trust the Length on the Stream class. We can't trust it because if it's a non-seekable stream, we should expect that Length will throw an exception. Again, I don't really like this, and we're about to see why. It means that I'm going to have to pass along a dedicated length, right? I can trust that whatever length is coming in here is going to be right, and I can't trust the length on the stream. So doing something like
stream.Length, I can't do that unless I check CanSeek first. If that's the case, the way that I've done this, we're not supposed to be populating the Length property if it's not seekable. There's a whole bunch of stuff that doesn't feel good about that, and it would mean essentially that all of my downstream code that needs to use both the length and the stream literally needs to have them either provided separately like this or packaged together in an object that I can go pass around instead of just using the stream. To me that feels pretty gross. So in this example here, this, you know, mocked-up little dummy class isn't really doing anything, but the point was that I have code in my own projects where I need to use the stream for something, but I also need a length, and I need to be
able to trust that length. Unfortunately, this is how this would look. I don't really like the ergonomics of that; it feels gross to have to pass both of these around. And if we look at how we're calling this, essentially if we want to call DoSomething we need a stream but we also need to have a length, and what we would have to do is a null check, which also feels really gross. You can see that I'm using the null-coalescing operator here, with the double question mark, to check: if the stream length is not null we'll pass it in, otherwise, if it is null, I need to throw an exception right here. We can't continue on because DoSomething requires a length. To me all of this feels a little bit gross.
The fact that I have to pass two parameters into DoSomething just isn't a great fit for me, and the other part, if we scroll back down, is this nullable long length; also not a big fan. This is where I turned to the viewers on the stream and I said, "I think that I want to show you the other way that I would want to do this, and it means we're going to break some of the rules." If we go back to Stack Overflow, if we break this rule here and we say, "Look, we have a stream, it does not support seeking, but we are still going to populate the length," I think personally that it's going to make the code that follows a lot cleaner to use. So I wanted to explore it. On the stream we walked through both, and you're about to see the other variation of this and how I feel like it
cleans up the API, but I'll leave that up to you, and you can leave a comment below based on your decision and what you feel works better. So in my code now I'm going to flip over to the other implementation of stream downloader. Okay, this next variation is going to be very similar, but there's a couple of differences, in particular around the return type. StreamResult is still an object that I'm using as the response type, and you can see that instead of having the length, which was nullable, the StreamResult just has a flag as to whether or not it has a length on it, because this method can still return a stream that does not have a length. That's the case if we have a network stream and there are no headers that set the content length. We need
a flag for that, and that's what I have here, and it's just going to return a stream as well, packaged along with it. The rest of this code is very similar, so it's not really worth explaining all that we just saw here. But what is different is that when it comes to returning things, I'm going to make a dedicated stream that has a length associated with it. This is a custom class; it's almost entirely pass-through methods. But if we jump into it, what is different about it is that it takes a stream as the stream that it's going to wrap, and that's where all the pass-through methods come in. You can see here, right, the pattern where it's like an arrow pointing to the same thing; these are all pass-through. What's different though is these two things here, and really the important
one is the length. So I'm just saying, "Look, I'm creating this stream and I'm telling you what the length is. I don't care what the inner stream says it has for a length; I'm telling you what it is." The take ownership part is just for when it comes to disposing. If I scroll a little bit lower, we can see that when we go to dispose, if you've said, "I would like this stream to take ownership," then in fact it will dispose of that inner stream that it's wrapping. Otherwise, there's not a lot of interesting stuff going on with this stream. This is just the part that breaks the behavior with CanSeek, right? Because CanSeek could be false and we are still saying: look, I'm never going to throw an exception when you call Length, I'm just going to make sure that we're returning the
length that was passed in. So yeah, that does kind of break the rules, but we'll see why I feel this makes it feel a little bit cleaner. One of the returns is where we go create that stream with a known length: we have this content length coming out when we have the header, great, and then when we go to make this, we give it the content length passed in, and this is just returning the two things packaged together. Otherwise, when we go to return, we have a stream and we don't have to wrap it with anything, because we might not know the length of it. The other thing that we're doing here is that we have a CanSeek check. Now, as I'm reading this, potentially what we could do is modify this a little bit further
and say if it can seek, we might be able to wrap the whole thing up with another StreamWithLength. We could go do that, but basically, if it can seek, we know that by definition, from what we saw on Stack Overflow, if Microsoft is adhering to this, it will have a length, so we should be okay, technically. When it comes to calling this thing, what's that going to look like? If we scroll back up, we can see that this code no longer compiles, and that's okay. We're going to get rid of it, but we're going to see what we can do instead. I have another method call that I'm going to be using here, and it looks pretty gross, but we have to go change something as well on the system. This is the part that I wanted to illustrate
because this part, I didn't write my code this way. I didn't want to have to pass a stream and a length around separately. I said that feels kind of gross and kind of silly; I just want to be able to trust whatever the stream has for a length. This is just one parameter, it's not two things passed around together. This just feels way better to me. What I wanted to do is this kind of thing, and basically you can see that it says StreamWithLength. In particular, I am forcing a caller to pass me this type. There's a reason for that, and it's that I need the safety of knowing that I can ask for this length. That means if you have another stream that is seekable, you would still have to go wrap it with a StreamWithLength, and we're going to
see why this looks a little bit gross, because if you check out here, when we go to call DoSomething, this code looks pretty nasty, right? I get the stream result back and I ask for the stream, but we have to go wrap it with the StreamWithLength. You're probably going, "Nick, how the heck is this any better?" And I agree, this feels pretty gross so far, but there are some improvements we can make. When we go to check for the length itself, we have to go see on the result: does this thing have a length or not? If it does, we can ask for it; otherwise we're stuck here throwing an exception. When I did this on the stream, because I was doing this live and trying to see how things look and feel, I repeated myself on the stream and I
said as soon as I find myself writing "throw exception", I find generally that I'm not structuring code in a way that I'm happy about. I like to use more of a result pattern, and in a lot of my code you'll see it's like all the methods are called "try to do something" and we get a result type back. If I have to throw an exception like this, it's not a good feeling, but the reality is that this thing requires a stream with a length; if we don't have one we can't continue. I don't like throwing exceptions, so what's the way that we work around this? Well, I want to go clean this whole thing up, and I think we can do that with some extension methods. What we're going to look at now is some extension methods at the bottom of this whole file. We'll jump down
here. Let me go collapse this guy; we don't have to look at the details. We're going to look at stream extensions. Here I have this extension method called TryAsStreamWithLength, and it's going to operate on any stream that we have. I wanted to give a shout-out to folks in the YouTube comments: thank you for suggesting this attribute. We'll look at this in a moment. Basically this extension method will say, if we're talking about a stream that already is of this type, we're just going to cast it and return it. We don't have to do anything fancy; there's no point in wrapping a StreamWithLength inside of a StreamWithLength, it's just silly, so we'll return the same thing. Otherwise we're going to do a couple of things here. I'm going to talk about this try/catch in a
moment, but we're going to see if this stream is seekable. If so, we are going to create a StreamWithLength, and that's because we're going to use that rule that we saw on Stack Overflow where if we have a seekable stream it means that we should be able to read the length and this will not throw an exception. So we're going to take that here. I think technically this actually needs to say that it does need to take ownership, so my apologies for that. We are going to wrap this thing up and return a StreamWithLength. Now the last part is that when we hit this end part of the code here, we're going to have no StreamWithLength. We're going to return false, because this thing is not able to go from a normal stream to a StreamWithLength because there's no length.
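As a rough reconstruction of the wrapper and extension method described so far, here is a minimal sketch. It is not the exact code from the video. The member names follow the transcript, but the constructor shape, the use of `[NotNullWhen(true)]` as the attribute, and catching only `NotSupportedException` are assumptions made for illustration.

```csharp
#nullable enable
using System;
using System.Diagnostics.CodeAnalysis;
using System.IO;

// Passes every call through to the inner stream except Length, which reports
// the value supplied by the caller even when CanSeek is false.
public sealed class StreamWithLength : Stream
{
    private readonly Stream _inner;
    private readonly long _length;
    private readonly bool _takeOwnership;

    public StreamWithLength(Stream inner, long length, bool takeOwnership)
    {
        _inner = inner ?? throw new ArgumentNullException(nameof(inner));
        _length = length;
        _takeOwnership = takeOwnership;
    }

    public override bool CanRead => _inner.CanRead;
    public override bool CanSeek => _inner.CanSeek;
    public override bool CanWrite => _inner.CanWrite;
    public override long Length => _length; // the deliberate rule break: never throws
    public override long Position
    {
        get => _inner.Position;
        set => _inner.Position = value;
    }

    public override void Flush() => _inner.Flush();
    public override int Read(byte[] buffer, int offset, int count) => _inner.Read(buffer, offset, count);
    public override long Seek(long offset, SeekOrigin origin) => _inner.Seek(offset, origin);
    public override void SetLength(long value) => _inner.SetLength(value);
    public override void Write(byte[] buffer, int offset, int count) => _inner.Write(buffer, offset, count);

    protected override void Dispose(bool disposing)
    {
        // Only dispose the wrapped stream when the caller handed over ownership.
        if (disposing && _takeOwnership)
        {
            _inner.Dispose();
        }
        base.Dispose(disposing);
    }
}

public static class StreamExtensions
{
    public static bool TryAsStreamWithLength(
        this Stream stream,
        [NotNullWhen(true)] out StreamWithLength? streamWithLength)
    {
        // Already the right type: return it as-is instead of double wrapping.
        if (stream is StreamWithLength alreadyWrapped)
        {
            streamWithLength = alreadyWrapped;
            return true;
        }

        try
        {
            if (stream.CanSeek)
            {
                streamWithLength = new StreamWithLength(stream, stream.Length, takeOwnership: true);
                return true;
            }
        }
        catch (NotSupportedException)
        {
            // Safety net: CanSeek was true but Length still threw, so report failure.
        }

        streamWithLength = null;
        return false;
    }
}
```

A production wrapper would probably also forward the asynchronous read and write overloads to the inner stream; they are left out here to keep the sketch short.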
The try/catch is just in case someone breaks this rule, where CanSeek is set to true and there is no length for some reason, right? If that's the case, we are going to catch this. I know it's pretty gross to see an empty catch block, but what I was explaining to folks on the stream is that generally when I write a method, if it says "try" at the front of it, you should not have to catch exceptions as the caller. This is just a safety net to say, "Hey look, if someone did this, we could go, you know, carry on and return false." You might want to have a debug mode for this: you could use conditional compilation to throw exceptions only in debug, and in release mode it's safe. Up to you. I'm just
kind of showing you how I was walking through this, but the point is that now that we have this method, it does make it a lot nicer to go call. So let's scroll back up, and basically we can comment this code out, because I agree that's pretty nasty looking. But look how succinct this version is here: we would ask the stream coming back from the result to try and get it as a StreamWithLength. If it works, then we're going to continue on; if it doesn't work, we're able to go return. We could go throw an exception ourselves, but the point is that we have control without throwing exceptions. Because down here, forcing an exception to be thrown, in my opinion, if that's not meant to crash the entire application, that's not really going to be something you want. And because you
cannot control the server side, unless you control the server that you're trying to download the files from, they don't have to put the Content-Length header on there. They should, that's the recommended thing to do, but they don't have to. If you expect that that should crash your whole application, then by all means do this kind of thing. Otherwise we need to make sure that we can continue on and handle that error case accordingly. In my opinion this was a way to go clean everything up with the extension method. The other part that I wanted to come back to was that attribute, and this is just a little bit of a nuance thing here. If I go ahead and pull this attribute out so we don't have it there, and we scroll back
up, you can see on line 23 DoSomething is saying, "Hey look, something's fishy about this stream with length." If we hover over it, Visual Studio is telling us "possible null reference argument for parameter stream". It's impossible, if we go look through this code, because we can see that basically we always populate this out parameter with a non-null value as long as the result is true. But the compiler is not able to figure that out, so we have to give it a little bit of guidance, and that's what this attribute does here. If we put that back on and I scroll back up, you can see on line 23 it now knows: when I hover over it, it says the stream with length is not null here. So we're giving the compiler a little bit more information about our API. So, in summary, these are two
different API variations I made for a helper class. This thing works if we go to ask for it off of the dependency container, so this works with IServiceCollection as long as you have an IHttpClientFactory registered on the container. Otherwise you can go build this thing manually, passing that in, and if you wanted, you could have a variation of this where you don't have to pass that interface explicitly but have a way to make your own HTTP clients inside of it as well. The two big differences and variations here were: one, following the MSDN suggestions around the Stream class, CanSeek, and the Length property, which forced us down this path of having to provide those two pieces of information on different API calls downstream; or the second implementation, which was basically breaking that rule and saying: hey look, we
have a stream that's not seekable, but I do happen to know the length, so I'm going to provide you the length if you ask for it. In my opinion this feels not so bad. I would much rather be able to access the length if we know it and not force myself to basically pass two pieces of information everywhere. If you enjoyed this and you want to see more of this type of investigation, I want to have some benchmarks ready to show you the memory usage between the two different header formats we looked at. So when you're interested in seeing that and it's ready, you can check it out here. Thanks, and I'll see you next time.
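For comparison, the first variation from the transcript (the one that follows the documented `Stream` guidance) might look roughly like the sketch below. It is not the exact code from the video: the record shape, the `Create` helper, and the exception message are assumptions, and `DoSomething` is only a stand-in for the contrived consumer.

```csharp
using System;
using System.IO;

// Hypothetical shape of the first variation's result: the stream plus a
// separately carried, nullable length.
public sealed record StreamResult(Stream Stream, long? Length);

public static class StreamResultSketch
{
    // Prefer the Content-Length header value; otherwise use the stream's own
    // Length only when it is seekable; otherwise report that no length is known.
    public static StreamResult Create(Stream stream, long? contentLengthHeader)
    {
        if (contentLengthHeader.HasValue)
        {
            return new StreamResult(stream, contentLengthHeader);
        }

        return new StreamResult(stream, stream.CanSeek ? stream.Length : (long?)null);
    }

    // Stand-in consumer: it has to accept the length as a second parameter
    // because it cannot rely on stream.Length.
    public static string DoSomething(Stream stream, long length)
        => $"Processing {length} bytes";

    public static void Main()
    {
        using var stream = new MemoryStream(new byte[16]);
        StreamResult result = Create(stream, contentLengthHeader: null);

        // The null-coalescing throw the transcript calls clunky.
        long length = result.Length
            ?? throw new InvalidOperationException("A length is required to continue.");

        Console.WriteLine(DoSomething(result.Stream, length)); // Processing 16 bytes
    }
}
```

Every downstream consumer that needs the length ends up with the same two-parameter signature, which is the ergonomic cost weighed against the wrapper approach.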
Frequently Asked Questions
What are helper classes for downloading files in C# and why are they useful?
Helper classes for downloading files in C# are reusable components that simplify the process of downloading files from the internet. They allow you to manage streams effectively, handle large file downloads without consuming too much memory, and integrate well with dependency injection in your applications.
Why is it important to use an IHttpClientFactory instead of creating a new HttpClient instance directly?
Using an IHttpClientFactory is important because it helps manage the lifecycle of HttpClient instances, avoids socket exhaustion, and allows for better configuration and testing. In my previous video I was just newing up an HttpClient directly, which isn't really the direction we want to go, so the helper class takes an IHttpClientFactory from the dependency container instead.
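A minimal sketch of that registration is below. It assumes the Microsoft.Extensions.Http and Microsoft.Extensions.DependencyInjection NuGet packages are referenced, and the `StreamDownloader` shown is a hypothetical stand-in rather than the class from the video.

```csharp
using System.Net.Http;
using Microsoft.Extensions.DependencyInjection;

// Hypothetical stand-in for the downloader: it receives the factory through
// constructor injection instead of newing up an HttpClient itself.
public sealed class StreamDownloader
{
    private readonly IHttpClientFactory _httpClientFactory;

    public StreamDownloader(IHttpClientFactory httpClientFactory)
        => _httpClientFactory = httpClientFactory;

    // The factory creates the client and manages the underlying handler lifetimes.
    public HttpClient CreateClient() => _httpClientFactory.CreateClient();
}

public static class Program
{
    public static void Main()
    {
        var services = new ServiceCollection();
        services.AddHttpClient();                  // registers IHttpClientFactory
        services.AddSingleton<StreamDownloader>(); // the helper class itself

        using ServiceProvider provider = services.BuildServiceProvider();

        // The container supplies the factory when it constructs the downloader.
        StreamDownloader downloader = provider.GetRequiredService<StreamDownloader>();
        using HttpClient client = downloader.CreateClient();
    }
}
```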
What are the implications of using a stream that does not support seeking when trying to access its length?
When a stream does not support seeking, accessing its Length property can throw a NotSupportedException. This is problematic because the network stream you get back for a large download is not seekable, so you cannot read the length from the stream even when the server reports it in the Content-Length header. In my API design, I explored two ways around this: returning the length separately alongside the stream, and wrapping the stream in a custom stream type that reports the known length without throwing.
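The behavior is easy to reproduce without a network call. In this minimal sketch, `GZipStream` is used purely as a convenient stand-in for a non-seekable stream, and `GetLengthOrNull` is a hypothetical helper name.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class LengthGuardSketch
{
    // Reads Length only when the stream advertises that it can seek.
    public static long? GetLengthOrNull(Stream stream)
        => stream.CanSeek ? stream.Length : (long?)null;

    public static void Main()
    {
        using var seekable = new MemoryStream(new byte[8]);
        using var nonSeekable = new GZipStream(new MemoryStream(), CompressionMode.Compress);

        Console.WriteLine(GetLengthOrNull(seekable));            // 8
        Console.WriteLine(GetLengthOrNull(nonSeekable) is null); // True

        try
        {
            _ = nonSeekable.Length; // unguarded access on a non-seekable stream
        }
        catch (NotSupportedException)
        {
            Console.WriteLine("Length threw NotSupportedException");
        }
    }
}
```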
These FAQs were generated by AI from the video transcript.