APIs That Don't SUCK - Collections & IEnumerables In DotNet
December 25, 2023
Writing methods in C# and dotnet is easy, but writing APIs that don't suck takes some thought. This is especially important when dealing with iterators, IEnumerables, and collections! How can we make sure that we're building APIs in dotnet with these collection types that don't suck?
In this video, I'll explain how I approach working with iterators, IEnumerables, and collections for my APIs. By the end, you should have some considerations for how you can design your own APIs in dotnet that don'...
But honestly, this has solved every single one of my IEnumerable issues as of late. In this video, I'm going to be talking about designing APIs that use either IEnumerables or collections, and we're going to look at this from the perspective of both parameters and return types. In a previous video, which I'll link right up here (you can check that out and then come right back), I was talking about the usage of enumerables and how that solved a problem for me in the past, but then unfortunately led to more problems with the teams I was working with and how people were using enumerables. In particular, this had to do with iterators and the fact that they are lazily evaluated. I'm going to be diving into how I choose whether to use IEnumerables, IReadOnlyLists, or IReadOnlyCollections in the APIs that I'm designing. Before we get into it, just a quick reminder to check that pinned comment for a link to my free weekly software engineering newsletter. All right, so the content of this video is looking at API design, and in particular we are going to be looking at enumerables, iterators, and collections. We're not focused on interfaces, object-oriented programming, classes, and all that kind of stuff for this conversation, although some of the concepts, in terms of how I'm evaluating the weight of the decisions I'm making, are probably going to be applicable, so you might be able to transfer them to other things. When we go through these decisions, we want to take the perspective of the caller to try and weigh the pros and the cons. We want to think about how to make it easy for the
caller, right? We want to make sure that they can provide information to us easily through our API, but we also want to make it very obvious what the API is going to be doing. What we have to balance this with is the perspective of the person implementing the API, and often that's going to be us. In terms of the data we're taking in, are we able to do everything that we need to do with that data effectively? The minimum bar we want to set for ourselves is that we have enough data, but doing it effectively is also important; otherwise we might want to consider redesigning the API in the first place so that we can do a good job for ourselves. Now I'm going to jump over to
Visual Studio, but I just want to remind you before we start looking at code that these are just some of my opinions and how I've arrived at them over years of working in C#. My intention here is not to tell you that there's only one right way and everything else is wrong. This is all just my opinion and my experience, and what I want you to take away from this video is the different ways that I analyze and come up with pros and cons, so that you can make your own decisions. I'm not here to tell you that my way is right; I want you to learn how you can do this analysis and make informed decisions for yourself. After all, if I tried to make up any rules for the things that I'm about to tell you, I am sure
that you're going to find exceptions to those rules, and I would agree with you: there are always going to be situations where the rules are broken. Let's check out some code. All right, I'm going to try out this split-screen approach so that you can see me talking with my hands as I explain the different parts of these APIs, because there's not going to be a lot of code that we're looking at, but there's going to be a lot of explanation about that code. The first thing we're going to look at is passing collections or enumerables as parameters into methods, and afterwards we're going to look at returning enumerables or collections. So, the first part: parameters. When I think about how I've historically liked to approach this, again, if we're looking at the parameter being passed in, I like to
be able to say: okay, look, if I only need to iterate over things, just give me an IEnumerable. If I need to iterate over things but I also need a count, then I probably want something like an IReadOnlyCollection. And then there are going to be situations where I need a little bit more, where enumerating and having a count isn't enough, because for some algorithms that I write, having access to a particular index within a collection is going to be valuable, and in that case I want an IReadOnlyList. Those three types are usually what I'm looking at. We could talk about dictionaries as well, but I'm going to park that for another conversation; this is mostly just about collections of items. Now, you might have noticed that I mentioned enumerables and
two read-only variants. It's not very often that I write methods that are something like PopulateX, where I'm passing in a mutable collection and my method is responsible for populating it, either adding things to or even removing things from the collection. I generally don't write methods like that. It's not that they're wrong, but I would much rather feed the data back out of the method than operate on something inside of it. That's my personal preference. So what we're going to be looking at here are the IReadOnly variants of things, and enumerables as well. You might have noticed on the screen that I've left some comments in this method, and these are some of the goals that I want to talk about. For one, we're focused on passing parameters into the
method. The first goal is what I just talked about: preventing mutation. We want to use IReadOnly variations of things instead of just a straight-up list or a straight-up array, and that way we are signaling to the caller that we don't want to modify anything, and in fact we're enforcing that by taking read-only variants. Yes, you could do something a little bit weird like casting these things inside the method to try to get back a mutable collection, but if we ignore that, we should be signaling to the caller of the API that we're just trying to do good things. I also mention here that we want to make it easy for the caller to give us that data. If we think about collection
implementations in C#, essentially everything implements IEnumerable, so that would be the easiest thing for a caller to give us. They could give us an array, a list, a queue, a stack, you name it; if it's a collection in C#, it's going to implement IEnumerable. So taking this parameter right here as an IEnumerable would be the easiest thing for a caller to provide. IEnumerable is also read-only, just like IReadOnlyList and IReadOnlyCollection, so that's cool, but it doesn't have a count and it doesn't allow us to index either. There are also a couple of other drawbacks to using an IEnumerable as the parameter that we're taking in. I did mention that it's the easiest one for
a caller to give us, and that's totally true, but these two points here are really important. They've bitten me in the past, and that's why I've tried to change how I create a lot of my APIs, so I want to touch on them. The first is that we don't want to materialize a ton of data. If we are working in a system and we take in an IEnumerable, and at some point we need to take a snapshot of that enumerable, pulling the elements out and effectively materializing them into something that we're holding in memory, we need to be very careful about that if there are no constraints, and it's kind of difficult to enforce constraints at compile time on an enumerable being passed in. If we ever
called something like ToArray or ToList on that enumerable, we could run into a lot of problems if that enumerable was backed by an iterator that ended up materializing something from a database with tons and tons of records. The other really common thing we want to avoid: if we're working with a set of data, and let's say we only want to iterate over it so we could use an enumerable, what we don't want to do is iterate over it multiple times if we're dealing with an iterator. That's because if it's just an enumerable and it's backed by an iterator that goes out to a database to do a query, it would effectively run that query again for every separate block of enumeration that
you need to do. I want to show you this really quickly with a little example. I did include these two foreach loops here. If we have a foreach loop over items and we're doing stuff inside of it, and we have another foreach loop over that same enumerable, then if the enumerable is backed by an array or a list, it's not really a big deal. But if it's an iterator backed by something that's going to a database or doing some other long-running operation, this will in fact do that long-running operation multiple times. You might be thinking, "Hey, we can just avoid that by materializing it," and you're right, but going back to the previous point, if
we were to materialize the data ahead of time, before running these loops, and we don't know the source of that data, it might not be safe for us to materialize it in the context of this method. If you wanted to do something like that, you would probably need to indicate to the caller that it's going to happen. I just think there's a better way, and that better way is to give control back to the caller to materialize their enumerable when they need to. That way, as the implementers of this method, we don't have to guess at all. So what I wanted to point out is that many times now, instead of using an IEnumerable here, I will probably just stick to an
IReadOnlyCollection if I don't need to index into anything, or an IReadOnlyList if I really need to operate on particular indices. A lot of the time I avoid using an IEnumerable even as the input, unless I just plan on doing a simple filtering implementation over a set of data and have no plans to do any materialization inside of it. So I still use all three: IEnumerable, IReadOnlyCollection, and IReadOnlyList. However, IEnumerable is starting to take more and more of a back seat, because I am trying to force my caller to do the materialization so I don't have to guess about when it's going to happen. Another quick note: think about where this type of code is implemented. If you're working in a backend service
or something like that, there's maybe not much to worry about. However, if you're writing UI applications and you're writing methods that take in IEnumerables, and the IEnumerable is passed all around your application and makes it to the user interface level, then materializing it on the UI thread when you're not sure where it comes from is a problem. I have seen instances, and I wish I hadn't, where that kind of thing has passed all the way through a data layer and an application layer and made its way to the UI, and then someone goes, "Cool, give me all of the items in that collection," or enumerable, since they don't know whether it's a collection or not. And what happens? Well, it starts executing a query against the database in the UI, on the main thread. Guess what the user experience is
like for that. Now, the downside of forcing people to materialize collections is that if they can't tell whether it has already been materialized elsewhere, we might be doing that needlessly. When you have an IEnumerable interface that is backed by an already materialized collection, that's not obvious to us, and unfortunately a lot of the time we go, "Well, I don't know, I can't be sure, let me just call ToArray on it, materialize it, and I'll have a copy." But that can get kind of expensive, so it's something to think about in your application stack and how your methods work together. For me, because I am sticking to IReadOnlyCollection or IReadOnlyList for parameter passing a lot of the time, it's reducing the instances where I'm materializing
enumerables and making extra copies of things when I don't really need to. Hopefully that provides a little more clarity and thought process around what you should be passing in to a method, so let's look at the other side of this. If you did go back and watch my previous video on enumerables and iterators and how I think they end up getting misused a lot, I wanted to call out that enumerables were a saving grace for me for a long time. There were so many instances where I was dealing with return types like these, something like an IReadOnlyList or just a List or an array, and what would happen was that the calling code would ask for this collection, and the underlying implementation would go materialize a ton of data and
pull it back up. It meant that as the caller you didn't have a lot of control: if you wanted a little bit, you were forced to get everything. So IEnumerable was a really nice option, a kind of streaming API where you could ask for just as much as you need. For a large majority of my C# career I've been using IEnumerables as a return type, and when I use them in my own development, just working on my own projects, I don't run into that many issues. But in bigger projects and in team settings, I definitely see this kind of thing getting misused. It's not because I'm better than other people; I just think it's really common for us to slip up and not realize that we might materialize something multiple times
or iterate over data sets multiple times. It's just an easy thing to mess up. When I think about the goals for return types, they're similar to what I think about when I'm passing parameters in. I don't really like to deal with mutable things, and I have a bit of a code example on the screen here, from line 70 to 73, showing a mutable example that just doesn't sit well with me; I try to design APIs that avoid this. I mention here a mutable list, so think about the return type being just a List, like this (technically you can't yield return with that return type either). What could happen is that if we were to say EnumerableApis, that's the name of this
class, and then call the method GetSomething, someone could call Add(2) right on that list. If we weren't returning a copy of the data, a new instance of a list to pass back up, but were instead referencing our own list that's internal to this class, we now have a situation where someone is able to mutate that list from the outside. For example, if it looked like this, it effectively gives someone direct access to whatever our list is. You might have an interface that has multiple different implementations, and maybe some people are doing a really good job of implementing it and returning a copy of the data, but it's really easy for someone to come
in and make their own implementation, and in that case they go, "Well, I have direct access to what I need, let me just pass it back up." If they're not doing something like this and providing a copy, with something like ToArray or ToList, they're effectively giving the caller access to the underlying collection. (In this particular case I should have just left it as ToList; we can't cast an array to a List.) These methods make a whole copy, so if the caller were to do something like Add(2) on it, that's fine, because they have their own copy. But as I mentioned, in the past I had some challenges with List as a return type, because it effectively didn't give a lot of control to the caller. It meant that they had to get the
whole collection of whatever they were asking for back, and that didn't sit well with me. So there are two things in tension here: the streaming API of an IEnumerable is really powerful in that it gives us control, but really dangerous in that people don't know whether it's lazily or eagerly evaluated, because they can't see what it's backed by. As a result, there's a bit of a hybrid approach that I think makes a lot of sense here, and for me that means going to IReadOnlyList. IReadOnlyList has recently been my go-to return type for collections on methods. The first reason, since we were just talking about mutability, is that the caller can't mutate this thing through the interface, which is really handy for preventing the situation I was just explaining. Then you might be asking: okay,
well, why don't you just use an IReadOnlyCollection, isn't that better? If we think about some of the goals I mentioned earlier, returning something that meets the IReadOnlyList interface means that our caller is able to index into it and get a little more out of it. So on the providing side, the output of the method, we're doing them more of a favor by giving a more specific type if we're able to. If the algorithm or the thing that you're designing doesn't lend itself well to that, because you need to use things like queues and stacks and you don't want callers to be able to index into them, that's okay; you could use an IReadOnlyCollection. But a lot of the time I end up creating arrays or
lists behind the scenes anyway, and I'm going to return that back up, but masked with an IReadOnly interface. Another reason to use IReadOnlyList here instead of IReadOnlyCollection is performance, and we'll see that in a later video in this series. Another awesome benefit of using an IReadOnlyList instead of an IEnumerable right here is that it lets us avoid a pretty dangerous problem that can occur with IEnumerable, so watch right to the end, and afterwards I'll show you how to prevent that with iterators and IEnumerables and why it's dangerous. But one of the main reasons I have switched to using an IReadOnlyList as the return type instead of an IEnumerable is that I have moved a ton of my APIs to a paging approach. Now, this
doesn't work all of the time, because not every API I write needs paging support, but honestly, this has solved every single one of my IEnumerable issues as of late. So let's look at paging next. With paging, if I look at the API that we're using here, I want to get an IReadOnlyList back, which is exactly what we were just talking about, but the other thing we need to do is provide an offset and a limit. You may know this if you're familiar with writing SQL queries. I feel like that's a bit of a dying thing, because everyone's using stuff like Entity Framework, and that's totally cool, but a lot of people aren't spending time writing SQL queries, so if I say offset and limit, that might sound kind of weird
to them. But in SQL queries there's this notion of paging the data that you're asking for from the database, and working with databases a lot and pulling a ton of data back from them is where I got the idea to structure a lot of my APIs this way. I mentioned that my approach with paging is a bit of a hybrid. We get something that's almost like streaming: we're not getting single items at a time, but we are getting pages that we can ask for one at a time, which means that when it comes to materialization we also have some control. It's not as much control as an IEnumerable, where we're getting just one item at a time. I mean, you could technically do that by setting
the limit of the page to just one every single time, but we're also not forcing a materialization of something that we're not able to handle. So we get control, and we get enough granularity that, for me, it feels really powerful a lot of the time. Let's walk through a little bit of code that highlights how this paging might work. Let's pretend that we have something that's coming from a database. Here I'm just declaring a single array, so it's obviously really simple, but this could be a query going to a database and pulling back a hundred records, a million records, whatever it happens to be. What you should be doing, if this is truly where the data is coming from, is writing your query so that it takes in these paging parameters. So
this is a pretty bad example, because I'm showing you an array while talking about database concepts, but if you can, you want to do all of your paging at the database. You don't want to pull back way more records than you need and then page them in memory. Just a heads-up, because that's not exactly what this code is showing you. I think that's really important to understand, because you want to do all of that work elsewhere, not on the machine that's doing all the processing. Databases are really good at handling that for you, so let them do it; then you only bring back what you need over the network, and you save a lot of resources that way. In this particular code I have the paging built into this for loop, but
how this would normally look, if you had the database doing the paging for you, is that you'd only have a certain number of records come back. I also wanted to highlight that I'm creating a new collection here on line 85, which goes back to what I was talking about a little earlier: we are not in a situation where we're just sharing some internal data. We're not using a list that we hold as a field inside the class and then letting other people manipulate it by returning a mutable collection as the return type; we are always going to be making a new one. I wanted to mention here as well that, depending on how you're using the limit, you need to be confident that it's not going to be set really
high. For example, if you set the limit to the maximum integer value and then let your database pull back that many records, that might be a little too big, and if you then tried to declare a list with that initial capacity, you might be a little upset. But I have some situations where I enforce a limit of, say, 1,000 items, or something that's safe (this is going to be context dependent), and that way I can always initialize my collection to the right size. If I'm able to do that, I often just use an array instead of a list. From there we populate that collection and return it. So, really simple concepts here, but the three things that I
wanted to highlight are these. First, we're doing some type of paging, and if you're dealing with a database or some other external data source, try as much as possible to push the paging down to that system. Next, we create a new collection that we return back out; it's new so that callers can't alter any internal data in our class, and the return type is a read-only view of it, so we are giving a reduced scope of what the caller can do with that collection. Finally, we just populate that collection with the records that are coming back. And if you were dealing with data in memory, taking this kind of approach could totally make sense too. So I'm trying
to explain how this works with databases while also showing you how it works in memory. To recap what we talked about: as parameters going into methods, a lot of the time I am using IReadOnlyList and IReadOnlyCollection. IEnumerables are still things that I use, but less and less frequently. If a method is just going to be filtering, and I'm very confident that I'm doing one iteration over the enumerable, never materializing it, and never needing a second pass, then I might opt for an IEnumerable just to be a little more lax. As return types, I have gone from using IEnumerable to IReadOnlyList, and that's because a lot of the time I am using a paging implementation. If my API isn't set up well for doing
paging, I have to consider whether I'm able to just stream the data back, or make a copy and make it really obvious that the caller has an IReadOnlyCollection to deal with. I do want to start talking about the performance of some of this stuff, because I think that's really important, but I mentioned a little earlier that I wanted to talk about a dangerous case with iterators that we can experience with enumerables. Moving to a paging approach can avoid this for us, and if you want to see more about this dangerous case, you can check out this video next when it's ready. Thanks, and I'll see you next time.
Frequently Asked Questions
What are the main differences between using IEnumerable, IReadOnlyCollection, and IReadOnlyList in API design?
In my experience, I prefer using IEnumerable when I only need to iterate over items without needing a count or index. If I require a count, I opt for IReadOnlyCollection. When I need to access items by index, IReadOnlyList is my go-to choice. This way, I can signal to the caller what operations are safe and expected.
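As a rough illustration of that choice, here is a minimal sketch. The class name `ParameterExamples`, its method names, and the `SourceRuns` counter are all hypothetical and not taken from the video; `LazySource` only stands in for an iterator backed by something expensive, such as a database query.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ParameterExamples
{
    // Hypothetical counter so we can observe how often the lazy source runs.
    public static int SourceRuns;

    // Hypothetical iterator standing in for an expensive, lazily evaluated source.
    // The body only executes when someone starts enumerating the result.
    public static IEnumerable<int> LazySource()
    {
        SourceRuns++;
        yield return 1;
        yield return 2;
        yield return 3;
    }

    // A single pass with no count or index needed: IEnumerable<T> is enough.
    public static int SumAll(IEnumerable<int> items)
    {
        var total = 0;
        foreach (var item in items)
        {
            total += item;
        }

        return total;
    }

    // Needs Count: asking for IReadOnlyCollection<T> makes the caller materialize.
    public static double Average(IReadOnlyCollection<int> items)
    {
        if (items.Count == 0)
        {
            return 0;
        }

        long total = 0;
        foreach (var item in items)
        {
            total += item;
        }

        return (double)total / items.Count;
    }

    // Needs indexing: IReadOnlyList<T> exposes both Count and an indexer.
    public static int Middle(IReadOnlyList<int> items)
    {
        if (items.Count == 0)
        {
            throw new ArgumentException("At least one item is required.", nameof(items));
        }

        return items[items.Count / 2];
    }
}
```

Passing the result of `LazySource()` to `SumAll` twice runs the iterator body twice, which is the repeated-work problem described in the transcript. Calling `ToList()` once at the call site and handing that list to `Average` and `Middle` runs it only once, and arrays and `List<T>` already satisfy both read-only interfaces, so callers holding a materialized collection pay nothing extra.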
Why do you recommend avoiding mutable collections in API return types?
I recommend avoiding mutable collections because they can lead to unintended side effects if the caller modifies the collection, especially when the method hands back its own internal list. By returning a fresh copy behind a read-only interface such as IReadOnlyList, I make sure the caller cannot change my internal data, which helps maintain the integrity of my API.
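A minimal sketch of the difference, assuming a hypothetical `NumberStore` class (the name and its members are illustrative and not from the video):

```csharp
using System.Collections.Generic;

public sealed class NumberStore
{
    private readonly List<int> _numbers = new List<int> { 1, 2, 3 };

    public int Count => _numbers.Count;

    // Risky: returns the internal list itself, so callers can mutate our state.
    public List<int> GetNumbersShared() => _numbers;

    // Safer: List<T>.ToArray() makes a copy, and the read-only interface
    // signals that callers are not expected to modify the result.
    public IReadOnlyList<int> GetNumbersCopy() => _numbers.ToArray();
}
```

Calling `GetNumbersShared().Add(2)` grows the store's own list, while anything done to the result of `GetNumbersCopy()`, even after casting it back to `int[]`, only touches the caller's copy. The copy does cost an allocation per call, which is one reason the transcript pairs this pattern with a bounded page size.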
How does paging improve the performance of APIs that return collections?
Paging allows me to control the amount of data returned from an API, which is especially useful when dealing with large datasets. Instead of pulling all records at once, I can return a manageable subset, reducing memory usage and improving response times. This approach also encourages efficient querying at the database level.
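The offset-and-limit shape described in the transcript might look roughly like the sketch below. `PagedSource`, `GetPage`, and the `MaxLimit` cap of 1000 are assumptions for illustration; the in-memory array only stands in for a data source, and with a real database the offset and limit would ideally be applied in the query itself rather than in a loop.

```csharp
using System;
using System.Collections.Generic;

public sealed class PagedSource
{
    // Hypothetical cap so a caller cannot request an unbounded page.
    public const int MaxLimit = 1000;

    private readonly int[] _data;

    public PagedSource(int[] data)
    {
        if (data == null)
        {
            throw new ArgumentNullException(nameof(data));
        }

        // Copy so later changes to the caller's array do not affect us.
        _data = (int[])data.Clone();
    }

    public IReadOnlyList<int> GetPage(int offset, int limit)
    {
        if (offset < 0)
        {
            throw new ArgumentOutOfRangeException(nameof(offset));
        }

        if (limit < 0 || limit > MaxLimit)
        {
            throw new ArgumentOutOfRangeException(nameof(limit));
        }

        // In-memory stand-in for paging; a database would do this work instead.
        var remaining = Math.Max(0, _data.Length - offset);
        var size = Math.Min(limit, remaining);

        // Always a new, right-sized collection, never the internal array.
        var page = new int[size];
        for (var i = 0; i < size; i++)
        {
            page[i] = _data[offset + i];
        }

        return page;
    }
}
```

A caller can walk the data by increasing `offset` by the page size until `GetPage` returns an empty list, which keeps each materialization bounded by `limit` while still letting the caller decide how much to pull.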
These FAQs were generated by AI from the video transcript.