BrandGhost

WARNING: 2 Killer C# Iterator Bugs (And How To Prevent Them)

C# iterators are a powerful feature that leverages IEnumerable as well as the yield return syntax. Iterators are lazily evaluated, which can be really useful, but it can also be incredibly problematic. In this video, I'll walk through 2 scenarios that will make you think twice when you go to use the yield keyword in C# alongside your IEnumerable! Hopefully this will save you some headaches! Have you subscribed to my weekly newsletter yet? A 5-minute read every weekend, right to your inbox...
Transcript
The reality is that people spend so much time trying to fix bugs like this, so there's got to be a better way, right? In this video, I'm going to be sharing some common challenges that I see with iterators. In some of my recent videos, I've been talking about iterators, enumerables, and different collection types, and specifically I've been sharing with you that historically I used to really like to lean on enumerables and iterators because I like the streaming approach that they provide. However, due to a lot of challenges that I've seen working in teams where people aren't really familiar with iterators, and this includes myself because I get caught up on some issues, I've been moving to a paging approach instead. Before getting into some of these challenges, it might be helpful to do a recap on the previous video that I did, which talked about iterators, enumerables, and some API designs that you might want to consider. You can check that out right here and then come right back to watch this one. Also, a quick reminder to check the pinned comment for a link to subscribe to my free weekly software engineering newsletter.

All right, let's go to Visual Studio to check out some code that's really going to demonstrate why there are some problems with this stuff. On my screen I have a simple program that we're going to be looking at, and it starts with a repository pattern. As you can see at the top here, I'm just creating a new repository, and then all we're going to do is ask the repository for data and print that data out to the command line.

Now let's have a look at what this repository does. You can see inside here that I am creating a new connection factory, and I'm using some funny classes here. We'll look at these in just a moment, but these are just dummy classes to demonstrate how some of this connection stuff works behind the scenes without actually having to go out to a whole database. On line 42 we can see that I have a
GetData method. It returns an IEnumerable of this DataRecord that I've created. If we have a quick look at that DataRecord, it's really simple: it just has an integer value, and it's not really important. I just wanted to have a custom type to show you. I'm going to have some Console.WriteLine calls that we're going to be looking at, because they're going to tell a story for how this data is accessed. What we end up doing in here is that we try to write to the console, then we open up a new connection and read that data, and then we have a finally block here where we write to the console that we're done getting the data. There's a bit of foreshadowing here, because you can see that I have a little fix commented out. So yes, there are bugs in the code that's on the screen currently. We're going to try to fix them and then talk about some of the challenges that come with those fixes.

We should check out what ReadData does. If I go into this, we can see that I'm going to change the console color. Again, this is just going to help demonstrate, when we're looking at the output, when we're truly trying to do some access with this connection, and the reason I want to call this out is because that's where some of the problems are going to occur when we're working with this example. So I'm going to have the console text red when we are using the connection and trying to read the data from it. I've left a comment here because you can see that I'm just yielding out these records, and that's because I'm not truly using anything like Dapper, Entity Framework, or even creating a data reader to a SQL database. This is all dummy stuff for now; I'm just simulating how this works.

If you recall from the earlier videos, or just your own experience, because we have this yield return here plus this IEnumerable, this entire method is what's called an iterator, and that's extremely important for
this conversation because, as I've been mentioning in the earlier videos, an IEnumerable can be an iterator, if you're looking at the implementation and you see yield return, or it could be backed by a materialized collection. When it's an iterator it has lazy behavior, and when it's backed by a materialized collection that's an eager approach, and it's not lazy. So if we unwind this call stack from ReadData and go back up, we can see that GetData is calling ReadData after we've opened the connection. We return the result from ReadData all the way back up to the top here, right here on line two, and from there we print that stuff out to the console. So far so good, right? Well, let's go run this and see what happens.

All right, the story that we see being printed out to the console is that we're starting that GetData method, we end up opening a new connection, we exit GetData, and then from there we start the iterator that is inside ReadData. We can see that we're getting data 1, 2, and 3, so that's coming from the top part of the program, and then we have the iterator ending inside ReadData. All of that is red because we toggle the console to red at the beginning of ReadData and then put it back to normal at the end of ReadData.

It might not be obvious what's wrong with this, but it's a subtle detail: we're not disposing our connection. When the connection gets disposed, we end up printing to the console that we are disposing that connection. I didn't show you that because I didn't want to give away the little surprise here, but that's the bug that we're going to be fixing. You might have even noticed it by looking at the code I was showing you, because there was no using statement when we're dealing with the connection. You can see that we open the connection here, but what we probably want to see at the end of all of this is that we're closing off the connection in some capacity. That's super easy to fix, so let's go
back to the code and make sure that we put a using statement where it's needed.

All right, line 47 here is the spot where we don't have a using statement. I just want to show you two different variations of a using statement. The traditional way is to wrap it around the part that we want to use, like this. This is the connection that we're going to be using, and then we have the curly braces where we're explicitly saying what we want the lifetime to be. That is one way that we could address this. The other way is this new format, and it's kind of nice, but it's kind of spooky for people that are used to having the curly braces. If we go to a simple using statement, what we can see here is that there are no more parentheses around this code and no more curly braces, but when connection goes out of scope here, just like it would have with the curly braces, we get the Dispose call implicitly at the right location. So this syntax that you see right here is very similar to what I had with the curly braces explicitly provided. In theory, this should dispose of our connection, and if I scroll back up just to show you, PretendConnection, when it's disposed, will say "connection is disposing." So now we have a proper using statement. Let's go run this and see how it fixed it.

And awesome, as we can see on the screen, we have "connection is disposing." That's great; it means the using statement's working. But wait a second: we're opening a connection, and the very next thing we see is that the connection's disposing. Then we're ending GetData, and then we're trying to read the data from that connection. Wait a minute, that's not going to work. We can't possibly dispose the connection up here and then try to read from it later on. That's going to throw an exception because the connection's not available.

Quick pause, because for some of you that might have been very obvious. You might have expected
that this behavior was going to happen because you've dealt with iterators before. But for a lot of people that are watching this, and it's not your fault if you're not sure why that's breaking, I'm going to show you how to fix it and why it's happening. I don't think it's obvious to a lot of people until you've dealt with iterators and you're used to how they function. And how they function is that they are a function, essentially a function pointer, and that function is not evaluated until you go evaluate the iterator.

Let's head back and see how we can fix this. Just to be clear, the issue is not that we don't have the curly braces in here, I can assure you. What we're going to do is put this fix in right here that I left for us, and I'm leaving in the broken code so that you can see what we used to have and how this is going to fix the issue. What's going on as a solution here is that we're explicitly yield returning each individual item that comes out of ReadData. We are evaluating the iterator and turning this method itself into an iterator. The difference in how this behaves is that it used to just return a function: ReadData was an iterator, and we were just passing ReadData as an iterator, or a function, right back up through GetData as the return value. That means that this method used to complete nearly instantaneously, and that's why the using statement was closing off and disposing the connection right away.

Now what's going to happen is that when we open up the connection, we go into ReadData, and when it goes to yield return, it surrenders control back up the call stack to be able to provide this data. That means that we're going to be stuck inside of this foreach loop. As we're stuck in here, this using statement is not going to end up disposing the connection, because the connection's still in scope. We haven't truly left the method. On the next iteration of the loop, the same thing is going to
happen: when we go to yield return this back up, we're still technically stuck inside the loop, and the same thing with the third iteration of this loop, because there are three items inside that ReadData. But finally, once that third item has been yield returned back out, we leave the foreach loop, and this connection goes out of scope. That means this using statement will then dispose of the connection. So that's all that we had to do, and it feels a little bit silly, but once you understand that ReadData was just returning a function pointer and that we instead need to turn this entire method into an iterator to avoid that, this starts to make a little bit more sense. It's just a little bit clunky. Let's go run this and see how it fixed it.

All right, we can see that we're opening a new connection, then starting the iterator, printing out each data item, and ending the iterator inside ReadData. After that, once we have no more red text, we can see that the connection is then disposing. So truly we had the connection open the entire time this text was red, and the connection's disposing after. This using statement is now executed properly because we turned that method into an iterator itself. We had to force iteration down lower in the call stack instead of just returning out that IEnumerable.

That's a really common problem that I've seen with IEnumerable and iterators. Having resources held open, or closed too soon, and stuff like that is just really problematic, and it's not super obvious. The debugging that goes into that, if you're not totally sure how an iterator works, is a little bit mind melting. If you're working with someone that has experience with it, or you yourself have the experience and you just kind of slipped up, it's not so bad once you know some of the patterns. But I want to walk through one more kind of funny situation that I've seen with iterators, especially when it comes to user-interface-based applications.
Let's head back to the code and I'll explain this one to you. All right, I'm back in Visual Studio. We're going to leave the code just as it was with the repository changes we made, because that using statement is now functioning properly. However, you can see that I've changed the code at the top here, and what we're going to do is make a very small layered application. We have the repository that comes from our data layer; this would be in the lowest layer of our application. Then we have a service that can reference that repository, and that service is where our business logic and application logic would exist. It's going to be sort of the middle layer that we'd have in a layered architecture. And finally, this UI control is going to have access to the service. If you're thinking about this, that means the UI control doesn't have direct access to the repository; it talks to the service, which then has access to the repository. Let's go see how these things work when we try to refresh the list contents in our imaginary UI application.

All right, in the UI control you can see that I am taking a reference to that service, just as I mentioned up above. When we look at RefreshListContents here, I'm not truly working with a UI application, so I'm going to print stuff out to the console again to simulate what's going on. If you're not familiar with a lot of the UI architectures that exist, this kind of stuff needs to execute on a main thread. There's lots of separate learning to go do on that, but it's a pretty common pattern that you have a user interface thread that does all the user interface painting for us, and when we go to work with controls in the user interface, this stuff gets done on that main thread. So I'm going to mention that we're entering this method, which means we're on the main thread, and what we're going to do is try to get that data from the service, and then at the end we're going to say
that we're ending this method.

But what does the service do? Let's go check out the GetData implementation. It's pretty simple; it's just a straight pass-through. I'm mentioning here that it's kind of nasty to just pass this data DTO, this data transfer object, directly back up. Usually what you end up doing in a layered architecture is that if you have something like a data record that comes from the lowest layer, you do some type of translation into a different type that gets passed up to the user interface. That way you don't have things that are database specific being referenced all the way up in the user interface. The reason this method looks super simple right now is that we're not doing any type of translation; we're not working between different domains and joining stuff together. We're simply getting records and passing them back up. So in this example this looks a little bit silly because it doesn't do anything, but this is a similar structure to a lot of layered architectures; it's just that they're doing more work. We've already seen what the repository's GetData does, and we know that it's working as we expect, so let's go run this and see what happens when we're looking at the user interface.

All right, we can see that we're starting the refresh of the list contents on the UI thread. From there we're getting data, and we have all of the similar stuff that we were just looking at, so the disposing is all happening correctly, and then we end the refresh of the list contents. This looks okay because we're getting the data that we expect, right here that's highlighted. But the first line says we're already on the UI thread, and that would mean that if we're trying to get data from a database, we're technically blocking the UI thread with all of this code right here. That means we would get some type of ghosting, or the application would lock up. Even if it's very brief, we might notice something
as a user, because your cursor wouldn't be able to interact with the user interface that you're touching. Then at the end we have this end of RefreshListContents, so everything in between is running on the UI thread, and that's definitely not something we want to do. I'm going to show you what I generally see people try to do to fix this kind of thing, and we'll see what happens.

All right, the solution that everyone tries to go to is using async/await. We're going to do this stuff on a background task, and that means we're going to get it all to work without blocking the user interface. So let's check out this silver bullet that we've got going on here. We write to the console that we're entering this async method on the UI thread, just like the other one did where we were entering RefreshListContents. What's different is that we're calling all of this code on a task and then awaiting it. I'm just adding extra Console.WriteLine calls to say that we're inside that task, we get that data out, we say that we're ending that task, and then we return it back up. So by the time we're finished awaiting this, this variable for records is going to have all of the data that we want. From there I'm just mentioning, hey look, we should have got all that stuff done, right? So let's go print it out to the console and then end the method with a little marker that says that we're done. Just to mention again, what we're doing is putting all of that data access onto this task and running that before we go to do stuff on the UI thread. Back up on line 4, I'm just going to call await and change it to be async for this new method that we've just added, and let's go see how this fixes things for us.

All right, we can see that we're entering the method, and then we have the task starting and then ending right away. But wait, we should have been able to get all that stuff done on the task, right? It looks like all of it's still happening
afterwards. What the heck's going on? We put it all in a task. Why did it finish instantly? The answer is lazy iterators, and that fix that we looked at before, where we were putting it in a foreach loop and yielding it back up, isn't going to fix it for us this time. This is inherently broken the way that we've done it. What we need to do to fix this one is materialize the data by calling ToArray, ToList, or otherwise forcing enumeration while we're still on that background thread. So while we're still in the task, we need to say something like ToArray, and that way we get that done on the task, so hopefully on the background thread. From there, the UI thread will have a fully materialized array, list, or other type of collection that we can use to refresh the list box or whatever control we're working with.

But it's just not obvious for people, and I'm not trying to say that I'm superior to other people, because this totally screws me up all the time, and I've been programming in C# for almost half my entire life, and I'm getting kind of old now. The reality is that people spend so much time trying to fix bugs like this, so there's got to be a better way, right? This is primarily why I moved to paging: I've seen so many situations like this where people don't really understand how iterators work. It's really cool that they can be lazily evaluated, but you run into issues with resources being closed off too early or resources being closed off too late. In fact, the most recent thing that pushed me overboard to switch over to paging was that I was trying to get fancy with having a lazy-loaded cached list of records, and I was holding stuff in an IEnumerable, which meant I ended up having database connections held open in a cache. Because of how it was designed, I held open all of these connections that were never going to get closed off unless someone fully enumerated the result set. I thought the design was super cool, but I totally
didn't think about the fact that, being an iterator, it wasn't going to close off that connection until I left that foreach loop, just like I showed you in the code a little bit earlier. I had fixed the issue of the connection being closed off too early, but now it was only ever going to close off once everything was finally fully materialized, and if you never asked for the last value in that result set, it was going to stay open forever.

So, without going back to the code to change it all to paging, because I don't want to waste your time walking through all of that, the big difference is that instead of putting IEnumerable and yield return in everything, I change everything to be IReadOnlyList instead. In some situations IReadOnlyCollection might make sense. I like using IReadOnlyList a lot of the time because the collections I'm usually returning are arrays or lists anyway, and both of those already implement IReadOnlyList. In some algorithms I write, if there are queues or stacks or some other type of collection that does not implement IReadOnlyList, I will usually go to IReadOnlyCollection, and if I can't, then I'm using IEnumerable. But IEnumerable doesn't work so well with paging, whereas having a collection that has the results fully available and has a Count is really helpful with paging.

So instead of saying, "Hey, just give me this result set and I will enumerate it when I'm ready," you're basically saying, going down the call stack to each layer, "Materialize this." At the very bottom, instead of creating a lazily evaluated iterator, it goes, "Here's an array" or "Here's a list," and you'll see it as an IReadOnlyList. That means as you go back up the call stack, everyone's just getting that IReadOnlyList all the way back up. You still get a lot of great control, just like you would with a streaming API, because you can give it the page size and the offset, so you're controlling how many records you want to have come
back up, and you're able to shift where you're looking in the result set. I had mentioned in earlier videos that doing some of the LINQ things like Any or First was really cool with iterators because you didn't have to get the whole result set. If you want to do something like that with paging, you can just ask for a single record by using a page size of one and then see if anything comes back.

Another really big reason that I like moving over to paging is performance. There have been a lot of great enhancements in .NET recently, and depending on when you're watching this video, maybe those enhancements will show up with iterators as well, but at the time of recording they're just not there yet. You can check out this video next and see the performance difference. Thanks, and I'll see you next time.
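
The second bug from the transcript, the task that "finishes instantly," can be sketched in a few lines. This is not the code from the video: GetData, the labels, and the log list are hypothetical stand-ins, and a console app has no real UI thread, so the sketch only records the order of events rather than which thread they ran on.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

var log = new List<string>();

// Lazy: Task.Run only creates the iterator object, so the task finishes
// before a single record has been read.
var lazyRecords = await Task.Run(() =>
{
    log.Add("lazy: task start");
    var result = GetData("lazy");
    log.Add("lazy: task end");
    return result;
});
foreach (var value in lazyRecords) { } // the read happens here, after the await

// Materialized: ToArray enumerates inside the task, so the read is finished
// by the time the task ends.
var materialized = await Task.Run(() =>
{
    log.Add("eager: task start");
    var result = GetData("eager").ToArray();
    log.Add("eager: task end");
    return result;
});

Console.WriteLine(string.Join(Environment.NewLine, log));
Console.WriteLine($"records ready: {materialized.Length}");

// Hypothetical iterator standing in for the repository/service call chain.
IEnumerable<int> GetData(string label)
{
    log.Add($"{label}: reading data");
    for (var i = 1; i <= 3; i++) yield return i;
}
```

In the lazy half, "reading data" is logged after "task end"; in the materialized half it is logged between "task start" and "task end," which is the ordering you want before handing the results to UI code.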

Frequently Asked Questions

What are the common challenges with C# iterators that you mentioned in the video?

In the video, I discussed how iterators can lead to subtle bugs, particularly around resource management. A key issue is that connections may be disposed of too early or held open too long, which can cause exceptions or resource leaks. This often happens because of the lazy evaluation nature of iterators, which can be confusing for developers who aren't familiar with how they work.
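
As a minimal sketch of the "held open too long" case, assume something (such as a cache) keeps a partly consumed enumerator around. PretendConnection and GetData here are hypothetical stand-ins, not the code from the video. The iterator is paused inside its using block, so the connection stays open until the sequence is fully enumerated or the enumerator is disposed.

```csharp
using System;
using System.Collections.Generic;

var connection = new PretendConnection();

// Simulate a cache that holds a half-read sequence: take one item and stop.
IEnumerator<int> cached = GetData(connection).GetEnumerator();
cached.MoveNext();
var openWhileCached = !connection.IsDisposed; // the iterator is paused inside the using block

// Disposing the enumerator runs the iterator's pending cleanup.
cached.Dispose();
var closedAfterDispose = connection.IsDisposed;

Console.WriteLine($"open while cached: {openWhileCached}, closed after Dispose: {closedAfterDispose}");
// prints "open while cached: True, closed after Dispose: True"

// Hypothetical iterator that owns the connection for as long as it is being enumerated.
IEnumerable<int> GetData(PretendConnection conn)
{
    using (conn)
    {
        for (var i = 1; i <= 3; i++) yield return i;
    }
}

// Hypothetical dummy connection that only tracks whether it was disposed.
sealed class PretendConnection : IDisposable
{
    public bool IsDisposed { get; private set; }
    public void Dispose() => IsDisposed = true;
}
```

A foreach loop disposes its enumerator for you, even when you break out early, so this leak mostly shows up when an enumerator or a half-read sequence is stored and never finished.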

Why did you decide to switch from using iterators to a paging approach?

I switched to a paging approach because I've encountered numerous situations where iterators caused issues with resource management. Specifically, I found that holding database connections open indefinitely due to lazy loading was problematic. By using paging, I can ensure that data is fully materialized and resources are managed more effectively, which reduces the risk of running into these iterator-related bugs.
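
A rough sketch of what a paged method might look like, assuming a hypothetical Repository with a GetPage(offset, pageSize) method backed by an in-memory array. The video does not show the paging code, so every name here is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var repository = new Repository();

IReadOnlyList<int> firstPage = repository.GetPage(offset: 0, pageSize: 2);
IReadOnlyList<int> lastPage = repository.GetPage(offset: 4, pageSize: 2);

// The paging equivalent of LINQ's Any(): ask for one record and check the count.
bool hasAny = repository.GetPage(offset: 0, pageSize: 1).Count > 0;

Console.WriteLine($"{string.Join(",", firstPage)} | {string.Join(",", lastPage)} | any: {hasAny}");
// prints "1,2 | 5 | any: True"

sealed class Repository
{
    // Stand-in for a table. A real implementation would open its connection inside
    // GetPage, push offset and page size into the query, and dispose before returning.
    private static readonly int[] Rows = { 1, 2, 3, 4, 5 };

    public IReadOnlyList<int> GetPage(int offset, int pageSize)
    {
        // ToArray materializes the page, so callers never receive a lazy iterator.
        return Rows.Skip(offset).Take(pageSize).ToArray();
    }
}
```

Because each call returns a finished array, any resource used to build the page can be released before the method returns, and callers can rely on Count without triggering a read.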

How can I fix the issue of a connection being disposed of too early when using iterators?

To fix the issue of a connection being disposed of too early, I recommend turning the method that returns an iterator into an iterator itself by using 'yield return' for each item. This way, the connection remains in scope until all items have been yielded back to the caller, ensuring that the connection stays open for the duration of the iteration.
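
Here is a small sketch of that fix next to the broken version. The names echo the ones spoken in the video (GetData, ReadData, DataRecord, PretendConnection), but the code itself, including the IsDisposed flag and the log messages, is an assumption rather than the author's actual program.

```csharp
using System;
using System.Collections.Generic;

var log = new List<string>();

foreach (var item in GetDataBroken()) log.Add($"got {item.Value}");
log.Add("---");
foreach (var item in GetDataFixed()) log.Add($"got {item.Value}");

Console.WriteLine(string.Join(Environment.NewLine, log));

// Broken: not an iterator itself. It returns ReadData's lazy sequence immediately,
// so the using declaration disposes the connection before anything is read.
IEnumerable<DataRecord> GetDataBroken()
{
    using var connection = new PretendConnection(log);
    return ReadData(connection);
}

// Fixed: yield return turns this method into an iterator, so the connection
// stays in scope until the caller's foreach is done.
IEnumerable<DataRecord> GetDataFixed()
{
    using var connection = new PretendConnection(log);
    foreach (var item in ReadData(connection))
    {
        yield return item;
    }
}

// Hypothetical reader: records whether the connection was already disposed when reading starts.
IEnumerable<DataRecord> ReadData(PretendConnection connection)
{
    log.Add($"reading (connection disposed: {connection.IsDisposed})");
    for (var i = 1; i <= 3; i++)
    {
        yield return new DataRecord(i);
    }
}

record DataRecord(int Value);

// Hypothetical dummy connection that logs when it is opened and disposed.
sealed class PretendConnection : IDisposable
{
    private readonly List<string> _log;
    public PretendConnection(List<string> log) { _log = log; _log.Add("opening connection"); }
    public bool IsDisposed { get; private set; }
    public void Dispose() { IsDisposed = true; _log.Add("connection is disposing"); }
}
```

In the broken half, "connection is disposing" is logged before "reading," and the read sees an already disposed connection. In the fixed half, the dispose message comes last, after all three records.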

These FAQs were generated by AI from the video transcript.