BrandGhost

Batch Collections With MoreLINQ - How To Guide And Benchmarks

LINQ is a powerful language feature that we have in C#. It gives us a lot of power to process collections of data with a nice fluent syntax. But have you ever found yourself wanting... more? Do you need... MORE?! Well, don't you worry, because MoreLINQ is here to save the day. This is the first video in the MoreLINQ series, where I'll be covering extensions from the popular MoreLINQ library. In this first video, I'll explain batching, and we'll go over some performance benchmarks as well!
Transcript
LINQ is a popular language feature that we have access to in C#, and it helps us tremendously when we're working with IEnumerables and different types of collections. But what if you wanted more? What if you needed MoreLINQ? Hi, my name is Nick Cosentino and I'm a Principal Software Engineering Manager at Microsoft. In this video I'm introducing a series on the NuGet package called MoreLINQ. One of the maintainers of this package approached me and said it would be an awesome opportunity to give it a little more coverage, because LINQ is so widely used, and if you go check it out, this package has millions of downloads as well. There are tons of different extension methods you can use from this package, so I obviously can't cover them all in one video, but we're going to start this series off with batching. If that sounds interesting, remember to subscribe to the channel and check out the pinned comment for my courses on Dometrain. With that said, let's jump over to Visual Studio and check out batching from MoreLINQ.

As you can see, I'm using MoreLINQ here (this is the library we'll be talking about), version 4.3. MoreLINQ has been adding extension methods over time, but of course C# and .NET are evolving as well, and some extension methods have since been included in what we get by default. So things get added to MoreLINQ, and maybe a couple of years later we get them for free as part of .NET. But there are still tons of extension methods that we don't get for free, and they're included in this package.

With that said, batching is what we're looking at in this video. To start, I'll walk through a very brief example, and then I want to dive into the performance characteristics of implementing this ourselves. Batching takes a sequence, a collection that we're working with, and it could just be an IEnumerable. If we look at the Batch method, it's an extension method that works on IEnumerable, so you don't necessarily need a materialized collection like an array or a list; the data could come from an iterator that's generating it for you. In this case, I have an array declared on line three, and I'm going to batch it by three.

If batching is brand new to you, the idea is that we break the collection up into smaller pieces. Because I'm using a batch size of three, it should take the first batch of three, then the next batch of three, then the next batch of three, and the final batch won't have three elements because there's only one left: it will contain a single element. You can change the batch size, too. A batch size of one means we take one single element at a time, and a batch size of ten, since there are ten numbers here, gives you one batch containing all of the numbers.

To walk through the batch size of three: this foreach loop goes over the result, and if I hover over it, you can see the result is an IEnumerable of integer arrays. Each of those integer arrays is a batch, and we get an IEnumerable of them. As we see very often with LINQ, we end up getting IEnumerables back from the different LINQ methods we call, which gives us that pipelining effect: you could do Batch and then Select, and you're still operating on an IEnumerable as you continue the chain.

So we get an IEnumerable of arrays of integers. I just wanted to call out that the result is an IEnumerable, but each element within it is an array of integers. If we want to print every number, it makes sense that we need two foreach loops: one that walks through all of the batches, and an inner loop (lines 9 to 10) that walks through each number in the batch. The way this code works is that we print each batch's numbers across one line, then start a new line for the next batch. So let's run it: ten numbers, batch size of three.

Before we move on, a quick reminder that I have courses available on Dometrain if you want to level up your C# programming. Over on Dometrain you'll find a course bundle with my Getting Started and Deep Dive courses on C#. Between the two, that's eleven hours of programming in the C# language, taking you from no programming experience to being able to build basic applications. You'll learn about variables, loops, a bit of async programming, and object-oriented programming as well.

All right, checking the results: we have four lines. The first batch is 1 2 3, as we might have guessed, then 4 5 6, then 7 8 9, so each line is a batch, and finally we have 10 on the last line. Nothing else gets printed with 10, because it's the last element and we can't make a batch of three numbers when there's only one left. It's a very simple example, but I want to check out the benchmarks now, because I think that will give you some more interesting insights, and I didn't want to just leave it at here's
how to batch. Even though that alone is pretty cool: I've probably written batching code many times and never thought to make a reusable extension method, so it's nice that we have access to one. But let's jump over to the benchmarks.

I'm going to continue building out this video series with other MoreLINQ extension methods, so I can keep adding more benchmarks to this project; I'm just starting it off with batching. If you're not familiar with BenchmarkDotNet, it lets you pick which benchmarks you want to run. We only have one set in this case, so that's not very exciting, but as the series builds out there will be others to pick from.

For the batching benchmarks, I'm picking different collection sizes, because I think it's really important to see what happens when you're dealing with a very small number of items versus a very large set. What kind of characteristics do we see? Does it scale linearly, or is there some cutoff point where differences show up? I'm varying batch sizes as well. These parameter values multiply together to build a matrix of the combinations we'll be working with, so we'll get a batch size of one against collections of 10 and 10 million items, and we'll also get a batch size of 1 million against a collection of only 10 items. What do we expect to happen in that case? We'll go through the benchmark results afterwards, but I want to walk you through how this is set up, because I don't want to just show you a handful of results; I want you to have the tools to go benchmark your own stuff, and I think there are other lessons to be learned along the way.

I have a GlobalSetup for my benchmarks that creates the collection. You can see Enumerable.Range, and I materialize it into an array, so in some cases we'll have a materialized collection of up to 10 million items. It's just an array of numbers that are basically all zero by default, which doesn't really matter here, because we're only trying to break the array into pieces.

I'm going to compare a few approaches, and I'll explain each method in a little more detail. The first is manual batching into a fully materialized collection: when I explained how MoreLINQ's batching works, I said it returns an IEnumerable of integer arrays, whereas this one will be a collection of collections. The next one is manual batching, but streaming: a very naive attempt to build what MoreLINQ is doing. When we look at that manual streaming implementation, you'll see I'm doing my own version of batching, thrown together quickly with no optimization attempts. Then it'll be interesting to check the next one, which uses MoreLINQ's Batch: do we get the exact same results, did I happen to implement it the same way, or did they manage to do it faster with a lower memory footprint? Which reminds me: if I scroll back up, you can see I added the MemoryDiagnoser attribute, so when we run these benchmarks we can see how much memory we're allocating.

Going down to look at the implementations: I said earlier it would be an array, but that's not actually the case; this is a list internally. The manual streaming version works with an IReadOnlyList. The only reason I did that was to take a shortcut, because I'm resetting the collection inside the method: by the time I get down here, I'm making a new list with the batch size as its capacity. That creates the batch itself, and then I yield the batch back. It sounds a bit weird, because when we think about building iterators it's usually a single dimension we're walking through, but here I need to build up a batch and, when it's done, yield that whole batch back up to the caller, then recreate the list.

Technically I could make this an integer array instead. I'm thinking about this on the fly, and I shouldn't debug on the spot, but the reason I'm not doing that, even though it would be pretty easy to fix, is that I'd have to keep track of how big each batch actually is. Think back to the first example in this video: with a batch size of three, that last batch had a size of one. If I used a fixed-size array, I'd need to indicate that the final batch is a different size. Very doable, but using a list is a quick shortcut, which is admittedly kind of cheating. Like I said, this is a quick and dirty implementation, not optimized. The last part takes care of yielding back a final batch that hasn't been returned yet.

The next one materializes everything. I said a little earlier it would be an array of arrays; it's actually a list of lists, which is fine, very similar, again just because I'm taking shortcuts. You could absolutely build this with arrays; it's the same concept. But instead of yielding, you'll notice this is
not an IEnumerable, and that means it cannot use yield. The reason I wanted to walk through this is that I've made other videos showing the performance characteristics of iterators versus materialized collections, and I wanted to illustrate some of the pros and cons here. What we should see is a lower memory footprint when we're streaming data with iterators versus fully materializing it. With batching, each thing that gets yielded back is itself a materialized collection, but in the streaming version we return batches one at a time, whereas in this one we fully materialize all of them and return the whole thing at once.

Now, for the MoreLINQ one, I'm not looking at their code; I'm just calling Batch. One of the advantages of using MoreLINQ is that you don't have to write these other methods yourself, and hopefully they've done a good job optimizing them. Because these methods yield things back, I'm keeping the benchmarks consistent by iterating over the entire result set in each one. Do we strictly need to? No, but I wanted something that walks through all of the numbers, the same way, in every benchmark.

That's enough blabbing about the code; you probably want to see the benchmark results. Instead of making you sit here for fifteen minutes, I'll go get those generated, and then we'll walk through the numbers together. My video editor will do a great job and make sure it's all cut out just for you.

All right, the results are in, and fair warning: get ready for a wall of color. When we jump over to the benchmark results, you're going to see a whole lot of numbers, and unfortunately the columns didn't get printed out properly, so they wrap around and it's a little misleading. As we go through, I want to highlight the columns to focus on. The first two columns after the name are the collection size and then the batch size. The next one is the mean, our runtime performance number, so that fourth column over is one we're very interested in. The other column to pay close attention to is the final column, the number of bytes allocated. There are more numbers in the middle, and I'm not saying they're unimportant, but those are the ones we'll focus on.

At the beginning of the data set, we're dealing with a smaller collection and stepping through the batch sizes as they increase, starting with a collection size of 10, our smallest. This is really just to highlight small data sets, because when we think about performance characteristics we often push to extremes: what happens if I load all these records and stream all this data? But we also have scenarios with small amounts of data, and maybe that small amount of data sits on a hot path in our code, so I think it's important to look at small data sets too. With a collection size of 10, the only batching that actually breaks things up is a batch size of one, which gives us 10 different batches; all of the other cases should result in a single batch coming back. But if we look at

every single case here at the beginning, the mean is about 54 ns for basically every MoreLINQ entry, so it is significantly faster than my naive approaches. I was expecting this. I didn't go through their code to see their optimizations, and I certainly didn't try to optimize what I wrote, but they're basically a whole order of magnitude faster in the first case, where every element is its own batch. In the next case, one big batch because the batch size is bigger than the collection, they had the exact same runtime, 54 ns again, still a whole order of magnitude faster. Beyond that, the performance of my code gets significantly worse: they're basically three orders of magnitude faster than mine, which is pretty embarrassing even for a naive approach, and it gets even worse from there (or, from their perspective, even better).

It doesn't stop there. If we jump all the way to the last column, allocations (again, the column header is missing, but the last column is allocations), they're an order of magnitude better in bytes allocated than I was in basically the first two scenarios, and significantly better again after that. Their allocation footprint doesn't increase even though we're changing the batch size. You might be wondering: why did my implementation's allocations go up so much? That's because the one "small" optimization I was using, setting an initial collection capacity, technically does some pre-allocation, and in this case it really wasn't working out. Something I could do as an optimization, like I mentioned, is using arrays: allocating one single array once, at least in some of the situations where I'm yielding it back. That might have been better than making a whole new list each time. Maybe I could have kept one list and cleared it each time so it kept the same capacity. There are a bunch of things that could be explored there, but like I said, I didn't attempt to optimize; I just wanted something in place.

Let's go a little further, and we'll start moving through these faster now that we've walked through one set in detail. Looking at the collection size of 1,000: in the first case, where every single element is its own batch because the batch size is one, MoreLINQ was again an order of magnitude faster than mine. What's interesting is the next row, where the batch size equals the collection size: my implementation did reasonably well compared to MoreLINQ. They're at about 3,000 ns and I was at around three and a half thousand in my materialized implementation, so quite comparable. If we check the allocated column (these are very long rows, so I want to double-check), they have half the allocations that I do, and in the row right above, they completely decimated mine on allocations. The thing I want to call out on the allocation side is consistency: mine kept changing, theirs stayed very consistent. Back in the performance column, they continue to be very fast and very consistent, but mine is not scaling as
we keep going up in batch size; mine keeps getting worse. Since I'm not doing more iterations as the batch size increases, I suspect there's something going on with allocating a list and giving it that initial capacity, so probably some overhead happening there; that's my speculation.

Jumping down a little further (and like I said, we'll keep going faster and faster), the runtime of my manual materialized batching is awful, absolutely awful. Comparing my two implementations, materializing the entire set versus streaming, it's an order of magnitude faster to stream back the batches, and MoreLINQ is an order of magnitude faster than that, so they're definitely doing way better than me, which is totally cool. The reason I'm highlighting this is that it's not wrong to roll your own solutions, definitely not. There are considerations around adding a dependency on a whole package; sometimes at companies you're not even allowed to, because of constraints around that, so there are many reasons why you may or may not include a package. It's not right or wrong, but when you have a team of people dedicated to making something really good, you can see that it pays off a lot of the time.

Jumping over to the allocated column again, it's relatively consistent after the first block. Mine was pretty rough, but scanning a little lower, the allocations are mostly comparable, which is interesting, and in the final set they again have half the allocations I did: mine doubled at the end, theirs remained consistent, which is very impressive. Overall, once we got into this territory, their performance was faster every time, though faster but not by orders of magnitude like we saw before, which is interesting. In fact, in one of the materialized cases, my performance was actually faster; the next one is almost exactly the same as theirs, but mine snuck in a little bit faster, which was unexpected from my perspective. Jumping across the row, I guess I also allocated a little less memory there. I'm not sure why; it feels like an outlier to me, because nothing else shows this pattern. But I did manage to allocate a little less memory and, in at least one of the cases, run a little faster, which is kind of interesting.

If we jump down to the next set, where the batch size is not the same as the collection size (I had to count twice to be sure), theirs is still more performant in the mean column, and over in the far-right column for allocations, theirs is still better. I have the one outlier so far, but my money is still on MoreLINQ, for batching at least. In this last bit, my performance for the materialized implementation is again just terrible. It's significantly faster to stream when you have such a disproportionate number of individual batches to yield back. That's one thing I want to highlight that isn't really specific to MoreLINQ, just to this design approach in general. Even so, theirs was much, much faster. Quickly scanning through the numbers, theirs is very consistent; in fact it almost speeds up as we go, which I guess makes sense, since there are way more iterations when you have to yield back all of these little batches. But overall, very consistent on their side.

I wanted to double-check whether any of my other benchmarks beat theirs, and no: out of all of these, unless I missed one as I was going through (and if you noticed one, let me know), I had the one set of benchmarks that went faster, and I think it's a fluke, so I don't really want to trust it. Checking the memory column, there is one down here where I had slightly fewer allocations, but theirs was faster, and the difference in allocations is not really significant. In the row above, I also had fewer allocations (those two numbers are smaller than the third), but theirs was faster; in fact, one of my allocation numbers is a little greater than theirs, but at half the runtime. You're trading a tiny bit of memory for a significant amount of runtime performance. Overall, these benchmarks suggest they're quite memory efficient, which is very impressive, and their performance basically blows mine out of the water in every single case.

So again, I just want to highlight: if you're relying on a team of people building a specialized set of tools, and they're supporting it, which these folks are (it's been downloaded a tremendous amount and is highly supported), I think you're going to get a good result. Overall, that's batching from MoreLINQ: it gives you the ability to take a collection or a stream of data and break it up into smaller chunks based on your batch size. That's going to wrap this video up. I hope you found it interesting, and there are going to be more videos in this series, so when those are ready you can start watching them up here. Thanks, and I'll see you next time.
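The naive streaming batcher described in the video can be sketched roughly like this. This is my reconstruction from the description, not the exact benchmark code, and the method name is made up: a list is pre-sized to the batch size, yielded when full, and a fresh list is started, with any final partial batch yielded at the end.

```csharp
using System.Collections.Generic;

public static class BatchingSketch
{
    // Naive streaming batcher: build up a pre-sized List per batch,
    // yield it when full, then start a new list for the next batch.
    public static IEnumerable<List<int>> BatchStreaming(IEnumerable<int> source, int batchSize)
    {
        var batch = new List<int>(batchSize); // the "small optimization": initial capacity
        foreach (var item in source)
        {
            batch.Add(item);
            if (batch.Count == batchSize)
            {
                yield return batch;
                batch = new List<int>(batchSize); // recreate for the next batch
            }
        }
        // Take care of a final partial batch that hasn't been yielded back.
        if (batch.Count > 0)
            yield return batch;
    }
}
```

Batching the numbers 1 through 10 by 3 with this sketch yields four batches, the last containing only the number 10, matching the example from the video.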

Frequently Asked Questions

What is MoreLINQ and how does it relate to LINQ?

MoreLINQ is a library that extends the capabilities of LINQ in C#. While LINQ provides a set of standard query operators for working with collections, MoreLINQ adds additional extension methods that offer more functionality. In this video, I focus on one of those features: batching.
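If you want to follow along, the library is distributed on NuGet; to the best of my knowledge the package id is `morelinq`:

```shell
dotnet add package morelinq
```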

How does the batching method in MoreLINQ work?

The batching method in MoreLINQ allows you to break a collection into smaller chunks based on a specified batch size. For example, if you have a collection of ten items and you set the batch size to three, it will create batches of three items each, with the last batch containing the remaining items. This method works on any IEnumerable, so it doesn't require a materialized collection.
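As a sketch of the example from the video: with the MoreLINQ package installed, `numbers.Batch(3)` splits the sequence as described. Note that .NET 6+ also ships a built-in method with the same shape of result, `Enumerable.Chunk`, so the dependency-free sketch below uses it to illustrate the behavior.

```csharp
using System;
using System.Linq;

int[] numbers = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };

// With MoreLINQ this would be numbers.Batch(3); Chunk (built into .NET 6+)
// produces the same shape here: a sequence of int[] batches.
foreach (int[] batch in numbers.Chunk(3))
{
    Console.WriteLine(string.Join(" ", batch));
}
// Prints:
// 1 2 3
// 4 5 6
// 7 8 9
// 10
```

The last batch holds only the leftover element, exactly as in the transcript's walkthrough.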

What were the performance results of using MoreLINQ's batching compared to my own implementation?

In my benchmarks, MoreLINQ's batching was significantly faster and more memory-efficient than my naive implementation. For instance, MoreLINQ consistently showed lower runtime and memory allocation across various collection sizes and batch sizes, outperforming my manual batching methods by a considerable margin.
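A benchmark like the one described could be set up with BenchmarkDotNet roughly as follows. This is a minimal sketch under my own assumptions (class, property, and method names are mine, not the video's exact code), and it requires the BenchmarkDotNet and morelinq NuGet packages to run.

```csharp
using System;
using System.Linq;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using MoreLinq;

[MemoryDiagnoser] // report bytes allocated alongside runtime
public class BatchBenchmarks
{
    private int[] _data = Array.Empty<int>();

    // BenchmarkDotNet multiplies these together into a matrix of combinations.
    [Params(10, 1_000, 10_000_000)]
    public int CollectionSize { get; set; }

    [Params(1, 3, 1_000_000)]
    public int BatchSize { get; set; }

    [GlobalSetup]
    public void Setup() => _data = Enumerable.Range(0, CollectionSize).ToArray();

    [Benchmark]
    public int MoreLinqBatch()
    {
        // Walk every element of every batch, for consistency across benchmarks.
        int count = 0;
        foreach (var batch in _data.Batch(BatchSize))
            foreach (var _ in batch)
                count++;
        return count;
    }
}

public static class Program
{
    public static void Main() => BenchmarkRunner.Run<BatchBenchmarks>();
}
```

Manual streaming and materialized variants would be added as further `[Benchmark]` methods on the same class so they run against the same parameter matrix.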

These FAQs were generated by AI from the video transcript.