Does MoreLINQ Zip Outperform LINQ Zip? Let's Benchmark Them!
September 6, 2024
A duel between two methods for zipping collections in C#!
But a challenger approaches:
Our very own naive implementation of LINQ Zip!
In this video, I'll walk through the BenchmarkDotNet benchmarks comparing the LINQ Zip method with the MoreLINQ ZipShortest and ZipLongest methods. We'll even see how my naive implementation does against these!
Buckle up and get ready to see how optimized my code is!
Spoiler Alert: It's not.
More Spoiler Alert: You'll still be surprised.
Transcript
If we want to combine collections together in C# and put them side by side, we're able to use something called Zip. But we also have the MoreLINQ NuGet package, which has ZipShortest and ZipLongest. So which one's going to be more performant?

Hi, my name is Nick Cosentino, and I'm a Principal Software Engineering Manager at Microsoft. In a previous video, which I'll link up here if you haven't seen it already, I was walking through how we can go implement our own version of zipping collections together, and then we could see that we could call the same LINQ Zip method or MoreLINQ's ZipShortest and ZipLongest. All of these things were basically comparable in terms of their functionality, except for ZipLongest taking the longest of the two collections when combining them. However, we didn't get to dive into the performance benchmarks.
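To make the shortest-versus-longest distinction concrete, here is a minimal sketch. It uses only the built-in `Enumerable.Zip`; the `ZipLongestSketch` helper is a hypothetical stand-in written for illustration (it is not MoreLINQ's source code), and it assumes the zip-longest behavior of padding the shorter sequence with default values.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ZipSemanticsSketch
{
    // Hypothetical helper, NOT MoreLINQ's implementation: keeps going until
    // both inputs are exhausted, padding the shorter one with default values.
    public static IEnumerable<TResult> ZipLongestSketch<T1, T2, TResult>(
        IEnumerable<T1> first,
        IEnumerable<T2> second,
        Func<T1, T2, TResult> resultSelector)
    {
        using var e1 = first.GetEnumerator();
        using var e2 = second.GetEnumerator();
        bool has1 = true, has2 = true;
        while (true)
        {
            // Only advance an enumerator that has not already finished.
            has1 = has1 && e1.MoveNext();
            has2 = has2 && e2.MoveNext();
            if (!has1 && !has2) yield break;
            yield return resultSelector(
                has1 ? e1.Current : default(T1)!,
                has2 ? e2.Current : default(T2)!);
        }
    }

    public static void Demo()
    {
        int[] shorter = { 1, 2 };
        int[] longer = { 10, 20, 30 };

        // Built-in LINQ Zip stops at the end of the shorter input.
        var zipped = shorter.Zip(longer, (a, b) => (a, b)).ToList();

        // The sketch pads the missing element with default(int), which is 0.
        var longest = ZipLongestSketch(shorter, longer, (a, b) => (a, b)).ToList();

        Console.WriteLine(string.Join(", ", zipped));  // (1, 10), (2, 20)
        Console.WriteLine(string.Join(", ", longest)); // (1, 10), (2, 20), (0, 30)
    }
}
```

With equal-length inputs both approaches produce the same pairs, which is why the equal-size benchmark rows are the fair ones to compare.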
So in this video, I'm going to walk through some BenchmarkDotNet benchmarks. Just a quick reminder to subscribe to the channel and check out that pinned comment for my courses on Dometrain. Let's jump over to Visual Studio and check out these benchmarks.

On my screen I have a class called ZipBenchmarks, and this is just going to be setting up all of our Zip benchmarks, as you might have guessed. I am going to be comparing arrays and enumerables, because I think it's important to understand that we might have optimizations under the hood depending on your collection type. We are going to check the global setup out first, and we can see that I'm just instantiating either two arrays or two enumerables, and I'm just going to be doing it to the collection size. But because we're dealing with two collections, or two enumerables, I wanted
to make sure that we could play around with having different collection sizes. That's because I do want to make sure that we can check out the ZipLongest versus ZipShortest behavior. So there are going to be some differences in the results, because when we compare ZipShortest and ZipLongest, ZipLongest just has different functionality, but I do want to have it mixed in so we can see the results alongside the others. Technically, ZipShortest and LINQ's Zip method are both doing a zip-shortest implementation. In the naive implementation that I wrote in the previous video, we are also doing a zip-shortest implementation, so we just step through two IEnumerables at the same time until we hit the end of whichever one is shortest.

Now, I like walking through these
benchmarks because I have been known to make mistakes, and I always want to make sure that I'm putting out proper information. So as I step through them, if you notice anything that I have messed up, please let me know. I'm happy to go make a follow-up video to correct these.

The first one that we're going to be looking at is MoreLINQ ZipShortest. You can see that I'm calling ZipShortest here, and we're going to be doing this on an array, so the first set of all of these is going to be dealing with arrays. And then MoreLINQ ZipLongest: source array one, source array two. You can see that pattern as we step through all of these. The resulting type that we're going to be making is just a tuple of the first and the second element, so in
this case, if I scroll back up, you'll notice that these are of integer type, so the resulting type will be a tuple of two integers.

Before we move on, this is just a reminder that I do have courses available on Dometrain if you want to level up in your C# programming. If you head over to Dometrain, you can see that I have a course bundle that has my getting started and deep dive courses on C#. Between the two of these, that's 11 hours of programming in the C# language, taking you from absolutely no programming experience to being able to build basic applications. You'll learn everything about variables, loops, a bit of async programming, and object-oriented programming as well. Make sure to check it out.

Next up, we will have the LINQ Zip method. I am going to mark this as the baseline because
I think this is what people are going to be most familiar with when they're working in C#. It's just built in: we don't have to have any other NuGet packages, and you don't have to go write your own. So in my opinion this is a good baseline to have, but I'm just calling that out as we go through all of these results.

Next up, we're going to see almost the exact same thing, except we are dealing with enumerables. Again, just a reminder: I wanted to have this comparison because there may be other optimizations being made behind the scenes if we're dealing with concrete collections. I personally have not looked at any of the source code for any implementation of ZipShortest, ZipLongest, or LINQ's Zip method, so I'm interested to see if we'll notice anything just by looking at the results
of the benchmarks.

Finally, right at the end, I added in some benchmarks for our manual one, the implementation that I made, and this is just going to be called ManualZipShortest. You'll notice that I'm doing it again for enumerable and array, and I'm passing in the enumerables here and the arrays here. A quick reminder: if I jump into this just by pressing F12 to go to the definition, this is my naive implementation of zipping the shortest. I'm just going to ask for both of the enumerators, then step through with this while loop and just yield return whatever my callback method produced. If we jump back to the benchmarks, this callback method is just making a tuple of both of the integers. So it's the same type of setup across all of these: it's either going to
be an enumerable or an array, and then again, I am going to be changing up the collection sizes. So with that said, I think it's time to go run these benchmarks. And of course you can thank my video editor: no one's going to make you sit here and wait for all of these to finish.

Now I have a screen full of awesome colors, and we're going to be looking at the output of the benchmarks. The first one that we're going to look at is working with small collections of equal sizes. Again, just a reminder: when we're dealing with ZipLongest, it will have different behavior than ZipShortest once the collection sizes are uneven. That's just an important callout to make. I don't want you to focus on the absolute values of these numbers; we do need to take them with
a grain of salt as we go through. When we're dealing with equal-size collections, these two implementations, shortest and longest, should have the same behavior. That means we should be able to scan this column, or the ratio column, for the top-performing ones. If we check out the mean column and look at the baseline, so array LINQ Zip, it has almost 110 ns of execution time. If we scroll through, the LINQ Zip with enumerable is technically faster than the array implementation. So if I highlight that here, and this is the array implementation, technically the enumerable one was faster than the array. It's also interesting to note that my two at the bottom here, these are my implementations, the ManualZipShortest ones, were actually on par, which is kind of funny because I didn't put
any real optimizations in. I just kind of wrote it for the previous video to have an example of how it works, so it's kind of neat that these came out to be pretty performant. But we can see that the MoreLINQ implementations, if I back up a little bit, so here and here, are actually slower compared to the others, in some cases taking almost twice as long. That's a little bit shocking, and not really great to see. It's a little bit more obvious when we go to look at the ratio column. A ratio of one is our baseline, the array LINQ Zip, and if we scroll through, there are a couple that are faster than the baseline right here: that's the LINQ Zip method on enumerable and my manual zip method on enumerable. But the other
ones, overall for MoreLINQ, are not as performant here. If we check out the allocation column, and in particular the ratio, again for MoreLINQ it's not looking so great: that's almost twice as much allocated. We aren't talking about many bytes, but still, that's not really great. So it's kind of interesting to see that big of a difference that early on.

Let's go down to the next set. These ones are going to be kind of interesting, because you'll notice that there are some ridiculously big outliers, and that's going to be ZipLongest. I just wanted to show you that as we go through all of these, there's not much point in focusing on the numbers here that really stand out, because it's not fair to compare them here. I really wanted
to compare them when the collections were the same size. I think it's important to compare the LINQ Zip method and MoreLINQ ZipShortest, though, when we do have uneven counts, and in fact even my manual zipping one. So to check out here: this is the LINQ implementation with an IEnumerable, and this is the same one with an array. I am pretty impressed that the enumerable implementation is faster as we go through this. I mean, it's still dealing with a base collection size of 10, but I just figured that when we have an array, a materialized collection, there'd probably be some optimization under the hood to check for that and then do something faster with it. It looks like not, but again, I haven't checked out the source code. Overall this is very comparable
to what we saw, obviously with the outlier being the longest, so I think it's okay to move on from here and start looking at some larger collection sizes.

Let's start with this one down here, actually, because these collection sizes are the same, which makes the numbers a little bit easier to start looking through. If we check out the MoreLINQ comparisons here, these ones are very comparable with each other. It doesn't matter if you're using ZipLongest or ZipShortest: they're almost exactly the same in terms of runtime, and that's because their behavior in this case is almost exactly the same. It's supposed to result in a collection whose size is equal to the count of both of these. So then we look at the array LINQ Zip, here, and this is for the IEnumerable. Both of these
are the built-in ones, and they are significantly faster, basically half the time, compared to using MoreLINQ. So that's interesting, but I'm also kind of impressed that my manual implementation is really keeping up. Again, I just didn't think that would be the case, because I didn't do any explicit optimizations. In fact, if we jump all the way over to the allocation ratio column, my implementation has a ratio of one, which is the same as the array LINQ Zip. So that's pretty neat. We do start to notice again that if we're checking out the built-in methods, where our baseline is array with LINQ Zip, the enumerable version of that has about 10% extra overhead in terms of allocations. That's interesting and important to note, but again, we're dealing with pretty small numbers here. I don't think it's really the end of
the world, but if that's a consideration of yours, then you can see the difference here.

Let's go a little bit lower and make some bigger jumps, down to this area here. These are going to be with 1 million elements. Again, the pattern that we're starting to notice here is that all of the MoreLINQ operations are generally about twice as slow. Like I said earlier, I have not checked out the source code, but I still find it pretty surprising that it's about twice as slow. It's just not what I would expect. And then if we check out array LINQ Zip, so this here, and the enumerable version, both of these are quite fast, and again my implementation is keeping up with both of them. So that's pretty impressive. In both cases the built-in implementation is consistently
faster than mine, but just by a little bit. So again, pretty neat, and we see the same type of pattern over in the far right column for allocation ratio.

Now, I do want to point something out as well, because it might not be obvious for folks that aren't very familiar with iterators versus materializing collections. This is a collection of a million items. We have two collections that have a million items. I'm not allocating them inside of the benchmark; that's done outside. But you'll notice that the number of bytes allocated is not enough for a fully materialized resulting collection. This is a really important note: one of the reasons why iterators are pretty powerful is that you don't need to materialize a full collection to start using it. Otherwise, if we did need to do that, with a million items, by definition we would need
over a million bytes to be able to hold the results there. That's something to think about if you're dealing with applications where you're really focused on memory: being able to stream this kind of stuff can be pretty powerful.

So let's jump down to the final set here, down at the bottom. Again, this is sort of the upper limit, with 10 million if I counted correctly, and across the board MoreLINQ is about twice as slow. We have similar patterns for LINQ Zip, both on arrays and enumerables, as well as my naive implementation. I do think it's kind of surprising that I could keep up, and kind of surprising that MoreLINQ was about twice as slow. Perhaps there's something that they're doing under the hood in MoreLINQ that I'm not accounting for in these benchmarks. Maybe there's extra type checking for
different optimizations. I haven't looked, so that's entirely possible, but these are the benchmark results that I'm getting. So if you are using MoreLINQ in your application, and currently using ZipShortest in particular, because ZipShortest is directly comparable with LINQ Zip, maybe you want to try benchmarking without MoreLINQ ZipShortest. That depends on whether there's a different scenario that I'm not accounting for here, but it looks like you can essentially cut your time in half, so I think that's pretty impressive. Otherwise, you could try rolling your own. It seems like it's pretty performant, and maybe there are other optimizations you can think through. But overall, the least amount of effort in general is just using the LINQ Zip method if you need zip-shortest behavior.

I hope you found that useful. Maybe for some of
you this was not totally surprising, because you'd be expecting the built-in stuff to be more performant. I kind of do in general, just because there's a whole team focused on making these things fast all the time. But I was kind of surprised that the MoreLINQ zip ones were slower overall. It's not really what I would expect to see, a benchmark that's something like twice as slow. So it was interesting for me at least, and I hope you find this useful. If you're interested in seeing more LINQ videos, especially where we start to compare these benchmarks and the different behaviors, you can check out these videos next when they're ready. Thanks, and I'll see you next time.
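The naive implementation described in the transcript (get both enumerators, loop while both can advance, yield the callback's result) might look roughly like the sketch below. The actual code from the video is not shown in the transcript, so the class name, method signature, and the `SumOfFirstProducts` helper are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ManualZipSketch
{
    // Assumed shape of a naive zip-shortest: stops as soon as either
    // input runs out, and never materializes an intermediate collection.
    public static IEnumerable<TResult> ManualZipShortest<T1, T2, TResult>(
        this IEnumerable<T1> first,
        IEnumerable<T2> second,
        Func<T1, T2, TResult> resultSelector)
    {
        using var e1 = first.GetEnumerator();
        using var e2 = second.GetEnumerator();
        while (e1.MoveNext() && e2.MoveNext())
        {
            yield return resultSelector(e1.Current, e2.Current);
        }
    }

    // Hypothetical helper showing the streaming point: even with
    // million-element inputs, only `take` pairs are ever produced.
    public static int SumOfFirstProducts(int take)
    {
        return Enumerable.Range(0, 1_000_000)
            .ManualZipShortest(Enumerable.Range(0, 1_000_000), (a, b) => a * b)
            .Take(take)
            .Sum();
    }
}
```

Because the method is an iterator, calling it does no work until the result is enumerated, which is consistent with the small allocation figures discussed for the million-element runs.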
Frequently Asked Questions
What is the main focus of the video regarding MoreLINQ and LINQ Zip?
In this video, I focus on benchmarking the performance of MoreLINQ's ZipShortest and ZipLongest methods compared to LINQ's built-in Zip method. I aim to determine which implementation is more performant when combining collections.
Why did you choose to compare arrays and enumerables in your benchmarks?
I chose to compare arrays and enumerables because there might be optimizations under the hood depending on the collection type. Understanding how each performs in different scenarios can provide valuable insights into their efficiency.
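The array-versus-enumerable setup described in this answer could be sketched with BenchmarkDotNet roughly as follows. This is not the benchmark class from the video: the class and member names, the parameter values, the use of `Enumerable.Range` as the non-array source, and the `Drain` helper are all assumptions, and the sketch assumes the BenchmarkDotNet NuGet package is referenced. Only the built-in Zip is shown; MoreLINQ and manual variants would be added as further `[Benchmark]` methods in the same pattern.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using BenchmarkDotNet.Attributes;

[MemoryDiagnoser] // adds the allocation columns discussed in the video
public class ZipBenchmarksSketch
{
    // Assumed sizes; two parameters allow uneven collection lengths.
    [Params(10, 1_000_000)]
    public int FirstSize { get; set; }

    [Params(10, 1_000_000)]
    public int SecondSize { get; set; }

    private int[] _array1 = Array.Empty<int>();
    private int[] _array2 = Array.Empty<int>();
    private IEnumerable<int> _enumerable1 = Enumerable.Empty<int>();
    private IEnumerable<int> _enumerable2 = Enumerable.Empty<int>();

    [GlobalSetup]
    public void Setup()
    {
        // Inputs are created outside the measured methods.
        _array1 = Enumerable.Range(0, FirstSize).ToArray();
        _array2 = Enumerable.Range(0, SecondSize).ToArray();
        _enumerable1 = Enumerable.Range(0, FirstSize);
        _enumerable2 = Enumerable.Range(0, SecondSize);
    }

    [Benchmark(Baseline = true)]
    public int Array_LinqZip() =>
        Drain(_array1.Zip(_array2, (a, b) => (a, b)));

    [Benchmark]
    public int Enumerable_LinqZip() =>
        Drain(_enumerable1.Zip(_enumerable2, (a, b) => (a, b)));

    // Zip is lazy, so the result must be enumerated for any work to happen.
    private static int Drain<T>(IEnumerable<T> source)
    {
        int count = 0;
        foreach (var _ in source) count++;
        return count;
    }
}
```

Running it would typically be a call to `BenchmarkRunner.Run<ZipBenchmarksSketch>()` (from `BenchmarkDotNet.Running`) in a Release build; the resulting ratio columns are relative to the method marked as the baseline.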
What were your findings regarding the performance of MoreLINQ compared to LINQ?
I found that MoreLINQ's implementations were generally about twice as slow as LINQ's built-in Zip method. This was surprising to me, as I expected MoreLINQ to perform comparably, but the benchmarks showed a significant performance difference.
These FAQs were generated by AI from the video transcript.
