Iterator Benchmarks That Shocked With Unexpected Results

This article is the follow-up to my previous write-up on the potential pitfalls of iterators and materialized collections, but I’ll admit it… When I initially set out to write this post on iterator benchmarks and contrast them with materialized collections, I was quite confident about what I’d find. In fact, I was so confident that I coded up all of the sample code and planned to record a video on the results as soon as they hit the console output. I pressed the record button, brought up the console window, and realized I couldn’t explain half of what I was seeing.

Realistically, this was a far more interesting outcome for me. Putting together an analysis can be boring if you know what to expect every step of the way. The results were only half of what I expected, and the rest genuinely shocked me as I dug into these iterator benchmarks.

Companion Video!


Benchmark Setup

We’re of course going to be using BenchmarkDotNet for our benchmarks, and you can find all of the code for these over on GitHub. To start, we need an entry point for our single Benchmarks class, which will define the permutations of scenarios we’d like to run. This is relatively basic:

using System.Linq;
using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Running;

// Skip the optimizations validator so the benchmarks can still run even if a
// referenced project happens to be built without optimizations enabled.
var config = ManualConfig
    .Create(DefaultConfig.Instance)
    .WithOptions(ConfigOptions.DisableOptimizationsValidator);

var summary = BenchmarkRunner.Run(
    typeof(Benchmarks),
    config);

if (!summary.BenchmarksCases.Any())
{
    throw new InvalidOperationException("Benchmarks did not have results.");
}

Next will be the portion of the Benchmarks class that actually defines the permutations we’d like to have our scenarios run against:

[Orderer(BenchmarkDotNet.Order.SummaryOrderPolicy.Method)]
[ShortRunJob(RuntimeMoniker.Net70)]
[MemoryDiagnoser]
public class Benchmarks
{
    public enum ImplementationScenario
    {
        Iterator,
        PopulateAsList,
        PopulateAsROCollection
    }

    private int[] _dataset;

    [Params(
        1_000,
        100_000,
        10_000_000)]
    public int NumberOfEntriesInDataset;

    [Params(
        ImplementationScenario.Iterator,
        ImplementationScenario.PopulateAsList, 
        ImplementationScenario.PopulateAsROCollection)]
    public ImplementationScenario Implementation;

    [GlobalSetup]
    public void Setup()
    {
        // seed this for consistency
        var random = new Random(1337);
        _dataset = new int[NumberOfEntriesInDataset];
        for (int i = 0; i < _dataset.Length; i++)
        {
            _dataset[i] = random.Next();
        }
    }

  // TODO: all the benchmark methods go here
}

The code above sets up a source array for us to work with, containing 1,000, 100,000, or 10,000,000 random integers depending on the benchmark run. We use a seeded Random for repeatability, but the scenarios we’re going to look at shouldn’t actually care about the values of the integers. Next, one of three implementations is selected to exercise, and we can see that code as follows:

    private IEnumerable<int> GetResultsUsingIterator()
    {
        foreach (var item in _dataset)
        {
            yield return item;
        }
    }

    private List<int> GetResultsUsingListAsList()
    {
        var results = new List<int>();
        foreach (var item in _dataset)
        {
            results.Add(item);
        }

        return results;
    }

    private IReadOnlyCollection<int> GetResultsUsingListAsReadOnlyCollection()
    {
        var results = new List<int>();
        foreach (var item in _dataset)
        {
            results.Add(item);
        }

        return results;
    }

The first two methods show an iterator and a materialized collection variant. The third method is also a materialized collection, but it’s returned as an IReadOnlyCollection<T>; we’ll come back to this as we dive into the examples. Here’s a hint: I added it to help make sense of what I was observing in the iterator benchmark results!

LINQ Any() Results are as Expected!

I’ll start us off with one of the iterator benchmarks that behaved as I expected: calling Any() on the result of each of the functions under test. The Benchmark code is as follows (and can be found on GitHub):

    [Benchmark]
    public void LinqAny()
    {
        _ = Implementation switch
        {
            ImplementationScenario.Iterator => GetResultsUsingIterator().Any(),
            ImplementationScenario.PopulateAsList => GetResultsUsingListAsList().Any(),
            ImplementationScenario.PopulateAsROCollection => GetResultsUsingListAsReadOnlyCollection().Any(),
            _ => throw new NotImplementedException(Implementation.ToString()),
        };
    }

My expectation was that the iterator would be both faster AND use less memory than the materialized collection variants. The call to Any() only needs the iterator to produce a single item, and because iterators stream their results, there’s no allocation of a new collection. Conversely, we’d expect the materialized collections to be forced to fully enumerate before they return. That means we get an entire copy of the dataset, blowing up our memory usage, and we pay the performance hit of full enumeration.
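To make that intuition concrete, here is a minimal sketch of what a parameterless Any() boils down to conceptually. The method name AnySketch is mine, and this is not the actual BCL implementation (which has additional fast paths): a single successful MoveNext() is all it takes.

    // Conceptual sketch only -- the real Enumerable.Any() has extra fast paths.
    private static bool AnySketch<T>(IEnumerable<T> source)
    {
        using var enumerator = source.GetEnumerator();

        // One MoveNext() is enough: an iterator only has to produce its first
        // item, while a materialized collection has already paid for all of them.
        return enumerator.MoveNext();
    }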

The results below demonstrate this when we look at the Mean and the Allocated columns.

BenchmarkDotNet Iterator Benchmarks Results for LINQ Any

LINQ Count() Results Start to Look Interesting…

The LINQ Count() method is where I started to get caught off guard. The Benchmark method for this is very much like the one we previously looked at, but we use Count() instead of Any():

    [Benchmark]
    public void LinqCount()
    {
        _ = Implementation switch
        {
            ImplementationScenario.Iterator => GetResultsUsingIterator().Count(),
            ImplementationScenario.PopulateAsList => GetResultsUsingListAsList().Count(),
            ImplementationScenario.PopulateAsROCollection => GetResultsUsingListAsReadOnlyCollection().Count(),
            _ => throw new NotImplementedException(Implementation.ToString()),
        };
    }

What I had hypothesized was that we’d observe similarly low memory allocation from the iterator because there’s no extra materialization of a secondary collection. That checks out in the results below:

BenchmarkDotNet Iterator Benchmarks Results for LINQ Count

But what I found very peculiar was that the runtime performance of the iterator was not significantly faster than the materialized collections. In fact, it was sometimes slower, if not roughly the same. Foolishly, I had forgotten that LINQ’s Count() method has an optimization: if the source is a collection with a known count, it can read that count/length as a shortcut and avoid enumeration entirely. So while that cuts down runtime for the materialized collections, shouldn’t the overhead of allocating that whole additional collection still incur some kind of extra cost?
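For context, here is a simplified sketch of the shortcut Count() takes. The method name CountSketch is mine and this is not the actual BCL source, but the core idea holds: when the source implements ICollection<T>, the count is already known, so nothing gets enumerated.

    // Simplified sketch of the LINQ Count() shortcut -- not the actual BCL code.
    private static int CountSketch<T>(IEnumerable<T> source)
    {
        // A List<T> (like most materialized collections) implements ICollection<T>,
        // so the count is already tracked and no enumeration is needed.
        if (source is ICollection<T> collection)
        {
            return collection.Count;
        }

        // An iterator has no cached count, so every element must be walked.
        var count = 0;
        using var enumerator = source.GetEnumerator();
        while (enumerator.MoveNext())
        {
            count++;
        }

        return count;
    }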

The results weren’t totally obvious here, but they start to get more interesting as we continue looking at the remaining iterator benchmarks!

Comparing Collection vs Iterator Benchmarks for Foreach

The code for this Benchmark is as follows:

    [Benchmark]
    public void Foreach()
    {
        switch (Implementation)
        {
            case ImplementationScenario.Iterator:
                foreach (var x in GetResultsUsingIterator())
                {
                }
                break;
            case ImplementationScenario.PopulateAsList:
                foreach (var x in GetResultsUsingListAsList())
                {
                }
                break;
            case ImplementationScenario.PopulateAsROCollection:
                foreach (var x in GetResultsUsingListAsReadOnlyCollection())
                {
                }
                break;
            default:
                throw new NotImplementedException(Implementation.ToString());
        }
    }

And if you’re wondering why I only went with a foreach instead of a traditional for loop with a counter: IEnumerable<T>, and therefore an iterator, exposes neither a count nor an indexer, so a counter-based loop isn’t possible. I wanted to keep the comparisons equivalent.
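To illustrate the point, here’s a small sketch (assuming we’re inside the Benchmarks class so the helper methods are in scope). The commented-out line would not compile against an IEnumerable<int>:

    List<int> materialized = GetResultsUsingListAsList();
    for (var i = 0; i < materialized.Count; i++)
    {
        var item = materialized[i]; // fine: List<T> exposes Count and an indexer
    }

    IEnumerable<int> streamed = GetResultsUsingIterator();
    // for (var i = 0; i < streamed.Count; i++) { var x = streamed[i]; }
    // ^ does not compile: IEnumerable<T> has no Count property or indexer,
    //   so foreach is the common ground for a fair comparison.
    foreach (var item in streamed)
    {
    }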

Let’s dive into the results of the benchmarks:

BenchmarkDotNet Iterator Benchmarks Results for Foreach

When we look at the results of our Benchmark output for foreach, the memory allocation in the Allocated column is on point with what I would expect to see. When we consider an iterator compared to materializing a whole collection, the iterator does not need to allocate an entire extra collection as it streams data.

However, the runtime performance under the Mean column is where I started to get really confused. And I should add: initially I did not have the third variation with the IReadOnlyCollection<T> return type; it was added afterwards to help with my analysis.

Given that iterators stream their results, the foreach loop results in one full enumeration over the dataset. The materialized collections in this case (unlike the LINQ Count() scenario we saw earlier) absolutely need a second full enumeration: the first builds the materialized collection, and the second walks it in the foreach. How could the List<T> return type be faster than the iterator in every single one of these scenarios? And the other brain teaser (which might give you the answer if you’re brushed up on your dotnet versions): why is the IReadOnlyCollection<T> return type so much slower when the return type is the only difference?

The ‘Aha!’ Moment

Things weren’t adding up for me. I was sitting at my computer with a dumb look on my face, turning off my OBS recording because I just didn’t expect to see the performance metrics on some of these scenarios.

I recalled that recently we were very fortunate to get a wealth of performance improvements in .NET 7, and one of the things that stuck out to me was foreach performance on arrays and lists. We’re able to take advantage of readonly spans under the hood for these collection types, and the speed at which we can iterate is outrageously fast in comparison.
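For a feel of why span-based iteration is so fast, here’s a small standalone sketch that iterates a List<T> through a span explicitly via CollectionsMarshal.AsSpan. This is just an illustration of the technique; I’m not claiming every foreach gets rewritten this way by the compiler.

    using System.Collections.Generic;
    using System.Runtime.InteropServices;

    var list = new List<int> { 1, 2, 3, 4, 5 };

    // CollectionsMarshal.AsSpan exposes the list's backing array as a Span<int>,
    // so iteration becomes a tight loop over contiguous memory with no enumerator
    // object in sight. (Don't add or remove items while the span is in use.)
    long sum = 0;
    foreach (var item in CollectionsMarshal.AsSpan(list))
    {
        sum += item;
    }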

So this was my first hunch about why my performance data was unexpected! At this point, I introduced the third variant you see in all of the iterator benchmark output: the method with an IReadOnlyCollection<T> return type. My hypothesis was that if dotnet knows it can optimize many of these operations for a List<T>, those optimizations might fall off when it only sees an IReadOnlyCollection<T>, because it can no longer take advantage of the spans.

When we reflect on the previous data for foreach, we can certainly see that the runtime performance of the IReadOnlyCollection<T> implementation tanks compared to the List<T> and iterator solutions. So what I had originally expected to observe in all of these benchmarks appears to be overshadowed by List<T> getting some awesome performance optimizations.
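One plausible contributing factor (which the comments below also dig into) is that the declared return type changes what the foreach compiles against. A rough, self-contained illustration:

    using System.Collections.Generic;

    List<int> asList = new() { 1, 2, 3 };
    IReadOnlyCollection<int> asInterface = asList;

    // Statically typed as List<int>: foreach binds to List<int>.GetEnumerator(),
    // which returns a struct enumerator -- no allocation, easy for the JIT to inline.
    List<int>.Enumerator structEnumerator = asList.GetEnumerator();

    // Statically typed as IReadOnlyCollection<int>: foreach only sees IEnumerable<int>,
    // so it goes through the interface and gets back a boxed IEnumerator<int> instead.
    IEnumerator<int> interfaceEnumerator = asInterface.GetEnumerator();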

ToArray Variations Help Explain Iterator Benchmarks Observations

The final scenario that we’ll look at together is the LINQ ToArray() method. The code for this benchmark is on GitHub, and is as follows:

    [Benchmark]
    public void LinqToArray()
    {
        _ = Implementation switch
        {
            ImplementationScenario.Iterator => GetResultsUsingIterator().ToArray(),
            ImplementationScenario.PopulateAsList => GetResultsUsingListAsList().ToArray(),
            ImplementationScenario.PopulateAsROCollection => GetResultsUsingListAsReadOnlyCollection().ToArray(),
            _ => throw new NotImplementedException(Implementation.ToString()),
        };
    }

    [Benchmark]
    public void LinqTakeHalfToArray()
    {
        _ = Implementation switch
        {
            ImplementationScenario.Iterator => GetResultsUsingIterator().Take(_dataset.Length / 2).ToArray(),
            ImplementationScenario.PopulateAsList => GetResultsUsingListAsList().Take(_dataset.Length / 2).ToArray(),
            ImplementationScenario.PopulateAsROCollection => GetResultsUsingListAsReadOnlyCollection().Take(_dataset.Length / 2).ToArray(),
            _ => throw new NotImplementedException(Implementation.ToString()),
        };
    }

You’ll notice that I also included an additional flavor of this, where we attempt to take half of the results of the method using LINQ Take() and then call ToArray() on that result. To be clear, I recognize that doing this undermines the entire idea of having a materialized collection, because we’ll simply get a lazy sequence back after calling Take(). However, I wanted to provide an example of code similar to what you might find in production, where people layer LINQ onto data sets.

Let’s check out the results below:

BenchmarkDotNet Iterator Benchmarks Results for LINQ ToArray

As we’ve seen with all of the examples so far, iterators remain the champion for memory allocation. Again, this is because they do not need to materialize an entire collection before returning and being used. Conversely, both the other methods have this extra allocation overhead.

The performance characteristics here shocked me once more. If ToArray() was using something span-related under the hood to crank out better performance for List<T>, why was the IReadOnlyCollection<T> also beating the iterator? I believe it’s because, with both materialized collections, ToArray() knows the size of the result array it needs to allocate. With an iterator, it likely has to go through many resizes because the implementation of ToArray() cannot possibly know the total size until enumeration finishes.
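Here is a simplified sketch of that difference. The method name ToArraySketch is mine and the real Enumerable.ToArray() is more sophisticated, but the shape of the problem is the same: a source with a known count can be copied in one shot, while an iterator forces a grow-as-you-go buffer.

    // Simplified sketch of the idea -- not the actual Enumerable.ToArray() code.
    private static T[] ToArraySketch<T>(IEnumerable<T> source)
    {
        // Materialized collections implement ICollection<T>, so the exact size is
        // known up front: allocate once and copy once.
        if (source is ICollection<T> collection)
        {
            var result = new T[collection.Count];
            collection.CopyTo(result, 0);
            return result;
        }

        // An iterator's length is unknown, so we grow a buffer as we enumerate,
        // paying for repeated resizes (and copies) along the way.
        var buffer = new T[4];
        var count = 0;
        foreach (var item in source)
        {
            if (count == buffer.Length)
            {
                Array.Resize(ref buffer, buffer.Length * 2);
            }

            buffer[count++] = item;
        }

        Array.Resize(ref buffer, count);
        return buffer;
    }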

What about the Take-Half example?

Well, to me this highlights something quite interesting: if you design your code to be optimized for performance by returning materialized collections, and the caller of your code/API inevitably ends up using LINQ on the result, that can undermine a lot of what you were trying to achieve in the first place.

Aside from the small dataset scenario, the iterator approach outperforms the other two across the board: it allocates significantly less memory and runs faster. All of the benefits we saw in the previous ToArray() example are lost once we layer LINQ on top and force everything back through an iterator.

So does that mean iterators are awful? Are materialized collections the best? Which should we use?

Summary of Collection vs Iterator Benchmarks

To summarize, let’s pull all of our collection and iterator benchmarks into one view:

BenchmarkDotNet Iterator Benchmarks Full Results

My takeaways:

  • In every scenario, iterators allowed us to allocate significantly less memory than the materialized collection approaches. If memory is a concern, does that mean you need to use an iterator? It might help, but you could also write your code so that those materialized collections are never very large (i.e. write database queries that page smaller segments of data). So you have some options.
  • List<T> appears to have tons of performance benefits over iterators. With a foreach we get built-in span optimizations under the hood, and operations like ToArray() can take advantage of the known count.
  • IReadOnlyCollection<T> does not seem to be as performant as List<T>, but knowing the count of a collection can still be helpful for optimizations.

Personally, I may be going back to some of my data-fetching API designs and seeing if I can come up with a more balanced pattern. I really enjoy using the streaming API that is associated with iterators, but there may be performance left on the table that I could squeeze out.

Maybe the Frozen collection types coming in .NET 8.0 will have interesting properties!

Nick Cosentino
Principal Software Engineering Manager at Microsoft. Views are my own.

This Post Has 8 Comments

  1. James Curran

Do you think you could post versions of the results screenshots which aren’t microscopic?

    1. Nick Cosentino

      Hey there,

If you open it in a new tab you can see it in much better resolution. The scaling just happens so that the image content doesn’t completely blow up the normal text area. Apologies for that, but it’s largely just the theme & styling on the site.

  2. James M Curran

Fun fact you didn’t notice (or at least didn’t mention): in the LinqToArray test for Iterator, the allocated size is the same as it is for PopulateAsList for the tests which don’t create a new collection.

So, basically what we learned is that the `yield return` process includes a whole bunch more overhead than the simple IEnumerator for List, but using yield lets you avoid actually building the list. This is important because you rarely do just one thing with the iterator/collection being returned. Any() may be very fast for an iterator, but invariably it’s used in an if() statement, the body of which further manipulates that collection. (And if Any() is truly the only thing done with the iterator/collection, then your API should be returning a boolean.)

I’d imagine that the majority of the time spent materializing the collection goes to reallocating as it grows. List has an initial capacity of 4 elements and doubles each time the capacity is exceeded, so for just 10,000 items it will reallocate (and move all current items) 12 times. It would be interesting to see the times with the list preallocated to the proper size.

    1. Nick Cosentino

Great point. I think when I made the video associated with this (and perhaps it didn’t make the final version, I cannot recall) I anticipated that some of the overhead was this resizing consideration. Under the hood it *has* to resize because there’s no way it can know the final size when using an enumerable.

      But yes, great point to call out for another benchmark!

  3. Enzi

    I believe the reason why List is doing so well compared to IReadOnlyCollection is because the return type of your function is List and not IList

The recent performance improvements in .NET should affect both equally; the reason why List is benefitting more is presumably because the compiler is using duck typing and calling its GetEnumerator() method [1], which returns a struct.

    I would guess that if you changed the return type to IList, it and IReadOnlyCollection would do equally well in the test. The fact that a struct Enumerator is used is not the sole reason why List is so fast I assume, as you only iterate once in the test. But since the concrete type of that Enumerator is known at compile time, a bunch of stuff probably gets inlined. If an interface is used, GetEnumerator() returns an IEnumerator, which isn’t known at compile time and thus can’t be optimized in a similar way (the JIT compiler might still be able to optimize, that’s beyond me, but I’d still bet this is the reason why List is so fast)

    [1] https://source.dot.net/#System.Private.CoreLib/src/libraries/System.Private.CoreLib/src/System/Collections/Generic/List.cs,628

    1. Nick Cosentino

I agree: it’s likely largely due to the recent performance improvements we received. I made the assumption here that it’s to do with span usage, but admittedly I did not look at the decompiled code or the IL, nor did I go through the docs to prove it. You may certainly be correct that IList also gets these sorts of optimizations. I didn’t go through all the interfaces, but thought it was an interesting enough data point to look at and touch on.

      Thanks for the comment!!
