BrandGhost
LINQ Set Operations in C#: Distinct, DistinctBy, Union, Intersect, and Except

Removing duplicates, finding overlaps, and computing differences between collections are everyday programming tasks, and LINQ set operations in C# give you a precise, readable way to express each one. The classic operators -- Distinct, Union, Intersect, and Except -- have been in LINQ since .NET Framework 3.5. .NET 6 extended each of them with a By variant that accepts a key selector, eliminating a whole category of custom IEqualityComparer implementations. This article covers all of them, with real-world domain examples and clear before-and-after comparisons so you can see exactly what changed and why.

The Domain Model

Most examples use a small Product, Tag, and Employee model:

namespace DevLeader.LinqSetOperations;

public record Product(int Id, string Name, string Category, decimal Price);
public record Tag(string Value);
public record Employee(int Id, string Name, string Department, string Email);

These records give us natural scenarios for deduplication, overlap detection, and difference computation.

Distinct: Removing Duplicates

Distinct() returns each unique element from a sequence. For value types and records (which have structural equality), it works without any extra code:

namespace DevLeader.LinqSetOperations;

int[] scores = [90, 85, 90, 72, 85, 100];
IEnumerable<int> unique = scores.Distinct();
// 90, 85, 72, 100

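When the default equality doesn't match what counts as a duplicate in your domain, supply an IEqualityComparer<T>. That covers classes that only have reference equality, and also records whose generated equality compares more properties than you care about. Here, two products count as duplicates when their names match, ignoring case: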

namespace DevLeader.LinqSetOperations;

// Products where equality is determined only by Name
public sealed class ProductNameComparer : IEqualityComparer<Product>
{
    public bool Equals(Product? x, Product? y) =>
        string.Equals(x?.Name, y?.Name, StringComparison.OrdinalIgnoreCase);

    public int GetHashCode(Product obj) =>
        StringComparer.OrdinalIgnoreCase.GetHashCode(obj.Name);
}

IEnumerable<Product> catalog = GetProductCatalog();
IEnumerable<Product> uniqueByName = catalog.Distinct(new ProductNameComparer());
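To see why the comparer matters even for records, here is a minimal, self-contained sketch that runs plain Distinct and the ProductNameComparer version over the same catalog. It assumes .NET 8+ for collection expressions. The ComparerDemo class and its sample data are hypothetical, for illustration only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Print both results so the difference is visible.
foreach (Product p in ComparerDemo.ByRecordEquality())
    Console.WriteLine($"record equality: {p}");
foreach (Product p in ComparerDemo.ByNameComparer())
    Console.WriteLine($"name comparer:   {p}");

public record Product(int Id, string Name, string Category, decimal Price);

// The comparer from the article: products are equal when their names match, ignoring case.
public sealed class ProductNameComparer : IEqualityComparer<Product>
{
    public bool Equals(Product? x, Product? y) =>
        string.Equals(x?.Name, y?.Name, StringComparison.OrdinalIgnoreCase);

    public int GetHashCode(Product obj) =>
        StringComparer.OrdinalIgnoreCase.GetHashCode(obj.Name);
}

// Hypothetical demo class; the catalog is made-up sample data.
public static class ComparerDemo
{
    static readonly Product[] Catalog =
    [
        new(1, "Laptop", "Electronics", 999m),
        new(2, "laptop", "Electronics", 949m), // same name in a different case, different price
        new(3, "Desk", "Furniture", 250m),
        new(3, "Desk", "Furniture", 250m),     // exact duplicate of the previous row
    ];

    // Record equality compares every property, so only the exact duplicate is removed.
    public static List<Product> ByRecordEquality() => Catalog.Distinct().ToList();

    // The comparer looks at Name only, so "laptop" is dropped too; the first occurrence wins.
    public static List<Product> ByNameComparer() =>
        Catalog.Distinct(new ProductNameComparer()).ToList();
}
```

With record equality, three products survive, because the two laptops differ in name casing and price. The name-only comparer keeps two.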

Writing IEqualityComparer implementations for this pattern gets old fast, which is exactly why DistinctBy was introduced.

DistinctBy (.NET 6): Key-Based Deduplication

DistinctBy(keySelector) keeps the first element for each unique key value, discarding subsequent duplicates -- no IEqualityComparer required:

namespace DevLeader.LinqSetOperations;

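// Before .NET 6 -- custom comparer or GroupBy workaround
IEnumerable<Product> uniqueOld = catalog
    .GroupBy(p => p.Name, StringComparer.OrdinalIgnoreCase)
    .Select(g => g.First());

// .NET 6 -- clean and expressive; the optional key comparer keeps the same
// case-insensitive behavior as ProductNameComparer and the GroupBy version
IEnumerable<Product> uniqueNew = catalog.DistinctBy(p => p.Name, StringComparer.OrdinalIgnoreCase);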

// Distinct by category -- keep the first product per category
IEnumerable<Product> onePerCategory = catalog.DistinctBy(p => p.Category);

The "first element wins" semantics match the original sequence order. If you need a specific element per group (e.g., the cheapest product per category), reach for GroupBy with a result selector instead.
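Here is a sketch of that alternative that compares DistinctBy with a GroupBy result selector on the same data. It assumes .NET 6+ for MinBy and .NET 8+ for collection expressions. PerCategoryDemo and the catalog values are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

foreach (Product p in PerCategoryDemo.CheapestPerCategory())
    Console.WriteLine(p);

public record Product(int Id, string Name, string Category, decimal Price);

// Hypothetical demo class; the catalog is made-up sample data.
public static class PerCategoryDemo
{
    static readonly Product[] Catalog =
    [
        new(1, "Laptop", "Electronics", 999m),
        new(2, "Mouse", "Electronics", 25m),
        new(3, "Desk", "Furniture", 250m),
        new(4, "Stool", "Furniture", 40m),
    ];

    // DistinctBy keeps whichever product comes first in each category: Laptop, Desk.
    public static List<Product> FirstPerCategory() =>
        Catalog.DistinctBy(p => p.Category).ToList();

    // GroupBy's result selector receives each key together with its elements, so the
    // representative can be chosen explicitly; MinBy picks the lowest price: Mouse, Stool.
    public static List<Product> CheapestPerCategory() =>
        Catalog
            .GroupBy(p => p.Category, (category, products) => products.MinBy(p => p.Price)!)
            .ToList();
}
```

Groups come out in the order their keys first appear, so the result order still follows the source sequence.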

DistinctBy with Composite Keys

Composite keys work via anonymous types, just like GroupBy and Join:

namespace DevLeader.LinqSetOperations;

IEnumerable<Employee> employees = GetEmployees();

// Keep the first employee for each unique (Department, Name) pair
IEnumerable<Employee> uniqueDeptName = employees
    .DistinctBy(e => new { e.Department, e.Name });

Anonymous types provide structural equality out of the box, so this compares both fields correctly.
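This small sketch puts the composite key to work and shows that a value tuple key behaves the same way. CompositeKeyDemo and the employee data are hypothetical, and collection expressions assume .NET 8+.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

foreach (Employee e in CompositeKeyDemo.ByAnonymousType())
    Console.WriteLine(e);

public record Employee(int Id, string Name, string Department, string Email);

// Hypothetical demo class; the employees are made-up sample data.
public static class CompositeKeyDemo
{
    static readonly Employee[] Employees =
    [
        new(1, "Ana", "Sales", "ana@example.com"),
        new(2, "Ana", "Sales", "ana.s@example.com"),   // same (Sales, Ana) pair as Id 1
        new(3, "Ana", "Support", "ana.t@example.com"), // same name, different department
        new(4, "Ben", "Sales", "ben@example.com"),
    ];

    // Anonymous-type key: compared field by field, so Id 2 is dropped as a repeat of Id 1.
    public static List<Employee> ByAnonymousType() =>
        Employees.DistinctBy(e => new { e.Department, e.Name }).ToList();

    // A value tuple key gives the same result.
    public static List<Employee> ByTuple() =>
        Employees.DistinctBy(e => (e.Department, e.Name)).ToList();
}
```

Both methods keep Ids 1, 3, and 4. The tuple form is handy when the key type has to be written out, for example as a method's return type, which an anonymous type cannot be.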

Union and UnionBy

Union returns all elements from both sequences, eliminating duplicates across the combined result. Think SQL UNION (distinct) rather than UNION ALL:

namespace DevLeader.LinqSetOperations;

IEnumerable<Tag> webTags    = [new("csharp"), new("dotnet"), new("blazor")];
IEnumerable<Tag> mobileTags = [new("csharp"), new("maui"), new("mobile")];

// Records have structural equality, so "csharp" appears once
IEnumerable<Tag> allTags = webTags.Union(mobileTags);
// csharp, dotnet, blazor, maui, mobile
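The difference from Concat is easiest to see side by side, along with the fact that Union also collapses duplicates inside a single source. This sketch uses plain strings instead of Tag records to keep the output readable; records behave the same way thanks to value equality. UnionVsConcat is a hypothetical name.

```csharp
using System;
using System.Linq;

Console.WriteLine(string.Join(", ", UnionVsConcat.Union()));  // csharp, dotnet, blazor, maui, mobile
Console.WriteLine(string.Join(", ", UnionVsConcat.Concat())); // csharp, dotnet, csharp, blazor, csharp, maui, mobile

// Hypothetical demo class with made-up tag values.
public static class UnionVsConcat
{
    // Note the repeated "csharp" inside WebTags itself.
    static readonly string[] WebTags = ["csharp", "dotnet", "csharp", "blazor"];
    static readonly string[] MobileTags = ["csharp", "maui", "mobile"];

    // Union: elements of the first sequence in order, then unseen elements of the second;
    // every duplicate is removed, including the one within WebTags.
    public static string[] Union() => WebTags.Union(MobileTags).ToArray();

    // Concat: a plain append that keeps all duplicates (like SQL UNION ALL).
    public static string[] Concat() => WebTags.Concat(MobileTags).ToArray();
}
```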

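When the whole element is the wrong unit of comparison, supply an IEqualityComparer<T>. For example, two stores might list the same product Id at different prices, so the records are not equal even though they describe the same product. Or, in .NET 6+, use UnionBy: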

namespace DevLeader.LinqSetOperations;

IEnumerable<Product> storeA = GetStoreAProducts();
IEnumerable<Product> storeB = GetStoreBProducts();

// Before .NET 6 -- IEqualityComparer or intermediate projection
IEnumerable<Product> combinedOld = storeA
    .Concat(storeB)
    .GroupBy(p => p.Id)
    .Select(g => g.First());

// .NET 6 -- UnionBy deduplicates on the key selector
IEnumerable<Product> combinedNew = storeA.UnionBy(storeB, p => p.Id);

UnionBy keeps the element from the first sequence when both sequences contain an element with the same key. This is important when the two copies may differ in non-key fields -- the left sequence wins.
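The following sketch shows three things: the left sequence winning, what happens when the operands are swapped, and why plain Union is not enough when the copies differ in a non-key field. UnionByDemo and the store data are hypothetical, and collection expressions assume .NET 8+.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

foreach (Product p in UnionByDemo.LeftWins())
    Console.WriteLine(p);

public record Product(int Id, string Name, string Category, decimal Price);

// Hypothetical demo class; both made-up stores list product 2, at different prices.
public static class UnionByDemo
{
    static readonly Product[] StoreA =
    [
        new(1, "Laptop", "Electronics", 999m),
        new(2, "Mouse", "Electronics", 25m),
    ];
    static readonly Product[] StoreB =
    [
        new(2, "Mouse", "Electronics", 19m),
        new(3, "Desk", "Furniture", 250m),
    ];

    // Id 2 comes from StoreA (25m): the first sequence is enumerated first, so its copy is seen first.
    public static List<Product> LeftWins() => StoreA.UnionBy(StoreB, p => p.Id).ToList();

    // Swapping the operands flips the winner: Id 2 now comes from StoreB (19m).
    public static List<Product> RightFirst() => StoreB.UnionBy(StoreA, p => p.Id).ToList();

    // Plain Union compares whole records, so both Mouse rows survive (4 products).
    public static List<Product> PlainUnion() => StoreA.Union(StoreB).ToList();
}
```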

Intersect and IntersectBy

Intersect returns elements present in both sequences. Elements that appear only in one sequence are excluded:

namespace DevLeader.LinqSetOperations;

IEnumerable<Tag> requiredSkills = [new("csharp"), new("sql"), new("azure")];
IEnumerable<Tag> candidateSkills = [new("csharp"), new("python"), new("azure"), new("docker")];

// Only skills that are both required and present
IEnumerable<Tag> matchedSkills = requiredSkills.Intersect(candidateSkills);
// csharp, azure

IntersectBy (.NET 6) intersects on a key rather than full element equality:

namespace DevLeader.LinqSetOperations;

IEnumerable<Employee> teamA = GetTeamAEmployees();
IEnumerable<Employee> teamB = GetTeamBEmployees();

// Before .NET 6 -- create a HashSet of keys, then filter
HashSet<int> teamBIds = teamB.Select(e => e.Id).ToHashSet();
IEnumerable<Employee> sharedOld = teamA.Where(e => teamBIds.Contains(e.Id));

// .NET 6 -- IntersectBy on the key field
IEnumerable<Employee> sharedNew = teamA.IntersectBy(teamB.Select(e => e.Id), e => e.Id);

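The second argument to IntersectBy is not a key selector for the inner sequence. It is the sequence of key values from the inner sequence, which you extract explicitly with teamB.Select(e => e.Id). The keySelector argument (e => e.Id) is applied to each element of the first sequence (teamA) to produce the value compared against those keys. The two versions are not quite equivalent: IntersectBy returns at most one element per key, so if teamA contains several employees with the same Id, only the first is kept, while the Where + HashSet filter keeps them all.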
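To make the "at most one element per key" behavior concrete, the sketch below runs both approaches over teams in which teamA lists the same employee Id twice. IntersectDemo and the team data are hypothetical, and collection expressions assume .NET 8+.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

Console.WriteLine($"IntersectBy:     {IntersectDemo.WithIntersectBy().Count} row(s)");
Console.WriteLine($"Where + HashSet: {IntersectDemo.WithHashSet().Count} row(s)");

public record Employee(int Id, string Name, string Department, string Email);

// Hypothetical demo class; employee 7 appears twice in TeamA (e.g. two roles).
public static class IntersectDemo
{
    static readonly Employee[] TeamA =
    [
        new(7, "Ana", "Sales", "ana@example.com"),
        new(7, "Ana", "Marketing", "ana@example.com"),
        new(8, "Ben", "Sales", "ben@example.com"),
    ];
    static readonly Employee[] TeamB =
    [
        new(7, "Ana", "Sales", "ana@example.com"),
        new(9, "Cy", "Support", "cy@example.com"),
    ];

    // IntersectBy yields each matching key once: only the first Id 7 row (Sales) survives.
    public static List<Employee> WithIntersectBy() =>
        TeamA.IntersectBy(TeamB.Select(e => e.Id), e => e.Id).ToList();

    // Where + HashSet is a plain filter, so both Id 7 rows from TeamA are kept.
    public static List<Employee> WithHashSet()
    {
        HashSet<int> teamBIds = TeamB.Select(e => e.Id).ToHashSet();
        return TeamA.Where(e => teamBIds.Contains(e.Id)).ToList();
    }
}
```

Neither approach is wrong. Pick IntersectBy when you want distinct keys, and the filter when every matching row matters.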

Real-world scenario -- finding products that appear in both a promotional catalog and the main catalog:

namespace DevLeader.LinqSetOperations;

IEnumerable<Product> promoProducts = GetPromoProducts();
IEnumerable<Product> mainCatalog   = GetMainCatalog();

// Products eligible for promo that are also in the main catalog
IEnumerable<Product> eligiblePromo =
    promoProducts.IntersectBy(mainCatalog.Select(p => p.Id), p => p.Id);

Except and ExceptBy

Except returns elements from the first sequence that are NOT present in the second. This is the set difference operation:

namespace DevLeader.LinqSetOperations;

IEnumerable<Tag> allFeatureTags   = GetAllFeatureTags();
IEnumerable<Tag> deprecatedTags   = GetDeprecatedTags();

// Tags that are still active
IEnumerable<Tag> activeTags = allFeatureTags.Except(deprecatedTags);

ExceptBy (.NET 6) does the same but compares on a projected key:

namespace DevLeader.LinqSetOperations;

IEnumerable<Employee> allEmployees      = GetAllEmployees();
IEnumerable<Employee> terminatedEmployees = GetTerminatedEmployees();

// Before .NET 6
HashSet<int> terminatedIds = terminatedEmployees.Select(e => e.Id).ToHashSet();
IEnumerable<Employee> activeOld = allEmployees.Where(e => !terminatedIds.Contains(e.Id));

// .NET 6
IEnumerable<Employee> activeNew =
    allEmployees.ExceptBy(terminatedEmployees.Select(e => e.Id), e => e.Id);

Pattern: finding products in the main catalog not yet listed in any promo campaign:

namespace DevLeader.LinqSetOperations;

IEnumerable<Product> mainCatalog  = GetMainCatalog();
IEnumerable<Product> promoProducts = GetPromoProducts();

// Products with no promotion yet
IEnumerable<Product> unpromotedProducts =
    mainCatalog.ExceptBy(promoProducts.Select(p => p.Id), p => p.Id);
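Except and ExceptBy are both one-directional: A.Except(B) and B.Except(A) answer different questions. To get the elements that appear in exactly one of two sequences (the symmetric difference), one option is to combine both directions with Union, as in this sketch. ExceptDemo and the tag values are hypothetical. If you already hold a HashSet<T>, its SymmetricExceptWith method performs the same operation in place.

```csharp
using System;
using System.Linq;

Console.WriteLine(string.Join(", ", ExceptDemo.OnlyInA()));      // sql, azure
Console.WriteLine(string.Join(", ", ExceptDemo.OnlyInB()));      // docker
Console.WriteLine(string.Join(", ", ExceptDemo.InExactlyOne())); // sql, azure, docker

public record Tag(string Value);

// Hypothetical demo class with made-up tags.
public static class ExceptDemo
{
    static readonly Tag[] A = [new("csharp"), new("sql"), new("azure")];
    static readonly Tag[] B = [new("csharp"), new("docker")];

    // A minus B: tags in A that are not in B.
    public static string[] OnlyInA() => A.Except(B).Select(t => t.Value).ToArray();

    // B minus A is a different set.
    public static string[] OnlyInB() => B.Except(A).Select(t => t.Value).ToArray();

    // Symmetric difference: in A or B, but not in both.
    public static string[] InExactlyOne() =>
        A.Except(B).Union(B.Except(A)).Select(t => t.Value).ToArray();
}
```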

Custom Equality Comparers

When the default equality isn't appropriate, for instance when tags should match case-insensitively, the classic set operators accept an optional IEqualityComparer<T>. The By variants take an optional IEqualityComparer<TKey> for the key instead:

namespace DevLeader.LinqSetOperations;

public sealed class CaseInsensitiveTagComparer : IEqualityComparer<Tag>
{
    public static readonly CaseInsensitiveTagComparer Instance = new();

    public bool Equals(Tag? x, Tag? y) =>
        string.Equals(x?.Value, y?.Value, StringComparison.OrdinalIgnoreCase);

    public int GetHashCode(Tag obj) =>
        StringComparer.OrdinalIgnoreCase.GetHashCode(obj.Value);
}

IEnumerable<Tag> sourceTags = [new("CSharp"), new("DOTNET")];
IEnumerable<Tag> filterTags = [new("csharp"), new("blazor")];

// Except with custom comparer -- "CSharp" is excluded because "csharp" matches
IEnumerable<Tag> remaining = sourceTags.Except(filterTags, CaseInsensitiveTagComparer.Instance);
// DOTNET

The singleton pattern on the comparer (Instance) avoids repeated allocations when the comparer is used in hot paths.
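For a comparison on a single string property, the By variants can often replace a custom comparer class entirely. They accept an IEqualityComparer<TKey>, and StringComparer.OrdinalIgnoreCase already implements IEqualityComparer<string>. Here is a minimal sketch that reproduces the result above, with KeyComparerDemo as a hypothetical name:

```csharp
using System;
using System.Linq;

Console.WriteLine(string.Join(", ", KeyComparerDemo.Remaining())); // DOTNET

public record Tag(string Value);

// Hypothetical demo class using the same tags as the comparer example.
public static class KeyComparerDemo
{
    static readonly Tag[] SourceTags = [new("CSharp"), new("DOTNET")];
    static readonly Tag[] FilterTags = [new("csharp"), new("blazor")];

    // TKey is string here, so the built-in case-insensitive string comparer does the job
    // that CaseInsensitiveTagComparer does for whole Tag instances.
    public static string[] Remaining() =>
        SourceTags
            .ExceptBy(FilterTags.Select(t => t.Value), t => t.Value, StringComparer.OrdinalIgnoreCase)
            .Select(t => t.Value)
            .ToArray();
}
```

A dedicated comparer class is still the better fit when the same equality rule is reused across many queries, or when it spans several properties.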

Set Operations on Value Types vs Reference Types

| Type | Default equality | Behavior with Distinct/Union/etc. |
|---|---|---|
| int, string, bool | Value equality | Works without a comparer |
| record (C# 9+) | Structural (compiler-generated) | Works without a comparer |
| struct (custom) | Field-by-field (ValueType.Equals) | Works, but the default Equals/GetHashCode can be slow; override them or use a record struct |
| class without override | Reference identity | Use an IEqualityComparer<T> or a By variant |
| class with Equals/GetHashCode override | Custom structural | Works without a comparer |

C# records are the cleanest solution for data objects used in set operations. If you're designing new domain types for LINQ-heavy code, consider records as the default -- they eliminate IEqualityComparer boilerplate for the common case.

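For systems that use C# enums as category or status fields, DistinctBy, ExceptBy, and IntersectBy on enum fields need no comparer, because enums compare by their underlying value. Keep in mind that ExceptBy and IntersectBy return at most one element per key. That matters with low-cardinality keys such as a status field: to keep every matching element, filter with Where instead.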
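This short sketch uses a hypothetical OrderStatus enum and Order record (neither is part of the article's model). It runs DistinctBy on an enum key, then contrasts ExceptBy with a Where filter on the same key to show how differently they treat repeated statuses.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

Console.WriteLine(string.Join(", ", EnumKeyDemo.FirstPerStatus().Select(o => o.Id)));        // 1, 2, 4
Console.WriteLine(string.Join(", ", EnumKeyDemo.NotClosedWithExceptBy().Select(o => o.Id))); // 1
Console.WriteLine(string.Join(", ", EnumKeyDemo.NotClosedWithWhere().Select(o => o.Id)));    // 1, 3

// Hypothetical types for illustration only.
public enum OrderStatus { Pending, Shipped, Cancelled }
public record Order(int Id, OrderStatus Status);

public static class EnumKeyDemo
{
    static readonly Order[] Orders =
    [
        new(1, OrderStatus.Pending),
        new(2, OrderStatus.Shipped),
        new(3, OrderStatus.Pending),
        new(4, OrderStatus.Cancelled),
    ];

    static readonly OrderStatus[] Closed = [OrderStatus.Shipped, OrderStatus.Cancelled];

    // One order per status; enums compare by value, so no comparer is needed.
    public static List<Order> FirstPerStatus() => Orders.DistinctBy(o => o.Status).ToList();

    // ExceptBy also deduplicates by key: order 3 is dropped because Pending was already yielded.
    public static List<Order> NotClosedWithExceptBy() =>
        Orders.ExceptBy(Closed, o => o.Status).ToList();

    // A plain filter keeps every pending order.
    public static List<Order> NotClosedWithWhere()
    {
        var closed = new HashSet<OrderStatus>(Closed);
        return Orders.Where(o => !closed.Contains(o.Status)).ToList();
    }
}
```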

Combining Set Operations

Set operators compose naturally. A classic pattern is building a "delta" between two snapshots of a collection:

namespace DevLeader.LinqSetOperations;

IEnumerable<Product> catalogPrevious = GetPreviousCatalog();
IEnumerable<Product> catalogCurrent  = GetCurrentCatalog();

// Products added in the latest sync
IEnumerable<Product> added =
    catalogCurrent.ExceptBy(catalogPrevious.Select(p => p.Id), p => p.Id);

// Products removed in the latest sync
IEnumerable<Product> removed =
    catalogPrevious.ExceptBy(catalogCurrent.Select(p => p.Id), p => p.Id);

// Products present in both snapshots (their other fields may still have changed)
IEnumerable<Product> retained =
    catalogCurrent.IntersectBy(catalogPrevious.Select(p => p.Id), p => p.Id);

This delta pattern is common in sync workflows, change-detection pipelines, and incremental data processing. When other components need to react to those changes, the observer design pattern pairs well: observers can react to the added and removed sets rather than to individual element events.
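Because Product is a record, the same tools can also tell you which products that exist in both snapshots actually changed. Except on whole records uses value equality, so it returns current products that are new or modified. IntersectBy on the previous Ids then narrows that to the modified ones. The sketch below works under those assumptions, with hypothetical CatalogDelta and CatalogSync types and made-up snapshots.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

CatalogDelta delta = CatalogSync.Compute(CatalogSync.Previous, CatalogSync.Current);
Console.WriteLine($"Added:    {string.Join(", ", delta.Added.Select(p => p.Id))}");    // 4
Console.WriteLine($"Removed:  {string.Join(", ", delta.Removed.Select(p => p.Id))}");  // 3
Console.WriteLine($"Modified: {string.Join(", ", delta.Modified.Select(p => p.Id))}"); // 2

public record Product(int Id, string Name, string Category, decimal Price);

// Hypothetical result type bundling the three change sets.
public record CatalogDelta(List<Product> Added, List<Product> Removed, List<Product> Modified);

public static class CatalogSync
{
    // Made-up snapshots: Id 1 unchanged, Id 2 repriced, Id 3 removed, Id 4 added.
    public static readonly Product[] Previous =
    [
        new(1, "Laptop", "Electronics", 999m),
        new(2, "Mouse", "Electronics", 25m),
        new(3, "Desk", "Furniture", 250m),
    ];
    public static readonly Product[] Current =
    [
        new(1, "Laptop", "Electronics", 999m),
        new(2, "Mouse", "Electronics", 19m),
        new(4, "Stool", "Furniture", 40m),
    ];

    public static CatalogDelta Compute(IEnumerable<Product> previous, IEnumerable<Product> current)
    {
        List<Product> added = current.ExceptBy(previous.Select(p => p.Id), p => p.Id).ToList();
        List<Product> removed = previous.ExceptBy(current.Select(p => p.Id), p => p.Id).ToList();

        // Whole-record Except yields current products that are new or changed;
        // IntersectBy then keeps only the Ids that already existed, i.e. the changed ones.
        List<Product> modified = current
            .Except(previous)
            .IntersectBy(previous.Select(p => p.Id), p => p.Id)
            .ToList();

        return new CatalogDelta(added, removed, modified);
    }
}
```

Compute enumerates each input more than once. If the snapshots come from a lazy or expensive source, such as a database query, materialize them with ToList() first.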

For plugin systems, ExceptBy and UnionBy are useful for merging plugin-registered types with core types without duplication.


FAQ

What is the difference between Distinct and DistinctBy in C#?

Distinct() removes duplicate elements based on the element's own equality (either default equality for value types/records, or an optional IEqualityComparer). DistinctBy(keySelector) removes duplicates based on a projected key -- keeping the first element per unique key value -- without requiring a custom comparer. Added in .NET 6.

How does ExceptBy work with a key selector?

ExceptBy(second, keySelector) returns elements from the first sequence whose key (as extracted by keySelector) does not appear in second. Like the other set operators, it yields at most one element per key. The second argument is an IEnumerable<TKey> containing the key values to exclude. You typically build it with .Select(e => e.KeyProp) on the exclusion collection.

Does Union remove duplicates from within each source sequence?

Yes. Union deduplicates across the combined result, not just between the two sequences. Elements that appear multiple times within a single source sequence are also deduplicated. If you want to preserve duplicates within each sequence and only combine without deduplication, use Concat instead.

When should I use IntersectBy vs Where with a HashSet?

Both are valid, and both are hash-based, so a single pass costs roughly the same. IntersectBy is more declarative and signals intent clearly. The explicit Where + HashSet pattern is the better fit in two cases. The first is when you want to reuse the same HashSet across several queries, because IntersectBy rebuilds its internal set every time its result is enumerated. The second is when you need to keep every matching element, because IntersectBy returns at most one element per key.

Are LINQ set operations deferred?

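Yes. Distinct, DistinctBy, Union, UnionBy, Intersect, IntersectBy, Except, and ExceptBy all use deferred execution. Calling the operator does no work, and each enumeration of the result starts from scratch. Once enumeration begins, Distinct and Union fill their internal hash set as elements are yielded. Intersect and Except (and their By variants) instead read the entire second sequence into a set first, then stream through the first.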
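Here is a minimal sketch of what deferred execution means in practice, using a hypothetical DeferredDemo class. The query is defined once, both source lists change afterwards, and the second enumeration reflects those changes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var (firstPass, secondPass) = DeferredDemo.Run();
Console.WriteLine(string.Join(", ", firstPass));  // 90
Console.WriteLine(string.Join(", ", secondPass)); // 72

// Hypothetical demo class showing when Except actually does its work.
public static class DeferredDemo
{
    public static (int[] FirstPass, int[] SecondPass) Run()
    {
        var scores = new List<int> { 90, 85, 90 };
        var excluded = new List<int> { 85 };

        // Defining the query evaluates nothing; it only holds references to both lists.
        IEnumerable<int> query = scores.Except(excluded);

        // 90 once: 85 is excluded and the repeated 90 is deduplicated.
        int[] firstPass = query.ToArray();

        // Both lists change after the query was defined. The next enumeration rebuilds
        // the exclusion set from the current contents of `excluded` and rereads `scores`.
        scores.Add(72);
        excluded.Add(90);
        int[] secondPass = query.ToArray();

        return (firstPass, secondPass);
    }
}
```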

Can I use set operations across different sequence types?

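The classic operators require both sequences to have the same element type T. Covariance helps with reference-type subtypes: if DiscountedProduct derives from Product, an IEnumerable<DiscountedProduct> can be passed where an IEnumerable<Product> is expected. Note, though, that record equality also compares the runtime type, so a Product and a DiscountedProduct record are never equal under default equality. For unrelated types, project to a common type first, or use the By variants. IntersectBy and ExceptBy take their second argument as a sequence of keys. That lets you correlate an IEnumerable<Product> with an IEnumerable<DiscountedProduct> (or any other sequence) through their shared Id values without projecting to a common element type.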
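Here is a sketch of correlating unrelated types through a shared key. It assumes a hypothetical OrderLine record (not part of the article's model) that refers to products by ProductId. IntersectBy and ExceptBy only need the second sequence to supply keys of the same type, here int. CrossTypeDemo and the data are made up.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

foreach (Product p in CrossTypeDemo.OrderedProducts())
    Console.WriteLine(p.Name);

public record Product(int Id, string Name, string Category, decimal Price);

// Hypothetical type that is unrelated to Product but carries product Ids.
public record OrderLine(int OrderId, int ProductId, int Quantity);

public static class CrossTypeDemo
{
    static readonly Product[] Catalog =
    [
        new(1, "Laptop", "Electronics", 999m),
        new(2, "Mouse", "Electronics", 25m),
        new(3, "Desk", "Furniture", 250m),
    ];

    static readonly OrderLine[] Lines =
    [
        new(100, 2, 1),
        new(101, 3, 2),
        new(102, 2, 5),
    ];

    // Products that appear on at least one order line: Mouse, Desk.
    public static List<Product> OrderedProducts() =>
        Catalog.IntersectBy(Lines.Select(l => l.ProductId), p => p.Id).ToList();

    // Products never ordered: Laptop.
    public static List<Product> NeverOrdered() =>
        Catalog.ExceptBy(Lines.Select(l => l.ProductId), p => p.Id).ToList();
}
```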


Summary

LINQ set operations in C# form a compact, expressive toolkit for deduplication, overlap, and difference:

  • Distinct and DistinctBy remove duplicates -- reach for DistinctBy in .NET 6+ to avoid custom comparers.
  • Union and UnionBy combine sequences while deduplicating -- the left sequence wins when both sides contain the same key.
  • Intersect and IntersectBy return only elements present in both sequences.
  • Except and ExceptBy return elements from the left sequence absent from the right.
  • All operators use deferred execution. The classic operators accept an optional IEqualityComparer<T> for custom equality, and the By variants accept an optional IEqualityComparer<TKey>.

These operators pair well with other LINQ features -- combine them with LINQ grouping for frequency analysis, or with feature-slice query handlers to encapsulate complex set logic behind clean query interfaces. The factory method pattern is also useful when the equality comparison strategy needs to vary at runtime -- inject the comparer rather than hardcoding it.
