C# Regex Performance: GeneratedRegex, Compiled, NonBacktracking, and Timeout

C# regex performance can differ by an order of magnitude or more depending on how you construct and use your patterns. The same pattern applied to the same input can run 10x faster or allocate nothing on the heap -- the difference comes down to which API you choose and how you configure it. With .NET 7 and .NET 8, the performance toolkit expanded significantly. This article breaks down every option so you can make informed decisions.


The Four Performance Modes

Before diving into implementation details, it helps to understand the high-level trade-off space. C# regex offers four distinct execution modes, each optimized for different scenarios. Choosing the wrong mode for a hot path is a common performance mistake. Choosing the right mode is straightforward once you understand the axes: startup cost, runtime speed, allocation profile, and whether the mode is immune to ReDoS attacks.

At a high level, C# regex has four modes:

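| Mode                  | Startup    | Runtime     | Allocation | ReDoS safe |
|-----------------------|------------|-------------|------------|------------|
| Interpreted (default) | Fast       | Slow        | High       | No         |
| Compiled              | Slow (JIT) | Fast        | Lower      | No         |
| [GeneratedRegex]      | Zero       | Fastest     | Lowest     | No         |
| NonBacktracking       | Fast       | O(n) linear | Low        | Yes        |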

Understanding when to use each is the key to good regex performance in .NET.


Interpreted Regex -- The Default

When you write new Regex(@"\d+") without any RegexOptions, you get interpreted mode. The pattern is parsed into an internal bytecode that the regex engine interprets at match time:

// Interpreted -- convenient, but slowest
var regex = new Regex(@"\d+");
bool result = regex.IsMatch("Hello 42");

This is fine for one-off use cases or patterns that run infrequently. The overhead is low for a single call. But if you're matching thousands of strings in a loop with the same pattern, this is leaving performance on the table.

The Static Method Cache

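The static methods (Regex.IsMatch(input, pattern)) cache up to 15 regex instances internally by default (the limit is exposed as Regex.CacheSize). This gives you some benefit of reuse, but once you use more patterns than the cache holds, the least recently used entries are evicted and rebuilt on their next use. Don't rely on it for performance-critical code.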
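
The cache behaviour described above can be sketched as follows. Regex.CacheSize is the real static property; the sample input and the value 32 are illustrative assumptions, not recommendations.

```csharp
using System;
using System.Text.RegularExpressions;

// Static call: the Regex built for this pattern is kept in the process-wide
// cache and reused by later static calls with the same pattern and options.
bool viaStatic = Regex.IsMatch("Hello 42", @"\d+");

// The cache capacity is readable and writable through Regex.CacheSize.
int previousSize = Regex.CacheSize;
Regex.CacheSize = 32; // illustrative value only

// For hot paths, holding your own instance avoids depending on the cache at all.
var digits = new Regex(@"\d+");
bool viaInstance = digits.IsMatch("Hello 42");

Console.WriteLine($"static: {viaStatic}, instance: {viaInstance}");
Console.WriteLine($"cache size: {previousSize} -> {Regex.CacheSize}");
```

Raising the cache size only postpones eviction; an explicitly held instance is the more predictable option when the set of patterns is known.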


RegexOptions.Compiled -- Runtime JIT Compilation

RegexOptions.Compiled tells the regex engine to compile the pattern to IL when the Regex instance is constructed; the JIT then turns that IL into native code:

// Pre-.NET 7 recommended approach for hot paths.
// Store the instance in a static field so the compilation cost is paid once.
private static readonly Regex _phonePattern =
    new Regex(@"\d{3}-\d{4}", RegexOptions.Compiled);

The trade-offs:

  • Startup cost: Constructing a Regex with Compiled is significantly slower, because the engine emits IL for the pattern and the JIT must then compile it
  • Runtime speed: ~3-5x faster than interpreted for typical patterns
  • Memory: Uses more memory (the emitted IL and its native code stay loaded)
  • Thread-safe: Yes, fully thread-safe

The classic pattern is to store Compiled instances as static readonly fields to pay the startup cost exactly once:

public static class Patterns
{
    public static readonly Regex Phone =
        new Regex(@"\d{3}-\d{4}", RegexOptions.Compiled, TimeSpan.FromMilliseconds(500));

    public static readonly Regex Email =
        new Regex(
            @"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$",
            RegexOptions.Compiled | RegexOptions.IgnoreCase,
            TimeSpan.FromMilliseconds(500));
}

For new code targeting .NET 7+, [GeneratedRegex] is usually the better choice whenever the pattern is known at compile time. But if you're maintaining .NET 6 or older code, Compiled with static readonly is the right approach.


[GeneratedRegex] -- Compile-Time Source Generation (.NET 7+)

[GeneratedRegex] is the game changer introduced in .NET 7. Instead of emitting IL at runtime, a Roslyn source generator emits optimized C# source code at build time, which the compiler then compiles to IL along with the rest of your assembly. The result is no runtime pattern parsing or IL emission, so construction is nearly free, and the generated code is ordinary C# that the JIT can inline and optimize like any other method because the structure of the pattern is fixed at build time.

using System.Text.RegularExpressions;

public partial class PhoneValidator
{
    // The source generator replaces this at build time
    [GeneratedRegex(
        @"^\d{3}-\d{4}$",
        RegexOptions.None,
        matchTimeoutMilliseconds: 500)]
    private static partial Regex PhonePattern();

    public static bool IsValid(string phone)
        => PhonePattern().IsMatch(phone);
}

Requirements:

  1. The containing type must be partial
  2. The method must be partial, parameterless, and non-generic (static is the usual choice)
  3. The return type must be Regex
  4. The attribute arguments must be compile-time constants (no variables)

Why [GeneratedRegex] Wins

The source generator doesn't just compile the pattern -- it generates code that is purpose-built for that specific pattern. The generated implementation:

  • Has almost no startup cost (no runtime pattern parsing or IL emission)
  • Avoids the intermediate bytecode interpretation layer
  • Produces output that can be further optimized by the JIT because the structure is fully known
  • Can be inspected as ordinary C# source (set EmitCompilerGeneratedFiles to true in the project file to have it written under obj/)

In benchmarks, [GeneratedRegex] can outperform Compiled by 20-40% on hot paths, though the margin varies with the pattern and input, and it avoids Compiled's slow construction entirely.

Viewing Generated Code

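The generator's output is not written to disk by default. Set EmitCompilerGeneratedFiles to true in the project file and the files appear under obj/Debug/net8.0/generated/ (the exact path follows your build configuration and target framework). In Visual Studio, Go to Definition on the partial method also opens the generated code. This is useful for understanding what the compiler is doing and for debugging unexpected behavior.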


RegexOptions.NonBacktracking (.NET 7+) -- ReDoS Prevention

Traditional regex engines use backtracking -- when a match attempt fails at some point, the engine backs up and tries alternatives. This is powerful but dangerous. Certain patterns applied to certain inputs cause exponential time complexity -- the infamous catastrophic backtracking (ReDoS vulnerability):

// This pattern on adversarial input can hang a thread indefinitely
var dangerous = new Regex(@"^(a+)+$");

// "aaaaaaaaaaaaaaaaaaaaaX" -- the + in (a+)+ backtracks exponentially
// Without timeout, this will hang

RegexOptions.NonBacktracking uses an NFA-to-DFA construction that runs in O(n) linear time regardless of input:

// Safe against catastrophic backtracking
var safe = new Regex(
    @"^(a+)+$",
    RegexOptions.NonBacktracking);

// Completes quickly even on adversarial input
bool result = safe.IsMatch("aaaaaaaaaaaaaaaaaaaaaX");
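
To contrast the two engines on the same input, here is a minimal sketch. The 30-character input and the 200 ms limit are arbitrary assumptions; the backtracking run is expected to hit the timeout on most machines, but the code simply reports whichever outcome occurs.

```csharp
using System;
using System.Text.RegularExpressions;

string attack = new string('a', 30) + "X";
TimeSpan limit = TimeSpan.FromMilliseconds(200);

// Backtracking engine: the timeout is the only thing bounding the work.
var backtracking = new Regex(@"^(a+)+$", RegexOptions.None, limit);
string backtrackingOutcome;
try
{
    backtrackingOutcome = backtracking.IsMatch(attack) ? "matched" : "no match";
}
catch (RegexMatchTimeoutException)
{
    backtrackingOutcome = "timed out";
}

// NonBacktracking engine: same pattern and input, no path explosion.
var linear = new Regex(@"^(a+)+$", RegexOptions.NonBacktracking, limit);
bool linearMatched = linear.IsMatch(attack);
string linearOutcome = linearMatched ? "matched" : "no match";

Console.WriteLine($"backtracking:     {backtrackingOutcome}");
Console.WriteLine($"non-backtracking: {linearOutcome}");
```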

NonBacktracking Trade-offs

The O(n) guarantee comes at a cost -- NonBacktracking doesn't support:

  • Backreferences (\1, \k<name>)
  • Lookaheads ((?=...), (?!...))
  • Lookbehinds ((?<=...), (?<!...))
  • Atomic groups

If your pattern uses these features, NonBacktracking will throw a NotSupportedException at construction time. For patterns that validate untrusted input without needing these features, NonBacktracking is the right choice:

// Good candidates for NonBacktracking:
// - Email validation
// - Phone number validation
// - Numeric input validation
// - URL structure validation

var emailValidator = new Regex(
    @"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$",
    RegexOptions.NonBacktracking);
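
Because unsupported constructs are rejected at construction time, you can probe a pattern before committing to NonBacktracking. The helper name SupportsNonBacktracking and the sample patterns below are hypothetical; only the constructor behaviour comes from the framework.

```csharp
using System;
using System.Text.RegularExpressions;

// Returns true when the pattern can be constructed with NonBacktracking.
static bool SupportsNonBacktracking(string pattern)
{
    try
    {
        _ = new Regex(pattern, RegexOptions.NonBacktracking);
        return true;
    }
    catch (NotSupportedException)
    {
        return false;
    }
}

Console.WriteLine(SupportsNonBacktracking(@"^\d{3}-\d{4}$")); // plain quantifiers and anchors
Console.WriteLine(SupportsNonBacktracking(@"(\w+)\s+\1"));     // uses a backreference
Console.WriteLine(SupportsNonBacktracking(@"\d+(?=px)"));      // uses a lookahead
```

A check like this fits naturally in a unit test, so a pattern change that introduces a lookaround fails the build rather than the first request.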

Always Set a Timeout -- Preventing ReDoS

Even without NonBacktracking, you should always set a timeout in production. This is your last line of defense against ReDoS if a pattern slips through:

// Instance timeout
var regex = new Regex(
    @"\w+",
    RegexOptions.Compiled,
    matchTimeout: TimeSpan.FromMilliseconds(500));

// [GeneratedRegex] timeout
[GeneratedRegex(
    @"\w+",
    RegexOptions.None,
    matchTimeoutMilliseconds: 500)]
private static partial Regex WordPattern();

When a timeout fires, RegexMatchTimeoutException is thrown. Never silently swallow it -- log it and handle it as a potential attack:

try
{
    bool result = regex.IsMatch(untrustedInput);
}
catch (RegexMatchTimeoutException ex)
{
    _logger.LogWarning(
        "Regex timeout on pattern {Pattern} (limit {TimeoutMs}ms). Possible ReDoS.",
        ex.Pattern,
        ex.MatchTimeout.TotalMilliseconds);
    throw new SecurityException("Input validation timeout", ex);
}

Setting a Global Default Timeout

You can set a global default via AppContext:

// In Program.cs or startup
AppContext.SetData("REGEX_DEFAULT_MATCH_TIMEOUT", TimeSpan.FromMilliseconds(500));

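This applies to all Regex instances that don't explicitly set a timeout. The value is read once, the first time the Regex type is used, so set it at the very start of the program. It is useful as a safety net but not a substitute for explicit timeouts on user-facing patterns.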
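
A small sketch of how the global default interacts with explicit timeouts, assuming it runs before any other Regex use in the process. The 500 ms and 50 ms values are arbitrary.

```csharp
using System;
using System.Text.RegularExpressions;

// Must happen before the Regex type is touched anywhere in the process.
AppContext.SetData("REGEX_DEFAULT_MATCH_TIMEOUT", TimeSpan.FromMilliseconds(500));

// No timeout argument: the instance picks up the global default.
var implicitTimeout = new Regex(@"\w+");

// Explicit timeout: overrides the global default for this instance only.
var explicitTimeout = new Regex(@"\w+", RegexOptions.None, TimeSpan.FromMilliseconds(50));

Console.WriteLine($"implicit: {implicitTimeout.MatchTimeout.TotalMilliseconds} ms");
Console.WriteLine($"explicit: {explicitTimeout.MatchTimeout.TotalMilliseconds} ms");
```

Reading MatchTimeout back like this is a cheap way to confirm in a startup check that the default was actually applied.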


Span-Based APIs -- Zero Allocation

.NET 7 added ReadOnlySpan<char> overloads across the regex API surface (IsMatch, Count, EnumerateMatches), and .NET 9 added EnumerateSplits. These eliminate heap allocations in the match infrastructure -- no MatchCollection, no Match objects, no string arrays. If your hot path processes high volumes of text, these APIs are the primary lever for reducing GC pressure.

EnumerateMatches (.NET 7+)

EnumerateMatches replaces Matches in zero-allocation scenarios. Instead of a MatchCollection containing heap-allocated Match objects, it yields ValueMatch ref structs -- stack-allocated value types that contain only the Index and Length of each match. You read the matched content by slicing the original string yourself:

// Old approach -- allocates MatchCollection + Match objects
var matches = regex.Matches(largeInput);
foreach (Match m in matches) { /* ... */ }

// New approach -- allocates nothing (ValueMatch is a ref struct)
foreach (ValueMatch vm in regex.EnumerateMatches(largeInput))
{
    var slice = largeInput.AsSpan(vm.Index, vm.Length);
    // process slice without allocating a string
}
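
As a complete, runnable variant of the fragment above, this sketch sums every number in a line without creating substrings. The input line is made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

var digits = new Regex(@"\d+");
string line = "items=3 retries=12 latency=250";

long total = 0;
foreach (ValueMatch vm in digits.EnumerateMatches(line))
{
    // Slice the original input; int.Parse accepts a span, so no substring is created.
    ReadOnlySpan<char> token = line.AsSpan(vm.Index, vm.Length);
    total += int.Parse(token);
}

Console.WriteLine(total); // 3 + 12 + 250 = 265
```

Note that ValueMatch carries no capture groups, so this style suits patterns where the whole match is what you need.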


EnumerateSplits (.NET 9+)

EnumerateSplits is the span-based counterpart for splitting, added in .NET 9. Instead of allocating a string[] with N+1 substrings, it yields Range structs pointing into the original input. You use the ranges to slice the source string or span. This is particularly valuable when processing segments sequentially -- the entire pipeline can be zero-allocation from input to output:

// Old approach -- allocates a string[]
string[] parts = regex.Split(input);

// New approach -- zero allocation
foreach (var range in regex.EnumerateSplits(input))
{
    ProcessSegment(input.AsSpan(range));
}

IsMatch with Span

IsMatch also has a ReadOnlySpan<char> overload. When you already hold a span (from slicing, AsSpan, or a pooled buffer), you can test it directly without materializing a string, which is the cheapest way to check a pattern against a segment of a larger buffer. Anchors such as ^ and $ apply to the bounds of the span you pass in.
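
A minimal sketch of the span overload, assuming a semicolon-delimited record whose field layout is invented for this example:

```csharp
using System;
using System.Text.RegularExpressions;

var phone = new Regex(@"^\d{3}-\d{4}$");
string record = "name=Ada;phone=555-1234;city=London";

// Locate the field by hand, then test only that slice of the buffer.
int start = record.IndexOf("phone=", StringComparison.Ordinal) + "phone=".Length;
int end = record.IndexOf(';', start);
ReadOnlySpan<char> field = record.AsSpan(start, end - start);

// ReadOnlySpan<char> overload: no Substring call, and ^/$ anchor to the slice.
bool valid = phone.IsMatch(field);
Console.WriteLine(valid);
```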


Benchmarking Regex Modes

Benchmarking regex performance requires care. You need [MemoryDiagnoser] to see allocations, a realistic input size, and warm-up runs so that you are not measuring JIT compilation time. The benchmark below uses BenchmarkDotNet -- the standard choice for .NET microbenchmarks, which handles warm-up for you. Run it with dotnet run -c Release and avoid running it alongside other CPU-heavy processes. Here's a representative setup comparing all major approaches:

using BenchmarkDotNet.Attributes;
using System.Text.RegularExpressions;

[MemoryDiagnoser]
public partial class RegexBenchmarks
{
    private const string Input = "Phone: 555-1234, Alt: 888-9999, Ext: 777-0000";

    private static readonly Regex _compiled =
        new Regex(@"\d{3}-\d{4}", RegexOptions.Compiled, TimeSpan.FromSeconds(1));

    [GeneratedRegex(@"\d{3}-\d{4}", RegexOptions.None, matchTimeoutMilliseconds: 1000)]
    private static partial Regex Generated();

    [Benchmark]
    public int Interpreted()
    {
        var r = new Regex(@"\d{3}-\d{4}");
        return r.Matches(Input).Count;
    }

    [Benchmark(Baseline = true)]
    public int Compiled()
        => _compiled.Matches(Input).Count;

    [Benchmark]
    public int GeneratedRegex()
        => Generated().Matches(Input).Count;

    [Benchmark]
    public int GeneratedEnumerate()
    {
        int count = 0;
        foreach (var _ in Generated().EnumerateMatches(Input)) count++;
        return count;
    }
}

Typical results on .NET 8, relative to the Compiled baseline (exact numbers depend on hardware, pattern, and input):

  • Interpreted (new instance per call): ~10x slower, highest allocation, since each call also pays for constructing the Regex
  • Compiled (static instance): fast, moderate allocation
  • GeneratedRegex (static method): ~20% faster than Compiled, lower allocation
  • GeneratedRegex + EnumerateMatches: fastest + near-zero allocation

Combining [GeneratedRegex] with RegexOptions

You can combine multiple RegexOptions flags, and this composability makes [GeneratedRegex] even more powerful. Since RegexOptions is a flags enum, you OR them together:

[GeneratedRegex(
    @"^\s*(?<key>[^=\s]+)\s*=\s*(?<value>.+?)\s*$",
    RegexOptions.Multiline | RegexOptions.IgnoreCase,
    matchTimeoutMilliseconds: 500)]
private static partial Regex ConfigLinePattern();
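
To show the combined options in use, here is a sketch that runs the pattern over a small multi-line config string. The ConfigParser class name and the sample text are assumptions made for this example; the method is made public so the top-level code can call it.

```csharp
using System;
using System.Text.RegularExpressions;

string config = "Host = example.local\nPORT=8080\n# comment line\nTimeout = 30 ";

// Multiline makes ^ and $ match at each line boundary, so every
// key=value line yields its own match; the comment line has no '=' and is skipped.
foreach (Match m in ConfigParser.ConfigLinePattern().Matches(config))
{
    Console.WriteLine($"{m.Groups["key"].Value} -> {m.Groups["value"].Value}");
}

public static partial class ConfigParser
{
    [GeneratedRegex(
        @"^\s*(?<key>[^=\s]+)\s*=\s*(?<value>.+?)\s*$",
        RegexOptions.Multiline | RegexOptions.IgnoreCase,
        matchTimeoutMilliseconds: 500)]
    public static partial Regex ConfigLinePattern();
}
```

Named groups keep the calling code readable, and the lazy .+? combined with the trailing \s*$ trims whitespace at the end of each value.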

For a deeper understanding of how flags enums work in C# (including how RegexOptions combines values), see C# Enum: Complete Guide to Enumerations in .NET.


Decision Guide: Which Mode Should You Use?

Use this guide for new code:

  1. Infrequent/one-off matches: Default interpreted mode via static Regex.IsMatch/Match/Matches
  2. .NET 6 or earlier, frequent matches: new Regex(..., RegexOptions.Compiled) as static readonly
  3. .NET 7+, any frequent matches: [GeneratedRegex] -- preferred over Compiled whenever the pattern is a compile-time constant
  4. User-supplied input, untrusted data: RegexOptions.NonBacktracking + timeout
  5. High-throughput processing (logs, data pipelines): [GeneratedRegex] + EnumerateMatches/EnumerateSplits

For singleton-style regex service classes, the Singleton Design Pattern in C# gives you the thread-safe initialization patterns to build on. And if you're building a system where different components contribute different validation rules, the Plugin Architecture in C# shows how to architect that extensibility correctly.



Understanding How .NET Compiles Regular Expressions

When you write new Regex(pattern), the engine doesn't use the pattern string directly. It parses it into an internal representation -- a tree of nodes -- optimizes that tree, and then lowers it into whatever form the selected mode executes: opcodes for the interpreter, IL for Compiled, C# source for the generator, or an automaton for NonBacktracking. Understanding this process helps you understand the performance differences between modes.

In the default interpreted mode, the regex engine walks that opcode program at match time. It behaves like a backtracking NFA: matching is essentially a traversal of states and transitions, and backtracking is the engine exploring alternative branches when a path fails. This is fast to construct (parse → opcodes is cheap) but slower per match than the alternatives.

In RegexOptions.Compiled mode, the parsed pattern is compiled to MSIL via System.Reflection.Emit. The engine emits a custom method that directly implements the matching logic as IL, which then gets JIT-compiled to native code. Construction has a noticeable upfront cost that grows with pattern complexity, because it runs both the regex compiler and the JIT compiler. But subsequent matches run at native speed because they execute the emitted code directly, without the interpreter overhead.

With [GeneratedRegex] in .NET 7+, the compilation step moves entirely to build time. The Roslyn source generator analyzes the pattern at compile time and emits C# code that implements the match algorithm. This C# code is compiled by the normal build process -- no runtime reflection or Emit required. The result is a generated Regex-derived class, exposed through your partial method as a cached instance, whose matching code needs no pattern parsing or IL emission at startup.

With RegexOptions.NonBacktracking, the engine uses a fundamentally different algorithm. Instead of an NFA with backtracking, it converts the pattern to a DFA (or simulates one via an NFA executed without backtracking). This guarantees O(n) linear time complexity -- the work per input character is bounded, so the worst case is proportional to input length regardless of how the pattern is written. The trade-off is that constructs that require backtracking (lookaheads, lookbehinds, backreferences, atomic groups) are not supported.


Allocations in Regex Operations -- What Gets Heap Allocated

One reason to care about regex allocation is GC pressure. .NET's generational GC is tuned for short-lived objects, but collecting them is not free: a function called thousands of times per second that allocates several kilobytes per call keeps the collector continuously busy. For throughput-sensitive applications, minimizing allocations in hot paths matters.

Here's what each approach allocates:

new Regex(pattern) in a loop: Allocates the parsed pattern representation, the NFA state graph, and associated internal caches on every call. This is by far the most expensive option. Never do this.

Static Regex methods (Regex.Match(input, pattern)): Use an internal LRU cache of constructed Regex instances (15 by default, configurable through Regex.CacheSize). If you call with the same pattern and options, the instance is typically reused. But entries are evicted once you use more patterns than the cache holds.

regex.Matches(input): Allocates a MatchCollection object plus one Match object per match. For a string with 1000 numeric tokens, that's at least 1001 heap allocations in a single call.

regex.EnumerateMatches(input): Returns a ValueMatchEnumerator (a ref struct, stack allocated). Each ValueMatch is a value type with just Index and Length. Zero heap allocations for the match infrastructure itself. You slice the input string to get the matched text when needed.

regex.Split(input): Allocates a string[] array plus each substring. For an input with N delimiters, you get N+1 substring allocations.

regex.EnumerateSplits(input) (.NET 9+): Returns Range structs via an enumerator. No string array, no substrings. If you can process each segment as a ReadOnlySpan<char>, the entire operation can be zero-allocation.
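
One rough way to see the difference between Matches and EnumerateMatches outside a full benchmark is GC.GetAllocatedBytesForCurrentThread. The sketch below assumes a synthetic 1000-token input; byte counts vary by runtime, so it prints them instead of asserting specific values.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

var word = new Regex(@"\w+");
string input = string.Join(' ', Enumerable.Repeat("token", 1000));

// Warm up both paths so one-time initialization is not counted.
CountWithMatches(word, input);
CountWithEnumerate(word, input);

long before = GC.GetAllocatedBytesForCurrentThread();
int viaMatches = CountWithMatches(word, input);
long matchesBytes = GC.GetAllocatedBytesForCurrentThread() - before;

before = GC.GetAllocatedBytesForCurrentThread();
int viaEnumerate = CountWithEnumerate(word, input);
long enumerateBytes = GC.GetAllocatedBytesForCurrentThread() - before;

Console.WriteLine($"Matches:          {viaMatches} matches, {matchesBytes} bytes allocated");
Console.WriteLine($"EnumerateMatches: {viaEnumerate} matches, {enumerateBytes} bytes allocated");

// Materializes a MatchCollection and one Match per hit.
static int CountWithMatches(Regex regex, string text) => regex.Matches(text).Count;

// Walks the input with ref-struct ValueMatch values only.
static int CountWithEnumerate(Regex regex, string text)
{
    int count = 0;
    foreach (ValueMatch _ in regex.EnumerateMatches(text)) count++;
    return count;
}
```

For numbers you intend to publish or act on, BenchmarkDotNet with [MemoryDiagnoser] remains the more reliable tool; this is only a quick sanity check.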


Performance Checklist

The following checklist summarizes the key decisions for regex-heavy code. Use it as a code review guide or a pre-ship check when you've written or modified regex-intensive components. Each item corresponds to a common performance or safety mistake that's easy to miss in review:

  • Hot paths use [GeneratedRegex] or static readonly Regex with Compiled
  • All production patterns have a matchTimeout
  • User-facing/untrusted input uses NonBacktracking or has rigorous timeout handling
  • High-throughput loops use EnumerateMatches instead of Matches
  • No unnecessary new Regex(...) inside loops (same pattern, different instance each iteration)
  • RegexMatchTimeoutException is caught and logged at the appropriate level

FAQ

What is [GeneratedRegex] in .NET 7?

[GeneratedRegex] is an attribute that triggers a Roslyn source generator at build time. Instead of compiling the regex at runtime (like RegexOptions.Compiled), the generator emits optimized C# code that is compiled into your assembly. This gives near-zero startup cost and runtime performance that matches or beats RegexOptions.Compiled.

How much faster is [GeneratedRegex] vs Compiled?

In benchmarks, [GeneratedRegex] is typically 20-40% faster at runtime than RegexOptions.Compiled, with lower allocations, though the margin varies with the pattern and input. More importantly, it has almost no startup cost -- Compiled must emit and JIT-compile IL when the instance is constructed, which can take milliseconds or more per pattern. For applications with many regex patterns, this startup difference is significant.

What is RegexOptions.NonBacktracking?

NonBacktracking (introduced in .NET 7) changes the matching algorithm from a backtracking engine to an automaton-based (NFA/DFA) approach, guaranteeing time linear in the length of the input. This makes it immune to catastrophic backtracking (ReDoS) attacks. The trade-off: it doesn't support lookaheads, lookbehinds, backreferences, or atomic groups.

Why do I need a regex timeout in production?

Without a timeout, a regex with certain patterns (especially nested quantifiers like (a+)+) can take exponential time on carefully crafted adversarial input. This is called ReDoS -- Regular Expression Denial of Service. A timeout causes RegexMatchTimeoutException to be thrown, letting you fail safely instead of hanging the thread indefinitely.

Can I use [GeneratedRegex] with MatchEvaluator?

Yes. The generated Regex instance supports all the same methods as a normal Regex instance, including Replace(string, MatchEvaluator). The performance benefit applies to the match phase -- the MatchEvaluator delegate is your code and runs the same regardless of which regex mode you use.
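
As an illustration, this sketch passes a lambda as the MatchEvaluator to a generated regex. The PriceFormatter class, the price format, and the discount factor are all hypothetical.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

Console.WriteLine(PriceFormatter.ApplyDiscount("widget $10.00, gadget $4.50", 0.5m));

public static partial class PriceFormatter
{
    [GeneratedRegex(@"\$(?<amount>\d+\.\d{2})", RegexOptions.None, matchTimeoutMilliseconds: 500)]
    private static partial Regex Price();

    // The lambda is converted to a MatchEvaluator and runs once per match.
    public static string ApplyDiscount(string text, decimal factor) =>
        Price().Replace(text, m =>
        {
            decimal amount = decimal.Parse(m.Groups["amount"].Value, CultureInfo.InvariantCulture);
            return "$" + (amount * factor).ToString("0.00", CultureInfo.InvariantCulture);
        });
}
```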

Is static Regex (via [GeneratedRegex]) thread-safe?

Yes. Regex instances are immutable after construction -- all match methods are thread-safe. [GeneratedRegex] static methods return the same underlying instance each time, and multiple threads can call match methods on it concurrently without issue.

When should I NOT use [GeneratedRegex]?

[GeneratedRegex] requires a partial class and compile-time constant pattern. If your regex pattern is built dynamically at runtime (e.g., assembled from user configuration), you cannot use [GeneratedRegex]. In that case, use new Regex(pattern, RegexOptions.Compiled, timeout) and cache the instance.
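
One way to cache dynamically built patterns is a ConcurrentDictionary keyed by the pattern string. This is a sketch rather than a prescribed design: the GetOrCreate helper and the order-number pattern are invented for illustration, and the cache here is unbounded, which is only reasonable when the set of patterns is small and trusted.

```csharp
using System;
using System.Collections.Concurrent;
using System.Text.RegularExpressions;

var cache = new ConcurrentDictionary<string, Regex>();

// Builds each distinct pattern once, then hands back the cached instance.
Regex GetOrCreate(string pattern) =>
    cache.GetOrAdd(
        pattern,
        p => new Regex(p, RegexOptions.Compiled, TimeSpan.FromMilliseconds(500)));

string patternFromConfig = @"^ORD-\d{6}$"; // stands in for a value loaded at runtime

bool matches = GetOrCreate(patternFromConfig).IsMatch("ORD-004217");
bool sameInstance = ReferenceEquals(GetOrCreate(patternFromConfig), GetOrCreate(patternFromConfig));

Console.WriteLine($"matches: {matches}, reused instance: {sameInstance}");
```

If patterns come from untrusted users, consider NonBacktracking instead of Compiled and put a cap on the cache size.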
