BrandGhost
C# Regex: Complete Guide to Regular Expressions in .NET

C# Regex -- regular expressions in .NET -- is one of the most powerful tools you can reach for when working with text. Pattern matching, validation, text extraction, replacement, splitting: regex handles all of it. The challenge is that the API surface is large, the performance characteristics vary wildly between approaches, and newer .NET versions introduced game-changing features most developers haven't discovered yet.

This guide covers everything you need. We'll start from the fundamentals of the System.Text.RegularExpressions namespace, walk through the core methods, explore RegexOptions, and then dig into modern .NET 7 and .NET 8 features like [GeneratedRegex], RegexOptions.NonBacktracking, and Span-based zero-allocation APIs.


What Is Regex in C#?

Regular expressions are patterns that describe sets of strings. In C#, the System.Text.RegularExpressions namespace provides everything you need. The central class is Regex, and it has been part of .NET since version 1.0 -- but it has evolved significantly over the years.

At its core, you write a pattern string, create (or use) a Regex instance, and apply it to text. Here's the simplest possible example:

using System.Text.RegularExpressions;

var regex = new Regex(@"\d+");
bool hasDigits = regex.IsMatch("Hello 42 world");
Console.WriteLine(hasDigits); // True

The @ prefix creates a verbatim string literal -- critical for regex patterns because backslashes are common and you don't want double-escaping.


The System.Text.RegularExpressions Namespace

Before writing your first pattern, it helps to understand what tools are available. The System.Text.RegularExpressions namespace is comprehensive -- it's not just the Regex class. Each type plays a specific role, and knowing which type does what saves you time when reading documentation or debugging match results. Most developers only use Regex and Match directly, but the full type hierarchy becomes important once you work with capturing groups, repeated captures, or performance-sensitive scenarios.

The namespace contains:

  • Regex -- the main class for pattern matching, replacing, and splitting
  • Match -- represents a single match result
  • MatchCollection -- a collection of all matches
  • Group -- a captured group within a match
  • GroupCollection -- the collection of groups in a match
  • Capture -- a single capture within a group
  • CaptureCollection -- all captures for a group
  • RegexOptions -- flags that modify regex behavior
  • MatchEvaluator -- a delegate for custom replacement logic

In .NET 7+, you also have:

  • [GeneratedRegex] attribute -- compile-time source generation
  • ValueMatch -- ref struct returned by EnumerateMatches

Core Regex Methods

The Regex class exposes six primary methods that cover the vast majority of regex use cases. Each method has a distinct purpose: checking for existence, finding one match, finding all matches, replacing matched text, splitting text, and the modern enumeration-based variants. Knowing which method to reach for in a given situation is a significant part of effective regex use in C#. Choosing IsMatch when you only need to know if a pattern exists avoids unnecessary allocation. Choosing Matches when you need every occurrence is cleaner than looping with NextMatch. The right method makes your code both more readable and more performant.

IsMatch -- Does the Pattern Exist?

IsMatch is the simplest method. It returns true if the pattern matches anywhere in the input:

using System.Text.RegularExpressions;

var regex = new Regex(@"^\d{3}-\d{4}$");

Console.WriteLine(regex.IsMatch("555-1234")); // True
Console.WriteLine(regex.IsMatch("5551234"));  // False

Match -- Find the First Match

Match returns a Match object representing the first occurrence of the pattern in the input string. The Match object carries three key properties: Value (the matched text), Index (start position in the original string), and Length (number of matched characters). Always check match.Success before accessing these -- an unmatched result returns a Match object with Success = false rather than throwing:

var regex = new Regex(@"\w+");
var match = regex.Match("Hello world");

if (match.Success)
{
    Console.WriteLine(match.Value);  // Hello
    Console.WriteLine(match.Index);  // 0
    Console.WriteLine(match.Length); // 5
}

Call match.NextMatch() to step through subsequent matches, or use Matches to get them all at once.
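Stepping with NextMatch() is useful when you may want to stop early instead of finding every match. A rough sketch, assuming an arbitrary "first two words of five or more characters" rule chosen purely for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

var wordRegex = new Regex(@"\w+");
var firstLongWords = new List<string>();

// Walk the matches one at a time and stop once two long words are found.
for (Match m = wordRegex.Match("a quick brown fox jumps"); m.Success; m = m.NextMatch())
{
    if (m.Length >= 5)
    {
        firstLongWords.Add(m.Value);
        if (firstLongWords.Count == 2)
        {
            break; // the rest of the input is never scanned
        }
    }
}

Console.WriteLine(string.Join(", ", firstLongWords)); // quick, brown
```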

Matches -- Find All Matches

Matches returns a MatchCollection containing every non-overlapping occurrence of the pattern, in left-to-right order. Unlike calling NextMatch() in a loop, Matches hands you a single collection, which is convenient when you need a count or want to LINQ-query the results. The collection is populated lazily, so the remaining matches are only computed when you enumerate it or read a member such as Count. For large inputs where you want zero-allocation iteration, prefer EnumerateMatches (covered below):

var regex = new Regex(@"\d+");
var matches = regex.Matches("Price: $12, Qty: 3, Total: $36");

foreach (Match m in matches)
{
    Console.WriteLine(m.Value);
}
// 12
// 3
// 36
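As a sketch of the LINQ use mentioned above: on modern .NET (.NET Core 2.0 and later), MatchCollection implements IEnumerable<Match>, so LINQ operators apply directly. On .NET Framework you would need Cast<Match>() first. The variable names here are illustrative:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

var numberRegex = new Regex(@"\d+");
MatchCollection priceMatches = numberRegex.Matches("Price: $12, Qty: 3, Total: $36");

int matchCount = priceMatches.Count;                      // 3
int sum = priceMatches.Sum(m => int.Parse(m.Value));      // 12 + 3 + 36 = 51
int largest = priceMatches.Max(m => int.Parse(m.Value));  // 36

Console.WriteLine($"{matchCount} numbers, sum {sum}, max {largest}");
```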

Replace -- Substitute Matches

Replace substitutes matched text with a replacement string or the result of a MatchEvaluator delegate. The two-argument overload handles simple substitutions. The MatchEvaluator overload is where Replace gets powerful -- the delegate receives each Match and returns the replacement string, letting you perform dynamic transformations like uppercasing, number formatting, or database lookups inline:

var regex = new Regex(@"\d{4}");

// Simple replacement
string result = regex.Replace("Card: 4242 4242 4242 4242", "****");
Console.WriteLine(result); // Card: **** **** **** ****

// MatchEvaluator delegate for dynamic replacements
string doubled = regex.Replace("4242 4242", m => (int.Parse(m.Value) * 2).ToString());
Console.WriteLine(doubled); // 8484 8484

Split -- Divide by Pattern

Split divides a string at every position where the pattern matches and returns the pieces between those positions. This is more powerful than string.Split because the delimiter can be any pattern -- not just a fixed character. For example, you can split on any combination of whitespace and punctuation in one call. The matched delimiter itself is discarded (unless the pattern uses capturing groups, in which case the captured text is included in the result array):

var regex = new Regex(@"[,;\s]+");
string[] parts = regex.Split("alpha, beta;  gamma delta");

foreach (var p in parts)
{
    Console.WriteLine(p);
}
// alpha
// beta
// gamma
// delta
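The capturing-group behavior described above is easy to miss, so here is a small sketch contrasting the two forms. The input string and patterns are made up for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

// Without a capturing group the delimiters are dropped.
string[] plain = Regex.Split("a-b+c", @"[-+]");
Console.WriteLine(string.Join(" ", plain)); // a b c

// With a capturing group the captured delimiters are kept in the result.
string[] withDelims = Regex.Split("a-b+c", @"([-+])");
Console.WriteLine(string.Join(" ", withDelims)); // a - b + c
```

This is handy for simple tokenizers where the operator matters as much as the operands. If you want grouping without this effect, use a non-capturing group, (?:...).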

Regex Pattern Syntax Essentials

Regex has its own mini-language built on top of plain text. Learning the syntax is a matter of memorizing a relatively small vocabulary of special characters and quantifiers, then combining them into patterns that describe exactly what you want to match. The elements below cover the vast majority of real-world use cases. More exotic constructs (lookaheads, lookbehinds, balancing groups) are covered in the advanced patterns article in this series. For now, mastering the fundamentals in this table will get you through 90% of practical regex tasks.

Here's a quick reference of common pattern elements:

Pattern         Meaning
.               Any character except newline
\d              Digit (0-9, plus other Unicode decimal digits by default)
\D              Non-digit
\w              Word character (letter, digit, underscore)
\W              Non-word character
\s              Whitespace
\S              Non-whitespace
^               Start of string (or line with Multiline)
$               End of string (or line with Multiline)
\b              Word boundary
*               Zero or more
+               One or more
?               Zero or one (also makes quantifiers lazy)
{n,m}           Between n and m repetitions
[abc]           Character class
[^abc]          Negated character class
(abc)           Capturing group
(?:abc)         Non-capturing group
(?<name>...)    Named capturing group
a|b             Alternation
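Two entries in the table tend to trip people up: the lazy form of a quantifier and the word boundary. A minimal sketch with made-up inputs:

```csharp
using System;
using System.Text.RegularExpressions;

string html = "<a><b>";

// Greedy: .+ grabs as much as possible, so the match runs to the last '>'.
string greedy = Regex.Match(html, @"<.+>").Value;
Console.WriteLine(greedy); // <a><b>

// Lazy: .+? grabs as little as possible, so the match stops at the first '>'.
string lazy = Regex.Match(html, @"<.+?>").Value;
Console.WriteLine(lazy); // <a>

// \b only matches between a word character and a non-word character,
// so the "cat" inside "concat" is skipped.
Match boundary = Regex.Match("concat cat", @"\bcat\b");
Console.WriteLine(boundary.Index); // 7
```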

RegexOptions -- Controlling Behavior

RegexOptions is an enum that controls matching behavior. You can combine flags with the bitwise OR operator (|). Since RegexOptions is a flags enum, this pattern works cleanly -- if you want to understand how .NET handles flags enums in depth, check out the C# Enum: Complete Guide to Enumerations in .NET.

Key options:

using System.Text.RegularExpressions;

// Case-insensitive matching
var caseInsensitive = new Regex(@"hello", RegexOptions.IgnoreCase);
Console.WriteLine(caseInsensitive.IsMatch("HELLO")); // True

// Multiline: ^ and $ match line boundaries, not string boundaries
var multiline = new Regex(@"^\d+", RegexOptions.Multiline);
var matches = multiline.Matches("42\n99\n7");
Console.WriteLine(matches.Count); // 3

// Singleline: dot matches \n too
var singleline = new Regex(@"A.+Z", RegexOptions.Singleline);
Console.WriteLine(singleline.IsMatch("A\nZ")); // True

// ExplicitCapture: only named groups are captured (performance win)
var explicitCapture = new Regex(
    @"(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})",
    RegexOptions.ExplicitCapture);

// IgnorePatternWhitespace: allows comments inside the pattern
var commented = new Regex(@"
    \d{3}   # area code
    -       # separator
    \d{4}   # number
", RegexOptions.IgnorePatternWhitespace);
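The examples above use one flag at a time. As a sketch of combining flags with |, the following pulls error messages out of a multi-line log regardless of casing. The log text and the group name msg are assumptions made up for this example:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// IgnoreCase handles "ERROR"/"Error"; Multiline lets ^ and $ work per line.
var errorLine = new Regex(
    @"^error:\s*(?<msg>.+)$",
    RegexOptions.IgnoreCase | RegexOptions.Multiline);

string log = "info: started\nERROR: disk full\nError: retrying";

List<string> messages = errorLine.Matches(log)
    .Select(m => m.Groups["msg"].Value)
    .ToList();

Console.WriteLine(string.Join("; ", messages)); // disk full; retrying
```

Note that the sample uses \n line endings. With \r\n input, the \r would end up inside the captured text, so trim it or account for it in the pattern.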

Named Capture Groups

Named groups are one of the most useful features in regex. Instead of using numeric indices ($1, $2), you give groups descriptive names:

var regex = new Regex(@"(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})");
var match = regex.Match("Date: 2026-05-07");

if (match.Success)
{
    Console.WriteLine(match.Groups["year"].Value);  // 2026
    Console.WriteLine(match.Groups["month"].Value); // 05
    Console.WriteLine(match.Groups["day"].Value);   // 07
}

This is far more maintainable than remembering which index corresponds to which group -- especially in complex patterns.
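Named groups also work in replacement strings through the ${name} substitution syntax. A short sketch, reusing the date pattern above with a made-up input:

```csharp
using System;
using System.Text.RegularExpressions;

var isoDate = new Regex(@"(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})");

// ${name} in the replacement string inserts the text captured by that group.
string reordered = isoDate.Replace("Due 2026-05-07", "${day}/${month}/${year}");
Console.WriteLine(reordered); // Due 07/05/2026

// GetGroupNames lists group 0 (the whole match) followed by the named groups.
string[] groupNames = isoDate.GetGroupNames();
Console.WriteLine(string.Join(", ", groupNames)); // 0, year, month, day
```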


Static Regex Methods vs Instance Methods

The Regex class exposes both static and instance methods. The static versions handle one-off calls by caching the compiled regex internally:

// Static -- convenient for one-off use
bool match = Regex.IsMatch("hello", @"\w+");

// Instance -- better when reusing the pattern many times
var regex = new Regex(@"\w+", RegexOptions.Compiled);
// reuse `regex` across multiple calls

The static methods maintain an internal cache (default size: 15 entries). For production code that runs the same patterns repeatedly, creating an instance with RegexOptions.Compiled -- or better yet, using [GeneratedRegex] -- is the right approach.
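To make the reuse advice concrete, here is a minimal sketch that constructs the Regex once and applies it inside a loop. The sample lines are invented; in a real class the instance would typically live in a static readonly field rather than a local variable:

```csharp
using System;
using System.Text.RegularExpressions;

string[] lines = { "alpha beta", "gamma", "delta epsilon zeta" };

// Constructed once, outside the loop, and reused for every line.
var wordCounter = new Regex(@"\w+", RegexOptions.Compiled);

int totalWords = 0;
foreach (string line in lines)
{
    // Avoid writing new Regex(@"\w+") here: that would re-parse the
    // pattern on every iteration.
    totalWords += wordCounter.Matches(line).Count;
}

Console.WriteLine(totalWords); // 6
```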


Modern .NET Regex APIs (.NET 7+)

.NET 7 and later releases added several major improvements to the regex story. The headline feature is [GeneratedRegex], which moves regex compilation entirely to build time using a Roslyn source generator. But the improvements go further -- RegexOptions.NonBacktracking makes it safe to run regex against untrusted input, and the Span-based APIs (EnumerateMatches, EnumerateSplits, IsMatch(ReadOnlySpan<char>)) eliminate heap allocations in tight loops. If you're still writing new Regex(pattern, RegexOptions.Compiled) in .NET 7+ code, you're leaving real performance gains on the table. These are not minor improvements -- they change the performance and security characteristics of regex in meaningful ways.

[GeneratedRegex] -- Compile-Time Source Generation

Introduced in .NET 7, [GeneratedRegex] moves regex compilation from runtime to build time. The Roslyn source generator emits optimized C# source code at build time, which the compiler then compiles to IL -- yielding the best possible startup performance and runtime efficiency:

using System.Text.RegularExpressions;

public partial class EmailValidator
{
    [GeneratedRegex(
        @"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$",
        RegexOptions.IgnoreCase)]
    private static partial Regex EmailPattern();

    public static bool IsValidEmail(string input)
        => EmailPattern().IsMatch(input);
}

The method must be partial, parameterless, and return Regex, and the containing class must be partial as well. Declaring it static, as above, is the usual choice. At compile time, the source generator supplies the method body with a highly optimized implementation.

This is superior to RegexOptions.Compiled for hot paths because Compiled still has startup overhead (it compiles at runtime), while [GeneratedRegex] has zero startup cost.

RegexOptions.NonBacktracking (.NET 7+)

Traditional regex uses backtracking, which can lead to catastrophic backtracking on certain patterns -- a ReDoS (Regular Expression Denial of Service) vulnerability. NonBacktracking uses an NFA/DFA approach with O(n) linear time complexity:

var regex = new Regex(
    @"^(a+)+$",
    RegexOptions.NonBacktracking,
    TimeSpan.FromMilliseconds(500));

// Safe even against adversarial input
bool result = regex.IsMatch("aaaaaaaaaaaaaaaaaaaaaaaaaX");

The trade-off: NonBacktracking doesn't support backreferences or lookaheads/lookbehinds. It's best for untrusted input validation.
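The trade-off shows up at construction time, not at match time. The sketch below assumes .NET 7 or later and uses a made-up backreference pattern to show the rejection, followed by the nested-quantifier pattern from above running safely:

```csharp
using System;
using System.Text.RegularExpressions;

bool backreferenceRejected = false;
try
{
    // \1 is a backreference, which the non-backtracking engine cannot evaluate.
    _ = new Regex(@"(\w)\1", RegexOptions.NonBacktracking);
}
catch (NotSupportedException)
{
    backreferenceRejected = true;
}
Console.WriteLine(backreferenceRejected); // True

// Patterns built from classes, groups, and quantifiers are accepted.
var safeRegex = new Regex(@"^(a+)+$", RegexOptions.NonBacktracking);
bool matchesAll = safeRegex.IsMatch(new string('a', 30));         // True
bool rejectsTail = safeRegex.IsMatch(new string('a', 30) + "X");  // False
Console.WriteLine($"{matchesAll}, {rejectsTail}");
```

Because the failure happens when the Regex is created, an unsupported pattern is caught early rather than on the first unlucky input.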

EnumerateMatches (.NET 7+)

Regex.EnumerateMatches returns a ValueMatchEnumerator that yields ValueMatch ref structs. This is the Span-based, zero-allocation way to iterate over all matches. Unlike Matches, which allocates a MatchCollection and individual Match objects on the heap, EnumerateMatches yields lightweight ValueMatch structs containing only the Index and Length of each match. You slice the original input yourself to access the matched text. This makes it ideal for high-throughput scenarios where heap allocation is a measurable cost:

var regex = new Regex(@"\d+");
var input = "Items: 10, 20, 30, 40";

foreach (ValueMatch match in regex.EnumerateMatches(input))
{
    var slice = input.AsSpan(match.Index, match.Length);
    Console.WriteLine(slice.ToString());
}
// 10
// 20
// 30
// 40

No MatchCollection allocation. No individual Match objects on the heap. This is ideal for hot paths.
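To stay allocation-free all the way through, you can hand the slice straight to a span-based parser instead of calling ToString(). A minimal sketch, assuming .NET 7 or later; the input text is made up:

```csharp
using System;
using System.Text.RegularExpressions;

var digits = new Regex(@"\d+");
ReadOnlySpan<char> text = "Items: 10, 20, 30, 40";

int total = 0;
foreach (ValueMatch vm in digits.EnumerateMatches(text))
{
    // int.Parse has a ReadOnlySpan<char> overload, so no substring is created.
    total += int.Parse(text.Slice(vm.Index, vm.Length));
}

Console.WriteLine(total); // 100
```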

EnumerateSplits (.NET 9+)

Similarly, Regex.EnumerateSplits provides Span-based, zero-allocation splitting. Introduced in .NET 9, it returns a Regex.ValueSplitEnumerator that yields Range structs. You use those ranges to slice the original input string yourself, avoiding the string array that Split would normally allocate. This pattern is especially useful when splitting large files line by line or tokenizing CSV/TSV data in a streaming context:

var regex = new Regex(@"[,;]+");
var input = "alpha,beta;gamma,,delta";

foreach (var range in regex.EnumerateSplits(input))
{
    Console.WriteLine(input[range]);
}

Always Set a Timeout in Production

This cannot be overstated. Without a timeout, a poorly written pattern combined with adversarial input can cause your application to hang indefinitely. Always pass matchTimeout:

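// Placeholder value; in real code this comes from the user or the network
string untrustedInput = "some user-supplied text";

var regex = new Regex(
    @"\w+",
    RegexOptions.None,
    TimeSpan.FromMilliseconds(500));

try
{
    var matches = regex.Matches(untrustedInput);
    // Matches is lazy: reading Count runs the search, so a timeout surfaces here
    Console.WriteLine(matches.Count);
}
catch (RegexMatchTimeoutException ex)
{
    // Log and handle -- never silently swallow
    Console.WriteLine($"Regex timed out: {ex.Message}");
}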

If you're using [GeneratedRegex], you can specify the timeout in the attribute:

[GeneratedRegex(@"\w+", RegexOptions.None, matchTimeoutMilliseconds: 500)]
private static partial Regex WordPattern();
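One way to keep timeout handling consistent is to wrap the call in a small helper. The TryIsMatch function below is hypothetical (it is not part of .NET); it is a sketch of one possible policy, where a timeout counts as "no match" but is still reported to the caller so it can be logged:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not a BCL API: reports a timeout through an out
// parameter instead of letting the exception escape.
static bool TryIsMatch(Regex regex, string input, out bool timedOut)
{
    try
    {
        timedOut = false;
        return regex.IsMatch(input);
    }
    catch (RegexMatchTimeoutException)
    {
        timedOut = true;
        return false;
    }
}

var wordPattern = new Regex(@"^\w+$", RegexOptions.None, TimeSpan.FromMilliseconds(500));

bool ok = TryIsMatch(wordPattern, "hello_world", out bool hitTimeout);
Console.WriteLine($"{ok}, timed out: {hitTimeout}"); // True, timed out: False
```

Whether a timeout should be treated as a failed validation or as an error worth surfacing to the user depends on the application; the out parameter keeps that decision with the caller.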

Organizing Regex Patterns in C# Codebases

A common mistake is scattering regex patterns throughout your codebase as inline strings. A better approach -- particularly as your application scales -- is to centralize them. This ties directly into ideas like feature slicing in C#, where each feature module owns its own pattern definitions.

For medium-to-large codebases, consider:

  1. A dedicated Patterns static class (or namespace-scoped partial classes with [GeneratedRegex])
  2. Feature-local pattern classes where each feature slice owns its regex

namespace MyApp.Validation;

public static partial class ValidationPatterns
{
    [GeneratedRegex(
        @"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$",
        RegexOptions.IgnoreCase,
        matchTimeoutMilliseconds: 500)]
    public static partial Regex Email();

    [GeneratedRegex(
        @"^\+?[1-9]\d{1,14}$",
        RegexOptions.None,
        matchTimeoutMilliseconds: 500)]
    public static partial Regex PhoneE164();
}

This gives you compile-time optimization, centralized maintenance, and clear ownership. If you're building extensible systems where plugins contribute validation rules, the Plugin Architecture in C# guide covers how to wire up dynamically loaded modules cleanly.


Thread Safety

Regex instances are thread-safe for matching operations. You can freely share a Regex instance across threads. This makes static [GeneratedRegex] methods ideal -- the generated instance is effectively a singleton.

The Singleton Design Pattern in C# covers thread-safe lazy initialization patterns that apply equally to Regex instances when you can't use [GeneratedRegex].
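As a rough illustration of sharing one instance across threads, the sketch below runs the same Regex from a parallel loop. The inputs are made up, and Interlocked is used only to protect the shared counter, not the Regex itself:

```csharp
using System;
using System.Text.RegularExpressions;
using System.Threading;
using System.Threading.Tasks;

var sharedRegex = new Regex(@"\d+");
string[] inputs = { "a1 b22", "no digits", "333 4 55", "6" };

int totalMatches = 0;
Parallel.ForEach(inputs, input =>
{
    // Each call produces its own Match objects; only the Regex is shared.
    int count = sharedRegex.Matches(input).Count;
    Interlocked.Add(ref totalMatches, count);
});

Console.WriteLine(totalMatches); // 6
```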


Performance Summary

Understanding the performance trade-offs between regex modes is crucial for production C# applications. The differences are not just academic -- they affect startup time, memory usage, and security posture. A poor choice (like creating a new Regex instance inside a tight loop) can turn a fast regex operation into a bottleneck. A good choice (like using [GeneratedRegex] with EnumerateMatches) can make regex competitive with hand-written string parsing in terms of both speed and allocation.

Approach                        Startup Cost          Runtime Speed   Allocation   Best For
new Regex(pattern)              Interpreted           Slower          Higher       One-off use
new Regex(pattern, Compiled)    JIT compile (slow)    Fast            Lower        Frequent reuse
[GeneratedRegex]                Zero (build-time)     Fastest         Lowest       Hot paths
NonBacktracking                 Zero/Low              O(n) linear     Low          Untrusted input

Common Regex Mistakes to Avoid

Even experienced developers fall into a handful of common regex pitfalls. Knowing these in advance saves debugging time.

Creating Regex instances in loops is the most common performance mistake. Each new Regex(pattern) call parses and compiles the pattern. If you do this in a hot loop, you're paying compilation overhead on every iteration. The fix: hoist the instance to a static readonly field, or better yet, use [GeneratedRegex].

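Forgetting the @ prefix leads to confusing bugs. The pattern \d+ written as "\\d+" works, but "\d+" does not -- the backslash is interpreted as the start of a C# escape sequence, and \d is not a valid one, so the code fails to compile. Other sequences such as "\b" compile but silently become a backspace character instead of a word boundary. Always use verbatim string literals (@"\d+") for regex patterns.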

Not anchoring validation patterns is a subtle but serious bug. The pattern \d{5} matches any string containing 5 consecutive digits -- including "abc12345xyz". For validation, always anchor with ^ and $: ^\d{5}$.
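A quick sketch of the difference anchoring makes, using a five-digit check as the example. It also shows one extra wrinkle: $ matches before a trailing newline, so \A and \z are the stricter choice when that matters:

```csharp
using System;
using System.Text.RegularExpressions;

// Unanchored: succeeds because five digits appear somewhere in the input.
bool unanchored = Regex.IsMatch("abc12345xyz", @"\d{5}");
Console.WriteLine(unanchored); // True

// Anchored: the whole input must be exactly five digits.
bool anchored = Regex.IsMatch("abc12345xyz", @"^\d{5}$");
Console.WriteLine(anchored); // False

// $ also matches just before a final \n, so this still passes...
bool trailingNewline = Regex.IsMatch("12345\n", @"^\d{5}$");
Console.WriteLine(trailingNewline); // True

// ...whereas \A and \z match only at the absolute start and end.
bool strict = Regex.IsMatch("12345\n", @"\A\d{5}\z");
Console.WriteLine(strict); // False
```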

Missing timeouts on user-facing patterns is a security oversight. Any pattern that processes user-supplied data without a timeout is potentially vulnerable to ReDoS. Set matchTimeoutMilliseconds in [GeneratedRegex] or pass TimeSpan to the Regex constructor.

Over-complicated patterns are hard to maintain and slow to execute. If your regex exceeds 2-3 lines, consider whether a combination of simpler patterns or a small state machine would serve better. Complex patterns can also be harder to optimize with [GeneratedRegex].

What's in This Series

This article is the hub for the C# Regex cluster. The spoke articles deep-dive into specific topics:

  • Match, Matches, and IsMatch -- Named groups, capture collections, and Span-based enumeration
  • Replace and Split -- MatchEvaluator, EnumerateSplits, and substitution syntax
  • Performance -- GeneratedRegex vs Compiled, NonBacktracking, Timeout, and benchmarks
  • Lookahead, Lookbehind, and Advanced Patterns -- Zero-width assertions, backreferences, conditionals
  • Validation Patterns -- Email, phone, URL, postal codes, and other common patterns

FAQ

What namespace do I need for Regex in C#?

Add using System.Text.RegularExpressions; at the top of your file. The Regex class, Match, MatchCollection, Group, and RegexOptions all live in this namespace.

What is the difference between RegexOptions.Compiled and GeneratedRegex in C#?

RegexOptions.Compiled compiles the regex to IL at runtime when the Regex instance is constructed -- this has startup cost but fast matching. [GeneratedRegex] (available in .NET 7+) uses a source generator to emit C# code at build time, which is then compiled to IL along with the rest of your assembly, giving you no runtime compilation cost and the best possible runtime performance.

How do I make a C# regex case-insensitive?

Pass RegexOptions.IgnoreCase when creating the regex: new Regex(@"pattern", RegexOptions.IgnoreCase). With [GeneratedRegex], add it as the second argument: [GeneratedRegex(@"pattern", RegexOptions.IgnoreCase)].

Is Regex thread-safe in C#?

Yes. Regex instances are thread-safe for all read operations (IsMatch, Match, Matches, Replace, Split). Multiple threads can share the same Regex instance without synchronization. This is one reason [GeneratedRegex] methods work so well -- the generated static instance is safe to call concurrently.

How do I prevent ReDoS attacks with C# Regex?

Two complementary defenses: always set a matchTimeout in the Regex constructor (or the [GeneratedRegex] attribute), and consider RegexOptions.NonBacktracking (.NET 7+) for patterns applied to untrusted input. NonBacktracking runs in O(n) linear time and cannot catastrophically backtrack.

What is the @ prefix in C# regex patterns?

The @ prefix creates a verbatim string literal in C#, which disables backslash escaping. Since regex patterns use backslashes heavily (e.g., \d, \b, \w), verbatim strings are almost always the right choice. Without it, you'd need to double every backslash: "\\d" instead of @"\d".

When should I use static Regex methods vs instance methods?

Static methods like Regex.IsMatch(input, pattern) are convenient for one-off patterns -- they use an internal cache (15 entries by default). Instance methods are better when you reuse the same pattern many times, especially with RegexOptions.Compiled or [GeneratedRegex]. For new .NET code, prefer [GeneratedRegex] over both.
