C# Source Generators: Reading the Roslyn Syntax Tree
Source generators are one of the most powerful tools in the modern .NET ecosystem -- they let you inspect your code at compile time and emit new C# automatically, with zero runtime overhead. But to use them well, you need to understand the foundational data structure underneath: the Roslyn syntax tree that represents every file in your compilation.
If you have tried reading a source generator tutorial and felt lost the moment it jumped to SyntaxProvider and INamedTypeSymbol, this article is for you. We are going to work through the full picture -- from what a syntax tree actually is, to how you traverse it efficiently using IIncrementalGenerator, introduced in .NET 6 and supported through .NET 10. Understanding how to read and traverse a Roslyn syntax tree in a C# source generator is what separates generators that are fast, correct, and maintainable from ones that are slow, brittle, and cause confusing compiler errors.
What Is a Roslyn Syntax Tree?
A syntax tree -- often loosely called an abstract syntax tree (AST), although Roslyn's trees are full-fidelity rather than abstract -- is a tree-shaped data structure that represents the syntactic structure of your source code. Every class declaration, method, attribute, and parameter becomes a node in this tree. Keywords, identifiers, and punctuation appear as tokens -- the leaf elements attached to nodes -- while whitespace and comments are preserved as trivia on those tokens. Roslyn builds one SyntaxTree per source file, and those trees, together with the referenced assemblies and compiler options, make up the Compilation your generator works against.
What makes Roslyn's syntax tree distinct is that it is lossless. Whitespace, comments, and #pragma directives are preserved as trivia attached to nodes and tokens, making it suitable for both analysis and code rewriting.
The root node of any C# file's syntax tree is CompilationUnitSyntax. From there the syntax tree branches into namespace declarations, type declarations, member declarations, statements, and expressions. Every node has a concrete type derived from SyntaxNode -- for example, ClassDeclarationSyntax, MethodDeclarationSyntax, or AttributeSyntax -- and those types expose strongly typed properties giving you access to child nodes, tokens, and the raw source text.
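To make the shape of the tree concrete, here is a minimal sketch that parses a string and inspects the result. It assumes a project that references the Microsoft.CodeAnalysis.CSharp NuGet package; the SyntaxTreeTour class and its sample source are hypothetical names used only for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Hypothetical helper for exploring a parsed tree outside of a generator.
public static class SyntaxTreeTour
{
    public const string Source = @"namespace Demo
{
    // A comment that survives as trivia.
    public partial class Customer { public string Name { get; set; } }
}";

    // Parses the text and returns the root node, a CompilationUnitSyntax.
    public static CompilationUnitSyntax ParseRoot(string source) =>
        CSharpSyntaxTree.ParseText(source).GetCompilationUnitRoot();

    // Collects the identifier text of every class declaration under the root.
    public static List<string> ClassNames(CompilationUnitSyntax root) =>
        root.DescendantNodes()
            .OfType<ClassDeclarationSyntax>()
            .Select(c => c.Identifier.Text)
            .ToList();

    // The tree is lossless: printing it with trivia reproduces the input text.
    public static bool RoundTrips(string source) =>
        ParseRoot(source).ToFullString() == source;
}
```

Calling ClassNames on the parsed sample yields the single name Customer, and RoundTrips returns true because comments and whitespace are carried as trivia.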
The syntax tree is immutable. Once built, it does not change. This immutability lets the incremental generator pipeline cache and reuse trees efficiently -- it can determine whether a file's syntax tree changed since the last pass and skip re-running transforms when nothing has changed.
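Immutability is easy to observe directly. The sketch below, again assuming the Microsoft.CodeAnalysis.CSharp package and using a hypothetical ImmutabilityDemo class, "renames" a class: the With... method returns a new node and leaves the original untouched.

```csharp
using System.Linq;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Hypothetical demo type; not part of any generator.
public static class ImmutabilityDemo
{
    public static (string Original, string Renamed) Rename()
    {
        var root = CSharpSyntaxTree.ParseText("class Original { }").GetCompilationUnitRoot();
        var cls = root.DescendantNodes().OfType<ClassDeclarationSyntax>().First();

        // WithIdentifier does not mutate cls; it builds a modified copy.
        var renamed = cls.WithIdentifier(SyntaxFactory.Identifier("Renamed"));

        return (cls.Identifier.Text, renamed.Identifier.Text);
    }
}
```

The first tuple element is still Original after the call, which is the property the incremental pipeline relies on when it reuses trees between runs.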
The Two Layers: Syntax vs Semantics
One of the most important conceptual distinctions in working with Roslyn for a C# source generator is the difference between the syntax layer and the semantic layer. These are two separate models that serve different purposes, and mixing them up is a common source of confusion and serious performance problems.
SyntaxTree: Structure and Text
The SyntaxTree and its nodes (SyntaxNode, SyntaxToken, SyntaxTrivia) represent pure structure -- the grammatical shape of your code without regard for what anything means. A ClassDeclarationSyntax tells you the class name as a string and what modifiers are present on it, but it cannot tell you what interfaces the class implements in a resolved sense, or what the base class's fully qualified name resolves to across the project.
Syntax is fast. It requires no symbol resolution, no type lookups, and no cross-file analysis. You can traverse a syntax tree without touching the broader compilation at all. This makes syntax analysis the right layer for your predicate function in the incremental generator pipeline -- your filter step must be as cheap as possible because it runs against every syntax change in every file the IDE processes.
SemanticModel: Meaning and Types
The SemanticModel operates on top of the syntax tree and adds type resolution. Through the semantic model, you can resolve a ClassDeclarationSyntax into an INamedTypeSymbol, ask what interfaces it implements, find the fully qualified name of a property's type, or determine whether an attribute is the exact one you are looking for by its metadata name rather than its short text name.
Semantic analysis is slower because it requires the full compilation context -- symbol tables, referenced assemblies, and type hierarchy information all have to be consulted. This is why you delay accessing the semantic model until the transform step of your incremental pipeline, and only for nodes that already passed your predicate filter.
The general rule: predicates use syntax, transforms use semantics.
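The difference between the two layers shows up clearly when the same declaration is inspected both ways. This is a sketch under assumptions: it builds a throwaway CSharpCompilation that references only the core library of the running runtime, and the TwoLayers class name is hypothetical.

```csharp
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Hypothetical demo type comparing syntax text with a resolved symbol.
public static class TwoLayers
{
    public static (string SyntaxText, string ResolvedName) Inspect()
    {
        var tree = CSharpSyntaxTree.ParseText(
            "using System; class Foo : IDisposable { public void Dispose() { } }");
        var cls = tree.GetRoot().DescendantNodes().OfType<ClassDeclarationSyntax>().First();

        // Syntax layer: only the text the author typed in the base list.
        string syntaxText = cls.BaseList!.Types[0].Type.ToString();

        // Semantic layer: a compilation with references is needed to resolve names.
        var compilation = CSharpCompilation.Create(
            "Demo",
            new[] { tree },
            new[] { MetadataReference.CreateFromFile(typeof(object).Assembly.Location) },
            new CSharpCompilationOptions(OutputKind.DynamicallyLinkedLibrary));
        var symbol = compilation.GetSemanticModel(tree).GetDeclaredSymbol(cls)!;
        string resolved = symbol.Interfaces[0].ToDisplayString();

        return (syntaxText, resolved);
    }
}
```

The syntax side reports IDisposable exactly as written, while the symbol side reports System.IDisposable, which is the kind of answer only the semantic model can give.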
Key Syntax Node Types
Roslyn's syntax model has hundreds of node types, but for most source generators you will work with a core set repeatedly. Getting familiar with these upfront makes your generator code much easier to read and write.
| Syntax Node Type | What It Represents | Key Properties / Common Use Case |
|---|---|---|
| ClassDeclarationSyntax | A class declaration | Name token, base list, modifiers, member list; primary filter target for attribute-driven generators |
| MethodDeclarationSyntax | A method inside a type | Return type syntax, parameter list, modifiers, body; used for dispatch code and interceptors |
| PropertyDeclarationSyntax | A property declaration | Type syntax, name token, accessor list; common in serialization and mapping generators |
| AttributeSyntax | A single attribute on a declaration | Attribute name and argument list (syntax level only; use SemanticModel to resolve the full type) |
| InterfaceDeclarationSyntax | An interface declaration | Modifiers, member list, base list |
| RecordDeclarationSyntax | A record or record struct declaration | Name, modifiers, parameter list, member list |
| StructDeclarationSyntax | A struct declaration | Modifiers, member list |
All of these syntax tree node types live in the Microsoft.CodeAnalysis.CSharp.Syntax namespace. For AttributeSyntax, the syntax tree gives you the attribute's text name and argument list -- resolving the actual fully qualified attribute type requires the semantic model.
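The following sketch shows what "text name only" means in practice for AttributeSyntax. It assumes the Microsoft.CodeAnalysis.CSharp package; the AttributeNames class and the LooksLike helper are hypothetical, and the matching rule is a deliberately simple approximation that cannot see through using aliases.

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Hypothetical helper showing attribute names as they appear in syntax.
public static class AttributeNames
{
    // Returns each attribute name exactly as it was written in the source.
    public static List<string> Written(string source) =>
        CSharpSyntaxTree.ParseText(source).GetRoot()
            .DescendantNodes()
            .OfType<AttributeSyntax>()
            .Select(a => a.Name.ToString())
            .ToList();

    // Syntax-only match: drop any namespace qualifier, then accept the name
    // with or without the conventional "Attribute" suffix.
    public static bool LooksLike(string writtenName, string shortName)
    {
        string last = writtenName.Substring(writtenName.LastIndexOf('.') + 1);
        return last == shortName || last == shortName + "Attribute";
    }
}
```

For a source containing [GenerateToString] and [MyNamespace.GenerateToStringAttribute], Written returns two different strings even though both refer to the same attribute type, so a syntax-level filter has to normalize them itself.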
Walking the Syntax Tree -- The Old Way
Before IIncrementalGenerator became the recommended approach, source generators used ISourceGenerator with an ISyntaxReceiver. The receiver's OnVisitSyntaxNode method was called for every node in the syntax tree of every file, and you would collect candidate nodes in a list for later processing. A related visitor-based approach used CSharpSyntaxWalker to walk the syntax tree -- you override VisitClassDeclaration and similar methods for the node types you care about.
Both approaches had serious performance problems. A large solution can easily contain millions of syntax nodes once every expression is counted, and the receiver was invoked for each one. Even a lightweight if (node is ClassDeclarationSyntax) check on every node added measurable overhead to every compilation pass, with no caching and no incremental behavior.
If you are maintaining an older generator codebase, you will encounter these patterns. For any new development targeting .NET 6 and later, the incremental generator pipeline is the right choice -- it is faster, more IDE-friendly, and far better at avoiding redundant work during the tight edit-compile loops that happen while typing.
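For readers who meet these patterns in an existing codebase, here is a rough sketch of what the two legacy shapes typically look like. The class names are hypothetical and the filter conditions are placeholders; only the overridden members come from the Roslyn API.

```csharp
using System.Collections.Generic;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Legacy shape 1: a receiver that is offered every node in every tree.
public sealed class CandidateReceiver : ISyntaxReceiver
{
    public List<ClassDeclarationSyntax> Candidates { get; } = new();

    public void OnVisitSyntaxNode(SyntaxNode syntaxNode)
    {
        // Cheap syntactic filter; semantic checks happened later in Execute.
        if (syntaxNode is ClassDeclarationSyntax cls && cls.AttributeLists.Count > 0)
            Candidates.Add(cls);
    }
}

// Legacy shape 2: a walker that overrides Visit* for the node kinds of interest.
public sealed class ClassCollector : CSharpSyntaxWalker
{
    public List<string> Names { get; } = new();

    public override void VisitClassDeclaration(ClassDeclarationSyntax node)
    {
        Names.Add(node.Identifier.Text);
        base.VisitClassDeclaration(node); // keep walking so nested classes are found
    }
}
```

Neither type remembers anything between runs, which is the gap the incremental pipeline was designed to close.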
The Modern Way: SyntaxProvider in IIncrementalGenerator
IIncrementalGenerator exposes a SyntaxProvider on the IncrementalGeneratorInitializationContext -- this is where all modern Roslyn syntax tree traversal lives in a C# source generator. The provider has two main entry points.
CreateSyntaxProvider
CreateSyntaxProvider accepts a predicate and a transform delegate. The predicate sees every syntax node, but the pipeline caches its results per file and only re-runs it for files whose syntax changed. The transform can re-run more often, because semantic information may change even when a file's own text did not, so keep the predicate selective and the transform output cheap to compare. Use CreateSyntaxProvider when you are not filtering by a specific attribute -- for example, finding every partial class or every record that declares a specific member name at the syntax level.
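    var classProvider = context.SyntaxProvider
        .CreateSyntaxProvider(
            // Predicate: syntax-only checks, no semantic model.
            predicate: static (node, _) => node is ClassDeclarationSyntax cls
                && cls.Modifiers.Any(SyntaxKind.PartialKeyword),
            // Transform: runs only for nodes that passed the predicate.
            transform: static (ctx, ct) =>
            {
                var classDecl = (ClassDeclarationSyntax)ctx.Node;
                var model = ctx.SemanticModel;
                var symbol = model.GetDeclaredSymbol(classDecl, ct) as INamedTypeSymbol;
                // Returned as-is for brevity; a real pipeline should project the
                // symbol into a plain, equatable model before leaving the transform.
                return symbol;
            })
        .Where(static s => s is not null);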
The predicate does a type check and a modifier scan -- both purely syntactic, both very fast. The transform then accesses the semantic model to get the full INamedTypeSymbol, passing the cancellation token so the pipeline can abort if the user types again.
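To show where such a provider sits in a complete generator, here is a minimal sketch that lists every partial class in a single generated file. The generator name, the hint name PartialClassIndex.g.cs, and the comment-only output are assumptions made for illustration; the transform projects the symbol to a string so that nothing from Roslyn leaves the transform step.

```csharp
using System.Linq;
using System.Text;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;
using Microsoft.CodeAnalysis.Text;

// Hypothetical generator: emits a comment line per partial class it finds.
[Generator]
public sealed class PartialClassIndexGenerator : IIncrementalGenerator
{
    public void Initialize(IncrementalGeneratorInitializationContext context)
    {
        // Predicate stays syntactic; transform resolves the symbol and keeps only its name.
        IncrementalValuesProvider<string> names = context.SyntaxProvider
            .CreateSyntaxProvider(
                predicate: static (node, _) =>
                    node is ClassDeclarationSyntax cls
                    && cls.Modifiers.Any(SyntaxKind.PartialKeyword),
                transform: static (ctx, ct) =>
                    ctx.SemanticModel
                        .GetDeclaredSymbol((ClassDeclarationSyntax)ctx.Node, ct)
                        ?.ToDisplayString())
            .Where(static name => name is not null)
            .Select(static (name, _) => name!);

        // Collect all names and write one file; strings compare by value,
        // so unchanged input lets the pipeline reuse the previous output.
        context.RegisterSourceOutput(names.Collect(), static (spc, all) =>
        {
            var sb = new StringBuilder("// <auto-generated/>\n");
            foreach (var name in all.Distinct().OrderBy(n => n))
                sb.AppendLine($"// partial class: {name}");
            spc.AddSource("PartialClassIndex.g.cs", SourceText.From(sb.ToString(), Encoding.UTF8));
        });
    }
}
```

A generator like this can be exercised outside the IDE by passing it to CSharpGeneratorDriver.Create, running it against a small in-memory compilation, and inspecting the generated trees on the run result.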
ForAttributeWithMetadataName
For attribute-driven generators -- which represent the majority of practical source generators -- ForAttributeWithMetadataName is the preferred API. It was introduced specifically to address the performance problem of attribute filtering: if you use CreateSyntaxProvider and scan every node for a specific attribute by name, you end up doing expensive string comparisons on millions of nodes. ForAttributeWithMetadataName performs this matching internally, using an optimized code path that Roslyn itself understands.
Note:
ForAttributeWithMetadataName requires Roslyn 4.3.0 or later, which ships with the .NET 7 SDK. If your project targets .NET 6, update your Microsoft.CodeAnalysis.CSharp package reference to 4.3.0 or later -- the API was backported as a NuGet package.
You pass it the fully qualified metadata name of the target attribute, a predicate to further narrow matching nodes, and a transform to produce your data model.
    var provider = context.SyntaxProvider
        .ForAttributeWithMetadataName(
            "MyNamespace.GenerateToStringAttribute",
            predicate: static (node, _) => node is ClassDeclarationSyntax,
            transform: static (ctx, ct) =>
            {
                var classDecl = (ClassDeclarationSyntax)ctx.TargetNode;
                var symbol = (INamedTypeSymbol)ctx.TargetSymbol;
                return new ClassModel(
                    symbol.ContainingNamespace.ToDisplayString(),
                    symbol.Name,
                    GetProperties(symbol));
            });
Notice that ctx.TargetSymbol in the transform context is already the resolved INamedTypeSymbol for the matched type -- you do not need to call GetDeclaredSymbol yourself. The context also provides ctx.Attributes, which gives you the matched AttributeData instances without additional lookups. These conveniences make ForAttributeWithMetadataName significantly cleaner to work with than the equivalent CreateSyntaxProvider setup.
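As a sketch of how ctx.Attributes can be used end to end, the generator below injects its own marker attribute, reads a named argument from the matched AttributeData, and emits one file per target. Everything beyond the Roslyn API is an assumption for illustration: the Separator property, the generator name, the hint names, and the comment-only output. It also assumes a Roslyn version that includes ForAttributeWithMetadataName.

```csharp
using System.Linq;
using System.Text;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp.Syntax;
using Microsoft.CodeAnalysis.Text;

// Hypothetical generator built around ForAttributeWithMetadataName.
[Generator]
public sealed class ToStringTargetsGenerator : IIncrementalGenerator
{
    // Marker attribute added to the consuming compilation (hypothetical shape).
    private const string AttributeSource = @"namespace MyNamespace
{
    [System.AttributeUsage(System.AttributeTargets.Class)]
    internal sealed class GenerateToStringAttribute : System.Attribute
    {
        public string Separator { get; set; }
    }
}";

    public void Initialize(IncrementalGeneratorInitializationContext context)
    {
        context.RegisterPostInitializationOutput(static ctx =>
            ctx.AddSource("GenerateToStringAttribute.g.cs",
                SourceText.From(AttributeSource, Encoding.UTF8)));

        var targets = context.SyntaxProvider.ForAttributeWithMetadataName(
            "MyNamespace.GenerateToStringAttribute",
            predicate: static (node, _) => node is ClassDeclarationSyntax,
            transform: static (ctx, _) =>
            {
                // ctx.Attributes holds the AttributeData instances that matched the name.
                var separator = ctx.Attributes[0].NamedArguments
                    .FirstOrDefault(kvp => kvp.Key == "Separator").Value.Value as string;
                // A tuple of strings has value equality, so it caches well.
                return (TypeName: ctx.TargetSymbol.ToDisplayString(), Separator: separator ?? ", ");
            });

        // Hint name derived from the type name; generic types would need sanitizing.
        context.RegisterSourceOutput(targets, static (spc, t) =>
            spc.AddSource($"{t.TypeName}.Targets.g.cs",
                SourceText.From($"// {t.TypeName} uses separator \"{t.Separator}\"\n", Encoding.UTF8)));
    }
}
```

The transform never calls GetDeclaredSymbol or GetAttributes; both the symbol and the matched attribute data arrive on the context.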
Accessing the SemanticModel
The semantic model is your gateway to type information in a Roslyn source generator, and it is always accessed inside the transform step -- never the predicate. Both CreateSyntaxProvider and ForAttributeWithMetadataName provide a SemanticModel in the transform context, scoped to the syntax tree of the file containing the target node.
The key method you will use most is GetDeclaredSymbol, which maps a declaration syntax node (such as ClassDeclarationSyntax) to its corresponding INamedTypeSymbol. For resolving type references that appear in expressions or type annotations rather than declarations, GetTypeInfo and GetSymbolInfo are the relevant methods.
Once you have an INamedTypeSymbol, you can interrogate it for everything a generator typically needs: its fully qualified name, its base type, the interfaces it implements, its containing namespace, its members, and its applied attributes. Extracting public properties is one of the most common operations:
    static ImmutableArray<PropertyModel> GetProperties(INamedTypeSymbol symbol)
    {
        return symbol.GetMembers()
            .OfType<IPropertySymbol>()
            .Where(p => !p.IsIndexer && p.DeclaredAccessibility == Accessibility.Public)
            .Select(p => new PropertyModel(p.Name, p.Type.ToDisplayString()))
            .ToImmutableArray();
    }
The ToDisplayString() call on the property type gives you a human-readable, fully qualified type name as a string -- exactly what you need to emit correct, compilable code in the generation step. For generic types it handles the angle bracket syntax correctly, and for nullable reference types it respects the nullability annotation.
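ToDisplayString also accepts a SymbolDisplayFormat, which matters when the string ends up in generated code. The sketch below compares the default format with SymbolDisplayFormat.FullyQualifiedFormat on one property type; the DisplayFormats class and the Order sample are hypothetical, and the compilation references only the running runtime's core library.

```csharp
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

// Hypothetical demo comparing two display formats for the same type symbol.
public static class DisplayFormats
{
    public static (string Default, string FullyQualified) Describe()
    {
        var tree = CSharpSyntaxTree.ParseText(
            "class Order { public System.Collections.Generic.List<int> Lines { get; set; } }");
        var compilation = CSharpCompilation.Create(
            "Demo",
            new[] { tree },
            new[] { MetadataReference.CreateFromFile(typeof(object).Assembly.Location) });

        var order = compilation.GetTypeByMetadataName("Order")!;
        var property = order.GetMembers("Lines").OfType<IPropertySymbol>().Single();

        return (
            // Namespace-qualified, with C# keywords for special types.
            property.Type.ToDisplayString(),
            // Same, plus the global:: prefix that protects against name clashes.
            property.Type.ToDisplayString(SymbolDisplayFormat.FullyQualifiedFormat));
    }
}
```

The default format produces System.Collections.Generic.List<int>, while the fully qualified format produces global::System.Collections.Generic.List<int>. The second is often the safer choice for emitted code, because it still binds correctly when the consuming project declares a namespace or type that shadows part of the name.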
Extracting Useful Information from Symbol Analysis
Getting the namespace from a symbol is a frequent operation that has one edge case worth handling explicitly -- types declared in the global namespace (no namespace block):
    static string GetNamespace(INamedTypeSymbol symbol)
    {
        return symbol.ContainingNamespace.IsGlobalNamespace
            ? string.Empty
            : symbol.ContainingNamespace.ToDisplayString();
    }
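One way to carry this edge case through to the emitted code is to wrap the generated body in a namespace block only when there is a namespace to emit. The sketch below repeats the GetNamespace rule and adds two hypothetical helpers, Wrap and Resolve, the latter only for trying the logic against an in-memory compilation.

```csharp
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

// Hypothetical helper type for namespace-aware emission.
public static class NamespaceEmit
{
    // Same rule as above: empty string for the global namespace.
    public static string GetNamespace(INamedTypeSymbol symbol) =>
        symbol.ContainingNamespace.IsGlobalNamespace
            ? string.Empty
            : symbol.ContainingNamespace.ToDisplayString();

    // Emits a namespace block only when the namespace is non-empty.
    public static string Wrap(string ns, string body) =>
        ns.Length == 0 ? body : $"namespace {ns}\n{{\n{body}\n}}";

    // Resolves a type from source text so the helpers can be tried in isolation.
    public static INamedTypeSymbol Resolve(string source, string metadataName)
    {
        var compilation = CSharpCompilation.Create(
            "Demo",
            new[] { CSharpSyntaxTree.ParseText(source) },
            new[] { MetadataReference.CreateFromFile(typeof(object).Assembly.Location) });
        return compilation.GetTypeByMetadataName(metadataName)!;
    }
}
```

Without the check, the display string of the global namespace would be written after the namespace keyword, and that text is not a valid identifier.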
For reading attribute constructor arguments and named properties, you work with AttributeData rather than AttributeSyntax. The AttributeData is obtained from the symbol and already has all arguments resolved through the semantic model:
    static string? GetAttributeNamedStringArg(
        INamedTypeSymbol symbol,
        string attributeFullName,
        string argName)
    {
        var attr = symbol.GetAttributes()
            .FirstOrDefault(a =>
                a.AttributeClass?.ToDisplayString() == attributeFullName);
        if (attr is null)
        {
            return null;
        }
        var namedArg = attr.NamedArguments
            .FirstOrDefault(kvp => kvp.Key == argName);
        return namedArg.Value.Value as string;
    }
AttributeData.ConstructorArguments gives you positional constructor arguments as TypedConstant values, while AttributeData.NamedArguments gives you the named property assignments. TypedConstant.Value gives you the actual runtime constant value (a string, int, bool, etc.) as a boxed object.
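Here is a small sketch that reads both kinds of argument from one attribute application. The TagAttribute and Invoice types exist only inside the sample source string, and the AttributeReading class is hypothetical; the compilation references just the core library of the running runtime.

```csharp
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

// Hypothetical demo reading positional and named attribute arguments.
public static class AttributeReading
{
    private const string Source = @"
using System;
[AttributeUsage(AttributeTargets.Class)]
class TagAttribute : Attribute
{
    public TagAttribute(string name) { }
    public int Order { get; set; }
}
[Tag(""billing"", Order = 3)]
class Invoice { }";

    public static (string? Name, int? Order) Read()
    {
        var compilation = CSharpCompilation.Create(
            "Demo",
            new[] { CSharpSyntaxTree.ParseText(Source) },
            new[] { MetadataReference.CreateFromFile(typeof(object).Assembly.Location) });

        var invoice = compilation.GetTypeByMetadataName("Invoice")!;
        AttributeData attr = invoice.GetAttributes()
            .First(a => a.AttributeClass?.ToDisplayString() == "TagAttribute");

        // Positional argument: first constructor parameter.
        var name = attr.ConstructorArguments[0].Value as string;

        // Named argument: look it up by property name; null when it was not supplied.
        var order = attr.NamedArguments
            .Where(kvp => kvp.Key == "Order")
            .Select(kvp => kvp.Value.Value as int?)
            .FirstOrDefault();

        return (name, order);
    }
}
```

Read returns the string billing and the integer 3, both already unboxed from TypedConstant.Value.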
This level of symbol analysis is what enables generators for complex structural patterns. For instance, syntax tree analysis in a C# source generator can discover all factory-eligible types by examining their constructors and attributes -- closely aligned with how the factory method pattern identifies construction responsibilities. Reading every public property from a class symbol is precisely the mechanism that makes automated builder pattern generation work. Symbol analysis can also identify all types implementing a given interface, enabling generators to produce decorator pattern wrappers automatically -- a technique closely related to adding cross-cutting concerns with generated decorators.
Building Your Data Model from Syntax Analysis
The transform step in an incremental source generator has one job: convert a matched syntax node (with its semantic context) into a plain, self-contained data model that the generation step can consume. This separation is architecturally essential for the incremental engine to function correctly.
The pipeline caches the output of each step. When a transform produces a model that compares equal to the one from the previous run, the downstream steps, including code generation, are skipped for it and the earlier output is reused -- this is what makes incremental generators feel fast during active editing sessions.
Your data models should be plain records or value types that implement structural equality. They should contain only primitives, strings, or ImmutableArray<T> of other plain models. They should never contain SyntaxNode, ISymbol, SemanticModel, or Compilation references.
    file sealed record ClassModel(
        string Namespace,
        string ClassName,
        ImmutableArray<PropertyModel> Properties)
    {
        // Not virtual: a sealed record cannot declare new virtual members.
        public bool Equals(ClassModel? other) =>
            other is not null
            && Namespace == other.Namespace
            && ClassName == other.ClassName
            && Properties.SequenceEqual(other.Properties);

        // Must stay consistent with Equals; array length is a cheap stand-in for contents.
        public override int GetHashCode() =>
            HashCode.Combine(Namespace, ClassName, Properties.Length);
    }

    file sealed record PropertyModel(string Name, string TypeName);
Using record gives structural equality for primitive and string properties, but properties typed as ImmutableArray<T> require a custom equality override -- see the ImmutableArray pitfall below. The file access modifier scopes these types to the generator file, preventing naming conflicts when multiple generators in the same project define records with the same name.
This data model pattern applies to plugin discovery frameworks too. As explored in the Needlr plugin architecture article, discovering registration types at compile time and emitting registration code follows exactly this flow: syntax filter, semantic transform, plain data model, code generation. Building generators that discover strategy pattern implementations via symbol analysis uses the same pipeline -- filter for types implementing the strategy interface, extract metadata into a plain model, and generate the dispatcher.
Common Pitfalls
Capturing Roslyn objects in your data model. This is the most common mistake in Roslyn source generator work, and the hardest to diagnose. If your data model holds an INamedTypeSymbol or SyntaxNode reference, the incremental pipeline cannot perform meaningful value equality checks on it: symbols and nodes from a new compilation are new instances that do not compare equal to the cached ones. Every downstream step then re-runs on every keystroke, quietly eliminating all incremental benefits, and the captured references can keep old compilations alive in memory.
Comparing attribute names at the syntax level. When using CreateSyntaxProvider and inspecting attributes by text name, you have to handle the fact that [GenerateToString] and [GenerateToStringAttribute] are the same attribute but appear as different strings in the syntax tree. ForAttributeWithMetadataName handles this automatically.
Doing semantic work in the predicate. Predicates run for every syntax node change in every file. Any call to GetDeclaredSymbol, GetTypeInfo, or any semantic API inside the predicate is a severe performance regression. All semantic work belongs in the transform step.
Ignoring equality for ImmutableArray properties. ImmutableArray<T> does not implement structural equality by default when used in a record. The synthesized equality compares the array by reference, so the incremental cache sees every new array as a different value even if contents are identical. Use a custom equality override for array properties.
Not passing the CancellationToken. The ct parameter in transform delegates is signaled when the user types again before your generator finishes. Always forward it to APIs like GetDeclaredSymbol(node, ct). Ignoring it keeps stale work running and blocks the IDE longer than necessary on every keystroke.
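The ImmutableArray pitfall above can be reproduced without Roslyn at all. The sketch below uses two hypothetical records, NaiveModel and EquatableModel, to contrast the synthesized record equality with a content-based override.

```csharp
using System;
using System.Collections.Immutable;
using System.Linq;

// Synthesized equality: the array member is compared by its underlying reference.
public sealed record NaiveModel(string Name, ImmutableArray<string> Properties);

// Content-based equality: what the incremental cache needs to detect "no change".
public sealed record EquatableModel(string Name, ImmutableArray<string> Properties)
{
    // Assumes Properties is initialized; a default ImmutableArray would throw here.
    public bool Equals(EquatableModel? other) =>
        other is not null
        && Name == other.Name
        && Properties.SequenceEqual(other.Properties);

    public override int GetHashCode()
    {
        var hash = new HashCode();
        hash.Add(Name);
        foreach (var property in Properties)
            hash.Add(property);
        return hash.ToHashCode();
    }
}
```

Two NaiveModel values built from separate ImmutableArray.Create("X") calls compare as different, while the equivalent EquatableModel values compare as equal and share a hash code. Generator projects that target netstandard2.0 may need an extra package for System.HashCode, or a hand-written hash combination.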
Frequently Asked Questions
What is the difference between SyntaxTree and SemanticModel in Roslyn?
A SyntaxTree represents the pure grammatical structure of a source file -- it tells you what nodes exist and their positions, but nothing about types or resolved symbols. The SemanticModel builds on top of the syntax tree to provide type resolution: it can tell you the fully qualified name of a type, whether a method is async, or what interface a class implements. In a source generator, you use syntax trees for fast filtering and the semantic model for data extraction.
When should I use ForAttributeWithMetadataName vs CreateSyntaxProvider in a source generator?
Use ForAttributeWithMetadataName whenever you are filtering nodes by a specific attribute -- Roslyn optimizes this internally and handles attribute name aliasing automatically. Use CreateSyntaxProvider when your filter condition is based on syntactic structure alone, such as finding all partial classes or all records, without relying on a specific attribute.
Why can't I store an INamedTypeSymbol in my data model?
INamedTypeSymbol does not implement value equality compatible with the incremental pipeline's caching. If your data model holds a symbol reference, the pipeline cannot determine whether the cached output is still valid, defeating incremental behavior. Extract names, type strings, and member metadata into plain records before leaving the transform step.
How do I get the fully qualified namespace of a class in a source generator?
Call symbol.ContainingNamespace.ToDisplayString() on the INamedTypeSymbol. If the type is in the global namespace, ContainingNamespace.IsGlobalNamespace will be true -- handle this by treating the namespace as an empty string or omitting the namespace block from your generated code.
Can I traverse the syntax tree manually instead of using SyntaxProvider?
Yes -- SyntaxNode.DescendantNodes() and CSharpSyntaxWalker are useful inside the transform step for traversing the syntax tree -- for example, locating nodes within a method body or finding nested type declarations. For any pipeline work that depends on user-written source files, use SyntaxProvider -- it is the only way to gain incremental caching benefits. Manual top-level traversal bypasses the cache entirely.
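As an illustration of local traversal inside a transform, the sketch below lists the invocation expressions found in one method body. The LocalTraversal class is hypothetical, and the result is plain strings so that it could be stored in a data model.

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Hypothetical helper for use inside a transform that already has the method node.
public static class LocalTraversal
{
    // Returns the callee expression text of each invocation, in document order.
    public static List<string> InvokedNames(MethodDeclarationSyntax method) =>
        method.DescendantNodes()
            .OfType<InvocationExpressionSyntax>()
            .Select(invocation => invocation.Expression.ToString())
            .ToList();
}
```

For a method whose body is Foo(); this.Bar(1); the helper returns Foo and this.Bar. The traversal is scoped to a single node that the pipeline already selected, so it does not bypass any caching.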
How do I read constructor arguments from an attribute in a Roslyn source generator?
Access symbol.GetAttributes() to get the AttributeData collection. Each AttributeData has ConstructorArguments (an ImmutableArray<TypedConstant> for positional arguments) and NamedArguments (an ImmutableArray<KeyValuePair<string, TypedConstant>> for named property assignments). Call .Value on a TypedConstant to get the actual constant value.
What does the file modifier do in source generator data models?
The file access modifier (C# 11+) scopes a type to the file in which it is declared, making it invisible to the rest of the compilation. Using it on your internal record types prevents naming conflicts when multiple generators in the same project define records with the same name, and signals clearly that these types are implementation details of the generator.
Conclusion
Reading and traversing a Roslyn syntax tree in a C# source generator becomes straightforward once you understand the foundational design: syntax for structure, semantics for meaning. Keep your predicates fast and syntax-only. Perform semantic work in the transform step, after your filter has already narrowed the candidate set. Extract everything you need into plain, equatable data models before handing it to the generation step.
The modern IIncrementalGenerator pipeline -- particularly ForAttributeWithMetadataName -- gives you an efficient, caching-aware API that integrates cleanly with the IDE. Combined with a solid understanding of the key node types and symbol interfaces, you have everything you need to build generators that are both correct and fast.
Once the Roslyn syntax tree traversal in your C# source generator is solid, the remaining challenge is code generation itself -- which is just disciplined string building on top of the data you have already extracted. The investment in understanding the syntax and semantic layers pays off across every generator you will ever write.

