Chunking Strategies for RAG with Semantic Kernel in C#: Fixed-Size, Sentence, and Semantic Chunking

When I first built my RAG pipeline with Semantic Kernel, I was disappointed with the retrieval quality. My embeddings were solid, my vector store was configured correctly, and my prompts were well-crafted. The problem? My chunking strategy was naive. I was splitting documents arbitrarily, cutting sentences in half, and losing critical context. Once I implemented proper chunking strategies for RAG with Semantic Kernel in C#, my retrieval accuracy jumped dramatically. Chunking is the often-overlooked bottleneck in RAG quality, and getting it right can make the difference between a mediocre system and one that truly understands your documents.

Why Chunking Matters for RAG Quality

Chunking is the process of breaking large documents into smaller, semantically meaningful pieces before embedding them in your vector store. When you're building RAG systems with Semantic Kernel in C#, you can't just throw entire documents at your embedding model. There are three fundamental constraints that make chunking essential for quality retrieval.

First, embedding models have token limits. OpenAI embedding models like text-embedding-3-small support up to 8,191 tokens per input, though chunking to 256-512 tokens often yields better retrieval quality as semantic coherence degrades in very large chunks. Local BERT-based embedding models typically cap at 512 tokens per input. Even if you could embed an entire document, you'd lose granularity in your retrieval. When a user asks a specific question, you want to retrieve the most relevant paragraph or section, not the entire 50-page document. Second, embedding quality degrades as text length increases. Shorter, focused chunks produce embeddings that better capture the semantic meaning of that specific content. Third, there's the precision versus recall tradeoff in retrieval. Smaller chunks give you precise matches but might miss context, while larger chunks provide more context but reduce precision.

The chunking problem comes down to this tension: you need chunks small enough to be precise and fit within token limits, but large enough to preserve meaning and context. The strategy you choose determines whether your RAG pipeline returns exactly what users need or floods them with irrelevant results.

Strategy 1: Fixed-Size Chunking

Fixed-size chunking is the simplest approach to breaking up documents. You split text based on a fixed number of tokens, characters, or words. This method is predictable, easy to implement, and gives you consistent chunk sizes across your entire corpus. I start here when prototyping because it requires minimal code and helps establish baseline performance.

The main advantage of fixed-size chunking is its simplicity and speed. You know exactly how many chunks you'll generate from a document, and the computational overhead is minimal. It works well for documents with uniform structure where semantic boundaries don't matter much. However, the drawbacks are significant. Fixed-size chunking will cheerfully split sentences mid-word, break up critical context, and create chunks that make no semantic sense. A chunk might start with "...and therefore the conclusion is" with no idea what "therefore" refers to.

The overlap parameter helps mitigate these issues by including the last N words or characters from the previous chunk in the next chunk. This provides some continuity, but it's a band-aid solution. Here's a complete C# implementation of fixed-size chunking with overlap:

public static class TextChunker
{
    public static IEnumerable<string> ChunkByWords(
        string text,
        int chunkSize = 200,
        int overlap = 20)
    {
        if (overlap >= chunkSize)
            throw new ArgumentException(
                "Overlap must be smaller than chunk size to avoid an infinite loop.",
                nameof(overlap));

        var words = text.Split(' ', StringSplitOptions.RemoveEmptyEntries);
        
        for (int i = 0; i < words.Length; i += chunkSize - overlap)
        {
            var chunkWords = words.Skip(i).Take(chunkSize);
            var chunk = string.Join(' ', chunkWords);
            
            if (!string.IsNullOrWhiteSpace(chunk))
                yield return chunk;
                
            if (i + chunkSize >= words.Length)
                break;
        }
    }
    
    public static IEnumerable<string> ChunkByCharacters(
        string text,
        int chunkSize = 1000,
        int overlap = 100)
    {
        for (int i = 0; i < text.Length; i += chunkSize - overlap)
        {
            var length = Math.Min(chunkSize, text.Length - i);
            yield return text.Substring(i, length);
            
            if (i + chunkSize >= text.Length)
                break;
        }
    }
}

This implementation uses yield return to generate chunks lazily, which is memory-efficient for large documents. The word-based version splits on whitespace and groups words, while the character-based version simply slices the string at fixed intervals. Both support overlap to preserve some context between chunks. For most RAG applications, I recommend starting with 150-250 words per chunk and 10-20% overlap.
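As a quick illustration, the word-based chunker above can be exercised like this (the synthetic text and sizes are only for demonstration):

```csharp
using System;
using System.Linq;

// Build a 500-word synthetic document: "word1 word2 ... word500".
var text = string.Join(' ', Enumerable.Range(1, 500).Select(i => $"word{i}"));

// 200-word chunks with a 20-word overlap: the loop advances 180 words per step,
// so chunks start at words 1, 181, and 361.
var chunks = TextChunker.ChunkByWords(text, chunkSize: 200, overlap: 20).ToList();

Console.WriteLine(chunks.Count);                    // 3
Console.WriteLine(chunks[1].StartsWith("word181")); // True
```

Note that adjacent chunks share 20 words, which is exactly what gives queries near a boundary a chance to match either chunk.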

Strategy 2: Sentence-Boundary Chunking

Sentence-boundary chunking respects the natural structure of language by only splitting text at sentence boundaries. Instead of cutting mid-sentence, you group complete sentences together until you reach your target chunk size. This preserves semantic units and dramatically improves the readability and usefulness of your chunks.

The key advantage here is that each chunk contains complete thoughts. When your RAG system retrieves a sentence-boundary chunk, users get coherent information they can actually understand. The chunks embed better because the text has natural semantic boundaries. The downside is that sentence detection isn't trivial. You need to handle abbreviations (Dr., Inc., etc.), decimal numbers, and various punctuation edge cases.

For most applications, a simple regex pattern works well enough. You can enhance this with libraries like NLP.NET if you need production-grade sentence detection. Here's a C# implementation using regex:

public static class SentenceChunker
{
    // Basic sentence splitter using regex
    private static readonly Regex SentencePattern = new(
        @"(?<=[.!?])\s+(?=[A-Z])",
        RegexOptions.Compiled);
    
    public static IEnumerable<string> ChunkBySentences(
        string text,
        int maxSentencesPerChunk = 5,
        int overlapSentences = 1)
    {
        var sentences = SentencePattern.Split(text)
            .Where(s => !string.IsNullOrWhiteSpace(s))
            .ToArray();
        
        for (int i = 0; i < sentences.Length; i += maxSentencesPerChunk - overlapSentences)
        {
            var chunk = string.Join(' ', sentences.Skip(i).Take(maxSentencesPerChunk));
            if (!string.IsNullOrWhiteSpace(chunk))
                yield return chunk;
            
            if (i + maxSentencesPerChunk >= sentences.Length)
                break;
        }
    }
}

The regex pattern (?<=[.!?])\s+(?=[A-Z]) uses lookbehind and lookahead to split on whitespace that follows sentence-ending punctuation and precedes a capital letter. This handles most common cases. The overlap parameter lets you include the last sentence or two from the previous chunk, maintaining context flow. I typically use 3-7 sentences per chunk depending on sentence length in my documents.
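To see the stride and overlap in action, here is a small illustrative call against the SentenceChunker above (the sample text is made up):

```csharp
using System;
using System.Linq;

var text = "First sentence here. Second one follows. Third adds detail. " +
           "Fourth continues. Fifth wraps up. Sixth starts a new idea.";

// 3 sentences per chunk with 1 sentence of overlap: the loop advances
// 2 sentences per step, so chunks cover sentences (1-3), (3-5), and (5-6).
var chunks = SentenceChunker.ChunkBySentences(
    text, maxSentencesPerChunk: 3, overlapSentences: 1).ToList();

Console.WriteLine(chunks.Count); // 3
```

Each chunk begins with the last sentence of the previous one, so a question about the transition between topics can still land on a single chunk.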

Strategy 3: Paragraph Chunking

Paragraph chunking splits documents on paragraph boundaries, typically identified by double newlines or other structural markers in your text format. This strategy works exceptionally well for structured documents like articles, documentation, or books where authors have already organized content into logical paragraphs.

Each paragraph usually covers a single idea or subtopic, making this a naturally semantic chunking approach. The chunks are self-contained and meaningful without requiring sophisticated natural language processing. For markdown documents, blog posts, or technical documentation, paragraph chunking often gives you the best balance of simplicity and quality. The limitation is that paragraph length varies wildly. One paragraph might be two sentences while another spans half a page, which can cause issues with token limits.

Here's a C# implementation that handles paragraph chunking with size constraints:

public static class ParagraphChunker
{
    public static IEnumerable<string> ChunkByParagraphs(
        string text,
        int maxWordsPerChunk = 300)
    {
        var paragraphs = text.Split(
            new[] { "\r\n\r\n", "\n\n" },
            StringSplitOptions.RemoveEmptyEntries)
            .Select(p => p.Trim())
            .Where(p => !string.IsNullOrWhiteSpace(p))
            .ToList();
        
        var currentChunk = new List<string>();
        var currentWordCount = 0;
        
        foreach (var paragraph in paragraphs)
        {
            var paragraphWords = paragraph.Split(' ', 
                StringSplitOptions.RemoveEmptyEntries).Length;
            
            // If adding this paragraph exceeds max, yield current chunk
            if (currentWordCount + paragraphWords > maxWordsPerChunk && 
                currentChunk.Any())
            {
                yield return string.Join("\n\n", currentChunk);
                currentChunk.Clear();
                currentWordCount = 0;
            }
            
            currentChunk.Add(paragraph);
            currentWordCount += paragraphWords;
        }
        
        // Yield final chunk
        if (currentChunk.Any())
        {
            yield return string.Join("\n\n", currentChunk);
        }
    }
}

This implementation groups paragraphs together until hitting the word limit, then starts a new chunk. It preserves paragraph structure by joining with double newlines. If a single paragraph exceeds the max word count, it becomes its own chunk. For most markdown or plaintext documents, this approach produces highly usable chunks with minimal processing.
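A short illustrative run of the ParagraphChunker above shows the grouping behavior (the document is synthetic):

```csharp
using System;
using System.Linq;

// Three paragraphs: a short intro, a 300-word body, and a short closer.
var doc = "Short intro paragraph.\n\n" +
          string.Join(' ', Enumerable.Repeat("filler", 300)) + "\n\n" +
          "Closing paragraph.";

// The 300-word body cannot join the intro without exceeding the budget,
// so the intro is emitted alone, the body becomes its own chunk,
// and the closer starts a third chunk.
var chunks = ParagraphChunker.ChunkByParagraphs(doc, maxWordsPerChunk: 300).ToList();

Console.WriteLine(chunks.Count); // 3
```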

Strategy 4: Recursive Character Text Splitting

Recursive character text splitting is inspired by LangChain's RecursiveCharacterTextSplitter. Instead of splitting on a single delimiter, it uses a hierarchy of separators. It tries to split on double newlines first (paragraphs), then single newlines (lines), then sentences, then words, and finally characters. This creates the largest possible semantically meaningful chunks while respecting your size limits.

The brilliance of this approach is that it naturally adapts to your document structure. For well-structured documents, you get paragraph-level chunks. For dense text without clear paragraph breaks, it falls back to sentence or word boundaries. It's more sophisticated than fixed-size chunking but doesn't require complex NLP processing like semantic chunking.

Here's a C# implementation of recursive character splitting:

public static class RecursiveTextSplitter
{
    private static readonly string[] DefaultSeparators = 
    {
        "\n\n", "\n", ". ", "! ", "? ", "; ", ", ", " ", ""
    };
    
    public static IEnumerable<string> SplitRecursively(
        string text,
        int maxChunkSize = 1000,
        int overlap = 100,
        string[]? separators = null)
    {
        separators ??= DefaultSeparators;
        return SplitRecursivelyInternal(text, maxChunkSize, overlap, separators, 0);
    }
    
    private static IEnumerable<string> SplitRecursivelyInternal(
        string text,
        int maxChunkSize,
        int overlap,
        string[] separators,
        int separatorIndex)
    {
        if (string.IsNullOrWhiteSpace(text))
            yield break;
            
        if (text.Length <= maxChunkSize)
        {
            yield return text;
            yield break;
        }
        
        if (separatorIndex >= separators.Length)
        {
            // No more separators, chunk by characters
            for (int i = 0; i < text.Length; i += maxChunkSize - overlap)
            {
                var length = Math.Min(maxChunkSize, text.Length - i);
                yield return text.Substring(i, length);
                
                if (i + maxChunkSize >= text.Length)
                    break;
            }
            yield break;
        }
        
        var separator = separators[separatorIndex];
        var splits = text.Split(new[] { separator }, StringSplitOptions.None);
        
        var currentChunk = "";
        foreach (var split in splits)
        {
            var potentialChunk = string.IsNullOrEmpty(currentChunk)
                ? split
                : currentChunk + separator + split;
            
            if (potentialChunk.Length <= maxChunkSize)
            {
                currentChunk = potentialChunk;
            }
            else
            {
                if (!string.IsNullOrEmpty(currentChunk))
                {
                    yield return currentChunk;
                    
                    // Start new chunk with overlap
                    var overlapText = currentChunk.Length > overlap
                        ? currentChunk.Substring(currentChunk.Length - overlap)
                        : currentChunk;
                    currentChunk = overlapText + separator + split;
                }
                else
                {
                    // Split is too large, recurse with next separator
                    foreach (var chunk in SplitRecursivelyInternal(
                        split, maxChunkSize, overlap, separators, separatorIndex + 1))
                    {
                        yield return chunk;
                    }
                }
            }
        }
        
        if (!string.IsNullOrEmpty(currentChunk))
        {
            yield return currentChunk;
        }
    }
}

This recursive implementation tries each separator in sequence, falling back to the next one when text segments are still too large. It maintains overlap between chunks and handles edge cases like splits larger than the chunk size. For general-purpose document chunking in RAG pipelines with Semantic Kernel, this is my go-to strategy.
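The adaptive behavior is easiest to see on structured input. Here is an illustrative call against the RecursiveTextSplitter above, using synthetic paragraphs:

```csharp
using System;
using System.Linq;

// Five paragraphs of ~133 characters each, separated by blank lines.
var structured = string.Join("\n\n",
    Enumerable.Range(1, 5).Select(i => $"Paragraph {i}: " + new string('x', 120)));

// With a 300-character budget the splitter groups whole paragraphs,
// never cutting inside one, because "\n\n" is tried before any other separator.
var chunks = RecursiveTextSplitter.SplitRecursively(
    structured, maxChunkSize: 300, overlap: 0).ToList();

Console.WriteLine(chunks.Count); // 3
foreach (var c in chunks)
    Console.WriteLine(c.Length); // every chunk stays under 300 characters
```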

Strategy 5: Semantic Chunking

Semantic chunking is the most sophisticated approach, using embedding similarity to identify natural topic boundaries in your text. Instead of relying on punctuation or whitespace, it analyzes the semantic meaning of sentences and groups together those that discuss the same topic. When the embedding similarity between consecutive sentences drops below a threshold, that's a chunk boundary.

This produces the highest-quality chunks because they're organized around actual topics and concepts rather than arbitrary size limits. The chunks feel natural when you read them, and they embed exceptionally well because each chunk focuses on a coherent topic. The overhead is significant, though. You need to generate embeddings for every sentence, compute similarities, and identify boundaries. For large documents, this can be expensive in terms of API calls and processing time.

Here's a C# implementation using Semantic Kernel's text embedding capabilities:

#pragma warning disable SKEXP0001
using Microsoft.SemanticKernel.Embeddings;

public class SemanticChunker
{
    private readonly ITextEmbeddingGenerationService _embeddingService;
    private readonly double _similarityThreshold;
    
    public SemanticChunker(
        ITextEmbeddingGenerationService embeddingService,
        double similarityThreshold = 0.75)
    {
        _embeddingService = embeddingService;
        _similarityThreshold = similarityThreshold;
    }
    
    public async Task<IEnumerable<string>> ChunkSemanticallyAsync(
        string text,
        CancellationToken cancellationToken = default)
    {
        var sentences = SplitIntoSentences(text);
        if (sentences.Count <= 1)
            return sentences;
        
        // Generate embeddings for all sentences
        var embeddings = new List<ReadOnlyMemory<float>>();
        foreach (var sentence in sentences)
        {
            var embedding = await _embeddingService
                .GenerateEmbeddingAsync(sentence, cancellationToken: cancellationToken);
            embeddings.Add(embedding);
        }
        
        // Find chunk boundaries based on similarity drops
        var chunks = new List<string>();
        var currentChunk = new List<string> { sentences[0] };
        
        for (int i = 1; i < sentences.Count; i++)
        {
            var similarity = CosineSimilarity(
                embeddings[i - 1].Span,
                embeddings[i].Span);
            
            if (similarity < _similarityThreshold)
            {
                // Similarity drop detected, start new chunk
                chunks.Add(string.Join(' ', currentChunk));
                currentChunk.Clear();
            }
            
            currentChunk.Add(sentences[i]);
        }
        
        // Add final chunk
        if (currentChunk.Any())
        {
            chunks.Add(string.Join(' ', currentChunk));
        }
        
        return chunks;
    }
    
    private static List<string> SplitIntoSentences(string text)
    {
        var pattern = new Regex(@"(?<=[.!?])\s+(?=[A-Z])", RegexOptions.Compiled);
        return pattern.Split(text)
            .Where(s => !string.IsNullOrWhiteSpace(s))
            .ToList();
    }
    
    private static double CosineSimilarity(
        ReadOnlySpan<float> vector1,
        ReadOnlySpan<float> vector2)
    {
        double dotProduct = 0;
        double magnitude1 = 0;
        double magnitude2 = 0;
        
        for (int i = 0; i < vector1.Length; i++)
        {
            dotProduct += vector1[i] * vector2[i];
            magnitude1 += vector1[i] * vector1[i];
            magnitude2 += vector2[i] * vector2[i];
        }
        
        return dotProduct / (Math.Sqrt(magnitude1) * Math.Sqrt(magnitude2));
    }
}
#pragma warning restore SKEXP0001

This implementation splits text into sentences, generates embeddings for each sentence using Semantic Kernel's embedding service, and identifies chunk boundaries where cosine similarity drops below the threshold. A threshold of 0.75-0.80 works well for most content. When similarity is high (above threshold), sentences are discussing the same topic and belong in the same chunk. When it drops, there's a topic shift.
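For context, here is one way the chunker above might be wired up. This is a sketch, not the one true setup: it assumes the Microsoft.SemanticKernel package with the OpenAI connector, the connector's experimental AddOpenAITextEmbeddingGeneration extension, and an illustrative model name and environment variable.

```csharp
#pragma warning disable SKEXP0001, SKEXP0010
using System;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.Embeddings;

var builder = Kernel.CreateBuilder();
builder.AddOpenAITextEmbeddingGeneration(
    modelId: "text-embedding-3-small",
    apiKey: Environment.GetEnvironmentVariable("OPENAI_API_KEY")!);
var kernel = builder.Build();

var embeddingService = kernel.GetRequiredService<ITextEmbeddingGenerationService>();

// A lower threshold merges more sentences into fewer, larger chunks; tune per corpus.
var chunker = new SemanticChunker(embeddingService, similarityThreshold: 0.78);
```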

Semantic chunking is worth the overhead when retrieval quality is critical and you have complex, topic-dense documents. For straightforward documents or high-volume scenarios, the simpler strategies often provide better cost-performance tradeoffs.

Adding Overlap for Context Continuity

Overlap is the secret weapon that makes chunking strategies work better in practice. By including the last few sentences or words from the previous chunk at the start of the next chunk, you create context continuity. When users ask questions that span chunk boundaries, overlap ensures the relevant information appears in at least one chunk.

Without overlap, important context can fall through the cracks. Imagine a chunk ending with "This approach has three advantages:" and the next chunk listing those advantages. Without overlap, you've split a cohesive thought across two chunks, and retrieval might miss the connection. With overlap, the "three advantages" sentence appears in both chunks, preserving the context.

Implementing overlap in C# is straightforward. You track the last N words or sentences and prepend them to the next chunk. Here's an example adding overlap to sentence-based chunking:

public static IEnumerable<string> ChunkBySentencesWithOverlap(
    string text,
    int maxSentencesPerChunk = 5,
    int overlapSentences = 1)
{
    var sentences = SplitIntoSentences(text);
    var overlapBuffer = new Queue<string>();
    
    for (int i = 0; i < sentences.Count; i += maxSentencesPerChunk)
    {
        var chunkSentences = new List<string>();
        
        // Add overlap from previous chunk
        chunkSentences.AddRange(overlapBuffer);
        
        // Add new sentences
        var newSentences = sentences
            .Skip(i)
            .Take(maxSentencesPerChunk)
            .ToList();
        chunkSentences.AddRange(newSentences);
        
        if (chunkSentences.Any())
        {
            yield return string.Join(' ', chunkSentences);
        }
        
        // Update overlap buffer for next chunk
        overlapBuffer.Clear();
        foreach (var sentence in newSentences.TakeLast(overlapSentences))
        {
            overlapBuffer.Enqueue(sentence);
        }
    }
}

private static List<string> SplitIntoSentences(string text)
{
    var pattern = new Regex(@"(?<=[.!?])\s+(?=[A-Z])", RegexOptions.Compiled);
    return pattern.Split(text)
        .Where(s => !string.IsNullOrWhiteSpace(s))
        .ToList();
}

The overlap buffer is a queue that holds sentences from the previous chunk. Each new chunk starts with these overlapping sentences, then adds new content. For most applications, I recommend 10-20% overlap. For 200-word chunks, that's 20-40 words of overlap. For sentence-based chunking, 1-2 sentences usually works well. More overlap increases redundancy and storage costs but improves retrieval recall.

Chunking with Metadata

Adding metadata to chunks is a game-changer for retrieval quality in production RAG systems. Instead of just storing the chunk text, you attach information like source document ID, page number, section title, or creation date. When users query your system, you can filter results by this metadata, dramatically improving relevance.

Metadata helps in several ways. First, you can implement multi-stage retrieval where you first filter by metadata (e.g., "only search documents from 2024") then perform semantic search within that subset. Second, metadata provides context to users about where information came from. Third, you can handle document updates by identifying and replacing chunks from specific source documents.

Here's a C# implementation that creates chunks with rich metadata:

public record TextChunk(
    string Content,
    string SourceDocumentId,
    int ChunkIndex,
    string? SectionTitle = null,
    int? PageNumber = null,
    DateTime? CreatedDate = null,
    Dictionary<string, string>? CustomMetadata = null);

public static class ChunkerWithMetadata
{
    public static IEnumerable<TextChunk> ChunkDocumentWithMetadata(
        string documentId,
        string fullText,
        int chunkSize = 200,
        int overlap = 20,
        Dictionary<string, string>? customMetadata = null)
    {
        var chunks = TextChunker.ChunkByWords(fullText, chunkSize, overlap).ToList();
        
        return chunks.Select((chunk, index) => new TextChunk(
            Content: chunk,
            SourceDocumentId: documentId,
            ChunkIndex: index,
            SectionTitle: ExtractSectionTitle(fullText, chunk),
            CreatedDate: DateTime.UtcNow,
            CustomMetadata: customMetadata));
    }
    
    private static string? ExtractSectionTitle(string fullText, string chunk)
    {
        // Find the last markdown header before this chunk
        var chunkStart = fullText.IndexOf(chunk, StringComparison.Ordinal);
        if (chunkStart == -1) return null;
        
        var textBefore = fullText.Substring(0, chunkStart);
        var lines = textBefore.Split('\n');
        
        // Look backwards for a markdown header
        for (int i = lines.Length - 1; i >= 0; i--)
        {
            if (lines[i].TrimStart().StartsWith('#'))
            {
                return lines[i].TrimStart('#').Trim();
            }
        }
        
        return null;
    }
}

This implementation wraps chunks in a record that includes source tracking and metadata fields. The ExtractSectionTitle method looks backwards from the chunk position to find the most recent markdown header, automatically tagging each chunk with its section. When storing these chunks in your Semantic Kernel vector store, the metadata becomes searchable and filterable, making your RAG system much more powerful.
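Here is an illustrative end-to-end call against the ChunkerWithMetadata above, followed by the kind of metadata pre-filter you would run before any vector search (the document ID and tenant key are made up):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var markdown = "# Setup\nInstall the SDK and configure your API key.\n\n" +
               "# Usage\nCall the chunker before embedding your documents.";

var chunks = ChunkerWithMetadata.ChunkDocumentWithMetadata(
    documentId: "doc-001",
    fullText: markdown,
    chunkSize: 10,
    overlap: 0,
    customMetadata: new Dictionary<string, string> { ["tenant"] = "acme" }).ToList();

// Metadata-first retrieval: narrow the candidate set before semantic search.
var candidates = chunks.Where(c =>
    c.SourceDocumentId == "doc-001" &&
    c.CustomMetadata?["tenant"] == "acme").ToList();

Console.WriteLine(candidates.Count); // every chunk matches here: 2
```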

Choosing the Right Strategy

Selecting the right chunking strategy depends on your document types, retrieval requirements, and performance constraints. There's no universal best approach, but there are clear patterns for what works well in different scenarios.

For technical documentation with clear structure, paragraph or recursive character splitting works best. The documents already have logical organization, so respect those boundaries. For conversational text, chat logs, or unstructured data, sentence-boundary chunking preserves meaning without relying on paragraph structure. For simple prototypes or high-volume scenarios where processing speed matters, fixed-size chunking with overlap gets you started quickly.

Semantic chunking is worth the overhead for high-value documents where retrieval quality is critical -- think legal contracts, medical records, or research papers. The API costs and processing time pay off in dramatically better retrieval accuracy. For general web content, articles, or blog posts (like the content in my complete Semantic Kernel guide), recursive character splitting or paragraph chunking typically provides the best balance.

Here's a decision matrix to guide your choice:

| Strategy | Best For | Typical Chunk Size | Overlap Recommendation |
|---|---|---|---|
| Fixed-Size | Prototypes, uniform text | 150-250 words | 10-20% |
| Sentence-Boundary | Conversational text, Q&A | 3-7 sentences | 1-2 sentences |
| Paragraph | Blog posts, articles, docs | 200-400 words | 20-50 words |
| Recursive Character | General purpose, mixed content | 500-1500 characters | 100-200 characters |
| Semantic | High-value complex documents | Variable (topic-based) | Not applicable |

In my experience, recursive character splitting is the best starting point for most applications. It's sophisticated enough to produce quality chunks without the complexity of semantic chunking. You can always refine your strategy based on retrieval performance metrics.

FAQ

What's the ideal chunk size for RAG with Semantic Kernel?

There's no universal ideal, but I typically start with 150-250 words or 500-1000 characters. The sweet spot depends on your embedding model's capabilities and your retrieval precision needs. Smaller chunks (100-150 words) give more precise retrieval but might lose context. Larger chunks (300-500 words) preserve more context but reduce precision. Test with your actual documents and queries to find what works best. Most embedding models perform well with chunks under 512 tokens, which translates to roughly 350-400 words for English text.

Should I use async/await when chunking documents in C#?

For chunking itself, usually not. The string manipulation operations are synchronous and fast. However, when implementing semantic chunking that requires embedding generation, you absolutely need async/await in C# to handle the API calls efficiently. For high-volume scenarios where you're chunking hundreds of documents, consider using Task.WhenAll or Parallel.ForEachAsync to process multiple documents concurrently. The chunking operations themselves run synchronously, but you can parallelize across documents.
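A minimal sketch of that pattern, reusing the TextChunker from earlier (the document dictionary is illustrative, and the awaited placeholder stands in for real embedding or vector-store calls):

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

var documents = new Dictionary<string, string>
{
    ["doc-1"] = "First document text for the pipeline.",
    ["doc-2"] = "Second document text for the pipeline."
};

var results = new ConcurrentDictionary<string, List<string>>();

await Parallel.ForEachAsync(
    documents,
    new ParallelOptions { MaxDegreeOfParallelism = 4 },
    async (doc, ct) =>
    {
        // Chunking itself is synchronous and fast...
        var chunks = TextChunker.ChunkByWords(doc.Value, chunkSize: 200, overlap: 20).ToList();
        results[doc.Key] = chunks;

        // ...only the downstream I/O (embedding, upserting) needs await.
        await Task.CompletedTask;
    });

Console.WriteLine(results.Count); // 2
```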

How do I handle code snippets or structured data when chunking?

Code and structured data require special handling because splitting mid-function or mid-JSON object breaks meaning completely. I use a two-stage approach: first, identify code blocks or structured sections using delimiters (like markdown code fences or JSON braces), then apply chunking only to prose sections. For code-heavy documentation, consider treating each code block as its own atomic chunk that never gets split. You can also use language-specific parsers to chunk code at function or class boundaries rather than arbitrary character positions.
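One way to sketch the two-stage approach for markdown: isolate fenced code blocks so they are never split, then chunk only the prose between them. The helper name and sizes are illustrative, and backtick runs are built programmatically so the sample stays readable here; it reuses the TextChunker from earlier.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

var fence = new string('`', 3);
var fencePattern = new Regex(Regex.Escape(fence) + @"[\s\S]*?" + Regex.Escape(fence));

IEnumerable<string> ChunkMarkdownAware(string text, int proseChunkWords = 200)
{
    int last = 0;
    foreach (Match m in fencePattern.Matches(text))
    {
        // Chunk only the prose before this code block.
        foreach (var c in TextChunker.ChunkByWords(text[last..m.Index], proseChunkWords, 0))
            yield return c;

        // Emit the code block as one atomic chunk that is never split.
        yield return m.Value;
        last = m.Index + m.Length;
    }
    foreach (var c in TextChunker.ChunkByWords(text[last..], proseChunkWords, 0))
        yield return c;
}

var doc = $"Intro prose here.\n\n{fence}\nvar x = 1;\n{fence}\n\nMore prose after.";
var chunks = ChunkMarkdownAware(doc).ToList();
// chunks: [intro prose, the fenced block intact, trailing prose]
```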

Can I combine multiple chunking strategies?

Absolutely, and this often produces the best results. A common approach is to use recursive character splitting as your base strategy, then apply semantic analysis to merge or split chunks that fall above or below quality thresholds. Another pattern is to use different strategies for different document types -- paragraph chunking for articles, sentence-boundary for Q&A content, and semantic chunking for complex documents. The key is maintaining consistent metadata so your retrieval system can handle chunks from different strategies uniformly.

Conclusion

Chunking strategies for RAG with Semantic Kernel in C# are the foundation of high-quality retrieval systems. The right strategy transforms your RAG pipeline from a mediocre document search into a system that truly understands and retrieves the most relevant information. I've walked you through five core strategies -- fixed-size, sentence-boundary, paragraph, recursive character, and semantic chunking -- each with complete C# implementations and clear guidance on when to use them.

Start with recursive character splitting for general-purpose applications, add 10-20% overlap for context continuity, and enrich your chunks with metadata for powerful filtering capabilities. As your requirements grow, you can evolve toward semantic chunking for high-value documents or combine multiple strategies for different content types. The key insight is that chunking isn't a one-time decision. It's an ongoing optimization process that directly impacts your retrieval quality, and getting it right early saves you significant rework later.

The code examples I've provided are production-ready starting points. Test them with your actual documents, measure retrieval quality, and iterate. Chunking might seem like a technical implementation detail, but it's one of the most impactful decisions you'll make in your RAG architecture.
