Large language models are incredibly powerful, but they have a fundamental limitation -- they only know what they were trained on. When you need an LLM to answer questions about your proprietary data, recent events, or domain-specific knowledge, you run into the hallucination problem: the model may confidently make up answers rather than admit it doesn't know. This is where RAG implementations with Semantic Kernel in C# come in. Retrieval-Augmented Generation addresses this by fetching relevant context from your data before generating responses, grounding the AI's answers in real information rather than letting it fabricate details.
I've been working with Semantic Kernel for months now, and I can tell you that implementing RAG with Semantic Kernel in C# has become remarkably straightforward. The framework provides excellent abstractions for vector stores, embeddings, and semantic search that make building production-ready RAG systems accessible to any .NET developer. In this complete guide, I'll walk you through everything you need to know about RAG with Semantic Kernel in C# development, from the conceptual foundations to working code examples you can run immediately.
What Is RAG and Why Does It Matter?
Retrieval-Augmented Generation is a pattern that enhances LLM responses by retrieving relevant information from external sources before generating an answer. Instead of relying solely on the model's training data, RAG systems first search for pertinent context, then include that context in the prompt to the LLM. This can significantly improve accuracy and reduce hallucination risk -- though hallucinations are not fully eliminated, as models may still misinterpret or ignore retrieved content (Gao et al., 2023 RAG survey).
The RAG pattern works in several steps. First, your knowledge base is converted into embeddings and stored in a vector database. When a user asks a question, that question is also converted into an embedding. The system then performs a similarity search to find the most relevant documents or chunks. Those retrieved documents are injected into the prompt as context, and finally the LLM generates a response grounded in that retrieved information. This process is called "grounding" because the AI's response is anchored to real data rather than floating in the probabilistic space of its training.
The benefits are substantial. RAG systems can work with up-to-date information without retraining the model. They can cite sources and provide transparency about where information comes from. They can significantly reduce hallucination risk -- not eliminate it, but meaningfully lower it -- because the model has concrete context to work with rather than fabricating from training patterns. And they let you build AI applications that understand your specific domain knowledge without the cost and complexity of fine-tuning models. For .NET developers, Semantic Kernel makes implementing RAG more accessible with its clean abstractions and wide range of vector store connectors.
Vector Stores in Semantic Kernel for RAG
Note: Vector Store functionality in Semantic Kernel is currently in preview. APIs may change in future releases.
A vector store is a specialized database designed to efficiently store and search high-dimensional vectors. When we talk about RAG with Semantic Kernel in C#, we're essentially talking about storing document embeddings in a vector store and then querying that store to find semantically similar content. The magic of vector stores is their ability to perform similarity searches -- you provide a query embedding, and the store returns the most similar vectors based on distance metrics like cosine similarity.
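To make the distance metric concrete, here's a minimal cosine similarity implementation in plain C#. This is an illustrative sketch with toy three-dimensional vectors -- production vector stores compute similarity internally, typically with approximate nearest neighbor indexes rather than brute-force comparison:

```csharp
using System;

// Cosine similarity: dot(a, b) / (|a| * |b|). Returns 1.0 for identical
// directions, 0.0 for orthogonal vectors, -1.0 for opposite directions.
static double CosineSimilarity(float[] a, float[] b)
{
    if (a.Length != b.Length)
        throw new ArgumentException("Vectors must have the same dimensionality.");
    double dot = 0, magA = 0, magB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        magA += a[i] * a[i];
        magB += b[i] * b[i];
    }
    return dot / (Math.Sqrt(magA) * Math.Sqrt(magB));
}

// Toy 3-dimensional "embeddings" (real embeddings have hundreds or
// thousands of dimensions, but the math is identical)
var car = new float[] { 0.9f, 0.1f, 0.0f };
var automobile = new float[] { 0.8f, 0.2f, 0.1f };
var banana = new float[] { 0.0f, 0.1f, 0.9f };

Console.WriteLine($"car vs automobile: {CosineSimilarity(car, automobile):F3}"); // 0.984
Console.WriteLine($"car vs banana:     {CosineSimilarity(car, banana):F3}");     // 0.012
```

Higher values mean more similar. You never compute this by hand in a real RAG system; the vector store does it across millions of vectors for you.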
Semantic Kernel provides the IVectorStore abstraction that gives you a consistent API regardless of which vector database you're using behind the scenes. This is powerful because you can develop locally with an in-memory store, then switch to a production-grade solution like Azure AI Search or Qdrant by changing just a few lines of configuration code. The abstraction handles the complexity of different vector database APIs and gives you a unified .NET experience.
The framework ships with connectors for a wide range of vector stores. The InMemoryVectorStore is perfect for development and testing. Azure AI Search provides enterprise-grade vector search integrated with Azure's ecosystem. Qdrant is an open-source option that's popular for self-hosted scenarios. You'll also find support for Redis, Pinecone, Weaviate, and Chroma. Each has different strengths, but the beauty of Semantic Kernel's abstraction is that your core RAG code remains the same regardless of which you choose. You're investing in learning the pattern, not vendor-specific APIs.
Text Embeddings: Converting Text to Vectors
Before we can store anything in a vector store, we need embeddings. An embedding is a numerical representation of text as a high-dimensional vector, typically containing hundreds or thousands of floating-point numbers. These vectors capture semantic meaning -- texts with similar meanings will have similar vectors. This is what enables semantic search, where "automobile" and "car" are understood to be related even though they share no common letters.
Semantic Kernel provides the ITextEmbeddingGenerationService interface for generating embeddings. You register an embedding service when building your kernel, just like you would register a chat completion service. OpenAI's text-embedding-ada-002 model is a popular choice that produces 1536-dimensional embeddings. Azure OpenAI offers the same models through their service. The framework abstracts these providers so switching between them requires minimal code changes.
Here's how you generate embeddings with Semantic Kernel in C#:
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.Embeddings;

var builder = Kernel.CreateBuilder();
builder.AddOpenAITextEmbeddingGeneration(
    "text-embedding-ada-002",
    Environment.GetEnvironmentVariable("OPENAI_API_KEY")!);
var kernel = builder.Build();

var embeddingService = kernel.GetRequiredService<ITextEmbeddingGenerationService>();
var embeddings = await embeddingService.GenerateEmbeddingsAsync(
    ["Semantic Kernel simplifies AI development", "RAG improves LLM accuracy"]);

Console.WriteLine($"Generated {embeddings.Count} embeddings, each with {embeddings[0].Length} dimensions");
This code registers the OpenAI embedding service, then generates embeddings for two text strings. The GenerateEmbeddingsAsync method accepts a collection of strings and returns a collection of embeddings. Each embedding is a ReadOnlyMemory<float> representing the vector. You can also use GenerateEmbeddingAsync (singular) if you're embedding just one text at a time. These embeddings are what you'll store in your vector database to enable semantic search later.
InMemoryVectorStore: RAG Without External Infrastructure
The InMemoryVectorStore is my go-to for learning RAG Semantic Kernel C# patterns and for writing tests. It implements the full IVectorStore interface but keeps everything in memory, so there's no external infrastructure to set up. This is perfect when you're developing locally or writing integration tests. You get the full RAG experience without Docker containers, connection strings, or cloud accounts.
To use the in-memory vector store, you need to define a record type that represents your stored documents. This record must have a property decorated with [VectorStoreKey] for the unique identifier, [VectorStoreData] for searchable text content, and [VectorStoreVector] for the embedding. The vector attribute specifies the dimensionality, which must match your embedding model's output dimension.
Here's a complete end-to-end RAG example using the in-memory vector store:
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.VectorData;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.Connectors.InMemory;
using Microsoft.SemanticKernel.Embeddings;

var builder = Kernel.CreateBuilder();
builder.AddOpenAIChatCompletion("gpt-4o", Environment.GetEnvironmentVariable("OPENAI_API_KEY")!);
builder.AddOpenAITextEmbeddingGeneration("text-embedding-ada-002", Environment.GetEnvironmentVariable("OPENAI_API_KEY")!);
builder.Services.AddSingleton<IVectorStore, InMemoryVectorStore>();
var kernel = builder.Build();

// Populate the vector store
var vectorStore = kernel.Services.GetRequiredService<IVectorStore>();
var collection = vectorStore.GetCollection<string, DocumentChunk>("docs");
await collection.CreateCollectionIfNotExistsAsync();

var embeddingService = kernel.GetRequiredService<ITextEmbeddingGenerationService>();
var docs = new[]
{
    "Semantic Kernel is a lightweight SDK for AI orchestration.",
    "RAG stands for Retrieval-Augmented Generation.",
    "Vector stores enable semantic similarity search."
};

foreach (var (doc, i) in docs.Select((d, i) => (d, i)))
{
    var embedding = await embeddingService.GenerateEmbeddingAsync(doc);
    await collection.UpsertAsync(new DocumentChunk($"doc-{i}", doc, embedding));
}

// Query the vector store
var queryEmbedding = await embeddingService.GenerateEmbeddingAsync("What is RAG?");
var results = await collection.SearchAsync(queryEmbedding, top: 2);
await foreach (var result in results.Results)
{
    Console.WriteLine($"Found: {result.Record.Content} (score: {result.Score:F3})");
}

// Type declarations must come after top-level statements in the same file
public record DocumentChunk(
    [property: VectorStoreKey] string Id,
    [property: VectorStoreData] string Content,
    [property: VectorStoreVector(1536)] ReadOnlyMemory<float> Embedding);
This example sets up the complete RAG pipeline. We register chat completion and embedding services, then add the in-memory vector store. We create a collection, populate it with three documents (each embedded and stored), then perform a semantic search for "What is RAG?" The SearchAsync method finds the most similar documents based on embedding similarity. The results include both the document content and a similarity score. In a production scenario, you'd take these retrieved documents and include them in your prompt to the LLM, but this demonstrates the core retrieval mechanism.
Azure AI Search Integration
When you're ready for production, Azure AI Search provides a robust, scalable vector store with enterprise features. It combines traditional full-text search with vector search capabilities, letting you leverage both keyword matching and semantic similarity. Azure AI Search handles indexing, scaling, and availability, so you can focus on your application logic rather than database operations.
Integrating Azure AI Search with Semantic Kernel requires the Microsoft.SemanticKernel.Connectors.AzureAISearch NuGet package. You'll need an Azure AI Search service endpoint and API key, which you can obtain from the Azure portal. The connector handles all the complexity of the Azure AI Search REST API, including index creation, document ingestion, and vector queries.
The setup pattern is similar to the in-memory store but with Azure-specific configuration:
using Azure;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.Connectors.AzureAISearch;

var builder = Kernel.CreateBuilder();
builder.AddAzureOpenAIChatCompletion(
    "your-deployment-name",
    "your-endpoint",
    "your-api-key");
builder.AddAzureOpenAITextEmbeddingGeneration(
    "your-embedding-deployment",
    "your-endpoint",
    "your-api-key");
builder.Services.AddAzureAISearchVectorStore(
    new Uri("https://your-search-service.search.windows.net"),
    new AzureKeyCredential("your-search-api-key"));
var kernel = builder.Build();
Azure AI Search automatically creates indexes with the appropriate schema based on your record type attributes. You can customize index behavior with additional attributes that control things like filtering, sorting, and faceting. The vector search uses HNSW (Hierarchical Navigable Small World) graphs for efficient approximate nearest neighbor search, which scales well to millions of documents. When you need features like geo-spatial filtering, multi-language support, or integration with other Azure services, Azure AI Search is an excellent choice for RAG Semantic Kernel C# applications.
Qdrant Vector Store
Qdrant is an open-source vector database that's gained significant traction for RAG applications. It's written in Rust for performance and offers a clean REST API plus client libraries for multiple languages. For .NET developers, the advantage of Qdrant is that you can run it locally in Docker during development, then deploy the same solution to production using Qdrant Cloud or self-hosted instances.
Getting started with Qdrant locally is straightforward with Docker:
docker run -p 6333:6333 -p 6334:6334 qdrant/qdrant
This starts Qdrant with its REST API and web UI on port 6333 and its gRPC API on port 6334, which the Semantic Kernel connector uses. The web UI is handy for inspecting collections and vectors. For Semantic Kernel integration, you'll need the Microsoft.SemanticKernel.Connectors.Qdrant NuGet package. The connector provides the same IVectorStore interface, so your RAG code remains consistent.
Here's how you configure Qdrant with Semantic Kernel:
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.Connectors.Qdrant;

var builder = Kernel.CreateBuilder();
builder.AddOpenAIChatCompletion("gpt-4o", Environment.GetEnvironmentVariable("OPENAI_API_KEY")!);
builder.AddOpenAITextEmbeddingGeneration("text-embedding-ada-002", Environment.GetEnvironmentVariable("OPENAI_API_KEY")!);

// The connector communicates over Qdrant's gRPC API (port 6334 by default)
builder.Services.AddQdrantVectorStore("localhost");
var kernel = builder.Build();
Once configured, you use the vector store exactly as you would with the in-memory version. The abstraction handles all the communication with Qdrant, payload serialization, and vector search operations. Qdrant excels at handling large-scale vector collections with features like payload filtering (searching within subsets of your data), multiple vector fields per document, and advanced distance metrics. If you need control over your infrastructure or want an open-source solution, Qdrant is an excellent choice for production RAG systems.
Building a RAG Pipeline with Semantic Kernel in C#: End to End
Now let's connect all the pieces into a complete RAG with Semantic Kernel in C# pipeline that actually answers questions using retrieved context. The full pipeline involves several steps: chunking your source documents, generating embeddings for each chunk, storing those embeddings in a vector store, then at query time retrieving relevant chunks and using them to augment the prompt to your LLM.
The key insight is that we're building a two-stage system. The first stage is the retrieval layer, which finds relevant information using semantic similarity. The second stage is the generation layer, which uses an LLM to synthesize an answer based on the retrieved context. By separating these concerns, we get the best of both worlds: the precision of semantic search combined with the natural language capabilities of large language models.
Here's a complete working example that demonstrates the full RAG pipeline:
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.VectorData;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;
using Microsoft.SemanticKernel.Connectors.InMemory;
using Microsoft.SemanticKernel.Embeddings;

// Build kernel with all services
var builder = Kernel.CreateBuilder();
builder.AddOpenAIChatCompletion("gpt-4o", Environment.GetEnvironmentVariable("OPENAI_API_KEY")!);
builder.AddOpenAITextEmbeddingGeneration("text-embedding-ada-002", Environment.GetEnvironmentVariable("OPENAI_API_KEY")!);
builder.Services.AddSingleton<IVectorStore, InMemoryVectorStore>();
var kernel = builder.Build();

// Step 1: Populate knowledge base
var vectorStore = kernel.Services.GetRequiredService<IVectorStore>();
var collection = vectorStore.GetCollection<string, KnowledgeChunk>("knowledge");
await collection.CreateCollectionIfNotExistsAsync();

var embeddingService = kernel.GetRequiredService<ITextEmbeddingGenerationService>();
var knowledge = new[]
{
    "Semantic Kernel is an open-source SDK that lets you easily build AI agents and integrate AI into your existing applications.",
    "RAG improves LLM accuracy by retrieving relevant context from a knowledge base before generating responses.",
    "Vector databases store embeddings and enable semantic similarity search using distance metrics like cosine similarity.",
    "Text chunking is important because LLMs have token limits and smaller chunks often produce more precise retrieval results."
};

for (int i = 0; i < knowledge.Length; i++)
{
    var embedding = await embeddingService.GenerateEmbeddingAsync(knowledge[i]);
    await collection.UpsertAsync(new KnowledgeChunk($"chunk-{i}", knowledge[i], embedding));
}

// Step 2: Query pipeline - retrieve and generate
var userQuestion = "How does RAG improve accuracy?";
var questionEmbedding = await embeddingService.GenerateEmbeddingAsync(userQuestion);
var searchResults = await collection.SearchAsync(questionEmbedding, top: 2);

var retrievedContext = new List<string>();
await foreach (var result in searchResults.Results)
{
    retrievedContext.Add(result.Record.Text);
}

// Step 3: Augment prompt with retrieved context
var chatService = kernel.GetRequiredService<IChatCompletionService>();
var chatHistory = new ChatHistory();
chatHistory.AddSystemMessage("You are a helpful assistant. Answer the user's question based on the provided context.");
chatHistory.AddUserMessage(
    $"Context:\n{string.Join("\n", retrievedContext)}\n\nQuestion: {userQuestion}");

var response = await chatService.GetChatMessageContentAsync(chatHistory);
Console.WriteLine($"Answer: {response.Content}");

// Type declarations must come after top-level statements in the same file
public record KnowledgeChunk(
    [property: VectorStoreKey] string Id,
    [property: VectorStoreData] string Text,
    [property: VectorStoreVector(1536)] ReadOnlyMemory<float> Embedding);
This example demonstrates the complete RAG workflow. We populate a knowledge base with four facts about AI development. When a user asks "How does RAG improve accuracy?", we embed their question, search for the two most similar chunks, then construct a prompt that includes both the retrieved context and the user's question. The LLM generates an answer grounded in the retrieved information. This pattern scales to production -- you'd simply swap the in-memory store for Azure AI Search or Qdrant and potentially add thousands or millions of knowledge chunks.
Chunking Strategy Basics
Chunking is the process of breaking large documents into smaller pieces before embedding them. This is crucial for RAG systems because most embedding models have token limits (typically 8,192 tokens for text-embedding-ada-002), and more importantly, smaller chunks often lead to more precise retrieval. When you search for information about a specific topic, you want to retrieve just the relevant paragraphs, not entire documents that might also contain unrelated content.
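Exact token counts require a model-specific tokenizer, but a rough rule of thumb -- roughly four characters per token for English text, an approximation rather than an exact figure -- is often enough to flag chunks that risk exceeding the limit:

```csharp
using System;

// Rough token estimate: ~4 characters per token is a common heuristic for
// English text. For exact counts, use a real tokenizer library such as
// Microsoft.ML.Tokenizers.
static int EstimateTokens(string text) => text.Length / 4;

static bool FitsEmbeddingLimit(string chunk, int maxTokens = 8192) =>
    EstimateTokens(chunk) <= maxTokens;

var shortChunk = "Semantic Kernel simplifies AI development.";
Console.WriteLine($"~{EstimateTokens(shortChunk)} tokens, fits: {FitsEmbeddingLimit(shortChunk)}");
// ~10 tokens, fits: True
```

Use the heuristic only as a cheap guardrail; always leave headroom below the model's documented limit, since the estimate can be off for code, non-English text, or unusual formatting.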
There are several chunking strategies to consider. Sentence-based chunking splits text at sentence boundaries, preserving grammatical units. Paragraph-based chunking respects document structure. Fixed-size chunking creates chunks of approximately N tokens or characters, often with overlap between chunks to preserve context across boundaries. The overlap is important because relevant information might span chunk boundaries -- if you chunk without overlap, you might split a key concept in half.
Here's a simple implementation of word-based chunking with overlap:
public static IEnumerable<string> ChunkText(string text, int maxChunkSize = 500, int overlap = 50)
{
    var words = text.Split(' ', StringSplitOptions.RemoveEmptyEntries);
    var chunks = new List<string>();
    var current = new List<string>();
    var wordCount = 0;

    foreach (var word in words)
    {
        current.Add(word);
        wordCount++;
        if (wordCount >= maxChunkSize)
        {
            chunks.Add(string.Join(' ', current));
            // Apply overlap: keep last N words for context
            current = current.TakeLast(overlap).ToList();
            wordCount = current.Count;
        }
    }

    // Emit the remainder, but skip it if it contains only the overlap words
    // already emitted at the end of the previous chunk
    if (current.Count > 0 && (chunks.Count == 0 || current.Count > overlap))
        chunks.Add(string.Join(' ', current));

    return chunks;
}
This function splits text into chunks of approximately 500 words each with 50 words of overlap. Each chunk (except the first) begins with the last 50 words of the previous chunk, ensuring context continuity. In production systems, you might use more sophisticated chunking that respects markdown headers, code block boundaries, or uses semantic similarity to find natural breakpoints. Libraries like LangChain and semantic-text-splitter offer advanced chunking capabilities, but this simple approach works well for many scenarios.
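To see the overlap behavior concretely, here's the chunker exercised on synthetic text (the function is repeated as a local function so this snippet runs on its own):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Same word-based chunker as above, written as a local function
static IEnumerable<string> ChunkText(string text, int maxChunkSize = 500, int overlap = 50)
{
    var words = text.Split(' ', StringSplitOptions.RemoveEmptyEntries);
    var chunks = new List<string>();
    var current = new List<string>();
    var wordCount = 0;
    foreach (var word in words)
    {
        current.Add(word);
        wordCount++;
        if (wordCount >= maxChunkSize)
        {
            chunks.Add(string.Join(' ', current));
            current = current.TakeLast(overlap).ToList();
            wordCount = current.Count;
        }
    }
    if (current.Count > 0 && (chunks.Count == 0 || current.Count > overlap))
        chunks.Add(string.Join(' ', current));
    return chunks;
}

// 1,200 synthetic "words": w0 w1 ... w1199
var text = string.Join(' ', Enumerable.Range(0, 1200).Select(i => $"w{i}"));
var chunks = ChunkText(text).ToList();
Console.WriteLine($"Chunks: {chunks.Count}"); // Chunks: 3

// The last 50 words of chunk 0 reappear at the start of chunk 1
var tailOfFirst = string.Join(' ', chunks[0].Split(' ').TakeLast(50));
Console.WriteLine(chunks[1].StartsWith(tailOfFirst)); // True
```

Tracing it through: the first chunk covers words 0-499, the second 450-949, and the third 900-1199, so every boundary concept appears in two consecutive chunks.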
Advanced Considerations for Production RAG
As you move from prototype to production RAG Semantic Kernel C# systems, several additional factors come into play. Query performance matters when building RAG with Semantic Kernel in C# applications -- you'll want to monitor embedding generation time and vector search latency. Most vector stores offer approximate nearest neighbor search with tunable accuracy/speed tradeoffs. Understanding these tradeoffs helps you optimize for your specific requirements.
Metadata filtering is another critical feature for production systems. Often you want to restrict searches to specific document types, date ranges, or user permissions. Most production vector stores support filtering on metadata fields alongside vector similarity. You'd add metadata properties to your record type and mark them with appropriate attributes. This lets you combine semantic search with traditional filtering for powerful query capabilities.
Hybrid search combines vector search with traditional keyword search. Sometimes users search for specific terms like product names or error codes where exact matching is more appropriate than semantic similarity. Azure AI Search excels at hybrid scenarios with its unified API for both search types. You can weight the importance of keyword vs semantic matching to tune results for your domain. The Semantic Kernel TextSearch plugin, which I've covered in my plugins guide, provides higher-level abstractions for these scenarios.
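The intuition behind hybrid scoring can be illustrated with a toy weighted blend in plain C#. The weights here are made up for illustration, and Azure AI Search actually merges the two result sets with Reciprocal Rank Fusion rather than a simple weighted sum, but the sketch shows why an exact keyword hit can outrank a semantically closer chunk:

```csharp
using System;
using System.Linq;

// Toy hybrid score: weighted blend of a keyword score (fraction of query
// terms found in the document) and a pre-computed semantic similarity score.
static double HybridScore(string query, string document, double semanticScore,
    double keywordWeight = 0.4, double semanticWeight = 0.6)
{
    var terms = query.ToLowerInvariant().Split(' ', StringSplitOptions.RemoveEmptyEntries);
    var docLower = document.ToLowerInvariant();
    var matched = terms.Count(t => docLower.Contains(t));
    double keywordScore = terms.Length == 0 ? 0 : (double)matched / terms.Length;
    return keywordWeight * keywordScore + semanticWeight * semanticScore;
}

// An exact error-code match beats a chunk with a higher semantic score
Console.WriteLine(HybridScore("ERR-4042", "Troubleshooting error ERR-4042 in deployments", 0.55).ToString("F2")); // 0.73
Console.WriteLine(HybridScore("ERR-4042", "General guide to deployment failures", 0.70).ToString("F2"));          // 0.42
```

Tuning the two weights per domain is exactly the kind of knob the prose above describes; in a product-catalog search you might weight keywords higher, in a support knowledge base semantics.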
Monitoring and observability are essential. You should track metrics like retrieval precision (are the right chunks being found?), user satisfaction with generated answers, and system performance. Logging retrieved chunks alongside generated responses helps you debug issues where the LLM gives poor answers -- is it a retrieval problem or a generation problem? Async/await patterns are particularly important for keeping your RAG pipeline responsive under load.
Related Resources and Next Steps
This guide covered the fundamentals of RAG with Semantic Kernel in C#, but there's much more to explore. If you're new to Semantic Kernel, I recommend starting with my complete guide to Semantic Kernel, which covers the core concepts and architecture that underpin RAG implementations.
For working with AI tools more broadly, my guide on getting started with AI coding tools provides practical advice for integrating AI into your development workflow. And if you're concerned about AI reliability and hallucinations, check out my article on keeping AI from going off the rails, which discusses strategies for managing AI system behavior in production.
Frequently Asked Questions About RAG with Semantic Kernel in C#
What is RAG with Semantic Kernel in C#?
RAG (Retrieval-Augmented Generation) with Semantic Kernel in C# is a pattern that enhances LLM responses by retrieving relevant information from vector stores before generating answers. Semantic Kernel provides the abstractions and connectors to build RAG systems in .NET applications.
Which vector stores work with Semantic Kernel for RAG?
Semantic Kernel supports multiple vector stores including InMemoryVectorStore for development, Azure AI Search for production, Qdrant for open-source scenarios, plus Redis, Pinecone, Weaviate, and Chroma. The IVectorStore abstraction lets you switch between them with minimal code changes.
How do I generate embeddings in Semantic Kernel?
Use the ITextEmbeddingGenerationService interface registered through methods like AddOpenAITextEmbeddingGeneration(). The service converts text into high-dimensional vectors (embeddings) that enable semantic similarity search in your RAG pipeline.
What chunking strategy should I use for RAG?
Start with fixed-size chunking of 500-1000 words with 50-100 word overlap. This preserves context across boundaries. For production, consider semantic chunking that respects document structure like paragraphs, markdown headers, or code blocks.
Can I use RAG with Semantic Kernel in C# without cloud services?
Yes, the InMemoryVectorStore enables RAG with Semantic Kernel in C# without external infrastructure. Perfect for development and testing. You can also self-host Qdrant in Docker for production scenarios without cloud dependencies.
Conclusion and Next Steps
As you build your RAG with Semantic Kernel in C# systems, remember that the technology is rapidly evolving. New embedding models, vector stores, and chunking strategies emerge regularly. The abstractions provided by Semantic Kernel give you flexibility to adopt these improvements without rewriting your core application logic. Start simple with the in-memory vector store, validate your approach works, then graduate to production infrastructure as your requirements demand. RAG is a powerful pattern that's become essential for building AI applications that work with domain-specific knowledge, and Semantic Kernel makes it accessible to every .NET developer.
