Text Embeddings with Semantic Kernel in C#: A Practical Guide to ITextEmbeddingGenerationService

Text embeddings are numeric representations of text that capture semantic meaning, and if you're building AI applications with .NET, understanding how to generate and use them is essential. Whether you're implementing semantic search, building retrieval-augmented generation (RAG) systems, or creating recommendation engines, embeddings are the foundation that makes it all work. In this practical guide, I'll walk you through everything you need to know about generating text embeddings with Semantic Kernel in C# through the ITextEmbeddingGenerationService interface.

What Are Text Embeddings?

Text embeddings transform text into arrays of floating-point numbers called vectors. These vectors capture the semantic meaning of the text in a way that allows computers to understand similarity and relationships between different pieces of text. For example, the sentences "I love programming in C#" and "C# development is my favorite" would have embeddings that are mathematically close to each other, even though they use different words.

The magic of embeddings is that texts with similar meanings end up close together in vector space. This enables semantic search where you can find relevant documents based on meaning rather than just keyword matching. Instead of searching for exact word matches, you can find conceptually similar content. For instance, searching for "database optimization" might return results about "query performance tuning" because the embeddings recognize the semantic relationship.

Embeddings typically have hundreds or thousands of dimensions. OpenAI's text-embedding-3-small model produces 1,536-dimensional vectors by default, while text-embedding-3-large can generate up to 3,072 dimensions. Each dimension captures some aspect of the text's meaning, and together they create a rich representation that machine learning models can work with effectively.

ITextEmbeddingGenerationService: SK's Embedding Abstraction

The ITextEmbeddingGenerationService interface is Semantic Kernel's abstraction for generating text embeddings across different AI providers. This abstraction is one of the key benefits of using Semantic Kernel: you can write your embedding code once and switch between OpenAI, Azure OpenAI, or other providers without rewriting your application logic.

The interface provides two primary methods that you'll use constantly. First is GenerateEmbeddingAsync(), which takes a single text string and returns a ReadOnlyMemory<float> representing the embedding vector. Second is GenerateEmbeddingsAsync(), which accepts a list of strings and returns an IList<ReadOnlyMemory<float>>, allowing you to process multiple texts in a single API call for better efficiency.

Provider independence matters more than you might initially think. You might start development using OpenAI's API directly, but later need to switch to Azure OpenAI for enterprise compliance or data residency requirements. With Semantic Kernel's abstraction, this becomes a simple configuration change rather than a major refactoring effort. The same ITextEmbeddingGenerationService interface works regardless of which provider backs it, and your business logic remains completely unchanged.

Setting Up OpenAI Embeddings

Configuring OpenAI embeddings in Semantic Kernel is straightforward using the kernel builder pattern. You'll add the OpenAI text embedding generation service to your kernel, specify which embedding model to use, and provide your API key. I recommend storing your API key in environment variables or a secure configuration system rather than hardcoding it.

Here's how to set up OpenAI embeddings in your Semantic Kernel application:

using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.Embeddings;  // ITextEmbeddingGenerationService lives here

#pragma warning disable SKEXP0010  // embedding connectors are marked experimental in current SK releases

var builder = Kernel.CreateBuilder();
builder.AddOpenAITextEmbeddingGeneration(
    "text-embedding-3-small",  // 1536 dimensions by default
    Environment.GetEnvironmentVariable("OPENAI_API_KEY")!);
var kernel = builder.Build();

var embeddingService = kernel.GetRequiredService<ITextEmbeddingGenerationService>();

OpenAI offers several embedding models with different capabilities and price points. Here's a quick comparison of the main options:

  • text-embedding-ada-002: The older standard model producing 1,536-dimensional embeddings. It has been deprecated by OpenAI, so use it only when maintaining systems whose stored embeddings were generated with it.
  • text-embedding-3-small: The current default choice with 1,536 dimensions, offering better performance and lower cost than ada-002.
  • text-embedding-3-large: Premium option with up to 3,072 dimensions for complex semantic tasks requiring the highest quality.

I typically start with text-embedding-3-small for most applications because it offers an excellent balance of performance, cost, and quality. You can always upgrade to the large model later if you need that extra semantic understanding, and because you're using the ITextEmbeddingGenerationService abstraction, switching models is just a configuration change. Remember that you're working with dependency injection patterns here, so the service gets injected wherever you need it.

Setting Up Azure OpenAI Embeddings

If you're working in an enterprise environment or need data to stay within specific geographic regions, Azure OpenAI is often the better choice. The setup is similar to direct OpenAI but requires Azure-specific configuration including your endpoint URL and deployment name. Azure OpenAI gives you more control over data residency, network isolation, and integration with Azure's identity and access management systems.

Here's how to configure Azure OpenAI embeddings:

using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.Embeddings;

#pragma warning disable SKEXP0010  // embedding connectors are marked experimental in current SK releases

var builder = Kernel.CreateBuilder();
builder.AddAzureOpenAITextEmbeddingGeneration(
    deploymentName: "text-embedding-deployment",
    endpoint: "https://your-resource.openai.azure.com/",
    apiKey: Environment.GetEnvironmentVariable("AZURE_OPENAI_API_KEY")!);
var kernel = builder.Build();

var embeddingService = kernel.GetRequiredService<ITextEmbeddingGenerationService>();

The deployment name refers to your specific deployment in the Azure OpenAI Service, which you create through the Azure portal. Unlike OpenAI's direct API where you specify the model name, Azure uses deployment names that you define, giving you more control over versioning and scaling. Your endpoint is specific to your Azure OpenAI resource and follows the pattern shown above.

Choose Azure OpenAI over direct OpenAI when you need:

  • Enterprise SLA guarantees and compliance requirements
  • Specific data residency or geographic restrictions
  • Integration with existing Azure infrastructure (Key Vault, Monitor, Virtual Networks)
  • Private networking and enhanced security controls

The performance characteristics and embedding quality are identical to direct OpenAI since Azure OpenAI uses the same underlying models.

Generating a Single Embedding

Once you have the ITextEmbeddingGenerationService configured, generating an embedding for a single piece of text is remarkably simple. The GenerateEmbeddingAsync() method handles all the complexity of making the API call, processing the response, and returning the embedding vector. This is where the abstraction really shines because you don't need to worry about HTTP requests, error handling, or response parsing.

Here's a complete example of generating a single embedding:

using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.Embeddings;

#pragma warning disable SKEXP0010  // embedding connectors are marked experimental in current SK releases

var builder = Kernel.CreateBuilder();
builder.AddOpenAITextEmbeddingGeneration(
    "text-embedding-3-small",  // 1536 dimensions by default
    Environment.GetEnvironmentVariable("OPENAI_API_KEY")!);
var kernel = builder.Build();

var embeddingService = kernel.GetRequiredService<ITextEmbeddingGenerationService>();

var text = "Semantic Kernel makes it easy to build AI applications in .NET";
var embedding = await embeddingService.GenerateEmbeddingAsync(text);

Console.WriteLine($"Generated embedding with {embedding.Length} dimensions");
Console.WriteLine($"First 5 values: {string.Join(", ", embedding.Span[..5].ToArray().Select(v => v.ToString("F4")))}");

The return type ReadOnlyMemory<float> is important to understand. This type provides a memory-efficient view over the embedding data without copying it. You can access individual values through the .Span property, which gives you a ReadOnlySpan<float> for fast, allocation-free access to the vector values. The embedding dimensions depend on which model you're using, but text-embedding-3-small gives you 1,536 dimensions as shown.

Each float value in the embedding typically ranges between -1 and 1, though the exact range can vary by model. These values don't have human-interpretable meaning on their own, but together they encode the semantic content of your text. You'll use these embeddings primarily for comparison operations like calculating similarity, as I'll show you in the next sections. The async/await pattern is essential here because embedding generation involves network calls to the AI service.

Batch Embedding Generation

When you need to generate embeddings for multiple texts, using GenerateEmbeddingsAsync() is significantly more efficient than calling GenerateEmbeddingAsync() in a loop. Batch embedding generation sends all your texts to the API in a single request, reducing network overhead and often resulting in lower costs and faster processing. This is especially important when you're embedding entire document collections or processing user queries at scale.

Here's how to generate embeddings for multiple texts at once:

var texts = new[]
{
    "How to use dependency injection in .NET",
    "Async await patterns in C#",
    "Building REST APIs with ASP.NET Core",
    "Entity Framework Core performance tips"
};

var embeddings = await embeddingService.GenerateEmbeddingsAsync(texts);
Console.WriteLine($"Generated {embeddings.Count} embeddings in one batch");

The efficiency benefits of batch processing are substantial. Instead of four separate API calls with their associated latency and overhead, you make a single call that processes all four texts together. The API provider can optimize processing by handling multiple embeddings simultaneously, and you reduce the number of round trips between your application and the service.

Keep in mind that there are token limits for embedding requests. OpenAI's current embedding models cap each individual input at 8,191 tokens, and batch requests are subject to an overall per-request limit as well. If you're processing very large documents or many texts, you might need to split them into multiple batches. I usually process embeddings in batches of 10-50 items depending on the average text length, which provides good efficiency without hitting the limits.
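To keep batches within those limits, you can split a large text collection into fixed-size chunks before calling GenerateEmbeddingsAsync() on each chunk. Here's a minimal sketch; SplitIntoBatches is a hypothetical helper name, and it uses LINQ's Chunk() method (available in .NET 6+):

```csharp
using System;
using System.Linq;

// Hypothetical helper: split a large set of texts into fixed-size batches
// so each GenerateEmbeddingsAsync call stays within provider limits.
static string[][] SplitIntoBatches(string[] texts, int batchSize = 25)
{
    if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));
    return texts.Chunk(batchSize).ToArray();
}

var texts = Enumerable.Range(1, 120).Select(i => $"Document {i}").ToArray();
var batches = SplitIntoBatches(texts, batchSize: 25);

Console.WriteLine($"{batches.Length} batches");              // 120 / 25 -> 5 batches
Console.WriteLine($"Last batch size: {batches[^1].Length}"); // remainder -> 20 items
```

In a real pipeline you would then loop over the batches, await the embeddings for each one, and accumulate the results; splitting by item count is a simple proxy for the true token-based limits.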

Cosine Similarity: Comparing Embeddings

Cosine similarity is the standard metric for comparing the embeddings your application generates. This metric measures the cosine of the angle between two vectors, giving you a value between -1 and 1 that indicates how similar two pieces of text are semantically. A value of 1 means the embeddings point in the same direction, 0 means they're orthogonal and unrelated, and -1 means they're opposite in meaning, though negative values are rare with modern embedding models.

Understanding the typical similarity ranges helps you interpret results and set appropriate thresholds. Here's what different cosine similarity values typically indicate:

  • 0.8 to 1.0: Very similar or semantically equivalent texts
  • 0.6 to 0.8: Related content sharing themes or topics
  • 0.4 to 0.6: Some connection but increasingly unrelated
  • Below 0.4: Generally unrelated content

These thresholds vary by use case, so I recommend experimenting with your specific data to find what works best.
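As a quick illustration, the bands above can be expressed as a small helper. This is illustrative only; DescribeSimilarity is a hypothetical name and the cutoffs are the same heuristics listed above, which you should tune against your own labeled pairs:

```csharp
using System;

// Illustrative only: map a cosine similarity score to the rough bands above.
// The cutoffs are heuristics; tune them against your own labeled data.
static string DescribeSimilarity(double score) => score switch
{
    >= 0.8 => "very similar",
    >= 0.6 => "related",
    >= 0.4 => "weakly related",
    _      => "unrelated"
};

Console.WriteLine(DescribeSimilarity(0.92)); // very similar
Console.WriteLine(DescribeSimilarity(0.35)); // unrelated
```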

Here's a robust implementation of cosine similarity in C#:

public static double CosineSimilarity(ReadOnlyMemory<float> vectorA, ReadOnlyMemory<float> vectorB)
{
    if (vectorA.Length != vectorB.Length)
        throw new ArgumentException("Vectors must have the same dimensions");
    
    var spanA = vectorA.Span;
    var spanB = vectorB.Span;
    
    double dotProduct = 0, magnitudeA = 0, magnitudeB = 0;
    
    for (int i = 0; i < spanA.Length; i++)
    {
        dotProduct += spanA[i] * spanB[i];
        magnitudeA += spanA[i] * spanA[i];
        magnitudeB += spanB[i] * spanB[i];
    }
    
    double magnitude = Math.Sqrt(magnitudeA) * Math.Sqrt(magnitudeB);
    return magnitude == 0 ? 0 : dotProduct / magnitude;
}

This implementation uses spans for efficient memory access and calculates the dot product while simultaneously computing the magnitudes of both vectors. The final division gives you the normalized similarity score. I use double precision for the calculations to maintain accuracy across the large number of dimensions typical in embedding vectors. This function is essential for any semantic search or similarity matching you'll do with embeddings.
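A quick sanity check with toy vectors shows the expected behavior; the function is repeated here so the snippet compiles on its own, and the three-dimensional vectors are made up for illustration:

```csharp
using System;

// Same cosine similarity implementation, restated so this snippet is self-contained.
static double CosineSimilarity(ReadOnlyMemory<float> a, ReadOnlyMemory<float> b)
{
    var spanA = a.Span;
    var spanB = b.Span;
    double dot = 0, magA = 0, magB = 0;
    for (int i = 0; i < spanA.Length; i++)
    {
        dot += spanA[i] * spanB[i];
        magA += spanA[i] * spanA[i];
        magB += spanB[i] * spanB[i];
    }
    double mag = Math.Sqrt(magA) * Math.Sqrt(magB);
    return mag == 0 ? 0 : dot / mag;
}

ReadOnlyMemory<float> a = new float[] { 1f, 0f, 0f };
ReadOnlyMemory<float> same = new float[] { 2f, 0f, 0f };       // same direction, different length
ReadOnlyMemory<float> orthogonal = new float[] { 0f, 1f, 0f };

Console.WriteLine(CosineSimilarity(a, same));       // 1 (cosine ignores magnitude)
Console.WriteLine(CosineSimilarity(a, orthogonal)); // 0
```

Note that vectors pointing in the same direction score 1 regardless of their lengths, which is exactly why cosine similarity works well for comparing embeddings.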

Building a Simple Semantic Search Without a Vector DB

You don't always need a specialized vector database to work with embeddings effectively. For smaller collections of documents or prototyping, an in-memory semantic search implementation works perfectly well and helps you understand the fundamentals before introducing additional infrastructure complexity. This approach is ideal when you have a few hundred to a few thousand documents and don't need advanced features like distributed search or real-time updates at massive scale.

Here's a complete implementation of a simple in-memory semantic search:

using Microsoft.SemanticKernel.Embeddings;

public class SimpleSemanticSearch
{
    private readonly ITextEmbeddingGenerationService _embeddingService;
    private readonly List<(string Text, ReadOnlyMemory<float> Embedding)> _documents = [];
    
    public SimpleSemanticSearch(ITextEmbeddingGenerationService embeddingService)
    {
        _embeddingService = embeddingService;
    }
    
    public async Task AddDocumentAsync(string text)
    {
        var embedding = await _embeddingService.GenerateEmbeddingAsync(text);
        _documents.Add((text, embedding));
    }
    
    public async Task<IEnumerable<(string Text, double Score)>> SearchAsync(string query, int topK = 5)
    {
        var queryEmbedding = await _embeddingService.GenerateEmbeddingAsync(query);
        
        return _documents
            .Select(doc => (doc.Text, Score: CosineSimilarity(queryEmbedding, doc.Embedding)))
            .OrderByDescending(x => x.Score)
            .Take(topK);
    }
    
    private static double CosineSimilarity(ReadOnlyMemory<float> a, ReadOnlyMemory<float> b)
    {
        var spanA = a.Span;
        var spanB = b.Span;
        double dot = 0, magA = 0, magB = 0;
        for (int i = 0; i < spanA.Length; i++)
        {
            dot += spanA[i] * spanB[i];
            magA += spanA[i] * spanA[i];
            magB += spanB[i] * spanB[i];
        }
        double mag = Math.Sqrt(magA) * Math.Sqrt(magB);
        return mag == 0 ? 0 : dot / mag;
    }
}

This class demonstrates the core pattern of semantic search: embedding documents at indexing time, then embedding queries at search time and comparing them. The AddDocumentAsync() method generates and stores embeddings for your documents, while SearchAsync() generates an embedding for the query and finds the most similar documents using cosine similarity. The topK parameter lets you control how many results to return.

This approach is sufficient when your document collection fits comfortably in memory and you don't need sub-millisecond search times. For production RAG applications with larger collections, you'll want to use a proper vector store, but this implementation is perfect for learning, prototyping, and small-scale applications. I've used this pattern successfully for internal tools and demos where simplicity matters more than scalability.

Embedding Dimensions and Model Selection

The number of dimensions in your embeddings affects both storage requirements and semantic quality. Higher-dimensional embeddings can capture more nuanced semantic relationships but require more storage space and slightly longer processing times for similarity calculations. Understanding this trade-off helps you choose the right embedding model for your specific use case and constraints.

OpenAI's text-embedding-3-small produces 1,536 dimensions by default and offers excellent quality for most applications. This model represents a sweet spot of performance, cost, and semantic understanding. For comparison, the older text-embedding-ada-002 (now deprecated by OpenAI) also produces 1,536 dimensions but text-embedding-3-small is more efficient and performs better on most benchmarks. If you need the absolute best semantic understanding, text-embedding-3-large with up to 3,072 dimensions provides measurably better results on complex semantic tasks.

Storage considerations become important at scale. Each 1,536-dimensional embedding requires about 6KB of storage as raw floats (1,536 dimensions × 4 bytes per float). If you're storing embeddings for millions of documents, this adds up quickly. A 3,072-dimensional embedding doubles that storage requirement to 12KB per document. For a collection of one million documents, you're looking at 6GB versus 12GB of raw embedding storage before any database overhead.
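The arithmetic behind those figures is simple enough to sketch directly (documents × dimensions × 4 bytes per float, in decimal gigabytes); EmbeddingStorageGb is a hypothetical helper name:

```csharp
using System;

// Back-of-envelope storage math from the paragraph above:
// bytes = documents x dimensions x sizeof(float), reported in decimal GB.
static double EmbeddingStorageGb(long documents, int dimensions) =>
    documents * (long)dimensions * sizeof(float) / 1e9;

Console.WriteLine($"{EmbeddingStorageGb(1_000_000, 1536)} GB"); // ~6 GB for a million 1536-dim embeddings
Console.WriteLine($"{EmbeddingStorageGb(1_000_000, 3072)} GB"); // ~12 GB at 3072 dimensions
```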

The performance difference between models matters most for nuanced semantic tasks. If you're building a basic FAQ matching system or simple content recommendation, text-embedding-3-small is probably more than sufficient. But if you're working with technical documents that require understanding subtle distinctions, legal contracts where precision matters, or multilingual content with complex semantic relationships, the investment in text-embedding-3-large might be worthwhile. I recommend starting with the smaller model and upgrading only if you identify specific quality issues in your results.

FAQ

Here are answers to the most common questions I get about working with text embeddings in Semantic Kernel. They'll help you avoid common pitfalls and make better implementation decisions.

What's the difference between text embeddings and word embeddings?

Word embeddings represent individual words as vectors, while text embeddings represent entire phrases, sentences, or documents. Modern text embeddings like those from OpenAI are contextual, meaning the same word gets different embeddings depending on how it's used in context. The embeddings Semantic Kernel generates work at the text-chunk level, which is much more useful for semantic search and RAG applications.

Can I use embeddings from different models together?

No, you should never compare embeddings from different models directly. Each embedding model creates its own vector space with its own dimensions and scaling. An embedding from text-embedding-3-small can only be meaningfully compared with other embeddings from text-embedding-3-small. If you need to switch models, you must regenerate all your embeddings using the new model.

How do I know if my similarity threshold is correct?

The best approach is to create a test set with known similar and dissimilar document pairs, then experiment with different thresholds to find what works for your data. Start with 0.7 as a baseline threshold and adjust up if you're getting too many false positives or down if you're missing relevant results. The optimal threshold varies significantly based on your domain and use case.

Do embeddings work for languages other than English?

Yes, modern embedding models like OpenAI's text-embedding-3 series support multiple languages. The quality varies by language, with more common languages generally having better representation. For specialized multilingual applications, you might want to look at models specifically trained for multilingual semantic understanding, but OpenAI's embeddings work reasonably well across many languages.

Should I normalize my embeddings before calculating similarity?

OpenAI's embeddings are already normalized, meaning they have a magnitude of 1. This is why cosine similarity works well for comparing them directly. If you're working with embeddings from other sources that might not be normalized, you should normalize them first, but with the embeddings Semantic Kernel generates from OpenAI or Azure OpenAI, you can use them directly.
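If you do need to normalize vectors from a non-normalized source, L2 normalization is a few lines of code; Normalize is a hypothetical helper name and the two-dimensional input is a made-up example:

```csharp
using System;

// Sketch: L2-normalize a vector so it has magnitude 1, which makes
// cosine similarity reduce to a plain dot product. Only needed for
// embeddings that are not already unit length.
static float[] Normalize(ReadOnlyMemory<float> vector)
{
    var span = vector.Span;
    double sumSquares = 0;
    for (int i = 0; i < span.Length; i++)
        sumSquares += span[i] * span[i];
    double magnitude = Math.Sqrt(sumSquares);

    var result = new float[span.Length];
    for (int i = 0; i < span.Length; i++)
        result[i] = magnitude == 0 ? 0f : (float)(span[i] / magnitude);
    return result;
}

var normalized = Normalize(new float[] { 3f, 4f });
// Components are scaled by the original magnitude (5 here),
// so the result has magnitude 1.
Console.WriteLine(string.Join(", ", normalized));
```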

Conclusion

Working with text embeddings through Semantic Kernel's ITextEmbeddingGenerationService gives you a powerful foundation for building semantic search, RAG applications, and intelligent content recommendations in .NET. The abstraction layer means you can focus on your application logic rather than the details of specific AI providers, and switching between OpenAI and Azure OpenAI is just a configuration change when your requirements evolve.

I've shown you how to set up embedding generation, work with both single and batch embeddings, calculate cosine similarity, and even build a simple in-memory semantic search system. These fundamentals apply whether you're building a prototype or working toward a production system. As your needs scale, you can graduate from in-memory search to vector databases while keeping the same core embedding generation code.

The key to success with embeddings is understanding that they're representations of semantic meaning that enable similarity comparisons. Start simple, validate your similarity thresholds with real data, and choose your embedding model based on your actual quality requirements rather than always reaching for the most powerful option. With these practical techniques, you're well-equipped to add semantic understanding to your .NET applications.
