ChatCompletionAgent vs AssistantAgent in Semantic Kernel: Which Should You Use?

When I first started working with Semantic Kernel agents, I quickly realized that choosing between ChatCompletionAgent and AssistantAgent isn't just about picking a class name. The decision shapes your application's architecture, determines your infrastructure dependencies, and affects everything from cost to portability. If you're building AI agents with Semantic Kernel in C#, understanding this difference is crucial for making the right architectural choices. In this guide, I'll walk through both implementations, compare them side by side, and help you decide which one fits your .NET application best.

ChatCompletionAgent: The Lightweight Agent

The ChatCompletionAgent represents the more traditional approach to building AI agents in Semantic Kernel. Think of it as a lightweight wrapper around any chat completion service you throw at it -- whether that's OpenAI's API, Azure OpenAI, or even local models running through Ollama. I've found ChatCompletionAgent to be the Swiss Army knife of agent types because it doesn't lock you into any specific infrastructure.

Under the hood, ChatCompletionAgent works by wrapping a Semantic Kernel instance that's configured with a chat completion service. It's stateless by default, which means you're responsible for managing conversation history through a ChatHistoryAgentThread. This gives you complete control over how conversations are stored, retrieved, and managed. The agent itself doesn't maintain any external state -- everything runs in your application's process space.

What I love about ChatCompletionAgent is its simplicity and flexibility. You configure your kernel with whatever chat service you want, create the agent with instructions, and start invoking it. The agent processes messages through your configured LLM and returns responses. No external dependencies beyond the LLM API endpoint you've chosen.

Here's a complete example showing how straightforward ChatCompletionAgent creation and usage can be:

// ChatCompletionAgent -- lightweight, model-agnostic
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.Agents;
using Microsoft.SemanticKernel.Connectors.OpenAI;

var builder = Kernel.CreateBuilder();
builder.AddOpenAIChatCompletion("gpt-4o", Environment.GetEnvironmentVariable("OPENAI_API_KEY")!);
// Or use Azure: builder.AddAzureOpenAIChatCompletion(...)
// Or use Ollama: builder.AddOllamaChatCompletion(...)
var kernel = builder.Build();

var chatAgent = new ChatCompletionAgent
{
    Name = "Assistant",
    Instructions = "You are a helpful C# programming assistant.",
    Kernel = kernel
};

var thread = new ChatHistoryAgentThread();

await foreach (var response in chatAgent.InvokeAsync("What's the difference between IEnumerable and IQueryable?", thread))
{
    Console.WriteLine(response.Message.Content);
}

Notice how clean this is. The agent is just a configuration wrapper that uses your kernel's chat completion service to process messages. The thread is a simple in-memory collection of chat messages. You could easily swap out the OpenAI connector for Azure or Ollama without changing the agent code itself. This portability is one of ChatCompletionAgent's biggest strengths.

AssistantAgent: OpenAI Assistants API Integration

AssistantAgent takes a completely different approach. Instead of being a lightweight wrapper, AssistantAgent is a bridge to OpenAI's Assistants API. This means your agent isn't just code running in your application -- it's a persistent entity stored in OpenAI's infrastructure, complete with its own managed state, tools, and capabilities.

When you create an AssistantAgent, you're actually creating or connecting to an OpenAI Assistant. The conversation threads are managed by OpenAI, not your application. This fundamentally changes the architecture because state persistence happens automatically in OpenAI's cloud. Your threads survive application restarts, can be accessed from multiple clients, and benefit from OpenAI's built-in tools like code interpreter and file search.

The code interpreter tool is particularly powerful. It allows your assistant to write and execute Python code in a sandboxed environment hosted by OpenAI. Need to perform complex calculations, data analysis, or generate visualizations? The assistant can do it without you writing a single line of tool implementation code. Similarly, the file search tool lets assistants search through documents you've uploaded to OpenAI, enabling RAG scenarios without managing your own vector database.

The tradeoff is obvious -- you're now tightly coupled to OpenAI's infrastructure. You need an OpenAI API key, you're paying for Assistants API usage, and you can't easily swap to Azure OpenAI or local models. But for applications that can accept this dependency, the built-in capabilities are incredibly valuable.

Here's how you create and use an AssistantAgent:

// AssistantAgent -- OpenAI Assistants API backed
using Microsoft.SemanticKernel.Agents.OpenAI;

// Create the OpenAI assistant (one-time setup)
var openAiClient = new OpenAI.OpenAIClient(Environment.GetEnvironmentVariable("OPENAI_API_KEY")!);

var assistantDefinition = (await openAiClient.GetAssistantClient().CreateAssistantAsync(
    "gpt-4o",
    new OpenAI.Assistants.AssistantCreationOptions
    {
        Name = "CodeHelper",
        Instructions = "You are an expert C# developer. Analyze code, explain concepts, and write examples.",
        Tools =
        [
            new OpenAI.Assistants.CodeInterpreterToolDefinition() // Built-in code execution!
        ]
    })).Value; // unwrap the ClientResult<Assistant>

// Wrap in SK AssistantAgent
var assistantAgent = new OpenAIAssistantAgent(
    assistantDefinition,
    openAiClient.GetAssistantClient());

// AssistantAgent uses OpenAI-managed threads
var thread = new OpenAIAssistantAgentThread(openAiClient.GetAssistantClient());

await foreach (var response in assistantAgent.InvokeAsync("Write and run a C# program that calculates prime numbers up to 100.", thread))
{
    Console.WriteLine(response.Message.Content);
}

// Clean up (optional -- threads persist in OpenAI by default)
await thread.DeleteAsync();

The code structure is similar to ChatCompletionAgent once you get past the initial setup, but notice the fundamental architectural difference. The assistant and thread both exist as persistent entities in OpenAI's cloud. If your application crashes and restarts, you can reconnect to the same thread and continue the conversation exactly where you left off.
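A sketch of that resume pattern, continuing the snippet above (the OpenAIAssistantAgentThread constructor overload that accepts an existing thread ID is assumed here -- verify it against your SK version):

```
// Persist the OpenAI thread ID somewhere durable before shutdown.
var savedThreadId = thread.Id;

// ...after an application restart, reconnect to the same conversation...
var resumedThread = new OpenAIAssistantAgentThread(
    openAiClient.GetAssistantClient(), savedThreadId);

await foreach (var response in assistantAgent.InvokeAsync(
    "Continue where we left off.", resumedThread))
{
    Console.WriteLine(response.Message.Content);
}
```

The thread ID is the only piece of state your application needs to keep; the conversation content itself lives in OpenAI's cloud.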

ChatCompletionAgent vs AssistantAgent: A Technical Comparison

Understanding the technical differences between ChatCompletionAgent and AssistantAgent in Semantic Kernel C# is essential for making informed architectural decisions. While both implement the same agent interface and can participate in the same orchestration scenarios, they operate on fundamentally different architectural models.

ChatCompletionAgent is model-agnostic, runs entirely in your application's process space, and gives you complete control over state management. AssistantAgent delegates to OpenAI's Assistants API, leveraging cloud-hosted state and built-in tools at the cost of infrastructure dependency and vendor lock-in.

Here's a structured comparison of the key technical dimensions:

  • State Persistence -- ChatCompletionAgent: in-memory ChatHistoryAgentThread (you manage storage). AssistantAgent: persistent threads stored in OpenAI's cloud.
  • Infrastructure Requirements -- ChatCompletionAgent: an LLM API endpoint only (OpenAI, Azure, Ollama, etc.). AssistantAgent: OpenAI API plus Assistants API access.
  • Built-in Tools -- ChatCompletionAgent: none (you implement all tools as plugins). AssistantAgent: code interpreter, file search, function calling.
  • Cost Model -- ChatCompletionAgent: standard chat completion token costs. AssistantAgent: Assistants API costs plus thread storage and tool usage.
  • Latency -- ChatCompletionAgent: direct API calls, minimal overhead. AssistantAgent: additional API round trips for thread management.
  • Portability -- ChatCompletionAgent: model-agnostic, works with any chat completion service. AssistantAgent: OpenAI and Azure OpenAI (via the Assistants API).
  • State Management -- ChatCompletionAgent: you control persistence, retrieval, and lifecycle. AssistantAgent: automatic persistence managed by OpenAI.
  • Code Execution -- ChatCompletionAgent: not available without custom implementation. AssistantAgent: built-in sandboxed Python code interpreter.
  • File Search -- ChatCompletionAgent: requires a custom RAG implementation. AssistantAgent: built-in vector search across uploaded files.
  • Multi-Client Access -- ChatCompletionAgent: requires a shared state store you implement. AssistantAgent: thread IDs enable access from multiple clients.

The choice between these two often comes down to whether you value flexibility and control (ChatCompletionAgent) or built-in capabilities and managed state (AssistantAgent). I've used both extensively in production, and the "right" choice always depends on your specific requirements and constraints.

When to Use ChatCompletionAgent

I reach for ChatCompletionAgent whenever I need flexibility, portability, or fine-grained control over my agent's infrastructure. This is the go-to choice for most .NET AI applications, especially when you're working within existing enterprise architectures or need to support multiple deployment scenarios.

ChatCompletionAgent is the ideal choice for the following scenarios:

  • Model-agnostic deployments -- When you need to support both OpenAI and Azure OpenAI, or when you want the option to swap to local models via Ollama for development or cost optimization. You simply change the kernel configuration and the agent code stays exactly the same.
  • Azure OpenAI scenarios with Chat Completion -- When you need Azure OpenAI for chat completion models and want the flexibility to swap between OpenAI and Azure without code changes. Essential for organizations with Azure infrastructure commitments or data residency requirements. Note that Azure OpenAI also supports the Assistants API (see Azure OpenAI Assistants quickstart).
  • Local model development -- Use Ollama with local models for rapid iteration while you build, without incurring API costs.
  • Latency-sensitive applications -- Direct API approach minimizes overhead and reduces round trips, which matters when you're building real-time or interactive experiences.
  • Multi-agent orchestration -- Orchestration scenarios often work better with lightweight agents that share the same conversation history, avoiding the complexity of managing multiple persistent OpenAI threads.
  • Custom state management -- When you need full control over how conversation history is stored, retrieved, and managed within your existing data infrastructure.

The most common scenario where ChatCompletionAgent shines is model-agnostic deployments. I've built systems where different customers use different LLM providers, and ChatCompletionAgent's flexibility made this architecture possible without code duplication.
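Here's a sketch of how that provider switch can stay out of the agent code (the LLM_PROVIDER variable, endpoint values, and model names are assumptions; each branch needs the corresponding SK connector package, and the Ollama connector is still in preview):

```
var builder = Kernel.CreateBuilder();

// Pick the provider from configuration; the agent code below never changes.
switch (Environment.GetEnvironmentVariable("LLM_PROVIDER"))
{
    case "azure":
        builder.AddAzureOpenAIChatCompletion(
            deploymentName: "gpt-4o",
            endpoint: "https://your-resource.openai.azure.com",
            apiKey: Environment.GetEnvironmentVariable("AZURE_OPENAI_KEY")!);
        break;
    case "ollama":
        builder.AddOllamaChatCompletion(
            modelId: "llama3.1",
            endpoint: new Uri("http://localhost:11434"));
        break;
    default:
        builder.AddOpenAIChatCompletion(
            "gpt-4o", Environment.GetEnvironmentVariable("OPENAI_API_KEY")!);
        break;
}

var kernel = builder.Build();
// ChatCompletionAgent construction is identical for all three providers.
```

The ChatCompletionAgent itself never learns which provider it's talking to; only the kernel configuration changes.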

When to Use AssistantAgent

AssistantAgent becomes the right choice when you need OpenAI's built-in capabilities or want to offload conversation state management to OpenAI's infrastructure. I've found several scenarios where AssistantAgent's unique features justify the vendor lock-in and additional costs.

Consider AssistantAgent when you need:

  • Code interpreter capability -- The killer feature for performing calculations, analyzing data, generating charts, or executing any kind of computational logic. OpenAI handles the sandboxing and execution without you implementing custom tools.
  • File search across documents -- Built-in vector search capabilities for RAG applications without managing your own embeddings or vector database. Upload documents to OpenAI, enable file search, and the assistant automatically retrieves relevant context.
  • Persistent threads across restarts -- Cloud-managed threads make conversation continuity trivial. Users can return hours or days later and pick up exactly where they left off using just a thread ID.
  • Managed conversation state -- No need to implement database schemas for storing chat history, handle serialization, or manage lifecycle policies for old conversations.
  • Financial analysis agents -- where the code interpreter can execute calculations and generate reports.
  • Data science assistants -- for statistical analysis and data visualization tasks.
  • Coding tutors -- that can demonstrate concepts by running code examples in real time.

The code interpreter is the capability that drives most of these implementations -- it's the feature I've leaned on in every one of the scenarios above.

However, I want to be clear about the limitations. AssistantAgent requires the OpenAI Assistants API -- Azure OpenAI does support the Assistants API (Azure OpenAI Assistants documentation), but feature parity with OpenAI may differ; verify current Azure support before committing. The costs are also higher -- you're paying for Assistants API usage, thread storage, and tool execution on top of the base token costs. Consult OpenAI pricing and Azure OpenAI pricing for current rates; make sure your use case justifies these additional expenses.

Migration Path: Going from ChatCompletionAgent to AssistantAgent

One of the nice aspects of Semantic Kernel's agent design is that both ChatCompletionAgent and AssistantAgent implement the same core interfaces. This means if you start with ChatCompletionAgent and later decide you need AssistantAgent's capabilities, the migration path is relatively straightforward. The code change is mostly in construction -- once created, both agent types expose the same invocation patterns.

The primary difference you'll encounter is in how you create the agent and manage threads. ChatCompletionAgent construction is lightweight -- you just configure a kernel and create the agent. AssistantAgent requires creating the assistant definition in OpenAI first, which is typically a one-time setup operation. You'll want to store the assistant ID and reuse it rather than creating new assistants every time your application starts.
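A sketch of that reuse pattern, continuing the earlier snippet (the ASSISTANT_ID environment variable is an assumption -- store the ID wherever fits your configuration):

```
var assistantClient = openAiClient.GetAssistantClient();
var assistantId = Environment.GetEnvironmentVariable("ASSISTANT_ID");

// Reconnect to an existing assistant when we have an ID; create one otherwise.
var assistant = assistantId is not null
    ? (await assistantClient.GetAssistantAsync(assistantId)).Value
    : (await assistantClient.CreateAssistantAsync("gpt-4o",
        new OpenAI.Assistants.AssistantCreationOptions { Name = "CodeHelper" })).Value;

var assistantAgent = new OpenAIAssistantAgent(assistant, assistantClient);
```

After the first run, persist assistant.Id so subsequent startups take the reconnect path instead of creating duplicates in your OpenAI account.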

Thread management changes more significantly. ChatCompletionAgent uses ChatHistoryAgentThread, which is essentially a wrapper around an in-memory chat history; you can serialize and deserialize this history if you want persistence. AssistantAgent uses OpenAI-managed threads (created via OpenAIAssistantAgentThread, as shown earlier) that you reference by ID. The thread lifecycle is different -- OpenAI threads persist by default, so you need to think about cleanup strategies.

The invocation code itself barely changes. Both agents support the same InvokeAsync pattern, return the same message types, and work within the same orchestration frameworks. If you've abstracted your agent interactions behind interfaces (which I recommend in any non-trivial application), swapping implementations is often just a matter of changing your dependency injection configuration.

One gotcha I've encountered: if you built custom plugins for your ChatCompletionAgent, you'll need to convert them to OpenAI function calling format to use them with AssistantAgent. The concepts are similar, but the registration and invocation mechanics differ. Plan for some refactoring time if your agent relies heavily on custom tools.

Using Both Together in AgentGroupChat

Here's something that surprised me when I first explored Semantic Kernel's complete capabilities: you can mix ChatCompletionAgent and AssistantAgent in the same AgentGroupChat orchestration. This opens up some interesting architectural patterns where you leverage the strengths of each agent type within a single multi-agent system.

The pattern I've found most useful is combining lightweight ChatCompletionAgent instances for general reasoning and coordination with specialized AssistantAgent instances that leverage code interpreter or file search. For example, you might have a planning agent (ChatCompletionAgent) that breaks down user requests, a research agent (AssistantAgent with file search) that retrieves relevant documentation, and an implementation agent (AssistantAgent with code interpreter) that writes and tests code.

Each agent maintains its own thread, so the group chat needs to coordinate passing context between agents. Semantic Kernel handles the mechanics of message routing based on your selection strategy, but you need to think carefully about how agents share information and build on each other's work.

Here's an example showing both agent types collaborating in a group chat:

#pragma warning disable SKEXP0110 // AgentGroupChat is experimental
using Microsoft.SemanticKernel.ChatCompletion; // AuthorRole

// Mixing agent types in AgentGroupChat
var groupChat = new AgentGroupChat(chatAgent, assistantAgent)
{
    ExecutionSettings = new AgentGroupChatSettings
    {
        SelectionStrategy = new SequentialSelectionStrategy()
    }
};

groupChat.ExecutionSettings.TerminationStrategy.MaximumIterations = 4;

groupChat.AddChatMessage(new ChatMessageContent(
    AuthorRole.User,
    "Design and implement a simple calculator class in C#. First plan it, then implement it."));

await foreach (var message in groupChat.InvokeAsync())
{
    Console.WriteLine($"[{message.AuthorName}]: {message.Content}\n---");
}
#pragma warning restore SKEXP0110

In this example, the ChatCompletionAgent might handle the design and planning phase, while the AssistantAgent with code interpreter actually implements and tests the code. Each agent brings its unique strengths to the collaboration.

The cost implications of mixing agent types are important to consider. You're now paying for both ChatCompletionAgent's token usage and AssistantAgent's Assistants API costs. Make sure the benefits of specialization justify the additional complexity and expense. In my experience, this pattern works best when you have clear separation of concerns and each agent type is doing something the other can't do efficiently.

Cost and Performance Tradeoffs

Understanding the cost and performance characteristics of ChatCompletionAgent versus AssistantAgent is critical for production deployments. The differences aren't trivial, and I've seen projects run into budget surprises when they didn't properly account for Assistants API pricing.

ChatCompletionAgent Cost Structure:

  • Standard chat completion pricing based on input and output tokens only
  • Transparent and predictable -- you pay for what you send and receive
  • No additional overhead beyond token costs
  • Minimal processing costs for managing chat history in your application

AssistantAgent Cost Structure:

  • Base token costs (input and output)
  • Assistants API fees (additional per-request charges)
  • Thread storage costs (accumulates over time)
  • Code interpreter execution costs (when code runs)
  • File search operation costs (per search query)
  • Thread cleanup required to avoid storage cost accumulation

Consult OpenAI pricing and Azure OpenAI pricing for current rates -- pricing changes frequently and specific numbers in any article will age quickly.

Performance Characteristics:

  • ChatCompletionAgent latency -- Direct API calls minimize round trips. Simple query-response patterns are typically faster than AssistantAgent due to fewer API operations.
  • AssistantAgent latency -- Additional API operations for thread management (create thread, add message, run assistant, poll completion) introduce overhead noticeable in high-throughput scenarios.
  • Throughput differences -- ChatCompletionAgent handles higher request rates due to less API overhead. AssistantAgent's multiple operations mean hitting rate limits sooner under heavy load.

For simple query-response patterns, ChatCompletionAgent generally responds faster than AssistantAgent because it requires fewer API round trips. The gap narrows when you're using AssistantAgent's built-in tools (which would require additional round trips if you implemented them yourself with ChatCompletionAgent), but for basic conversation, the lightweight architecture is typically faster. Actual performance depends on your specific workload, network conditions, and API tier -- benchmark with your own traffic before drawing conclusions.

My recommendation: start with ChatCompletionAgent unless you have a specific requirement that only AssistantAgent can fulfill. The cost and performance characteristics are more favorable, and you maintain flexibility. Only adopt AssistantAgent when the built-in tools or managed state provide clear value that justifies the additional expense and complexity.

Frequently Asked Questions

Here are the most common questions developers ask when choosing between ChatCompletionAgent and AssistantAgent for their .NET AI applications.

Can I use AssistantAgent with Azure OpenAI?

Yes, Azure OpenAI supports the Assistants API. You can use AssistantAgent with Azure OpenAI by configuring the OpenAI client to use your Azure endpoint. See the Azure OpenAI Assistants quickstart for details on setting up and using Assistants with Azure OpenAI. This gives you the benefits of OpenAI's Assistant capabilities while maintaining data residency and compliance requirements within Azure infrastructure.

Which agent type is better for production applications?

It depends entirely on your requirements. ChatCompletionAgent is better for most production scenarios because it's more flexible, less expensive, and offers better performance. It works with any LLM provider and gives you complete control over state management. Use AssistantAgent in production only when you specifically need code execution, file search, or OpenAI-managed persistent threads. The vendor lock-in and cost implications make it a specialized choice rather than a default.

Can I switch between ChatCompletionAgent and AssistantAgent at runtime?

Technically yes, since both implement the same interfaces, but it's not straightforward. The thread management is different (ChatHistoryAgentThread vs OpenAI threads), so you can't directly transfer a conversation from one agent type to another. If you need this flexibility, consider abstracting your agent interactions behind your own interface and implementing adapters for each agent type. This lets you swap implementations, but you'll need to handle thread conversion logic yourself.
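A minimal sketch of that abstraction (the interface name and shape are hypothetical, not SK APIs; a trivial stand-in takes the place of a real adapter):

```
// Hypothetical abstraction: both agent types hide behind one interface
// so dependency injection can swap the implementation.
public interface IConversationAgent
{
    IAsyncEnumerable<string> AskAsync(string message);
}

// Trivial stand-in. A real ChatCompletionAgent adapter would forward to
// agent.InvokeAsync(message, thread) and yield each response's content;
// an AssistantAgent adapter would do the same with an OpenAI-managed thread.
public sealed class EchoAgent : IConversationAgent
{
    public async IAsyncEnumerable<string> AskAsync(string message)
    {
        await Task.Yield();
        yield return $"echo: {message}";
    }
}
```

The adapters own their thread types internally, so callers never see ChatHistoryAgentThread or OpenAI thread IDs -- which is exactly what makes the swap possible.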

Do both agent types support plugins and function calling?

Yes, but the implementation differs. ChatCompletionAgent uses Semantic Kernel plugins, which you add to the kernel and the agent automatically exposes them for function calling. AssistantAgent uses OpenAI's function calling format, which requires you to define functions in the OpenAI assistant definition. The concepts are similar, but you need to adapt your plugin implementations when moving between agent types. Semantic Kernel provides some helpers for this conversion.
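For illustration, an SK plugin for ChatCompletionAgent might look like this sketch (the TimePlugin name is an assumption; [KernelFunction] and Plugins.AddFromType are the standard SK mechanisms):

```
using System.ComponentModel;
using Microsoft.SemanticKernel;

public class TimePlugin
{
    [KernelFunction]
    [Description("Gets the current UTC time in ISO 8601 format.")]
    public string GetUtcNow() => DateTime.UtcNow.ToString("O");
}

// Registration on the kernel the agent uses:
// kernel.Plugins.AddFromType<TimePlugin>();
```

To let the agent invoke plugin functions automatically, pass execution settings with FunctionChoiceBehavior.Auto() through the agent's Arguments.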

How do I persist ChatCompletionAgent conversations across application restarts?

You need to implement this yourself. The ChatHistoryAgentThread contains a ChatHistory object that you can serialize (it's just a list of ChatMessageContent objects). Store the serialized history in a database, file, or cache. When your application restarts, deserialize the history and create a new ChatHistoryAgentThread from it. You're responsible for the entire lifecycle. AssistantAgent handles this automatically via OpenAI's thread storage, which is one of its key advantages if you don't want to build state management infrastructure.
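Here's a minimal sketch of that pattern using plain System.Text.Json (a simplified role/content shape stands in for SK's ChatMessageContent, which carries more fields; the file path is illustrative):

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.Json;

// Simplified stand-in for a chat history: a list of role/content pairs.
var history = new List<Dictionary<string, string>>
{
    new() { ["role"] = "user", ["content"] = "What's the difference between IEnumerable and IQueryable?" },
    new() { ["role"] = "assistant", ["content"] = "IEnumerable runs queries in memory; IQueryable builds expression trees for the provider." }
};

// On shutdown (or per message): serialize and store -- database, file, or cache.
var path = Path.Combine(Path.GetTempPath(), "conversation.json");
File.WriteAllText(path, JsonSerializer.Serialize(history));

// On restart: deserialize and rebuild your ChatHistoryAgentThread from it.
var restored = JsonSerializer.Deserialize<List<Dictionary<string, string>>>(
    File.ReadAllText(path))!;

Console.WriteLine($"Restored {restored.Count} messages; first role: {restored[0]["role"]}");
```

The same round-trip applies to the real ChatHistory object; the only extra work is mapping each stored entry back onto ChatMessageContent when you rebuild the thread.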

Conclusion

Choosing between ChatCompletionAgent and AssistantAgent in Semantic Kernel C# is fundamentally a choice between flexibility and convenience. ChatCompletionAgent gives you model-agnostic portability, lower costs, better performance, and complete control over your infrastructure. AssistantAgent offers built-in code execution, file search, and managed conversation state at the cost of vendor lock-in and higher expenses.

For most .NET AI applications I build, ChatCompletionAgent is the right starting point. It integrates seamlessly with Azure OpenAI, works with local models via Ollama, and scales efficiently in production. The lightweight architecture and direct API approach deliver excellent performance while keeping costs predictable.

I reach for AssistantAgent only when I need capabilities that would be complex to implement myself -- primarily code interpreter for computational tasks or file search for document-heavy RAG scenarios. The managed threads are nice, but not worth the vendor lock-in unless you have no plans to support alternative LLM providers.

The good news is that Semantic Kernel's design lets you mix both approaches. Start with ChatCompletionAgent for general-purpose agents, add specialized AssistantAgent instances when you need their unique tools, and orchestrate them together in group chats when appropriate. You're not locked into an all-or-nothing choice.

Understanding both agent types and their tradeoffs is essential for building robust AI applications with Semantic Kernel and C#. Make informed decisions based on your specific requirements, infrastructure constraints, and cost sensitivities. The right choice depends on your context, and now you have the knowledge to choose wisely.

Looking to explore more agent frameworks? Check out my complete guide on the GitHub Copilot SDK for .NET, which offers yet another approach to building intelligent agents in C#. The AI agent ecosystem is rich with options, and understanding the tradeoffs between different frameworks and agent types will make you a more effective .NET developer building AI-powered applications.

