Streaming Responses with GitHub Copilot SDK in C#: Real-Time Token Output

When you're building AI-powered applications, implementing streaming responses with the GitHub Copilot SDK in C# is one of the most impactful features you can add. Instead of making your users wait 5, 10, or even 30 seconds staring at a loading spinner while the AI generates a complete response, streaming allows you to display tokens as they're generated in real time. This creates a dramatically better user experience -- users see progress immediately, they can start reading while the response is still being generated, and the application feels responsive rather than frozen. In this article, I'll show you exactly how to implement streaming for both console applications and ASP.NET Core web apps, including error handling and performance considerations.

Enabling Streaming Responses in the GitHub Copilot SDK for C#: Streaming = true

The first step to enable streaming responses with the GitHub Copilot SDK in C# is remarkably simple. When you create a session using the patterns we covered in our session management guide, you configure the SessionConfig object with Streaming = true. This single property transforms how the SDK delivers responses to your application. Without streaming enabled, you receive a single AssistantMessageEvent containing the complete response after all tokens have been generated. With streaming enabled, the SDK fires a series of AssistantMessageDeltaEvent events, each containing a small chunk of the response as it's generated by the underlying model.

The key difference is in the event flow:

  • Non-streaming mode: Wait for entire response → receive one complete event
  • Streaming mode: Receive incremental delta events → SessionIdleEvent signals completion
  • Event handling: Accumulate delta chunks for full text, or display immediately for real-time feel

This means your event handler needs to accumulate these delta chunks if you need the full text, or more commonly, you display each chunk immediately to create that real-time feeling. The streaming approach aligns perfectly with the observer pattern that the SDK uses for event delivery.
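Before the full SDK example below, here is the accumulate-or-display choice in miniature. This is a plain C# sketch -- the string chunks stand in for the DeltaContent values that AssistantMessageDeltaEvent would deliver:

```csharp
using System;
using System.Text;

// Simulated delta chunks standing in for AssistantMessageDeltaEvent.Data.DeltaContent.
string[] deltas = { "Streaming ", "lets you ", "display tokens ", "as they arrive." };

var accumulated = new StringBuilder();

foreach (var chunk in deltas)
{
    Console.Write(chunk);        // display each chunk immediately for the real-time feel
    accumulated.Append(chunk);   // and accumulate it if you need the full text later
}

Console.WriteLine();
string fullText = accumulated.ToString();
```

You usually do both at once: write the chunk to the UI as it arrives, and keep the StringBuilder around for logging or follow-up processing.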

AssistantMessageDeltaEvent: Token by Token

When streaming is enabled, the primary event you'll work with is AssistantMessageDeltaEvent. Each time the model generates a new token or small chunk of text, you receive one of these events. The event contains a Data property, which in turn has a DeltaContent property -- this is the actual text fragment you need to display. Unlike complete message events, delta events don't contain the full accumulated response. Each one is just the next piece of the puzzle.

The typical pattern for handling these events in a console application is to use Console.Write() instead of Console.WriteLine(). This allows tokens to appear on the same line, creating smooth, continuous text output rather than each token appearing on a new line. As you work with the CopilotClient and CopilotSession core concepts, you'll find that handling delta events becomes second nature. You simply write each DeltaContent value to your output stream, whether that's the console, a web response, or any other destination.

Building a Streaming Console App

Let me show you a complete working example of a console application that demonstrates streaming responses with GitHub Copilot SDK in C#. This demonstrates all the key concepts in a single, self-contained example:

using GitHub.Copilot.SDK;

await using var client = new CopilotClient();
await client.StartAsync();

await using var session = await client.CreateSessionAsync(new SessionConfig
{
    Model = "gpt-5",
    Streaming = true
});

Console.Write("Ask Copilot: ");
var question = Console.ReadLine() ?? "Explain async/await in C#";

var tcs = new TaskCompletionSource();

session.On(evt =>
{
    switch (evt)
    {
        case AssistantMessageDeltaEvent delta:
            Console.Write(delta.Data.DeltaContent);
            break;
        case SessionIdleEvent:
            Console.WriteLine();
            tcs.TrySetResult();
            break;
        case SessionErrorEvent err:
            Console.WriteLine($"\n[ERROR] {err.Data.Message}");
            tcs.TrySetException(new Exception(err.Data.Message));
            break;
    }
});

await session.SendAsync(new MessageOptions { Prompt = question });
await tcs.Task;

This console app demonstrates several important patterns. First, notice how I'm using a TaskCompletionSource to convert the event-driven model into something I can await. Without this, the SendAsync call would return immediately, and my application would exit before the streaming completes. The TCS allows me to wait for the SessionIdleEvent that signals the stream has finished. Second, I'm handling SessionErrorEvent to catch any mid-stream failures, which I'll discuss more in a later section. Finally, the Console.Write() pattern creates that smooth, token-by-token output that makes streaming feel responsive.

You can easily add CancellationToken support to this pattern by passing a token into your event handler and checking ct.IsCancellationRequested before each write operation. This is particularly useful for long-running streams where users might want to cancel mid-response. The async/await fundamentals apply here just as they do in any other asynchronous C# code.
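A minimal sketch of that check, with a local WriteDelta function standing in for the SDK event handler (the function name and the simulated chunks are illustrative, not part of the SDK):

```csharp
using System;
using System.Text;
using System.Threading;

using var cts = new CancellationTokenSource();
var output = new StringBuilder();

// Cancellation-aware delta writer: once the token is cancelled,
// further chunks are silently dropped instead of written.
void WriteDelta(string chunk, CancellationToken ct)
{
    if (ct.IsCancellationRequested) return; // stop writing once the user cancels
    output.Append(chunk);
}

WriteDelta("first ", cts.Token);
cts.Cancel();                  // e.g. the user pressed Esc mid-response
WriteDelta("second", cts.Token);

Console.WriteLine(output.ToString()); // only "first " made it through
```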

Streaming to ASP.NET Core: Server-Sent Events

For web applications, the standard approach for implementing streaming responses with GitHub Copilot SDK in C# is Server-Sent Events (SSE). SSE is a simple HTTP-based protocol that allows servers to push data to clients over a long-lived connection. It's perfect for streaming text because it's lightweight, works over standard HTTP, and has excellent browser support. The key is to set the content type to text/event-stream and flush the response buffer after each chunk.

Here's a complete ASP.NET Core endpoint that streams Copilot responses using SSE:

using Microsoft.AspNetCore.Mvc;
using GitHub.Copilot.SDK;
using System.Threading.Channels;

app.MapGet("/api/chat/stream", async (
    [FromQuery] string prompt,
    CopilotClient copilotClient,
    HttpContext httpContext,
    CancellationToken ct) =>
{
    httpContext.Response.ContentType = "text/event-stream";
    httpContext.Response.Headers["Cache-Control"] = "no-cache";
    httpContext.Response.Headers["X-Accel-Buffering"] = "no";
    
    await using var session = await copilotClient.CreateSessionAsync(new SessionConfig
    {
        Model = "gpt-5",
        Streaming = true
    });
    
    await foreach (var token in StreamResponseAsync(session, prompt, ct))
    {
        await httpContext.Response.WriteAsync($"data: {token}\n\n", ct);
        await httpContext.Response.Body.FlushAsync(ct);
    }
    
    await httpContext.Response.WriteAsync("data: [DONE]\n\n", ct);
});

The critical headers here are Content-Type: text/event-stream which tells the browser this is an SSE stream, Cache-Control: no-cache to prevent any intermediate proxies from buffering the response, and X-Accel-Buffering: no which disables buffering in nginx if you're using it as a reverse proxy. Each event is formatted as `data: {token}` followed by a blank line -- the double newline is how SSE delimits individual events. The `[DONE]` sentinel at the end signals to the client that the stream is complete.

On the JavaScript client side, you'd use the EventSource API to consume this stream. The browser handles reconnection automatically, and you get a simple event-driven interface that fires a callback for each data chunk. This creates a smooth, responsive chat interface where users see the AI's response appear word by word.
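If your client is .NET rather than a browser, the same stream can be consumed by hand: the parsing reduces to reading `data:` lines and stopping at the `[DONE]` sentinel. A sketch of that parsing, with a MemoryStream standing in for the HttpClient response body:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// SSE wire format: each event is a "data: ..." line followed by a blank line.
var sse = "data: Hello\n\ndata: world\n\ndata: [DONE]\n\n";
using var reader = new StreamReader(new MemoryStream(Encoding.UTF8.GetBytes(sse)));

var events = new List<string>();
string? line;
while ((line = reader.ReadLine()) != null)
{
    if (!line.StartsWith("data: ")) continue;  // skip the blank delimiter lines
    var payload = line["data: ".Length..];
    if (payload == "[DONE]") break;            // sentinel: stream complete
    events.Add(payload);
}
```

A real client would wrap the stream returned by HttpClient with HttpCompletionOption.ResponseHeadersRead so tokens are surfaced as they arrive rather than after the full response.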

Streaming via Channels: IAsyncEnumerable from Events

While the event-driven model of the Copilot SDK is powerful, sometimes you want a more functional approach using IAsyncEnumerable<string>. This is particularly useful when you want to compose streaming operations, integrate with LINQ-style operators, or work with ASP.NET Core's built-in support for async enumerable responses. The bridge between events and async enumerables is System.Threading.Channels.Channel<T>.

Here's a reusable helper method that converts the SDK's event model into a clean IAsyncEnumerable<string> stream:

using System.Threading.Channels;

public static async IAsyncEnumerable<string> StreamResponseAsync(
    CopilotSession session,
    string prompt,
    [System.Runtime.CompilerServices.EnumeratorCancellation] CancellationToken ct = default)
{
    var channel = Channel.CreateUnbounded<string?>(new UnboundedChannelOptions
    {
        SingleReader = true,
        SingleWriter = false
    });
    
    session.On(evt =>
    {
        switch (evt)
        {
            case AssistantMessageDeltaEvent delta:
                channel.Writer.TryWrite(delta.Data.DeltaContent);
                break;
            case SessionIdleEvent:
                channel.Writer.TryWrite(null); // Signal end of stream
                break;
            case SessionErrorEvent err:
                channel.Writer.TryComplete(new Exception(err.Data.Message));
                break;
        }
    });
    
    await session.SendAsync(new MessageOptions { Prompt = prompt });
    
    await foreach (var token in channel.Reader.ReadAllAsync(ct))
    {
        if (token is null) yield break; // End of stream sentinel
        yield return token;
    }
}

This pattern is incredibly useful because it gives you a clean, composable abstraction over the event model. The channel acts as a queue between the event handler and the async enumerable consumer. When a delta event arrives, we write the token to the channel. When the stream completes with SessionIdleEvent, we write a null sentinel to signal the end. If an error occurs, we complete the channel with an exception, which will propagate to the consumer.

The usage is beautifully simple:

await foreach (var token in StreamResponseAsync(session, "Explain LINQ in C#"))
{
    Console.Write(token);
}

This abstraction works seamlessly with ASP.NET Core minimal APIs, which can return IAsyncEnumerable<string> directly and will automatically stream the results to the client. It's one of my favorite patterns for working with streaming data in modern .NET applications.
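Here is a self-contained simulation of that bridge, with a hand-written producer standing in for the SDK's event handler and a null sentinel marking the end of the stream:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Channels;
using System.Threading.Tasks;

var channel = Channel.CreateUnbounded<string?>();

// Consumer side: the same shape as StreamResponseAsync's read loop.
async IAsyncEnumerable<string> ReadTokensAsync(ChannelReader<string?> reader)
{
    await foreach (var token in reader.ReadAllAsync())
    {
        if (token is null) yield break; // null sentinel: SessionIdleEvent arrived
        yield return token;
    }
}

// Producer side: pretend three delta events arrived, then the session went idle.
foreach (var chunk in new[] { "LINQ ", "is ", "lazy." })
    channel.Writer.TryWrite(chunk);
channel.Writer.TryWrite(null);

var received = new List<string>();
await foreach (var token in ReadTokensAsync(channel.Reader))
    received.Add(token);

Console.WriteLine(string.Concat(received));
```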

Handling Streaming Errors

When you're working with streaming responses with GitHub Copilot SDK in C#, errors become more complex to handle because they can occur at any point during the stream. Unlike non-streaming mode where an error prevents any response from arriving, with streaming you might receive 50 tokens successfully, then encounter a SessionErrorEvent mid-stream. This means you need to handle partial responses gracefully.

The key event to watch for is SessionErrorEvent. When this fires during a streaming operation, you have several options. First, you can display the error message to the user alongside whatever partial response they've already seen. This is often the best user experience because the user gets to keep the partial information. Second, you can attempt to retry the request, but be aware that this creates a challenge -- you need to decide whether to replay the partial response or start fresh. Third, you can log the error and gracefully degrade, perhaps switching to non-streaming mode for the retry attempt.

In the console example I showed earlier, I'm handling errors by writing them to the console and completing the TaskCompletionSource with an exception. In a production web application, you might write an error event to the SSE stream that your JavaScript client can catch and display in the UI. The important principle is to never leave the user hanging -- if a stream fails midway, make sure they know what happened and ideally give them a way to retry.

For robustness, I recommend implementing timeouts on streaming operations. A stream that stalls indefinitely is worse than an error because the user doesn't know if it's still working or has failed. Use CancellationTokenSource.CancelAfter() to set a maximum duration for streaming operations, and handle OperationCanceledException as a timeout scenario.
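A sketch of that timeout guard, with a deliberately stalled Task.Delay standing in for a stream that never completes (the durations here are shortened for illustration; real code would use something like 60 seconds):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

using var cts = new CancellationTokenSource();
cts.CancelAfter(TimeSpan.FromMilliseconds(100)); // cap the streaming operation

// Stand-in for awaiting the session's completion: a stream that stalls.
async Task SimulatedStreamAsync(CancellationToken ct) =>
    await Task.Delay(TimeSpan.FromSeconds(5), ct);

bool timedOut = false;
try
{
    await SimulatedStreamAsync(cts.Token);
}
catch (OperationCanceledException)
{
    timedOut = true; // surface this to the user as "response timed out"
}
```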

Streaming with Tool Calls

One of the more complex scenarios when implementing streaming responses with GitHub Copilot SDK in C# is when the AI needs to invoke tools during response generation. When you've configured tools and function calling in your session, the streaming flow becomes interleaved. Here's how it works: the model starts streaming delta tokens, then suddenly you receive a tool call event, the streaming pauses while your code executes the tool, you send the tool result back to the session, and then streaming resumes with more delta tokens.

The flow looks like this: delta tokens → ToolExecutionStartEvent → your tool execution → SendAsync with tool result → more delta tokens → SessionIdleEvent. Your event handler needs to accommodate this interrupted flow. In practice, this means your AssistantMessageDeltaEvent handler continues to work the same way, but you need additional case statements for tool-related events. The key insight is that streaming doesn't stop tool calls from working -- it just means the response comes in chunks before and after the tool invocation.

One pattern I find useful is to display a visual indicator when a tool call occurs during streaming. For example, in a console app you might write [Calling tool: get_weather] inline with the streamed text, so users understand why there's a brief pause. In a web UI, you might show a subtle animation or status message. This transparency helps users understand that the AI is actively working even when tokens aren't flowing.
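A simulation of that interleaved flow, with string-tagged tuples standing in for the SDK's delta and tool events (the event shapes and the get_weather tool name are illustrative):

```csharp
using System;
using System.Text;

// Simulated interleaved stream: deltas, a tool call, then more deltas.
(string Kind, string Payload)[] events =
{
    ("delta", "The weather "),
    ("tool", "get_weather"),       // streaming pauses while the tool runs
    ("delta", "in Oslo is mild."),
};

var output = new StringBuilder();
foreach (var (kind, payload) in events)
{
    switch (kind)
    {
        case "delta":
            output.Append(payload);
            break;
        case "tool":
            output.Append($"[Calling tool: {payload}] "); // inline visual indicator
            break;
    }
}

Console.WriteLine(output.ToString());
```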

When Not to Stream

While streaming responses with the GitHub Copilot SDK in C# create a better user experience in most interactive scenarios, there are definitely times when you should not use streaming:

  • Batch processing: No human watching output means no UX benefit from streaming
  • File generation: Complete files needed for disk writes, not incremental tokens
  • Structured data: JSON/XML parsing requires complete, valid output
  • Message queuing: Most queues expect complete messages, not streaming chunks
  • Database logging: Storing full responses simpler than accumulating deltas

If you're doing batch processing of AI requests where no human is watching the output, streaming adds complexity without benefit. For example, if you're generating code files that will be written to disk, there's no advantage to receiving the code token-by-token -- you need the complete file anyway.

Logging and auditing scenarios often require complete responses as well. If you need to store the full AI response in a database or log file, streaming means you need to accumulate all the delta events into a complete message before storing it. While this is certainly possible, it's simpler to just use non-streaming mode and log the complete response in one operation.

Another scenario where streaming can be problematic is when you're generating structured output like JSON that will be parsed. While you could technically parse streaming JSON using specialized libraries, it's far more reliable to wait for the complete response and then parse the complete, valid JSON object. Partial JSON is rarely useful and can lead to parsing errors if you try to handle it incrementally.
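A quick demonstration of why partial output is a problem, using System.Text.Json: a prefix of a valid JSON document is not itself valid JSON, so parsing mid-stream fails where parsing the complete response succeeds.

```csharp
using System;
using System.Text.Json;

var complete = "{\"city\":\"Oslo\",\"tempC\":4}";
var partial = complete[..20]; // what you'd have mid-stream

bool partialParses = true;
try { JsonDocument.Parse(partial); }
catch (JsonException) { partialParses = false; } // truncated JSON throws

using var doc = JsonDocument.Parse(complete);   // the full response parses fine
Console.WriteLine($"partial parses: {partialParses}");
```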

Finally, if your application architecture involves intermediate queuing or message passing systems, streaming may not flow through naturally. Many message queues and service buses expect complete messages, not streaming chunks. In these architectures, you might use non-streaming mode at the Copilot SDK layer, even if you implement streaming at a different layer of your application.

Performance: Streaming vs Non-Streaming

From a performance perspective, streaming and non-streaming modes in the GitHub Copilot SDK for C# have different characteristics that matter depending on your use case. The most important metric is time-to-first-token (TTFT) -- how long until the user sees any output:

  • Streaming TTFT: First token displays quickly after generation begins
  • Non-streaming TTFT: Equals total generation time -- nothing displays until complete
  • Total generation time: Virtually identical between modes
  • Network overhead: Minimal difference (milliseconds across multi-second operations)
  • User perception: Streaming feels significantly faster even with identical total time

With streaming, the first token appears quickly because you display it as soon as it's generated. With non-streaming, TTFT equals the total generation time because nothing displays until it's complete.

However, the total time to generate the complete response is virtually identical between streaming and non-streaming modes. The underlying model generates tokens at the same rate regardless of how your application consumes them. What changes is user perception. A response that takes time to generate feels much faster when users see it streaming in than when they stare at a spinner and then see it all at once.

There is a small amount of additional overhead with streaming because each delta event has network and event handling costs, whereas non-streaming mode has a single network response and event. In practice, this overhead is negligible -- we're talking milliseconds across a multi-second operation. The user experience benefit far outweighs the minimal performance cost.

One scenario where total time actually matters is when you need the complete response before taking the next action. For example, if the AI's response determines which API endpoint you call next, streaming doesn't help -- you have to wait for the complete response anyway. In these cases, the choice between streaming and non-streaming is about code complexity rather than performance, and non-streaming is simpler.

FAQ

Can I switch between streaming and non-streaming modes in the same session?

No, the streaming mode is set when you create the session via SessionConfig, and it applies to all requests sent through that session. If you need both modes in your application, you need to create separate sessions with different configurations. In practice, this is rarely a problem because most applications use streaming consistently for user-facing interactions or consistently avoid it for background processing.

How do I handle rate limits with streaming responses?

Rate limiting works the same way whether you're streaming or not -- the limits apply to the underlying API calls. If you hit a rate limit during a streaming response, you'll receive a SessionErrorEvent with details about the rate limit. Your application should handle this the same way it handles other errors, potentially with retry logic that includes exponential backoff. The getting started guide covers rate limiting patterns in more detail.
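A sketch of that retry shape, with a flaky delegate standing in for the streaming call and backoff delays shortened for illustration (the helper name and attempt count are assumptions, not SDK API):

```csharp
using System;
using System.Threading.Tasks;

// Retry with exponential backoff, assuming transient failures (like a rate
// limit) surface as exceptions from the awaited operation.
async Task<string> RetryAsync(Func<Task<string>> operation, int maxAttempts = 3)
{
    for (int attempt = 1; ; attempt++)
    {
        try { return await operation(); }
        catch when (attempt < maxAttempts)
        {
            // back off exponentially between attempts (real code: seconds, not ms)
            await Task.Delay(TimeSpan.FromMilliseconds(10 * Math.Pow(2, attempt)));
        }
    }
}

// A flaky operation standing in for session.SendAsync: fails twice, then succeeds.
int calls = 0;
var result = await RetryAsync(() =>
    ++calls < 3 ? Task.FromException<string>(new Exception("rate limited"))
                : Task.FromResult("ok"));

Console.WriteLine($"{result} after {calls} calls");
```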

Can I use streaming with different models?

Yes, streaming is supported across all the models available through the Copilot SDK. You specify the model in SessionConfig.Model just as you would in non-streaming mode. Different models may have different token generation speeds, which affects how quickly delta events arrive, but the streaming mechanism itself works identically across models.

How do I test streaming functionality?

Testing streaming code requires some special considerations. I recommend using the channel-based IAsyncEnumerable pattern I showed earlier because it's easier to test -- you can consume the async enumerable in your test code and verify you receive the expected tokens in order. For unit testing, you might want to mock the session and emit synthetic delta events with known content. For integration testing, use a real session but with short prompts that produce predictable, quick responses to keep your test suite fast.

Conclusion

Implementing streaming responses with the GitHub Copilot SDK in C# transforms the user experience of your AI-powered applications from "wait and hope" to "watch and read." The patterns I've shown you here -- from basic console streaming to ASP.NET Core SSE endpoints to channel-based async enumerables -- give you the building blocks to add streaming to any .NET application. The key is enabling streaming in your SessionConfig, handling AssistantMessageDeltaEvent to display tokens as they arrive, and properly managing the completion and error cases with SessionIdleEvent and SessionErrorEvent.

As you build on these foundations, remember that streaming is about user perception as much as technical implementation. The choice between streaming and non-streaming should be driven by whether a human is watching the output and whether seeing incremental progress provides value. For interactive applications, streaming is almost always the right choice. For background processing, non-streaming is simpler. And with the patterns and code examples in this article, you have everything you need to implement both approaches effectively in your C# applications.

