OpenTelemetry and Observability in Microsoft Agent Framework

If you're building AI agents in C# with Microsoft Agent Framework (MAF), deploying them without observability is a liability. OpenTelemetry observability in MAF gives you the instrumentation layer you need to understand exactly what your agent is doing -- how many tokens it's consuming, how long each turn takes, which tools are being called, and where failures are occurring. Without that visibility, debugging a misbehaving agent means sifting through logs manually or guessing. With it, you get structured telemetry feeding dashboards, alerts, and trace analysis.

MAF is in public preview (version 1.0.0-rc1), and while the APIs may evolve before GA, the observability patterns described here are grounded in stable OpenTelemetry contracts and Microsoft.Extensions.AI conventions. This article targets developers who already have a working MAF setup and want to add production-grade telemetry to their agent pipelines. We'll cover what MAF instruments automatically, how to configure OpenTelemetry tracing and metrics, how to export data to Azure Monitor, and what practical signals to watch once your telemetry is flowing.

Why Observability Is Critical for AI Agents

AI agents are not deterministic. They call LLMs, invoke tools, maintain conversational context, and often branch across multiple reasoning steps. That non-determinism means traditional application monitoring -- CPU spikes, HTTP 500s, and response time histograms -- is not sufficient on its own.

There are three failure modes that make observability especially important in agent systems.

Cost overruns. Token usage is billed. An agent that gets stuck in a retry loop, or that passes unnecessarily verbose context in every message, can quietly burn through budget. Tracking token counts per turn, per session, and per agent lets you catch these patterns before they become expensive problems.

Latency degradation. A single agent turn can involve multiple LLM calls, tool invocations, and context lookups. One slow tool can cascade into a 10-second response. Without span-level visibility into each step, isolating the bottleneck is extremely difficult. You'll know the turn was slow -- you just won't know why.

Quality erosion. When agents start producing incorrect outputs, you need a way to correlate those outputs with the specific tool calls and prompt content that preceded them. Trace-level data, especially with content logging enabled, gives you that correlation path.

For a broader look at how multi-step agent systems can be structured, Building AI Agents with Semantic Kernel covers foundational patterns that carry over well into MAF workflows. And for reasoning about how agents coordinate in more complex scenarios, Multi-Agent Orchestration with Semantic Kernel covers coordination strategies worth understanding before trying to observe them in production.

Microsoft.Extensions.AI Built-in Telemetry Hooks

MAF is built on top of Microsoft.Extensions.AI, which defines a middleware pipeline for IChatClient. That pipeline includes built-in OpenTelemetry support through a delegating client (OpenTelemetryChatClient). When you call .UseOpenTelemetry() on the ChatClientBuilder, you register that component into the chain.

Here's what gets traced automatically when you add this middleware:

  • Chat completions. Each call to GetResponseAsync generates a span capturing the model name, provider, and total duration.
  • Streaming completions. Streaming calls get their own span covering the full stream lifecycle from first token to final chunk.
  • Token usage. Prompt tokens and completion tokens are recorded as span attributes and as counters on an OpenTelemetry meter.
  • Function and tool invocations. Each tool call gets its own child span, capturing the function name and (optionally) the arguments and result.
  • Errors. Exceptions during LLM calls or tool invocations set the span status to error and record the exception details.

The source name for traces is "Microsoft.Extensions.AI". Metrics are emitted on a meter with the same name. This means your OpenTelemetry configuration needs to subscribe to that source explicitly -- which we'll cover in the next section.

The logContents parameter on UseOpenTelemetry() controls whether prompt content and tool call arguments are included in span attributes. Setting it to true is useful during development but should be evaluated carefully in production, since prompt content may contain sensitive user data.
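One low-friction way to handle that trade-off is to gate content logging on the hosting environment, so local runs capture full prompts while production stays scrubbed. A sketch, assuming the logContents parameter described above and an IHostEnvironment available at composition time (the helper name is hypothetical):

```csharp
using Microsoft.Extensions.AI;
using Microsoft.Extensions.Hosting;

// Hypothetical helper: enable prompt/content logging only outside production.
static IChatClient BuildInstrumentedClient(IChatClient inner, IHostEnvironment env) =>
    inner.AsBuilder()
        .UseOpenTelemetry(logContents: !env.IsProduction())  // full content in dev, scrubbed in prod
        .UseFunctionInvocation()
        .Build();
```

The same toggle can also be driven from configuration if you want to enable content capture temporarily while investigating an incident.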

Setting Up OpenTelemetry with MAF

The first step is configuring a TracerProvider and MeterProvider that subscribe to the "Microsoft.Extensions.AI" source. You'll need the OpenTelemetry and OpenTelemetry.Extensions.Hosting NuGet packages, plus whichever exporter packages fit your environment.

Here's how to set up the OpenTelemetry providers directly for a console application or standalone host:

using OpenTelemetry;
using OpenTelemetry.Trace;
using OpenTelemetry.Metrics;

// Tracing setup -- subscribes to all MAF spans
var tracerProvider = Sdk.CreateTracerProviderBuilder()
    .AddSource("Microsoft.Extensions.AI")  // Capture MAF chat and tool spans
    .AddConsoleExporter()                   // For local development visibility
    .Build();

// Metrics setup -- subscribes to MAF token and latency meters
var meterProvider = Sdk.CreateMeterProviderBuilder()
    .AddMeter("Microsoft.Extensions.AI")   // Capture token usage, duration histograms
    .AddConsoleExporter()                   // For local development visibility
    .Build();

When using the generic host with dependency injection (the recommended approach for ASP.NET Core or Worker Service projects), you can register these providers through the IServiceCollection extensions instead:

builder.Services.AddOpenTelemetry()
    .WithTracing(tracing => tracing
        .AddSource("Microsoft.Extensions.AI")
        .AddConsoleExporter())
    .WithMetrics(metrics => metrics
        .AddMeter("Microsoft.Extensions.AI")
        .AddConsoleExporter());

Both patterns work. The hosted approach ties TracerProvider and MeterProvider lifetime to the application host, which handles disposal correctly on shutdown. For standalone console applications or integration tests, the manual Sdk.Create* approach gives you explicit control over provider lifecycle.

Note: Microsoft Agent Framework is in public preview (1.0.0-rc1). Source names, meter names, and API shapes may change before the GA release.

Adding the UseOpenTelemetry() Middleware to the IChatClient Pipeline

The telemetry middleware must be explicitly added to the IChatClient build chain. MAF wraps an underlying chat client (typically OpenAIChatClient) in a series of middleware layers. Each layer adds behavior -- retry logic, function invocation, streaming support, and observability.

Here's the pattern for wiring up telemetry alongside the function invocation middleware:

using Microsoft.Agents.AI;
using Microsoft.Extensions.AI;
using Azure.AI.OpenAI;
using Azure.Core;

// Build the underlying Azure OpenAI client
var openAIClient = new AzureOpenAIClient(
    new Uri(endpoint),
    new AzureKeyCredential(apiKey));

// Compose the IChatClient pipeline with observability enabled
IChatClient client = new OpenAIChatClient(openAIClient, deploymentName)
    .AsBuilder()
    .UseOpenTelemetry(logContents: true)  // Must come before UseFunctionInvocation
    .UseFunctionInvocation()               // Dispatches tool calls automatically
    .Build();

// Wrap as an AI agent with a system prompt
var agent = client.AsAIAgent(instructions: "You are a helpful assistant.");

The order of .UseOpenTelemetry() and .UseFunctionInvocation() matters here. Placing UseOpenTelemetry() before UseFunctionInvocation() means the telemetry middleware wraps the function invocation middleware. The result is an outer span for the full agent turn, with child spans for each tool call nested inside it. That hierarchy is the most useful structure for diagnosing where time is being spent within a single agent turn.

If you're building agents that stream responses, Streaming Responses with GitHub Copilot SDK covers how streaming patterns relate to span lifecycle in chat-based systems -- the span-per-stream model applies to MAF in the same way.

Key Metrics to Track

Once the providers and middleware are wired up, you have access to structured telemetry signals. The metrics most relevant for AI agent monitoring fall into four categories.

Token usage. The gen_ai.client.token.usage metric tracks prompt and completion token counts per request. This is your primary signal for cost monitoring. Watch for sessions where prompt tokens grow unbounded across turns -- it typically means context is being accumulated without any pruning or summarization strategy in place.

Request duration. The gen_ai.client.operation.duration histogram captures end-to-end LLM call latency. Aggregated as p50/p95/p99, this tells you what your typical and worst-case turn latencies look like across your entire agent fleet.

Tool call frequency. Spans for function invocations let you count how often each tool is called per turn. A spike in tool invocations for a single agent turn is a strong signal that the agent is struggling to resolve a task -- or that the tool itself is failing silently and being retried.

Error rate. Error spans from failed LLM calls or tool invocations give you a structured failure signal separate from application-level exceptions. When error rate climbs, correlate it with model deployment changes, prompt modifications, or tool API availability windows.

These metrics integrate naturally with Azure Monitor dashboards, Prometheus, Grafana, or any OpenTelemetry-compatible backend. For understanding how sessions contribute to accumulating context and why that directly affects prompt token counts, Managing Sessions and Context in GitHub Copilot SDK covers the context management patterns that drive what gets sent in each prompt.
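If the default histogram buckets don't line up with your latency range, an OpenTelemetry view can override them at the MeterProvider level. A sketch, assuming the gen_ai.client.operation.duration instrument name above (the Gen AI semantic conventions record duration in seconds):

```csharp
using OpenTelemetry;
using OpenTelemetry.Metrics;

var meterProvider = Sdk.CreateMeterProviderBuilder()
    .AddMeter("Microsoft.Extensions.AI")
    // Replace the default buckets with boundaries tuned for LLM call latency (seconds)
    .AddView("gen_ai.client.operation.duration",
        new ExplicitBucketHistogramConfiguration
        {
            Boundaries = new double[] { 0.5, 1, 2, 5, 10, 30, 60 }
        })
    .AddConsoleExporter()
    .Build();
```

Tighter buckets around your typical turn latency make p95/p99 estimates from the histogram meaningfully more accurate.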

Traces and Spans for Agent Runs and Tool Invocations

A single agent turn in MAF generates a span hierarchy. At the top is a gen_ai.chat span representing the full GetResponseAsync call. Below that, if tools are invoked, each tool call gets its own gen_ai.execute_tool child span. If the model makes multiple tool calls in a single turn, you'll see multiple sibling spans all nested under the parent completion span.

Key span attributes include:

  • gen_ai.system -- the model provider (e.g., openai, azure.openai)
  • gen_ai.request.model -- the deployment or model name
  • gen_ai.usage.input_tokens -- prompt tokens consumed
  • gen_ai.usage.output_tokens -- completion tokens generated
  • gen_ai.response.finish_reason -- how the model ended the turn (stop, tool_calls, length)

With logContents: true, full prompt messages and completion content are also attached as span events, giving you the ability to reconstruct exactly what the model received and returned for any given span in your trace backend.

This structure makes it possible to answer operational questions like: Which tool call made this turn slow? Did the model hit a context length limit on this session? At what turn in a multi-turn conversation did an error first appear?
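As an illustration, a single turn that triggers two tool calls might render in a trace viewer roughly like this (span names and timings are representative; the exact naming depends on the preview version):

```text
chat gpt-4o                          1.82 s   gen_ai.usage.input_tokens=1412
├── execute_tool get_weather         0.31 s
├── execute_tool get_forecast        0.28 s
└── chat gpt-4o (follow-up)          0.94 s   gen_ai.response.finish_reason=stop
```

The follow-up chat span exists because the function invocation middleware sends tool results back to the model for a final completion, so a tool-using turn always involves at least two LLM calls.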

For multi-agent systems where one agent invokes another as a subtask, the span hierarchy extends further -- the orchestrating agent's turn becomes the root span, and the sub-agent's turns are nested below. The Build a Multi-Agent Analysis System article shows how those delegation patterns look in practice, and the same span nesting concepts apply to MAF.

Connecting to Azure Monitor and Application Insights

For production workloads, you'll want to export traces and metrics to a backend that supports retention, querying, and alerting. Azure Monitor (via Application Insights) is the most natural fit for C#/.NET workloads already running on Azure infrastructure.

The Azure.Monitor.OpenTelemetry.Exporter package replaces the console exporters from the setup section:

using OpenTelemetry;
using OpenTelemetry.Trace;
using OpenTelemetry.Metrics;
using Azure.Monitor.OpenTelemetry.Exporter;

// Connection string from your Application Insights resource
var connectionString = Environment.GetEnvironmentVariable("APPLICATIONINSIGHTS_CONNECTION_STRING");

var tracerProvider = Sdk.CreateTracerProviderBuilder()
    .AddSource("Microsoft.Extensions.AI")
    .AddAzureMonitorTraceExporter(options =>
    {
        options.ConnectionString = connectionString;
    })
    .Build();

var meterProvider = Sdk.CreateMeterProviderBuilder()
    .AddMeter("Microsoft.Extensions.AI")
    .AddAzureMonitorMetricExporter(options =>
    {
        options.ConnectionString = connectionString;
    })
    .Build();

Once traces are flowing into Application Insights, the Transaction Search view lets you find individual agent runs, drill into their span trees, and inspect token counts and latency at each step. The Metrics blade lets you plot token usage over time. Alerts can notify you when error rates exceed a threshold or when average turn latency crosses a defined boundary.
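With the Azure Monitor exporter, client spans typically land in the dependencies table with Gen AI attributes in customDimensions. A sketch of a KQL query for the slowest agent calls over the last day (the table and attribute mapping may vary by exporter version, so verify against your own data first):

```kusto
dependencies
| where timestamp > ago(1d)
| where name startswith "chat"  // Gen AI chat completion spans
| extend inputTokens = toint(customDimensions["gen_ai.usage.input_tokens"])
| top 20 by duration desc
| project timestamp, name, duration, inputTokens, operation_Id
```

The operation_Id column links each row back to its full trace, so you can jump from a slow outlier straight to its span tree.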

The connection string is available from the Application Insights resource in the Azure portal under Settings > Properties. Never hardcode it in source -- store it in Azure Key Vault or an environment variable and reference it through your configuration provider.

For understanding how custom AI tools generate the function call spans that appear in your traces, Custom AI Tools with AIFunctionFactory shows how tools are defined and registered, and those same tools will produce child spans in your MAF traces once observability is enabled.

Practical Tips: Reading Traces and Setting Up Dashboards

Getting telemetry flowing is the first step. Using it effectively requires knowing what to look for and having the right queries ready.

Find slow turns first. Sort traces by duration and investigate the outliers. A turn that takes 15 seconds when typical turns take 2 seconds usually has a slow tool call buried inside. The span tree will point you directly to which child span is responsible, saving you from guessing.

Watch prompt token growth across sessions. If your agent accumulates context without bounds, token usage grows with every turn. You'll see this in the gen_ai.usage.input_tokens trend -- it climbs consistently across turns in a session. That growth pattern is a reliable signal to implement context pruning or summarization.
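When that trend appears, a simple bound on history length is often enough as a first mitigation. A minimal pruning sketch over Microsoft.Extensions.AI's ChatMessage type -- a hypothetical helper, not a MAF API -- that keeps the system message plus the most recent turns:

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.Extensions.AI;

static List<ChatMessage> PruneHistory(IReadOnlyList<ChatMessage> history, int maxRecent)
{
    var pruned = new List<ChatMessage>();

    // Always keep the system message so the agent's instructions survive pruning
    var system = history.FirstOrDefault(m => m.Role == ChatRole.System);
    if (system is not null)
        pruned.Add(system);

    // Keep only the most recent non-system messages
    pruned.AddRange(history.Where(m => m.Role != ChatRole.System)
                           .TakeLast(maxRecent));
    return pruned;
}
```

Summarization of dropped turns is the natural next step once simple truncation starts losing context the agent needs.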

Add custom span attributes for business context. The built-in spans capture model-level detail, but you often need to correlate agent behavior with application-level context like user IDs, session IDs, or tenant IDs. You can add these using manual instrumentation wrapping your agent calls:

using System.Diagnostics;

// Register this source name with your TracerProvider using .AddSource("MyAgentApp")
private static readonly ActivitySource ActivitySource = new("MyAgentApp");

public async Task<string> RunAgentTurnAsync(
    string userId,
    string sessionId,
    string prompt,
    AgentSession? session = null,
    CancellationToken cancellationToken = default)
{
    // Start a custom parent span -- MAF's spans will nest inside this one
    using var activity = ActivitySource.StartActivity("AgentTurn");

    activity?.SetTag("user.id", userId);
    activity?.SetTag("session.id", sessionId);
    activity?.SetTag("agent.name", "analysis-agent");

    var response = await _agent.RunAsync(prompt, session, cancellationToken);

    activity?.SetTag("agent.response.length", response.Text?.Length ?? 0);  // character count, not tokens

    return response.Text ?? string.Empty;
}

Don't forget to register "MyAgentApp" as an additional source with .AddSource("MyAgentApp") in your TracerProvider builder -- otherwise these spans will not be exported.

Build dashboards before you need them. In Application Insights, Workbooks let you define reusable dashboards backed by KQL queries. Set one up early with panels covering average turn latency, total token usage per hour, tool call distribution by function name, and error rate over time. Having the dashboard ready means you're using telemetry proactively, not only reactively after an incident.
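A workbook panel for hourly token spend can be driven by a query along these lines, assuming the Azure Monitor exporter writes the gen_ai.client.token.usage metric into customMetrics with the token type as a dimension (names may differ by version, so confirm against your ingested data):

```kusto
customMetrics
| where name == "gen_ai.client.token.usage"
| summarize totalTokens = sum(valueSum)
    by bin(timestamp, 1h), tokenType = tostring(customDimensions["gen_ai.token.type"])
| render timechart
```

Splitting by token type lets you see input and output token growth separately, which matters when the two are billed at different rates.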

For RAG-based agents where vector retrieval adds latency to each turn, RAG with Semantic Kernel covers the retrieval patterns that benefit most from being wrapped in custom child spans so you can separate retrieval latency from LLM latency in your traces.

FAQ

What source name do I use when subscribing to MAF telemetry in OpenTelemetry?

The correct source and meter name is "Microsoft.Extensions.AI" for both tracing (AddSource) and metrics (AddMeter). MAF builds on the Microsoft.Extensions.AI abstraction layer, which emits all spans and metrics under this identifier. If you use the wrong name, your providers will simply not capture any MAF telemetry -- no errors, just silence.

Does UseOpenTelemetry() trace tool calls automatically or only LLM calls?

When combined with UseFunctionInvocation() in the middleware pipeline, UseOpenTelemetry() traces both LLM calls and individual tool invocations. Each tool call gets a child span nested under the parent chat completion span. The nesting gives you a clear picture of how time is distributed between the model's inference step and the tool execution steps within a single agent turn.

Is it safe to set logContents to true in production?

That depends entirely on what your prompts contain. Setting logContents: true attaches full prompt messages and completion content to span attributes, which then flow into your telemetry backend. If your prompts include PII, medical data, financial information, or any sensitive user input, enabling content logging in production is a significant privacy risk. Evaluate your data classification requirements and consider using sampling or filtering at the OpenTelemetry Collector layer if you need partial content capture without full exposure.

How does OpenTelemetry observability in Microsoft Agent Framework fit into an ASP.NET Core application?

When running MAF inside an ASP.NET Core application, use builder.Services.AddOpenTelemetry() with .WithTracing() and .WithMetrics() extension methods during startup. This ties the TracerProvider and MeterProvider lifecycle to the application host so they're properly disposed when the app shuts down. Register your IChatClient pipeline with .UseOpenTelemetry() in the same composition root, and telemetry will flow through to whichever exporters you configure.

Can I differentiate spans from different agent instances in the same trace backend?

Not directly through the built-in middleware -- UseOpenTelemetry() instruments the IChatClient interface uniformly. However, you can differentiate agents by wrapping their calls in custom Activity spans (as shown in the manual instrumentation example above) and tagging those spans with agent-specific attributes like agent.name or agent.type. This makes it straightforward to segment and filter traces by agent in downstream dashboards.

What happens to telemetry data if the Azure Monitor exporter is temporarily unavailable?

OpenTelemetry exporters are fire-and-forget by default. If the Azure Monitor ingestion endpoint is temporarily unavailable, spans and metrics that fail to export are dropped after the in-memory buffer fills. The exporter does not retry indefinitely or persist to disk. For mission-critical observability requirements, consider deploying an OpenTelemetry Collector as a sidecar. The collector buffers and retries exports independently of your application process, improving resilience against transient network issues.
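A minimal sidecar configuration for that pattern might look like the following, assuming your application switches to the OTLP exporter pointed at the sidecar and you run the contrib build of the collector (which includes the azuremonitor exporter); treat this as a sketch to validate against your collector version:

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  batch:  # batch spans/metrics before export to reduce request volume

exporters:
  azuremonitor:
    connection_string: ${env:APPLICATIONINSIGHTS_CONNECTION_STRING}

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [azuremonitor]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [azuremonitor]
```

The collector's retry and queueing behavior then absorbs transient backend outages without your agent process holding telemetry in memory.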

Does MAF's telemetry follow the OpenTelemetry Semantic Conventions for Generative AI?

Yes. The Microsoft.Extensions.AI telemetry implementation targets the OpenTelemetry Semantic Conventions for Generative AI Systems (in experimental status as of this writing). Attributes like gen_ai.system, gen_ai.request.model, gen_ai.usage.input_tokens, and gen_ai.usage.output_tokens all follow that spec. Aligning with the spec means your dashboard queries and alert rules should remain compatible as both MAF and the semantic conventions mature toward stability.

Wrapping Up

Getting OpenTelemetry observability right in Microsoft Agent Framework is not complicated, but it does require deliberate setup. The Microsoft.Extensions.AI middleware pipeline provides the hooks, OpenTelemetry provides the collection and export layer, and Azure Monitor or any compatible backend provides the retention and analysis surface.

The key steps are: subscribe your TracerProvider and MeterProvider to the "Microsoft.Extensions.AI" source, add UseOpenTelemetry() before UseFunctionInvocation() in the IChatClient build chain, and configure an exporter appropriate for your target environment. From there, the telemetry data tells you where time is going, how tokens are being spent, and where failures are occurring -- all without requiring changes to your agent logic.

As MAF moves from public preview (1.0.0-rc1) toward GA, some API details will shift. But the OpenTelemetry contracts and observability patterns described here are built on stable foundations that predate MAF itself. Investing in observability now means you'll have the data you need to optimize, debug, and scale your agents as the framework matures.

For broader context on building sophisticated agent systems in C#, Semantic Kernel in C# Complete Guide covers the orchestration fundamentals that complement the observability work described here -- understanding how agents are composed makes it much easier to reason about what you're seeing in your traces.

