C# String to Byte Array: UTF-8, Encoding, and Span Conversions


Converting a C# string to a byte array is a fundamental operation in .NET applications. You need it when writing to streams, sending data over a network, storing strings in binary format, computing checksums, or interfacing with low-level APIs that work in bytes rather than characters. The right way to do this conversion depends on the encoding you need, the performance requirements, and the .NET version you are targeting.

This guide covers all the approaches: classic Encoding.UTF8.GetBytes(), allocation-free Span<byte> methods, the .NET 7 u8 literal for compile-time UTF-8 bytes, and how to choose between them.


Why Encoding Matters

Before jumping to code, let us clarify one thing: a string is not bytes, and bytes are not a string. A .NET string is internally stored as UTF-16 -- each character takes 2 or 4 bytes. When you convert a string to bytes, you encode it using a specific encoding standard:

  • UTF-8 -- Variable-width, 1-4 bytes per character. The universal standard for text on the internet. ASCII characters use only 1 byte.
  • UTF-16 -- .NET's internal encoding. Use it when interoperating with Windows APIs or legacy systems.
  • ASCII -- 1 byte per character, only covers 128 characters. Data is lost if the string contains non-ASCII characters.
  • ISO-8859-1 / Latin-1 -- 1 byte per character, covers Western European characters.

For the vast majority of cases, UTF-8 is the correct choice. It is compact for ASCII-heavy text, universal, and what every web standard expects.
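To make the size differences concrete, here is a small sketch that counts the bytes one string needs under each encoding from the list above. The sample string is made up for illustration, and Encoding.Latin1 assumes .NET 5 or later.

```csharp
using System;
using System.Text;

// Illustrative sample: 'é' is written as an escape so the source file's own
// encoding cannot change the result.
string sample = "h\u00E9llo";

int utf8 = Encoding.UTF8.GetByteCount(sample);      // 'é' needs 2 bytes, the rest 1 each
int utf16 = Encoding.Unicode.GetByteCount(sample);  // 2 bytes per UTF-16 code unit
int ascii = Encoding.ASCII.GetByteCount(sample);    // 1 byte each; 'é' is replaced by '?'
int latin1 = Encoding.Latin1.GetByteCount(sample);  // 1 byte each; 'é' is representable

Console.WriteLine($"UTF-8: {utf8}, UTF-16: {utf16}, ASCII: {ascii}, Latin-1: {latin1}");
// UTF-8: 6, UTF-16: 10, ASCII: 5, Latin-1: 5
```

ASCII and Latin-1 report the same count here, but only Latin-1 can round-trip the string: the ASCII bytes contain a question mark where the accented character used to be.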


Method 1: Encoding.UTF8.GetBytes() (Classic Approach)

The simplest way to convert a string to a UTF-8 byte array:

using System.Text;

var text = "Hello, World!";
byte[] bytes = Encoding.UTF8.GetBytes(text);

Console.WriteLine(bytes.Length);     // 13
Console.WriteLine(bytes[0]);         // 72 ('H' in ASCII/UTF-8)

Encoding.UTF8 is a static, thread-safe instance. You do not need to create a new Encoding object each time. This method allocates a new byte array for the result.

Getting the Byte Count First

If you want to size a buffer precisely before encoding, call GetByteCount first to avoid over-allocating. This is useful when you rent a buffer from ArrayPool and want to request no more than the required size (Rent may still return a larger array):

var text = "Hello, 世界";
int byteCount = Encoding.UTF8.GetByteCount(text); // 13: 7 ASCII bytes + 6 (3 bytes each for 世 and 界)
byte[] buffer = new byte[byteCount];
int written = Encoding.UTF8.GetBytes(text, buffer);
Console.WriteLine($"Encoded {written} bytes");

Method 2: Encoding.GetBytes() with Span (Allocation-Free, .NET 5+)

If you already have a buffer (for example, from an ArrayPool or a stack allocation), you can encode directly into it without allocating a new byte array:

using System.Buffers;
using System.Text;

namespace ByteConversionDemo;

public static class Utf8Encoder
{
    public static int EncodeToRentedBuffer(string text, out byte[] rentedBuffer)
    {
        int maxByteCount = Encoding.UTF8.GetMaxByteCount(text.Length);
        rentedBuffer = ArrayPool<byte>.Shared.Rent(maxByteCount);

        int written = Encoding.UTF8.GetBytes(text, rentedBuffer);
        return written;
    }
}

This pattern is important in high-throughput scenarios where you want to avoid heap allocations. ArrayPool<byte>.Shared rents a reusable buffer from a shared pool, and you return it when done.
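The rent-and-return cycle can be sketched as follows. WriteUtf8 is a hypothetical helper, not part of the article's Utf8Encoder class, and writing to a Stream is just one possible way to consume the bytes. The try/finally ensures the buffer goes back to the pool even if the write throws.

```csharp
using System;
using System.Buffers;
using System.IO;
using System.Text;

using var stream = new MemoryStream();
WriteUtf8(stream, "Hello, 世界");
Console.WriteLine(stream.Length); // 13

// Hypothetical helper: encode into a rented buffer, write it out,
// and always hand the buffer back to the pool.
static void WriteUtf8(Stream destination, string text)
{
    byte[] rented = ArrayPool<byte>.Shared.Rent(Encoding.UTF8.GetMaxByteCount(text.Length));
    try
    {
        // Rent may return a larger array than requested, so only the
        // first 'written' bytes hold valid data.
        int written = Encoding.UTF8.GetBytes(text.AsSpan(), rented.AsSpan());
        destination.Write(rented, 0, written);
    }
    finally
    {
        ArrayPool<byte>.Shared.Return(rented);
    }
}
```

Keeping the rent and the return in the same method makes it hard to leak a buffer or to touch it after it has been returned.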

Span-Based Encoding

The Encoding.UTF8.GetBytes(ReadOnlySpan<char>, Span<byte>) overload is the most efficient for encoding substrings without creating intermediate strings:

Span<byte> stackBuffer = stackalloc byte[256];
var text = "Hello, World!";

int written = Encoding.UTF8.GetBytes(text.AsSpan(), stackBuffer);
var encoded = stackBuffer[..written];

Console.WriteLine($"Encoded {written} bytes");

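For short strings whose UTF-8 form fits in the buffer (256 bytes here, which holds 256 ASCII characters but only 85 characters of three-byte text such as CJK), stack allocation (stackalloc) avoids any heap allocation at all. GetBytes throws an ArgumentException if the destination span is too small, so check GetByteCount or GetMaxByteCount first when the input length is not known in advance. This is one of the highest-performance options available.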
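As an illustration of encoding a substring without creating an intermediate string, the sketch below slices a header line and encodes only the value part. EncodeHeaderValue, the header text, and the 64-byte buffer size are assumptions chosen for the example.

```csharp
using System;
using System.Text;

string header = "Content-Type: text/plain";
Console.WriteLine(EncodeHeaderValue(header)); // 10

// Hypothetical helper: encodes only the part after "Content-Type: "
// and returns the number of bytes written.
static int EncodeHeaderValue(string headerLine)
{
    // Slice the characters instead of calling Substring, so no
    // intermediate string is allocated.
    ReadOnlySpan<char> value = headerLine.AsSpan("Content-Type: ".Length);

    Span<byte> buffer = stackalloc byte[64];

    // Guard the worst case up front rather than letting GetBytes throw.
    if (Encoding.UTF8.GetMaxByteCount(value.Length) > buffer.Length)
    {
        throw new ArgumentException("Header value is too long for the stack buffer.", nameof(headerLine));
    }

    return Encoding.UTF8.GetBytes(value, buffer);
}
```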


Method 3: UTF-8 String Literals (.NET 7+, u8 Suffix)

When the string is a compile-time constant, you can use the u8 suffix to get the UTF-8 bytes without any runtime conversion:

// Compile-time UTF-8 bytes -- zero runtime cost
ReadOnlySpan<byte> hello = "Hello, World!"u8;
ReadOnlySpan<byte> contentType = "application/json"u8;
ReadOnlySpan<byte> crlf = "\r\n"u8;

Console.WriteLine(hello.Length);  // 13

The bytes are embedded directly in the compiled assembly. At runtime, accessing them is as fast as accessing any other readonly data. No encoding computation occurs.
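One place these literals are handy is matching a known prefix in received bytes without decoding them into a string first. The sketch below assumes C# 11 or later; IsHttp11 and the sample status lines are hypothetical.

```csharp
using System;
using System.Text;

byte[] received = Encoding.UTF8.GetBytes("HTTP/1.1 200 OK");
byte[] other = Encoding.UTF8.GetBytes("HTTP/2 200");

Console.WriteLine(IsHttp11(received)); // True
Console.WriteLine(IsHttp11(other));    // False

// Hypothetical helper: compares raw bytes against a UTF-8 literal,
// so nothing is decoded or allocated for the check.
static bool IsHttp11(ReadOnlySpan<byte> data) => data.StartsWith("HTTP/1.1"u8);
```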

Storing as ReadOnlyMemory

If you need to persist the bytes beyond the current stack frame, ReadOnlySpan<byte> will not work because it is a ref struct restricted to the stack. Convert to ReadOnlyMemory<byte> instead to store the bytes as a field or pass them across async boundaries:

private static readonly ReadOnlyMemory<byte> JsonContentType =
    "application/json"u8.ToArray();

ToArray() allocates once at startup. The field is then reused without allocation throughout the application's lifetime.


Method 4: MemoryMarshal for UTF-16 Bytes

Sometimes you need the raw UTF-16 bytes of a string (for example, when writing to a Windows Named Pipe or a COM interface). Use MemoryMarshal:

using System.Runtime.InteropServices;

var text = "Hello";
ReadOnlySpan<byte> utf16Bytes = MemoryMarshal.AsBytes(text.AsSpan());

Console.WriteLine(utf16Bytes.Length); // 10 -- 2 bytes per char

This is a zero-copy operation. No memory is allocated; you get a view of the string's internal UTF-16 buffer. Be careful: these bytes are platform-endian (little-endian on most modern hardware).
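The endianness caveat can be checked directly. This sketch compares the raw view with the two fixed-order UTF-16 encodings that .NET provides; the sample string is arbitrary. If the receiving side expects a specific byte order, Encoding.Unicode (little-endian) or Encoding.BigEndianUnicode states that intent explicitly, at the cost of an allocation.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;

string text = "Hi";

// View over the string's own memory, in the machine's native byte order.
ReadOnlySpan<byte> raw = MemoryMarshal.AsBytes(text.AsSpan());

byte[] littleEndian = Encoding.Unicode.GetBytes(text);          // always 48 00 69 00
byte[] bigEndian = Encoding.BigEndianUnicode.GetBytes(text);    // always 00 48 00 69

// The raw view matches whichever fixed-order encoding agrees with the hardware.
byte[] expected = BitConverter.IsLittleEndian ? littleEndian : bigEndian;
Console.WriteLine(raw.SequenceEqual(expected.AsSpan())); // True
```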


Method 5: Converting Back -- Byte Array to String

The reverse operation decodes a byte array back into a .NET string. Use Encoding.UTF8.GetString() and match the encoding you used when encoding -- using the wrong encoding produces garbled text:

byte[] bytes = new byte[] { 72, 101, 108, 108, 111 }; // "Hello"
string text = Encoding.UTF8.GetString(bytes);
Console.WriteLine(text); // Hello

// Span-based overload (can decode a slice without copying it into a new array)
ReadOnlySpan<byte> span = bytes.AsSpan();
string fromSpan = Encoding.UTF8.GetString(span);
Console.WriteLine(fromSpan); // Hello

For large buffers received from network or file I/O, where only part of the array holds data, prefer the Span-based overload: it decodes a slice directly, without copying that slice into a new array first.
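The following sketch shows both points from this section: decoding only the filled part of a larger buffer, and what happens when the decoding side picks a different encoding. The buffer size and sample text are illustrative, and Encoding.Latin1 assumes .NET 5 or later.

```csharp
using System;
using System.Text;

// A buffer that is larger than the data it holds, as is typical after a read.
byte[] buffer = new byte[16];
int written = Encoding.UTF8.GetBytes("caf\u00E9".AsSpan(), buffer.AsSpan()); // 5 bytes

// Decode only the filled slice; the unused tail of the buffer is ignored.
string decoded = Encoding.UTF8.GetString(buffer.AsSpan(0, written));

// Same bytes, wrong encoding: each byte of the two-byte 'é' becomes its own character.
string garbled = Encoding.Latin1.GetString(buffer.AsSpan(0, written));

Console.WriteLine(decoded.Length); // 4
Console.WriteLine(garbled.Length); // 5
```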


Choosing the Right Approach

The best method depends on whether your string is known at compile time, how large it is, and whether allocations matter in that context. Use this table as a quick decision reference:

| Scenario | Recommended Method |
|----------|---------------------|
| General purpose, allocating | Encoding.UTF8.GetBytes(string) |
| Reusing a pooled buffer | Encoding.UTF8.GetBytes(ReadOnlySpan<char>, Span<byte>) |
| Compile-time constant string | "..."u8 literal (.NET 7+) |
| Stack buffer for short strings | stackalloc + Encoding.UTF8.GetBytes |
| Storing as ReadOnlyMemory<byte> | "..."u8.ToArray() (allocated once) |
| UTF-16 interop | MemoryMarshal.AsBytes(text.AsSpan()) |


Practical Example: HTTP Request Body

Here is a realistic example -- encoding a JSON string as UTF-8 and sending it as the body of an HttpClient request:

using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;

namespace ByteConversionDemo;

public sealed class ApiClient
{
    private readonly HttpClient _httpClient;

    public ApiClient(HttpClient httpClient)
    {
        _httpClient = httpClient;
    }

    public async Task<HttpResponseMessage> PostJsonAsync(
        string url,
        string json,
        CancellationToken cancellationToken = default)
    {
        var bytes = Encoding.UTF8.GetBytes(json);
        var content = new ByteArrayContent(bytes);
        content.Headers.ContentType = new MediaTypeHeaderValue("application/json")
        {
            CharSet = "utf-8"
        };

        return await _httpClient.PostAsync(url, content, cancellationToken);
    }
}

For higher-performance scenarios (like sending many requests per second), you would use ArrayPool<byte> to rent the buffer and return it after the request completes.


Practical Example: Computing a SHA-256 Hash

Hashing a string requires first converting it to bytes:

using System.Security.Cryptography;
using System.Text;

namespace ByteConversionDemo;

public static class StringHasher
{
    public static string ComputeSha256(string input)
    {
        var bytes = Encoding.UTF8.GetBytes(input);
        var hash = SHA256.HashData(bytes);
        return Convert.ToHexString(hash).ToLowerInvariant();
    }

    // Low-allocation version: stackalloc for short inputs, heap fallback for long ones
    public static string ComputeSha256Fast(string input)
    {
        // Cap the stack buffer; an unbounded stackalloc can overflow the stack
        int maxBytes = Encoding.UTF8.GetMaxByteCount(input.Length);
        Span<byte> inputBytes = maxBytes <= 512 ? stackalloc byte[maxBytes] : new byte[maxBytes];
        int inputLength = Encoding.UTF8.GetBytes(input, inputBytes);
        inputBytes = inputBytes[..inputLength];

        Span<byte> hash = stackalloc byte[32]; // SHA-256 is always 32 bytes
        SHA256.HashData(inputBytes, hash);

        return Convert.ToHexString(hash).ToLowerInvariant();
    }
}

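The fast version uses stackalloc for both the input bytes and the hash output, so for short strings the intermediate buffers never touch the heap; only the hex strings produced at the end are allocated. Keep stack buffers small, because a large stackalloc can overflow the stack.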


Common Mistakes to Avoid

A few encoding mistakes are very common in .NET codebases. Each one is easy to make and easy to miss in code review -- here are the most important ones to watch for:

Mistake 1: Using ASCII for General Text

ASCII encoding only covers 128 characters. Any character outside that range -- including virtually all non-English text -- is silently replaced with a question mark, corrupting the data:

// ❌ Non-ASCII characters are silently replaced
byte[] bad = Encoding.ASCII.GetBytes("Hello, 世界"); // '世' and '界' become '?'

// ✅ UTF-8 handles all Unicode
byte[] good = Encoding.UTF8.GetBytes("Hello, 世界");
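If a protocol really does require ASCII, one way to avoid the silent replacement is an encoding instance that throws instead. The sketch below is one possible approach, not something the article prescribes; TryEncodeAscii is a hypothetical helper.

```csharp
using System;
using System.Text;

// An ASCII encoding that throws on unrepresentable characters
// instead of substituting '?'.
Encoding strictAscii = Encoding.GetEncoding(
    "us-ascii",
    EncoderFallback.ExceptionFallback,
    DecoderFallback.ExceptionFallback);

Console.WriteLine(TryEncodeAscii("Hello", out _));        // True
Console.WriteLine(TryEncodeAscii("Hello, 世界", out _));   // False

// Hypothetical helper: reports failure rather than returning corrupted bytes.
bool TryEncodeAscii(string text, out byte[] bytes)
{
    try
    {
        bytes = strictAscii.GetBytes(text);
        return true;
    }
    catch (EncoderFallbackException)
    {
        bytes = Array.Empty<byte>();
        return false;
    }
}
```

Create the strict instance once and reuse it, for the same reason the shared Encoding.UTF8 instance is preferred over new objects.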


Mistake 2: Creating New Encoding Instances

Encoding.UTF8 is a static, thread-safe singleton. There is no reason to instantiate a new UTF8Encoding object on every call -- it wastes memory and adds unnecessary pressure on the garbage collector:

// ❌ Allocates a new encoding object on every call
byte[] bytes = new UTF8Encoding().GetBytes("Hello");

// ✅ Use the static instance
byte[] bytes2 = Encoding.UTF8.GetBytes("Hello");

Mistake 3: Encoding String for Comparison Instead of Using StringComparison

Converting both strings to bytes just to compare them is wasteful -- it allocates two byte arrays and throws them away immediately. Use the built-in string comparison APIs instead:

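// ❌ Allocates two byte arrays just to compare
bool equal = Encoding.UTF8.GetBytes(a).SequenceEqual(Encoding.UTF8.GetBytes(b));

// ✅ Use string comparison directly (Ordinal keeps the comparison case-sensitive, like the byte version)
bool correct = string.Equals(a, b, StringComparison.Ordinal);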


---

## Integration with Broader .NET Patterns

String-to-byte conversion comes up across the .NET ecosystem. When building [feature slices in C#](https://www.devleader.ca/2026/04/15/feature-slicing-in-c-organizing-code-by-feature), you often serialize command or query objects to bytes for caching or messaging. When working with [CQRS with feature slices in C#](https://www.devleader.ca/2026/04/25/cqrs-with-feature-slices-in-c-commands-and-queries-per-feature), string identifiers and message bodies need to be encoded for transmission.

If you are building an AI-powered feature like [a semantic search engine with Semantic Kernel in C#](https://www.devleader.ca/2026/03/18/build-a-semantic-search-engine-with-semantic-kernel-in-c), string encoding to bytes is part of the embedding pipeline -- text is converted to bytes before being sent to the embedding model's API.

The [singleton design pattern in C#](https://www.devleader.ca/2026/03/15/singleton-design-pattern-in-c-complete-guide-with-examples) is relevant here too -- a `static readonly SearchValues<byte>` or a singleton `Encoding` instance is a common pattern for reusing pre-built, thread-safe objects in high-throughput code paths.

---

## Performance Benchmarks Summary

For rough guidance on relative performance (actual measurements will vary by string length and target hardware -- always benchmark in your specific context):

| Method | Allocates? | Notes |
|--------|-----------|-------|
| `Encoding.UTF8.GetBytes(string)` | Yes (byte[]) | Simplest, fine for most cases |
| `GetBytes(ReadOnlySpan<char>, Span<byte>)` | No | Best for reusing buffers |
| `"..."u8` literal | No | Zero cost, compile-time only |
| `stackalloc` buffer | No (stack) | Best for short strings (<256 chars) |
| `ArrayPool` buffer | No (pooled) | Best for larger or variable-length strings |

---

## FAQ

Here are the most common questions developers ask about converting strings to byte arrays and encoding in C#.



### Which encoding should I use to convert a string to a byte array?

Use UTF-8 (`Encoding.UTF8`) in almost all cases. It is the standard for web protocols, file I/O, and APIs. Only use other encodings when a specific system or protocol requires it.

### Is Encoding.UTF8.GetBytes thread-safe?

Yes -- `Encoding.UTF8` is a thread-safe singleton that the runtime creates once and caches. You can call `GetBytes` from multiple threads simultaneously without any synchronization, and the behavior is always correct and predictable.

### What is the difference between GetBytes and GetByteCount?

`GetByteCount` calculates how many bytes are needed to encode the string without actually encoding it. Use it to pre-size a buffer. `GetBytes` performs the actual encoding and writes the bytes into the provided buffer or a new array.

### How do I convert bytes back to a string in C#?

Use `Encoding.UTF8.GetString(byte[])`, or the `Span`-based overload `Encoding.UTF8.GetString(ReadOnlySpan<byte>)` to decode a slice of an existing buffer without copying it into a new array first. Either way, a new string is allocated for the result.

### Can I avoid heap allocation when converting strings to bytes?

Yes. Use `stackalloc` for short strings (up to a few hundred bytes), `ArrayPool<byte>.Shared` for larger or variable-length strings, and the `u8` literal suffix for compile-time constant strings. The `Span<byte>`-based overloads of `Encoding.GetBytes` write into these buffers without allocating.

### What is the u8 suffix in .NET 7?

The `u8` suffix creates a `ReadOnlySpan<byte>` with the UTF-8 bytes of a string literal, computed at compile time. It is zero-cost at runtime and is the best choice for constant byte sequences like protocol headers and delimiters.

### What happens if I use ASCII encoding with Unicode characters?

Non-ASCII characters (any character with a code point above 127) are replaced with the ASCII substitution character `?` (byte value 63). Data is permanently lost. Always use UTF-8 when there is any possibility of non-ASCII content.
