How to Convert a String to Byte Array in C#: Encoding and Decoding Simplified

04/21/2024


In programming, it's very common that we need to go between strings and bytes. Humans read strings. Computers read bytes. As a result, it's important for us dotnet developers to understand how to convert a string to a byte array in C#. We accomplish this through a process called encoding and decoding to go back and forth between the two.

In this article, I'll provide you code examples to be able to convert a string to a byte array in C# -- and back again! You'll also learn about some of the nuances of character encodings to look out for!

Encoding and Decoding in C#

If we want to convert a string to a byte array -- or go the other way -- we need to understand the concept of encoding and decoding. In software engineering, encoding refers to the process of transforming a string into a sequence of bytes, while decoding involves the reverse process of transforming bytes back into a string. Simple, right?

Remember that strings are a sequence of characters. The concept of a character makes a lot of sense to us as readers, but computers understand bytes. Each character is represented by a number inside the computer, so when we want to work with strings at a lower level, such as when sending data over a network or storing it in a file, we need to convert the string into a byte array.

Encoding comes into play when converting strings to byte arrays. It determines how the characters in the string are represented as bytes. The encoding scheme defines the mapping between characters and their byte representations. Common encoding schemes include ASCII, UTF-8, UTF-16, and UTF-32. Note that Unicode itself is a character set rather than an encoding, although .NET uses the name Encoding.Unicode for little-endian UTF-16.

Choosing the correct encoding is important because different encoding schemes support different sets of characters. For example, ASCII only supports the basic English alphabet (with a few more characters), while UTF-8 and UTF-16 are capable of representing characters from multiple languages and scripts. More on this later though since I know you're eager for some code!
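Before digging into the individual encodings, here's a minimal sketch of the full round trip: encoding a string to bytes and decoding those bytes back into a string. The variable names are just for illustration, and it assumes a modern .NET project where top-level statements are available.

```csharp
using System;
using System.Text;

string original = "Hello, World!";

// Encode: string -> bytes
byte[] bytes = Encoding.UTF8.GetBytes(original);

// Decode: bytes -> string, using the SAME encoding we encoded with
string roundTripped = Encoding.UTF8.GetString(bytes);

Console.WriteLine(bytes.Length);             // 13 (one byte per ASCII character)
Console.WriteLine(roundTripped == original); // True
```

The key detail is that GetBytes and GetString are called on the same encoding. Decoding with a different encoding than the one used to encode is a common source of garbled text.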

Choosing an Encoding for String to Byte Array Conversion

When converting strings to byte arrays in C#, one of the most important things to consider is character encoding. Character encoding determines the mapping between characters and byte values, and if you're considering transforming data one way then you may want to put some thought into how to transform it back!

What's that supposed to mean? Well, let's say we take a string and transform it to bytes with an ASCII encoding. If a particular character has no mapping to a byte representation, we lose that data in the result. In .NET, Encoding.ASCII substitutes a question mark for any character it can't represent. Now what happens if you want to go the other way and get your byte array back to a string?

Data is missing!
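To make that concrete, here's a small sketch of a lossy round trip through ASCII. The sample word is my own choice for illustration; the accented character is written as an escape so the result doesn't depend on how the source file is saved.

```csharp
using System;
using System.Text;

string original = "caf\u00E9"; // "café"

// The é has no ASCII mapping, so it is replaced with '?' (0x3F)
byte[] asciiBytes = Encoding.ASCII.GetBytes(original);
string decoded = Encoding.ASCII.GetString(asciiBytes);

Console.WriteLine(decoded);             // caf?
Console.WriteLine(decoded == original); // False
```

Once the bytes are written out, there's no way to tell whether that question mark was originally an é or a literal question mark.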

In C#, there are various encodings available, including ASCII, UTF-8, and UTF-16, each with its specific characteristics and usage scenarios. Let's explore these encodings and see how they can be used for string to byte array conversion.

ASCII Encoding

ASCII encoding represents characters using 7 bits, allowing for a total of 128 different characters. It's primarily suitable for handling basic English characters, and it's compact: every character takes exactly one byte, which is the same size UTF-8 uses for those characters. Here's an example of converting a string to a byte array using ASCII encoding:

using System.Text;

string text = "Hello, World!";
byte[] asciiBytes = Encoding.ASCII.GetBytes(text);

In many modern applications, ASCII may not be what you're after. This is especially true if you have users across the globe in locales that are not English. That's not to say that ASCII *cannot* be used, but you'll want to be careful about what data you ASCII encode so as to not lose information during the encoding transform.
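One way to be careful is to check whether a string is pure ASCII before encoding it that way. Here's a rough sketch of that idea. IsAscii is a hypothetical helper I'm defining here, not a built-in method, and falling back to UTF-8 is just one possible policy.

```csharp
using System;
using System.Linq;
using System.Text;

// Hypothetical helper: true when every char fits in the 7-bit ASCII range
bool IsAscii(string text) => text.All(c => c <= 0x7F);

string safe = "Hello, World!";
string risky = "na\u00EFve"; // "naïve"

Console.WriteLine(IsAscii(safe));  // True
Console.WriteLine(IsAscii(risky)); // False

// Only use ASCII when nothing would be lost; otherwise fall back to UTF-8
byte[] bytes = IsAscii(risky)
    ? Encoding.ASCII.GetBytes(risky)
    : Encoding.UTF8.GetBytes(risky);

Console.WriteLine(bytes.Length); // 6 (four 1-byte characters plus one 2-byte character)
```

If you go this route, whoever decodes the bytes needs to know which encoding was picked. Since UTF-8 is backward compatible with ASCII, always decoding as UTF-8 would work for both branches here.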

So keep in mind that ASCII only covers those 128 characters. With the growing need for internationalization and multilingual support, ASCII alone is insufficient to represent all characters effectively.

Good thing we have some other options coming right up!

UTF-8 Encoding

UTF-8 encoding is a variable-length encoding scheme that can represent any Unicode character. It's widely used for encoding text in various languages and is backward compatible with ASCII. ASCII characters take a single byte each, while non-ASCII characters take between 2 and 4 bytes. Here's an example of converting a string to a byte array using UTF-8 encoding:

string text = "Привет, мир!";
byte[] utf8Bytes = Encoding.UTF8.GetBytes(text);
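To see the variable-length behavior in action, this sketch compares the character count of that same string to its UTF-8 byte count and then decodes it back. The Cyrillic letters take 2 bytes each, while the comma, space, and exclamation mark take 1 byte each.

```csharp
using System;
using System.Text;

string text = "Привет, мир!";
byte[] utf8Bytes = Encoding.UTF8.GetBytes(text);

Console.WriteLine(text.Length);      // 12 characters
Console.WriteLine(utf8Bytes.Length); // 21 bytes (9 Cyrillic letters x 2, plus 3 ASCII characters)

// Decoding with the same encoding restores the original text
string decoded = Encoding.UTF8.GetString(utf8Bytes);
Console.WriteLine(decoded == text);  // True
```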

UTF-16 Encoding

UTF-16 encoding represents each character using either 2 or 4 bytes, making it capable of representing any Unicode character. It's the encoding .NET uses for strings in memory, and it's commonly used by applications that need to handle multilingual text or when interoperability with other systems is required. The encoding can be little-endian or big-endian, with the former being more prevalent. In .NET, Encoding.Unicode is little-endian UTF-16 and Encoding.BigEndianUnicode is the big-endian variant. Here's an example of converting a string to a byte array using UTF-16 encoding:

string text = "こんにちは、世界！";
byte[] utf16Bytes = Encoding.Unicode.GetBytes(text);
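Here's a short sketch showing what the byte order difference looks like in practice. It encodes the same string with Encoding.Unicode (little-endian) and Encoding.BigEndianUnicode and prints the first two bytes of each. The first character, こ, is code point U+3053.

```csharp
using System;
using System.Text;

string text = "こんにちは、世界！";

byte[] littleEndian = Encoding.Unicode.GetBytes(text);
byte[] bigEndian = Encoding.BigEndianUnicode.GetBytes(text);

Console.WriteLine(littleEndian.Length); // 18 (9 characters x 2 bytes)

// Same code point, opposite byte order
Console.WriteLine($"{littleEndian[0]:X2} {littleEndian[1]:X2}"); // 53 30
Console.WriteLine($"{bigEndian[0]:X2} {bigEndian[1]:X2}");       // 30 53

// Decode with the matching byte order to get the text back
string decoded = Encoding.Unicode.GetString(littleEndian);
Console.WriteLine(decoded == text); // True
```

Decoding little-endian bytes with the big-endian encoding (or the reverse) won't throw, but it will produce the wrong characters, so both sides need to agree on byte order.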

UTF-8 Encoding vs UTF-16 Encoding - What's The Difference?

Both of these encodings are variable-width, but their different unit sizes give them different characteristics when we consider different alphabets.

Starting with UTF-8, it's variable-width and backward-compatible with ASCII. In this encoding:

ASCII characters, which are in the range U+0000 to U+007F, take only 1 byte.
In the next range, code points U+0080 to U+07FF take twice as much space at 2 bytes each.
Code points U+0800 to U+FFFF take one more byte, bringing us up to 3 bytes.
And finally, code points U+10000 to U+10FFFF take 4 bytes.
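The ranges above can be checked with a quick sketch that picks one sample code point from each range and asks UTF-8 how many bytes it needs. The sample characters are my own picks, written as escapes so the source file's encoding doesn't matter.

```csharp
using System;
using System.Text;

string[] samples =
{
    "A",          // U+0041, in U+0000 to U+007F
    "\u00E9",     // U+00E9 (é), in U+0080 to U+07FF
    "\u20AC",     // U+20AC (euro sign), in U+0800 to U+FFFF
    "\U0001F600", // U+1F600 (an emoji), in U+10000 to U+10FFFF
};

foreach (string s in samples)
{
    int codePoint = char.ConvertToUtf32(s, 0);
    Console.WriteLine($"U+{codePoint:X4}: {Encoding.UTF8.GetByteCount(s)} byte(s)");
}
// U+0041: 1 byte(s)
// U+00E9: 2 byte(s)
// U+20AC: 3 byte(s)
// U+1F600: 4 byte(s)
```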

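This can be very effective for English text because those characters will take up minimal space. However, when it comes to Asian text it's not ideal for the exact opposite reason: most Chinese, Japanese, and Korean characters fall in the U+0800 to U+FFFF range and take 3 bytes each.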

UTF-16 uses 2 bytes for code points from U+0000 to U+FFFF and double that, 4 bytes, for code points U+10000 to U+10FFFF. That makes it not-so-great for English, where every character takes twice the space it would in UTF-8. It *does* happen to be better suited for Asian characters though, since most of them fit in 2 bytes instead of 3.

There's even UTF-32 encoding! This is a fixed-width encoding where all of the code points take four bytes -- unlike the others I've mentioned which are variable-width. This can use much more storage than the other encodings, but because every code point is the same size it can be simpler to index and operate on.
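If you want to see the trade-off in numbers, here's a small sketch that measures the same two sample strings from earlier in UTF-8, UTF-16, and UTF-32. GetByteCount reports the encoded size without allocating the byte array.

```csharp
using System;
using System.Text;

string english = "Hello, World!";
string japanese = "こんにちは、世界！";

foreach (string text in new[] { english, japanese })
{
    Console.WriteLine(
        $"UTF-8: {Encoding.UTF8.GetByteCount(text)}, " +
        $"UTF-16: {Encoding.Unicode.GetByteCount(text)}, " +
        $"UTF-32: {Encoding.UTF32.GetByteCount(text)}");
}
// UTF-8: 13, UTF-16: 26, UTF-32: 52
// UTF-8: 27, UTF-16: 18, UTF-32: 36
```

UTF-8 wins for the English string and UTF-16 wins for the Japanese one, which is exactly the pattern described above. Real-world text is often a mix (markup, numbers, punctuation), so it's worth measuring with data that looks like yours.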

Give some consideration to the alphabets you'll need to support primarily!

Best Practices for String to Byte Array Conversion

When converting strings to byte arrays in C#, it's important to follow best practices to ensure efficiency and reliability. In this section, I'll discuss some key best practices that you should keep in mind when performing string to byte array conversions.

Error Handling and Validation

When working with encodings, it's important to handle potential errors and validate your data to prevent unexpected behavior in your code. Ideally, you structure the flow of your application such that you know what kind of data you're dealing with. If you can write code to avoid errors in the first place, this is preferred!

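We don't want to have to rely on this, but sometimes it's outside of our control: one way to handle errors is with try-catch blocks. By encapsulating the conversion code within a try block, you can catch any exceptions that occur during the conversion process and handle them gracefully. Keep in mind that the built-in encodings like Encoding.UTF8 and Encoding.ASCII don't throw on invalid data by default; they silently substitute a replacement character. If you want bad input to surface as an exception, you need an encoding that's configured to throw, such as a UTF8Encoding constructed with throwOnInvalidBytes set to true. If you don't have control over the source of the input data, this is something you might need to do for safety.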
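Here's a sketch of what that can look like when decoding bytes from an untrusted source. It contrasts the default Encoding.UTF8, which quietly substitutes the U+FFFD replacement character, with a UTF8Encoding instance constructed to throw on invalid bytes. The sample bytes and variable names are made up for illustration.

```csharp
using System;
using System.Text;

// "Hi" followed by 0xFF, which is never valid in UTF-8
byte[] invalid = { 0x48, 0x69, 0xFF };

// Default behavior: no exception, the bad byte becomes U+FFFD
string lenient = Encoding.UTF8.GetString(invalid);
Console.WriteLine(lenient == "Hi\uFFFD"); // True

// Strict behavior: ask the encoding to throw on invalid bytes
var strictUtf8 = new UTF8Encoding(
    encoderShouldEmitUTF8Identifier: false,
    throwOnInvalidBytes: true);

bool threw = false;
try
{
    strictUtf8.GetString(invalid);
}
catch (DecoderFallbackException ex)
{
    threw = true;
    Console.WriteLine($"Invalid input: {ex.Message}");
}
```

Which behavior you want depends on the situation. Silent replacement keeps things running, while the strict variant lets you reject or log bad input instead of storing corrupted text.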

Encoding Selection

C# provides several encoding options for converting strings to byte arrays, such as UTF-8, UTF-16, ASCII, and more. It's important to select the appropriate encoding based on the specific requirements of your application. Consider factors such as character sets, compatibility with other systems, and performance implications when choosing the encoding.

// Example of encoding selection
string inputString = "Hello, World!";
byte[] encodedBytes = Encoding.UTF8.GetBytes(inputString);

Just like we saw in the previous examples, after the Encoding class we get to pick the static property holding the encoding instance. If we need to pass one around as a variable or parameter, we can store it in a dedicated Encoding reference:

// SomeMethod is a placeholder for any method of yours that accepts an Encoding
Encoding selectedEncoding = Encoding.UTF8;
SomeMethod("Hello World!", selectedEncoding);
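As a rough sketch of what a method like that could look like, here are two hypothetical helpers, ToBytes and FromBytes, that take the encoding as a parameter. The names are mine and not part of .NET. The point is only that the same Encoding instance gets passed to both directions of the conversion.

```csharp
using System;
using System.Text;

// Hypothetical helpers that accept the encoding as a parameter
byte[] ToBytes(string text, Encoding encoding) => encoding.GetBytes(text);
string FromBytes(byte[] bytes, Encoding encoding) => encoding.GetString(bytes);

Encoding selectedEncoding = Encoding.UTF8;

byte[] payload = ToBytes("Hello World!", selectedEncoding);
string restored = FromBytes(payload, selectedEncoding);

Console.WriteLine(restored); // Hello World!
```

Passing the encoding in explicitly means the choice lives in one place, so it's harder to accidentally encode with one encoding and decode with another.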

Selecting the wrong encoding can have big consequences for your application! This is especially true if you save data with an encoding that will lose data resolution and you can't reverse it... so put some care into this!

Now You Know How to Convert a String to Byte Array in C#!

You're a pro now with encoding and decoding! Well, maybe not a full-on expert... but you have the basics put in front of you and some guidelines to work with. That's a pretty good start.

Remember to select the right encoding for the situation you're dealing with. Keep in mind that you can potentially lose data resolution when using the wrong encoding, and as a result, lose data forever!

If you found this useful and you're looking for more learning opportunities, consider subscribing to my free weekly software engineering newsletter and check out my free videos on YouTube! Meet other like-minded software engineers and join my Discord community!

Frequently Asked Questions: How to Convert a String to Byte Array in C#

Why is encoding and decoding important in string to byte array conversion?

Encoding and decoding are important in string to byte array conversion because they ensure that the characters are properly represented in the byte array and can be accurately converted back to the original string later.

Which method can be used to convert strings to byte arrays in C#?

In C#, the Encoding.GetBytes method can be used to convert strings to byte arrays. For example, Encoding.ASCII.GetBytes converts a string to a byte array using the ASCII encoding. There are other encodings to select from.

What are the limitations of using Encoding.ASCII.GetBytes for string to byte array conversion?

One limitation of using Encoding.ASCII.GetBytes is that it only supports ASCII characters. Any non-ASCII character is replaced with a question mark, so the original text can't be recovered from the bytes. To handle non-ASCII characters, you need to use other encodings such as UTF-8 or UTF-16.

What considerations should be made when choosing an encoding for string to byte array conversion?

When choosing an encoding for string to byte array conversion, it's important to consider the specific character set being used in the strings. Different encodings handle different character sets, and choosing the wrong encoding can result in incorrect conversions.

How can non-ASCII characters be handled in string to byte array conversion?

To handle non-ASCII characters in string to byte array conversion, you need to choose an appropriate encoding such as UTF-8 or UTF-16. These encodings can handle a wide range of characters and provide accurate conversions.
