Chapter 10. Reading and writing files: Save the last byte for me!

Sometimes it pays to be a little persistent.

So far, all of your programs have been pretty short-lived. They fire up, run for a while, and shut down. But that’s not always enough, especially when you’re dealing with important information. You need to be able to save your work. In this chapter, we’ll look at how to write data to a file, and then how to read that information back in from a file. You’ll learn about streams, and also take a look at the mysteries of hexadecimal, Unicode, and binary data.

.NET uses streams to read and write data

A stream is the .NET Framework’s way of getting data in and out of your program. Any time your program reads or writes a file, connects to another computer over a network, or generally does anything where it sends or receives bytes from one place to another, you’re using streams. Sometimes you’re using streams directly. But even when you’re using classes that don’t directly expose streams, under the hood they’re almost always using streams.

Whenever you want to read data from a file or write data to a file, you’ll use a Stream object.

Let’s say you have a simple app that needs to read data from a file. A really basic way to do that is to use a Stream object.

And if your app needs to write data out to the file, it can use another Stream object.

Different streams read and write different things

Every stream is a subclass of the abstract Stream class, and there are many subclasses of Stream that do different things. We’ll be concentrating on reading and writing regular files, but everything you learn about streams in this chapter can apply to compressed or encrypted files, or network streams that don’t use files at all.

Things you can do with a stream:

Write to the stream.

You can write your data to a stream through a stream’s Write method.
Read from the stream.

You can use the Read method to get data from a file, or a network, or memory, or just about anything else, using a stream. You can even read data from really big files, even if they’re too big to fit into memory.
Change your position within the stream.

Most streams support a Seek method that lets you find a position within the stream so you can read or write data at a specific place. But not every Stream class supports Seek, which makes sense, because you can’t always backtrack in some sources of streaming data.

Streams let you read and write data. Use the right kind of stream for the data you’re working with.

A FileStream reads and writes bytes to a file

When your program needs to write a few lines of text to a file, there are a lot of things that have to happen:

Create a new FileStream object and tell it to write to the file.
The FileStream attaches itself to a file.
Streams write bytes to files, so you’ll need to convert the string that you want to write to an array of bytes.
Call the stream’s Write method and pass it the byte array.
Close the stream so other programs can access the file.
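
Here’s a minimal sketch of those steps done by hand. The class name, method name, temp-folder filename, and the choice of UTF-8 are all assumptions made for illustration; they aren’t from the chapter’s projects.

```csharp
using System;
using System.IO;
using System.Text;

public static class FileStreamDemo
{
    // Writes text to a file the long way: open, encode, write, close.
    public static void WriteTextWithFileStream(string path, string text)
    {
        // Create the FileStream; it attaches itself to the file.
        FileStream stream = new FileStream(path, FileMode.Create, FileAccess.Write);

        // Streams write bytes, so convert the string first (UTF-8 is our assumption).
        byte[] bytes = Encoding.UTF8.GetBytes(text);

        // Pass the byte array to the stream's Write method.
        stream.Write(bytes, 0, bytes.Length);

        // Close the stream so other programs can access the file.
        stream.Close();
    }

    public static void Main()
    {
        // Hypothetical file in the temp folder, just for this demo.
        string path = Path.Combine(Path.GetTempPath(), "filestream_demo.txt");
        WriteTextWithFileStream(path, "Eureka!");
        Console.WriteLine(File.ReadAllText(path));
    }
}
```

That’s a lot of ceremony for one line of text, which is exactly why a helper class is so welcome.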

Write text to a file in three simple steps

.NET comes with a convenient class called StreamWriter that does all of those things in one easy step. All you have to do is create a new StreamWriter object and give it a filename. It automatically creates a FileStream and opens the file. Then you can use the StreamWriter’s Write and WriteLine methods to write everything you want to the file.

StreamWriter creates and manages a FileStream object for you automatically.

Use the StreamWriter’s constructor to open or create a file.

You can pass a filename to the StreamWriter’s constructor. When you do, the writer automatically opens the file. StreamWriter also has an overloaded constructor that lets you specify its append mode: passing it true tells it to add data to the end of an existing file (or append), while false tells the stream to delete the existing file and create a new file with the same name.
```
var writer = new StreamWriter("toaster oven.txt", true);
```
Use the Write and WriteLine methods to write to the file.

These methods work just like the ones in the Console class: Write writes text, and WriteLine writes text and adds a line break to the end.
```
writer.WriteLine($"The {appliance} is set to {temp} degrees.");
```
Call the Close method to release the file.

If you leave the stream open and attached to a file, then it’ll keep the file locked open and no other program will be able to use it. So make sure you always close your files!
```
writer.Close();
```
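
To see all three steps working together, here’s a small sketch that opens the same file twice: once with append set to false and once with it set to true. The class name, method name, and filename are hypothetical.

```csharp
using System;
using System.IO;

public static class StreamWriterDemo
{
    // Writes to the same file twice and returns what ended up in it.
    public static string WriteTwice(string path)
    {
        // append = false: any existing file is replaced.
        var writer = new StreamWriter(path, false);
        writer.WriteLine("first line");
        writer.Close();

        // append = true: new text is added to the end of the file.
        writer = new StreamWriter(path, true);
        writer.Write("second ");      // Write does not add a line break
        writer.WriteLine("line");     // WriteLine does
        writer.Close();

        return File.ReadAllText(path);
    }

    public static void Main()
    {
        string path = Path.Combine(Path.GetTempPath(), "streamwriter_demo.txt");
        Console.Write(WriteTwice(path));
    }
}
```

Run it a second time and the file still has just two lines, because the first StreamWriter starts the file over.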

The Swindler launches another diabolical plan

The citizens of Objectville have long lived in fear of the Swindler, Captain Amazing’s arch-nemesis. Now he’s using a StreamWriter to implement another evil plan. Let’s take a look at what’s going on. Create a new Console Application and add this Main code, starting with a using directive because StreamWriter is in the System.IO namespace.

StreamWriter’s Write and WriteLine methods work just like Console: Write writes text, and WriteLine writes text with a line break. And both classes support {curly brackets} like this:

sw.WriteLine("Clone #{0} attacks {1}",
          number, location);

When you include {0} in the text, it’s replaced by the first parameter after the string; {1} is replaced by the second, {2} by the third, etc.

Here’s the output of the app. Since you didn’t include a full path in the filename, it wrote the file to the same folder as the binary—so if you’re running your app inside Visual Studio, check the bin\Debug\netcoreapp3.1 folder underneath your solution folder.

Here’s the output that it writes to secret_plan.txt:

StreamWriter Magnets

Oops! These magnets were nicely arranged on the fridge with the code for the Flobbo class, but someone slammed the door and they all fell off. Can you rearrange them so the Main method produces the output below?

static void Main(string[] args) {
    Flobbo f = new Flobbo("blue yellow");
    StreamWriter sw = f.Snobbo();
    f.Blobbo(f.Blobbo(f.Blobbo(sw), sw), sw);
}

We added an extra challenge.

Something weird is going on with the Blobbo method. See how it has two different declarations in these first two magnets? We defined Blobbo as an overloaded method—there are two different versions, each with its own parameters, just like the overloaded methods you’ve used in previous chapters.

StreamWriter Magnets Solution

Your job was to construct the Flobbo class from the magnets to create the desired output.

static void Main(string[] args) {
    Flobbo f = new Flobbo("blue yellow");
    StreamWriter sw = f.Snobbo();
    f.Blobbo(f.Blobbo(f.Blobbo(sw), sw), sw);
}

Just a reminder: we picked intentionally weird variable names and methods in these puzzles because if we used really good names, the puzzle would be too easy! Don’t use names like this in your code, OK?

Use a StreamReader to read a file

Let’s read the Swindler’s secret plans with a StreamReader, a class that’s a lot like StreamWriter, except instead of writing a file, you create a StreamReader and pass it the name of the file to read in its constructor. Its ReadLine method returns a string that contains the next line from the file. You can write a loop that reads lines from it until its EndOfStream property is true, which is when it runs out of lines to read. Add this Console app that uses a StreamReader to read one file, and a StreamWriter to write another file.

StreamReader is a class that reads characters from streams, but it’s not a stream itself. When you pass a filename to its constructor, it creates a stream for you, and closes it when you call its Close method. It also has an overloaded constructor that takes a reference to a Stream.
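
Here’s a rough sketch of that read-one-file, write-another pattern. It is not the Swindler app itself: the class name, method name, filenames, and the line-numbering idea are all made up for illustration.

```csharp
using System;
using System.IO;

public static class StreamReaderDemo
{
    // Copies each line of inputPath to outputPath with a line number in front.
    // Returns the number of lines copied.
    public static int CopyWithLineNumbers(string inputPath, string outputPath)
    {
        var reader = new StreamReader(inputPath);
        var writer = new StreamWriter(outputPath);
        int count = 0;

        // EndOfStream becomes true when there are no more lines to read.
        while (!reader.EndOfStream)
        {
            string line = reader.ReadLine();
            count++;
            writer.WriteLine("{0}: {1}", count, line);
        }

        reader.Close();
        writer.Close();
        return count;
    }

    public static void Main()
    {
        string folder = Path.GetTempPath();
        string input = Path.Combine(folder, "reader_demo_in.txt");
        string output = Path.Combine(folder, "reader_demo_out.txt");

        File.WriteAllText(input, "alpha\nbeta\n");
        int lines = CopyWithLineNumbers(input, output);
        Console.WriteLine($"Copied {lines} lines");
        Console.Write(File.ReadAllText(output));
    }
}
```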

Data can go through more than one stream

One big advantage to working with streams in .NET is that you can have your data go through more than one stream on its way to its final destination. One of the many types of streams in .NET Core is the CryptoStream class. This lets you encrypt your data before you do anything else with it. So instead of writing plain text to a regular old text file, the Swindler can chain streams together and send the text through a CryptoStream object before writing its output to a FileStream.

You can CHAIN streams. One stream can write to another stream, which writes to another stream…often ending with a network or file stream.
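
Here’s one way that chain might look in code. This is only a sketch: it assumes AES with a randomly generated key that lives just long enough to read the file back, and the class and method names are hypothetical. A real app would need to store the key somewhere safe.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class ChainedStreamsDemo
{
    // Writes text through a CryptoStream into a file, then reads it back.
    public static string EncryptThenDecrypt(string path, string text)
    {
        // Aes.Create() generates a random key and IV (demo only).
        using (Aes aes = Aes.Create())
        {
            // Writing: StreamWriter -> CryptoStream -> FileStream
            using (var file = new FileStream(path, FileMode.Create))
            using (var crypto = new CryptoStream(file, aes.CreateEncryptor(), CryptoStreamMode.Write))
            using (var writer = new StreamWriter(crypto))
            {
                writer.Write(text);
            }

            // Reading: FileStream -> CryptoStream -> StreamReader
            using (var file = new FileStream(path, FileMode.Open))
            using (var crypto = new CryptoStream(file, aes.CreateDecryptor(), CryptoStreamMode.Read))
            using (var reader = new StreamReader(crypto))
            {
                return reader.ReadToEnd();
            }
        }
    }

    public static void Main()
    {
        string path = Path.Combine(Path.GetTempPath(), "chained_streams_demo.bin");
        Console.WriteLine(EncryptThenDecrypt(path, "Meet at the clone factory"));
    }
}
```

The StreamWriter never knows its text is being encrypted. It just writes to whatever stream it was given.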

Pool Puzzle

Your job is to take code snippets from the pool and place them into the blank lines in the program. You can use the same snippet more than once, and you won’t need to use all the snippets. Your goal is to make the program produce the output shown to the right.

Mini Sharpen your pencil

What text does the app write to delivery.txt?

Note: each snippet from the pool can be used more than once!

Pool Puzzle Solution

Mini Sharpen your pencil Solution

What text does the app write to delivery.txt?

North

there are no Dumb Questions

Q:Can you explain what you were doing with {0} and {1} when you called the StreamWriter Write and WriteLine methods?

A:When you’re printing strings to a file, you’ll often find yourself in the position of having to print the contents of a bunch of variables. For example, you might have to write something like this:

writer.WriteLine("My name is " + name +
    " and my age is " + age);

It gets really tedious and somewhat error-prone to have to keep using + to combine strings. It’s easier to use composite formatting, where you use a format string with placeholders like {0}, {1}, {2}, etc., and follow it with variables to replace the placeholders:

writer.WriteLine(
 "My name is {0} and my age is {1}", name, age);

You’re probably thinking, isn’t that really similar to string interpolation? And you’re right: it is! In some cases string interpolation may be easier to read; in other cases a format string is cleaner. And just like string interpolation, format strings support formatting. For example, {1:0.00} means format the second argument as a number with two decimal places, while {3:c} tells it to format the fourth argument in the local currency.

Oh, and one more thing—format strings work with Console.Write and Console.WriteLine too!
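
Here’s a short sketch of composite formatting next to string interpolation. The class and method names are hypothetical, and we pass CultureInfo.InvariantCulture only so the decimal separator is the same on every machine.

```csharp
using System;
using System.Globalization;

public static class FormatDemo
{
    // Composite formatting: {0} is the first argument, {1:0.00} is the
    // second argument formatted with two decimal places.
    public static string Describe(string name, double price)
    {
        return string.Format(CultureInfo.InvariantCulture,
            "{0} costs {1:0.00}", name, price);
    }

    public static void Main()
    {
        Console.WriteLine(Describe("Toaster oven", 49.5));   // Toaster oven costs 49.50

        // Console.WriteLine accepts the same kind of format string.
        Console.WriteLine("Clone #{0} attacks {1}", 7, "the mall");

        // String interpolation puts the values right inside the string.
        int number = 7;
        string location = "the mall";
        Console.WriteLine($"Clone #{number} attacks {location}");
    }
}
```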

Q: What was that Path.DirectorySeparatorChar field that you used in the console app that used StreamReader?

A: We wrote that code to work on both Windows and MacOS, so we took advantage of some of .NET Core’s tools to help with that. Windows uses backslash \ characters as a path separator (C:\Windows), while MacOS uses a forward slash / (/Users).

Path.DirectorySeparatorChar is a readonly field that’s set to the correct path separator character for the operating system: a “\” on Windows and “/” on MacOS and Linux.

We also used the Environment.GetFolderPath method, which returns the path of one of the special folders for the current user—in that case, the user’s Documents folder on Windows or home directory in MacOS.
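
Here’s a small sketch of how those two pieces fit together. The class name, method name, and filename are hypothetical, and we’re assuming Environment.SpecialFolder.Personal is the folder you want.

```csharp
using System;
using System.IO;

public static class PathDemo
{
    // Builds a path to a file inside the current user's personal folder.
    public static string InPersonalFolder(string filename)
    {
        string folder = Environment.GetFolderPath(Environment.SpecialFolder.Personal);

        // DirectorySeparatorChar is '\' on Windows and '/' on MacOS and Linux.
        return folder + Path.DirectorySeparatorChar + filename;
    }

    public static void Main()
    {
        Console.WriteLine(InPersonalFolder("secret_plan.txt"));

        // Path.Combine is another option: it inserts the separator for you.
        string folder = Environment.GetFolderPath(Environment.SpecialFolder.Personal);
        Console.WriteLine(Path.Combine(folder, "secret_plan.txt"));
    }
}
```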

Q:Near the beginning of the chapter you talked about converting a string to a byte array. How would that even work?

A:You’ve probably heard many times that files on a disk are represented as bits and bytes. What that means is that when you write a file to a disk, the operating system treats it as one long sequence of bytes. The StreamReader and StreamWriter are converting from bytes to characters for you—that’s called encoding and decoding. Remember from Chapter 4 how a byte variable can store any number between 0 and 255? Every file on your hard drive is one long sequence of numbers between 0 and 255. It’s up to the programs that read and write those files to interpret those bytes as meaningful data. When you open a file in Notepad, it converts each individual byte to a character—for example, E is 69 and a is 97 (but this depends on the encoding…you’ll learn more about encodings in just a minute). And when you type text into Notepad and save it, Notepad converts each of the characters back into a byte and saves it to disk. If you want to write a string to a stream, you’ll need to do the same.

Q:If I’m just using a StreamWriter to write to a file, why do I really care if it’s creating a FileStream for me?

A:If you’re only reading or writing lines to or from a text file in order, then all you need are StreamReader and StreamWriter. But as soon as you need to do anything more complex than that, you’ll need to start working with other streams. If you ever need to write data like numbers, arrays, collections, or objects to a file, a StreamWriter just won’t do. But don’t worry, we’ll go into a lot more detail about how that will work in just a minute.

Q:Why do I need to worry about closing streams after I’m done with them?

A:Have you ever had a word processor tell you it couldn’t open a file because it was “busy”? When one program uses a file, Windows locks it and prevents other programs from using it. And it’ll do that for your program when it opens a file. If you don’t call the Close method, then it’s possible for your program to keep a file locked open until it ends.

Both Console and StreamWriter can use composite formatting, which replaces placeholders with values of parameters passed to Write or WriteLine.

Use the File and Directory classes to work with files and directories

Like StreamWriter, the File class creates streams that let you work with files behind the scenes. You can use its methods to do most common actions without having to create the FileStreams first. The Directory class lets you work with whole directories full of files.

Things you can do with File:

Find out if the file exists.

You can check to see if a file exists using the Exists method. It’ll return true if it does, and false if it doesn’t.
Read from and write to the file.

You can use the OpenRead method to get data from a file, or the Create or OpenWrite method to write to the file.
Append text to the file.

The AppendAllText method lets you append text to an already created file. It even creates the file if it’s not there when the method runs.
Get information about the file.

The GetLastAccessTime and GetLastWriteTime methods return the date and time when the file was last accessed and modified.

Things you can do with Directory:

Create a new directory.

Create a directory using the CreateDirectory method. All you have to do is supply the path; this method does the rest.
Get a list of the files in a directory.

You can create an array of files in a directory using the GetFiles method; just tell the method which directory you want to know about, and it will do the rest.
Delete a directory.

Deleting a directory is really simple too. Just use the Delete method.
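
Here’s a sketch that exercises several of those File and Directory methods in one go. The class name, folder name, and filename are made up for the demo, and it cleans up after itself.

```csharp
using System;
using System.IO;

public static class FileAndDirectoryDemo
{
    // Creates a folder with one file in it, inspects it, then deletes it.
    // Returns how many files GetFiles found in the folder.
    public static int Run(string root)
    {
        string folder = Path.Combine(root, "file_directory_demo");
        if (Directory.Exists(folder))
            Directory.Delete(folder, true);     // start from a clean slate

        Directory.CreateDirectory(folder);

        string path = Path.Combine(folder, "notes.txt");
        File.AppendAllText(path, "first\n");    // creates the file
        File.AppendAllText(path, "second\n");   // adds to the end of it

        Console.WriteLine("Exists: " + File.Exists(path));
        Console.WriteLine("Last written: " + File.GetLastWriteTime(path));

        string[] files = Directory.GetFiles(folder);
        foreach (string file in files)
            Console.WriteLine("Found " + file);

        Directory.Delete(folder, true);         // true = delete the contents too
        return files.Length;
    }

    public static void Main()
    {
        Run(Path.GetTempPath());
    }
}
```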

IDisposable makes sure objects are closed properly

A lot of .NET classes implement a particularly useful interface called IDisposable. It has only one member: a method called Dispose. Whenever a class implements IDisposable, it’s telling you that there are important things that it needs to do in order to shut itself down, usually because it’s allocated resources that it won’t give back until you tell it to. The Dispose method is how you tell the object to release those resources.

Use the IDE to explore IDisposable

You can use the “Go to Definition” (or “Go to Declaration”) feature in the IDE to show you the definition of IDisposable. Go to your project and type IDisposable anywhere inside a class. Then right-click on it and select “Go To Definition” from the menu. It’ll open a new tab with code in it. Expand all of the code and this is what you’ll see:

Avoid filesystem errors with using statements

We’ve been telling you all chapter that you need to close your streams. That’s because some of the most common bugs that programmers run across when they deal with files are caused when streams aren’t closed properly. Luckily, C# gives you a great tool to make sure that never happens to you: IDisposable and the Dispose method. When you wrap your stream code in a using statement, it automatically closes your streams for you. All you need to do is declare your stream reference with a using statement, followed by a block of code (inside curly brackets) that uses that reference. When you do that, C# automatically calls the Dispose method as soon as it finishes running the block of code.
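
To watch Dispose get called, here’s a tiny sketch with a made-up class that implements IDisposable. NoisyResource and UsingDemo are hypothetical names that exist only for this demo.

```csharp
using System;

// A hypothetical class that announces when it is disposed.
public class NoisyResource : IDisposable
{
    public bool IsDisposed { get; private set; }

    public void Dispose()
    {
        IsDisposed = true;
        Console.WriteLine("Dispose was called");
    }
}

public static class UsingDemo
{
    // Returns true if the object was disposed once the using block ended.
    public static bool UseAndRelease()
    {
        NoisyResource escaped;
        using (var resource = new NoisyResource())
        {
            escaped = resource;   // keep a reference so we can check it later
            Console.WriteLine("Inside the block: " + resource.IsDisposed);
        }   // C# calls resource.Dispose() right here

        return escaped.IsDisposed;
    }

    public static void Main()
    {
        Console.WriteLine("After the block: " + UseAndRelease());
    }
}
```

Notice that nothing in the block calls Dispose. The using statement takes care of it, even if the block throws an exception.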

These “using” statements are different from the ones at the top of your code.

Use multiple using statements for multiple objects

You can pile using statements on top of each other—you don’t need extra sets of curly brackets or indents.

using (var reader = new StreamReader("secret_plan.txt"))
using (var writer = new StreamWriter("email.txt"))
{  
   // statements that use reader and writer
}

When you declare an object in a using block, that object’s Dispose method is called automatically.

Every stream has a Dispose method that closes the stream. When you declare your stream in a using statement, it will always get closed! And that’s important, because some streams don’t write all of their data until they’re closed.

there are no Dumb Questions

Q: Why did you put an @ in front of the strings that contained the filenames in that “Sharpen Your Pencil” exercise?

A: When you add a string literal to your program, the compiler converts escape sequences like \n and \r to special characters. That makes it difficult to type filenames, which have a lot of backslash characters in them. If you put @ in front of a string, it tells C# not to interpret escape sequences. It also tells C# to include line breaks in your string, so you can hit Enter halfway through the string and it’ll include that as a line break in the output.
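
Here’s a quick sketch comparing a regular string literal with one that starts with @. The path and class name are hypothetical.

```csharp
using System;

public static class VerbatimStringDemo
{
    // The same Windows-style path written two ways.
    public const string Escaped = "C:\\Users\\Public\\secret_plan.txt";
    public const string Verbatim = @"C:\Users\Public\secret_plan.txt";

    // With @, pressing Enter inside the literal becomes part of the string.
    public const string TwoLines = @"Line one
Line two";

    public static void Main()
    {
        Console.WriteLine(Escaped == Verbatim);      // True
        Console.WriteLine(TwoLines.Contains("\n"));  // True
        Console.WriteLine(TwoLines);
    }
}
```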

Q: Remind me again—what exactly are escape sequences?

A: An escape sequence is a way to include special characters in your strings. For example, \n is a line feed, \t is a tab, and \r is a return character, or half of a Windows return (in Windows text files, lines have to end with \r\n; for MacOS and Linux, lines just end in \n). If you need to include a quotation mark inside a string, you can use \" and it won’t end the string. And if you want to use an actual backslash in your string and not have C# interpret it as the beginning of an escape sequence, just do a double backslash: \\.

Q: You mentioned a stream called MemoryStream that writes to memory. Why would I want to use that?

A: We’ve been using streams to read and write files. But what if you want to read data from a file and then, well, do something with it? And that’s where MemoryStream comes in. When you create a new MemoryStream, it starts keeping track of all data streamed into it—for example, you can create a new MemoryStream and pass it as an argument to a StreamWriter constructor, and then any data you write with the StreamWriter is sent to that MemoryStream. You can retrieve that data using the MemoryStream.ToArray method, which returns all of the data that’s been streamed to it in a byte array.

Q: What do I do with data once I have it in a byte array?

A: One of the most common things that you’ll do with byte arrays is to convert them to strings. For example, if you have a byte array called bytes, here’s one way to convert a byte array to a string:
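
As a minimal sketch, assuming the bytes hold UTF-8 text, the conversion could look like this. The class and method names are hypothetical; Encoding.UTF8 lives in the System.Text namespace.

```csharp
using System;
using System.Text;

public static class BytesToStringDemo
{
    // Decodes a byte array into a string, assuming it holds UTF-8 text.
    public static string Decode(byte[] bytes)
    {
        return Encoding.UTF8.GetString(bytes);
    }

    public static void Main()
    {
        byte[] bytes = { 69, 97 };          // E is 69 and a is 97
        Console.WriteLine(Decode(bytes));   // Ea

        // GetBytes goes the other way, from a string to a byte array.
        byte[] back = Encoding.UTF8.GetBytes("Ea");
        Console.WriteLine(string.Join(" ", back));   // 69 97
    }
}
```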

We gave you a chance to sleuth this problem out on your own. Here’s how we fixed it.

Sleuth it out

Sherlock Holmes once said, “Data! Data! Data! I can’t make bricks without clay.” So let’s start at the scene of the crime: our code that’s not working. And we’ll scour it for all of the data that we can find by digging up clues.

How many of these clues did you spot:

We instantiate a StreamWriter that feeds data into a new MemoryStream
The StreamWriter writes a line of text to the MemoryStream
The contents of the MemoryStream are copied to an array and converted to a string
This all happens inside a using block, so the streams are definitely closed

If you spotted all of those clues, then congratulations—you’ve been sharpening your code detective skills! But like in every great mystery, there’s always one last clue, the fact that we learned earlier that proves to be the key to unraveling the whole crime and finding the culprit.

We used a using block, so we know that the streams definitely get closed. But when are they closed? Which leads us to the keystone to this mystery, the all-important clue that we learned just moments before the crime:

The StreamWriter and MemoryStream are declared in the same using block, so both Dispose methods are called after the last line in the block is executed. So what does that mean? It means the MemoryStream.ToArray method is called before the StreamWriter is closed.

So we can fix the problem by adding a nested using block to first close the StreamWriter and then call ToArray:
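
Here’s a sketch of that fix. The original code from the exercise isn’t reproduced here, so the class name, method name, and text are assumptions; the part that matters is where the inner using block ends.

```csharp
using System;
using System.IO;
using System.Text;

public static class MemoryStreamDemo
{
    // Writes a line of text into a MemoryStream and returns what was stored.
    public static string WriteThroughMemory(string text)
    {
        using (var ms = new MemoryStream())
        {
            using (var sw = new StreamWriter(ms))
            {
                sw.WriteLine(text);
            }   // the StreamWriter is closed here, which flushes its buffer

            // ToArray still works after the MemoryStream has been closed.
            return Encoding.UTF8.GetString(ms.ToArray());
        }
    }

    public static void Main()
    {
        Console.Write(WriteThroughMemory("The value is 123.45"));
    }
}
```

If you move the ToArray call inside the inner block, before the StreamWriter is closed, the text is probably still sitting in the writer’s buffer and the array comes back empty.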

Stream objects often have data in memory that’s buffered, or waiting to be written. When the stream empties all of that data, it’s called flushing. If you need to flush the buffered data without closing the stream, you can also call its Flush method.

Exercise

In Chapter 8 you created a Deck class that kept track of a sequence of Card objects, with methods to reset it to a 52-card deck in order, shuffle the cards to randomize their order, and sort the cards to put them back in order. Now you’ll add a method to write the cards to a file, and a constructor that lets you initialize a new deck by reading cards from a file.

Start by reviewing the Deck and Card classes that you wrote in Chapter 8

You created your Deck class by extending a generic collection of Card objects. This allowed you to use some useful members that Deck inherited from Collection<Card>:

The Count property returns the number of cards in the deck
The Add method adds a card to the top of the deck
The RemoveAt method removes a card at a specific index from the deck
The Clear method removes all cards from the deck

That gave you a solid starting point to add a Reset method that clears the deck and then adds 52 cards in order (ace through king in each suit), a Deal method to remove the card from the top of the deck and return it, a Shuffle method to randomize the order of the cards, and a Sort method to put them back in order.

Add a method to write all of the cards in the deck to a file

Your Card class has a Name property that returns a string like “Three of Clubs” or “Ace of Hearts”. Add a method called WriteCards that takes a string with a filename as parameter and writes the name of each card to a line in that file. So if you reset the deck and then call WriteCards, it will write 52 lines to the file, one for each card.

Add an overloaded Deck constructor that reads a deck of cards in from a file

Add a second constructor to the Deck class. Here’s what it should do:

In Chapter 9 we learned that switch expressions must be exhaustive, so add a default case that throws a new InvalidDataException if it encounters a suit or value that it doesn’t recognize—this will make sure each card is valid.

Here’s a Main method that you can use to test your app. It creates a deck with 10 random cards, writes it to a file, and then reads that file into a second deck and writes each of its cards to the console.

static void Main(string[] args) {
    var filename = "deckofcards.txt";
    Deck deck = new Deck();
    deck.Shuffle();
    for (int i = deck.Count - 1; i >= 10; i--)
        deck.RemoveAt(i);
    deck.WriteCards(filename);

    Deck cardsToRead = new Deck(filename);
    foreach (var card in cardsToRead)
        Console.WriteLine(card.Name);
}

BULLET POINTS

Whenever you want to read data from a file or write data to a file, you’ll use a Stream object. Stream is an abstract class, with subclasses that do different things.
A FileStream lets you read from and write to files. A MemoryStream reads or writes data to memory.
You can write your data to a stream through a stream’s Write method, and read data using its Read method.
Remember to always close a stream after you’re done with it. Some streams don’t write all of their data until they’re closed or their Flush methods are called.
A StreamWriter is an easy way to write data to a file. StreamWriter creates and manages a FileStream object for you automatically.
A StreamReader reads characters from streams, but it’s not a stream itself. When you pass a filename to its constructor, it creates a stream for you, and closes it when you call its Close method.
The Write and WriteLine methods of StreamWriter and Console use composite formatting, which takes a format string with placeholders like {0}, {1}, {2} that support formatting like {1:0.00} and {3:c}.
Path.DirectorySeparatorChar is a readonly field that’s set to the path separator character for the operating system: a “\” on Windows and “/” on MacOS and Linux.
The Environment.GetFolderPath method returns the path of one of the special folders for the current user, such as the user’s Documents folder on Windows or home directory in MacOS.
The File class has static methods including Exists (which checks if a file exists), OpenRead and OpenWrite (to get streams to read from or write to the file), and AppendAllText (to write text to a file in one statement).
The Directory class has static methods including CreateDirectory (to create folders), GetFiles (to get the list of files), and Delete (to remove a folder).
The FileInfo class is similar to the File class, except instead of using static methods it’s instantiated.
The IDisposable interface makes sure objects are closed properly. It includes one member, the Dispose method, which provides a mechanism for releasing unmanaged resources.
Use a using statement to instantiate a class that implements IDisposable. The using statement is followed by a block of code; the object instantiated in the using statement is disposed at the end of the block.
Use multiple using statements in a row to declare objects that are disposed at the end of the same block.

Windows and MacOS have different line endings

If you’re running Windows, open Notepad. If you’re running MacOS, open TextEdit. Create a file with these two lines. The first line has the characters L1 and the second has the characters L2.

If you used Windows, it will contain these six bytes: 76 49 13 10 76 50

If you used MacOS, it will contain these five bytes: 76 49 10 76 50

Can you spot the difference? You can see that the first and second lines are encoded with the same bytes: L is 76, 1 is 49, and 2 is 50. But the line break is encoded differently: on Windows it’s encoded with two bytes, 13 and 10. But on MacOS it’s encoded with one byte, 10. This is the difference between Windows-style and Unix-style line endings (MacOS is a flavor of Unix). If you need to write code that runs on different operating systems and writes files with line endings, you can use the static Environment.NewLine property, which returns “\r\n” on Windows and “\n” on MacOS or Unix.
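
You can check those byte values yourself with a few lines of code. This is just a sketch with hypothetical class and method names, and it assumes UTF-8 encoding.

```csharp
using System;
using System.Text;

public static class LineEndingDemo
{
    // Returns the UTF-8 bytes of a string as space-separated decimal numbers.
    public static string BytesOf(string text)
    {
        return string.Join(" ", Encoding.UTF8.GetBytes(text));
    }

    public static void Main()
    {
        Console.WriteLine(BytesOf("L1\r\nL2"));   // 76 49 13 10 76 50
        Console.WriteLine(BytesOf("L1\nL2"));     // 76 49 10 76 50

        // This line matches whichever one is right for your operating system.
        Console.WriteLine(BytesOf("L1" + Environment.NewLine + "L2"));
    }
}
```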

There’s an easier way to store your objects in files. It’s called serialization.

Serialization means writing an entire object’s state to a file or string. Deserialization means reading the object’s state back from that file or string. So instead of painstakingly writing out each field and value to a file line by line, you can save your object the easy way by serializing it out to a stream. Serializing an object is like flattening it out so you can slip it into a file. And on the other end, you can deserialize it, which is like taking it out of the file and inflating it again.

OK, just to come clean here: there’s also a method called Enum.Parse that will convert the string “Spades” to the enum value Suits.Spades. And it even has a companion, Enum.TryParse, that works just like the int.TryParse method you’ve used throughout this book. But serialization still makes a lot more sense here. You’ll find out more about that shortly.
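
Here’s a quick sketch of both methods. The Suits enum below is an assumption written just for this example (your own enum from the earlier chapter may list its members in a different order), and the class and method names are hypothetical.

```csharp
using System;

// Assumed to look roughly like the Suits enum from the card projects.
public enum Suits { Diamonds, Clubs, Hearts, Spades }

public static class EnumParseDemo
{
    // Enum.Parse returns object, so it needs a cast.
    // It throws an ArgumentException if the name isn't in the enum.
    public static Suits ParseSuit(string text)
    {
        return (Suits)Enum.Parse(typeof(Suits), text);
    }

    // Enum.TryParse returns false instead of throwing, like int.TryParse.
    public static bool TryParseSuit(string text, out Suits suit)
    {
        return Enum.TryParse(text, out suit);
    }

    public static void Main()
    {
        Console.WriteLine(ParseSuit("Spades"));   // Spades

        if (TryParseSuit("Hearts", out Suits hearts))
            Console.WriteLine("Parsed " + hearts);

        if (!TryParseSuit("Jokers", out Suits unknown))
            Console.WriteLine("Jokers is not a suit");
    }
}
```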

What happens to an object when it’s serialized?

It seems like something mysterious has to happen to an object in order to copy it off of the heap and put it into a file, but it’s actually pretty straightforward.

Object on the heap

When you create an instance of an object, it has a state. Everything that an object “knows” is what makes one instance of a class different from another instance of the same class.
Object serialized

When C# serializes an object, it saves the complete state of the object, so that an identical instance (object) can be brought back to life on the heap later.
And later on…

Later—maybe days later, and in a different program—you can go back to the file and deserialize it. That pulls the original class back out of the file and restores it exactly as it was, with all of its fields and values intact.

But what exactly IS an object’s state? What needs to be saved?

We already know that an object stores its state in its fields and properties. So when an object is serialized, each of those values needs to be saved to the file.

Serialization starts to get interesting when you have more complicated objects. Chars, ints, doubles, and other value types have bytes that can just be written out to a file as-is. But what if an object has an instance variable that’s an object reference? What about an object that has five instance variables that are object references? What if those object instance variables themselves have instance variables?

Think about it for a minute. What part of an object is potentially unique? Imagine what needs to be restored in order to get an object that’s identical to the one that was saved. Somehow everything on the heap has to be written to the file.

When an object is serialized, all of the objects it refers to get serialized, too…

…and all of the objects they refer to, and all of the objects those other objects refer to, and so on and so on. But don’t worry—it may sound complicated, but it all happens automatically. C# starts with the object you want to serialize and looks through its fields for other objects. Then it does the same for each of them. Every single object gets written out to the file, along with all the information C# needs to reconstitute it all when the object gets deserialized.

A group of objects connected to each other by references is sometimes referred to as a graph.

Use JsonSerialization to serialize your objects

You’re not just limited to reading and writing lines of text to your files. You can use JSON serialization to let your programs copy entire objects to strings (which you can write to files!) and read them back in…all in just a few lines of code! Let’s take a look at how this works. Start by creating a new console app.

Do this!

Design some classes for your object graph.

Add this HairColor enum and these Guy, Outfit, and HairStyle classes to your new console app.

class Guy {
    public string Name { get; set; }
    public HairStyle Hair { get; set; }
    public Outfit Clothes { get; set; }
    public override string ToString() => $"{Name} with {Hair} wearing {Clothes}";
}

class Outfit {
    public string Top { get; set; }
    public string Bottom { get; set; }
    public override string ToString() => $"{Top} and {Bottom}";
}

enum HairColor { 
    Auburn, Black, Blonde, Blue, Brown, Gray, Platinum, Purple, Red, White
}

class HairStyle {
    public HairColor Color { get; set; }
    public float Length { get; set; }
    public override string ToString() => $"{Length:0.0} inch {Color} hair";
}

Create a graph of objects to serialize.

Now create a small graph of objects to serialize: a new List<Guy> pointing to a couple of Guy objects. Add this code to your Main method that uses a collection initializer and object initializers to build the object graph:

static void Main(string[] args) {
    // A collection initializer with two Guy objects, each built with object initializers
    var guys = new List<Guy>() {
       new Guy() { Name = "Bob", Clothes = new Outfit() { Top = "t-shirt", Bottom = "jeans" },
          Hair = new HairStyle() { Color = HairColor.Red, Length = 3.5f }
       },
       new Guy() { Name = "Joe", Clothes = new Outfit() { Top = "polo", Bottom = "slacks" },
          Hair = new HairStyle() { Color = HairColor.Gray, Length = 2.7f }
       },
    };
}

Use JsonSerializer to serialize the objects to a string.

First add a using directive to the top of your code file:
```
using System.Text.Json;
```
Now you can serialize the entire graph with a single line of code:
```
  var jsonString = JsonSerializer.Serialize(guys);
  Console.WriteLine(jsonString);
```
Run your app and look closely at what it prints to the console:
```
[{"Name":"Bob","Hair":{"Color":8,"Length":3.5},"Clothes":{"Top":"t-shirt","Bot
tom":"jeans"}},{"Name":"Joe","Hair":{"Color":5,"Length":2.7},"Clothes":{"Top":
"polo","Bottom":"slacks"}}]
```
That’s your object graph serialized to JSON (which some people pronounce “Jason” and others pronounce “JAY-sahn”). It’s a human readable data interchange format, which means that it’s a way to store complex objects using strings that a person can make sense of. And because it’s human readable, you can see that it has all of the parts of the graph: names and clothes are encoded as strings (“Bob”, “t-shirt”) and enums are encoded as their integer values.
Use JsonSerializer to deserialize the JSON to a new object graph.

Now that we have a string that contains the object graph serialized to JSON, we can deserialize it. That just means using it to create new objects. And JsonSerializer lets us do that in one line of code, too. Add this to the Main method:
```
var copyOfGuys = JsonSerializer.Deserialize<List<Guy>>(jsonString);
foreach (var guy in copyOfGuys)
    Console.WriteLine("I deserialized this guy: {0}", guy);
```
Run your app again. It deserializes the guys from the JSON string and writes them to the console:
```
I deserialized this guy: Bob with 3.5 inch Red hair wearing t-shirt and jeans
I deserialized this guy: Joe with 2.7 inch Gray hair wearing polo and slacks
```

JSON Up Close

Let’s take a closer look at how JSON actually works. Go back to your app with the Guy object graph and replace the line that serializes the graph to a string with this:

var options = new JsonSerializerOptions() { WriteIndented = true };
var jsonString = JsonSerializer.Serialize(guys, options);

That code calls an overloaded JsonSerializer.Serialize method that takes a JsonSerializerOptions object that lets you set options for the serializer. In this case, you’re telling it to write the JSON as indented text—in other words, it adds line breaks and spaces to make the JSON easier for people to read.

Now run the program again. The output should look like this:

[
  {
    "Name": "Bob",
    "Hair": {
      "Color": 8,
      "Length": 3.5
    },
    "Clothes": {
      "Top": "t-shirt",
      "Bottom": "jeans"
    }
  },
  {
    "Name": "Joe",
    "Hair": {
      "Color": 5,
      "Length": 2.7
    },
    "Clothes": {
      "Top": "polo",
      "Bottom": "slacks"
    }
  }
]

Let’s break down exactly what we’re seeing:

The JSON starts and ends with square brackets [ ]. This is how a list is serialized in JSON. A list of numbers would look like this: [1, 2, 3, 4]
This particular JSON represents a list with two objects. Each object starts and ends with curly braces { }. If you look at the JSON, you can see that the second line is an opening curly brace {, the second-to-last line is a closing curly brace }, and in the middle there’s a line with }, followed by a line with {. That’s how JSON represents two objects: in this case, the two Guy objects.
Each object contains a set of keys and values that correspond to the serialized object’s properties, separated by commas. For example, "Name": "Joe", represents the second Guy object’s Name property.
The Guy.Clothes property is an object reference that points to an Outfit object. It’s represented by a nested object with values for Top and Bottom.

JSON only includes data, not specific C# types

When you were looking through the JSON data, you saw human-readable versions of the data in your objects: strings like “Bob” and “slacks”, numbers like 8 and 3.5, and even lists and nested objects. But did you think about what you didn’t see? JSON data does not include the names of types like Guy, Outfit, HairColor, or HairStyle. That’s because JSON just contains the data, and JsonSerializer will do its best to deserialize the data into whatever properties it finds.

Let’s put this to the test. Add a new class to your project:

class Dude
{
    public string Name { get; set; }
    public HairStyle Hair { get; set; }
}

Now add this code to the end of your Main method:
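Here’s a minimal sketch of what that code could look like. It isn’t necessarily the exact listing: the DudeSketch class and PopAllDudes method are made-up names for illustration, and the only idea it depends on is asking JsonSerializer for a Stack<Dude> instead of a List<Guy>, then popping each Dude off the stack.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

enum HairColor { Auburn, Black, Blonde, Blue, Brown, Gray, Platinum, Purple, Red, White }

class HairStyle {
    public HairColor Color { get; set; }
    public float Length { get; set; }
    public override string ToString() => $"{Length:0.0} inch {Color} hair";
}

class Dude {
    public string Name { get; set; }
    public HairStyle Hair { get; set; }
}

static class DudeSketch {
    // Hypothetical helper: deserializes the JSON into a Stack<Dude> and
    // returns one line of text per Dude, in the order they are popped.
    public static List<string> PopAllDudes(string jsonString) {
        var dudes = JsonSerializer.Deserialize<Stack<Dude>>(jsonString);
        var lines = new List<string>();
        while (dudes.Count > 0) {
            Dude dude = dudes.Pop();
            lines.Add($"Next dude: {dude.Name} with {dude.Hair}");
        }
        return lines;
    }
}
```

One way to try it is to call DudeSketch.PopAllDudes(jsonString) from your Main method and write each returned line to the console.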

And run your code again. Since the JSON just has a list of objects, JsonSerializer.Deserialize will happily stick them into a Stack (or a Queue, or an array, or another collection type). And since Dude has public Name and Hair properties that match the data, it will fill in any data that it can, and it skips the Clothes data because Dude has no property to put it in. When you pop the Dude objects off of the stack, Joe comes out first and Bob second: the deserializer pushes the objects onto the stack in the order they appear in the JSON, a stack is last-in, first-out, and Joe was pushed after Bob.

Sharpen your pencil

Let’s use JsonSerializer to explore how strings are translated into JSON. Add the following code to a console app, then write down what each line of code writes to the console. The last line serializes the elephant animal emoji.

You used the emoji panel in Chapter 1 to enter emoji.

Console.WriteLine(JsonSerializer.Serialize(3));                          ...................
Console.WriteLine(JsonSerializer.Serialize((long)-3));                   ...................
Console.WriteLine(JsonSerializer.Serialize((byte)0));                    ...................
Console.WriteLine(JsonSerializer.Serialize(float.MaxValue));             ...................
Console.WriteLine(JsonSerializer.Serialize(float.MinValue));             ...................
Console.WriteLine(JsonSerializer.Serialize(true));                       ...................
Console.WriteLine(JsonSerializer.Serialize("Elephant"));                 ...................
Console.WriteLine(JsonSerializer.Serialize("Elephant".ToCharArray()));   ...................
Console.WriteLine(JsonSerializer.Serialize("🐘"));                       ...................

C# strings are encoded with Unicode

We’ve been using strings since you typed “Hello, world!” into the IDE at the start of Chapter 1. And because strings are so intuitive, we haven’t really needed to dissect them and figure out what makes them tick. But ask yourself... what exactly is a string?

A C# string is a read-only collection of chars. So if you actually look at how a string is stored in memory, you’ll see the string "Elephant" stored as chars 'E', 'l', 'e', 'p', 'h', 'a', 'n', and 't'. But now ask yourself... what exactly is a char?

A char is a character represented with Unicode. Unicode is an industry standard for encoding characters: converting them into bytes so they can be stored in memory, transmitted across networks, included in documents, or used in pretty much any other way, with a guarantee that you’ll always get the correct character back.

This is especially important when you consider just how many characters there are. The Unicode standard supports over 150 scripts (sets of characters for specific languages), including not just Latin (which has the 26 English letters and variants like é and ç) but scripts for many languages around the world. The list of supported scripts is constantly growing, as the Unicode Consortium adds new ones every year (here’s the current list: http://www.unicode.org/standard/supported.html).

Unicode supports another, really important set of characters: emoji. All of the emoji, from the winking smiley face ( 😉 ) to the ever-popular pile of poo ( 💩 ) are Unicode characters.

Every Unicode character—including emoji—has a unique number.

The number for a Unicode character is called a code point. You can download a list of all of the Unicode characters here: https://www.unicode.org/Public/UNIDATA/UnicodeData.txt. That’s a large text file with a line for every Unicode character. Download it and search for “ELEPHANT” and you’ll find a line that starts like this: 1F418;ELEPHANT. The characters 1F418 represent a hexadecimal (or hex) value. Hex values are written with the digits 0 to 9 and the letters A to F. You can create a hex literal in C# by adding 0x to the beginning, like this: 0x1F418.

1F418 is the elephant emoji’s Unicode code point, which is usually written U+1F418. A code point is just a number; an encoding is a rule for turning that number into bytes. UTF-8 is the most common Unicode encoding. It’s a variable-length encoding that uses between 1 and 4 bytes per character. The elephant emoji takes four bytes in UTF-8: 0xF0 (or 240), 0x9F (or 159), 0x90 (or 144), and 0x98 (or 152).

But that’s not what JsonSerializer printed. It printed different hex digits: D83D and DC18. That’s because the C# char type uses UTF-16, which encodes each code point as either one or two two-byte numbers. The UTF-16 encoding of the elephant emoji is 0xD83D 0xDC18, a pair of two-byte numbers called a surrogate pair. UTF-8 is much more popular than UTF-16, especially on the Web, so when you look at the bytes in a text file you’re much more likely to find UTF-8 than UTF-16.

Visual Studio works really well with Unicode

Let’s use Visual Studio to see how the IDE works with Unicode characters. You saw back in Chapter 1 that you can use emoji in code. Let’s see what else the IDE can handle. Go to the code editor and enter this code:

Console.WriteLine("Hello ");

If you’re using Windows, open up the Character Map app. If you’re using Mac, press Ctrl-⌘-Space to pop up the Character Viewer. Then search for the Hebrew letter shin ( ש ) and copy it to the clipboard.

Place your cursor at the end of the string between the space and the quotation mark, and paste the shin character that you copied to the clipboard. Hmm, something looks weird:

Did you notice that the cursor is positioned to the left of the pasted letter? Well, let’s continue. Don’t click anywhere in the IDE. Keep the cursor where it is, then switch over to Character Map or Character Viewer to search for the Hebrew letter lamed ( ל ). Switch back to the IDE, make sure the cursor is still positioned just left of the shin, and paste in the lamed.

When you pasted the lamed, the IDE added it to the left of the shin. Now search for the Hebrew letters vav ( ו ) and final mem ( ם ). Paste each of them into the IDE, and it will insert them to the left of the cursor:

The IDE knows that Hebrew is read right-to-left, so it’s behaving accordingly. Click to select the text near the beginning of the statement, and slowly drag your cursor right to select Hello and then שלום. Watch carefully what happens when the selection reaches the Hebrew letters. It skips to the shin ( ש ) and then selects from right to left, and that’s exactly what a Hebrew reader would expect it to do.

.NET uses Unicode to store characters and text

The two C# types for storing text and characters, string and char, keep their data in memory as Unicode. When that data’s written out as bytes to a file, each of those Unicode numbers is written out to the file. Let’s get a sense of exactly how Unicode data is written out to a file. Create a new console app. We’ll use the File.WriteAllText and File.ReadAllBytes methods to start exploring Unicode.

Do this!

Write a normal string out to a file and read it back.

Add the following code to the Main method. It uses File.WriteAllText to write the string “Eureka!” out to a file called eureka.txt (so you’ll need using System.IO;). Then it creates a new byte array called eurekaBytes, reads the file into it, and prints out all of the bytes it read. The last line uses the Encoding class, which is in the System.Text namespace, so add using System.Text; as well:
```
File.WriteAllText("eureka.txt", "Eureka!");
byte[] eurekaBytes = File.ReadAllBytes("eureka.txt");
foreach (byte b in eurekaBytes)
    Console.Write("{0} ", b);
Console.WriteLine(Encoding.UTF8.GetString(eurekaBytes));
```
The ReadAllBytes method returns a reference to a new array of bytes that contains all of the bytes that were read in from the file.

You’ll see these bytes written to the output: 69 117 114 101 107 97 33. The last line calls Encoding.UTF8.GetString, which converts a byte array with UTF-8 encoded characters to a string. Now open the file in Notepad (Windows) or TextEdit (Mac). It says “Eureka!”
Then add code to write the bytes as hex numbers.

Whenever you’re encoding data you often use hex, so let’s do that now. Add this code to the end of the Main method that writes the same bytes out, using {0:x2} to format each byte as a hex number:
```
foreach (byte b in eurekaBytes)
    Console.Write("{0:x2} ", b);
Console.WriteLine();
```
Hex uses the numbers 0 through 9 and letters A through F to represent numbers in base 16, so 6B is equal to 107.

That tells Write to print parameter 0 (the first one after the string to print) as a two-character hex code. So it writes the same seven bytes in hex instead of decimal: 45 75 72 65 6b 61 21
Modify the first line to write the Hebrew letters “שלום” instead of “Eureka!”

You just added the Hebrew text "שלום" to another program using either Character Map (Windows) or Character Viewer (Mac). Comment out the first line of the Main method and replace it with the following code that writes "שלום" to the file instead of "Eureka!". We’ve added an extra Encoding.Unicode parameter so it writes UTF-16. The Encoding class is in the System.Text namespace, so add using System.Text; to the top:
```
File.WriteAllText("eureka.txt", "שלום", Encoding.Unicode);
```
Now run the code again, and look closely at the output: ff fe e9 05 dc 05 d5 05 dd 05. The first two bytes are “FF FE”, a marker at the start of the file that tells whatever program reads it that the text is made of two-byte values with the low byte first. The rest of the bytes are the Hebrew letters, but the bytes in each pair are reversed, so U+05E9 appears as e9 05. Now open the file up in Notepad or TextEdit to make sure it looks right.
Use JsonSerializer to explore code points and UTF-16.

When you serialized the elephant emoji, JsonSerializer generated this: \uD83D\uDC18, which we now know is the emoji’s four-byte UTF-16 encoding in hex. Now let’s try that with the Hebrew letter shin. Add using System.Text.Json; to the top of your app and then add this line:
```
Console.WriteLine(JsonSerializer.Serialize("ש"));
```
Run your app again. This time it printed a code with just four hex digits (two bytes): "\u05E9". That’s the UTF-16 encoding of the Hebrew letter shin, and it’s exactly the same number as the letter’s Unicode code point, U+05E9.

But wait a minute. We learned that the code point for the elephant emoji is 0x1F418, which is different from its UTF-16 encoding (0xD83D 0xDC18). What’s going on?

It turns out that every character with a code point up to 0xFFFF fits in a single two-byte UTF-16 value, so for those characters the UTF-16 value and the code point are the same number. Code points above 0xFFFF, which include the familiar emoji that we’ve used in this book, don’t fit in two bytes, so UTF-16 splits each of them into a pair of two-byte values. So while the Hebrew letter shin is 0x05E9 both as a code point and in UTF-16, the elephant emoji has the code point 0x1F418 and is 0xD83D 0xDC18 in UTF-16.
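If you’d like to see the difference between a code point and its encodings for yourself, here’s a small sketch. The EncodingSketch class and its method names are hypothetical; the calls it relies on (char.ConvertToUtf32, Encoding.UTF8, Encoding.BigEndianUnicode, and UTF32Encoding) are part of .NET. It uses the big-endian encodings so the bytes print in the same order as the hex digits you’ve been reading.

```csharp
using System;
using System.Linq;
using System.Text;

static class EncodingSketch {
    // Hypothetical helper: formats bytes as space-separated lowercase hex.
    public static string ToHex(byte[] bytes) =>
        string.Join(" ", bytes.Select(b => b.ToString("x2")));

    // Describes the first character in text: its code point, then its bytes
    // in UTF-8, UTF-16, and UTF-32.
    public static string[] Describe(string text) {
        int codePoint = char.ConvertToUtf32(text, 0);
        string first = char.ConvertFromUtf32(codePoint);
        var utf32 = new UTF32Encoding(bigEndian: true, byteOrderMark: false);
        return new[] {
            $"code point: U+{codePoint:X4}",
            $"UTF-8: {ToHex(Encoding.UTF8.GetBytes(first))}",
            $"UTF-16: {ToHex(Encoding.BigEndianUnicode.GetBytes(first))}",
            $"UTF-32: {ToHex(utf32.GetBytes(first))}",
        };
    }
}
```

Calling EncodingSketch.Describe("\uD83D\uDC18") should report code point U+1F418, UTF-8 bytes f0 9f 90 98, UTF-16 bytes d8 3d dc 18, and UTF-32 bytes 00 01 f4 18. For "\u05E9" the UTF-16 bytes are 05 e9, the same digits as the code point, while the UTF-8 bytes are d7 a9.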

Use \u escape sequences to include Unicode in strings

When you serialized the elephant emoji, JsonSerializer generated this: \uD83D\uDC18, which we now know is the emoji’s four-byte UTF-16 encoding in hex. That’s because JSON uses UTF-16 escape sequences, and it turns out C# strings use the same escape sequences.

Characters whose UTF-16 encoding is two bytes, like ש, are represented with a \u followed by the four hex digits of the code point (\u05E9). Characters whose UTF-16 encoding is four bytes, like 🐘, are represented with \u and the highest two bytes, followed by \u and the lowest two bytes (\uD83D\uDC18).

C# also has another Unicode escape sequence: \U (with an UPPERCASE U) followed by eight hex digits lets you embed a UTF-32 value, which is always 4 bytes long. That’s yet another Unicode encoding, and it’s really useful because it’s really easy to get from a code point to UTF-32: just pad the hex number with zeroes, so the UTF-32 value for ש is \U000005E9, and for 🐘 it’s \U0001F418.
Use Unicode escape sequences to encode 🐘.

Add these lines to your Main method to write the elephant emoji to two files using both the UTF-16 and UTF-32 escape sequences:
```
File.WriteAllText("elephant1.txt", "\uD83D\uDC18");
File.WriteAllText("elephant2.txt", "\U0001F418");
```
Run your app again, then open both of those files in Notepad or TextEdit. You should see the correct character written to the file.

You used UTF-16 and UTF-32 escape sequences to create your emoji, but the WriteAllText method writes a UTF-8 file. The Encoding.UTF8.GetString method you used in step 1 converts a byte array with UTF-8 encoded data back to a string.

C# can use byte arrays to move data around

Since all your data ends up encoded as bytes, it makes sense to think of a file as one big byte array. And you already know how to read and write byte arrays.

Here’s the code to create a byte array, open an input stream, and read the text ‘Hello!!’ into bytes 0 through 6 of the array.
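A minimal sketch of that idea follows. It assumes a file that already contains the text Hello!! (the ByteArraySketch class, the ReadFirstBytes method, and the hello.txt filename mentioned below are hypothetical), and it loops because Stream.Read is allowed to return fewer bytes than you asked for.

```csharp
using System;
using System.IO;

static class ByteArraySketch {
    // Reads up to 'count' bytes from the start of the file into a new byte array.
    public static byte[] ReadFirstBytes(string path, int count) {
        byte[] buffer = new byte[count];
        using (Stream input = File.OpenRead(path)) {
            int total = 0;
            // Keep reading until the array is full or the stream runs out of data
            while (total < count) {
                int bytesRead = input.Read(buffer, total, count - total);
                if (bytesRead == 0) break;
                total += bytesRead;
            }
        }
        return buffer;
    }
}
```

If you write the text first with File.WriteAllText("hello.txt", "Hello!!"), then ByteArraySketch.ReadFirstBytes("hello.txt", 7) fills bytes 0 through 6 of the array with 72 101 108 108 111 33 33.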

Use a BinaryWriter to write binary data

StreamWriter also encodes your data. It just specializes in text and text encoding—it defaults to UTF-8.

You could encode all of your strings, chars, ints, and floats into byte arrays before writing them out to files, but that would get pretty tedious. That’s why .NET gives you a very useful class called BinaryWriter that automatically encodes your data and writes it to a file. All you need to do is create a FileStream and pass it into the BinaryWriter’s constructor (they’re in the System.IO namespace, so you’ll need using System.IO;). Then you can call its methods to write out your data. So let’s create a new Console Application that uses BinaryWriter to write binary data to a file.

Do this!

Start by creating a console application and setting up some data to write to a file.
```
  int intValue = 48769414;
  string stringValue = "Hello!";
  byte[] byteArray = { 47, 129, 0, 116 };
  float floatValue = 491.695F;
  char charValue = 'E';
```
If you use File.Create, it’ll start a new file—if there’s one there already, it’ll blow it away and start a brand-new one. There’s also the File.OpenWrite method, which opens the existing one and starts overwriting it from the beginning.

To use a BinaryWriter, first you need to open a new stream with File.Create:

  using (var output = File.Create("binarydata.dat"))
  using (var writer = new BinaryWriter(output))
  {

Now just call its Write method. Each time you do, it adds new bytes onto the end of the file that contain an encoded version of whatever data you passed it as a parameter.
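Here’s a sketch of how those Write calls might fit together with the data you set up. The BinaryWriterSketch class and WriteSampleData method are made-up names, and the order of the calls is an assumption based on the order the values are read back later in the chapter. BinaryWriter has an overloaded Write method for each of these types, so every call looks the same.

```csharp
using System.IO;

static class BinaryWriterSketch {
    // Writes the five sample values to a file, then returns the bytes in that file.
    public static byte[] WriteSampleData(string path) {
        int intValue = 48769414;
        string stringValue = "Hello!";
        byte[] byteArray = { 47, 129, 0, 116 };
        float floatValue = 491.695F;
        char charValue = 'E';

        using (var output = File.Create(path))
        using (var writer = new BinaryWriter(output)) {
            // Each call picks the Write overload that matches the argument's type
            writer.Write(intValue);
            writer.Write(stringValue);
            writer.Write(byteArray);
            writer.Write(floatValue);
            writer.Write(charValue);
        }
        // Both using blocks have closed the file by now, so it's safe to read it
        return File.ReadAllBytes(path);
    }
}
```

Calling BinaryWriterSketch.WriteSampleData("binarydata.dat") from Main creates the file and hands back its bytes so you can print them.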
Now use the same code you used before to read in the file you just wrote.
```
  byte[] dataWritten = File.ReadAllBytes("binarydata.dat");
  foreach (byte b in dataWritten)
      Console.Write("{0:x2} ", b);
  Console.WriteLine(" - {0} bytes", dataWritten.Length); 
```
Write down the output in the blanks below. Can you figure out what bytes correspond to each of the five writer.Write(...) statements? Put a bracket under each group of bytes that corresponds with each statement, and write the name of the variable under it.

__ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ - ___ bytes

Use BinaryReader to read the data back in

The BinaryReader class works just like BinaryWriter. You create a stream, attach the BinaryReader object to it, and then call its methods. But the reader doesn’t know what data’s in the file! And it has no way of knowing. Your float value of 491.695F was encoded as f6 d8 f5 43. But those same bytes are a perfectly valid int: 1,140,185,334. So you’ll need to tell the BinaryReader exactly what types to read from the file. Add the following code to your program, and have it read the data you just wrote.

Don’t take our word for it. Replace the line that reads the float with a call to ReadInt32. (You’ll need to change the type of floatRead to int.) Then you can see for yourself what it reads from the file.

Start out by setting up the FileStream and BinaryReader objects:

  using (var input = File.OpenRead("binarydata.dat"))
  using (var reader = new BinaryReader(input))
  {

You tell BinaryReader what type of data to read by calling its different methods.

    Console.Write("int: {0} string: {1} bytes: ", intRead, stringRead);
    foreach (byte b in byteArrayRead)
        Console.Write("{0} ", b);
    Console.Write(" float: {0} char: {1} ", floatRead, charRead);
  }

Here’s the output that gets printed to the console:

  int: 48769414 string: Hello! bytes: 47 129 0 116 float: 491.695 char: E
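To make the “reader doesn’t know what’s in the file” point concrete, here’s a self-contained sketch. Instead of opening binarydata.dat, it reads the same 20 bytes from a MemoryStream, and it lets you choose whether the four bytes after the byte array are read as a float or as an int. The BinaryReaderSketch class, the ReadSample method, and its readFloatAsInt parameter are hypothetical names; ReadInt32, ReadString, ReadBytes, ReadSingle, and ReadChar are the BinaryReader methods for these types.

```csharp
using System;
using System.Globalization;
using System.IO;

static class BinaryReaderSketch {
    // The 20 bytes from this chapter's hex dump of binarydata.dat
    static readonly byte[] SampleBytes = {
        0x86, 0x29, 0xe8, 0x02, 0x06, 0x48, 0x65, 0x6c, 0x6c, 0x6f,
        0x21, 0x2f, 0x81, 0x00, 0x74, 0xf6, 0xd8, 0xf5, 0x43, 0x45
    };

    public static string ReadSample(bool readFloatAsInt) {
        using (var input = new MemoryStream(SampleBytes))
        using (var reader = new BinaryReader(input)) {
            // The calls must match the types that were written, in the same order
            int intRead = reader.ReadInt32();
            string stringRead = reader.ReadString();
            byte[] byteArrayRead = reader.ReadBytes(4);
            // The same four bytes decode happily as either type
            string floatRead = readFloatAsInt
                ? reader.ReadInt32().ToString(CultureInfo.InvariantCulture)
                : reader.ReadSingle().ToString(CultureInfo.InvariantCulture);
            char charRead = reader.ReadChar();

            string bytesText = string.Join(" ", byteArrayRead);
            return $"int: {intRead} string: {stringRead} bytes: {bytesText}" +
                   $" float: {floatRead} char: {charRead}";
        }
    }
}
```

ReadSample(false) returns the same line of output shown above. ReadSample(true) reads those four bytes with ReadInt32 instead, so the float slot shows 1140185334.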

A hex dump lets you see the bytes in your files

A hex dump is a hexadecimal view of the contents of a file, and it’s a really common way for programmers to take a deep look at a file’s internal structure. We’ve been talking about hexadecimal (or hex) throughout the chapter.

It turns out that hex is a convenient way to display bytes in a file. A byte takes 2 characters to display in hex: bytes range from 0 to 255, or 00 to ff in hex. That lets you see a lot of data in a really small space, and in a format that makes it easy to spot patterns. And it’s useful to display binary data in rows that are 8, 16, or 32 bytes long because most binary data tends to break down in chunks of 4, 8, 16, or 32…like all the types in C#. (For example, an int takes up 4 bytes.) A hex dump lets you see exactly what those values are made of.

How to make a hex dump of some plain text

Start with some familiar text using Latin characters:

  When you have eliminated the impossible, whatever remains, however
  improbable, must be the truth. - Sherlock Holmes

First, break up the text into 16-character segments, starting with the first 16: When you have el

Next, convert each character in the text to its UTF-8 encoding. Since these Latin characters each take a single byte in UTF-8, each will be represented by a two-digit hex number from 00 to 7F.

Then print each segment, starting with its offset (or position in the file) written as a hex number followed by a colon and a space, then the first eight bytes in hex, then a divider (a space, two hyphens, and another space), then the next eight bytes, then four spaces and the dumped characters:

  0000: 57 68 65 6e 20 79 6f 75 -- 20 68 61 76 65 20 65 6c When you have el

Repeat until you’ve dumped every 16-character segment:

  0000: 57 68 65 6e 20 79 6f 75 -- 20 68 61 76 65 20 65 6c   When you have el
  0010: 69 6d 69 6e 61 74 65 64 -- 20 74 68 65 20 69 6d 70   iminated the imp
  0020: 6f 73 73 69 62 6c 65 2c -- 20 77 68 61 74 65 76 65   ossible, whateve
  0030: 72 20 72 65 6d 61 69 6e -- 73 2c 20 68 6f 77 65 76   r remains, howev
  0040: 65 72 20 69 6d 70 72 6f -- 62 61 62 6c 65 2c 20 6d   er improbable, m
  0050: 75 73 74 20 62 65 20 74 -- 68 65 20 74 72 75 74 68   ust be the truth
  0060: 2e 20 2d 20 53 68 65 72 -- 6c 6f 63 6b 20 48 6f 6c   . - Sherlock Hol
  0070: 6d 65 73 0a             --                           mes.

And that’s the dump. There are many hex dump programs for various operating systems, and each of them has a slightly different output. Each line in our particular hex dump format represents 16 characters in the input that was used to generate it. The first four characters are the offset in the file—the first line starts at character 0, the next at character 16 (or hex 10), then character 32 (hex 20), etc.

A hex dump is a hexadecimal view of data in a file or memory, and can be a really useful tool to help you debug binary data.

Use StreamReader to build a simple hex dumper

Let’s build a simple hex dump app that uses StreamReader to read data from a file and writes its dump to the console. We’ll take advantage of the ReadBlock method in StreamReader, which reads a block of characters into a char array: you specify the number of characters you want to read, and it’ll either read that many characters or, if there are fewer than that many left in the file, it’ll read the rest of the file. Since we’re displaying 16 characters per line, we’ll read blocks of 16 characters.

Create a new console app called HexDump. Before you add code, run the app to create the folder with the binary. Use Notepad or TextEdit to create a text file called textdata.txt, add some text to it, and put it in the same folder as the binary.

Here’s the Main method—it reads the textdata.txt file and writes a hex dump to the console:

The ReadBlock method reads the next characters from its input into a char array (sometimes referred to as a buffer). It blocks, which means it keeps executing and doesn’t return until it’s either read all of the characters you asked for or run out of data to read.

The String.Substring method returns a part of a string. The first parameter is the starting position (in this case, the beginning of the string), and the second is the number of characters to include in the substring. And the String class has an overloaded constructor that takes a char array as a parameter and converts it to a string.
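Putting those pieces together, a hex dumper built on ReadBlock might look something like the sketch below. It’s an illustration under a few assumptions, not the definitive listing: the HexDumpSketch class and Dump method are made-up names, it takes any TextReader (StreamReader is one) so it’s easy to try with a string, and it builds the dump as a string instead of writing straight to the console.

```csharp
using System;
using System.IO;
using System.Text;

static class HexDumpSketch {
    // Builds hex dump lines for everything the reader returns, 16 characters per line.
    public static string Dump(TextReader reader) {
        var output = new StringBuilder();
        int position = 0;
        char[] buffer = new char[16];
        int charactersRead;
        // ReadBlock returns 0 once there is nothing left to read
        while ((charactersRead = reader.ReadBlock(buffer, 0, 16)) > 0) {
            // Write the offset of this line in hex
            output.Append($"{position:x4}: ");
            position += charactersRead;
            for (int i = 0; i < 16; i++) {
                if (i < charactersRead)
                    output.Append($"{(byte)buffer[i]:x2} ");
                else
                    output.Append("   "); // pad a short final line
                if (i == 7) output.Append("-- ");
            }
            // Write the actual characters that were read into the char array
            string bufferContents = new string(buffer).Substring(0, charactersRead);
            output.Append("  ");
            output.AppendLine(bufferContents);
        }
        return output.ToString();
    }
}
```

To dump a file from Main, you could wrap a StreamReader in a using block and write the result: using (var reader = new StreamReader("textdata.txt")) Console.Write(HexDumpSketch.Dump(reader));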

Now run your app. It will print a hex dump to the console:

0000: 45 6c 65 6d 65 6e 74 61 -- 72 79 2c 20 6d 79 20 64   Elementary, my d
0010: 65 61 72 20 57 61 74 73 -- 6f 6e 21                  ear Watson!

Use Stream.Read to read bytes from a stream

The hex dumper works just fine for text files—but there’s a problem. Copy the binarydata.dat file you wrote with BinaryWriter into the same folder as your app, then change the app to read it:

  using (var reader = new StreamReader("binarydata.dat"))

Now run your app. This time it prints something else—but it’s not quite right:

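The text characters (“Hello!”) seem okay. But compare the output with the “Sharpen your pencil” solution: the bytes aren’t quite right. It looks like it replaced some bytes (86, e8, 81, f6, d8, and f5) with a different byte, fd. That’s because StreamReader is built to read text files. By default it decodes the bytes in the file as UTF-8 text. Byte values up to 127 (7F in hex, or 1111111 in binary, which is 7 bits) are ordinary characters, but these higher bytes don’t form valid UTF-8 characters, so StreamReader swaps each of them for the Unicode replacement character, U+FFFD. When the dumper squeezes that character into a single byte, only its low byte, fd, is left.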

So let’s do this right—by reading the bytes directly from the stream. Modify the using block so it uses File.OpenRead, which opens the file and returns a FileStream. You’ll use the Stream’s Length property to keep reading until you’ve read all of the bytes in the file, and its Read method to read the next 16 bytes into the byte array buffer:

The rest of the code is the same, except for the line that sets bufferContents:

  // Write the actual characters in the byte array
  var bufferContents = Encoding.UTF8.GetString(buffer);

You used the Encoding class earlier in the chapter to convert a byte array to a string. Encoding.UTF8.GetString treats each byte from 0 to 127 as a single character, so any plain text in the file comes through unchanged. Bytes above 127 that aren’t part of a valid UTF-8 character can’t be converted, so they show up as placeholder characters instead. And since the Encoding class is in the System.Text namespace, you’ll need to add using System.Text; to the top of the file.

Now run your app again. This time it prints the correct bytes instead of changing them to fd:

There’s just one more thing we can do to clean up the output. Many hex dump programs replace non-text characters with dots in the output. Add this line to the end of the for loop:

  if (buffer[i] < 0x20 || buffer[i] > 0x7F) buffer[i] = (byte)'.';

Now run your app again—this time the question marks are replaced with dots:

  0000: 86 29 e8 02 06 48 65 6c -- 6c 6f 21 2f 81 00 74 f6    .)...Hello!/..t.
  0010: d8 f5 43 45             --                            ..CE
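For reference, here’s one way the finished byte-based dumper could look once the changes above are combined. As before, it’s a sketch: the ByteHexDumpSketch class and Dump method are hypothetical names, it returns the dump as a string, and the exact spacing before the text column is a guess. It follows the approach described above: keep reading while the position is less than the stream’s Length, read up to 16 bytes at a time, and swap non-text bytes for dots after their hex values have been written.

```csharp
using System;
using System.IO;
using System.Text;

static class ByteHexDumpSketch {
    // Builds hex dump lines for a stream that supports the Length property.
    public static string Dump(Stream input) {
        var output = new StringBuilder();
        long position = 0;
        while (position < input.Length) {
            var buffer = new byte[16];
            int bytesRead = input.Read(buffer, 0, buffer.Length);
            if (bytesRead == 0) break; // nothing left to read

            // Write the offset of this line in hex
            output.Append($"{position:x4}: ");
            position += bytesRead;
            for (int i = 0; i < 16; i++) {
                if (i < bytesRead)
                    output.Append($"{buffer[i]:x2} ");
                else
                    output.Append("   "); // pad a short final line
                if (i == 7) output.Append("-- ");
                // Replace non-text bytes with dots for the text column
                if (buffer[i] < 0x20 || buffer[i] > 0x7F) buffer[i] = (byte)'.';
            }
            // Write the actual characters in the byte array
            var bufferContents = Encoding.UTF8.GetString(buffer, 0, bytesRead);
            output.Append("  ");
            output.AppendLine(bufferContents);
        }
        return output.ToString();
    }
}
```

To dump the binary file, you could call it like this from Main: using (Stream input = File.OpenRead("binarydata.dat")) Console.Write(ByteHexDumpSketch.Dump(input));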

Modify your hex dumper to use command-line arguments

Most hex dump programs are utilities that you run from the command line. You can dump a file by passing its name to the hex dumper as a command-line argument: C:\>HexDump myfile.txt

If you don’t pass it a filename, it reads its input from standard input (or stdin): C:\>dir | HexDump

When you create a console app, C# makes the command-line arguments available as the args string array that gets passed to the Main method:

  static void Main(string[] args)

Let’s modify the hex dumper to use command-line arguments. First, add this GetInputStream method that uses a switch expression to return a stream. If the user passed a command line argument, the app will use it to call File.OpenRead to get a FileStream. If they didn’t, the app will open the standard input instead.

static Stream GetInputStream(string[] args) => args.Length switch
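Here’s a minimal sketch of a method like that, written out in full. The arms of the switch expression are an assumption based on the description above (no arguments means standard input, anything else means open the first argument as a file), and the InputStreamSketch class name is made up. Console.OpenStandardInput is the .NET method that returns standard input as a Stream.

```csharp
using System;
using System.IO;

static class InputStreamSketch {
    // Returns standard input when there are no arguments,
    // otherwise a FileStream for the file named by the first argument.
    public static Stream GetInputStream(string[] args) => args.Length switch {
        0 => Console.OpenStandardInput(),
        _ => File.OpenRead(args[0]),
    };
}
```

One design point to keep in mind: the standard input stream can’t report a Length, so a dumper that works with both kinds of stream needs to keep calling Read until it returns 0 instead of comparing its position against Length.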

Now modify the Main method to use the stream, and it will work the same whether that stream comes from a file or standard input.

Test the dump from stdin by debugging the app in the IDE. Type some text into the console or terminal—as soon as you press enter, the app will generate the hex dump for it. You can also test the command line argument in the IDE. Right-click on the project in the solution, then:

On Windows, choose Properties, then click Debug and enter the filename in the Application arguments box (either the full path or the name of a file in the binary folder)
On Mac, choose Options, expand Run >> Configurations, click Default, and enter the filename in the Arguments box

You can also run the app from the command line. On Windows, Visual Studio builds an executable under the bin\Debug folder (in the same place you put your files to read), so you can run the executable from that folder.

On Mac, you’ll need to build a self-contained application. Open a Terminal window, go to the project folder, and run this command: dotnet publish -r osx-x64

The output of the command will include a line like this: HexDump -> /path-to-binary/osx-x64/publish/

Then you can open a Terminal window, cd to the full path that it printed, and run ./HexDump.

there are no Dumb Questions

Q: Earlier in the chapter when I wrote “Eureka!” to a file and then read the bytes back, it took one byte per character. So why did each of the Hebrew letters in שלום take two bytes? And why did it write bytes “FF FE” at the beginning of the file?

A: What you’re seeing is the difference between two closely related Unicode encodings. Latin characters (including plain English letters), numbers, normal punctuation marks, and some standard characters (like curly brackets, ampersands, and other things you see on your keyboard) all have very low Unicode numbers— between 0 and 127. They correspond to a very old encoding called ASCII that dates back to the 1960s, and UTF-8 was designed to be backward compatible with ASCII. A file with only those Unicode characters contains just their bytes and nothing else.

Things get a little more complicated when you add Unicode characters with higher-numbered code points into the mix. One byte can only hold a number between 0 and 255. But two bytes in a row can store numbers between 0 and 65,535, which, in hex, is FFFF. The file needs to be able to tell whatever program opens it up that it’s going to contain these higher-numbered characters. So it puts a special reserved byte sequence at the beginning of the file: FF FE. That’s called the byte order mark. As soon as a program sees that, it knows that the text is encoded in two-byte units with the low byte first (so an E is encoded as 45 00, with an extra zero byte).

Q: Why is it called a byte order mark?

A: Go back to the code that wrote שלום to a file then printed the bytes it wrote. You’ll see that the bytes in the file were reversed. For example, the ש code point U+05E9 was written to the file as E9 05. That’s called little-endian: it means the least significant byte is written first. Go back to the code that calls WriteAllText, and modify it to change the third argument from Encoding.Unicode to Encoding.BigEndianUnicode. That tells it to write the data out in big-endian, which doesn’t flip the bytes around. When you run it again, you’ll see the bytes come out as “05 E9” instead. You’ll also see a different byte order mark: FE FF. And this tells Notepad or TextEdit how to interpret the bytes in the file.

Q: Why didn’t I use a using block or call Close after I used File.ReadAllText and File.WriteAllText?

A: The File class has several very useful static methods that automatically open up a file, read or write data, and then close it automatically. In addition to the ReadAllText and WriteAllText methods, there are ReadAllBytes and WriteAllBytes, which work with byte arrays, and ReadAllLines and WriteAllLines, which read and write string arrays, where each string in the array is a separate line in the file. All of these methods automatically open and close the streams, so you can do your whole file operation in a single statement.

Q: If the FileStream has methods for reading and writing, why do I ever need to use StreamReader and StreamWriter?

A: The FileStream class is really useful for reading and writing bytes to binary files. Its methods for reading and writing operate with bytes and byte arrays. But a lot of programs work exclusively with text files, and that’s where the StreamReader and StreamWriter come in really handy. They have methods that are built specifically for reading and writing lines of text. Without them, if you wanted to read a line of text in from a file, you’d have to first read a byte array and then write a loop to search through that array for a linebreak—so it’s not hard to see how they make your life easier.
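
To make the comparison concrete, here’s a sketch that reads the first line of a file both ways. It assumes a small UTF-8 text file with no byte order mark that fits comfortably in memory. The ReadLineDemo class and both method names are hypothetical.

```csharp
using System;
using System.IO;
using System.Text;

class ReadLineDemo
{
    // The easy way: StreamReader finds the line break for us.
    public static string FirstLineWithReader(string path)
    {
        using (var reader = new StreamReader(path))
        {
            return reader.ReadLine() ?? "";
        }
    }

    // The hard way: read raw bytes from a FileStream, then search for the line break.
    public static string FirstLineWithFileStream(string path)
    {
        using (FileStream stream = File.OpenRead(path))
        {
            byte[] buffer = new byte[stream.Length];
            int total = 0;
            // Read can return fewer bytes than we asked for, so keep going until we're done.
            while (total < buffer.Length)
            {
                int read = stream.Read(buffer, total, buffer.Length - total);
                if (read == 0) break;
                total += read;
            }

            // Look for the linebreak byte; if there isn't one, the whole file is one line.
            int end = Array.IndexOf(buffer, (byte)'\n', 0, total);
            if (end < 0) end = total;
            // Drop the '\r' that comes before '\n' in Windows line endings.
            if (end > 0 && buffer[end - 1] == (byte)'\r') end--;

            return Encoding.UTF8.GetString(buffer, 0, end);
        }
    }

    static void Main()
    {
        string path = Path.Combine(Path.GetTempPath(), "read-line-demo.txt"); // made-up filename
        File.WriteAllText(path, "alpha\nbeta");
        Console.WriteLine(FirstLineWithReader(path));     // alpha
        Console.WriteLine(FirstLineWithFileStream(path)); // alpha
        File.Delete(path);
    }
}
```

Both methods return the same string, but only one of them made you think about buffers, partial reads, and line endings.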

Q: When should I use File, and when should I use FileInfo?

A: The main difference between the File and FileInfo classes is that the methods in File are static, so you don’t need to create an instance. FileInfo, on the other hand, requires that you instantiate it with a filename. That’s more cumbersome if you only need to perform a single file operation (like deleting or moving one file). But if you need to do many operations on the same file, it’s more efficient to use FileInfo, because you only need to pass it the filename once. In other words, if you’re doing one file operation, use File. If you’re doing a lot of file operations in a row, use FileInfo.
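
Here’s a small sketch of that rule of thumb. The FileVersusFileInfoDemo class, its Summarize method, and the temp filename are invented for illustration.

```csharp
using System;
using System.IO;

class FileVersusFileInfoDemo
{
    // Several questions about the same file: create one FileInfo and keep asking it.
    public static string Summarize(string path)
    {
        var info = new FileInfo(path); // the filename is passed in just once
        return $"{info.Name}: {info.Length} bytes, extension {info.Extension}";
    }

    static void Main()
    {
        string path = Path.Combine(Path.GetTempPath(), "file-vs-fileinfo.txt"); // made-up filename
        File.WriteAllText(path, "Eureka!");

        // A single operation: a static File method is all you need.
        Console.WriteLine(File.Exists(path)); // True

        Console.WriteLine(Summarize(path));   // file-vs-fileinfo.txt: 7 bytes, extension .txt

        File.Delete(path);
        Console.WriteLine(File.Exists(path)); // False
    }
}
```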

If you’re writing a string as UTF-8 and it only has Unicode characters with low numbers, it’s written with one byte per character. But if it’s got high-numbered characters, they’ll be written using two or more bytes each.
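
You can check those byte counts yourself. This sketch assumes the text is encoded as UTF-8, which is what File.WriteAllText uses when you don’t pass it an encoding. The Utf8ByteCountDemo class and its Utf8Length helper are hypothetical names.

```csharp
using System;
using System.Text;

class Utf8ByteCountDemo
{
    // Hypothetical helper: how many bytes would this string take up as UTF-8?
    public static int Utf8Length(string text) => Encoding.UTF8.GetByteCount(text);

    static void Main()
    {
        Console.WriteLine(Utf8Length("Eureka!"));    // 7: one byte for each low-numbered character
        Console.WriteLine(Utf8Length("\u05E9"));     // 2: the Hebrew letter U+05E9
        Console.WriteLine(Utf8Length("\u20AC"));     // 3: the euro sign U+20AC
        Console.WriteLine(Utf8Length("\U0001F600")); // 4: an emoji, U+1F600
    }
}
```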

One more thing! We showed you basic serialization with JsonSerializer. But there are just a couple more things you need to know about it.

Watch it!

JsonSerializer only serializes public properties (not fields), and requires a parameterless constructor.

Flip back to Chapter 5 and have a look at the SwordDamage class. Its Damage property has a private set accessor:

  public int Damage { get; private set; }

It also has a constructor that takes an int parameter:

  public SwordDamage(int startingRoll)

You’ll be able to use JsonSerializer to serialize a SwordDamage object without any problems. But if you try to deserialize one, JsonSerializer will throw an exception (at least, it will if you use the code we’ve shown you). If you want to serialize objects that save their state in fields or private properties, or that use constructors with parameters, you’ll need to create a converter. You can read more about that in the .NET Core serialization documentation: https://docs.microsoft.com/en-us/dotnet/standard/serialization/
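
Here’s a minimal sketch of the “public properties, not fields” rule, assuming JsonSerializer is used with its default options. The Adventurer class is made up for this example. It has public get and set accessors and no constructor of its own, so it round-trips cleanly, but its public field never makes it into the JSON.

```csharp
using System;
using System.Text.Json;

// A hypothetical class, invented for this example.
class Adventurer
{
    public string Name { get; set; } = ""; // public property: serialized
    public int Gold { get; set; }          // public property: serialized
    public int secretStash = 100;          // public field: skipped by default
}

class JsonPropertiesDemo
{
    static void Main()
    {
        var hero = new Adventurer { Name = "Joe", Gold = 50, secretStash = 999 };

        string json = JsonSerializer.Serialize(hero);
        Console.WriteLine(json); // {"Name":"Joe","Gold":50}

        // Deserialize creates a new Adventurer with the parameterless constructor,
        // then sets the public properties it found in the JSON.
        Adventurer copy = JsonSerializer.Deserialize<Adventurer>(json);
        Console.WriteLine(copy.Name);        // Joe
        Console.WriteLine(copy.Gold);        // 50
        Console.WriteLine(copy.secretStash); // 100, because the 999 was never saved
    }
}
```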

BULLET POINTS

Serialization means writing an entire object’s state to a file or string. Deserialization means reading the object’s state back from that file or string.
The JsonSerializer class has a static Serialize method that serializes an object graph to JSON, and a static Deserialize method that instantiates an object graph using serialized JSON data.
Unicode is an industry standard for encoding characters, or converting them into bytes. Every Unicode character has a code point, or a unique number assigned to it, and the standard has room for over one million code points.
A C# string is a read-only collection of chars. C# characters are represented as Unicode. The C# char type holds a two-byte UTF-16 value, and UTF-16 is a variable-length encoding that encodes each character using either one or two of those two-byte values.
Most files and web pages are encoded using UTF-8, a variable-length Unicode encoding that encodes each character with one, two, three, or four bytes.
StreamWriter and StreamReader work well with text (they use UTF-8 unless you give them a different encoding), but they aren’t built for raw binary data. Use BinaryWriter and BinaryReader to read and write binary data.
Use \u escape sequences to include Unicode in C# strings. The \u escape sequence takes four hex digits and encodes a UTF-16 value, while \U takes eight hex digits and encodes UTF-32, a 4-byte fixed-length encoding.
The StreamReader.ReadBlock method reads characters into a char array buffer. It blocks, which means it doesn’t return until it’s either read all of the characters you asked for or run out of data to read.
File.OpenRead returns a FileStream, and the FileStream.Read method reads bytes from a stream.
The String.Substring method returns a part of a string. The String class has an overloaded constructor that takes a char array as a parameter and converts it to a string.
The Encoding.UTF8.GetString method converts a byte array encoded with UTF-8 to a string. Encoding.Unicode.GetString converts a byte array encoded with UTF-16 to a string, and Encoding.UTF32.GetString converts a UTF-32 byte array.
C# makes the command-line arguments for a console app available as the args string array that gets passed to the Main method.
The Console.OpenStandardInput method returns a Stream object that’s connected to the app’s stdin.
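
A few of these points are easy to try out together. The sketch below is one way to do it, not code from earlier in the chapter: the BulletPointsDemo class and its ReadFirstChars helper are hypothetical, and it reads from a MemoryStream instead of a real file so it doesn’t leave anything behind.

```csharp
using System;
using System.IO;
using System.Text;

class BulletPointsDemo
{
    // Hypothetical helper: reads up to count characters with ReadBlock.
    // Notice that the buffer is a char array, not a byte array.
    public static string ReadFirstChars(StreamReader reader, int count)
    {
        char[] buffer = new char[count];
        int charsRead = reader.ReadBlock(buffer, 0, count);
        // This String constructor converts part of a char array to a string.
        return new string(buffer, 0, charsRead);
    }

    static void Main()
    {
        string original = "Eureka! \u05E9"; // eight low-numbered characters and one Hebrew letter

        byte[] utf8 = Encoding.UTF8.GetBytes(original);
        byte[] utf16 = Encoding.Unicode.GetBytes(original);
        byte[] utf32 = Encoding.UTF32.GetBytes(original);
        Console.WriteLine($"{utf8.Length} {utf16.Length} {utf32.Length}"); // 10 18 36

        // Each encoding's GetString method turns its own bytes back into the same string.
        Console.WriteLine(Encoding.UTF8.GetString(utf8) == original);     // True
        Console.WriteLine(Encoding.Unicode.GetString(utf16) == original); // True
        Console.WriteLine(Encoding.UTF32.GetString(utf32) == original);   // True

        using (var reader = new StreamReader(new MemoryStream(Encoding.UTF8.GetBytes("Eureka!"))))
        {
            Console.WriteLine(ReadFirstChars(reader, 4)); // Eure
        }

        Console.WriteLine("Eureka!".Substring(0, 4));     // Eure
    }
}
```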