Chapter 11. Files and Streams

Almost all programmers have to deal with storing, retrieving, and processing information in files at some time or another. The .NET Framework provides a number of classes and methods we can use to find, create, read, and write files and directories In this chapter we’ll look at some of the most common.

Files, though, are just one example of a broader group of entities that can be opened, read from, and/or written to in a sequential fashion, and then closed. .NET defines a common contract, called a stream, that is offered by all types that can be used in this way. We’ll see how and why we might access a file through a stream, and then we’ll look at some other types of streams, including a special storage medium called isolated storage which lets us save and load information even when we are in a lower-trust environment (such as the Silverlight sandbox). Finally, we’ll look at some of the other stream implementations in .NET by way of comparison. (Streams crop up in all sorts of places, so this chapter won’t be the last we see of them—they’re important in networking, for example.)

We, the authors of this book, have often heard our colleagues ask for a program to help them find duplicate files on their system. Let’s write something to do exactly that. We’ll pass the names of the directories we want to search on the command line, along with an optional switch to determine whether we want to recurse into subdirectories or not. In the first instance, we’ll do a very basic check for similarity based on filenames and sizes, as these are relatively cheap options. Example 11-1 shows our Main function.

The basic structure is pretty straightforward. First we inspect the command-line arguments to work out which directories we’re searching. Then we call InspectDirectories (shown later) to build a list of all the files in those directories. This groups the files by filename (without the full path) because we do not consider two files to be duplicates if they have different names. Finally, we pass this list to DisplayMatches, which displays any potential matches in the files we have found. DisplayMatches refines our test for duplicates further—it considers two files with the same name to be duplicates only if they have the same size. (That’s not foolproof, of course, but it’s surprisingly effective, and we will refine it further later in the chapter.)

Let’s look at each of these steps in more detail.

The code that parses the command-line arguments does a quick check to see that we’ve provided at least one command-line argument (in addition to the /sub switch if present) and we print out some usage instructions if not, using the method shown in Example 11-2.

The next step is to build a list of files grouped by name. We define a couple of classes for this, shown in Example 11-3. We create a FileNameGroup object for each distinct filename. Each FileNameGroup contains a nested list of FileDetails, providing the full path of each file that has that name, and also the size of that file.

For example, suppose the program searches two folders, c:\One and c:\Two, and suppose both of those folders contain a file called Readme.txt. Our list will contain a FileNameGroup whose FileNameWithoutPath is Readme.txt. Its nested FilesWithThisName list will contain two FileDetails entries, one with a FilePath of c:\One\Readme.txt and the other with c:\Two\Readme.txt. (And each FileDetails will contain the size of the relevant file in FileSize. If these two files really are copies of the same file, their sizes will, of course, be the same.)

We build these lists in the InspectDirectories method, which is shown in Example 11-4. This contains the meat of the program, because this is where we search the specified directories for files. Quite a lot of the code is concerned with the logic of the program, but this is also where we start to use some of the file APIs.

To get it to compile, you’ll need to add:

using System.IO;

The parts of Example 11-4 that use the System.IO namespace to work with files and directories have been highlighted. We’ll start by looking at the use of the Directory class.