Almost all programmers have to deal with storing, retrieving, and processing information in files at some time or another. The .NET Framework provides a number of classes and methods we can use to find, create, read, and write files and directories. In this chapter we’ll look at some of the most common.
Files, though, are just one example of a broader group of entities that can be opened, read from, and/or written to in a sequential fashion, and then closed. .NET defines a common contract, called a stream, that is offered by all types that can be used in this way. We’ll see how and why we might access a file through a stream, and then we’ll look at some other types of streams, including a special storage medium called isolated storage which lets us save and load information even when we are in a lower-trust environment (such as the Silverlight sandbox). Finally, we’ll look at some of the other stream implementations in .NET by way of comparison. (Streams crop up in all sorts of places, so this chapter won’t be the last we see of them—they’re important in networking, for example.)
We, the authors of this book, have often heard our
colleagues ask for a program to help them find duplicate files on their
system. Let’s write something to do exactly that. We’ll pass the names of
the directories we want to search on the command line, along with an
optional switch to determine whether we want to recurse into
subdirectories or not. In the first instance, we’ll do a very basic check
for similarity based on filenames and sizes, as these are relatively cheap
options. Example 11-1 shows our Main function.
Example 11-1. Main method of duplicate file finder
static void Main(string[] args)
{
    bool recurseIntoSubdirectories = false;

    if (args.Length < 1)
    {
        ShowUsage();
        return;
    }

    int firstDirectoryIndex = 0;

    // See if we're being asked to recurse.
    if (args[0] == "/sub")
    {
        // The switch on its own is not enough: we need a directory too.
        if (args.Length < 2)
        {
            ShowUsage();
            return;
        }
        recurseIntoSubdirectories = true;
        firstDirectoryIndex = 1;
    }

    // Get list of directories from command line.
    var directoriesToSearch = args.Skip(firstDirectoryIndex);

    List<FileNameGroup> filesGroupedByName =
        InspectDirectories(recurseIntoSubdirectories, directoriesToSearch);

    DisplayMatches(filesGroupedByName);
    Console.ReadKey();
}
The basic structure is pretty straightforward. First we inspect the command-line arguments to work out which directories we’re searching. Then we call InspectDirectories (shown later) to build a list of all the files in those directories. This groups the files by filename (without the full path) because we do not consider two files to be duplicates if they have different names. Finally, we pass this list to DisplayMatches, which displays any potential matches in the files we have found. DisplayMatches refines our test for duplicates further: it considers two files with the same name to be duplicates only if they have the same size. (That’s not foolproof, of course, but it’s surprisingly effective, and we will refine it further later in the chapter.)
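To make the name-and-size test concrete, here is a minimal sketch of how files that already share a name could be narrowed down to those that also share a size. It is not the DisplayMatches method itself. The Candidate class, the SizeMatcher class, and its FindSameSize method are hypothetical names introduced for illustration, and the paths and sizes are made-up sample data.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical type for this sketch: one file's path and size.
class Candidate
{
    public string Path { get; set; }
    public long Size { get; set; }
}

static class SizeMatcher
{
    // Given files that already share a name, keep only the sets of two or
    // more files that also share a size.
    public static List<List<Candidate>> FindSameSize(
        IEnumerable<Candidate> sameName)
    {
        return sameName
            .GroupBy(c => c.Size)
            .Where(g => g.Count() > 1)
            .Select(g => g.ToList())
            .ToList();
    }
}

class Program
{
    static void Main()
    {
        // Invented sample data: three files with the same name.
        var files = new List<Candidate>
        {
            new Candidate { Path = @"c:\One\Readme.txt", Size = 1024 },
            new Candidate { Path = @"c:\Two\Readme.txt", Size = 1024 },
            new Candidate { Path = @"c:\Three\Readme.txt", Size = 2048 }
        };

        foreach (var match in SizeMatcher.FindSameSize(files))
        {
            Console.WriteLine("Possible duplicates ({0} bytes):", match[0].Size);
            foreach (var file in match)
            {
                Console.WriteLine("  " + file.Path);
            }
        }
    }
}
```

With this sample data, only the two 1,024-byte files are reported as possible duplicates; the 2,048-byte file shares the name but not the size, so it is left out.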
Let’s look at each of these steps in more detail.
The code that parses the command-line arguments does a quick check
to see that we’ve provided at least one command-line argument (in addition
to the /sub switch if present) and we
print out some usage instructions if not, using the method shown in
Example 11-2.
Example 11-2. Showing command line usage
private static void ShowUsage()
{
    Console.WriteLine("Find duplicate files");
    Console.WriteLine("====================");
    Console.WriteLine(
        "Looks for possible duplicate files in one or more directories");
    Console.WriteLine();
    Console.WriteLine(
        "Usage: findduplicatefiles [/sub] DirectoryName [DirectoryName] ...");
    Console.WriteLine("/sub - recurse into subdirectories");
    Console.ReadKey();
}
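The argument rules are easier to experiment with when they are separated from Main. The following is a minimal sketch along those lines, not the book's code: the ArgumentParser class, its TryParse method, and the out parameters are hypothetical names introduced here. It assumes, as the text describes, that /sub may only appear as the first argument and that at least one directory must follow it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class ArgumentParser
{
    // Hypothetical helper: returns false when the arguments are unusable,
    // in which case the caller would show the usage text.
    public static bool TryParse(
        string[] args,
        out bool recurse,
        out List<string> directories)
    {
        recurse = args.Length > 0 && args[0] == "/sub";

        // Everything after the optional switch is treated as a directory.
        directories = args.Skip(recurse ? 1 : 0).ToList();

        // We need at least one directory in addition to any /sub switch.
        return directories.Count > 0;
    }
}

class Program
{
    static void Main()
    {
        bool recurse;
        List<string> directories;

        bool ok = ArgumentParser.TryParse(
            new[] { "/sub", @"c:\One", @"c:\Two" },
            out recurse,
            out directories);

        // Prints: True True 2
        Console.WriteLine("{0} {1} {2}", ok, recurse, directories.Count);
    }
}
```

Returning a bool rather than calling ShowUsage directly keeps the parsing logic free of console output, so the cases that matter (no arguments, the switch on its own, the switch followed by directories) can each be checked in isolation.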
The next step is to build a list of files grouped by name. We define a couple of classes for this, shown in Example 11-3. We create a FileNameGroup object for each distinct filename. Each FileNameGroup contains a nested list of FileDetails, providing the full path of each file that has that name, and also the size of that file.
Example 11-3. Types used to keep track of the files we’ve found
class FileNameGroup
{
    public string FileNameWithoutPath { get; set; }
    public List<FileDetails> FilesWithThisName { get; set; }
}

class FileDetails
{
    public string FilePath { get; set; }
    public long FileSize { get; set; }
}
For example, suppose the program searches two folders, c:\One and c:\Two, and suppose both of those folders contain a file called Readme.txt. Our list will contain a FileNameGroup whose FileNameWithoutPath is Readme.txt. Its nested FilesWithThisName list will contain two FileDetails entries, one with a FilePath of c:\One\Readme.txt and the other with c:\Two\Readme.txt. (And each FileDetails will contain the size of the relevant file in FileSize. If these two files really are copies of the same file, their sizes will, of course, be the same.)
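Written out by hand, the data for the Readme.txt example would look something like the sketch below. The two types repeat the shape of those in Example 11-3 so that the code compiles on its own; the BuildReadmeGroup method is a hypothetical name, and the 1,024-byte size is invented sample data.

```csharp
using System;
using System.Collections.Generic;

// Same shape as the types in Example 11-3, repeated so this compiles alone.
class FileNameGroup
{
    public string FileNameWithoutPath { get; set; }
    public List<FileDetails> FilesWithThisName { get; set; }
}

class FileDetails
{
    public string FilePath { get; set; }
    public long FileSize { get; set; }
}

class Program
{
    // Builds the group described in the text. The size is invented sample
    // data; the real program reads it from the file system.
    public static FileNameGroup BuildReadmeGroup()
    {
        return new FileNameGroup
        {
            FileNameWithoutPath = "Readme.txt",
            FilesWithThisName = new List<FileDetails>
            {
                new FileDetails { FilePath = @"c:\One\Readme.txt", FileSize = 1024 },
                new FileDetails { FilePath = @"c:\Two\Readme.txt", FileSize = 1024 }
            }
        };
    }

    static void Main()
    {
        FileNameGroup readmeGroup = BuildReadmeGroup();

        Console.WriteLine(readmeGroup.FileNameWithoutPath);
        foreach (FileDetails details in readmeGroup.FilesWithThisName)
        {
            Console.WriteLine("  {0} ({1} bytes)",
                details.FilePath, details.FileSize);
        }
    }
}
```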
We build these lists in the InspectDirectories method, which is shown in Example 11-4.
This contains the meat of the program, because this is where we search the
specified directories for files. Quite a lot of the code is concerned with
the logic of the program, but this is also where we start to use some of
the file APIs.
Example 11-4. InspectDirectories method
private static List<FileNameGroup> InspectDirectories(
    bool recurseIntoSubdirectories,
    IEnumerable<string> directoriesToSearch)
{
    var searchOption = recurseIntoSubdirectories ?
        SearchOption.AllDirectories : SearchOption.TopDirectoryOnly;

    // Get the path of every file in every directory we're searching.
    var allFilePaths = from directory in directoriesToSearch
                       from file in Directory.GetFiles(directory, "*.*",
                                                       searchOption)
                       select file;

    // Group the files by local filename (i.e. the filename without the
    // containing path), and for each filename, build a list containing the
    // details for every file that has that filename.
    var fileNameGroups = from filePath in allFilePaths
                         let fileNameWithoutPath = Path.GetFileName(filePath)
                         group filePath by fileNameWithoutPath into nameGroup
                         select new FileNameGroup
                         {
                             FileNameWithoutPath = nameGroup.Key,
                             FilesWithThisName =
                                 (from filePath in nameGroup
                                  let info = new FileInfo(filePath)
                                  select new FileDetails
                                  {
                                      FilePath = filePath,
                                      FileSize = info.Length
                                  }).ToList()
                         };

    return fileNameGroups.ToList();
}
To get it to compile, you’ll need to add:
using System.IO;
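Example 11-4 uses the System.IO namespace to work with files and directories in three places: the call to Directory.GetFiles, the call to Path.GetFileName, and the FileInfo object whose Length property supplies each file’s size. We’ll start by looking at the use of the Directory class.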
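As a quick way to see what the two SearchOption values do, the sketch below builds a small throwaway directory tree under the system temporary folder and counts the files that Directory.GetFiles returns for each option. The GetFilesDemo class and its CountFiles method are hypothetical names, and the directory layout is invented for the demonstration.

```csharp
using System;
using System.IO;

static class GetFilesDemo
{
    // Returns two counts: files found at the top level only, and files
    // found in the whole tree.
    public static int[] CountFiles()
    {
        // Create <temp>\<random>\Sub so there is one nested directory.
        string root = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
        string sub = Path.Combine(root, "Sub");
        Directory.CreateDirectory(sub);

        try
        {
            File.WriteAllText(Path.Combine(root, "Readme.txt"), "top level");
            File.WriteAllText(Path.Combine(sub, "Readme.txt"), "nested");

            string[] top = Directory.GetFiles(
                root, "*.*", SearchOption.TopDirectoryOnly);
            string[] all = Directory.GetFiles(
                root, "*.*", SearchOption.AllDirectories);

            return new[] { top.Length, all.Length };
        }
        finally
        {
            // Remove the temporary tree, including its contents.
            Directory.Delete(root, true);
        }
    }
}

class Program
{
    static void Main()
    {
        int[] counts = GetFilesDemo.CountFiles();
        Console.WriteLine("Top directory only: {0}", counts[0]);
        Console.WriteLine("All directories:    {0}", counts[1]);
    }
}
```

Here the top-level search finds one file and the recursive search finds two, which is the difference the /sub switch controls in our duplicate finder.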