All Sorts of “Empty” Strings

Let’s start by leaving out lines that have no content at all. There’s a special constant for the empty string; we saw it earlier: String.Empty. Let’s see what happens if we use the code in Example 10-75, which keeps a line in the output only if it is not equal to String.Empty.

Example 10-75. Detecting empty strings

foreach (string line in strings)
{
    if (line != String.Empty)
    {
        output.AppendLine(line);
    }
    else
    {
        System.Diagnostics.Debug.WriteLine("Found a blank line");
    }
}

You might be wondering exactly how string comparisons are performed. Some languages base string comparison on object identity, so that "Abc" is not equal to a different string object that also contains "Abc". (That may seem weird, but in one sense it’s consistent: comparing reference types always means asking “do these two variables refer to the same thing?”) C#, however, compares distinct string objects by value, character by character, so any two strings containing the same sequence of characters are equal. This is different from how most reference types work, but treating strings as a special case gives a result closer to what most people would expect. (Or at least to what most people who hadn’t already become accustomed to the oddities of another language might expect.)
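To see the by-value behavior for yourself, here is a minimal sketch. The class name StringEqualityDemo and the Build helper are our own inventions for illustration; Build assembles its string at runtime so that it is guaranteed to be a different object from the literal.

```csharp
using System;

static class StringEqualityDemo
{
    // Hypothetical helper: creates a new string object at runtime,
    // so it is not the same object as the literal "Abc".
    public static string Build()
    {
        return new string(new[] { 'A', 'b', 'c' });
    }

    static void Main()
    {
        string literal = "Abc";
        string built = Build();

        // == on string variables compares the characters.
        Console.WriteLine(literal == built);                       // True

        // ReferenceEquals asks whether both refer to the same object.
        Console.WriteLine(Object.ReferenceEquals(literal, built)); // False
    }
}
```

The two variables refer to different objects, yet == still reports them as equal, because the compiler knows both operands are strings.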

Warning

Because not all languages use by-value string comparison, the .NET Framework supports the by-identity style too. Consequently, you get by-value comparison only if the C# compiler knows it’s dealing with strings. If you store two strings in variables of type object, the C# compiler loses track of the fact that they are strings, so if you compare these variables with the == operator, it doesn’t know it should provide the string-specific by-value comparison, and will instead do the default by-identity comparison you get for most reference types.
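The following sketch illustrates the pitfall this warning describes. The names ObjectComparisonDemo, Build, and CompareAsObjects are hypothetical; the strings are built at runtime because identical literals may end up sharing a single object, which would hide the effect.

```csharp
using System;

static class ObjectComparisonDemo
{
    // Hypothetical helper: each call returns a new string object.
    public static string Build()
    {
        return new string(new[] { 'A', 'b', 'c' });
    }

    // The parameters are typed as object, so == here is the default
    // by-identity comparison, not the string-specific one.
    public static bool CompareAsObjects(object x, object y)
    {
        return x == y;
    }

    static void Main()
    {
        string first = Build();
        string second = Build();

        Console.WriteLine(first == second);                 // True: by-value
        Console.WriteLine(CompareAsObjects(first, second)); // False: by-identity
        Console.WriteLine(Object.Equals(first, second));    // True: virtual Equals
    }
}
```

If you do find yourself holding strings in object variables, calling Equals rather than using == is one way to get the by-value result, since Equals is resolved at runtime against the actual string object.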

For the sake of working out what is going on, Example 10-75 also writes a message to the debug output each time it finds a blank line.

If we build and run, the output to the console looks like this:

   To be, or not to be--that is the question:
Whether 'tis nobelr in the mind to suffer,
        The slings and arrows of outrageous fortune ,
        Or to take arms against a sea of troubles,
And by opposing end them.

The debug output indicates that the code found and removed eight blank lines. (If you can’t see the Output panel in Visual Studio, you can show it with the View→Output menu item. Ensure that the “Show output from” drop-down has Debug selected.) But apparently it missed some, judging by the output.

So which are the eight “blank” lines—that is, the lines that are the equivalent of String.Empty? If you single-step through the debugger, you’ll see that they are the ones that look like "" and String.Empty.

The ones that contain just whitespace account for some of the remaining blanks in the output. While visibly blank, these are clearly not “empty”—they contain whitespace characters. We’ll deal with that in a minute. The other line that looks “empty” but isn’t is the null string.

As we said earlier, strings are reference types. There is, therefore, a considerable difference between a null reference to a string, and an empty string, as far as the .NET runtime is concerned. However, a lot of applications don’t care about this distinction, so it can sometimes be useful to treat a null string in much the same way as an empty string. The String class offers a static method that lets us test for nullness-or-emptiness with a single call, which Example 10-76 uses.

Example 10-76. Testing for either blank or null

foreach (string line in strings)
{
    if (!String.IsNullOrEmpty(line))
    {
        output.AppendLine(line);
    }
    else
    {
        System.Diagnostics.Debug.WriteLine("Found a blank line");
    }
}

Notice we have to use the ! operator, as the static method returns true if the string is null or empty. Our output is now stripped of “blank” lines except the one that contains just whitespace. If you check the debug output panel, you’ll see that nine lines have been ignored:

   
   To be, or not to be--that is the question:
Whether 'tis nobelr in the mind to suffer,
        The slings and arrows of outrageous fortune ,
        Or to take arms against a sea of troubles,
And by opposing end them.
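To make the difference between the two tests concrete, here is a small sketch that runs both against each kind of “blank” line we’ve met. The class NullVersusEmptyDemo, the Describe helper, and the sample values are assumptions made for illustration; they are not the data used in the examples above.

```csharp
using System;

static class NullVersusEmptyDemo
{
    // Hypothetical helper that labels a string for display.
    public static string Describe(string line)
    {
        if (line == null) return "null";
        if (line == String.Empty) return "empty";
        return "has content";
    }

    static void Main()
    {
        // A null reference, two spellings of the empty string,
        // a whitespace-only string, and an ordinary line.
        string[] samples = { null, "", String.Empty, "   ", "To be" };

        foreach (string s in samples)
        {
            Console.WriteLine(
                "{0}: != String.Empty is {1}, IsNullOrEmpty is {2}",
                Describe(s),
                s != String.Empty,
                String.IsNullOrEmpty(s));
        }
    }
}
```

The null entry passes the != String.Empty test, which is why it slipped through the first filter, but IsNullOrEmpty catches it. The whitespace-only entry gets past both.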

So, what can we do about that remaining blank line at the start? We can deal with this by stripping out spurious whitespace, and then looking to see whether anything is left. Not only will this fix our blank-line problem, but it will also remove any whitespace that the user has left at the start and end of the line.
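As a rough sketch of that idea, and not necessarily the form the final code will take, the helper below trims a line and reports whether anything remains. TrimThenTestDemo and TrimOrNull are hypothetical names; the null check comes first because calling Trim on a null reference would throw.

```csharp
using System;

static class TrimThenTestDemo
{
    // Hypothetical helper: returns the line without leading or trailing
    // whitespace, or null if the line is null, empty, or all whitespace.
    public static string TrimOrNull(string line)
    {
        if (line == null)
        {
            return null;
        }

        string trimmed = line.Trim();
        return trimmed.Length == 0 ? null : trimmed;
    }

    static void Main()
    {
        string[] samples = { null, "", "   ", "   To be, or not to be  " };

        foreach (string s in samples)
        {
            string result = TrimOrNull(s);
            Console.WriteLine(result == null ? "(skipped)" : "[" + result + "]");
        }
    }
}
```

If you only need the test and not the trimmed text, .NET Framework 4 and later also offer String.IsNullOrWhiteSpace, which answers the question without creating a new string.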