Chapter 15. Strings

There was a time (long, long ago, when the earth was still molten and I was in high school) when people thought of computers as manipulating numeric values exclusively. Among the first uses of computers was to calculate missile trajectories during World War II, and for a very long time, programming was taught in the math department of major universities.

Today, most programs are concerned more with manipulating and displaying strings of characters than with strings of numbers. Typically, these strings are used for word processing, document manipulation, and creation of web pages.

C# provides built-in support for a fully functional string type. More importantly, C# treats strings as objects that encapsulate all the manipulation, sorting, and searching methods normally applied to strings of characters.

Tip

The .NET Framework provides a String class (uppercase “S”). The C# keyword string (lowercase “s”) is an alias for that class, System.String. The two names are interchangeable, and you are free to use either (although String, like any other type name in the System namespace, needs a using System; directive or the full name System.String).

Complex string manipulation and pattern matching are aided by the use of regular expressions.

Tip

Regular expressions are a powerful technology for describing and manipulating text. Underlying regular expressions is a technique called pattern matching, which involves comparing one string to another, or comparing a series of wildcards that represent a type of string to a literal string. A regular expression is applied to a string—that is, to a set of characters. Often, that string is an entire text document. More on regular expressions later in this chapter.

C# combines the power and complexity of regular expression syntax, originally found only in string manipulation languages such as awk and Perl, with a fully object-oriented design.

In this chapter, you will learn to work with the C# string type and the .NET Framework System.String class that it aliases. You will see how to extract substrings, manipulate and concatenate strings, and build new strings with the StringBuilder class. In addition, you will find a short introduction to the Regex class used to match strings based on regular expressions.

Creating Strings

C# treats strings as if they were built-in types (much as it does with arrays). C# strings are flexible, powerful, and easy to use.

In .NET, each string object is an immutable sequence of Unicode characters. In other words, methods that appear to change the string actually return a modified copy; the original string remains intact (and if no longer used, is collected by the Garbage Collector).

The declaration of the System.String class is (in part):

    public sealed class String :
     IComparable, ICloneable, IConvertible, IEnumerable

This declaration reveals that the class is sealed, meaning that it is not possible to derive from the String class. The class also implements four system interfaces—IComparable, ICloneable, IConvertible, and IEnumerable—which dictate functionality that System.String shares with other classes in the .NET Framework: the ability to be sorted, copied, converted to other types, and enumerated in foreach loops, respectively.

String Literals

The most common way to create a string is to assign a quoted string of characters, known as a string literal, to a user-defined variable of type string. The following code declares a string called newString that contains the phrase “This book teaches C#”:

    string newString = "This book teaches C#";

To be precise, newString is a string object that is initialized with the string literal "This book teaches C#". If you pass newString to the WriteLine( ) method of the Console class, the string This book teaches C# will be displayed.

Escape Characters

Quoted strings can include escape characters (often referred to as “escape sequences”). Escape characters are a way to signal that the letters or characters that follow have a special meaning (for example, the two characters \n do not mean print a backslash and then an n, but rather mean print a newline). You indicate escape characters by preceding a letter or punctuation mark with a backslash (\). The two most common escape characters are \n, which is used to create a new line, and \t, which is used to insert a tab into a string. If you need to include a quotation mark (") within a string, you indicate that it is part of the string (rather than the end of the string) by escaping it:

    Console.WriteLine("This \"string\" has quotes around it");

This will produce the output: This "string" has quotes around it.
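To make the escape sequences concrete, here is a small sketch (the class name EscapeDemo is just an illustrative choice) showing that each two-character escape sequence compiles to a single character in the resulting string:

```csharp
using System;

class EscapeDemo
{
    // Each two-character escape sequence becomes ONE character in the string.
    public const string Quoted = "This \"string\" has quotes around it";
    public const string TwoLines = "Line One\nLine Two";
    public const string Tabbed = "Name:\tValue";

    static void Main()
    {
        Console.WriteLine(Quoted);         // This "string" has quotes around it
        Console.WriteLine(TwoLines);       // printed on two separate lines
        Console.WriteLine(Tabbed);         // "Name:", a tab, then "Value"

        // "Name:" (5) + the tab (1) + "Value" (5) = 11 characters
        Console.WriteLine(Tabbed.Length);  // 11
    }
}
```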

If you want to display the backslash character itself, you must escape it with (you guessed it) another backslash. Thus, if you were writing the string c:\myDirectory, you’d write:

     "c:\\myDirectory"

Verbatim Strings

Strings can also be created using verbatim string literals, which start with the “at” (@) symbol. This tells the compiler that the string should be used as is (verbatim), even if it spans multiple lines or includes backslashes. In a verbatim string literal, backslashes and the characters that follow them are simply considered additional characters of the string. Thus, the following two definitions are equivalent:

    string s1 = "My \'favorite\' book is in the directory \\books";
    string s2 = @"My 'favorite' book is in the directory \books";

In s1, a nonverbatim string literal is used, so the backslash must be escaped (preceded by another backslash). The single quotes are escaped as well; \' is a legal escape sequence in an ordinary string literal, although a single quote inside double quotes does not require it. The verbatim string s2 needs no escape characters at all. A second example illustrates two ways to specify strings that span multiple lines. The first definition uses a nonverbatim string with a newline escape character (\n) to signal the line break. The second definition uses a verbatim string literal:

    string s3 = "Line One\nLine Two";
    string s4 = @"Line One
    Line Two";

If you want to use quotation marks in a verbatim string literal, you use two quotation marks, like this:

    string s5 = @"This string has ""quotation marks"" in it.";

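Again, these declarations are interchangeable. Which one you use is a matter of convenience and personal style. (One subtlety: the line break inside s4 is whatever line ending the source file uses, so if the file is saved with Windows \r\n line endings, s4 is one character longer than s3.)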
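The sketch below, built around a hypothetical VerbatimDemo class, checks the claims in this section: the ordinary and verbatim versions of the “favorite book” string are equal, a doubled backslash and a verbatim backslash both produce a single backslash character, and a doubled quotation mark inside a verbatim literal becomes one quotation mark:

```csharp
using System;

class VerbatimDemo
{
    // Ordinary literal: the backslash must be escaped (\\); \' is legal but optional.
    public const string Regular = "My \'favorite\' book is in the directory \\books";

    // Verbatim literal: the backslash is just another character.
    public const string Verbatim = @"My 'favorite' book is in the directory \books";

    // Two ways to write the same Windows-style path.
    public const string PathEscaped = "c:\\myDirectory";
    public const string PathVerbatim = @"c:\myDirectory";

    // Inside a verbatim literal, "" stands for a single quotation mark.
    public const string WithQuotes = @"This string has ""quotation marks"" in it.";

    static void Main()
    {
        Console.WriteLine(Regular == Verbatim);          // True
        Console.WriteLine(PathEscaped == PathVerbatim);  // True
        Console.WriteLine(PathVerbatim.Length);          // 14: the backslash is one character
        Console.WriteLine(WithQuotes);                   // This string has "quotation marks" in it.
    }
}
```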

The ToString( ) Method

Another common way to create a string is to call the ToString( ) method on an object and assign the result to a string variable. All the built-in types override this method to simplify the task of converting a value (often a numeric value) to a string representation of that value. In the following example, the ToString( ) method of an integer type is called to store its value in a string:

    int myInteger = 5;
    string integerString = myInteger.ToString(  );

The call to myInteger.ToString( ) returns a string object that is then assigned to the string variable, integerString.
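ToString( ) also has overloads that take a format string. The following is a minimal sketch (the helper names Plain, ThreeDigits, Hex, and Invariant are hypothetical) using the standard "D3" and "X" numeric format specifiers, plus CultureInfo.InvariantCulture to get a culture-independent decimal point:

```csharp
using System;
using System.Globalization;

class ToStringDemo
{
    // The parameterless overload: just the digits of the value.
    public static string Plain(int value) => value.ToString();

    // "D3" pads the decimal digits with leading zeros to at least three digits.
    public static string ThreeDigits(int value) => value.ToString("D3");

    // "X" formats the value in uppercase hexadecimal.
    public static string Hex(int value) => value.ToString("X");

    // The decimal separator of a double depends on the current culture,
    // so InvariantCulture is passed here to pin it to a period.
    public static string Invariant(double value) => value.ToString(CultureInfo.InvariantCulture);

    static void Main()
    {
        int myInteger = 5;
        Console.WriteLine(Plain(myInteger));        // 5
        Console.WriteLine(ThreeDigits(myInteger));  // 005
        Console.WriteLine(Hex(255));                // FF
        Console.WriteLine(Invariant(2.5));          // 2.5
    }
}
```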

Manipulating Strings

The String class provides a host of methods for comparing, searching, and manipulating strings, the most important of which are shown in Table 15-1.

Table 15-1. String class properties and methods

Method or property	Explanation
`Chars`	Indexer property (written `s[i]` in C#) that returns the character at a specified position
`Compare( )`	Overloaded public static method that compares two strings
`Copy( )`	Public static method that creates a new string by copying another
`Equals( )`	Overloaded public static and instance method that determines if two strings have the same value
`Format( )`	Overloaded public static method that formats a string using a format specification
`Length`	Property that returns the number of characters in the instance
`PadLeft( )`	Right-aligns the characters in the string, padding to the left with spaces or a specified character
`PadRight( )`	Left-aligns the characters in the string, padding to the right with spaces or a specified character
`Remove( )`	Returns a copy of the string with the specified number of characters deleted
`Split( )`	Divides a string, returning the substrings delimited by the specified characters
`StartsWith( )`	Indicates if the string starts with the specified characters
`Substring( )`	Retrieves a substring
`ToCharArray( )`	Copies the characters from the string to a character array
`ToLower( )`	Returns a copy of the string in lowercase
`ToUpper( )`	Returns a copy of the string in uppercase
`Trim( )`	Removes all occurrences of a set of specified characters from beginning and end of the string
`TrimEnd( )`	Behaves like `Trim( )`, but only at the end
`TrimStart( )`	Behaves like `Trim( )`, but only at the start
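Several of the methods in Table 15-1 can be tried together. The sketch below (the StringMethodsDemo class and its sample strings are illustrative assumptions) collects the result of each call in an array; every call returns a new string and leaves the original untouched:

```csharp
using System;

class StringMethodsDemo
{
    // Applies several Table 15-1 methods; each one returns a NEW string.
    public static string[] Results()
    {
        string code = "42";
        string messy = "  Hello, World!  ";

        return new[]
        {
            code.PadLeft(5, '0'),                // "00042": right-aligned, padded with '0'
            code.PadRight(5) + "|",              // "42   |": left-aligned, padded with spaces
            messy.Trim(),                        // "Hello, World!": whitespace trimmed from both ends
            messy.TrimStart() + "|",             // "Hello, World!  |": only the start is trimmed
            "xxhixx".Trim('x'),                  // "hi": trims a specified character instead
            messy.Trim().ToUpper(),              // "HELLO, WORLD!"
            "Hello".Remove(1, 3),                // "Ho": deletes 3 characters starting at index 1
            "Hello".StartsWith("He").ToString()  // "True"
        };
    }

    static void Main()
    {
        foreach (string result in Results())
        {
            Console.WriteLine("[" + result + "]");
        }
    }
}
```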

Comparing Strings

The Compare( ) method of String is overloaded. The first version takes two strings and returns a negative number if the first string is alphabetically before the second, a positive number if the first string is alphabetically after the second, and zero if they are equal. The second version works just like the first but is case-insensitive. Example 15-1 illustrates the use of Compare( ).

Example 15-1. Compare( ) method

using System;

namespace StringManipulation
{
   class Tester
   {
      public void Run(  )
      {
         // create some strings to work with
         string s1 = "abcd";
         string s2 = "ABCD";
         int result; // hold the results of comparisons

         // compare two strings, case sensitive
         result = string.Compare( s1, s2 );
         Console.WriteLine(
         "compare s1: {0}, s2: {1}, result: {2}\n",
         s1, s2, result );

         // overloaded compare, takes boolean "ignore case"
         //(true = ignore case)
         result = string.Compare( s1, s2, true );
         Console.WriteLine( "Compare insensitive. result: {0}\n",
         result );

      }

      static void Main(  )
      {
         Tester t = new Tester(  );
         t.Run(  );
      }
   }
}

The output looks like this:

    compare s1: abcd, s2: ABCD, result: -1
    Compare insensitive. result: 0

Example 15-1 begins by declaring two strings, s1 and s2, and initializing them with string literals:

    string s1 = "abcd";
    string s2 = "ABCD";

Compare( ) follows a convention shared by many types: a negative return value indicates that the first parameter is less than the second, a positive result indicates that the first parameter is greater than the second, and zero indicates that they are equal. By default, String.Compare( ) uses the sorting rules of the current culture, and under the rules of most cultures, when two strings differ only in case, the lowercase string sorts first. (This is not a matter of character codes: in Unicode, as in ASCII, uppercase “A” (65) actually has a smaller value than lowercase “a” (97).) Thus, the output indicates that s1 (abcd) is “less than” s2 (ABCD):

    compare s1: abcd, s2: ABCD, result: -1

The second comparison uses an overloaded version of Compare( ), which takes a third Boolean parameter, the value of which determines whether case should be ignored in the comparison. If the value of this “ignore case” parameter is true, the comparison is made without regard to case. This time the result is 0, indicating that the two strings are identical:

    Compare insensitive. result: 0
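Because the default Compare( ) is culture-sensitive, its result can depend on the culture the program runs under. When you want a culture-independent answer, you can pass a StringComparison value instead. In this sketch (the helper names are hypothetical), ordinal comparison puts "ABCD" before "abcd" because it compares raw character codes, while OrdinalIgnoreCase treats them as equal. The culture-sensitive result is printed but not relied on, since it can vary with culture and platform configuration:

```csharp
using System;

class CompareDemo
{
    // Ordinal: compares UTF-16 code values, so 'A' (65) sorts before 'a' (97).
    public static int Ordinal(string a, string b) =>
        string.Compare(a, b, StringComparison.Ordinal);

    // Ordinal, but with case differences ignored.
    public static int OrdinalIgnoreCase(string a, string b) =>
        string.Compare(a, b, StringComparison.OrdinalIgnoreCase);

    // Culture-sensitive: the same rules string.Compare(a, b) uses by default.
    public static int Culture(string a, string b) =>
        string.Compare(a, b, StringComparison.CurrentCulture);

    static void Main()
    {
        Console.WriteLine(Ordinal("abcd", "ABCD") > 0);             // True: "ABCD" sorts first
        Console.WriteLine(OrdinalIgnoreCase("abcd", "ABCD") == 0);  // True
        Console.WriteLine(Culture("abcd", "ABCD"));                 // culture-dependent
    }
}
```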

Concatenating Strings

There are a couple of ways to concatenate strings in C#. You can use the Concat( ) method, which is a static public method of the String class:

    string s3 = string.Concat(s1,s2);

or you can simply use the overloaded concatenation (+) operator:

    string s4 = s1 + s2;

Example 15-2 demonstrates both of these methods.

Example 15-2. Concatenation

using System;

namespace StringManipulation
{
   class Tester
   {
      public void Run(  )
      {
         string s1 = "abcd";
         string s2 = "ABCD";

         // concatenation method
         string s3 = string.Concat( s1, s2 );
         Console.WriteLine(
         "s3 concatenated from s1 and s2: {0}", s3 );

         // use the overloaded operator
         string s4 = s1 + s2;
         Console.WriteLine(
         "s4 concatenated from s1 + s2: {0}", s4 );
      }

      static void Main(  )
      {
         Tester t = new Tester(  );
         t.Run(  );
      }
   }
}

The output looks like this:

    s3 concatenated from s1 and s2: abcdABCD
    s4 concatenated from s1 + s2: abcdABCD

In Example 15-2, the new string s3 is created by calling the static Concat( ) method and passing in s1 and s2, while the string s4 is created by using the overloaded concatenation operator (+) that concatenates two strings and returns a string as a result.
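As a sketch of a few variations (the ConcatDemo helpers are illustrative, not part of Example 15-2), Concat( ) also accepts three strings at once, a chain of + operators produces the same result, and the related String.Join( ) method concatenates an array of strings with a separator between them:

```csharp
using System;

class ConcatDemo
{
    // Concat has overloads for two, three, and four strings (and for arrays).
    public static string WithConcat(string a, string b) => string.Concat(a, " ", b);

    // The C# compiler typically compiles a chain of + on strings into one Concat call.
    public static string WithPlus(string a, string b) => a + " " + b;

    // Join places the separator between elements, with none at either end.
    public static string WithJoin(string[] parts) => string.Join(", ", parts);

    static void Main()
    {
        Console.WriteLine(WithConcat("abcd", "ABCD"));                // abcd ABCD
        Console.WriteLine(WithPlus("abcd", "ABCD"));                  // abcd ABCD
        Console.WriteLine(WithJoin(new[] { "One", "Two", "Three" })); // One, Two, Three
    }
}
```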

Copying Strings

There are two ways to copy strings. 99.9 percent of the time you will just write:

    oneString = theOtherString;

and not worry about what is going on in memory.

There is a second, somewhat awkward way to copy strings:

    myString  = String.Copy(yourString);

and this actually does something subtly different. The difference is somewhat advanced, but here it is in a nutshell.

When you use the assignment operator (=), you create a second reference to the same object in memory, but when you use Copy, you create a reference to a new string that is initialized with the value of the first string.

“Huh?” I hear you cry. An example will make it clear (see Example 15-3).

Example 15-3. Copying strings

using System;

namespace StringManipulation
{
   class Tester
   {
      public void Run()
      {
         string s1 = "abcd";

         Console.WriteLine( " string s1: {0}",s1 );
         Console.WriteLine( " string s2 = s1; " );
         string s2 = s1;
         Console.WriteLine( "s1: {0} s2: {1}", s1, s2 );
         Console.WriteLine( "s1 == s2? {0}", s1 == s2 );
         Console.WriteLine( "ReferenceEquals(s1,s2): {0}",
             ReferenceEquals( s1, s2 ) );
         Console.WriteLine( " \nstring s3 = string.Copy( s1 ); " );
         // string.Copy is marked obsolete in .NET Core 3.0 and later;
         // it still works, but the compiler issues a warning
         string s3 = string.Copy( s1 );
         Console.WriteLine( "s1: {0} s3: {1}", s1, s3 );
         Console.WriteLine( "s1 == s3? {0}", s1 == s3 );
         Console.WriteLine( "ReferenceEquals(s1,s3): {0}",
             ReferenceEquals( s1, s3 ) );

         Console.WriteLine( " \ns1 = \"Hello\"; " );
         s1 = "Hello";
         Console.WriteLine( "s1: {0} s2: {1}", s1, s2 );
         Console.WriteLine( "s1 == s2? {0}", s1 == s2 );
         Console.WriteLine( "ReferenceEquals(s1,s2): {0}",
              ReferenceEquals( s1, s2 ) );

      }

      static void Main()
      {
         Tester t = new Tester();
         t.Run();
      }
   }
}

The output looks like this:

    string s1: abcd
    string s2 = s1;
    s1: abcd s2: abcd
    s1 == s2? True
    ReferenceEquals(s1,s2): True

    string s3 = string.Copy( s1 );
    s1: abcd s3: abcd
    s1 == s3? True
    ReferenceEquals(s1,s3): False

    s1 = "Hello";
    s1: Hello s2: abcd
    s1 == s2? False
    ReferenceEquals(s1,s2): False

In Example 15-3, you start by initializing one string:

    string s1 = "abcd";

You then assign the value of s1 to s2 using the assignment operator:

    s2 = s1;

You print their values, as shown in the first section of results, and find that not only do the two string references have the same value, as indicated by using the equality operator (==), but they actually point to the same object in memory, which is why ReferenceEquals returns true.

On the other hand, if you create s3 and assign its value using String.Copy(s1), while the two values are equal (as shown by using the equality operator), they refer to different objects in memory (as shown by the fact that ReferenceEquals returns false).

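Now return to s1 and s2, which refer to the same object. If you assign a new value to either one, for example, when you write:

    s1 = "Hello";

s2 goes on referring to the original string, but s1 now refers to a brand-new string. (The assignment does not change the original string object; strings are immutable.)

If you later write:

    s2 = "Goodbye";

(not shown in the example), the original string that s1 and s2 once shared will no longer have any references to it, and it becomes eligible to be mercifully and painlessly destroyed by the Garbage Collector. (Strictly speaking, string literals such as "abcd" are interned by the runtime and kept alive, so this applies to strings created at run time.)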
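The following sketch (the ImmutabilityDemo class is a hypothetical illustration) replays this sequence and adds two checks: ToUpper( ) returns a new string instead of changing s1, and a string built at run time from the same characters is equal in value to s2 but is a different object:

```csharp
using System;

class ImmutabilityDemo
{
    public static (string S1, string S2, string Upper, bool SameValue, bool SameObject) Run()
    {
        string s1 = "abcd";
        string s2 = s1;                // s2 refers to the same object as s1

        string upper = s1.ToUpper();   // returns a new string; s1 is untouched
        s1 = "Hello";                  // rebinds s1 only; s2 still refers to "abcd"

        // A string built at run time is a distinct object with an equal value.
        string built = new string(s2.ToCharArray());

        return (s1, s2, upper, built == s2, object.ReferenceEquals(built, s2));
    }

    static void Main()
    {
        var r = Run();
        Console.WriteLine($"s1: {r.S1}, s2: {r.S2}, upper: {r.Upper}");
        Console.WriteLine($"equal values: {r.SameValue}, same object: {r.SameObject}");
    }
}
```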

Testing for Equality

The .NET String class provides three ways to test for the equality of two strings. First, you can use the overloaded Equals( ) method and ask one string (say, s6) directly whether another string (s5) is of equal value:

    Console.WriteLine( "\nDoes s6.Equals(s5)?: {0}", s6.Equals(s5));

You can also pass both strings to String’s static method Equals( ):

    Console.WriteLine( "Does Equals(s6,s5)?: {0}", string.Equals(s6,s5));

Or you can use the String class’s overloaded equality operator (==):

    Console.WriteLine( "Does s6==s5?: {0}", s6 == s5);

In each of these cases, the returned result is a Boolean value (true for equal and false for unequal). Example 15-4 demonstrates these techniques.

Example 15-4. Are all strings created equal?

using System;

namespace StringManipulation
{
   class Tester
   {
      public void Run(  )
      {
         string s1 = "abcd";
         string s2 = "ABCD";

         // the string copy method
         string s5 = string.Copy( s2 );
         Console.WriteLine(
         "s5 copied from s2: {0}", s5 );
         string s6 = s5;
         Console.WriteLine( "s6 = s5: {0}", s6 );

         // member method
         Console.WriteLine(
         "\nDoes s6.Equals(s5)?: {0}",
         s6.Equals( s5 ) );

         // static method
         Console.WriteLine(
         "Does Equals(s6,s5)?: {0}",
         string.Equals( s6, s5 ) );

         // overloaded operator
         Console.WriteLine(
         "Does s6==s5?: {0}", s6 == s5 );
      }

      static void Main(  )
      {
         Tester t = new Tester(  );
         t.Run(  );
      }
   }
}

The output looks like this:

    s5 copied from s2: ABCD
    s6 = s5: ABCD

    Does s6.Equals(s5)?: True
    Does Equals(s6,s5)?: True
    Does s6==s5?: True

The equality operator is the most natural of the three methods to use when you have two string objects.
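All three techniques above are case-sensitive. When case shouldn't matter, one option is the Equals( ) overload that takes a StringComparison value; this is a minimal sketch with hypothetical helper names:

```csharp
using System;

class EqualityDemo
{
    // Exact, case-sensitive value comparison (the same test == performs).
    public static bool ExactMatch(string a, string b) => string.Equals(a, b);

    // Case-insensitive comparison using ordinal (culture-independent) rules.
    public static bool IgnoreCaseMatch(string a, string b) =>
        string.Equals(a, b, StringComparison.OrdinalIgnoreCase);

    static void Main()
    {
        Console.WriteLine(ExactMatch("abcd", "ABCD"));       // False
        Console.WriteLine(IgnoreCaseMatch("abcd", "ABCD"));  // True

        // The static form also tolerates null, whereas calling
        // missing.Equals(...) on a null reference would throw.
        string missing = null;
        Console.WriteLine(ExactMatch(missing, "abcd"));      // False
    }
}
```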

Other Useful String Methods

The String class includes a number of useful methods and properties for finding specific characters or substrings within a string, as well as for manipulating the contents of the string. Example 15-5 demonstrates a few methods, such as locating substrings, finding the index of a substring, and inserting text from one string into another. Following the output is a complete analysis.

Example 15-5. Useful methods of the String class

using System;

namespace StringManipulation
{
   class Tester
   {
      public void Run(  )
      {
         string s1 = "abcd";
         string s2 = "ABCD";
         string s3 = @"Liberty Associates, Inc.
 provides custom .NET development,
 on-site Training and Consulting";

         // the string copy method
         string s5 = string.Copy( s2 );
         Console.WriteLine(
         "s5 copied from s2: {0}", s5 );

         // Two useful properties: the index and the length
         Console.WriteLine(
         "\nString s3 is {0} characters long. ",
         s5.Length );

         Console.WriteLine(
         "The 5th character is {0}\n", s3[4] );

         // test whether a string ends with a set of characters
         Console.WriteLine( "s3:{0}\nEnds with Training?: {1}\n",
         s3,
         s3.EndsWith( "Training" ) );
         Console.WriteLine(
         "Ends with Consulting?: {0}",
         s3.EndsWith( "Consulting" ) );

         // return the index of the substring
         Console.WriteLine(
         "\nThe first occurrence of Training " );
         Console.WriteLine( "in s3 is {0}\n",
         s3.IndexOf( "Training" ) );

         // insert the word excellent before "training"
         string s10 = s3.Insert( 73, "excellent " );
         Console.WriteLine( "s10: {0}\n", s10 );

         // you can combine the two as follows:
         string s11 = s3.Insert( s3.IndexOf( "Training" ),
         "excellent " );
         Console.WriteLine( "s11: {0}\n", s11 );
      }
      static void Main(  )
      {
         Tester t = new Tester(  );
         t.Run(  );
      }
   }
}

The output looks like this:

    s5 copied from s2: ABCD

    String s3 is 4 characters long.
    The 5th character is r

    s3:Liberty Associates, Inc.
     provides custom .NET development,
     on-site Training and Consulting
    Ends with Training?: False

    Ends with Consulting?: True

    The first occurrence of Training
    in s3 is 73

    s10: Liberty Associates, Inc.
     provides custom .NET development,
     on-site excellent Training and Consulting

    s11: Liberty Associates, Inc.
     provides custom .NET development,
     on-site excellent Training and Consulting

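The Length property returns the number of characters in a string, and the index operator ([]) is used to access a particular character within a string (indexes start at 0, so s3[4] is the fifth character). Note that although the first statement's label mentions s3, its argument is s5.Length, so the 4 it reports is the length of s5 (“ABCD”):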

    Console.WriteLine(
     "\nString s3 is {0} characters long. ",
     s5.Length);

    Console.WriteLine(
     "The 5th character is {0}\n", s3[4]);

Here’s the output:

    String s3 is 4 characters long.
    The 5th character is r

The EndsWith( ) method asks a string whether a substring is found at the end of the string. Thus, you might first ask if s3 ends with “Training” (which it does not), and then if it ends with “Consulting” (which it does):

    Console.WriteLine("s3:{0}\nEnds with Training?: {1}\n",
     s3,
     s3.EndsWith("Training") );
    Console.WriteLine(
     "Ends with Consulting?: {0}",
     s3.EndsWith("Consulting"));

The output reflects that the first test fails and the second succeeds:

    Ends with Training?: False
    Ends with Consulting?: True

The IndexOf( ) method locates a substring within a string, and the Insert( ) method inserts a new substring into a copy of the original string. The following code locates the first occurrence of “Training” in s3:

    Console.WriteLine("\nThe first occurrence of Training ");
    Console.WriteLine ("in s3 is {0}\n",
     s3.IndexOf("Training"));

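The output indicates that the offset is 73 (the exact value depends on the line endings and leading spaces stored in the verbatim string s3, so you may see a slightly different number):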

    The first occurrence of Training
    in s3 is 73

You then use that value to insert the word “excellent,” followed by a space, into the string. Strictly speaking, the insertion is made into a new copy of the string, which the Insert( ) method returns and which is assigned to s10:

    string s10 = s3.Insert(73,"excellent ");
    Console.WriteLine("s10: {0}\n",s10);

Here’s the output:

    s10: Liberty Associates, Inc.
     provides custom .NET development,
     on-site excellent Training and Consulting

Finally, you can combine these operations into a single, more robust insertion statement that doesn't hard-code the offset:

    string s11 = s3.Insert(s3.IndexOf("Training"),"excellent ");
    Console.WriteLine("s11: {0}\n",s11);

with the identical result:

    s11: Liberty Associates, Inc.
     provides custom .NET development,
     on-site excellent Training and Consulting
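This combined form still has one pitfall: IndexOf( ) returns -1 when the substring isn't found, and passing -1 to Insert( ) throws an ArgumentOutOfRangeException. A small guard handles that case; InsertBefore is a hypothetical helper, shown here only as a sketch:

```csharp
using System;

class InsertDemo
{
    // Hypothetical helper: inserts text before the first occurrence of target,
    // or returns source unchanged if target does not occur.
    public static string InsertBefore(string source, string target, string text)
    {
        int index = source.IndexOf(target, StringComparison.Ordinal);
        if (index < 0)
        {
            return source;  // Insert(-1, ...) would throw ArgumentOutOfRangeException
        }
        return source.Insert(index, text);
    }

    static void Main()
    {
        string s3 = "on-site Training and Consulting";
        Console.WriteLine(InsertBefore(s3, "Training", "excellent "));  // on-site excellent Training and Consulting
        Console.WriteLine(InsertBefore(s3, "Seminars", "excellent "));  // unchanged
    }
}
```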

Finding Substrings

The String class has methods for finding and extracting substrings. For example, the IndexOf( ) method returns the index of the first occurrence of a character or string within a target string (the related IndexOfAny( ) method finds the first occurrence of any character in an array of characters). Given the definition of the string s1 as:

    string s1 = "One Two Three Four";

you can find the first instance of the characters “hre” by writing:

    int index = s1.IndexOf("hre");

This code sets the int variable index to 9, which is the offset of the letters “hre” in the string s1.

Similarly, the LastIndexOf( ) method returns the index of the last occurrence of a string or substring. While the following code:

    s1.IndexOf("o");

returns the value 6 (the first occurrence of the lowercase letter “o” is at the end of the word “Two”), the method call:

    s1.LastIndexOf("o");

returns the value 15 (the last occurrence of “o” is in the word “Four”).
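Here is a short sketch of these search methods on the same string (the SearchDemo class is illustrative). It also uses the char overloads of IndexOf( ) and LastIndexOf( ), the IndexOfAny( ) method, and StringComparison.Ordinal for the string searches, and it shows the -1 returned when nothing is found:

```csharp
using System;

class SearchDemo
{
    public const string S1 = "One Two Three Four";

    public static int[] Positions() => new[]
    {
        S1.IndexOf("hre", StringComparison.Ordinal),  // 9
        S1.IndexOf('o'),                              // 6: the uppercase 'O' at 0 doesn't match
        S1.LastIndexOf('o'),                          // 15: the 'o' in "Four"
        S1.IndexOfAny(new[] { 'w', 'u' }),            // 5: the 'w' in "Two" comes first
        S1.IndexOf("Five", StringComparison.Ordinal)  // -1: not found
    };

    static void Main()
    {
        foreach (int position in Positions())
        {
            Console.WriteLine(position);
        }
    }
}
```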

The Substring( ) method returns a series of characters. You can ask it for all the characters from a particular offset to the end of the string, or you can (optionally) pass a second argument specifying how many characters to return. Note that this second argument is a length, not an ending offset. Example 15-6 illustrates the Substring( ) method.

Example 15-6. Finding substrings by index

using System;

namespace StringSearch
{
   class Tester
   {
      public void Run(  )
      {
         // create some strings to work with
         string s1 = "One Two Three Four";

         int index;

         // get the index of the last space
         index = s1.LastIndexOf( " " );

         // get the last word.
         string s2 = s1.Substring( index + 1 );

         // set s1 to the substring starting at 0
         // and ending at index (the start of the last word)
         // thus s1 has "one two three"
         s1 = s1.Substring( 0, index );

         // find the last space in s1 (after two)
         index = s1.LastIndexOf( " " );

         // set s3 to the substring starting at
         // index, the space after "two" plus one more
         // thus s3 = "three"
         string s3 = s1.Substring( index + 1 );
         // reset s1 to the substring starting at 0
         // and ending at index, thus the string "one two"
         s1 = s1.Substring( 0, index );

         // reset index to the space between
         // "one" and "two"
         index = s1.LastIndexOf( " " );

         // set s4 to the substring starting one
         // space after index, thus the substring "two"
         string s4 = s1.Substring( index + 1 );

         // reset s1 to the substring starting at 0
         // and ending at index, thus "one"
         s1 = s1.Substring( 0, index );

         // set index to the last space, but there is
         // none so index now = -1
         index = s1.LastIndexOf( " " );

         // set s5 to the substring at one past
         // the last space. there was no last space
         // so this sets s5 to the substring starting
         // at zero
         string s5 = s1.Substring( index + 1 );

         Console.WriteLine( "s2: {0}\ns3: {1}", s2, s3 );
         Console.WriteLine( "s4: {0}\ns5: {1}\n", s4, s5 );
         Console.WriteLine( "s1: {0}\n", s1 );
      }

      static void Main(  )
      {
         Tester t = new Tester(  );
         t.Run(  );
      }
   }
}

The output looks like this:

    s2: Four
    s3: Three
    s4: Two
    s5: One

    s1: One

Example 15-6 is not the most elegant solution possible to the problem of extracting words from a string, but it is a good first approximation, and it illustrates a useful technique. The example begins by creating a string, s1:

    string s1 = "One Two Three Four";

The local variable index is assigned the position of the last space in the string (the one just before the word “Four”):

    index=s1.LastIndexOf(" ");

The substring that begins one position later is assigned to the new string, s2:

    string s2 = s1.Substring(index+1);

This extracts the characters from index + 1 to the end of the string and assigns the value “Four” to s2.

The next step is to remove the word “Four” from s1. Because the second argument to Substring( ) is a length, the following call assigns to s1 the first index characters of s1, that is, everything before the last space:

    s1 = s1.Substring(0,index);

Tip

After this line executes, the variable s1 will point to a new string object that will contain the appropriate substring of the string that s1 used to point to. That original string will eventually be destroyed by the garbage collector because no variable now references it.

You reassign index to the last remaining space, which is just before the word “Three.” You then extract the characters “Three” into string s3. Continue like this until you’ve populated s4 and s5. Finally, display the results:

    s2: Four
    s3: Three
    s4: Two
    s5: One
    s1: One
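The repetitive steps of Example 15-6 lend themselves to a loop. This sketch (WordsFromEnd is a hypothetical helper, not part of the example) peels off the last word until no space remains, using the same LastIndexOf( ) and Substring( ) calls:

```csharp
using System;
using System.Collections.Generic;

class LastWordDemo
{
    // Repeats Example 15-6's technique in a loop; returns the words last-first.
    public static List<string> WordsFromEnd(string text)
    {
        var words = new List<string>();
        int index = text.LastIndexOf(' ');
        while (index >= 0)
        {
            words.Add(text.Substring(index + 1));  // from index + 1 to the end
            text = text.Substring(0, index);       // the first index characters
            index = text.LastIndexOf(' ');
        }
        words.Add(text);                           // no space left: the first word
        return words;
    }

    static void Main()
    {
        foreach (string word in WordsFromEnd("One Two Three Four"))
        {
            Console.WriteLine(word);  // Four, Three, Two, One (one per line)
        }
    }
}
```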

Splitting Strings

A more effective solution to the problem illustrated in Example 15-6 would be to use the String class’s Split( ) method, which parses a string into substrings. To use Split( ), pass in an array of delimiters (characters that indicate where to divide the words). The method returns an array of substrings (which Example 15-7 illustrates). The complete analysis follows the code.

Example 15-7. The Split( ) method

using System;

namespace StringSearch
{
   class Tester
   {
      public void Run(  )
      {
         // create some strings to work with
         string s1 = "One,Two,Three Liberty Associates, Inc.";
         // constants for the space and comma characters
         const char Space = ' ';
         const char Comma = ',';

         // array of delimiters to split the sentence with
         char[] delimiters = new char[]
         {
            Space,
            Comma
         };

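         // output accumulates the result, one line per substring
         string output = "";
         int ctr = 1;

         // split the string and then iterate over the
         // resulting array of strings
         String[] resultArray = s1.Split( delimiters );

         foreach ( String subString in resultArray )
         {
            // each += creates a brand-new string object
            output += ctr++;
            output += ": ";
            output += subString;
            output += "\n";
         }
         Console.WriteLine( output );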
      }

      static void Main(  )
      {
         Tester t = new Tester(  );
         t.Run(  );
      }
   }
}

The output looks like this:

    1: One
    2: Two
    3: Three
    4: Liberty
    5: Associates
    6:
    7: Inc.

Example 15-7 starts by creating a string to parse:

    string s1 = "One,Two,Three Liberty Associates, Inc.";

The delimiters are set to the space and comma characters. Then call Split( ) on the string, passing in the delimiters:

    String[] resultArray = s1.Split(delimiters);

Split( ) returns an array of the substrings that you can then iterate over using the foreach loop, as explained in Chapter 10:

    foreach (String subString in resultArray)

Tip

You can, of course, combine the call to split with the iteration, as in the following:

    foreach (string subString in s1.Split(delimiters))

C# programmers are fond of combining statements like this. The advantage of splitting the statement into two, however, and of using an interim variable like resultArray is that you can examine the contents of resultArray in the debugger.

Before the foreach loop begins, output is initialized to an empty string; inside the loop, the output string is built up in four steps. Start by using the += operator to append the current value of ctr (the postfix ++ operator then increments it):

    output += ctr++;

Next add the colon, then the substring returned by Split( ), and then the newline:

    output += ": ";
    output += subString;
    output += "\n";

With each concatenation, a new copy of the string is made, and all four steps are repeated for each substring found by Split( ).

This repeated copying of strings is terribly inefficient. The problem is that the string type is not designed for this kind of operation. What you want is a way to build up a new string by appending a formatted string each time through the loop. The class you need is StringBuilder.

The StringBuilder Class

You can use the System.Text.StringBuilder class for creating and modifying strings. Table 15-2 summarizes the important members of StringBuilder.

Table 15-2. StringBuilder members

Method or property	Explanation
`Append( )`	Overloaded public method that appends a typed object to the end of the current `StringBuilder`
`AppendFormat( )`	Overloaded public method that replaces format specifiers with the formatted value of an object
`EnsureCapacity( )`	Ensures that the current `StringBuilder` has a capacity at least as large as the specified value
`Capacity`	Property that retrieves or assigns the number of characters the `StringBuilder` is capable of holding
`Insert( )`	Overloaded public method that inserts an object at the specified position
`Length`	Property that retrieves or assigns the length of the `StringBuilder`
`MaxCapacity`	Property that retrieves the maximum capacity of the `StringBuilder`
`Remove( )`	Removes the specified range of characters
`Replace( )`	Overloaded public method that replaces all instances of the specified characters with new characters

Unlike String, StringBuilder is mutable; when you modify an instance of the StringBuilder class, you modify the actual string, not a copy.
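The following sketch exercises several members from Table 15-2 on a single StringBuilder (BuilderDemo and its sample text are illustrative assumptions). Because the builder is mutable, each call changes the same object, and Append( ) returns that same object, which is why calls can be chained:

```csharp
using System;
using System.Text;

class BuilderDemo
{
    public static string Build()
    {
        var sb = new StringBuilder();
        sb.Append("One");                // "One"
        sb.Append(',').Append("Two");    // "One,Two": Append returns the same builder
        sb.AppendFormat(" {0}", 3);      // "One,Two 3"
        sb.Insert(0, "[");               // "[One,Two 3"
        sb.Replace(',', ' ');            // "[One Two 3"
        sb.Remove(0, 1);                 // "One Two 3"
        return sb.ToString();            // the final value as an ordinary string
    }

    // Append modifies and returns the SAME object rather than a copy.
    public static bool AppendReturnsSameBuilder()
    {
        var sb = new StringBuilder("abc");
        return object.ReferenceEquals(sb, sb.Append("d"));
    }

    static void Main()
    {
        Console.WriteLine(Build());                     // One Two 3
        Console.WriteLine(AppendReturnsSameBuilder());  // True
    }
}
```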

Example 15-8 replaces the String object in Example 15-7 with a StringBuilder object.

Example 15-8. The StringBuilder class

using System;
using System.Text;

namespace StringSearch
{
   class Tester
   {
      public void Run(  )
      {
         // create some strings to work with
         string s1 = "One,Two,Three Liberty Associates, Inc.";

         // constants for the space and comma characters
         const char Space = ' ';
         const char Comma = ',';

         // array of delimiters to split the sentence with
         char[] delimiters = new char[]
         {
            Space,
            Comma
         };

         // use a StringBuilder class to build the
         // output string
         StringBuilder output = new StringBuilder(  );
         int ctr = 1;

         // split the string and then iterate over the
         // resulting array of strings
         foreach ( string subString in s1.Split( delimiters ) )
         {
            // AppendFormat appends a formatted string
            output.AppendFormat( "{0}: {1}\n", ctr++, subString );
         }
         Console.WriteLine( output );

      }
      static void Main(  )
      {
         Tester t = new Tester(  );
         t.Run(  );
      }
   }
}

Only the last part of the program is modified. Rather than using the concatenation operator to modify the string, use the AppendFormat( ) method of StringBuilder to append new formatted strings as you create them. This is much easier and far more efficient. The output is identical:

    1: One
    2: Two
    3: Three
    4: Liberty
    5: Associates
    6:
    7: Inc.

Because you passed in delimiters of both comma and space, the comma and the space that follow “Associates” are treated as two separate delimiters. The zero-length text between them is returned as an empty string, numbered 6 in the previous output. That is not what you want. To eliminate this, you need to tell Split( ) to match a comma (as between One, Two, and Three), a space (as between Liberty and Associates), or a comma followed by a space. It is that last bit that is tricky, and it is exactly the kind of job regular expressions are built for.
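Entry 6 is not really a space. It is an empty string: the nothing between the comma and the space. A quick check, using a hypothetical class name and the same delimiters as Example 15-8, makes this visible:

```csharp
using System;

// Hypothetical demo: reproduces the split from Example 15-8.
public static class EmptyEntryDemo
{
    public static string[] SplitOnSpaceAndComma(string text)
    {
        return text.Split(new char[] { ' ', ',' });
    }

    static void Main()
    {
        string[] parts = SplitOnSpaceAndComma("One,Two,Three Liberty Associates, Inc.");
        Console.WriteLine(parts.Length);     // 7
        // Index 5 (entry 6) is the empty text between ',' and ' '.
        Console.WriteLine(parts[5].Length);  // 0
        Console.WriteLine(parts[6]);         // Inc.
    }
}
```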

Regular Expressions

As noted earlier, regular expressions provide a very powerful way to describe and manipulate text through pattern matching.

Applying a regular expression to a string returns either a substring or a new string representing a modification of some part of the original string. (Remember that string objects are immutable and so cannot be changed by the regular expression.)

By applying a properly constructed regular expression to the following string:

    One,Two,Three Liberty Associates, Inc.

you can return any or all of its substrings (such as “Liberty” or “One”) or modified versions of its substrings (such as “LIBeRtY” or “OnE”). What the regular expression does is determined by the syntax of the regular expression itself.
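Here is a small sketch of both kinds of result. The static Regex.Match( ) method returns a substring, and Regex.Replace( ) returns a modified copy. The original string is left unchanged. ReplaceDemo and ShoutLiberty are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class.
public static class ReplaceDemo
{
    // Returns a modified copy; the input string itself is never changed.
    public static string ShoutLiberty(string text)
    {
        return Regex.Replace(text, "Liberty", "LIBeRtY");
    }

    // Returns the first substring that matches the pattern.
    public static string FirstMatch(string text, string pattern)
    {
        return Regex.Match(text, pattern).Value;
    }

    static void Main()
    {
        string s1 = "One,Two,Three Liberty Associates, Inc.";
        string s2 = ShoutLiberty(s1);
        Console.WriteLine(s1);                    // One,Two,Three Liberty Associates, Inc.
        Console.WriteLine(s2);                    // One,Two,Three LIBeRtY Associates, Inc.
        Console.WriteLine(FirstMatch(s1, "One")); // One
    }
}
```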

A regular expression consists of two types of characters: literals and metacharacters. A literal is a character you want to match in the target string. A metacharacter is a special symbol that acts as a command to the regular expression parser. The parser is the engine responsible for understanding the regular expression. For example, if you create a regular expression:

    ^(From|To|Subject|Date):

this will match any substring consisting of the letters “From,” “To,” “Subject,” or “Date,” so long as those letters start a new line (^) and are followed by a colon (:).

The caret (^) indicates to the regular expression parser that the string you’re searching for must begin a new line. (Strictly speaking, the .NET parser treats ^ as the start of the entire input unless you pass the RegexOptions.Multiline option, which makes it match at the start of every line.) The letters “From,” “To,” “Subject,” and “Date” are literals, and the metacharacters left and right parentheses ((, )) and vertical bar (|) are used to group sets of literals and indicate that any of the choices should match. Thus, you would read the following line as “match any string that begins a new line, followed by any of the four literal strings From, To, Subject, or Date, and followed by a colon”:

    ^(From|To|Subject|Date):
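The sketch below runs this pattern against a short, made-up message. By default, .NET anchors ^ only at the very start of the input. RegexOptions.Multiline lets ^ match after every newline as well. HeaderDemo and the sample message are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class.
public static class HeaderDemo
{
    const string Pattern = "^(From|To|Subject|Date):";

    // Counts the lines that begin with one of the four header names.
    public static int CountHeaders(string text, RegexOptions options)
    {
        return Regex.Matches(text, Pattern, options).Count;
    }

    static void Main()
    {
        // "To:" in the last line is not at the start of a line, so it never matches.
        string message = "From: alice\nTo: bob\nSubject: Strings\nBody mentions To: nobody";
        Console.WriteLine(CountHeaders(message, RegexOptions.None));      // 1
        Console.WriteLine(CountHeaders(message, RegexOptions.Multiline)); // 3
    }
}
```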

Tip

A full explanation of regular expressions is beyond the scope of this book, but all the regular expressions used in the examples are explained. For a complete understanding of regular expressions, I highly recommend Mastering Regular Expressions, Second Edition, by Jeffrey E. F. Friedl (O’Reilly, 2002).

The Regex Class

The .NET Framework provides an object-oriented approach to regular expression pattern matching and replacement.

The Framework Class Library namespace System.Text.RegularExpressions is home to all the .NET Framework objects associated with regular expressions. The central class for regular expression support is Regex, which provides methods and properties for working with regular expressions, the most important of which are shown in Table 15-3.

Table 15-3. Regex members

Method or property	Explanation
`Regex` constructor	Overloaded; creates an instance of `Regex`
`Options`	Property that returns the options passed in to the constructor
`IsMatch( )`	Method that indicates whether a match is found in the input string
`Match( )`	Method that searches an input string and returns the first match for a regular expression
`Matches( )`	Method that searches an input string and returns all successful matches for a regular expression
`Replace( )`	Method that replaces all occurrences of a pattern with a replacement string
`Split( )`	Method that splits an input string into an array of substrings based on a regular expression
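Here is a minimal sketch of the instance methods IsMatch( ), Match( ), Matches( ), and Replace( ) from Table 15-3. All of them are applied to the string used throughout this chapter. The pattern [A-Za-z]+ (a run of ASCII letters) and the class name are illustrative assumptions.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo of Regex instance methods from Table 15-3.
public static class RegexMembersDemo
{
    // A run of one or more ASCII letters.
    static readonly Regex Word = new Regex("[A-Za-z]+");

    public static bool HasWord(string text) { return Word.IsMatch(text); }
    public static string FirstWord(string text) { return Word.Match(text).Value; }
    public static int CountWords(string text) { return Word.Matches(text).Count; }
    public static string MaskWords(string text) { return Word.Replace(text, "X"); }

    static void Main()
    {
        string s1 = "One,Two,Three Liberty Associates, Inc.";
        Console.WriteLine(HasWord(s1));    // True
        Console.WriteLine(FirstWord(s1));  // One
        Console.WriteLine(CountWords(s1)); // 6
        Console.WriteLine(MaskWords(s1));  // X,X,X X X, X.

        // Each Match also reports where in the input it was found.
        foreach (Match m in Word.Matches(s1))
        {
            Console.WriteLine("{0} at index {1}", m.Value, m.Index);
        }
    }
}
```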

Example 15-9 rewrites Example 15-8 to use regular expressions and thus solve the problem of searching for more than one type of delimiter.

Example 15-9. Regular expressions

using System;
using System.Text;
using System.Text.RegularExpressions;

namespace RegularExpressions
{
   class Tester
   {
      public void Run(  )
      {
         string s1 =
         "One,Two,Three Liberty Associates, Inc.";
         Regex theRegex = new Regex( " |, |," );
         StringBuilder sBuilder = new StringBuilder(  );
         int id = 1;

         foreach ( string subString in theRegex.Split( s1 ) )
         {
            sBuilder.AppendFormat(
            "{0}: {1}\n", id++, subString );
         }
         Console.WriteLine( "{0}", sBuilder );
      }


      static void Main(  )
      {
         Tester t = new Tester(  );
         t.Run(  );
      }
   }
}

The output looks like this:

    1: One
    2: Two
    3: Three
    4: Liberty
    5: Associates
    6: Inc.

Example 15-9 begins by creating a string, s1, identical to the string used in Example 15-8:

    string s1 = "One,Two,Three Liberty Associates, Inc.";

and a regular expression that is used to search the string:

    Regex theRegex = new Regex(" |, |,");

One of the overloaded constructors for Regex takes a regular expression string as its parameter.

Tip

This can be a bit confusing. In the context of a C# program, which is the regular expression—the text passed in to the constructor or the Regex object itself? It is true that the text string passed to the constructor is a regular expression in the traditional sense of the term. From a C# (that is, object-oriented) point of view, however, the argument to the constructor is just a string of characters; it is the object called theRegex that is the regular expression object.
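One way to keep the two ideas apart is to note that the pattern stays an ordinary string until it is passed to a constructor. The resulting Regex object can hand that pattern back through ToString( ) and report its settings through the Options property. This sketch uses two constructor overloads, Regex(string) and Regex(string, RegexOptions). The class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo separating the pattern string from the Regex object.
public static class ConstructorDemo
{
    // Regex(string): ToString() returns the pattern that was passed in.
    public static string RoundTrip(string pattern)
    {
        Regex theRegex = new Regex(pattern);
        return theRegex.ToString();
    }

    // Regex(string, RegexOptions): Options reports what was passed in.
    public static RegexOptions OptionsOf(string pattern, RegexOptions options)
    {
        return new Regex(pattern, options).Options;
    }

    static void Main()
    {
        string pattern = " |, |,";   // just a string of characters
        Console.WriteLine("[{0}]", RoundTrip(pattern));               // [ |, |,]
        Console.WriteLine(OptionsOf("inc", RegexOptions.IgnoreCase)); // IgnoreCase

        Regex caseless = new Regex("inc", RegexOptions.IgnoreCase);
        Console.WriteLine(caseless.IsMatch("Liberty Associates, Inc.")); // True
    }
}
```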

The rest of the program proceeds like Example 15-8, except that rather than calling the Split( ) method of String on string s1, the Split( ) method of Regex is called. theRegex.Split( ) acts in much the same way as String.Split( ), returning an array of strings as a result of matching the regular expression pattern within theRegex. Because it splits wherever the regular expression matches, rather than at any single character from a fixed set of delimiters, you have much greater control over how the string is split.
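That control comes with one subtlety. At each position, the .NET engine tries the alternatives separated by | from left to right and uses the first one that succeeds. The pattern in Example 15-9, " |, |,", lists ", " before the lone ",", so the comma and its following space are consumed together. The sketch below, which uses a hypothetical helper, compares that order with the reversed one.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class.
public static class AlternationOrderDemo
{
    // Splits text with the pattern and joins the pieces with '|'
    // so that empty entries show up as "||".
    public static string SplitAndJoin(string pattern, string text)
    {
        return String.Join("|", new Regex(pattern).Split(text));
    }

    static void Main()
    {
        string s1 = "One,Two,Three Liberty Associates, Inc.";

        // ", " is tried before ",", so ", " is consumed as one delimiter.
        Console.WriteLine(SplitAndJoin(" |, |,", s1));
        // One|Two|Three|Liberty|Associates|Inc.

        // "," wins first, leaving the space to split off an empty entry.
        Console.WriteLine(SplitAndJoin(" |,|, ", s1));
        // One|Two|Three|Liberty|Associates||Inc.
    }
}
```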

Summary

C# strings can be sorted, searched, and otherwise manipulated.
The String class is sealed, meaning it cannot be derived from. It implements the IComparable, ICloneable, IConvertible, and IEnumerable interfaces, indicating that you can compare two strings (to sort them), clone a string (to create a duplicate), convert a string to another type (for example, converting the string “15” to the integer 15), and enumerate over a string using a foreach statement, respectively.
A string literal is a quoted sequence of characters written directly in your source code, such as "Hello". Assigning a literal to a variable of type string is the most common way to create a string.
Escape characters allow you to add special characters to strings that would otherwise not be valid within a string.
A verbatim string literal starts with an @ symbol and indicates that the string should be used exactly as is. Verbatim strings do not require escape characters.
You can concatenate strings with the Concat( ) method or the + operator.
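You can copy strings with the Copy( ) method, or use the = operator to make a second reference to the same string (because strings are immutable, a shared reference is as safe as a copy).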
You can test for equality of two strings with the Equals( ) method or the == operator.
The String class also includes methods for finding and extracting substrings, such as IndexOf( ), LastIndexOf( ), and Substring( ).
You can use the Split( ) method with an array of delimiters to divide a string into substrings.
Strings are immutable. Every time you appear to modify a string, a new string is created with the modification, and the original becomes eligible for garbage collection once nothing else refers to it.
The StringBuilder class allows you to assemble the contents of a string with greater efficiency and then to call its ToString( ) method to generate the string you need once it is fully assembled.
Regular expressions provide pattern-matching abilities that enable you to search and manipulate text.

Quiz

Question 15–1. What is the difference between string and String (lower- and uppercase)?
Question 15–2. Some of the interfaces implemented by the String class are IComparable, ICloneable, IConvertible, and IEnumerable. What do these guarantee to you as a client of the String class?
Question 15–3. What is a string literal?
Question 15–4. What is the purpose of escape characters? Give two examples.
Question 15–5. What are verbatim strings?
Question 15–6. What does it mean that strings are immutable?
Question 15–7. What does it mean that the String class is sealed?
Question 15–8. What are the two ways to concatenate strings?
Question 15–9. What does Split( ) do?
Question 15–10. What is the StringBuilder class, why is it used, and how do you create a string with one?
Question 15–11. What are regular expressions?

Exercises

Exercise 15-1.

Create the following six strings:


String 1: “Hello”
String 2: “World”
String 3 (a verbatim string): “Come visit us at http://www.LibertyAssociates.com”
String 4: a concatenation of strings 1 and 2
String 5: “world”
String 6: a copy of string 3

Once you have the strings created, do the following:

Output the length of each string.
Output the third character in each string.
Output whether the character “H” appears in each string.
Output which strings are the same as string 2.
Output which strings are the same as string 2, ignoring case.

Exercise 15-2.

Take the following string:

We hold these truths to be self-evident, that all men are created equal, that they are endowed by their Creator with certain unalienable Rights, that among these are Life, Liberty and the pursuit of Happiness.

and use a regular expression to split the string into words.