Chapter 6

String Manipulation

Key Skills & Concepts

•   Numeric Format Specifiers

•   Parsing Strings

•   Editing Strings

•   Escape Sequences

•   Regular Expressions

•   Converting Strings to Other Formats


 

An important part of data-driven development involves transforming text inputs and outputs into professional-looking and meaningful data. With security in mind, it is also important to ensure the text handled by your application is in the expected format. While string handling may seem trivial, being able to parse and modify string inputs and outputs is a core skill of any data-driven developer. To help with this effort, the C# language provides a rich library for managing strings of text. This chapter shows some of C#’s most common and essential string manipulation routines.

Numeric Format Specifiers

C# offers many convenient options for displaying numeric data in a user-friendly format. Numeric types such as float, decimal, int, and double provide ToString() overloads that control how the number is displayed as text. Every object has a ToString() method; the numeric types simply add overloads that accept a format string.

Raw Text

With no parameters, the ToString() method converts the number to raw text:

double num = 123456.789d;
Console.WriteLine(num.ToString()); // Outputs 123456.789

Rounding

Passing a format string that contains a decimal point followed by zeros as placeholders rounds the number to as many decimal places as there are zeros after the point. Passing only “0” rounds the number to the nearest integer:

[Code listing not reproduced in this copy.]
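As a stand-in for the listing, here is a minimal sketch of the rounding behavior just described. The class name RoundingDemo and the Format() helper are assumptions made for illustration, and the invariant culture is passed explicitly so the decimal separator is always a period:

```csharp
using System;
using System.Globalization;

public static class RoundingDemo
{
    // Formats a number with a custom pattern using the invariant culture.
    public static string Format(double value, string pattern)
    {
        return value.ToString(pattern, CultureInfo.InvariantCulture);
    }

    public static void Main()
    {
        double num = 123456.789d;
        Console.WriteLine(Format(num, "0.00")); // Outputs 123456.79
        Console.WriteLine(Format(num, "0.0"));  // Outputs 123456.8
        Console.WriteLine(Format(num, "0"));    // Outputs 123457
    }
}
```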

Rounding with a Thousands Separator

The letter “N” followed by a digit in the ToString() method outputs the number with a thousands separator and a decimal place rounding to n digits:

double num = 123456.789d;
Console.WriteLine(num.ToString("N2")); // Outputs 123,456.79

Displaying Local Currency

To output a number in the local currency format, pass the letter “C” to the ToString() method:

double num = 123456.789d;
Console.WriteLine(num.ToString("C")); // Outputs $123,456.79 on a US English system

Parsing Strings

Parsing a string enables you to extract data about different parts of the string. C# provides lots of methods to help discover important information within a string.

Length

The string object’s Length property returns the character count of the string:

string stringVarName = "some text";
int stringLength = stringVarName.Length; // assigns 9



Example 6-1 String Length


The string type can be thought of as an array of characters—each character is a value of type char and can be referenced by an index. With the help of the string object’s Length property, this code example uses a loop to reference and print each character in a string one character at a time:

[Code listing not reproduced in this copy.]
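A minimal sketch of such a loop, written to produce the output reported for this example. The class name StringLengthExample and the PrintCharacters() helper are assumptions, not the book's own identifiers:

```csharp
using System;

public static class StringLengthExample
{
    // Prints each character of the text in turn and returns how many were printed.
    public static int PrintCharacters(string text)
    {
        int count = 0;
        for (int i = 0; i < text.Length; i++)
        {
            char letter = text[i]; // index into the string like an array
            Console.Write(letter);
            count++;
        }
        Console.WriteLine();
        return count;
    }

    public static void Main()
    {
        string name = "Maurine Shambarger";
        Console.WriteLine("Length of original string: " + name.Length);
        int letterCount = PrintCharacters(name);
        Console.WriteLine("The letter count is: " + letterCount);
    }
}
```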

The output displays the individual characters of the string and the total number of characters in the string:

Length of original string: 18
Maurine Shambarger
The letter count is: 18


IndexOf()

To find the starting position of a specific set of characters, use the IndexOf() method, which returns the zero-based index of the first occurrence of a substring within a string. When the substring is not found, IndexOf() returns -1. The two overloads listed next offer flexibility when searching strings. The first receives the search string as a parameter. The second receives the search string and the starting position of the search.

int IndexOf(string value);
int IndexOf(string value, int startIndex);

In this example, IndexOf() locates the position of each comma in a string:

[Code listing not reproduced in this copy.]
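One way to locate every comma is to call the second overload repeatedly, restarting the search just past the previous hit. The sketch below assumes a hypothetical FindAll() helper and reuses the location string that appears later in this chapter; it expects a non-empty search value:

```csharp
using System;
using System.Collections.Generic;

public static class IndexOfDemo
{
    // Collects the index of every occurrence of a non-empty value in text.
    public static List<int> FindAll(string text, string value)
    {
        List<int> positions = new List<int>();
        int index = text.IndexOf(value);
        while (index != -1)
        {
            positions.Add(index);
            index = text.IndexOf(value, index + 1); // resume after the last hit
        }
        return positions;
    }

    public static void Main()
    {
        string location = "Retro Fitness,Secaucus,New Jersey";
        foreach (int position in FindAll(location, ","))
        {
            Console.WriteLine("Comma found at index " + position);
        }
        // Outputs indexes 13 and 22
    }
}
```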

LastIndexOf()

The LastIndexOf() method returns the position of the last occurrence of a search string. When the desired substring is not found, LastIndexOf() returns -1.

int LastIndexOf(string value);

In this example, the position of the last comma is returned:

[Code listing not reproduced in this copy.]
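A short sketch of the call, assuming the same comma-delimited location string used elsewhere in this chapter (the class name is illustrative only):

```csharp
using System;

public static class LastIndexOfDemo
{
    public static void Main()
    {
        string location = "Retro Fitness,Secaucus,New Jersey";
        Console.WriteLine(location.LastIndexOf(",")); // Outputs 22
        Console.WriteLine(location.LastIndexOf(";")); // Outputs -1 (not found)
    }
}
```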

Substring

When parsing content, you may want to extract only a section of a string. The Substring() method can extract a section of a string when provided with the starting position and length for the substring:

string Substring(int characterStartPosition, int substringLength);

In the following example, the Substring() method extracts the city name from a string with a format that always starts the city name after a colon and a space. The city name is also always followed by a comma. The IndexOf() method locates the starting and ending positions of the city data so it can be read with the Substring() method.

[Code listing not reproduced in this copy.]
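The steps described above can be sketched as follows. The input string and the ExtractCity() helper are assumptions chosen to fit the stated format; the guard clauses are an addition so that text without a colon or comma yields an empty string rather than an exception:

```csharp
using System;

public static class SubstringDemo
{
    // Returns the text between ": " and the next comma, or "" if either is missing.
    public static string ExtractCity(string text)
    {
        int colon = text.IndexOf(": ");
        if (colon == -1) return "";
        int start = colon + 2;                 // skip the colon and the space
        int comma = text.IndexOf(",", start);  // comma that follows the city
        if (comma == -1) return "";
        return text.Substring(start, comma - start);
    }

    public static void Main()
    {
        Console.WriteLine(ExtractCity("Location: Secaucus, New Jersey")); // Outputs Secaucus
    }
}
```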

Split()

When managing data contained in character-delimited strings, you have the option to create an array with string content. The Split() method stores parts of a string that are separated by a common character into an array. The character that is used as a separator is called a delimiter.

string[] stringArray = stringObject.Split(char charValue);

In this example, the Split() method uses a comma delimiter to separate the components of a location into a string array:

[Code listing not reproduced in this copy.]
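A minimal sketch of such a split, assuming the location string used in the Replace() example later in this chapter:

```csharp
using System;

public static class SplitDemo
{
    public static void Main()
    {
        string location = "Retro Fitness,Secaucus,New Jersey";
        string[] parts = location.Split(','); // the comma is the delimiter
        foreach (string part in parts)
        {
            Console.WriteLine(part);
        }
        // Outputs three lines: Retro Fitness, Secaucus, and New Jersey
    }
}
```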



Try This 6-1 String Formatting and Parsing Exercise


This exercise offers practice with format specifiers and different parsing methods.

1. Write a program that declares a string variable with the following text:

    “Name: Spencer Potter Balance: 3040.50”.

2. Find a way to dynamically extract the first name, last name, and dollar values separately.

3. Output the data in the manner shown next. Use a format specifier to output the dollar value.

    First Name: Spencer
    Last Name: Potter
    Balance: $3,040.50


Join()

In contrast with the Split() method, the Join() method combines all elements of an array into a delimited string:

string Join(string separator, params object[] values);

In this example, an array that holds information about a song is converted into a comma-delimited string:

[Code listing not reproduced in this copy.]
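A small sketch of Join() in use. The song details in the array are illustrative values, not data taken from the original listing:

```csharp
using System;

public static class JoinDemo
{
    public static void Main()
    {
        string[] song = { "Respect", "Aretha Franklin", "1967" };
        string record = string.Join(",", song); // Join() is called on the string type itself
        Console.WriteLine(record); // Outputs Respect,Aretha Franklin,1967
    }
}
```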

StartsWith()

The string class provides a StartsWith() method to determine whether a string begins with a specific set of characters. StartsWith() returns true when the string begins with the given value and false otherwise:

bool StartsWith(string value);

For example, you might want to view all last names in a string array that begin with the letters “Agl”:

[Code listing not reproduced in this copy.]
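A sketch of that filter, using a made-up array of last names purely for illustration:

```csharp
using System;

public static class StartsWithDemo
{
    public static void Main()
    {
        string[] lastNames = { "Agler", "Anderson", "Aglukark", "Baglio" };
        foreach (string lastName in lastNames)
        {
            if (lastName.StartsWith("Agl"))
            {
                Console.WriteLine(lastName);
            }
        }
        // Outputs Agler and Aglukark
    }
}
```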

Contains()

The Contains() method of the string class returns a Boolean value to confirm the existence of a specific set of characters within a string:

bool Contains(string value);

In this example, Contains() helps to search for last names that have “Ja” in them:

[Code listing not reproduced in this copy.]
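A sketch of the search, again with a hypothetical list of last names. Note that Contains() is case sensitive, so a lowercase “ja” does not count as a match:

```csharp
using System;

public static class ContainsDemo
{
    public static void Main()
    {
        string[] lastNames = { "Jackson", "Rajan", "McJames", "Smith" };
        foreach (string lastName in lastNames)
        {
            if (lastName.Contains("Ja"))
            {
                Console.WriteLine(lastName);
            }
        }
        // Outputs Jackson and McJames; "Rajan" has only a lowercase "ja"
    }
}
```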

Editing Strings

When formatting data for storage or presentation, you will want to edit your strings. As you would expect, C# provides methods to help. Strings can be trimmed and joined. Their letter cases can be adjusted. Portions of a string can be replaced, inserted, and removed.

Trim()

When handling string inputs from unknown sources, you may need to remove leading or trailing spaces. String values with empty spaces preceding or trailing the content are not equivalent to strings without the padding. In other words:

" ABC " != "ABC"

Chances are you don’t want to store the extra padding anyway. The Trim() method conveniently removes empty spaces positioned at either side of a character string:

[Code listing not reproduced in this copy.]
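A brief sketch of Trim() at work; the brackets are added only to make the padding visible in the output:

```csharp
using System;

public static class TrimDemo
{
    public static void Main()
    {
        string padded = "  ABC  ";
        string trimmed = padded.Trim();
        Console.WriteLine("[" + padded + "]");  // Outputs [  ABC  ]
        Console.WriteLine("[" + trimmed + "]"); // Outputs [ABC]
        Console.WriteLine(trimmed == "ABC");    // Outputs True
    }
}
```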

Concatenation

Sometimes you may need to combine strings. The process of joining strings is called string concatenation. A simple way to append strings is using the + operator to add a string to the right of an existing string. In this code example, a first name and last name are concatenated to create a full name:

[Code listing not reproduced in this copy.]
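A minimal sketch of that concatenation, with the names borrowed from the exercise earlier in this chapter:

```csharp
using System;

public static class ConcatenationDemo
{
    public static void Main()
    {
        string firstName = "Spencer";
        string lastName = "Potter";
        string fullName = firstName + " " + lastName; // a space separates the names
        Console.WriteLine(fullName); // Outputs Spencer Potter
    }
}
```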

Adjusting Letter Case

C# string comparisons are case sensitive. For example,

"ABC" != "abc"

In addition to requiring case-sensitive string comparisons for processes such as password validation, you may face situations where alphabetical characters of varying case must be stored or displayed in a specific letter case. To handle case conversions, C# string objects provide ToUpper() and ToLower() methods to transform your alphabetical characters. ToUpper() converts a string of alphabetical characters to uppercase, and ToLower() converts a string of alphabetical characters to lowercase:

const string ALPHABET = "AbCdEfGhIjKlMnOpQrStUvWxYz";
Console.WriteLine(ALPHABET.ToUpper()); // Outputs ABCDEFGHIJKLMNOPQRSTUVWXYZ
Console.WriteLine(ALPHABET.ToLower()); // Outputs abcdefghijklmnopqrstuvwxyz

Replace()

To help with string formatting and editing, string objects provide a Replace() method to swap all instances of one substring with another:

string Replace(string oldValue, string newValue);

This example uses Replace() to swap all instances of New Jersey with NJ:

string location = "Retro Fitness,Secaucus,New Jersey";
// Assigns "Retro Fitness,Secaucus,NJ"
location = location.Replace("New Jersey", "NJ");

Insert()

When formatting string inputs and outputs, you may need to insert additional data in the middle of the string. String objects provide an Insert() method for inserting a group of characters into an existing string at a specific position:

string Insert(int startIndex, string value);

Here the day of the week is added to a town event announcement:

[Code listing not reproduced in this copy.]
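A sketch of Insert() under assumed data: the announcement text is invented for illustration, and IndexOf() is used to find where the day of the week belongs:

```csharp
using System;

public static class InsertDemo
{
    public static void Main()
    {
        string announcement = "The town fair opens June 3.";
        int position = announcement.IndexOf("June"); // insert just before the date
        announcement = announcement.Insert(position, "Saturday, ");
        Console.WriteLine(announcement); // Outputs The town fair opens Saturday, June 3.
    }
}
```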

Remove()

To remove ranges of characters from your string, you can use the Remove() method:

string Remove(int startIndex, int count);

In this example, Remove() strips the three-character answer label from the start of the string, leaving only the sentence:

string answer = "e) All of the above.";
// Assigns "All of the above."
answer = answer.Remove(0, 3);

Escape Sequences

Escape sequences are character combinations that represent actions, nonprinted characters, and special characters such as single quotes, double quotes, backslashes, newlines, carriage returns, tabs, backspaces, and more. Escape sequences start with a backslash. Table 6-1 lists the escape sequences that are available in C#.

[Table not reproduced in this copy.]

Table 6-1 Recognized C# Escape Sequences
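As a quick illustration, this sketch exercises a handful of standard C# escape sequences. It is not a reconstruction of the table, and the sample strings are made up:

```csharp
using System;

public static class EscapeSequenceDemo
{
    public static void Main()
    {
        Console.WriteLine("Name:\tSpencer");  // \t inserts a tab
        Console.WriteLine("Line 1\nLine 2");  // \n starts a new line
        Console.WriteLine("She said \"hi\""); // \" prints a double quote
        Console.WriteLine("It\'s here");      // \' prints a single quote
        Console.WriteLine("A\\B");            // \\ prints one backslash: A\B
    }
}
```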

Using a backslash in a regular string literal in any combination other than a recognized escape sequence causes a compile-time error. For example, this file path declaration will not compile because \C and \l are not escape sequences:

string filePath = "C:\CSharp\log.txt"; // Causes an error.

Since the escape sequence \\ represents an escaped backslash, you can use the following string declaration to define the file path:

string filePath = "C:\\CSharp\\log.txt"; // Escape sequence equivalent

Verbatim String Literals

Verbatim string literal declarations prevent escape sequence processing. To define a verbatim string literal, begin the string declaration with the @ symbol. Using the file path definition example from the previous section, you could keep all of the text in the original format if you prefix the string with the @ symbol:

string filePath = @"C:\CSharp\log.txt"; // Verbatim string literal.
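To confirm that the escaped and verbatim forms describe the same text, a small sketch can compare them directly (the variable names are illustrative):

```csharp
using System;

public static class VerbatimDemo
{
    public static void Main()
    {
        string escaped = "C:\\CSharp\\log.txt";
        string verbatim = @"C:\CSharp\log.txt";
        Console.WriteLine(escaped == verbatim); // Outputs True
        Console.WriteLine(verbatim);            // Outputs C:\CSharp\log.txt
    }
}
```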

Regular Expressions

A regular expression is a sequence of characters that is used for pattern matching in strings. Regular expressions are often used to validate text inputs to ensure that the data conforms to the expected format. Most modern programming languages support regular expressions because they offer concise and efficient pattern searching. The C# regular expression library is defined in the System.Text.RegularExpressions namespace. Parsing and validation are performed with the Regex class from this library. When initializing a Regex object to validate a string, the regular expression pattern is passed as a parameter:

Regex regex = new Regex(string regularExpression);

The Regex object provides an IsMatch() method to validate strings:

bool match = regex.IsMatch(string input);

Introductory Expressions

Regular expressions can be intimidating if you have never worked with them before, so let’s ease into this topic by only examining three regular expression operators.

Starts With

The regular expression operator ^ ensures that a string begins with the character set on its right. A regular expression of “^Chapter” validates the string value of “Chapter 10”:

[Code listing not reproduced in this copy.]

This same regular expression generates a false result with a string value of “Preface”:

[Code listing not reproduced in this copy.]
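Both cases can be checked with a sketch like the following, which assumes only the Regex constructor and IsMatch() method introduced above:

```csharp
using System;
using System.Text.RegularExpressions;

public static class StartsWithRegexDemo
{
    public static void Main()
    {
        Regex regex = new Regex("^Chapter");
        Console.WriteLine(regex.IsMatch("Chapter 10")); // Outputs True
        Console.WriteLine(regex.IsMatch("Preface"));    // Outputs False
    }
}
```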

Ends With

A regular expression with the operator $ ensures that a string ends with a specific set of characters. The pattern Brazil$ validates strings that end with “Brazil” such as “São Paulo, Brazil”:

[Code listing not reproduced in this copy.]

On the other hand, this regular expression does not validate a string that ends with “Peru”:

[Code listing not reproduced in this copy.]
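A sketch covering both outcomes; the Peruvian input string is an assumed example of a value that ends with “Peru”:

```csharp
using System;
using System.Text.RegularExpressions;

public static class EndsWithRegexDemo
{
    public static void Main()
    {
        Regex regex = new Regex("Brazil$");
        Console.WriteLine(regex.IsMatch("São Paulo, Brazil")); // Outputs True
        Console.WriteLine(regex.IsMatch("Lima, Peru"));        // Outputs False
    }
}
```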

Or

The pipe symbol | represents an OR operator in a regular expression. When using | for validation, at least one character set on either side of this operator is required for a match. A regular expression pattern of “Apples|Pears|Lemons” validates any string that contains “Apples” or “Pears” or “Lemons”. In this case, the regular expression “Apples|Pears|Lemons” validates the string “Red Apples”:

[Code listing not reproduced in this copy.]

The string “Grapes” fails to validate with this regular expression:

[Code listing not reproduced in this copy.]
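The two results can be reproduced with a short sketch (the class name is an assumption):

```csharp
using System;
using System.Text.RegularExpressions;

public static class OrRegexDemo
{
    public static void Main()
    {
        Regex regex = new Regex("Apples|Pears|Lemons");
        Console.WriteLine(regex.IsMatch("Red Apples")); // Outputs True
        Console.WriteLine(regex.IsMatch("Grapes"));     // Outputs False
    }
}
```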



Example 6-2 Introductory Regular Expressions


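This example combines regular expression operators to make a more complex expression. Because | has the lowest precedence of the three operators, the pattern “^Apples|Pears|Lemons$” validates strings that start with “Apples”, contain “Pears”, or end with “Lemons”. To accept only the exact strings “Apples”, “Pears”, or “Lemons” and nothing else, group the alternatives in parentheses: “^(Apples|Pears|Lemons)$”.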

[Code listing not reproduced in this copy.]
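A sketch that produces the output reported for this example. The Validate() helper and the class name are assumptions; only the pattern, the inputs, and the message wording come from the text:

```csharp
using System;
using System.Text.RegularExpressions;

public static class IntroRegexExample
{
    // Builds the PASS/FAIL message for one pattern and input.
    public static string Validate(string pattern, string input)
    {
        Regex regex = new Regex(pattern);
        if (regex.IsMatch(input))
        {
            return "PASS: " + pattern + " validates " + input;
        }
        return "FAIL: " + pattern + " invalidates " + input;
    }

    public static void Main()
    {
        const string PATTERN = "^Apples|Pears|Lemons$";
        string[] inputs = { "Red Apples", "Lemons and Grapes", "Apples", "Pears", "Lemons" };
        foreach (string input in inputs)
        {
            Console.WriteLine(Validate(PATTERN, input));
        }
    }
}
```

Because the alternatives are not grouped, an input such as “Apples and Grapes” would also pass with this pattern, since it starts with “Apples”.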

The following output shows that “Red Apples” and “Lemons and Grapes” fail, while “Apples”, “Pears”, and “Lemons” validate:

FAIL: ^Apples|Pears|Lemons$ invalidates Red Apples
FAIL: ^Apples|Pears|Lemons$ invalidates Lemons and Grapes
PASS: ^Apples|Pears|Lemons$ validates Apples
PASS: ^Apples|Pears|Lemons$ validates Pears
PASS: ^Apples|Pears|Lemons$ validates Lemons


More Regular Expression Operators

Many other regular expression operators exist to give you flexibility when designing validation patterns. Table 6-2 summarizes common regular expression operators with descriptions and examples.

[Table not reproduced in this copy.]

Table 6-2 Regular Expression Summary

TIP      

For many aspiring programmers, regular expressions appear daunting at first glance. Do not be discouraged, though, because they really are not difficult if you break each pattern into simple components. Try the self-study questions at the end of this chapter to get practice with them. You will find they are actually easier to use once you get past the initial learning curve. With practice, you will find ways to divide challenging regular expression patterns into simple sections.



Example 6-3 Regular Expressions for Simple Patterns


This example shows how to set up the Regex object for validation with sets of characters. You can use this example to test all regular expressions and corresponding example inputs listed in Table 6-2.

[Code listing not reproduced in this copy.]
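A sketch of such a test harness, assuming the pattern and input that appear in the reported output. The Validate() helper is hypothetical; swap in any other pattern and input pair to test it:

```csharp
using System;
using System.Text.RegularExpressions;

public static class SimplePatternExample
{
    // Builds the PASS/FAIL message for one pattern and input.
    public static string Validate(string pattern, string input)
    {
        Regex regex = new Regex(pattern);
        if (regex.IsMatch(input))
        {
            return "PASS: " + pattern + " validates " + input;
        }
        return "FAIL: " + pattern + " invalidates " + input;
    }

    public static void Main()
    {
        const string PATTERN = "ba*"; // a "b" followed by zero or more "a" characters
        const string INPUT = "b";
        Console.WriteLine(Validate(PATTERN, INPUT));
    }
}
```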

For the case presented in code, the input string in the preceding example conforms to the regular expression rules, so a successful validation message is displayed:

PASS: ba* validates b



Try This 6-2 Regular Expression Exercise


Here is a chance to try writing regular expressions.

1. Start with the code solution from Example 6-3.

2. Create a regular expression to validate full names such as Jane Chen, Ron Terencio, or Raj Bains. Assume that each first name and last name must begin with a capital letter and that all other characters in the name must be lowercase letters. Each full name only has a first name and a last name. There are no hyphens. First and last names are separated with one space.




Example 6-4 Complex Regular Expressions


This example demonstrates how to build a complex regular expression for validating different phone number formats. The regular expression

^(\([0-9]{3}\)\s?|[0-9]{3}[-\.\s]?)[0-9]{3}[-\.\s]?[0-9]{4}$

validates

(201) 867-5309, 201-867-5309, 201.867.5309, 2018675309 and (201)867-5309

Aside from the pattern and input definitions, the code in the following example is identical to the preceding example:

[Code listing not reproduced in this copy.]
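A sketch of this example, assuming a hypothetical IsValidPhone() helper. The pattern is stored in a verbatim string literal so that its backslashes are passed to the regular expression engine unchanged:

```csharp
using System;
using System.Text.RegularExpressions;

public static class PhoneRegexExample
{
    public const string PATTERN =
        @"^(\([0-9]{3}\)\s?|[0-9]{3}[-\.\s]?)[0-9]{3}[-\.\s]?[0-9]{4}$";

    // Returns true when the input matches one of the supported phone formats.
    public static bool IsValidPhone(string input)
    {
        Regex regex = new Regex(PATTERN);
        return regex.IsMatch(input);
    }

    public static void Main()
    {
        string input = "(201) 867-5309";
        Console.WriteLine("Pattern: " + PATTERN);
        Console.WriteLine((IsValidPhone(input) ? "PASS: " : "FAIL: ") + input);
    }
}
```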

When running this example, the output confirms that the phone number is validated with the regular expression:

Pattern: ^(\([0-9]{3}\)\s?|[0-9]{3}[-\.\s]?)[0-9]{3}[-\.\s]?[0-9]{4}$
PASS: (201) 867-5309


There are lots of free tools online to help automate the process of building complex regular expressions. You might find one you like, but often you will still need to understand how regular expressions work before you can use them effectively. Common patterns such as phone numbers and e-mail addresses are usually easy to find through quick searches on the Web, too. If you seek help online, just make sure your tool or pattern is C# compliant, since other languages will implement regular expressions in a very similar but slightly different manner. If you are really stuck while trying to figure out how to validate strings, you might find other ways to parse a string without a regular expression to get the job done. On the other hand, regular expressions are extremely helpful for automating the process of validation, especially in web development, so the extra effort of figuring out how to use them often is worthwhile.

Converting Strings to Other Formats

When working with forms and other data sources, you may be confronted with a need to convert string data into numeric, date, or Boolean formats. There are two common ways to convert strings to other data types:

•   The Convert class offers easy-to-use methods for converting strings to simple data types.

•   The TryParse() method also converts string values to other data types, and it adds error checking so that an invalid conversion fails safely instead of causing an error.

Convert

Referencing the Convert class allows you to access methods that transform string arguments to other data types. Here are seven quick examples to demonstrate how to convert strings to decimal, float, double, int, long, DateTime, and bool formats:

[Code listing not reproduced in this copy.]
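A sketch of the seven conversions. The input values are invented, and the invariant culture is passed explicitly (an assumption added here) so that the decimal point and date format are interpreted the same way on every system:

```csharp
using System;
using System.Globalization;

public static class ConvertDemo
{
    public static void Main()
    {
        CultureInfo inv = CultureInfo.InvariantCulture;

        decimal price = Convert.ToDecimal("3040.50", inv);
        float ratio = Convert.ToSingle("0.5", inv);
        double average = Convert.ToDouble("12.478", inv);
        int count = Convert.ToInt32("42", inv);
        long total = Convert.ToInt64("9000000000", inv);
        DateTime date = Convert.ToDateTime("2020-01-31", inv);
        bool isValid = Convert.ToBoolean("true");

        Console.WriteLine(price == 3040.50m); // Outputs True
        Console.WriteLine(ratio == 0.5f);     // Outputs True
        Console.WriteLine(average > 12.0);    // Outputs True
        Console.WriteLine(count + 1);         // Outputs 43
        Console.WriteLine(total);             // Outputs 9000000000
        Console.WriteLine(date.Year);         // Outputs 2020
        Console.WriteLine(isValid);           // Outputs True
    }
}
```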

TryParse()

The simple C# data types each expose a TryParse() method that converts a string to that type. TryParse() handles string conversions more gracefully than the Convert class because it also determines whether the conversion is possible. When the conversion is not possible, TryParse() returns false and the program continues to execute on the next line without any run-time error. For example, with the Convert class, this instruction throws a FormatException at run time:

Convert.ToInt32("hello");

The TryParse() method, however, attempts a string conversion and returns a Boolean value of false when unsuccessful:

[Code listing not reproduced in this copy.]
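A minimal sketch of the failed and successful cases, with illustrative variable names:

```csharp
using System;

public static class TryParseIntDemo
{
    public static void Main()
    {
        int number;
        bool converted = int.TryParse("hello", out number);
        Console.WriteLine(converted); // Outputs False
        Console.WriteLine(number);    // Outputs 0, the value assigned on failure

        converted = int.TryParse("123", out number);
        Console.WriteLine(converted); // Outputs True
        Console.WriteLine(number);    // Outputs 123
    }
}
```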

The value generated from the conversion is stored in the out parameter. Here are seven quick examples to demonstrate how to use TryParse() to convert strings to decimal, float, double, int, long, DateTime, and Boolean formats:

[Code listing not reproduced in this copy.]
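A sketch of the seven TryParse() conversions under assumed input values. For the culture-sensitive types, the overloads that accept a format provider are used with the invariant culture; this is a choice made here for predictable results, not necessarily what the original listing did:

```csharp
using System;
using System.Globalization;

public static class TryParseDemo
{
    public static void Main()
    {
        CultureInfo inv = CultureInfo.InvariantCulture;

        decimal price;
        bool ok1 = decimal.TryParse("3040.50", NumberStyles.Number, inv, out price);
        float ratio;
        bool ok2 = float.TryParse("0.5", NumberStyles.Float, inv, out ratio);
        double average;
        bool ok3 = double.TryParse("12.478", NumberStyles.Float, inv, out average);
        int count;
        bool ok4 = int.TryParse("42", out count);
        long total;
        bool ok5 = long.TryParse("9000000000", out total);
        DateTime date;
        bool ok6 = DateTime.TryParse("2020-01-31", inv, DateTimeStyles.None, out date);
        bool flag;
        bool ok7 = bool.TryParse("true", out flag);

        // Every conversion above succeeds, so this prints True.
        Console.WriteLine(ok1 && ok2 && ok3 && ok4 && ok5 && ok6 && ok7);

        // A failed conversion returns false instead of throwing.
        Console.WriteLine(long.TryParse("twelve", out total)); // Outputs False
    }
}
```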

NOTE      

Additional techniques for converting strings into DateTime formats will be presented in Chapter 7.

 

Chapter 6 Self Test


The following questions are intended to help reinforce your comprehension of the concepts covered in this chapter. The answers can be found in the accompanying online Appendix B, “Answers to the Self Tests.”

1. In a new program, store your name in a string variable named fullName. Then, store the last character of this string in a char variable. You may reference the last character of the string with the index value of fullName.Length - 1. Finally, display the value that is stored in the char variable.

2. Write a program that declares a string variable that stores the value of “Jeffrey steinberg”. Assuming you don’t know what the last name is, devise a way to find the start of the last name and update the original string to ensure that the first character of the last name is capitalized. Output the value of your updated string.

3. An Internet company called retrofitness.com issues e-mail addresses to all employees. The application that adds new employees to the database validates all data before entering it in the database. Write a regular expression that permits the entry of a valid @retrofitness.com e-mail address where the address prefix may contain alphabetical, underscore, hyphen, and dot characters. Note also that the first character in the e-mail address must be an alphabetical character.

4. Write a regular expression that only permits an entry of either “Cable” or “DSL”.

5. Write a regular expression to ensure that an age string is valid for all numbers between “1” and “110”.

6. Write an expression that only validates phrases with two or more occurrences of the word substring “go”. Each occurrence of “go” is case insensitive. Sample valid strings include “Go dogs go” and “Ogopogo”.

7. Write an expression to ensure that monetary amounts always begin with a “$” sign, that at least one digit exists on the left side of the decimal point, and that there are always two digits on the right side of the decimal point.

8. Write a program that prompts a user to enter percentage data. If incorrect data is provided, the user is prompted again. Initially, the input is read into a string with the following instruction:

    string input = Console.ReadLine();

    When the correct input is provided, the program displays the data rounded to two decimal places. The text from the program and user input could appear similar to the following:

    Input percentage earned: abc
    This value is incorrect. Please try again.
    Input percentage earned:
    This value is incorrect. Please try again.
    Input percentage earned: 12.32aa
    This value is incorrect. Please try again.
    Input percentage earned: 12.478
    You entered 12.48%