8.1 An Array Type for Strings

In everything one must consider the end.

JEAN DE LA FONTAINE, FABLES, BOOK III (1668)

In this section we describe one way to represent strings of characters, which C++ has inherited from the C language. In Section 8.2 we describe a string class that is a more modern way to represent strings. Although the string type described here may be a bit “old-fashioned,” it is still widely used and is an integral part of the C++ language.

C-String Values and C-String Variables

One way to represent a string is as an array with base type char. If the string is "Hello", it is handy to represent it as an array of characters with six indexed variables: five for the five letters in “Hello” plus one for the character '\0', which serves as an end marker. The character '\0' is called the null character and is used as an end marker because it is distinct from all the “real” characters. The end marker allows your program to read the array one character at a time and know that it should stop reading when it reads the end marker '\0'. A string stored in this way (as an array of characters terminated with '\0') is called a C string.

We write '\0' with two symbols when we write it in a program, but just like the new-line character '\n', the character '\0' is really only a single character value. Like any other character value, '\0' can be stored in one variable of type char or one indexed variable of an array of characters.

You have already been using C strings. In C++, a literal string, such as "Hello", is stored as a C string, although you seldom need to be aware of this detail.

A C-string variable is just an array of characters. Thus, the following array declaration provides us with a C-string variable capable of storing a C-string value with nine or fewer characters:

char s[10];

The 10 is for the nine letters in the string plus the null character '\0' to mark the end of the string.

A C-string variable is a partially filled array of characters. Like any other partially filled array, a C-string variable uses positions starting at indexed variable 0 through as many as are needed. However, a C-string variable does not use an int variable to keep track of how much of the array is currently being used. Instead, a C-string variable places the special symbol '\0' in the array immediately after the last character of the C string. Thus, if s contains the string "Hi Mom!", then the array elements are filled as shown here:

An array consisting of a row of 10 adjacent boxes, labeled s[0] through s[9] from left to right. Their contents are as follows: H, i, blank, M, o, m, exclamation point, backslash 0, question mark, question mark.

The character '\0' is used as a sentinel value to mark the end of the C string. If you read the characters in the C string starting at indexed variable s[0], proceed to s[1], and then to s[2], and so forth, you know that when you encounter the symbol '\0', you have reached the end of the C string. Since the symbol '\0' always occupies one element of the array, the length of the longest string that the array can hold is 1 less than the size of the array.

The thing that distinguishes a C-string variable from an ordinary array of characters is that a C-string variable must contain the null character '\0' at the end of the C-string value. This is a distinction in how the array is used rather than a distinction about what the array is. A C-string variable is an array of characters, but it is used in a different way.

You can initialize a C-string variable when you declare it, as illustrated by the following example:

char myMessage[20] = "Hi there.";

Notice that the C string assigned to the C-string variable need not fill the entire array.

When you initialize a C-string variable, you can omit the array size. C++ will automatically make the size of the C-string variable 1 more than the length of the quoted string. (The one extra indexed variable is for '\0'.) For example,

char shortString[] = "abc";

is equivalent to

char shortString[4] = "abc";

Be sure you do not confuse the following initializations:

char shortString[] = "abc";

and

char shortString[] = {'a', 'b', 'c'};

They are not equivalent. The first of these two possible initializations places the null character '\0' in the array after the characters 'a', 'b', and 'c'. The second one does not put a '\0' anywhere in the array.

A C-string variable is an array, so it has indexed variables that can be used just like those of any other array. For example, suppose your program contains the following C-string variable declaration:

char ourString[5] = "Hi";

With ourString declared as shown previously, your program has the following indexed variables: ourString[0], ourString[1], ourString[2], ourString[3], and ourString[4]. For example, the following will change the C-string value in ourString to a C string of the same length consisting of all 'X' characters:

int index = 0;
while (ourString[index] != '\0')
{
	ourString[index] = 'X';
	index++;
}

When manipulating these indexed variables, you should be very careful not to replace the null character '\0' with some other value. If the array loses the value '\0', it will no longer behave like a C-string variable. For example, the following will change the array happyString so that it no longer contains a C string:

char happyString[7] = "DoBeDo";
happyString[6] = 'Z';

After this code is executed, the array happyString will still contain the six letters in the C-string "DoBeDo", but happyString will no longer contain the null character '\0' to mark the end of the C string. Many string-manipulating functions depend critically on the presence of '\0' to mark the end of the C-string value.

As another example, consider the previous while loop that changed characters in the C-string variable ourString. That while loop changes characters until it encounters a '\0'. If the loop never encounters a '\0', then it could change a large chunk of memory to some unwanted values, which could make your program do strange things. As a safety feature, it would be wise to rewrite that while loop as follows, so that if the null character '\0' is lost, the loop will not inadvertently change memory locations beyond the end of the array:

int index = 0;
while ( (index < SIZE) && (ourString[index] != '\0') )
{
	ourString[index] = 'X';
	index++;
}

SIZE is a defined constant equal to the declared size of the array ourString.

Other Functions in cstring

Display 8.1 contains a few of the most commonly used functions from the library with the header file <cstring>. To use them, you insert the following near the top of the file:

#include <cstring>

Display 8.1 Some Predefined C-String Functions in <cstring>

Function: strcpy(Target_String_Var, SrcString)
Description: Copies the C-string value SrcString into the C-string variable Target_String_Var.
Cautions: Does not check to make sure Target_String_Var is large enough to hold the value SrcString.

Function: strncpy(Target_String_Var, SrcString, Limit)
Description: The same as the two-argument strcpy except that at most Limit characters are copied.
Cautions: If Limit is chosen carefully, this is safer than the two-argument version of strcpy. Not implemented in all versions of C++.

Function: strcat(Target_String_Var, SrcString)
Description: Concatenates the C-string value SrcString onto the end of the C string in the C-string variable Target_String_Var.
Cautions: Does not check to see that Target_String_Var is large enough to hold the result of the concatenation.

Function: strncat(Target_String_Var, SrcString, Limit)
Description: The same as the two-argument strcat except that at most Limit characters are appended.
Cautions: If Limit is chosen carefully, this is safer than the two-argument version of strcat. Not implemented in all versions of C++.

Function: strlen(SrcString)
Description: Returns an integer equal to the length of SrcString. (The null character, '\0', is not counted in the length.)

Function: strcmp(String_1, String_2)
Description: Returns 0 if String_1 and String_2 are the same. Returns a value < 0 if String_1 is less than String_2. Returns a value > 0 if String_1 is greater than String_2 (that is, returns a nonzero value if String_1 and String_2 are different). The order is lexicographic.
Cautions: If String_1 equals String_2, this function returns 0, which converts to false. Note that this is the reverse of what you might expect it to return when the strings are equal.

Function: strncmp(String_1, String_2, Limit)
Description: The same as the two-argument strcmp except that at most Limit characters are compared.
Cautions: If Limit is chosen carefully, this is safer than the two-argument version of strcmp. Not implemented in all versions of C++.

Like the functions strcpy and strcmp, the other functions in <cstring> do not require the following directive or anything similar (although other parts of your program are likely to require it):

using namespace std;

We have already discussed strcpy and strcmp. The function strlen is easy to understand and use. For example, strlen("dobedo") returns 6 because there are six characters in "dobedo".

The function strcat is used to concatenate two C strings, that is, to form a longer string by placing the two shorter C strings end-to-end. The first argument must be a C-string variable. The second argument can be anything that evaluates to a C-string value, such as a quoted string. The result is placed in the C-string variable that is the first argument. For example, consider the following:

char stringVar[20] = "The rain";
strcat(stringVar, "in Spain");

This code will change the value of stringVar to "The rainin Spain". As this example illustrates, you need to be careful to account for blanks when concatenating C strings.

If you look at the table in Display 8.1, you will see that safer, three-argument versions of the functions strcpy, strcat, and strcmp are available in many, but not all, versions of C++. Also, note that these three-argument versions are spelled with an added letter n: strncpy, strncat, and strncmp.

Self-Test Exercises

  1. Which of the following declarations are equivalent?

    char stringVar[10] = "Hello";
    char stringVar[10] = {'H', 'e', 'l', 'l', 'o', '\0'};
    char stringVar[10] = {'H', 'e', 'l', 'l', 'o'};
    char stringVar[6] = "Hello";
    char stringVar[] = "Hello";
  2. What C string will be stored in singingString after the following code is run?

    char singingString[20] = "DoBeDo";
    strcat(singingString, " to you");

    Assume that the code is embedded in a complete and correct program and that an include directive for <cstring> is in the program file.

  3. What (if anything) is wrong with the following code?

    char stringVar[] = "Hello";
    strcat(stringVar, " and Good-bye.");
    cout << stringVar;

    Assume that the code is embedded in a complete program and that an include directive for <cstring> is in the program file.

  4. Suppose the function strlen (which returns the length of its string argument) was not already defined for you. Give a function definition for strlen. Note that strlen has only one argument, which is a C string. Do not add additional arguments; they are not needed.

  5. What is the maximum length of a string that can be placed in the string variable declared by the following declaration? Explain.

    char s[6];
  6. How many characters are in each of the following character and string constants?

    1. '\n'

    2. 'n'

    3. "Mary"

    4. "M"

    5. "Mary\n"

  7. Since character strings are just arrays of char, why does the text caution you not to confuse the following declaration and initialization?

    char shortString[] = "abc";
    char shortString[] = {'a', 'b', 'c'};
  8. Given the following declaration and initialization of the string variable, write a loop to assign 'X' to all positions of this string variable, keeping the length the same.

    char ourString[15] = "Hi there!";
  9. Given the declaration of a C-string variable, where SIZE is a defined constant:

    char ourString[SIZE];

    The C-string variable ourString has been assigned in code not shown here. For correct C-string variables, the following loop reassigns all positions of ourString the value 'X', leaving the length the same as before. Assume this code fragment is embedded in an otherwise complete and correct program. Answer the questions following this code fragment:

    int index = 0;
    while (ourString[index] != '\0')
    {
    	ourString[index] = 'X';
    	index++;
    }
    1. Explain how this code can destroy the contents of memory beyond the end of the array.

    2. Modify this loop to protect against inadvertently changing memory beyond the end of the array.

  10. Write code using a library function to copy the string constant "Hello" into the string variable declared below. Be sure to include the necessary header file to get the declaration of the function you use.

    char aString[10];
  11. What string will be output when this code is run? (Assume, as always, that this code is embedded in a complete, correct program.)

    char song[10] = "I did it ";
    char franksSong[20];
    strcpy( franksSong, song );
    strcat( franksSong, "my way!");
    cout << franksSong << endl;
  12. What is the problem (if any) with this code?

    char aString[20] = "How are you? ";
    strcat(aString, "Good, I hope.");

C-String Input and Output

C strings can be output using the insertion operator <<. In fact, we have already been doing so with quoted strings. You can use a C-string variable in the same way; for example,

cout << news << "Wow.\n";

where news is a C-string variable.

It is possible to fill a C-string variable using the input operator >>, but there is one thing to keep in mind. As for all other types of data, all whitespace (blanks, tabs, and line breaks) is skipped when C strings are read this way. Moreover, each reading of input stops at the next space or line break. For example, consider the following code:

char a[80], b[80];
cout << "Enter some input:\n";
cin >> a >> b;
cout << a << b << "END OF OUTPUT\n";

When embedded in a complete program, this code produces a dialogue like the following:

Enter some input:
Do be do to you!
DobeEND OF OUTPUT

The C-string variables a and b each receive only one word of the input: a receives the C-string value "Do" because the input character following Do is a blank; b receives "be" because the input character following be is a blank.

If you want your program to read an entire line of input, you can use the extraction operator >> to read the line one word at a time. This can be tedious and it still will not read the blanks in the line. There is an easy way to read an entire line of input and place the resulting C string into a C-string variable: Just use the predefined member function getline, which is a member function of every input stream (such as cin or a file input stream). The function getline has two arguments. The first argument is a C-string variable to receive the input and the second is an integer that typically is the declared size of the C-string variable. The second argument tells the maximum number of array elements in the C-string variable that getline will be allowed to fill with characters. For example, consider the following code:

char a[80];
cout << "Enter some input:\n";
cin.getline(a, 80);
cout << a << "END OF OUTPUT\n";

When embedded in a complete program, this code produces a dialogue like the following:

Enter some input:
Do be do to you!
Do be do to you!END OF OUTPUT

With the function cin.getline, the entire line is read. The reading ends when the line ends, even though the resulting C string may be shorter than the maximum number of characters specified by the second argument.

When getline is executed, the reading stops after the number of characters given by the second argument have been filled in the C-string array, even if the end of the line has not been reached. For example, consider the following code:

char shortString[5];
cout << "Enter some input:\n";
cin.getline(shortString, 5);
cout << shortString << "END OF OUTPUT\n";

When embedded in a complete program, this code produces a dialogue like the following:

Enter some input:
dobedowap
dobeEND OF OUTPUT

Notice that four, not five, characters are read into the C-string variable shortString, even though the second argument is 5. This is because the null character '\0' fills one array position. Every C string is terminated with the null character when it is stored in a C-string variable, and this always consumes one array position.

The C-string input and output techniques we illustrated for cout and cin work the same way for input and output with files. The input stream cin can be replaced by an input stream that is connected to a file. The output stream cout can be replaced by an output stream that is connected to a file. (File I/O is discussed in Chapter 6.)

Self-Test Exercises

  1. Consider the following code (and assume it is embedded in a complete and correct program and then run):

    char a[80], b[80];
    cout << "Enter some input:\n";
    cin >> a >> b;
    cout << a << '-' << b << "END OF OUTPUT\n";

    If the dialogue begins as follows, what will be the next line of output?

    Enter some input:
    The
    	time is now.
  2. Consider the following code (and assume it is embedded in a complete and correct program and then run):

    char myString[80];
    cout << "Enter a line of input:\n";
    cin.getline(myString, 6);
    cout << myString << "<END OF OUTPUT";

    If the dialogue begins as follows, what will be the next line of output?

    Enter a line of input:
    May the hair on your toes grow long and curly.

C-String-to-Number Conversions and Robust Input

The C string "1234" and the number 1234 are not the same things. The first is a sequence of characters; the second is a number. In everyday life, we write them the same way and blur this distinction, but in a C++ program this distinction cannot be ignored. If you want to do arithmetic, you need 1234, not "1234". If you want to add a comma to the numeral for one thousand two hundred thirty four, then you want to change the C string "1234" to the C string "1,234". When designing numeric input, it is often useful to read the input as a string of characters, edit the string, and then convert the string to a number. For example, if you want your program to read an amount of money, the input may or may not begin with a dollar sign. If your program is reading percentages, the input may or may not have a percent sign at the end. If your program reads the input as a string of characters, it can store the string in a C-string variable and remove any unwanted characters, leaving only a C string of digits. Your program then needs to convert this C string of digits to a number, which can easily be done with the predefined function atoi.

The function atoi takes one argument that is a C string and returns the int value that corresponds to that C string. For example, atoi("1234") returns the integer 1234. If the argument does not correspond to an int value, then atoi returns 0. For example, atoi("#37") returns 0, because the character '#' is not a digit. You pronounce atoi as “A to I,” which is an abbreviation of “alphabetic to integer.” The function atoi is in the library with header file cstdlib, so any program that uses it must contain the following directive:

#include <cstdlib>

If your numbers are too large to be values of type int, you can convert them from C strings to values of type long. The function atol performs the same conversion as the function atoi except that atol returns values of type long and thus can accommodate larger integer values (on systems where this is a concern).

Display 8.2 contains the definition of a function called readAndClean that reads a line of input and discards all characters other than the digits '0' through '9'. The function then uses the function atoi to convert the “cleaned-up” C string of digits to an integer value. As the demonstration program indicates, you can use this function to read money amounts and it will not matter whether the user included a dollar sign or not. Similarly, you can read percentages and it will not matter whether the user types in a percent sign or not. Although the output makes it look as if the function readAndClean simply removes some symbols, more than that is happening. The value produced is a true int value that can be used in a program as a number; it is not a C string of characters.

Display 8.2 C Strings to Integers

//Demonstrates the function readAndClean.
#include <iostream>
#include <cstdlib>
#include <cctype>

void readAndClean(int& n);
//Reads a line of input. Discards all symbols except the digits. Converts
//the C string to an integer and sets n equal to the value of this integer.

void newLine( );
//Discards all the input remaining on the current input line.
//Also discards the '\n' at the end of the line.

int main( )
{
    using namespace std;
    int n;
    char ans;
    do
    {
        cout << "Enter an integer and press Return: ";
        readAndClean(n);
        cout << "That string converts to the integer " << n << endl;
        cout << "Again? (yes/no): ";
        cin >> ans;
        newLine( );
    } while ( (ans != 'n') && (ans != 'N') );
    return 0;
}

//Uses iostream, cstdlib, and cctype:
void readAndClean(int& n)
{
    using namespace std;
    const int ARRAY_SIZE = 6;
    char digitString[ARRAY_SIZE];

    char next;
    cin.get(next);
    int index = 0;
    while (next != '\n')
    {
        if ((isdigit(next)) && (index < ARRAY_SIZE - 1))
        {
            digitString[index] = next;
            index++;
        }
        cin.get(next);
    }
    digitString[index] = '\0';
    n = atoi(digitString);
}

//Uses iostream:
void newLine( )
{
    using namespace std;
    <The rest of the definition of newLine is given in Display 6.7.>

Sample Dialogue

Enter an integer and press Return:  $ 100
That string converts to the integer 100
Again? (yes/no):  yes
Enter an integer and press Return:	100
That string converts to the integer 100
Again? (yes/no):  yes
Enter an integer and press Return:	99%
That string converts to the integer 99
Again? (yes/no):  yes
Enter an integer and press Return:	23% &&5 *12
That string converts to the integer 23512
Again? (yes/no):  no

The function readAndClean shown in Display 8.2 will delete any nondigits from the string typed in, but it cannot check that the remaining digits will yield the number the user has in mind. The user should be given a chance to look at the final value and see whether it is correct. If the value is not correct, the user should be given a chance to reenter the input. In Display 8.3 we have used the function readAndClean in another function called getInt, which will accept anything the user types and will allow the user to reenter the input until she or he is satisfied with the number that is computed from the input string. It is a very robust input procedure. (The function getInt is an improved version of the function of the same name given in Display 6.7.)

Display 8.3 Robust Input Function

//Demonstration program for improved version of getInt.
#include <iostream>
#include <cstdlib>
#include <cctype>

void readAndClean(int& n);
//Reads a line of input. Discards all symbols except the digits. Converts
//the C string to an integer and sets n equal to the value of this integer.

void newLine( );
//Discards all the input remaining on the current input line.
//Also discards the '\n' at the end of the line.

void getInt(int& inputNumber);
//Gives inputNumber a value that the user approves of.

int main( )
{
    using namespace std;
    int inputNumber;
    getInt(inputNumber);
    cout << "Final value read in = " << inputNumber << endl;
    return 0;
}

//Uses iostream and readAndClean:
void getInt(int& inputNumber)
{
    using namespace std;
    char ans;
    do
    {
        cout << "Enter input number: ";
        readAndClean(inputNumber);
        cout << "You entered " << inputNumber
             << " Is that correct? (yes/no): ";
        cin >> ans;
        newLine( );
    } while ((ans != 'y') && (ans != 'Y'));
}

//Uses iostream, cstdlib, and cctype:
void readAndClean(int& n)
<The rest of the definition of readAndClean is given in Display 8.2.>

//Uses iostream:
void newLine( )
<The rest of the definition of newLine is given in Display 8.2.>

Sample Dialogue

Enter input number:	$57
You entered 57 Is that correct? (yes/no):  no
Enter input number:	 $77*5xa
You entered 775 Is that correct? (yes/no):	no
Enter input number:	 77
You entered 77 Is that correct? (yes/no):  no
Enter input number:	 $75
You entered 75 Is that correct? (yes/no):  yes
Final value read in = 75

The functions readAndClean in Display 8.2 and getInt in Display 8.3 are samples of the various input functions you can design by reading numeric input as a string value. Programming Project 3 at the end of this chapter asks you to define a function similar to getInt that reads in a number of type double, as opposed to a number of type int. To write that function, it would be nice to have a predefined function that converts a string value to a number of type double. Fortunately, the predefined function atof, which is also in the library with header file cstdlib, does just that. For example, atof("9.99") returns the value 9.99 of type double. If the argument does not correspond to a number of type double, then atof returns 0.0. You pronounce atof as “A to F,” which is an abbreviation of “alphabetic to floating point.” Recall that numbers with a decimal point are often called floating-point numbers because of the way the computer handles the decimal point when storing these numbers in memory.