In everything one must consider the end.
JEAN DE LA FONTAINE, FABLES, BOOK III (1668)
In this section we describe one way to represent strings of characters, which C++ has inherited from the C language. In Section 8.2 we describe a string class that is a more modern way to represent strings. Although the string type described here may be a bit “old-fashioned,” it is still widely used and is an integral part of the C++ language.
One way to represent a string is as an array with base type char. If the string is "Hello", it is handy to represent it as an array of characters with six indexed variables: five for the five letters in “Hello” plus one for the character '\0', which serves as an end marker. The character '\0' is called the null character and is used as an end marker because it is distinct from all the “real” characters. The end marker allows your program to read the array one character at a time and know that it should stop reading when it reads the end marker '\0'. A string stored in this way (as an array of characters terminated with '\0') is called a C string.
We write '\0' with two symbols when we write it in a program, but just like the new-line character '\n', the character '\0' is really only a single character value. Like any other character value, '\0' can be stored in one variable of type char or one indexed variable of an array of characters.
You have already been using C strings. In C++, a literal string, such as "Hello", is stored as a C string, although you seldom need to be aware of this detail.
A C-string variable is just an array of characters. Thus, the following array declaration provides us with a C-string variable capable of storing a C-string value with nine or fewer characters:
char s[10];
The 10 is for the nine letters in the string plus the null character '\0' to mark the end of the string.
A C-string variable is a partially filled array of characters. Like any other partially filled array, a C-string variable uses positions starting at indexed variable 0 through as many as are needed. However, a C-string variable does not use an int variable to keep track of how much of the array is currently being used. Instead, a C-string variable places the special symbol '\0' in the array immediately after the last character of the C string. Thus, if s contains the string "Hi Mom!", then the array elements are filled as follows: s[0] through s[6] hold the characters 'H', 'i', ' ', 'M', 'o', 'm', and '!'; s[7] holds '\0'; and s[8] and s[9] are unused.
The character '\0' is used as a sentinel value to mark the end of the C string. If you read the characters in the C string starting at indexed variable s[0], proceed to s[1], and then to s[2], and so forth, you know that when you encounter the symbol '\0', you have reached the end of the C string. Since the symbol '\0' always occupies one element of the array, the length of the longest string that the array can hold is 1 less than the size of the array.
The thing that distinguishes a C-string variable from an ordinary array of characters is that a C-string variable must contain the null character '\0' at the end of the C-string value. This is a distinction in how the array is used rather than a distinction about what the array is. A C-string variable is an array of characters, but it is used in a different way.
You can initialize a C-string variable when you declare it, as illustrated by the following example:
char myMessage[20] = "Hi there.";
Notice that the C string assigned to the C-string variable need not fill the entire array.
When you initialize a C-string variable, you can omit the array size. C++ will automatically make the size of the C-string variable 1 more than the length of the quoted string. (The one extra indexed variable is for '\0'.) For example,
char shortString[] = "abc";
is equivalent to
char shortString[4] = "abc";
Be sure you do not confuse the following initializations:
char shortString[] = "abc";
and
char shortString[] = {'a', 'b', 'c'};
They are not equivalent. The first of these two possible initializations places the null character '\0' in the array after the characters 'a', 'b', and 'c'. The second one does not put a '\0' anywhere in the array.
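The difference shows up directly in the array sizes. The following is a minimal sketch; the names withNull, withoutNull, and hasTerminator are hypothetical and chosen only for this illustration.

```cpp
// Two arrays initialized from the same three letters.
char withNull[] = "abc";               // size 4: 'a', 'b', 'c', '\0'
char withoutNull[] = {'a', 'b', 'c'};  // size 3: no '\0' anywhere

// Returns true if a '\0' occurs within the first size elements of arr.
// (Hypothetical helper, not a library function.)
bool hasTerminator(const char arr[], int size)
{
    for (int i = 0; i < size; i++)
        if (arr[i] == '\0')
            return true;
    return false;
}
```

Here sizeof(withNull) is 4 and sizeof(withoutNull) is 3, so only the first array can safely be handed to functions that expect a C string.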
A C-string variable is an array, so it has indexed variables that can be used just like those of any other array. For example, suppose your program contains the following C-string variable declaration:
char ourString[5] = "Hi";
With ourString declared as shown previously, your program has the following indexed variables: ourString[0], ourString[1], ourString[2], ourString[3], and ourString[4]. For example, the following will change the C-string value in ourString to a C string of the same length consisting of all 'X' characters:
int index = 0;
while (ourString[index] != '\0')
{
ourString[index] = 'X';
index++;
}
When manipulating these indexed variables, you should be very careful not to replace the null character '\0' with some other value. If the array loses the value '\0', it will no longer behave like a C-string variable. For example, the following will change the array happyString so that it no longer contains a C string:
char happyString[7] = "DoBeDo";
happyString[6] = 'Z';
After this code is executed, the array happyString will still contain the six letters in the C string "DoBeDo", but happyString will no longer contain the null character '\0' to mark the end of the C string. Many string-manipulating functions depend critically on the presence of '\0' to mark the end of the C-string value.
As another example, consider the previous while loop that changed characters in the C-string variable ourString. That while loop changes characters until it encounters a '\0'. If the loop never encounters a '\0', then it could change a large chunk of memory to some unwanted values, which could make your program do strange things. As a safety feature, it would be wise to rewrite that while loop as follows, so that if the null character '\0' is lost, the loop will not inadvertently change memory locations beyond the end of the array:
int index = 0;
while ( (index < SIZE) && (ourString[index] != '\0') )
{
ourString[index] = 'X';
index++;
}
SIZE is a defined constant equal to the declared size of the array ourString.
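The bounded loop can also be packaged as a function that receives the declared size as a parameter. This is only a sketch: the name fillWithX and its parameters are hypothetical. The bound is tested before the element is examined, so the function never reads or writes str[size].

```cpp
// Replaces each character of the C string in str with 'X', never touching
// an index at or beyond size (the declared size of the array).
// Returns the number of characters replaced.
int fillWithX(char str[], int size)
{
    int index = 0;
    while ((index < size) && (str[index] != '\0'))
    {
        str[index] = 'X';
        index++;
    }
    return index;
}
```

If the array has lost its '\0', the function stops after size characters instead of running on into neighboring memory.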
<cstring> Library
You do not need any include directive or using directive in order to declare and initialize C strings. However, when processing C strings, you inevitably will use some of the predefined string functions in the library <cstring>. So, when using C strings, you will normally give the following include directive near the beginning of the file with your code:
#include <cstring>
Display 8.1 contains a few of the most commonly used functions from the library with the header file <cstring>. To use them, you insert the following near the top of the file:
#include <cstring>
Display 8.1: Some functions in <cstring>
Function | Description | Cautions |
---|---|---|
strcpy(Target_String_Var, SrcString) | Copies the C-string value SrcString into the C-string variable Target_String_Var. | Does not check to make sure Target_String_Var is large enough to hold the value SrcString. |
strncpy(Target_String_Var, SrcString, Limit) | The same as the two-argument strcpy except that at most Limit characters are copied. | If Limit is chosen carefully, this is safer than the two-argument version of strcpy. Not implemented in all versions of C++. |
strcat(Target_String_Var, SrcString) | Concatenates the C-string value SrcString onto the end of the C string in the C-string variable Target_String_Var. | Does not check to see that Target_String_Var is large enough to hold the result of the concatenation. |
strncat(Target_String_Var, SrcString, Limit) | The same as the two-argument strcat except that at most Limit characters are appended. | If Limit is chosen carefully, this is safer than the two-argument version of strcat. Not implemented in all versions of C++. |
strlen(SrcString) | Returns an integer equal to the length of SrcString. (The null character, '\0', is not counted in the length.) | |
strcmp(String_1, String_2) | Returns 0 if String_1 and String_2 are the same. Returns a value < 0 if String_1 is less than String_2. Returns a value > 0 if String_1 is greater than String_2 (that is, returns a nonzero value if String_1 and String_2 are different). The order is lexicographic. | If String_1 equals String_2, this function returns 0, which converts to false. Note that this is the reverse of what you might expect it to return when the strings are equal. |
strncmp(String_1, String_2, Limit) | The same as the two-argument strcmp except that at most Limit characters are compared. | If Limit is chosen carefully, this is safer than the two-argument version of strcmp. Not implemented in all versions of C++. |
Like the functions strcpy and strcmp, all the other functions in <cstring> do not require the following or anything similar (although other parts of your program are likely to require it):
using namespace std;
We have already discussed strcpy and strcmp. The function strlen is easy to understand and use. For example, strlen("dobedo") returns 6 because there are six characters in "dobedo".
The function strcat is used to concatenate two C strings, that is, to form a longer string by placing the two shorter C strings end-to-end. The first argument must be a C-string variable. The second argument can be anything that evaluates to a C-string value, such as a quoted string. The result is placed in the C-string variable that is the first argument. For example, consider the following:
char stringVar[20] = "The rain";
strcat(stringVar, "in Spain");
This code will change the value of stringVar to "The rainin Spain". As this example illustrates, you need to be careful to account for blanks when concatenating C strings.
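One way to get the intended phrase is to put the blank at the front of the second string. The following sketch wraps the two calls in a function; the name buildPhrase is hypothetical, and the caller is assumed to supply an array with room for at least 18 characters.

```cpp
#include <cstring>

// Builds "The rain in Spain" in result. The array must have room for at
// least 18 characters: 17 letters and blanks plus the '\0'.
void buildPhrase(char result[])
{
    std::strcpy(result, "The rain");
    std::strcat(result, " in Spain");  // the leading blank separates the words
}
```

After char stringVar[20]; buildPhrase(stringVar); the call strlen(stringVar) returns 17.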
If you look at the table in Display 8.1, you will see that safer, three-argument versions of the functions strcpy, strcat, and strcmp are available in many, but not all, versions of C++. Also, note that these three-argument versions are spelled with an added letter n: strncpy, strncat, and strncmp.
A C-string variable is an array, so a C-string parameter to a function is simply an array parameter.
As with any array parameter, whenever a function changes the value of a C-string parameter, it is safest to include an additional int parameter giving the declared size of the C-string variable.
On the other hand, if a function only uses the value in a C-string argument but does not change that value, then there is no need to include another parameter to give either the declared size of the C-string variable or the amount of the C-string variable array that is filled. The null character '\0' can be used to detect the end of the C-string value that is stored in the C-string variable.
Dangers of strcpy
A common error in C and C++ is to copy a larger C string to a smaller C string using strcpy. This is dangerous because the strcpy function doesn’t put any bounds on how much data to copy. It will simply copy everything from the source string to the target string until the null character is encountered. If the source is larger than the target then data will be copied past the memory allocated for the target string. Here is a simple example where we could have problems:
void copyString(char source[])
{
char target[5];
strcpy(target, source);
// If this was more than an example we would presumably
// use the target string in some way here
}
Quite simply, if the source C string has five or more characters (so that, counting the '\0', it needs more than five array elements), then this code will copy data into whatever happens to be stored past the target array, likely causing your program to crash or do unpredictable things. It could even open up your system to attack by malicious users. This has been such a serious problem that some compilers will not compile code that uses strcpy unless you override the warning. Assuming your compiler does allow you to use strcpy, one way to fix the problem is to only copy the C string if it is less than five characters long. Consider the following attempt to avoid exceeding the size of the C string:
void copyString(char source[])
{
char target[5];
signed char length; // Can hold -128 to +127
length = strlen(source);
if (length < 5)
strcpy(target, source);
}
In this version we use a signed char to store the length of the C string. This may seem reasonable since we are only creating an array of size 5 and a signed char can store values up to +127. This version will work fine for small source strings. But what if we input a source string that is 145 characters long? strlen will return 145, but this number is too large to store in a signed char. This causes overflow and typically results in a negative value being stored in length. As a result the program enters the if statement and erroneously copies the source data to the target array. To avoid this problem we should give length the type size_t (the type returned by strlen), use strncpy to cap the maximum copy length, or use the string class described in the next section.
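The length check and the capped copy can be combined. The following is a sketch under stated assumptions, not a definitive fix: the function name copyIfFits and the targetSize parameter are hypothetical, and the caller is assumed to pass the declared size of target. The length is kept in a size_t, the unsigned type that strlen returns, so it cannot wrap around to a negative value.

```cpp
#include <cstddef>
#include <cstring>

// Copies source into target only if it fits, including the '\0'.
// targetSize is the declared size of target. Returns true on success;
// on failure target is set to the empty string (when it has any room).
bool copyIfFits(char target[], std::size_t targetSize, const char source[])
{
    if (targetSize == 0)
        return false;
    std::size_t length = std::strlen(source);  // same type strlen returns
    if (length >= targetSize)
    {
        target[0] = '\0';
        return false;
    }
    std::strncpy(target, source, targetSize - 1);
    target[targetSize - 1] = '\0';  // strncpy does not always add a '\0'
    return true;
}
```

With char target[5], the call copyIfFits(target, 5, "abcd") succeeds, while a five-character source is rejected and leaves target empty.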
Which of the following declarations are equivalent?
char stringVar[10] = "Hello";
char stringVar[10] = {'H', 'e', 'l', 'l', 'o', '\0'};
char stringVar[10] = {'H', 'e', 'l', 'l', 'o'};
char stringVar[6] = "Hello";
char stringVar[] = "Hello";
What C string will be stored in singingString after the following code is run?
char singingString[20] = "DoBeDo";
strcat(singingString, " to you");
Assume that the code is embedded in a complete and correct program and that an include directive for <cstring> is in the program file.
What (if anything) is wrong with the following code?
char stringVar[] = "Hello";
strcat(stringVar, " and Good-bye.");
cout << stringVar;
Assume that the code is embedded in a complete program and that an include directive for <cstring> is in the program file.
Suppose the function strlen (which returns the length of its string argument) was not already defined for you. Give a function definition for strlen. Note that strlen has only one argument, which is a C string. Do not add additional arguments; they are not needed.
What is the maximum length of a string that can be placed in the string variable declared by the following declaration? Explain.
char s[6];
How many characters are in each of the following character and string constants?
'\n'
'n'
"Mary"
"M"
"Mary\n"
Since character strings are just arrays of char, why does the text caution you not to confuse the following declaration and initialization?
char shortString[] = "abc";
char shortString[] = {'a', 'b', 'c'};
Given the following declaration and initialization of the string variable, write a loop to assign 'X' to all positions of this string variable, keeping the length the same.
char ourString[15] = "Hi there!";
Given the declaration of a C-string variable, where SIZE is a defined constant:
char ourString[SIZE];
The C-string variable ourString has been assigned in code not shown here. For correct C-string variables, the following loop reassigns all positions of ourString the value 'X', leaving the length the same as before. Assume this code fragment is embedded in an otherwise complete and correct program. Answer the questions following this code fragment:
int index = 0;
while (ourString[index] != '\0')
{
ourString[index] = 'X';
index++;
}
Explain how this code can destroy the contents of memory beyond the end of the array.
Modify this loop to protect against inadvertently changing memory beyond the end of the array.
Write code using a library function to copy the string constant "Hello" into the string variable declared below. Be sure to include the necessary header file to get the declaration of the function you use.
char aString[10];
What string will be output when this code is run? (Assume, as always, that this code is embedded in a complete, correct program.)
char song[10] = "I did it ";
char franksSong[20];
strcpy( franksSong, song );
strcat( franksSong, "my way!");
cout << franksSong << endl;
What is the problem (if any) with this code?
char aString[20] = "How are you? ";
strcat(aString, "Good, I hope.");
C strings can be output using the insertion operator <<. In fact, we have already been doing so with quoted strings. You can use a C-string variable in the same way; for example,
cout << news << "Wow.\n";
where news is a C-string variable.
It is possible to fill a C-string variable using the input operator >>, but there is one thing to keep in mind. As for all other types of data, leading whitespace (blanks, tabs, and line breaks) is skipped when C strings are read this way. Moreover, each reading of input stops at the next blank, tab, or line break. For example, consider the following code:
char a[80], b[80];
cout << "Enter some input:\n";
cin >> a >> b;
cout << a << b << "END OF OUTPUT\n";
When embedded in a complete program, this code produces a dialogue like the following:
Enter some input:
Do be do to you!
DobeEND OF OUTPUT
The C-string variables a and b each receive only one word of the input: a receives the C-string value "Do" because the input character following Do is a blank; b receives "be" because the input character following be is a blank.
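The word-at-a-time behavior can be checked without typing at the keyboard by reading from a string stream, which supports >> in the same way as cin. This is a minimal sketch: the name firstTwoWords and the constant WORD_SIZE are hypothetical, and std::setw is added as a precaution so that no more than WORD_SIZE - 1 characters are stored in either array.

```cpp
#include <iomanip>
#include <sstream>

const int WORD_SIZE = 80;

// Extracts the first two words of line with >>, the same way
// cin >> a >> b would if the user typed line at the keyboard.
void firstTwoWords(const char line[], char (&a)[WORD_SIZE], char (&b)[WORD_SIZE])
{
    std::istringstream in(line);
    a[0] = '\0';  // start with empty C strings in case nothing is read
    b[0] = '\0';
    in >> std::setw(WORD_SIZE) >> a;  // skips leading whitespace, stops at a blank
    in >> std::setw(WORD_SIZE) >> b;
}
```

Called with the line "Do be do to you!", the function leaves "Do" in a and "be" in b; the rest of the line is never stored.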
If you want your program to read an entire line of input, you can use the extraction operator >> to read the line one word at a time. This can be tedious and it still will not read the blanks in the line. There is an easy way to read an entire line of input and place the resulting C string into a C-string variable: Just use the predefined member function getline, which is a member function of every input stream (such as cin or a file input stream). The function getline has two arguments. The first argument is a C-string variable to receive the input and the second is an integer that typically is the declared size of the C-string variable. The second argument tells the maximum number of array elements in the C-string variable that getline will be allowed to fill with characters. For example, consider the following code:
char a[80];
cout << "Enter some input:\n";
cin.getline(a, 80);
cout << a << "END OF OUTPUT\n";
When embedded in a complete program, this code produces a dialogue like the following:
Enter some input:
Do be do to you!
Do be do to you!END OF OUTPUT
With the function cin.getline, the entire line is read. The reading ends when the line ends, even though the resulting C string may be shorter than the maximum number of characters specified by the second argument.
When getline is executed, the reading stops after the number of characters given by the second argument have been filled in the C-string array, even if the end of the line has not been reached. For example, consider the following code:
char shortString[5];
cout << "Enter some input:\n";
cin.getline(shortString, 5);
cout << shortString << "END OF OUTPUT\n";
When embedded in a complete program, this code produces a dialogue like the following:
Enter some input:
dobedowap
dobeEND OF OUTPUT
Notice that four, not five, characters are read into the C-string variable shortString, even though the second argument is 5. This is because the null character '\0' fills one array position. Every C string is terminated with the null character when it is stored in a C-string variable, and this always consumes one array position.
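The same truncation can be reproduced with a string stream in place of cin. The sketch below uses hypothetical names (readShortLine, SHORT_SIZE). It also illustrates a detail worth knowing: when getline stops because the limit was reached rather than because the line ended, the stream is left in a failed state, and a program that wants to keep reading from it must first call the stream's clear member function.

```cpp
#include <sstream>

const int SHORT_SIZE = 5;

// Reads one line from text into dest using getline with a limit of
// SHORT_SIZE array elements. Returns true if the whole line fit.
bool readShortLine(const char text[], char (&dest)[SHORT_SIZE])
{
    std::istringstream in(text);
    in.getline(dest, SHORT_SIZE);  // stores at most SHORT_SIZE - 1 characters plus '\0'
    return !in.fail();             // fail() is true when the line was cut short
}
```

For the input "dobedowap" the array ends up holding "dobe" and the function returns false; for a short line such as "hi" it returns true.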
The C-string input and output techniques we illustrated for cout and cin work the same way for input and output with files. The input stream cin can be replaced by an input stream that is connected to a file. The output stream cout can be replaced by an output stream that is connected to a file. (File I/O is discussed in Chapter 6.)
The member function getline can be used to read a line of input and place the C string of characters on that line into a C-string variable.
cin.getline(stringVar, Max_Characters + 1);
One line of input is read from the stream cin, and the resulting C string is placed in stringVar. If the line is more than Max_Characters long, then only the first Max_Characters on the line are read. (The +1 is needed because every C string has the null character '\0' added to the end, so stringVar needs one more array element than the number of characters read in.)
char oneLine[80];
cin.getline(oneLine, 80);
(You can use an input stream connected to a text file in place of cin.)
Consider the following code (and assume it is embedded in a complete and correct program and then run):
char a[80], b[80];
cout << "Enter some input:\n";
cin >> a >> b;
cout << a << '-' << b << "END OF OUTPUT\n";
If the dialogue begins as follows, what will be the next line of output?
Enter some input:
The
time is now.
Consider the following code (and assume it is embedded in a complete and correct program and then run):
char myString[80];
cout << "Enter a line of input:\n";
cin.getline(myString, 6);
cout << myString << "<END OF OUTPUT";
If the dialogue begins as follows, what will be the next line of output?
Enter a line of input:
May the hair on your toes grow long and curly.
The C string "1234" and the number 1234 are not the same things. The first is a sequence of characters; the second is a number. In everyday life, we write them the same way and blur this distinction, but in a C++ program this distinction cannot be ignored. If you want to do arithmetic, you need 1234, not "1234". If you want to add a comma to the numeral for one thousand two hundred thirty-four, then you want to change the C string "1234" to the C string "1,234". When designing numeric input, it is often useful to read the input as a string of characters, edit the string, and then convert the string to a number. For example, if you want your program to read an amount of money, the input may or may not begin with a dollar sign. If your program is reading percentages, the input may or may not have a percent sign at the end. If your program reads the input as a string of characters, it can store the string in a C-string variable and remove any unwanted characters, leaving only a C string of digits. Your program then needs to convert this C string of digits to a number, which can easily be done with the predefined function atoi.
The function atoi takes one argument that is a C string and returns the int value that corresponds to that C string. For example, atoi("1234") returns the integer 1234. If the argument does not correspond to an int value, then atoi returns 0. For example, atoi("#37") returns 0, because the character '#' is not a digit. You pronounce atoi as “A to I,” which is an abbreviation of “alphabetic to integer.” The function atoi is in the library with header file cstdlib, so any program that uses it must contain the following directive:
#include <cstdlib>
The functions atoi, atol, and atof can be used to convert a C string of digits to the corresponding numeric value. The functions atoi and atol convert C strings to integers. The only difference between atoi and atol is that atoi returns a value of type int whereas atol returns a value of type long. The function atof converts a C string to a value of type double. If the C-string argument (to any of these functions) is such that the conversion cannot be made, then the function returns zero. For example,
int x = atoi("657");
sets the value of x to 657, and
double y = atof("12.37");
sets the value of y to 12.37.
Any program that uses atoi, atol, or atof must contain the following directive:
#include <cstdlib>
If your numbers are too large to be values of type int, you can convert them from C strings to values of type long. The function atol performs the same conversion as the function atoi except that atol returns values of type long and thus can accommodate larger integer values (on systems where this is a concern).
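Because atoi and atol return zero when no conversion can be made, a result of 0 does not tell you whether the input was "0" or something like "#37". The following sketch shows one way to tell the two apart. It uses strtol, a different function from <cstdlib> that is not discussed in this section; the wrapper name toLong is hypothetical.

```cpp
#include <cstdlib>

// Converts text to a long with strtol and reports whether any digits
// were actually converted, which atoi and atol cannot report.
bool toLong(const char text[], long& result)
{
    char* end = nullptr;
    result = std::strtol(text, &end, 10);  // base 10
    return end != text;  // end == text means nothing was converted
}
```

For example, toLong("0", value) returns true with value set to 0, while toLong("#37", value) returns false.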
Display 8.2 contains the definition of a function called readAndClean that reads a line of input and discards all characters other than the digits '0' through '9'. The function then uses the function atoi to convert the “cleaned-up” C string of digits to an integer value. As the demonstration program indicates, you can use this function to read money amounts and it will not matter whether the user included a dollar sign or not. Similarly, you can read percentages and it will not matter whether the user types in a percent sign or not. Although the output makes it look as if the function readAndClean simply removes some symbols, more than that is happening. The value produced is a true int value that can be used in a program as a number; it is not a C string of characters.
//Demonstrates the function readAndClean.
#include <iostream>
#include <cstdlib>
#include <cctype>

void readAndClean(int& n);
//Reads a line of input. Discards all symbols except the digits. Converts
//the C string to an integer and sets n equal to the value of this integer.

void newLine( );
//Discards all the input remaining on the current input line.
//Also discards the '\n' at the end of the line.

int main( )
{
    using namespace std;
    int n;
    char ans;
    do
    {
        cout << "Enter an integer and press Return: ";
        readAndClean(n);
        cout << "That string converts to the integer " << n << endl;
        cout << "Again? (yes/no): ";
        cin >> ans;
        newLine( );
    } while ( (ans != 'n') && (ans != 'N') );
    return 0;
}

//Uses iostream, cstdlib, and cctype:
void readAndClean(int& n)
{
    using namespace std;
    const int ARRAY_SIZE = 6;
    char digitString[ARRAY_SIZE];

    char next;
    cin.get(next);
    int index = 0;
    while (next != '\n')
    {
        if ((isdigit(next)) && (index < ARRAY_SIZE - 1))
        {
            digitString[index] = next;
            index++;
        }
        cin.get(next);
    }
    digitString[index] = '\0';
    n = atoi(digitString);
}

//Uses iostream:
void newLine( )
{
    using namespace std;
<The rest of the definition of newLine is given in Display 6.7.>
Sample Dialogue
Enter an integer and press Return: $ 100
That string converts to the integer 100
Again? (yes/no): yes
Enter an integer and press Return: 100
That string converts to the integer 100
Again? (yes/no): yes
Enter an integer and press Return: 99%
That string converts to the integer 99
Again? (yes/no): yes
Enter an integer and press Return: 23% &&5 *12
That string converts to the integer 23512
Again? (yes/no): no
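The cleaning step can be separated from the keyboard input so that it is easy to try on fixed strings. The following is a sketch of that idea, not the Display 8.2 function itself: the name digitsToInt is hypothetical, and it takes the raw characters as a C-string argument instead of reading them from cin. It keeps the same array size as Display 8.2, so at most five digits are retained.

```cpp
#include <cctype>
#include <cstdlib>

// Copies the digits found in raw (at most ARRAY_SIZE - 1 of them) into a
// local C string and converts that C string to an int with atoi.
int digitsToInt(const char raw[])
{
    const int ARRAY_SIZE = 6;  // five digits plus '\0', as in Display 8.2
    char digitString[ARRAY_SIZE];
    int index = 0;
    for (int i = 0; raw[i] != '\0'; i++)
    {
        // The cast keeps isdigit well defined for every char value.
        if (std::isdigit(static_cast<unsigned char>(raw[i]))
            && (index < ARRAY_SIZE - 1))
        {
            digitString[index] = raw[i];
            index++;
        }
    }
    digitString[index] = '\0';
    return std::atoi(digitString);
}
```

With this sketch, digitsToInt("$ 100") returns 100 and digitsToInt("23% &&5 *12") returns 23512, matching the sample dialogue. Digits beyond the fifth are silently dropped, which is one reason to let the user confirm the value.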
The function readAndClean shown in Display 8.2 will delete any nondigits from the string typed in, but it cannot check that the remaining digits will yield the number the user has in mind. The user should be given a chance to look at the final value and see whether it is correct. If the value is not correct, the user should be given a chance to reenter the input. In Display 8.3 we have used the function readAndClean in another function called getInt, which will accept anything the user types and will allow the user to reenter the input until she or he is satisfied with the number that is computed from the input string. It is a very robust input procedure. (The function getInt is an improved version of the function of the same name given in Display 6.7.)
//Demonstration program for improved version of getInt.
#include <iostream>
#include <cstdlib>
#include <cctype>

void readAndClean(int& n);
//Reads a line of input. Discards all symbols except the digits. Converts
//the C string to an integer and sets n equal to the value of this integer.

void newLine( );
//Discards all the input remaining on the current input line.
//Also discards the '\n' at the end of the line.

void getInt(int& inputNumber);
//Gives inputNumber a value that the user approves of.

int main( )
{
    using namespace std;
    int inputNumber;
    getInt(inputNumber);
    cout << "Final value read in = " << inputNumber << endl;
    return 0;
}

//Uses iostream and readAndClean:
void getInt(int& inputNumber)
{
    using namespace std;
    char ans;
    do
    {
        cout << "Enter input number: ";
        readAndClean(inputNumber);
        cout << "You entered " << inputNumber
             << " Is that correct? (yes/no): ";
        cin >> ans;
        newLine( );
    } while ((ans != 'y') && (ans != 'Y'));
}

//Uses iostream, cstdlib, and cctype:
void readAndClean(int& n)
<The rest of the definition of readAndClean is given in Display 8.2.>

//Uses iostream:
void newLine( )
<The rest of the definition of newLine is given in Display 8.2.>
Sample Dialogue
Enter input number: $57
You entered 57 Is that correct? (yes/no): no
Enter input number: $77*5xa
You entered 775 Is that correct? (yes/no): no
Enter input number: 77
You entered 77 Is that correct? (yes/no): no
Enter input number: $75
You entered 75 Is that correct? (yes/no): yes
Final value read in = 75
The functions readAndClean in Display 8.2 and getInt in Display 8.3 are samples of the various input functions you can design by reading numeric input as a string value. Programming Project 3 at the end of this chapter asks you to define a function similar to getInt that reads in a number of type double, as opposed to a number of type int. To write that function, it would be nice to have a predefined function that converts a string value to a number of type double. Fortunately, the predefined function atof, which is also in the library with header file cstdlib, does just that. For example, atof("9.99") returns the value 9.99 of type double. If the argument does not correspond to a number of type double, then atof returns 0.0. You pronounce atof as “A to F,” which is an abbreviation of “alphabetic to floating point.” Recall that numbers with a decimal point are often called floating-point numbers because of the way the computer handles the decimal point when storing these numbers in memory.
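As a rough illustration of how atof fits the same clean-then-convert pattern, the sketch below keeps the digits and the first decimal point of a C string and converts what is left. Everything about it is an assumption made for the example: the name cleanToDouble, the buffer size of 20, and the decision to ignore any decimal point after the first. It works on a C-string argument rather than on keyboard input.

```cpp
#include <cctype>
#include <cstdlib>

// Keeps the digits and the first '.' found in raw, then converts the
// cleaned-up C string to a double with atof.
double cleanToDouble(const char raw[])
{
    const int ARRAY_SIZE = 20;  // assumed limit for this sketch
    char cleaned[ARRAY_SIZE];
    int index = 0;
    bool seenPoint = false;
    for (int i = 0; raw[i] != '\0'; i++)
    {
        char next = raw[i];
        bool keep = false;
        if (std::isdigit(static_cast<unsigned char>(next)))
            keep = true;
        else if ((next == '.') && !seenPoint)
        {
            keep = true;
            seenPoint = true;  // later decimal points are discarded
        }
        if (keep && (index < ARRAY_SIZE - 1))
        {
            cleaned[index] = next;
            index++;
        }
    }
    cleaned[index] = '\0';
    return std::atof(cleaned);
}
```

With this sketch, cleanToDouble("12.5%") returns 12.5, and a string with no digits at all yields 0.0, just as atof does for an argument it cannot convert.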