7.1 Introduction to Arrays

Suppose we wish to write a program that reads in five test scores and performs some manipulations on these scores. For instance, the program might compute the highest test score and then output the amount by which each score falls short of the highest. The highest score is not known until all five scores are read in. Hence, all five scores must be retained in storage so that after the highest score is computed each score can be compared to it.

To retain the five scores, we will need something equivalent to five variables of type int. We could use five individual variables of type int, but five variables are hard to keep track of, and we may later want to change our program to handle 100 scores; certainly, 100 variables are impractical. An array is the perfect solution. An array behaves like a list of variables with a uniform naming mechanism that can be declared in a single line of simple code. For example, the names for the five individual variables we need might be score[0], score[1], score[2], score[3], and score[4]. The part that does not change—in this case, score—is the name of the array. The part that can change is the integer in the square brackets, [ ].

Declaring and Referencing Arrays

In C++, an array consisting of five variables of type int can be declared as follows:

int score[5];

This declaration is like declaring the following five variables to all be of type int:

score[0], score[1], score[2], score[3], score[4]

The individual variables that together make up the array are referred to in a variety of different ways. We will call them indexed variables, though they are also sometimes called subscripted variables or elements of the array. The number in square brackets is called an index or a subscript. In C++, indexes are always numbered starting with 0, not with 1 or any other number. The number of indexed variables in an array is called the declared size of the array, or sometimes simply the size of the array. When an array is declared, the size of the array is given in square brackets after the array name. The indexed variables are then numbered (also using square brackets), starting with 0 and ending with the integer that is one less than the size of the array.

In our example, the indexed variables were of type int, but an array can have indexed variables of any type. For example, to declare an array with indexed variables of type double, simply use the type name double instead of int in the declaration of the array. All the indexed variables for one array are, however, of the same type. This type is called the base type of the array. Thus, in our example of the array score, the base type is int.

You can declare arrays and regular variables together. For example, the following declares the two int variables next and max in addition to the array score:

int next, score[5], max;

An indexed variable like score[3] can be used anyplace that an ordinary variable of type int can be used.
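As a small illustration of this point, the sketch below uses indexed variables in an assignment, in an arithmetic expression, and as an argument to a function. The function names twice and sumOfThree are hypothetical, chosen only for this example.

```cpp
// Hypothetical helper: takes an ordinary int argument.
int twice(int value)
{
    return 2 * value;
}

// Uses indexed variables exactly as ordinary int variables would be used.
int sumOfThree()
{
    int score[5];
    score[0] = 70;                // assignment to an indexed variable
    score[4] = score[0] + 5;      // indexed variable in an expression
    score[3] = twice(score[4]);   // indexed variable as a function argument
    return score[0] + score[4] + score[3];
}
```

Here score[1] and score[2] are never given values, so the sketch is careful not to read them.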

Do not confuse the two ways to use the square brackets [ ] with an array name. When used in a declaration, such as

int score[5];

the number enclosed in the square brackets specifies how many indexed variables the array has. When used anywhere else, the number enclosed in the square brackets tells which indexed variable is meant. For example, score[0] through score[4] are indexed variables.

The index inside the square brackets need not be given as an integer constant. You can use any expression in the square brackets as long as the expression evaluates to one of the integers 0 through the integer that is one less than the size of the array. For example, the following will set the value of score[3] equal to 99:

int n = 2;
score[n + 1] = 99;

Although they may look different, score[n+1] and score[3] are the same indexed variable in the code above. That is because n + 1 evaluates to 3.

The identity of an indexed variable, such as score[i], is determined by the value of its index, which in this instance is i. Thus, you can write programs that say things such as “do such and such to the ith indexed variable,” where the value of i is computed by the program. For example, the program in Display 7.1 reads in scores and processes them in the way we described at the start of this chapter.

Programming Tip Use for Loops with Arrays

The second for loop in Display 7.1 illustrates a common way to step through an array using a for loop:

for (i = 0; i < 5; i++)
    cout << score[i] << " off by "
         << (max - score[i]) << endl;

The for statement is ideally suited to array manipulations.

Programming Tip Use a Defined Constant for the Size of an Array

Look again at the program in Display 7.1. It only works for classes that have exactly five students. Most classes do not have exactly five students. One way to make a program more versatile is to use a defined constant for the size of each array. For example, the program in Display 7.1 could be rewritten to use the following defined constant:

const int NUMBER_OF_STUDENTS = 5;

The line with the array declaration would then be

int i, score[NUMBER_OF_STUDENTS], max;

Of course, all places that have a 5 for the size of the array should also be changed to have NUMBER_OF_STUDENTS instead of 5. If these changes are made to the program (or better still, if the program had been written this way in the first place), then the program can be rewritten to work for any number of students by simply changing the one line that defines the constant NUMBER_OF_STUDENTS. Note that on many compilers you cannot use a variable for the array size, such as the following:

cout << "Enter number of students:\n";
cin >> number;
int score[number]; //ILLEGAL ON MANY COMPILERS!

Display 7.1 Program Using an Array

//Reads in 5 scores and shows how much each
//score differs from the highest score.
#include <iostream>
int main( )
{
    using namespace std;
    int i, score[5], max;

    cout << "Enter 5 scores:\n";
    cin >> score[0];
    max = score[0];
    for (i = 1; i < 5; i++)
    {
        cin >> score[i];
        if (score[i] > max)
            max = score[i];
        //max is the largest of the values score[0],..., score[i].
    }
    cout << "The highest score is " << max << endl
         << "The scores and their\n"
         << "differences from the highest are:\n";
    for (i = 0; i < 5; i++)
        cout << score[i] << " off by "
             << (max - score[i]) << endl;
    return 0;
}

Sample Dialogue

Enter 5 scores:
5 9 2 10 6
The highest score is 10
The scores and their
differences from the highest are:
5 off by 5
9 off by 1
2 off by 8
10 off by 0
6 off by 4

Some but not all compilers will allow you to specify an array size with a variable in this way. However, for the sake of portability you should not do so, even if your compiler permits it. (In Chapter 9 we will discuss a different kind of array whose size can be determined when the program is run.)

Arrays in Memory

Before discussing how arrays are represented in a computer’s memory, let’s first see how a simple variable, such as a variable of type int or double, is represented in the computer’s memory. A computer’s memory consists of a list of numbered locations called bytes. The number of a byte is known as its address. A simple variable is implemented as a portion of memory consisting of some number of consecutive bytes. The number of bytes is determined by the type of the variable. Thus, a simple variable in memory is described by two pieces of information: an address in memory (giving the location of the first byte for that variable) and the type of the variable, which tells how many bytes of memory the variable requires. When we speak of the address of a variable, it is this address we are talking about. When your program stores a value in the variable, what really happens is that the value (coded as 0s and 1s) is placed in those bytes of memory that are assigned to that variable. Similarly, when a variable is given as a (call-by-reference) argument to a function, it is the address of the variable that is actually given to the called function. Now let’s move on to discuss how arrays are stored in memory.

Array indexed variables are represented in memory the same way as ordinary variables, but with arrays there is a little more to the story. The locations of the various array indexed variables are always placed next to one another in memory. For example, consider the following:

int a[6];

When you declare this array, the computer reserves enough memory to hold six variables of type int. Moreover, the computer always places these variables one after the other in memory. The computer then remembers the address of indexed variable a[0], but it does not remember the address of any other indexed variable. When your program needs the address of some other indexed variable, the computer calculates the address for this other indexed variable from the address of a[0]. For example, if you start at the address of a[0] and count past enough memory for three variables of type int, then you will be at the address of a[3]. To obtain the address of a[3], the computer starts with the address of a[0] (which is a number). The computer then adds the number of bytes needed to hold three variables of type int to the number for the address of a[0]. The result is the address of a[3]. This implementation is diagrammed in Display 7.2.
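The address calculation just described amounts to one multiplication and one addition. The sketch below models it with plain integers rather than real machine addresses. The function name and all three parameter names are hypothetical, and the sample numbers in the note that follows are made up for illustration.

```cpp
// Models the calculation described above: start at the address of the
// first indexed variable and skip past `index` elements, each of which
// occupies `bytesPerElement` bytes.
// (Hypothetical function; addresses are modeled as plain long values.)
long addressOfIndexedVariable(long addressOfFirst, int index, int bytesPerElement)
{
    return addressOfFirst + static_cast<long>(index) * bytesPerElement;
}
```

For instance, if a[0] happened to be at address 2000 and an int occupied 4 bytes, then addressOfIndexedVariable(2000, 3, 4) would give 2012 as the address of a[3]. The actual size of an int varies from system to system.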

Many of the peculiarities of arrays in C++ can be understood only in terms of these details about memory. For example, in the next Pitfall section, we use these details to explain what happens when your program uses an illegal array index.

Array Declaration

Syntax

TypeName ArrayName[Declared_Size];

Examples

int bigArray[100];
double a[3];
double b[5];
char grade[10], oneGrade;

An array declaration, of the form shown, will define Declared_Size indexed variables, namely, the indexed variables ArrayName[0] through ArrayName[Declared_Size-1]. Each indexed variable is a variable of type TypeName.

The array a consists of the indexed variables a[0], a[1], and a[2], all of type double. The array b consists of the indexed variables b[0], b[1], b[2], b[3], and b[4], also all of type double. You can combine array declarations with the declaration of simple variables such as the variable oneGrade shown above.

Pitfall Array Index Out of Range

The most common programming error made when using arrays is attempting to reference a nonexistent array index. For example, consider the following array declaration:

int a[6];

When using the array a, every index expression must evaluate to one of the integers 0 through 5. For example, if your program contains the indexed variable a[i], the i must evaluate to one of the six integers 0, 1, 2, 3, 4, or 5. If i evaluates to anything else, that is an error. When an index expression evaluates to some value other than those allowed by the array declaration, the index is said to be out of range or simply illegal. On most systems, the result of an illegal array index is that your program will do something wrong, possibly disastrously wrong, and will do so without giving you any warning.

Attackers have also exploited this type of error to break into software. An out-of-range programming error could potentially compromise the entire system, so take great care to avoid this error. In 2011, the Common Weakness Enumeration (CWE)/SANS Institute identified this type of error as the third most dangerous programmer error.

Display 7.2 An Array in Memory

For example, suppose your system is typical, the array a is declared as shown, and your program contains the following:

a[i] = 238;

Now, suppose the value of i, unfortunately, happens to be 7. The computer proceeds as if a[7] were a legal indexed variable. The computer calculates the address where a[7] would be (if only there were an a[7]), and places the value 238 in that location in memory. However, there is no indexed variable a[7], and the memory that receives this 238 probably belongs to some other variable, maybe a variable named moreStuff. So the value of moreStuff has been unintentionally changed. The situation is illustrated in Display 7.2.

Array indexes get out of range most commonly at the first or last iteration of a loop that processes the array. So, it pays to carefully check all array processing loops to be certain that they begin and end with legal array indexes.

It may sound simple to keep the array indexes within a valid range. In practice it is more difficult, because there are often subtle or unanticipated ways to change an index variable. For example, consider the following code that inputs some numbers into an array:

int num;
int a[10];
cout << "How many numbers? (max of 10)" << endl;
cin >> num;
for (int i = 0; i <= num; i++)
{
   cout << "Enter number " << i << endl;
   cin >> a[i];
}

This program suffers from two errors. First, the loop has an off-by-one error. By starting at index 0 and continuing up to and including num, the loop will input num+1 numbers instead of num numbers. As long as a value less than ten is entered for num, you might not notice the problem. The program won’t crash, because the one extra number still fits in the array. However, if 10 is entered for num, then the eleventh number will be stored in a[10], which is one past the end of the array. To fix this problem, the loop should be written as:

for (int i = 0; i < num; i++)

Another problem is the lack of input validation. A malicious or mischievous user could enter 100 as the number of values to enter; the loop would then simply execute 100 times and input data well past the end of the array (the program may crash before looping 100 times as numbers past the end of the array could cause mischief). To address this problem we can validate that the user’s input is within valid range:

cout << "How many numbers? (max of 10)" << endl;
cin >> num;
cout << num << endl;
if (num <= 10)
{
   for (int i = 0; i < num; i++)
   {
      cout << "Enter number " << i << endl;
      cin >> a[i];
   }
}

Even this modified version has the potential for error. If the value entered for num exceeds the largest value its type can hold, there is the possibility of overflow. For example, on most systems a signed short can only store a number up to +32767. Entering a larger value results in overflow, which could store 0 or a negative value in num. Although the for loop will not run if num is zero or negative, the program would erroneously pass the if statement. We explore this type of error again in Chapter 8.

Initializing Arrays

An array can be initialized when it is declared. When initializing the array, the values for the various indexed variables are enclosed in braces and separated with commas. For example,

int children[3] = {2, 12, 1};

This declaration is equivalent to the following code:

int children[3];
children[0] = 2;
children[1] = 12;
children[2] = 1;

If you list fewer values than there are indexed variables, those values will be used to initialize the first few indexed variables, and the remaining indexed variables will be initialized to a 0 of the array base type. However, arrays with no initializers and other variables declared within a function definition, including the main function of a program, are not initialized. Although array indexed variables (and other variables) may sometimes be automatically initialized to 0, you cannot and should not count on it.
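The partial-initialization rule can be seen in a short sketch. The array name count and the function name sumOfPartiallyInitialized are hypothetical, chosen only for this example.

```cpp
// Adds up an array whose initializer list gives only two of five values.
// (Hypothetical function for illustration.)
int sumOfPartiallyInitialized()
{
    int count[5] = {4, 7};   // count[2], count[3], and count[4] are set to 0
    int sum = 0;
    for (int i = 0; i < 5; i++)
        sum = sum + count[i];
    return sum;              // 4 + 7 + 0 + 0 + 0
}
```

Because an initializer list is present, reading all five indexed variables is safe here. Without the list, the same loop would read uninitialized values.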

If you initialize an array when it is declared, you can omit the size of the array, and the array will automatically be declared to have the minimum size needed for the initialization values. For example, the following declaration

int b[ ] = {5, 12, 11};

is equivalent to

int b[3] = {5, 12, 11};

Programming Tip C++11 Range-Based for Statement

C++11 includes a new type of for loop, the range-based for loop, that simplifies iteration over every element in an array. The syntax is shown below:

for (datatype varname : array)
{
    // varname is successively set to each element in the array
}

For example:

int arr[] = {2, 4, 6, 8};
for (int x : arr)
    cout << x;
cout << endl;

This will output: 2468.

When defining the variable that will iterate through the array we can use the same modifiers that are available when defining a parameter for a function. The example we used above for variable x is equivalent to pass-by-value. If we change x inside the loop it doesn’t change the array. We could define x as pass-by-reference using & and then changes to x will be made to the array. We could also use const to indicate that the variable can’t be changed. The example below increments every element in the array and then outputs them. We used the auto datatype in the output loop to automatically determine the type of element inside the array.

int arr[] = {2, 4, 6, 8};
for (int& x : arr)
    x++;
for (auto x : arr)
    cout << x;
cout << endl;

This will output: 3579. The range-based for loop is especially convenient when iterating over vectors, which are introduced in Chapter 8, and when iterating over containers, which are discussed in Chapter 18.
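The const form mentioned above has no example of its own, so here is a minimal sketch. It requires a compiler that supports C++11, and the function name totalOfSample is hypothetical.

```cpp
// Adds up the elements of a small sample array using a range-based for
// loop whose loop variable is a const reference.
// (Hypothetical function for illustration.)
int totalOfSample()
{
    int arr[] = {2, 4, 6, 8};
    int total = 0;
    for (const int& x : arr)   // x refers to each element but cannot change it
        total = total + x;
    return total;
}
```

An attempt to write x++ inside this loop would be rejected by the compiler, which makes const a reasonable choice whenever the loop only reads the array.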

Self-Test Exercises

Describe the difference in the meaning of int a[5] and the meaning of a[4]. What is the meaning of the [5] and [4] in each case?

In the array declaration
```
double score[5];
```
state the following:
1. The array name
2. The base type
3. The declared size of the array
4. The range of values that an index for this array can have
5. One of the indexed variables (or elements) of this array

Identify any errors in the following array declarations:
1. int x[4] = { 8, 7, 6, 4, 3 };
2. int x[ ] = { 8, 7, 6, 4 };
3. const int SIZE = 4;
   int x[SIZE];

What is the output of the following code?

char symbol[3] = {'a', 'b', 'c'};
for (int index = 0; index < 3; index++)
    cout << symbol[index];

What is the output of the following code?

double a[3] = {1.1, 2.2, 3.3};
cout << a[0] << " " << a[1] << " " << a[2] << endl;
a[1] = a[2];
cout << a[0] << " " << a[1] << " " << a[2] << endl;

What is the output of the following code?

int i, temp[10];
for (i = 0; i < 10; i++)
    temp[i] = 2 * i;
for (i = 0; i < 10; i++)
    cout << temp[i] << " ";
cout << endl;
for (i = 0; i < 10; i = i + 2)
    cout << temp[i] << " ";

What is wrong with the following piece of code?

int sampleArray[10];
for (int index = 1; index <= 10; index++)
    sampleArray[index] = 3 * index;

Suppose we expect the elements of the array a to be ordered so that
```
a[0] ≤ a[1] ≤ a[2] ≤ ...
```
However, to be safe we want our program to test the array and issue a warning in case it turns out that some elements are out of order. The following code is supposed to output such a warning, but it contains a bug. What is it?
```
double a[10];
<Some code to fill the array a goes here.>
for (int index = 0; index < 10; index++)
    if (a[index] > a[index + 1])
        cout << "Array elements " << index << " and "
             << (index + 1) << " are out of order.";
```
Write some C++ code that will fill an array a with 20 values of type int read in from the keyboard. You need not write a full program, just the code to do this, but do give the declarations for the array and for all variables.

Suppose you have the following array declaration in your program:
```
int yourArray[7];
```
Also, suppose that in your implementation of C++, variables of type int use 2 bytes of memory. When you run your program, how much memory will this array consume? Suppose that when you run your program, the system assigns the memory address 1000 to the indexed variable yourArray[0]. What will be the address of the indexed variable yourArray[3]?