4.22 Multidimensional Arrays

The 80x86 hardware can easily handle single-dimensional arrays. Unfortunately, there is no magic addressing mode that lets you easily access elements of multidimensional arrays. That's going to take some work and several instructions.

Before discussing how to declare or access multidimensional arrays, it would be a good idea to figure out how to implement them in memory. The first problem is to figure out how to store a multidimensional object into a one-dimensional memory space.

Consider for a moment a Pascal array of the form A:array[0..3,0..3] of char;. This array contains 16 bytes organized as four rows of four characters. Somehow you've got to draw a correspondence with each of the 16 bytes in this array and 16 contiguous bytes in main memory. Figure 4-5 shows one way to do this.

Mapping a 4x4 array to sequential memory locations

Figure 4-5. Mapping a 4x4 array to sequential memory locations

The actual mapping is not important as long as two things occur: (1) Each element maps to a unique memory location (that is, no two entries in the array occupy the same memory locations), and (2) the mapping is consistent. That is, a given element in the array always maps to the same memory location. So what you really need is a function with two input parameters (row and column) that produces an offset into a linear array of 16 memory locations.

Now any function that satisfies the above constraints will work fine. Indeed, you could randomly choose a mapping as long as it was consistent. However, what you really want is a mapping that is efficient to compute at runtime and works for any size array (not just 4x4 or even limited to two dimensions). While a large number of possible functions fit this bill, there are two functions in particular that most programmers and high-level languages use: row-major ordering and column-major ordering.

Row-major ordering assigns successive elements, moving across the rows and then down the columns, to successive memory locations. This mapping is demonstrated in Figure 4-6.

Row-major ordering is the method most high-level programming languages employ. It is very easy to implement and use in machine language. You start with the first row (row 0) and then concatenate the second row to its end. You then concatenate the third row to the end of the list, then the fourth row, and so on (see Figure 4-7).

The actual function that converts a list of index values into an offset is a slight modification of the formula for computing the address of an element of a single-dimensional array. The formula to compute the offset for a two-dimensional row-major ordered array is:

Element_Address = Base_Address + (colindex * row_size + rowindex) *
 Element_Size

As usual, Base_Address is the address of the first element of the array (A[0][0] in this case), and Element_Size is the size of an individual element of the array, in bytes. colindex is the leftmost index, and rowindex is the rightmost index into the array. row_size is the number of elements in one row of the array (four, in this case, because each row has four elements). Assuming Element_Size is 1, this formula computes the following offsets from the base address:

Column          Row             Offset
     Index           Index           into Array
     0               0               0
     0               1               1
     0               2               2
     0               3               3
     1               0               4
     1               1               5
     1               2               6
     1               3               7
     2               0               8
     2               1               9
     2               2               10
     2               3               11
     3               0               12
     3               1               13
     3               2               14
     3               3               15

For a three-dimensional array, the formula to compute the offset into memory is the following:

Address = Base + ((depthindex*col_size+colindex) * row_size + rowindex)
 * Element_Size

col_size is the number of items in a column, and row_size is the number of items in a row. In C/C++, if you've declared the array as type A[i] [j] [k];, then row_size is equal to k and col_size is equal to j.

For a four-dimensional array, declared in C/C++ as type A[i] [j] [k] [m];, the formula for computing the address of an array element is:

Address =
Base + (((LeftIndex*depth_size+depthindex)*col_size+colindex) * row_size
 + rowindex) * Element_Size

depth_size is equal to j, col_size is equal to k, and row_size is equal to m. LeftIndex represents the value of the leftmost index.

By now you're probably beginning to see a pattern. There is a generic formula that will compute the offset into memory for an array with any number of dimensions; however, you'll rarely use more than four.

Another convenient way to think of row-major arrays is as arrays of arrays. Consider the following single-dimensional Pascal array definition:

A: array [0..3] of sometype;

Assume that sometype is the type sometype = array [0..3] of char;.

A is a single-dimensional array. Its individual elements happen to be arrays, but you can safely ignore that for the time being. The formula to compute the address of an element of a single-dimensional array is:

Element_Address = Base + Index * Element_Size

In this case Element_Size happens to be 4 because each element of A is an array of four characters. So what does this formula compute? It computes the base address of each row in this 4x4 array of characters (see Figure 4-8).

Of course, once you compute the base address of a row, you can reapply the single-dimensional formula to get the address of a particular element. While this doesn't affect the computation, it's probably a little easier to deal with several single-dimensional computations rather than a complex multidimensional array computation.

Consider a Pascal array defined as A:array [0..3] [0..3] [0..3] [0..3] [0..3] of char;. You can view this five-dimensional array as a single-dimensional array of arrays. The following HLA code provides such a definition:

type
          OneD: char[4];
          TwoD: OneD[4];
          ThreeD: TwoD[4];
          FourD: ThreeD [4];
var
          A : FourD [4];

The size of OneD is 4 bytes. Because TwoD contains four OneD arrays, its size is 16 bytes. Likewise, ThreeD is four TwoDs, so it is 64 bytes long. Finally, FourD is four ThreeDs, so it is 256 bytes long. To compute the address of A [b, c, d, e, f], you could use the following steps:

  1. Compute the address of A [b] as Base + b * size. Here size is 256 bytes. Use this result as the new base address in the next computation.

  2. Compute the address of A [b, c] by the formula Base + c * size, where Base is the value obtained in the previous step and size is 64. Use the result as the new base in the next computation.

  3. Compute the base address of A [b, c, d] by Base + d * size, with Base coming from the previous computation and size is 16. Use the result as the new base in the next computation.

  4. Compute the address of A [b, c, d, e] with the formula Base + e * size, with Base from the previous step with a size of 4. Use this value as the base for the next computation.

  5. Finally, compute the address of A [b, c, d, e, f] using the formula Base + f * size, where Base comes from the previous computation and size is 1 (obviously you can simply ignore this final multiplication). The result you obtain at this point is the address of the desired element.

One of the main reasons you won't find higher-dimensional arrays in assembly language is that assembly language emphasizes the inefficiencies associated with such access. It's easy to enter something like A [b, c, d, e, f] into a Pascal program, not realizing what the compiler is doing with the code. Assembly language programmers are not so cavalier—they see the mess you wind up with when you use higher-dimensional arrays. Indeed, good assembly language programmers try to avoid two-dimensional arrays and often resort to tricks in order to access data in such an array when its use becomes absolutely mandatory.

Column-major ordering is the other function high-level languages frequently used to compute the address of an array element. FORTRAN and various dialects of BASIC (e.g., older versions of Microsoft BASIC) use this method.

In row-major ordering the rightmost index increases the fastest as you move through consecutive memory locations. In column-major ordering the leftmost index increases the fastest. Pictorially, a column-major ordered array is organized as shown in Figure 4-9.

The formula for computing the address of an array element when using column-major ordering is very similar to that for row-major ordering. You simply reverse the indexes and sizes in the computation:

For a two-dimension column-major array:
Element_Address = Base_Address + (rowindex * col_size + colindex) *
                  Element_Size

For a three-dimension column-major array:
Address = Base + ((rowindex * col_size+colindex) * depth_size +
 depthindex) *
          Element_Size

For a four-dimension column-major array:
Address =
     Base + (((rowindex * col_size + colindex)*depth_size + depthindex) *
          Left_size + Leftindex) * Element_Size