The 80x86 hardware can easily handle single-dimensional arrays. Unfortunately, there is no magic addressing mode that lets you easily access elements of multidimensional arrays. That's going to take some work and several instructions.
Before discussing how to declare or access multidimensional arrays, it would be a good idea to figure out how to implement them in memory. The first problem is to figure out how to store a multidimensional object into a one-dimensional memory space.
Consider for a moment a Pascal array of the form A:array[0..3,0..3] of char;
. This array contains 16 bytes organized as four rows of four characters. Somehow you've got to draw a correspondence with each of the 16 bytes in this array and 16 contiguous bytes in main memory. Figure 4-5 shows one way to do this.
The actual mapping is not important as long as two things occur: (1) Each element maps to a unique memory location (that is, no two entries in the array occupy the same memory locations), and (2) the mapping is consistent. That is, a given element in the array always maps to the same memory location. So what you really need is a function with two input parameters (row and column) that produces an offset into a linear array of 16 memory locations.
Now any function that satisfies the above constraints will work fine. Indeed, you could randomly choose a mapping as long as it was consistent. However, what you really want is a mapping that is efficient to compute at runtime and works for any size array (not just 4x4 or even limited to two dimensions). While a large number of possible functions fit this bill, there are two functions in particular that most programmers and high-level languages use: row-major ordering and column-major ordering.
Row-major ordering assigns successive elements, moving across the rows and then down the columns, to successive memory locations. This mapping is demonstrated in Figure 4-6.
Row-major ordering is the method most high-level programming languages employ. It is very easy to implement and use in machine language. You start with the first row (row 0) and then concatenate the second row to its end. You then concatenate the third row to the end of the list, then the fourth row, and so on (see Figure 4-7).
The actual function that converts a list of index values into an offset is a slight modification of the formula for computing the address of an element of a single-dimensional array. The formula to compute the offset for a two-dimensional row-major ordered array is:
Element_Address
=Base_Address
+ (colindex
*row_size
+rowindex
) *Element_Size
As usual, Base_Address
is the address of the first element of the array (A[0][0]
in this case), and Element_Size
is the size of an individual element of the array, in bytes. colindex
is the leftmost index, and rowindex
is the rightmost index into the array. row_size
is the number of elements in one row of the array (four, in this case, because each row has four elements). Assuming Element_Size
is 1, this formula computes the following offsets from the base address:
Column Row Offset Index Index into Array 0 0 0 0 1 1 0 2 2 0 3 3 1 0 4 1 1 5 1 2 6 1 3 7 2 0 8 2 1 9 2 2 10 2 3 11 3 0 12 3 1 13 3 2 14 3 3 15
For a three-dimensional array, the formula to compute the offset into memory is the following:
Address
=Base
+ ((depthindex
*col_size
+colindex
) *row_size
+rowindex
) *Element_Size
col_size
is the number of items in a column, and row_size
is the number of items in a row. In C/C++, if you've declared the array as type
A[i] [j] [k];
, then row_size
is equal to k
and col_size
is equal to j
.
For a four-dimensional array, declared in C/C++ as type
A[i] [j] [k] [m];
, the formula for computing the address of an array element is:
Address
=Base
+ (((LeftIndex
*depth_size
+depthindex
)*col_size
+colindex
) *row_size
+rowindex
) *Element_Size
depth_size
is equal to j
, col_size
is equal to k
, and row_size
is equal to m
. LeftIndex
represents the value of the leftmost index.
By now you're probably beginning to see a pattern. There is a generic formula that will compute the offset into memory for an array with any number of dimensions; however, you'll rarely use more than four.
Another convenient way to think of row-major arrays is as arrays of arrays. Consider the following single-dimensional Pascal array definition:
A: array [0..3] of sometype
;
Assume that sometype
is the type sometype
= array [0..3] of char;
.
A
is a single-dimensional array. Its individual elements happen to be arrays, but you can safely ignore that for the time being. The formula to compute the address of an element of a single-dimensional array is:
Element_Address
=Base
+Index
*Element_Size
In this case Element_Size
happens to be 4 because each element of A
is an array of four characters. So what does this formula compute? It computes the base address of each row in this 4x4 array of characters (see Figure 4-8).
Of course, once you compute the base address of a row, you can reapply the single-dimensional formula to get the address of a particular element. While this doesn't affect the computation, it's probably a little easier to deal with several single-dimensional computations rather than a complex multidimensional array computation.
Consider a Pascal array defined as A:array [0..3] [0..3] [0..3] [0..3] [0..3] of char;
. You can view this five-dimensional array as a single-dimensional array of arrays. The following HLA code provides such a definition:
type OneD: char[4]; TwoD: OneD[4]; ThreeD: TwoD[4]; FourD: ThreeD [4]; var A : FourD [4];
The size of OneD
is 4 bytes. Because TwoD
contains four OneD
arrays, its size is 16 bytes. Likewise, ThreeD
is four TwoD
s, so it is 64 bytes long. Finally, FourD
is four ThreeDs
, so it is 256 bytes long. To compute the address of A [b, c, d, e, f]
, you could use the following steps:
Compute the address of A [b]
as Base
+ b *
size
. Here size is 256 bytes. Use this result as the new base address in the next computation.
Compute the address of A [b, c]
by the formula Base
+ c *
size
, where Base
is the value obtained in the previous step and size
is 64. Use the result as the new base in the next computation.
Compute the base address of A [b, c, d]
by Base
+ d *
size
, with Base
coming from the previous computation and size
is 16. Use the result as the new base in the next computation.
Compute the address of A [b, c, d, e]
with the formula Base
+ e *
size
, with Base
from the previous step with a size of 4. Use this value as the base for the next computation.
Finally, compute the address of A [b, c, d, e, f]
using the formula Base
+ f *
size
, where Base
comes from the previous computation and size
is 1 (obviously you can simply ignore this final multiplication). The result you obtain at this point is the address of the desired element.
One of the main reasons you won't find higher-dimensional arrays in assembly language is that assembly language emphasizes the inefficiencies associated with such access. It's easy to enter something like A [b, c, d, e, f]
into a Pascal program, not realizing what the compiler is doing with the code. Assembly language programmers are not so cavalier—they see the mess you wind up with when you use higher-dimensional arrays. Indeed, good assembly language programmers try to avoid two-dimensional arrays and often resort to tricks in order to access data in such an array when its use becomes absolutely mandatory.
Column-major ordering is the other function high-level languages frequently used to compute the address of an array element. FORTRAN and various dialects of BASIC (e.g., older versions of Microsoft BASIC) use this method.
In row-major ordering the rightmost index increases the fastest as you move through consecutive memory locations. In column-major ordering the leftmost index increases the fastest. Pictorially, a column-major ordered array is organized as shown in Figure 4-9.
The formula for computing the address of an array element when using column-major ordering is very similar to that for row-major ordering. You simply reverse the indexes and sizes in the computation:
For a two-dimension column-major array:Element_Address
=Base_Address
+ (rowindex
*col_size
+colindex
) *Element_Size
For a three-dimension column-major array:Address
=Base
+ ((rowindex
*col_size
+colindex
) *depth_size
+depthindex
) *Element_Size
For a four-dimension column-major array:Address
=Base
+ (((rowindex
*col_size
+colindex
)*depth_size
+depthindex
) *Left_size
+Leftindex
) *Element_Size