3.5 Address Expressions

Earlier, this chapter pointed out that addressing modes take several generic forms, including the following:

VarName[ Reg32 ]
VarName[ Reg32 + offset ]
VarName[ RegNotESP32*scale ]
VarName[ Reg32 + RegNotESP32*scale ]
VarName[ RegNotESP32*scale + offset ]
VarName[ Reg32 + RegNotESP32*scale + offset ]

Another legal form, which isn't actually a new addressing mode but simply an extension of the displacement-only addressing mode, is:

VarName[ offset ]

This form computes its effective address by adding the constant offset within the brackets to the variable's address. For example, the instruction mov(Address[3], al); loads the AL register with the byte in memory that is 3 bytes beyond the Address object (see Figure 3-8).

Always remember that the offset value in these examples must be a constant. If Index is an int32 variable, then Variable[Index] is not a legal address expression. If you wish to specify an index that varies at runtime, then you must use one of the indexed or scaled-indexed addressing modes.
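
The following short program is a minimal sketch of that restriction and its workaround. The names runtimeIndex, Index, and Bytes are hypothetical and exist only for this illustration. The program copies the variable index into a 32-bit register and then uses the indexed addressing mode.

```hla
program runtimeIndex;
#include( "stdlib.hhf" )

static
  // Hypothetical data for this sketch only.
  Index: int32 := 2;
  Bytes: byte := 10;
         byte 20, 30, 40;

begin runtimeIndex;

  // Not legal: Index is a variable, so HLA cannot fold it into the
  // displacement at compile time.
  // mov( Bytes[ Index ], al );

  // Legal: copy the index into a 32-bit register and use the
  // indexed addressing mode instead.
  mov( Index, ebx );
  mov( Bytes[ ebx ], al );      // loads the byte at Bytes+2, which is 30

  stdout.put( "Bytes[ ebx ]=", (type uns8 al), nl );

end runtimeIndex;
```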

Another important thing to remember is that the offset in Address[offset] is a byte offset. Although this syntax is reminiscent of array indexing in a high-level language like C/C++ or Pascal, it does not properly index into an array of objects unless Address is an array of bytes.
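
As a rough illustration, consider three consecutive double words. In the sketch below, the names byteOffsets and Values are hypothetical. Because each dword occupies 4 bytes, the constant needed to reach the second and third values is 4 and 8, not 1 and 2.

```hla
program byteOffsets;
#include( "stdlib.hhf" )

static
  // Hypothetical data: three dwords laid out consecutively in memory.
  Values: dword := 10;
          dword 20, 30;

begin byteOffsets;

  // Offset 4 skips exactly one 4-byte dword, so this loads 20.
  mov( Values[4], eax );
  stdout.put( "Values[4]=", (type uns32 eax), nl );

  // Offset 8 skips two dwords, so this loads 30.
  mov( Values[8], eax );
  stdout.put( "Values[8]=", (type uns32 eax), nl );

end byteOffsets;
```

By contrast, mov( Values[1], eax ); would load the 4 bytes starting 1 byte into the first dword, a value that straddles the first two elements.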

Figure 3-8. Using an address expression to access data beyond a variable

This text will consider an address expression to be any legal 80x86 addressing mode that includes a displacement (i.e., variable name) or an offset. In addition to the above forms, the following are also address expressions:

[ Reg32 + offset ]
[ Reg32 + RegNotESP32*scale + offset ]

This book will not consider the following to be address expressions because they do not involve a displacement or offset component:

[ Reg32 ]
[ Reg32 + RegNotESP32*scale ]

Address expressions are special because those instructions containing an address expression always encode a displacement constant as part of the machine instruction. That is, the machine instruction contains some number of bits (usually 8 or 32) that hold a numeric constant. That constant is the sum of the displacement (i.e., the address or offset of the variable) plus the offset. Note that HLA automatically adds these two values together for you (or subtracts the offset if you use the − rather than + operator in the addressing mode).
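
The sketch below, which uses the hypothetical names regOffsets and Data, shows a variable name with a constant offset, then a register with an added offset, then a register with a subtracted offset. In each case HLA encodes a single constant in the instruction.

```hla
program regOffsets;
#include( "stdlib.hhf" )

static
  // Hypothetical data for this sketch only.
  Data: byte := 10;
        byte 20, 30, 40;

begin regOffsets;

  // Point EBX at Data+2. HLA folds the +2 into the address constant
  // it encodes in this instruction.
  lea( ebx, Data[2] );

  // [ Reg32 + offset ]: the byte one past EBX (Data+3), which is 40.
  mov( [ebx+1], al );
  stdout.put( "[ebx+1]=", (type uns8 al), nl );

  // Using - subtracts the offset: the byte one before EBX (Data+1),
  // which is 20.
  mov( [ebx-1], al );
  stdout.put( "[ebx-1]=", (type uns8 al), nl );

end regOffsets;
```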

Until this point, the offset in all the addressing mode examples has always been a single numeric constant. However, HLA also allows a constant expression anywhere an offset is legal. A constant expression consists of one or more constant terms manipulated by operators such as addition, subtraction, multiplication, division, modulo, and a wide variety of others. Most address expressions, however, will involve only addition, subtraction, multiplication, and sometimes division. Consider the following example:

mov( X[ 2*4+1 ], al );

This instruction will move the byte at address X+9 into the AL register.

The value of an address expression is always computed at compile time, never while the program is running. When HLA encounters the instruction above, it calculates 2 * 4 + 1 on the spot and adds this result to the base address of X in memory. HLA encodes this single sum (base address of X plus 9) as part of the instruction; HLA does not emit extra instructions to compute this sum for you at runtime (which is good, because doing so would be less efficient). Because HLA computes the value of address expressions at compile time, all components of the expression must be constants; HLA cannot know the runtime value of a variable while it is compiling the program.
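
A symbolic constant is also known at compile time, so it can take part in such an expression. The following sketch assumes a hypothetical constant ElementSize and a hypothetical label X placed in front of ten bytes. HLA reduces 2*ElementSize+1 to 9 while compiling.

```hla
program constOffsets;
#include( "stdlib.hhf" )

const
  // Hypothetical symbolic constant; its value is known at compile time.
  ElementSize := 4;

static
  // X reserves no storage of its own; it labels the first byte below.
  X: byte; @nostorage;
     byte 0, 1, 2, 3, 4, 5, 6, 7, 8, 9;

begin constOffsets;

  // HLA evaluates 2*ElementSize+1 = 9 during compilation and encodes
  // the single address X+9 in the instruction.
  mov( X[ 2*ElementSize + 1 ], al );   // loads the byte at X+9, which is 9
  stdout.put( "X[ 2*ElementSize + 1 ]=", (type uns8 al), nl );

end constOffsets;
```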

Address expressions are useful for accessing the data in memory beyond a variable, particularly when you've used statements such as byte, word, or dword in a static or readonly section to tack on additional bytes after a data declaration. For example, consider the program in Example 3-1.

Example 3-1. Demonstration of address expressions

program adrsExpressions;
#include( "stdlib.hhf" )
static
  i: int8; @nostorage;   // i reserves no storage; it labels the next byte
     byte 0, 1, 2, 3;    // the four bytes reached as i[0] through i[3]

begin adrsExpressions;

  stdout.put
  (
    "i[0]=", i[0], nl,
    "i[1]=", i[1], nl,
    "i[2]=", i[2], nl,
    "i[3]=", i[3], nl
  );

end adrsExpressions;

The program in Example 3-1 will display the four values 0, 1, 2, and 3 as though they were array elements. This is because the value at the address of i is 0. The program declares i using the @nostorage option, so i is the address of the next object in the static section, which just happens to be the value 0 appearing as part of the byte statement. The address expression i[1] tells HLA to fetch the byte appearing at i's address plus 1. This is the value 1, because the byte statement in this program emits the value 1 to the static section immediately after the value 0. Likewise, for i[2] and i[3], this program displays the values 2 and 3.