To write fast programs, you need to align data objects properly in memory. Proper alignment means that an object's starting address is a multiple of some size. For objects up to 16 bytes long whose size is a power of 2, that size is usually the object's own size. For objects smaller than 16 bytes whose size is not a power of 2, aligning the object at an address that is a multiple of the next power of 2 greater than its size is usually fine. For objects larger than 16 bytes, aligning the object on an 8-byte or 16-byte address boundary is probably sufficient. Accessing data that is not aligned at an appropriate address may take extra time, so if you want your program to run as rapidly as possible, you should try to align data objects according to their size.
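The rule of thumb above can be expressed as a small amount of arithmetic. The following C sketch is only an illustration, not part of HLA: the helper names `is_aligned` and `suggested_alignment` are hypothetical, and the 16-byte cap simply mirrors the guideline in the text rather than any hardware requirement.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if addr is a multiple of alignment, 0 otherwise.
   alignment must be a nonzero power of 2 (hypothetical helper). */
static int is_aligned(uintptr_t addr, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (addr & (alignment - 1)) == 0;
}

/* Smallest power of 2 that is >= size, capped at 16 bytes.
   The cap is an assumption taken from the guideline in the text. */
static size_t suggested_alignment(size_t size)
{
    size_t a = 1;
    while (a < size && a < 16)
        a <<= 1;
    return a;
}
```

Under these assumptions, a 4-byte object wants an address that is a multiple of 4, a 3-byte object is also rounded up to a 4-byte boundary, and anything larger than 16 bytes settles for a 16-byte boundary.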
Consider the following HLA variable declarations:
static
    dw:   dword;
    b:    byte;
    w:    word;
    dw2:  dword;
    w2:   word;
    b2:   byte;
    dw3:  dword;
The first static declaration in a program (running under Windows, Mac OS X, FreeBSD, Linux, and most 32-bit operating systems) places its variables at an address that is a multiple of 4,096 bytes. Whatever variable first appears in the static declaration is therefore guaranteed to be aligned on a reasonable address. Each successive variable is allocated at the starting address of that static section plus the sum of the sizes of all the preceding variables. Therefore, assuming HLA allocates the variables in the previous example at a starting address of 4096, HLA will allocate them at the following addresses:
                    // Start Adrs   Length
    dw:   dword;    //    4096        4
    b:    byte;     //    4100        1
    w:    word;     //    4101        2
    dw2:  dword;    //    4103        4
    w2:   word;     //    4107        2
    b2:   byte;     //    4109        1
    dw3:  dword;    //    4110        4
With the exception of the first variable (which is aligned on a 4KB boundary) and the byte variables (whose alignment doesn't matter), all of these variables are misaligned. The w, w2, and dw2 variables start at odd addresses, and the dw3 variable is aligned on an even address that is not a multiple of 4.
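The address assignment described above is just a running sum, which the following C sketch reproduces. It is a model for illustration, not HLA's actual allocator; the function name `layout_packed` and its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assigns consecutive addresses starting at `start`, with no padding,
   and stores them in addrs[]. Returns how many variables land on an
   address that is not a multiple of their own size. */
static size_t layout_packed(unsigned start, const unsigned *sizes,
                            size_t count, unsigned *addrs)
{
    unsigned addr = start;
    size_t misaligned = 0;

    for (size_t i = 0; i < count; i++) {
        assert(sizes[i] != 0);
        addrs[i] = addr;
        if (addr % sizes[i] != 0)
            misaligned++;
        addr += sizes[i];   /* next variable follows immediately */
    }
    return misaligned;
}
```

Called with a start address of 4096 and the sizes 4, 1, 2, 4, 2, 1, 4, this model yields the addresses in the listing above and counts four misaligned variables: w, dw2, w2, and dw3.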
An easy way to guarantee that your variables are aligned properly is to put all the double-word variables first, the word variables second, and the byte variables last in the declaration, as shown here:
static
    dw:   dword;
    dw2:  dword;
    dw3:  dword;
    w:    word;
    w2:   word;
    b:    byte;
    b2:   byte;
This organization produces the following addresses in memory:
                    // Start Adrs   Length
    dw:   dword;    //    4096        4
    dw2:  dword;    //    4100        4
    dw3:  dword;    //    4104        4
    w:    word;     //    4108        2
    w2:   word;     //    4110        2
    b:    byte;     //    4112        1
    b2:   byte;     //    4113        1
As you can see, these variables are all aligned at reasonable addresses.
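The largest-first ordering can be checked with a short C sketch. This is an illustration under stated assumptions (every size is a power of 2 and the section start is itself suitably aligned); the names `by_size_descending` and `sorted_layout_is_aligned` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* qsort comparator: larger sizes sort first. */
static int by_size_descending(const void *a, const void *b)
{
    unsigned x = *(const unsigned *)a;
    unsigned y = *(const unsigned *)b;
    return (x < y) - (x > y);
}

/* Sorts sizes[] largest-first, then packs the variables from `start`
   with no padding. Returns 1 if every variable lands on a multiple of
   its own size, 0 otherwise. */
static int sorted_layout_is_aligned(unsigned start, unsigned *sizes,
                                    size_t count)
{
    unsigned addr = start;

    qsort(sizes, count, sizeof sizes[0], by_size_descending);
    for (size_t i = 0; i < count; i++) {
        assert(sizes[i] != 0);
        if (addr % sizes[i] != 0)
            return 0;
        addr += sizes[i];
    }
    return 1;
}
```

The ordering works because, with power-of-2 sizes, each variable's size divides the size of every variable before it, so the running address stays a multiple of the current size.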
Unfortunately, it is rarely possible for you to arrange your variables in this manner. While there are many technical reasons that make this alignment impossible, a good practical reason for not doing this is that it doesn't let you organize your variable declarations by logical function (that is, you probably want to keep related variables next to one another regardless of their size).
To resolve this problem, HLA provides the align directive. The align directive uses the following syntax:
    align( integer_constant );
The integer constant must be one of the following small unsigned integer values: 1, 2, 4, 8, or 16. If HLA encounters the align directive in a static section, it will align the very next variable on an address that is a multiple of the specified alignment constant. The previous example could be rewritten, using the align directive, as follows:
static
    align( 4 );
    dw:   dword;
    b:    byte;
    align( 2 );
    w:    word;
    align( 4 );
    dw2:  dword;
    w2:   word;
    b2:   byte;
    align( 4 );
    dw3:  dword;
If you're wondering how the align directive works, it's really quite simple. If HLA determines that the current address (location counter value) is not a multiple of the specified value, HLA will quietly emit extra bytes of padding after the previous variable declaration until the current address in the static section is a multiple of the specified value. This makes your program slightly larger (by a few bytes) in exchange for faster access to your data. Given that your program will grow by only a few bytes when you use this feature, this is probably a good trade-off.
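The padding computation described above amounts to rounding the location counter up to the next multiple of the alignment value. The following C sketch models that arithmetic; it is not HLA's implementation, and the names `align_up` and `padding_for` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Rounds the location counter up to the next multiple of `alignment`.
   alignment must be a nonzero power of 2 (1, 2, 4, 8, or 16 here). */
static unsigned align_up(unsigned location, unsigned alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (location + alignment - 1) & ~(alignment - 1);
}

/* Number of padding bytes needed to reach that boundary. */
static unsigned padding_for(unsigned location, unsigned alignment)
{
    return align_up(location, alignment) - location;
}
```

Applied to the rewritten declaration with a section start of 4096, the location counter is 4101 after b, so align( 2 ) adds one padding byte and places w at 4102. The counter is already at 4104 before dw2, so that align( 4 ) adds nothing. After b2 the counter is 4111, so the final align( 4 ) adds one byte and places dw3 at 4112. The total cost in this model is two bytes of padding.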