5.14 The Standard Entry Sequence

The caller of a procedure is responsible for pushing the parameters onto the stack. Of course, the call instruction pushes the return address onto the stack. It is the procedure's responsibility to construct the rest of the activation record. You can accomplish this by using the following "standard entry sequence" code:

push( ebp );         // Save a copy of the old ebp value.
mov( esp, ebp );     // Get pointer to base of activation record into ebp.
sub( NumVars, esp ); // Allocate storage for local variables.

If the procedure doesn't have any local variables, the third instruction above (sub( NumVars, esp );) isn't necessary. NumVars represents the number of bytes of local variables the procedure needs. This constant should be a multiple of 4 so that the ESP register remains aligned on a double-word boundary. If the number of bytes of local variables is not a multiple of 4, round the value up to the next higher multiple of 4 before subtracting it from ESP. Doing so slightly increases the amount of storage the procedure uses for local variables but does not otherwise affect the operation of the procedure.
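The following is a minimal C sketch of the register arithmetic that the three-instruction entry sequence performs, including rounding NumVars up to a multiple of 4. It is a model, not HLA code. The Regs type, round_up4, and standard_entry are hypothetical names. No memory is actually written: push( ebp ) is modeled only as the 4-byte decrement of ESP.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the two registers the entry sequence touches.
   Only address arithmetic is modeled; no memory is read or written. */
typedef struct {
    uint32_t esp;
    uint32_t ebp;
} Regs;

/* Round a local-variable byte count up to the next multiple of 4. */
static uint32_t round_up4(uint32_t n) {
    assert(n <= UINT32_MAX - 3u);      /* avoid wrap-around in n + 3 */
    return (n + 3u) & ~(uint32_t)3u;
}

/* Mirrors:  push( ebp );  mov( esp, ebp );  sub( NumVars, esp );
   where NumVars is local_bytes rounded up to a multiple of 4. */
static void standard_entry(Regs *r, uint32_t local_bytes) {
    r->esp -= 4u;                      /* push( ebp ): ESP drops one dword;
                                          the old EBP would live at [ESP] */
    r->ebp = r->esp;                   /* mov( esp, ebp ) */
    r->esp -= round_up4(local_bytes);  /* sub( NumVars, esp ) */
}
```

For example, with an arbitrary aligned entry ESP of 0x0012FF80 and 10 bytes of locals, standard_entry leaves EBP = 0x0012FF7C and ESP = 0x0012FF70. The 10 bytes round up to 12.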

Warning

If the NumVars constant is not a multiple of 4, subtracting this value from ESP (which, presumably, contains a double-word-aligned pointer) will virtually guarantee that all subsequent stack accesses are misaligned, because the program almost always pushes and pops double-word values. This seriously degrades the program's performance. Worse still, many OS API calls will fail if the stack is not double-word aligned upon entry into the operating system. Therefore, you must always ensure that your local variable allocation value is a multiple of 4.
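The following hypothetical C model shows why one bad subtraction affects every later access. It subtracts NumVars from an aligned ESP and then counts how many of the following dword pushes land on a misaligned address. The function names and the starting address are illustrative assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: nonzero if addr is dword (4-byte) aligned. */
static int dword_aligned(uint32_t addr) {
    return (addr & 3u) == 0u;
}

/* Start at esp, subtract num_vars, then perform `pushes` dword pushes.
   Returns how many of those pushes store to a misaligned address. */
static int misaligned_pushes(uint32_t esp, uint32_t num_vars, int pushes) {
    int bad = 0;
    assert(pushes >= 0);
    esp -= num_vars;                 /* sub( NumVars, esp ) */
    for (int i = 0; i < pushes; ++i) {
        esp -= 4u;                   /* each push stores 4 bytes at new ESP */
        if (!dword_aligned(esp)) {
            ++bad;
        }
    }
    return bad;
}
```

With a starting ESP of 0x0012FF80, subtracting 8 leaves all subsequent pushes aligned. Subtracting 6 leaves every one of them 2 bytes off a dword boundary, because each push moves ESP by exactly 4 and so never changes ESP mod 4.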

Because of the problems with a misaligned stack, by default HLA will also emit a fourth instruction as part of the standard entry sequence. The HLA compiler actually emits the following standard entry sequence for the ARDemo procedure defined earlier:

push( ebp );
mov( esp, ebp );
sub( 12, esp );          // Make room for ARDemo's local variables.
and( $FFFF_FFFC, esp );  // Force dword stack alignment.

The and instruction at the end of this sequence forces the stack to be aligned on a 4-byte boundary. It clears the two low-order bits of ESP, which reduces the stack pointer by 1, 2, or 3 if ESP is not already a multiple of 4. Because the stack grows downward, this simply allocates up to 3 extra bytes below the local variables. It never overlaps them, and EBP still points at the saved EBP value. Although the ARDemo entry code correctly subtracts 12 from ESP for the local variables (12 is both a multiple of 4 and the number of bytes of local variables), this leaves ESP double-word aligned only if it was double-word aligned immediately upon entry into the procedure. Had the caller messed with the stack and left ESP containing a value that was not a multiple of 4, subtracting 12 from ESP would leave ESP containing an unaligned value. The and instruction, however, guarantees that ESP is dword aligned regardless of its value upon entry into the procedure. The few bytes and CPU cycles this instruction costs pay off handsomely whenever ESP is not double-word aligned.
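The effect of and( $FFFF_FFFC, esp ) can be modeled in C as a bit mask. In this sketch, align_esp and bytes_dropped are hypothetical names. The sketch shows that clearing the low two bits rounds ESP down by 0 to 3 bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Models and( $FFFF_FFFC, esp ): clears the two low-order bits of ESP,
   rounding it DOWN to the nearest multiple of 4. */
static uint32_t align_esp(uint32_t esp) {
    return esp & 0xFFFFFFFCu;
}

/* How many bytes the and instruction removes from ESP (always 0..3). */
static uint32_t bytes_dropped(uint32_t esp) {
    uint32_t dropped = esp - align_esp(esp);
    assert(dropped <= 3u);
    return dropped;
}
```

Here is an ARDemo-style example with an assumed misaligned entry ESP of 0x0012FF7E. The push( ebp ) and sub( 12, esp ) leave ESP at 0x0012FF6E, and align_esp then yields the aligned address 0x0012FF6C.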

Although it is always safe to execute the and instruction in the standard entry sequence, it might not be necessary. If you always ensure that ESP contains a double-word-aligned value, the and instruction is redundant. Therefore, if you've specified the @noframe procedure option (so that you write the entry sequence yourself), you don't have to include that instruction in your entry sequence, provided the stack is guaranteed to be aligned on entry.

If you haven't specified the @noframe option (that is, you're letting HLA emit the instructions to construct the standard entry sequence for you), you can still tell HLA not to emit the extra and instruction if you're sure the stack will be double-word aligned whenever someone calls the procedure. To do this, use the @noalignstack procedure option. For example:

procedure NASDemo( i:uns32; j:int32; k:dword ); @noalignstack;
var
     LocalVar:int32;
begin NASDemo;
     .
     .
     .
end NASDemo;

HLA emits the following entry sequence for the procedure above:

push( ebp );
mov( esp, ebp );
sub( 4, esp );           // Make room for LocalVar; no and instruction.
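The following minimal C model uses hypothetical function names and covers register arithmetic only. It compares the ESP value produced by the @noalignstack sequence above with the one produced by the default sequence HLA would emit for NASDemo. Per the description earlier in this section, the default sequence adds the and instruction. When ESP is aligned on entry, the two agree, which is why the and is redundant there. When ESP is not aligned, only the default sequence restores alignment.

```c
#include <assert.h>
#include <stdint.h>

/* ESP after the default entry sequence for NASDemo (assumed):
   push( ebp ); mov( esp, ebp ); sub( 4, esp ); and( $FFFF_FFFC, esp ); */
static uint32_t esp_after_default(uint32_t entry_esp) {
    return ((entry_esp - 4u) - 4u) & 0xFFFFFFFCu;
}

/* ESP after the @noalignstack sequence:
   push( ebp ); mov( esp, ebp ); sub( 4, esp ); */
static uint32_t esp_after_noalignstack(uint32_t entry_esp) {
    return (entry_esp - 4u) - 4u;
}
```

For an entry ESP of 0x0012FF80, both functions return 0x0012FF78. For 0x0012FF7E, the @noalignstack version returns the misaligned 0x0012FF76, while the default version returns 0x0012FF74.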