In a high-level object-oriented language like C++ or Delphi, it is quite possible to master the use of objects without really understanding how the machine implements them. One of the reasons for learning assembly language programming is to fully comprehend low-level implementation details so you can make educated decisions concerning the use of programming constructs like objects. Further, because assembly language allows you to poke around with data structures at a very low level, knowing how HLA implements objects can help you create certain algorithms that would not be possible without a detailed knowledge of object implementation. Therefore, this section and its corresponding subsections explain the low-level implementation details you will need to know in order to write object-oriented HLA programs.
HLA implements objects in a manner quite similar to records. In particular, HLA allocates storage for all var objects in a class in a sequential fashion, just like records. Indeed, if a class consists of only var data fields, the memory representation of that class is nearly identical to that of a corresponding record declaration. Consider the student record declaration taken from Chapter 4 and the corresponding class (see Figure 12-1 and Figure 12-2, respectively).
type
    student:
        record
            Name:     char[65];
            Major:    int16;
            SSN:      char[12];
            Midterm1: int16;
            Midterm2: int16;
            Final:    int16;
            Homework: int16;
            Projects: int16;
        endrecord;

    student2:
        class
            var
                Name:     char[65];
                Major:    int16;
                SSN:      char[12];
                Midterm1: int16;
                Midterm2: int16;
                Final:    int16;
                Homework: int16;
                Projects: int16;
        endclass;
If you look carefully at Figure 12-1 and Figure 12-2, you'll discover that the only difference between the class and the record implementations is the inclusion of the VMT (virtual method table) pointer field at the beginning of the class object. This field, which is always present in a class, contains the address of the class's virtual method table that, in turn, contains the addresses of all the class's methods and iterators. The VMT field, by the way, is present even if a class doesn't contain any methods or iterators.
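To make the comparison concrete, here is a small C model of the two layouts. It is an analogy, not HLA output: it assumes HLA lays out record and class fields with no alignment padding (modeled with `#pragma pack`, a compiler extension supported by GCC, Clang, and MSVC), and it represents the VMT pointer as a native C pointer, which is 8 bytes on a 64-bit host rather than the 4-byte double word HLA uses on 32-bit x86. The names `student_t`, `student2_t`, `pVMT`, and `field_shift` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#pragma pack(push, 1)   /* no padding, mirroring HLA's record layout (assumption) */

/* The Chapter 4 student record: fields laid out back to back. */
typedef struct {
    char    Name[65];
    int16_t Major;
    char    SSN[12];
    int16_t Midterm1;
    int16_t Midterm2;
    int16_t Final;
    int16_t Homework;
    int16_t Projects;
} student_t;

/* The student2 class: the same var fields, preceded by the hidden VMT pointer. */
typedef struct {
    const void *pVMT;   /* always at offset 0 in a class object */
    char    Name[65];
    int16_t Major;
    char    SSN[12];
    int16_t Midterm1;
    int16_t Midterm2;
    int16_t Final;
    int16_t Homework;
    int16_t Projects;
} student2_t;

#pragma pack(pop)

/* How far every data field moves when the record becomes a class. */
size_t field_shift(void) {
    return offsetof(student2_t, Major) - offsetof(student_t, Major);
}
```

In this model `field_shift()` returns `sizeof(void *)`: every data field slides down by exactly the size of the VMT pointer, and nothing else about the layout changes.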
As pointed out in previous sections, HLA does not allocate storage for static objects within the object. Instead, HLA allocates a single instance of each static data field that all objects share. As an example, consider the following class and object declarations:
type
    tHasStatic:
        class
            var
                i: int32;
                j: int32;
                r: real32;

            static
                c: char[2];
                b: byte;
        endclass;

var
    hs1: tHasStatic;
    hs2: tHasStatic;
Figure 12-3 shows the storage allocation for these two objects in memory.
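A rough C analogy for Figure 12-3 follows: the var fields become members of a per-object struct, while the static fields become a single file-scope instance that every object shares. The names `tHasStatic_c`, `tHasStatic_b`, `field_i`, and `field_c` are hypothetical, and the VMT pointer is again modeled as a native pointer.

```c
#include <stdint.h>

/* Per-object storage: the VMT pointer plus the var fields i, j, and r. */
typedef struct {
    const void *pVMT;
    int32_t     i;
    int32_t     j;
    float       r;      /* real32 */
} tHasStatic;

/* The static fields: one instance, shared by every tHasStatic object. */
char    tHasStatic_c[2];
uint8_t tHasStatic_b;

tHasStatic hs1, hs2;

/* Each object has its own copy of i... */
int32_t *field_i(tHasStatic *obj) { return &obj->i; }

/* ...but c resolves to the same storage whichever object you start from. */
char *field_c(tHasStatic *obj) { (void)obj; return tHasStatic_c; }
```

Here `field_i(&hs1)` and `field_i(&hs2)` yield different addresses, while `field_c(&hs1)` and `field_c(&hs2)` yield the same one, which is exactly the sharing the figure depicts.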
Of course, const, val, and #macro objects do not have any runtime memory requirements associated with them, so HLA does not allocate any storage for these fields. Like the static data fields, you may access const, val, and #macro fields using the class name as well as an object name. Hence, even if tHasStatic has these types of fields, the memory organization for tHasStatic objects would still be the same as shown in Figure 12-3.
Other than requiring the virtual method table (VMT) pointer, methods and procedures have no impact on the storage allocation of an object. Of course, the machine instructions associated with these routines do appear somewhere in memory. So in a sense the code for the routines is quite similar to static data fields insofar as all the objects share a single instance of the routine.
When HLA calls a class procedure, it directly calls that procedure using a call instruction, just like any normal procedure call. Methods are another story altogether. Each object in the system carries a pointer to a virtual method table, which is an array of pointers to all the methods and iterators appearing within the object's class (see Figure 12-4).
Each iterator or method you declare in a class has a corresponding entry in the virtual method table. That double-word entry contains the address of the first instruction of that iterator or method. Calling a class method or iterator is a bit more work than calling a class procedure (it requires one additional instruction plus the use of the EDI register). Here is a typical calling sequence for a method:
mov( ObjectAdrs, ESI );           // All class routines do this.
mov( [esi], edi );                // Get the address of the VMT into EDI.
call( (type dword [edi+n]) );     // "n" is the offset of the method's
                                  // entry in the VMT.
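The following C sketch models the contrast between the two kinds of calls. A class procedure is an ordinary function called directly, whereas a method is reached through the object's VMT pointer: passing the object's address corresponds to loading ESI, reading `self->pVMT` corresponds to `mov( [esi], edi )`, and indexing the table corresponds to the indirect call. Here `n` is an entry index, so the matching HLA byte offset would be 4 × `n`. All names (`Counter`, `Counter_VMT`, `call_method`, `counter_reset`) are hypothetical.

```c
typedef struct Counter Counter;

/* Methods receive the object's address, as HLA methods receive it in ESI. */
typedef int (*method_fn)(Counter *self);

struct Counter {
    const method_fn *pVMT;   /* pointer to the class's virtual method table */
    int              count;
};

/* Two methods: entry 0 and entry 1 of the VMT. */
static int incr(Counter *self) { return ++self->count; }
static int get(Counter *self)  { return self->count; }

/* The class's VMT: one entry per method, in declaration order. */
static const method_fn Counter_VMT[] = { incr, get };

/* A class procedure: an ordinary, direct call with no table lookup. */
int counter_reset(Counter *self) { self->count = 0; return 0; }

/* A method call: fetch the VMT pointer, then call through entry n. */
int call_method(Counter *self, int n) {
    const method_fn *vmt = self->pVMT;   /* like mov( [esi], edi ) */
    return vmt[n](self);                 /* like call( (type dword [edi+4*n]) ) */
}

Counter demo = { Counter_VMT, 0 };
```

The extra pointer load in `call_method` is the "one additional instruction" the text mentions; `counter_reset` needs no such load.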
For a given class there is only one copy of the virtual method table in memory. This is a static object, so all objects of a given class type share the same virtual method table. This is reasonable because all objects of the same class type have exactly the same methods and iterators (see Figure 12-5).
Although HLA builds the VMT record structure as it encounters methods and iterators within a class, HLA does not automatically create the virtual method table for you. You must explicitly declare this table in your program. To do this, you include a statement like the following in a static or readonly declaration section of your program. For example:
readonly
    VMT( classname );
Because the addresses in a virtual method table should never change during program execution, the readonly section is probably the best choice for declaring virtual method tables. It should go without saying that changing the pointers in a virtual method table is, in general, a really bad idea. So putting VMTs in a static section is usually not a good idea.
A declaration like the one above defines the variable classname._VMT_. In Section 12.9, "Constructors and Object Initialization," you will see that you need this name when initializing object variables. The class declaration automatically defines the classname._VMT_ symbol as an external static variable. The declaration above just provides the actual definition for this external symbol.
The declaration of a VMT uses a somewhat strange syntax because you aren't actually declaring a new symbol with this declaration; you're simply supplying the data for a symbol that you previously declared implicitly by defining a class. That is, the class declaration defines the static table variable classname._VMT_; all you're doing with the VMT declaration is telling HLA to emit the actual data for the table. If, for some reason, you would like to refer to this table using a name other than classname._VMT_, HLA does allow you to prefix the declaration above with a variable name. For example:
readonly
    myVMT: VMT( classname );
In this declaration, myVMT is an alias of classname._VMT_. As a general rule, you should avoid using aliases in a program because they make the program more difficult to read and understand. Therefore, it is unlikely that you would ever need to use this type of declaration.
As with any other global static variable, there should be only one instance of a virtual method table for a given class in a program. The best place to put the VMT declaration is in the same source file as the class's method, iterator, and procedure code (assuming they all appear in a single file). This way you will automatically link in the virtual method table whenever you link in the routines for a given class.
Up to this point, the discussion of the implementation of class objects has ignored the possibility of inheritance. Inheritance affects the memory representation of an object only by adding fields that are not explicitly stated in the class declaration.
Adding inherited fields from a base class to another class must be done carefully. Remember, an important attribute of a class that inherits fields from a base class is that you can use a pointer to the base class to access the inherited fields from that base class, even if the pointer contains the address of some other class (that inherits the fields from the base class). As an example, consider the following classes:
type
    tBaseClass:
        class
            var
                i: uns32;
                j: uns32;
                r: real32;

            method mBase;
        endclass;

    tChildClassA:
        class inherits( tBaseClass )
            var
                c: char;
                b: boolean;
                w: word;

            method mA;
        endclass;

    tChildClassB:
        class inherits( tBaseClass )
            var
                d: dword;
                c: char;
                a: byte[3];
        endclass;
Because both tChildClassA and tChildClassB inherit the fields of tBaseClass, these two child classes include the i, j, and r fields as well as their own specific fields. Furthermore, whenever you have a pointer variable whose base type is tBaseClass, it is legal to load this pointer with the address of any child class of tBaseClass; therefore, it is perfectly reasonable to load such a pointer with the address of a tChildClassA or tChildClassB variable. For example:
var
    B1:  tBaseClass;
    CA:  tChildClassA;
    CB:  tChildClassB;
    ptr: pointer to tBaseClass;
        .
        .
        .
    lea( ebx, B1 );
    mov( ebx, ptr );
    << Use ptr >>
        .
        .
        .
    lea( eax, CA );
    mov( eax, ptr );
    << Use ptr >>
        .
        .
        .
    lea( eax, CB );
    mov( eax, ptr );
    << Use ptr >>
Because ptr points at an object of type tBaseClass, you may legally (from a semantic sense) access the i, j, and r fields of the object where ptr is pointing. It is not legal to access the c, b, w, or d fields of the tChildClassA or tChildClassB objects because at any given moment the program may not know exactly what object type ptr references.
In order for inheritance to work properly, the i, j, and r fields must appear at the same offsets in all child classes as they do in tBaseClass. This way, an instruction of the form mov( (type tBaseClass [ebx]).i, eax ); will correctly access the i field even if EBX points at an object of type tChildClassA or tChildClassB. Figure 12-6 shows the layout of the child and base classes.
Note that the new fields in the two child classes bear no relation to one another, even if they have the same name (for example, the c fields in the two child classes do not lie at the same offset). Although the two child classes share the fields they inherit from their common base class, any new fields they add are unique and separate. If two fields in different classes are not inherited from a common base class, they share the same offset only by coincidence.
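C has no inherits clause, but the common C idiom of embedding the base struct as the first member gives the same guarantee, because a pointer to a struct may be converted to a pointer to its first member. The sketch below mirrors tBaseClass, tChildClassA, and tChildClassB (the VMT pointer is modeled as `const void *`, the HLA boolean as a byte, and `sum_base` is a hypothetical helper). It reads i, j, and r through a base pointer regardless of which object that pointer holds, and `offsetof` confirms that the two unrelated c fields land at different offsets. Exact offsets depend on the C compiler's padding, which need not match HLA's, so the checks compare offsets rather than hard-coding them.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const void *pVMT;   /* VMT pointer, offset 0 */
    uint32_t    i, j;
    float       r;      /* real32 */
} tBaseClass;

typedef struct {
    tBaseClass base;    /* inherited i, j, r come first, at the base offsets */
    char       c;
    uint8_t    b;       /* models the HLA boolean */
    uint16_t   w;
} tChildClassA;

typedef struct {
    tBaseClass base;
    uint32_t   d;
    char       c;       /* unrelated to tChildClassA's c */
    uint8_t    a[3];
} tChildClassB;

/* Touches only inherited fields, so any object beginning with tBaseClass works. */
float sum_base(const tBaseClass *ptr) {
    return (float)ptr->i + (float)ptr->j + ptr->r;
}

tBaseClass   B1 = { 0, 1, 2, 0.5f };
tChildClassA CA = { { 0, 10, 20, 0.25f }, 'x', 1, 7 };
tChildClassB CB = { { 0, 100, 200, 0.75f }, 42, 'y', { 1, 2, 3 } };
```

Calling `sum_base((const tBaseClass *)&CA)` plays the role of `lea( eax, CA ); mov( eax, ptr );` followed by a use of ptr; like the HLA code, the function never touches c, b, w, or d, because it cannot know which child it was given.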
All classes (even those that aren't related to one another) place the pointer to the virtual method table at offset 0 within the object. There is a single virtual method table associated with each class in a program; even classes that inherit fields from some base class have a virtual method table that is (generally) different from the base class's table. Figure 12-7 shows how objects of type tBaseClass, tChildClassA, and tChildClassB point at their specific virtual method tables.
A virtual method table is nothing more than an array of pointers to the methods and iterators associated with a class. The address of the first method or iterator that appears in a class is at offset 0, the address of the second appears at offset 4, and so on. You can determine the offset value for a given iterator or method by using the @offset function. If you want to call a method directly (using 80x86 syntax rather than HLA's high-level syntax), you could use code like the following:
var
    sc: tBaseClass;
        .
        .
        .
    lea( esi, sc );        // Get the address of the object (& VMT).
    mov( [esi], edi );     // Put address of VMT into EDI.
    call( (type dword [edi+@offset( tBaseClass.mBase )]) );
Of course, if the method has any parameters, you must push them onto the stack before executing the code above. Don't forget that when making direct calls to a method, you must load ESI with the address of the object; any field references within the method will probably depend on ESI containing this address. The choice of EDI to contain the VMT address is nearly arbitrary. Unless you're doing something tricky (like using EDI to obtain runtime type information), you could use any register you please here. As a general rule, though, you should use EDI when simulating class method calls because this is the convention that HLA employs, and most programmers will expect this usage.
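The same sequence can be modeled in C by describing the VMT as a struct of function pointers, so that `offsetof` plays the role of HLA's @offset: the call fetches the VMT pointer from the object, adds the entry's byte offset, and calls through the pointer stored there, passing the object's address the way HLA passes it in ESI. The type and function names are hypothetical, and this is an analogy rather than what HLA emits.

```c
#include <stddef.h>

typedef struct tBaseClass tBaseClass;
typedef int (*mBase_fn)(tBaseClass *self);

/* The VMT's layout: one function pointer per method, in declaration order. */
typedef struct {
    mBase_fn mBase;
} tBaseClass_VMT_t;

struct tBaseClass {
    const tBaseClass_VMT_t *pVMT;   /* VMT pointer at offset 0 */
    int                     i;
};

static int mBase_impl(tBaseClass *self) { return self->i * 2; }

static const tBaseClass_VMT_t tBaseClass_VMT = { mBase_impl };

tBaseClass sc = { &tBaseClass_VMT, 21 };

/* Models:  lea( esi, sc );  mov( [esi], edi );
            call( (type dword [edi+@offset( tBaseClass.mBase )]) );
   with offsetof standing in for @offset. */
int call_mBase_direct(tBaseClass *esi) {
    const char *edi = (const char *)esi->pVMT;
    mBase_fn target = *(const mBase_fn *)(edi + offsetof(tBaseClass_VMT_t, mBase));
    return target(esi);
}
```

With `sc.i` set to 21, `call_mBase_direct(&sc)` dispatches to `mBase_impl` and returns 42, the same result as the high-level call `sc.pVMT->mBase(&sc)`.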
Whenever a child class inherits fields from some base class, the child class's virtual method table also inherits entries from the base class's table. For example, the virtual method table for class tBaseClass contains only a single entry: a pointer to method tBaseClass.mBase. The virtual method table for class tChildClassA contains two entries: pointers to tBaseClass.mBase and tChildClassA.mA. Because tChildClassB doesn't define any new methods or iterators, tChildClassB's virtual method table contains only a single entry, a pointer to the tBaseClass.mBase method. Note that tChildClassB's virtual method table is identical to tBaseClass's table. Nevertheless, HLA produces two distinct virtual method tables. This is a critical fact that we will make use of a little later. Figure 12-8 shows the relationship between these virtual method tables.
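The table relationships in Figure 12-8 can be modeled in C as three separate constant arrays (hypothetical names; the method bodies are empty stubs). The child tables list the inherited entry first, so mBase has the same index in all three, and tChildClassB's table holds the same entry as tBaseClass's table yet is a distinct object with its own address.

```c
typedef void (*vmt_entry)(void *self);

/* Empty stand-ins for the real method bodies. */
static void tBaseClass_mBase(void *self) { (void)self; }
static void tChildClassA_mA(void *self)  { (void)self; }

/* Inherited entries come first, followed by entries for new methods. */
const vmt_entry tBaseClass_VMT[]   = { tBaseClass_mBase };
const vmt_entry tChildClassA_VMT[] = { tBaseClass_mBase, tChildClassA_mA };
const vmt_entry tChildClassB_VMT[] = { tBaseClass_mBase };   /* same contents... */

/* ...but a separate table, so its address differs from tBaseClass_VMT. */
int tables_distinct(void) {
    return (const void *)tChildClassB_VMT != (const void *)tBaseClass_VMT;
}
```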
Although the virtual method table pointer always appears at offset 0 in an object (and, therefore, you can access the pointer using the address expression [ESI] if ESI points at an object), HLA actually inserts a symbol into the symbol table so you may refer to the virtual method table pointer symbolically. The symbol _pVMT_ (pointer to virtual method table) provides this capability. So a more readable way to access the pointer (as in the previous code example) is:
lea( esi, sc );
mov( (type tBaseClass [esi])._pVMT_, edi );
call( (type dword [edi+@offset( tBaseClass.mBase )]) );
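In C, the analogous readability gain comes from naming the field instead of reading raw offset 0. The sketch below uses hypothetical names (the field `pVMT` stands in for _pVMT_, and a dummy array stands in for the class's VMT) and shows that the named member and a raw read of the first pointer-sized slot yield the same address, because the VMT pointer is the struct's first member.

```c
typedef struct {
    const void *pVMT;   /* plays the role of HLA's _pVMT_ */
    int         i;
} tBaseClass;

static const int dummy_vmt[1] = { 0 };   /* stand-in for the class's VMT */

tBaseClass sc = { dummy_vmt, 0 };

/* Like mov( [esi], edi ): read whatever pointer sits at offset 0. */
const void *vmt_by_offset(const tBaseClass *esi) {
    return *(const void *const *)esi;
}

/* Like mov( (type tBaseClass [esi])._pVMT_, edi ): read it by name. */
const void *vmt_by_name(const tBaseClass *esi) {
    return esi->pVMT;
}
```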
If you need to access the virtual method table directly, there are a couple of ways to do this. Whenever you declare a class object, HLA automatically includes a field named _VMT_ as part of that class. _VMT_ is a static array of double-word objects. Therefore, you may refer to the virtual method table using an identifier of the form classname._VMT_. Generally, you shouldn't access the virtual method table directly, but as you'll see shortly, there are some good reasons why you need to know the address of this object in memory.