12.9 Constructors and Object Initialization

If you've tried to get a little ahead of the game and write a program that uses objects prior to this point, you've probably discovered that the program inexplicably crashes whenever you attempt to run it. We've covered a lot of material in this chapter thus far, but you are still missing one crucial piece of information—how to properly initialize objects prior to use. This section will put the final piece into the puzzle and allow you to begin writing programs that use classes.

Consider the following object declaration and code fragment:

var
     bc: tBaseClass;
          .
          .
          .
     bc.mBase();

Remember that variables you declare in the var section are uninitialized at runtime. Therefore, when the program containing these statements gets around to executing bc.mBase, it executes the three-statement sequence you've seen several times already:

     lea( esi, bc );
     mov( [esi], edi );
     call( (type dword [edi+@offset( tBaseClass.mBase )]) );

The problem with this sequence is that it loads EDI with an undefined value, assuming you haven't previously initialized the bc object. Because EDI contains a garbage value, attempting to call a subroutine at address [EDI+@offset(tBaseClass.mBase)] will likely crash the program. Therefore, before using an object, you must initialize the _pVMT_ field with the address of that object's virtual method table. One easy way to do this is with the following statement:

mov( &tBaseClass._VMT_, bc._pVMT_ );

Always remember, before using an object, be sure to initialize the virtual method table pointer for that object.
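As a minimal sketch, the earlier fragment becomes safe once the VMT pointer is stored before the first method call. This assumes that tBaseClass declares a method named mBase, as in the preceding fragment, and that the program emits the class's virtual method table elsewhere (for example, with a VMT( tBaseClass ); declaration in a static or readonly section):

```hla
var
     bc: tBaseClass;
          .
          .
          .
     // Store the address of tBaseClass's VMT into the object first...

     mov( &tBaseClass._VMT_, bc._pVMT_ );

     // ...so the indirect call through the VMT reads a valid table.

     bc.mBase();
```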

Although you must initialize the virtual method table pointer for all objects you use, this may not be the only field you need to initialize in those objects. Each specific class may have its own application-specific initialization. Although the initialization may vary by class, you need to perform the same initialization on each object of a specific class that you use. If you ever create more than a single object from a given class, it is probably a good idea to create a procedure to do this initialization for you. This is such a common operation that object-oriented programmers have given these initialization procedures a special name: constructors.

Some object-oriented languages (e.g., C++) use a special syntax to declare a constructor. Others (e.g., Delphi) simply use existing procedure declarations to define a constructor. One advantage to employing a special syntax is that the language knows when you define a constructor and can automatically generate code to call that constructor for you (whenever you declare an object). Languages like Delphi require that you explicitly call the constructor; this can be a minor inconvenience and a source of defects in your programs. HLA does not use a special syntax to declare constructors: you define constructors using standard class procedures. Thus, you will need to explicitly call the constructors in your program; however, you'll see an easy method for automating this in 12.11 HLA's _initialize_ and _finalize_ Strings.

Perhaps the most important fact you must remember is that constructors must be class procedures. You must not define constructors as methods. The reason is quite simple: one of the tasks of the constructor is to initialize the pointer to the virtual method table, and you cannot call a class method or iterator until after you've initialized the VMT pointer. Because class procedures don't use the virtual method table, you can call a class procedure prior to initializing the VMT pointer for an object.

By convention, HLA programmers use the name create for the class constructor. There is no requirement that you use this name, but by doing so you will make your programs easier to read and follow by other programmers.

As you may recall, you can call a class procedure via an object reference or a class reference. For example, if clsProc is a class procedure of class tClass and Obj is an object of type tClass, then the following two class procedure invocations are both legal.

     tClass.clsProc();
     Obj.clsProc();

There is a big difference between these two calls. The first one calls clsProc with ESI containing 0 (NULL), while the second invocation loads the address of Obj into ESI before the call. We can use this fact to determine, within a class procedure, which calling mechanism the caller used.

As it turns out, most programs allocate objects dynamically using mem.alloc and refer to those objects indirectly using pointers. This adds one more step to the initialization process—allocating storage for the object. The constructor is the perfect place to allocate this storage. Because you probably won't need to allocate all objects dynamically, you'll need two types of constructors: one that allocates storage and then initializes the object, and another that simply initializes an object that already has storage.

Another constructor convention is to merge these two constructors into a single constructor and differentiate the type of constructor call by the value in ESI. On entry into the class's create procedure, the program checks the value in ESI to see if it contains NULL (0). If so, the constructor calls mem.alloc to allocate storage for the object and returns a pointer to the object in ESI. If ESI does not contain NULL upon entry into the procedure, then the constructor assumes that ESI points at a valid object and skips over the memory allocation statements. At the very least, a constructor initializes the pointer to the virtual method table; therefore, the minimalist constructor will look like the following:

procedure tBaseClass.create; @nodisplay;
begin create;

  if( esi = 0 ) then

      push( eax );      // mem.alloc returns its result in eax, so preserve eax.
      mem.alloc( @size( tBaseClass ));
      mov( eax, esi );  // Put pointer into esi.
      pop( eax );

  endif;

  // Initialize the pointer to the VMT:
  // Remember, "this" is shorthand for "(type tBaseClass [esi])".

  mov( &tBaseClass._VMT_, this._pVMT_ );

  // Other class initialization would go here.

end create;

After you write a constructor like the preceding, you choose an appropriate calling mechanism based on whether your object's storage is already allocated. For preallocated objects (such as those you've declared in var, static, or storage sections[138] or those you've previously allocated storage for via mem.alloc), you simply load the address of the object into ESI and call the constructor. For those objects you declare as a variable, this is very easy; just call the appropriate create constructor:

var
     bc0: tBaseClass;
     bcp: pointer to tBaseClass;
          .
          .
          .
     bc0.create();  // Initializes preallocated bc0 object.
          .
          .
          .
     // Allocate storage for bcp object.

     mem.alloc( @size( tBaseClass ));
     mov( eax, bcp );
          .
          .
          .
     bcp.create();  // Initializes preallocated bcp object.

Note that although bcp is a pointer to a tBaseClass object, the create procedure does not automatically allocate storage for this object. The program already allocated the storage earlier. Therefore, when the program calls bcp.create, it loads ESI with the address contained within bcp; because this is not NULL, the tBaseClass.create procedure does not allocate storage for a new object. By the way, the call to bcp.create emits the following sequence of machine instructions:

     mov( bcp, esi );
     call tBaseClass.create;

Until now, the code examples for a class procedure call always began with an lea instruction. This is because all the examples to this point have used object variables rather than pointers to object variables. Remember, a class procedure (method) call passes the address of the object in the ESI register. For object variables HLA emits an lea instruction to obtain this address. For pointers to objects, however, the actual object address is the value of the pointer variable; therefore, to load the address of the object into ESI, HLA emits a mov instruction that copies the value of the pointer into the ESI register.

In the preceding example, the program preallocates the storage for an object prior to calling the object constructor. While there are several reasons for preallocating object storage (for example, you're creating a dynamic array of objects), you can achieve most simple object allocations like the one above by calling a standard create procedure (such as one that allocates storage for an object if ESI contains NULL). The following example demonstrates this:

var
     bcp2: pointer to tBaseClass;
          .
          .
          .
   tBaseClass.create(); // Calls create with esi=NULL.
   mov( esi, bcp2 );    // Save pointer to new class object in bcp2.

Remember, a call to a tBaseClass.create constructor returns a pointer to the new object in the ESI register. It is the caller's responsibility to save the pointer this function returns into the appropriate pointer variable; the constructor does not automatically do this for you. Likewise, it is the caller's responsibility to free the storage associated with this object when the application has finished using the object (see the discussion of destructors in 12.10 Destructors).
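A rough sketch of the caller's side of that responsibility follows. It assumes the object owns nothing beyond the block that create obtained from mem.alloc, so a plain call to mem.free is enough; a class that holds other resources would need a destructor instead.

```hla
var
     bcp2: pointer to tBaseClass;
          .
          .
          .
     tBaseClass.create();  // esi = NULL on entry, so create allocates storage.
     mov( esi, bcp2 );     // The caller saves the returned pointer.
          .
          .                // Use the object through bcp2 here.
          .
     mem.free( bcp2 );     // The caller releases the storage when finished.
     mov( 0, bcp2 );       // Optional: clear the now-dangling pointer.
```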

Constructors for derived (child) classes that inherit fields from a base class represent a special case. Each class must have its own constructor but needs the ability to call the base class constructor. This section explains the reasons for this and how to do it.

A derived class inherits the create procedure from its base class. However, you must override this procedure in a derived class because the derived class probably requires more storage than the base class, and therefore you will probably need to use a different call to mem.alloc to allocate storage for a dynamic object. Hence, it is very unusual for a derived class not to override the definition of the create procedure.

However, overriding a base class's create procedure has problems of its own. When you override the base class's create procedure, you take the full responsibility of initializing the (entire) object, including all the initialization required by the base class. At the very least, this involves putting duplicate code in the overridden procedure to handle the initialization usually done by the base class constructor. In addition to making your program larger (by duplicating code already present in the base class constructor), this also violates information-hiding principles because the derived class must be aware of all the fields in the base class (including those that are logically private to the base class). What we need here is the ability to call a base class's constructor from within the derived class's constructor and let that call do the lower-level initialization of the base class's fields. Fortunately, this is an easy thing to do in HLA.

Consider the following class declarations (which do things the hard way):

type
     tBase: class
          var
               i:uns32;
               j:int32;

          procedure create(); @returns( "esi" );
     endclass;

     tDerived: class inherits( tBase );
          var
               r: real64;
          override procedure create(); @returns( "esi" );
     endclass;

     procedure tBase.create; @nodisplay;
     begin create;

          if( esi = 0 ) then

               push( eax );
               mov( mem.alloc( @size( tBase )), esi );
               pop( eax );

          endif;
          mov( &tBase._VMT_, this._pVMT_ );
          mov( 0, this.i );
          mov( -1, this.j );

     end create;

     procedure tDerived.create; @nodisplay;
     begin create;

          if( esi = 0 ) then

               push( eax );
               mov( mem.alloc( @size( tDerived )), esi );
               pop( eax );

          endif;

          // Initialize the VMT pointer for this object:

          mov( &tDerived._VMT_, this._pVMT_ );

          // Initialize the "r" field of this particular object:

          fldz();
          fstp( this.r );

          // Duplicate the initialization required by tBase.create:

          mov( 0, this.i );
          mov( -1, this.j );

     end create;

Let's take a closer look at the tDerived.create procedure above. Like a conventional constructor, it begins by checking ESI and allocates storage for a new object if ESI contains NULL. Note that the size of a tDerived object includes the size required by the inherited fields, so this properly allocates the necessary storage for all fields in a tDerived object.

Next, the tDerived.create procedure initializes the VMT pointer field of the object. Remember, each class has its own virtual method table and, specifically, derived classes do not use the virtual method table of their base class. Therefore, this constructor must initialize the _pVMT_ field with the address of the tDerived virtual method table.

After initializing the virtual method table pointer, the tDerived constructor initializes the value of the r field to 0.0 (remember, fldz loads 0 onto the FPU stack). This concludes the tDerived-specific initialization.

The remaining instructions in tDerived.create are the problem. These statements duplicate some of the code appearing in the tBase.create procedure. The problem with code duplication becomes apparent when you decide to modify the initial values of these fields; if you've duplicated the initialization code in derived classes, you will need to change the initialization code in more than one create procedure. More often than not, one of those copies gets overlooked, which results in defects in the derived class create procedures, especially if those derived classes appear in different source files than the base class.

Another problem with burying base class initialization in derived class constructors is the violation of the information-hiding principle. Some fields of the base class may be logically private. Although HLA does not explicitly support the concept of public and private fields in a class (as, say, C++ does), well-disciplined programmers will still partition the fields as private or public and then use the private fields only in class routines belonging to that class. Initializing these private fields in derived classes is not acceptable to such programmers. Doing so will make it very difficult to change the definition and implementation of some base class at a later date.

Fortunately, HLA provides an easy mechanism for calling the inherited constructor within a derived class's constructor. All you have to do is call the base class's create procedure from within the derived class's constructor, taking care to keep the object pointer that is already in ESI; for example, tDerived.create can invoke tBase.create on the object that ESI references. By calling the base class constructor, your derived class constructors can initialize the base class fields without worrying about the exact implementation (or initial values) of the base class.

Unfortunately, there are two types of initialization that every (conventional) constructor does that will affect the way you call a base class constructor: All conventional constructors allocate memory for the class if ESI contains 0, and all conventional constructors initialize the VMT pointer. Fortunately, it is very easy to deal with these two problems.

The memory required by an object of some base class is usually less than the memory required for an object of a class you derive from that base class (because the derived classes usually add more fields). Therefore, you cannot allow the base class constructor to allocate the storage when you call it from inside the derived class's constructor. You can easily solve this problem by checking ESI within the derived class constructor and allocating any necessary storage for the object before calling the base class constructor.

The second problem is the initialization of the VMT pointer. When you call the base class's constructor, it will initialize the VMT pointer with the address of the base class's virtual method table. A derived class object's _pVMT_ field, however, must point at the virtual method table for the derived class. Calling the base class constructor will always initialize the _pVMT_ field with the wrong pointer. To properly initialize the _pVMT_ field with the appropriate value, the derived class constructor must store the address of the derived class's virtual method table into the _pVMT_ field after the call to the base class constructor (so that it overwrites the value written by the base class constructor).

The tDerived.create constructor, rewritten to call the tBase.create constructor, follows:

     procedure tDerived.create; @nodisplay;
     begin create;

        if( esi = 0 ) then

             push( eax );
             mov( mem.alloc( @size( tDerived )), esi );
             pop( eax );

        endif;

        // Call the base class constructor to do any initialization
        // needed by the base class. Note that this call must follow
        // the object allocation code above (so esi will always contain
        // a pointer to an object at this point and tBase.create will
        // never allocate storage).

        (type tBase [esi]).create();

        // Initialize the VMT pointer for this object. This code
        // must always follow the call to the base class constructor
        // because the base class constructor also initializes this
        // field and we don't want the initial value supplied by
        // tBase.create.

        mov( &tDerived._VMT_, this._pVMT_ );

        // Initialize the "r" field of this particular object:

        fldz();
        fstp( this.r );

     end create;

This solution solves all the above concerns with derived class constructors. Note that the call to the base constructor uses the syntax (type tBase [esi]).create(); rather than tBase.create();. The problem with calling tBase.create directly is that it will load NULL into ESI and overwrite the pointer to the storage allocated in tDerived.create. The scheme above uses the existing value in ESI when calling tBase.create.

None of the constructor examples to this point have had any parameters. However, there is nothing special about constructors that prevents the use of parameters. Constructors are procedures; therefore, you can specify any number and any type of parameters you choose. You can use these parameter values to initialize certain fields or control how the constructor initializes the fields. Of course, you may use constructor parameters for any purpose you'd use parameters for in any other procedure. In fact, about the only issue you need concern yourself with is the use of parameters whenever you have a derived class. This section deals with those issues.
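For instance, a base class constructor might take the initial field values as parameters. The sketch below is illustrative only: the class tPoint, its fields x and y, and the parameters initX and initY are hypothetical names, and it assumes that the compiler accepts the parameter list repeated on the procedure body and that the program emits tPoint's virtual method table elsewhere.

```hla
type
     tPoint: class
          var
               x: int32;
               y: int32;

          procedure create( initX:int32; initY:int32 ); @returns( "esi" );
     endclass;

     procedure tPoint.create( initX:int32; initY:int32 ); @nodisplay;
     begin create;

          // Allocate storage only when called through the class name.

          if( esi = 0 ) then

               push( eax );
               mov( mem.alloc( @size( tPoint )), esi );
               pop( eax );

          endif;
          mov( &tPoint._VMT_, this._pVMT_ );

          // Copy the parameters into the fields by way of eax,
          // because the x86 has no memory-to-memory mov.

          push( eax );
          mov( initX, eax );
          mov( eax, this.x );
          mov( initY, eax );
          mov( eax, this.y );
          pop( eax );

     end create;
```

With this definition, a call such as tPoint.create( 10, 20 ); would allocate and initialize a new object, leaving its address in ESI, while pt.create( 10, 20 ); would initialize an existing tPoint variable named pt.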

The first, and probably most important, problem with parameters in derived class constructors actually applies to all overridden procedures and methods: The parameter list of an overridden routine must exactly match the parameter list of the corresponding routine in the base class. In fact, HLA doesn't even give you the chance to violate this rule because override routine prototypes don't allow parameter list declarations: They automatically inherit the parameter list of the base routine. Therefore, you cannot use a special parameter list in the constructor prototype for one class and a different parameter list for the constructors appearing in base or derived classes. Sometimes it would be nice if this weren't the case, but there are some sound and logical reasons why HLA does not support this.[139]

HLA supports a special overloads declaration that lets you call one of several different procedures, methods, or iterators using a single identifier (with the number and types of parameters specifying which function to call). This would allow you, for example, to create multiple constructors for a given class (or derived class) and invoke the desired constructor using a matching parameter list for that constructor. Interested readers should consult the chapter on procedures in the HLA documentation for more details concerning the overloads declaration.



[138] You generally do not declare objects in readonly sections because you cannot initialize them.

[139] Calling virtual methods and iterators would be a real problem because you don't really know which routine a pointer references. Therefore, you couldn't know the proper parameter list. While the problems with procedures aren't quite as drastic, there are some subtle problems that could creep into your code if base or derived classes allowed overridden procedures with different parameter lists.