stdout.put( "This gets converted to a four-component string by HLA" );
HLA doesn't actually work directly with the string data described in the previous section. Instead, when HLA sees a string object, it always works with a pointer to that object rather than working directly with the object. Without question, this is the most important fact to know about HLA strings and is the biggest source of problems beginning HLA programmers have with strings in HLA: Strings are pointers! A string variable consumes exactly 4 bytes, the same as a pointer (because it is a pointer!). Having said all that, let's look at a simple string variable declaration in HLA:
static StrVariable: string;
Because a string variable is a pointer, you must initialize it before you can use it. There are three general ways you may initialize a string variable with a legal string address: using static initializers, using the str.alloc routine, or calling some other HLA Standard Library function that initializes a string or returns a pointer to a string.
In one of the static declaration sections that allow initialized variables (static and readonly), you can initialize a string variable using the standard initialization syntax. For example:
static InitializedString: string := "This is my string";
Note that this does not initialize the string variable with the string data. Instead, HLA creates the string data structure (see 4.7 Character Strings) in a special, hidden, memory segment and initializes the InitializedString variable with the address of the first character in this string (the T in This). Remember, strings are pointers! The HLA compiler places the actual string data in a read-only memory segment. Therefore, you cannot modify the characters of this string literal at runtime. However, because the string variable (a pointer, remember) is in the static section, you can change the string variable so that it points at different string data.
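As a minimal sketch of repointing a string variable, the program below copies the pointer held in one string variable into another. The program name and the variables first and second are illustrative and not taken from the text; only the 4-byte pointer moves, and no character data is copied or modified.

```hla
program RepointDemo;
#include( "stdlib.hhf" )

static
    first:  string := "first literal";
    second: string := "second literal";

begin RepointDemo;

    // Copy the pointer stored in 'second' into 'first'.
    // Both variables now refer to the same read-only string data.
    mov( second, eax );
    mov( eax, first );

    stdout.put( "first now points at: ", first, nl );

end RepointDemo;
```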
Because string variables are pointers, you can load the value of a string variable into a 32-bit register. The pointer itself points at the first character position of the string. You can find the current string length in the double-word 4 bytes prior to this address, and you can find the maximum string length in the double-word 8 bytes prior to this address. The program in Example 4-8 demonstrates one way to access this data.[52]
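A minimal sketch of the fixed-offset access described above might look like the following. The program and variable names are illustrative; the offsets −4 and −8 are the ones given in the text.

```hla
program FixedOffsetDemo;
#include( "stdlib.hhf" )

static
    theString: string := "String of length 19";

begin FixedOffsetDemo;

    mov( theString, ebx );      // EBX = address of the first character.

    mov( [ebx-4], eax );        // Current length: double word 4 bytes before the data.
    mov( [ebx-8], ecx );        // Maximum length: double word 8 bytes before the data.

    stdout.put
    (
        "theString = '", theString, "'", nl,
        "length    = ", (type uns32 eax), nl,
        "maxlength = ", (type uns32 ecx), nl
    );

end FixedOffsetDemo;
```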
When accessing the various fields of a string variable, it is not wise to access them using fixed numeric offsets as done in Example 4-8. In the future, the definition of an HLA string may change slightly. In particular, the offsets to the maximum length and length fields are subject to change. A safer way to access string data is to coerce your string pointer using the str.strRec data type. The str.strRec data type is a record data type (see 4.25 Records) that defines symbolic names for the offsets of the length and maximum length fields in the string data type. If the offsets to the length and maximum length fields were to change in a future version of HLA, then the definitions in str.strRec would also change. So if you use str.strRec, then recompiling your program would automatically make any necessary changes to your program.
To use the str.strRec data type properly, you must first load the string pointer into a 32-bit register; for example, mov( SomeString, ebx );. Once the pointer to the string data is in a register, you can coerce that register to the str.strRec data type using the HLA construct (type str.strRec [ebx]). Finally, to access the length or maximum length fields, you would use either (type str.strRec [ebx]).length or (type str.strRec [ebx]).maxlen (respectively). Although there is a little more typing involved (versus using simple offsets like −4 or −8), these forms are far more descriptive and much safer than straight numeric offsets. The program in Example 4-9 corrects Example 4-8 by using the str.strRec data type.
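The coercion can be sketched as follows. The program and variable names are illustrative; the field names length and maxlen are the ones named in the text.

```hla
program StrRecDemo;
#include( "stdlib.hhf" )

static
    theString: string := "String of length 19";

begin StrRecDemo;

    mov( theString, ebx );                          // EBX = pointer to the string data.

    mov( (type str.strRec [ebx]).length, eax );     // Current length, by symbolic name.
    mov( (type str.strRec [ebx]).maxlen, ecx );     // Maximum length, by symbolic name.

    stdout.put
    (
        "theString = '", theString, "'", nl,
        "length    = ", (type uns32 eax), nl,
        "maxlength = ", (type uns32 ecx), nl
    );

end StrRecDemo;
```

Because the offsets come from the record definition, a change to the string layout would be picked up by recompiling, with no edits to this code.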
A second way to manipulate strings in HLA is to allocate storage on the heap to hold string data. Because strings can't directly use pointers returned by mem.alloc (string operations access the 8 bytes prior to the address), you shouldn't use mem.alloc to allocate storage for string data. Fortunately, the HLA Standard Library memory module provides a memory allocation routine specifically designed to allocate storage for strings: str.alloc. Like mem.alloc, str.alloc expects a single double-word parameter. This value specifies the maximum number of characters allowed in the string. The str.alloc routine will allocate the specified number of bytes of memory, plus between 9 and 13 additional bytes to hold the extra string information.[53]
Once you've allocated storage for a string, you can call various string-manipulation routines in the HLA Standard Library to manipulate the string. The next section discusses a few of the HLA string routines in detail; this section introduces a couple of string-related routines for the sake of example. The first such routine is stdin.gets( strvar );. This routine reads a string from the user and stores the string data into the string storage pointed at by the string parameter (strvar in this case). If the user attempts to enter more characters than the maximum the string allows, then stdin.gets raises the ex.StringOverflow exception. The program in Example 4-10 demonstrates the use of str.alloc.
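A minimal sketch of allocating a string and reading into it might look like this. The names are illustrative, and the sketch assumes that str.alloc returns the new string's address in EAX. Note that this version never releases the storage it allocates.

```hla
program StrAllocDemo;
#include( "stdlib.hhf" )

static
    theString: string;

begin StrAllocDemo;

    str.alloc( 16 );            // Allocate room for up to 16 characters.
    mov( eax, theString );      // Store the returned pointer in the string variable.

    stdout.put( "Enter a line of text (16 chars, max): " );
    stdin.flushInput();
    stdin.gets( theString );    // Raises ex.StringOverflow on longer input.

    stdout.put( "The string you entered was: ", theString, nl );

    // The allocated storage is never freed in this sketch.

end StrAllocDemo;
```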
To free storage you allocate via str.alloc, you must call the str.free routine, passing the string pointer as the single parameter. The program in Example 4-11 corrects Example 4-10, which never frees the storage it allocates.
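A minimal sketch of pairing str.alloc with str.free follows. The names are illustrative, and the try..endtry block is an addition not described in the text: it is one possible way to make sure the storage is still released when stdin.gets raises ex.StringOverflow.

```hla
program StrFreeDemo;
#include( "stdlib.hhf" )

static
    theString: string;

begin StrFreeDemo;

    str.alloc( 16 );            // Allocate room for up to 16 characters.
    mov( eax, theString );      // Assumes the pointer comes back in EAX.

    stdout.put( "Enter a line of text (16 chars, max): " );
    stdin.flushInput();

    try

        stdin.gets( theString );
        stdout.put( "The string you entered was: ", theString, nl );

      exception( ex.StringOverflow )

        stdout.put( "That line is longer than 16 characters.", nl );

    endtry;

    str.free( theString );      // Release the storage on both paths.

end StrFreeDemo;
```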
When looking at this corrected program, please take note that the stdin.gets routine expects you to pass it a string parameter that points at an allocated string object. Without question, one of the most common mistakes beginning HLA programmers make is to call stdin.gets and pass it a string variable that they have not initialized. This may be getting old now, but keep in mind that strings are pointers! Like pointers, if you do not initialize a string with a valid address, your program will probably crash when you attempt to manipulate that string object. The call to str.alloc and the following mov instruction are how the programs above initialize the string pointer. If you are going to use string variables in your programs, you must ensure that you allocate storage for the string data prior to writing data to the string object.
Allocating storage for a string is such a common operation that many HLA Standard Library routines will automatically allocate the storage for you. Generally, such routines have an a_ prefix as part of their name. For example, the stdin.a_gets routine combines a call to str.alloc and stdin.gets into the same routine. This routine, which doesn't have any parameters, reads a line of text from the user, allocates a string object to hold the input data, and then returns a pointer to the string in the EAX register. Example 4-12 presents an adaptation of the two programs in Example 4-10 and Example 4-11 that uses stdin.a_gets.
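A minimal sketch of the same read-and-echo program built on stdin.a_gets might look like this. The program and variable names are illustrative.

```hla
program AGetsDemo;
#include( "stdlib.hhf" )

static
    theString: string;

begin AGetsDemo;

    stdout.put( "Enter a line of text: " );
    stdin.flushInput();

    stdin.a_gets();             // Allocates the string; pointer is returned in EAX.
    mov( eax, theString );      // Save the pointer in the string variable.

    stdout.put( "The string you entered was: ", theString, nl );

    str.free( theString );      // The caller still owns and must free the storage.

end AGetsDemo;
```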
Note that, as before, you must still free up the storage stdin.a_gets allocates by calling the str.free routine. One big difference between this routine and the previous two is the fact that HLA will automatically allocate exactly enough space for the string read from the user. In the previous programs, the call to str.alloc allocates storage for only 16 characters. If the user types more than 16 characters, then the program raises an exception and quits. If the user types fewer than 16 characters, then some space at the end of the string is wasted. The stdin.a_gets routine, on the other hand, always allocates the minimum necessary space for the string read from the user. Because it allocates the storage, there is little chance of overflow.[54]
[52] Note that this scheme is not recommended. If you need to extract the length information from a string, use the routines provided in the HLA string library for this purpose.
[53] str.alloc may allocate more than 9 bytes for the overhead data because the memory allocated to an HLA string must always be double-word aligned, and the total length of the data structure must be a multiple of 4.
[54] Actually, there are limits on the maximum number of characters that stdin.a_gets will allocate. This is typically between 1,024 bytes and 4,096 bytes. See the HLA Standard Library source listings and your operating system documentation for the exact value.