A string is a collection of values stored in contiguous memory locations. Strings are usually arrays of bytes, words, or (on 80386 and later processors) double words. The 80x86 microprocessor family supports several instructions specifically designed to cope with strings. This chapter explores some of the uses of these string instructions.
The 80x86 CPUs can process three types of strings: byte strings, word strings, and double-word strings. They can move strings, compare strings, search for a specific value within a string, initialize a string to a fixed value, and do other primitive operations on strings. The 80x86's string instructions are also useful for manipulating arrays, tables, and records. You can easily assign or compare such data structures using the string instructions. Using string instructions may speed up your array-manipulation code considerably.
All members of the 80x86 family support five different string instructions: movs
x
, cmps
x
, scas
x
, lods
x
, and stosx
.[128] (x
= b
, w
, or d
for byte, word, or double word, respectively; this text will generally drop the x
suffix when talking about these string instructions in a general sense.) They are the string primitives on which you can build most other string operations. How you use these five instructions is the topic of the sections that follow.
For MOVS: movsb(); movsw(); movsd(); For CMPS: cmpsb(); cmpsw(); cmpsd(); For SCAS: scasb(); scasw(); scasd(); For STOS: stosb(); stosw(); stosd(); For LODS: lodsb(); lodsw(); lodsd();
The string instructions operate on blocks (contiguous linear arrays) of memory. For example, the movs
instruction moves a sequence of bytes from one memory location to another. The cmps
instruction compares two blocks of memory. The scas
instruction scans a block of memory for a particular value. These string instructions often require three operands: a destination block address, a source block address, and (optionally) an element count. For example, when using the movs
instruction to copy a string, you need a source address, a destination address, and a count (the number of string elements to move).
Unlike other instructions, which operate on memory, the string instructions don't have any explicit operands. The operands for the string instructions are as follows:
For example, one variant of the movs
(move string) instruction copies ECX elements from the source address specified by ESI to the destination address specified by EDI. Likewise, the cmps
instruction compares the string pointed at by ESI, of length ECX, to the string pointed at by EDI.
Not all string instructions have source and destination memory operands (only movs
and cmps
support them). For example, the scas
instruction (scan a string) compares the value in the accumulator (AL, AX, or EAX) to values in memory.
The string instructions, by themselves, do not operate on strings of data. The movs
instruction, for example, will only copy a single byte, word, or double word. When the movs
instruction executes, it ignores the value in the ECX register. The repeat prefixes tell the 80x86 to do a multibyte string operation. The syntax for the repeat prefix is as follows:
For MOVS: rep.movsb(); rep.movsw(); rep.movsd(); For CMPS: repe.cmpsb(); // Note: repz is a synonym for repe. repe.cmpsw(); repe.cmpsd(); repne.cmpsb(); // Note: repnz is a synonym for repne. repne.cmpsw(); repne.cmpsd(); For SCAS: repe.scasb(); // Note: repz is a synonym for repe. repe.scasw(); repe.scasd(); repne.scasb(); // Note: repnz is a synonym for repne. repne.scasw(); repne.scasd(); For STOS: rep.stosb(); rep.stosw(); rep.stosd();
You don't normally use the repeat prefixes with the lods
instruction.
When specifying the repeat prefix before a string instruction, the string instruction repeats its operation ECX times.[129] Without the repeat prefix, the instruction operates only on a single element (byte, word, or double word).
You can use repeat prefixes to process entire strings with a single instruction. You can use the string instructions, without the repeat prefix, as string primitive operations to synthesize more powerful string operations.
In addition to the ESI, EDI, ECX, and AL/AX/EAX registers, one other register controls the operation of the 80x86's string instructions—the EFLAGs register. Specifically, the direction flag in the flags register controls how the CPU processes strings.
If the direction flag is clear, the CPU increments ESI and EDI after operating on each string element. For example, executing movs
will move the byte, word, or double word at ESI to EDI and will then increment ESI and EDI by 1, 2, or 4. When specifying the rep
prefix before this instruction, the CPU increments ESI and EDI for each element in the string (the count in ECX specifies the number of elements). At completion, the ESI and EDI registers will be pointing at the first item beyond the strings.
If the direction flag is set, the 80x86 decrements ESI and EDI after it processes each string element (again, ECX specifies the number of string elements). After a repeated string operation, the ESI and EDI registers will be pointing at the first byte, word, or double word before the strings if the direction flag was set.
You can change the direction flag's value using the cld
(clear direction flag) and std
(set direction flag) instructions. When using these instructions inside a procedure, keep in mind that they modify the machine state. Therefore, you may need to save the direction flag during the execution of that procedure. The following example exhibits the kinds of problems you might encounter.
procedure Str2; @nodisplay; begin Str2; std(); << Do some string operations. >> . . . end Str2; . . . cld(); << Do some operations. >> Str2(); << Do some string operations requiring D=0. >>
This code will not work properly. The calling code assumes that the direction flag is clear after Str2
returns. However, this isn't true. Therefore, the string operations executed after the call to Str2
will not function properly.
There are a couple of ways to handle this problem. The first, and probably the most obvious, is always to insert the cld
or std
instructions immediately before executing a sequence of one or more string instructions. This ensures that the direction flag is always set properly for your code. The other alternative is to save and restore the direction flag using the pushfd
and popfd
instructions. Using these two techniques, the code above would look like the following examples.
Always issuing cld
or std
before a string instruction:
procedure Str2; @nodisplay; begin Str2; std(); << Do some string operations. >> . . . end Str2; . . . cld(); << Do some operations. >> Str2(); cld(); << Do some string operations requiring D=0. >>
Saving and restoring the flags register:
procedure Str2; @nodisplay; begin Str2; pushfd(); std(); << Do some string operations. >> . . . popfd(); end Str2; . . cld(); << Do some operations. >> Str2(); << Do some string operations requiring D=0. >>
If you use the pushfd
and popfd
instructions to save and restore the flags register, keep in mind that you're saving and restoring all the flags. This makes it somewhat difficult to return information in other flag bits. For example, it's a bit of work to return an error condition in the carry flag if you use pushfd
and popfd
to preserve the direction flag in the procedure.
A third solution is to always ensure that the direction flag is clear except for the execution of a particular sequence that requires it to be set. For example, many library calls and some operating systems always assume that the direction flag is clear when you call them. Most standard C library functions work this way, for example. You can follow this convention by always assuming that the direction flag is clear, and then make sure you clear it immediately after a sequence that requires the use of std
.
The movs
instruction uses the following syntax:
movsb() movsw() movsd() rep.movsb() rep.movsw() rep.movsd()
The movsb
(move string, bytes) instruction fetches the byte at address ESI, stores it at address EDI, and then increments or decrements the ESI and EDI registers by 1. If the rep
prefix is present, the CPU checks ECX to see if it contains 0. If not, then it moves the byte from ESI to EDI and decrements the ECX register. This process repeats until ECX becomes 0. If ECX contains 0 upon initial execution, the movs
instruction will not copy any data bytes.
The movsw
(move string, words) instruction fetches the word at address ESI, stores it at address EDI, and then increments or decrements ESI and EDI by 2. If there is a rep
prefix, then the CPU repeats this procedure ECX times.
The movsd
instruction operates in a similar fashion on double words. It increments or decrements ESI and EDI by 4 after each data movement.
When you use the rep
prefix, the movsb
instruction moves the number of bytes you specify in the ECX register. The following code segment copies 384 bytes from CharArray1
to CharArray2
:
CharArray1: byte[ 384 ]; CharArray2: byte[ 384 ]; . . . cld(); lea( esi, CharArray1 ); lea( edi, CharArray2 ); mov( 384, ecx ); rep.movsb();
If you substitute movsw
for movsb
, then the preceding code will move 384 words (768 bytes) rather than 384 bytes:
WordArray1: word[ 384 ]; WordArray2: word[ 384 ]; . . . cld(); lea( esi, WordArray1 ); lea( edi, WordArray2 ); mov( 384, ecx ); rep.movsw();
Remember, the ECX register contains the element count, not the byte count. When using the movsw
instruction, the CPU moves the number of words specified in the ECX register. Similarly, movsd
moves the number of double words you specify in the ECX register, not the number of bytes.
If you've set the direction flag before executing a movsb/movsw/movsd
instruction, the CPU decrements the ESI and EDI registers after moving each string element. This means that the ESI and EDI registers must point at the last element of their respective strings before executing a movsb
, movsw
, or movsd
instruction. For example:
CharArray1: byte[ 384 ]; CharArray2: byte[ 384 ]; . . . cld(); lea( esi, CharArray1[383] ); lea( edi, CharArray2[383] ); mov( 384, ecx ); rep.movsb();
Although there are times when processing a string from tail to head is useful (see the cmps
description in 11.1.5 The cmps Instruction), generally you'll process strings in the forward direction because that's more straightforward. There is one class of string operations where being able to process strings in both directions is absolutely mandatory: moving strings when the source and destination blocks overlap. Consider what happens in the following code:
CharArray1: byte; CharArray2: byte[ 384 ]; . . . cld(); lea( esi, CharArray1 ); lea( edi, CharArray2 ); mov( 384, ecx ); rep.movsb();
This sequence of instructions treats CharArray1
and CharArray2
as a pair of 384-byte strings. However, the last 383 bytes in the CharArray1
array overlap the first 383 bytes in the CharArray2
array. Let's trace the operation of this code byte by byte.
When the CPU executes the movsb
instruction, it copies the byte at ESI (CharArray1
) to the byte pointed at by EDI (CharArray2
). Then it increments ESI and EDI, decrements ECX by 1, and repeats this process. Now the ESI register points at CharArray1+1
(which is the address of CharArray2
), and the EDI register points at CharArray2+1
. The movsb
instruction copies the byte pointed at by ESI to the byte pointed at by EDI. However, this is the byte originally copied from location CharArray1
. So the movsb
instruction copies the value originally in location CharArray1
to both locations CharArray2
and CharArray2+1
. Again, the CPU increments ESI and EDI, decrements ECX, and repeats this operation. Now the movsb
instruction copies the byte from location CharArray1+2
(CharArray2+1
) to location CharArray2+2
. But once again, this is the value that originally appeared in location CharArray1
. Each repetition of the loop copies the next element in CharArray1[0]
to the next available location in the CharArray2
array. Pictorially, it looks something like Figure 11-1.
The end result is that the movsb
instruction replicates X
throughout the string. The movsb
instruction copies the source operand into the memory location, which will become the source operand for the very next move operation, which causes the replication.
If you really want to move one array into another when they overlap like this, you should move each element of the source string to the destination string starting at the end of the two strings, as shown in Figure 11-2.
Setting the direction flag and pointing ESI and EDI at the end of the strings will allow you to (correctly) move one string to another when the two strings overlap and the source string begins at a lower address than the destination string. If the two strings overlap and the source string begins at a higher address than the destination string, then clear the direction flag and point ESI and EDI at the beginning of the two strings.
If the two strings do not overlap, then you can use either technique to move the strings around in memory. Generally, operating with the direction flag clear is the easiest, so that makes the most sense.
You shouldn't use the movs
x
instruction to fill an array with a single byte, word, or double-word value. Another string instruction, stos
, is much better for this purpose. However, for arrays whose elements are 1, 2, or 4 bytes, you can use the movs
instruction to initialize the entire array to the content of the first element.
The movs
instruction is sometimes more efficient when copying double words than it is copying bytes or words. On some systems, it typically takes the same amount of time to copy a byte using movsb
as it does to copy a double word using movsd
. Therefore, if you are moving a large number of bytes from one array to another, the copy operation will be faster if you can use the movsd instruction rather than the movsb
instruction. If the number of bytes you wish to move is an even multiple of 4, this is a trivial change; just divide the number of bytes to copy by 4, load this value into ECX, and then use the movsb
instruction. If the number of bytes is not evenly divisible by 4, then you can use the movsd
instruction to copy all but the last 1, 2, or 3 bytes of the array (that is, the remainder after you divide the byte count by 4). For example, if you want to efficiently move 4,099 bytes, you can do so with the following instruction sequence.
lea( esi, Source ); lea( edi, Destination ); mov( 1024, ecx ); // Copy 1024 dwords = 4096 bytes. rep.movsd(); movsw(); // Copy bytes 4097 and 4098. movsb(); // Copy the last byte.
Using this technique to copy data never requires more than three movs
x
instructions because you can copy 1, 2, or 3 bytes with no more than two movsb
and movsw
instructions. The scheme above is most efficient if the two arrays are aligned on double-word boundaries. If not, you might want to move the movsb
or movsw
instruction (or both) before the movsd
so that the movsd
instruction works with double-word-aligned data.
If you do not know the size of the block you are copying until the program executes, you can still use code like the following to improve the performance of a block move of bytes:
lea( esi, Source ); lea( edi, Dest ); mov( Length, ecx ); shr( 2, ecx ); // Divide by 4. if( @nz ) then // Only execute movsd if 4 or more bytes. rep.movsd(); // Copy the dwords. endif; mov( Length, ecx ); and( %11, ecx ); // Compute (Length mod 4). if( @nz ) then // Only execute movsb if #bytes/4 <> 0. rep.movsb(); // Copy the remaining 1, 2, or 3 bytes. endif;
On many computer systems, the movsd
instruction provides about the fastest way to copy bulk data from one location to another. While there are, arguably, faster ways to copy the data on certain CPUs, ultimately the memory bus performance is the limiting factor, and the CPUs are generally much faster than the memory bus. Therefore, unless you have a special system, writing fancy code to improve memory-to-memory transfers is probably a waste of time. Also note that Intel has improved the performance of the movs
x
instructions on later processors so that movsb
operates almost as efficiently as movsw
and movsd
when copying the same number of bytes. Therefore, when working on a later 80x86 processor, it may be more efficient to simply use movsb
to copy the specified number of bytes rather than go through all the complexity outlined above. The bottom line is this: If the speed of a block move matters to you, try it several different ways and pick the fastest (or the simplest, if they all run the same speed, which is likely).
The cmps
instruction compares two strings. The CPU compares the string referenced by EDI to the string pointed at by ESI. ECX contains the length of the two strings (when using the repe
or repne
prefix). Like the movs
instruction, HLA allows several different forms of this instruction:
cmpsb(); cmpsw(); cmpsd(); repe.cmpsb(); repe.cmpsw(); repe.cmpsd(); repne.cmpsb(); repne.cmpsw(); repne.cmpsd();
As for the movs
instruction, you specify the actual operand addresses in the ESI and EDI registers.
Without a repeat prefix, the cmps
instruction subtracts the value at location EDI from the value at ESI and updates the flags. Other than updating the flags, the CPU doesn't use the difference produced by this subtraction. After comparing the two locations, cmps
increments or decrements the ESI and EDI registers by 1, 2, or 4 (for cmpsb/cmpsw/cmpsd
, respectively). cmps
increments the ESI and EDI registers if the direction flag is clear and decrements them otherwise.
Of course, you will not tap the real power of the cmps
instruction using it to compare single bytes, words, or double words in memory. This instruction shines when you use it to compare whole strings. With cmps
, you can compare consecutive elements in a string until you find a match or until consecutive elements do not match.
To compare two strings to see if they are equal or not equal, you must compare corresponding elements in a string until they don't match. Consider the following strings:
"String1" "String1"
The only way to determine that these two strings are equal is to compare each character in the first string to the corresponding character in the second. After all, the second string could have been String2
, which definitely is not equal to String1
. Once you encounter a character in the destination string that does not equal the corresponding character in the source string, the comparison can stop. You needn't compare any other characters in the two strings.
The repe
prefix accomplishes this operation. It will compare successive elements in a string as long as they are equal and ECX is greater than 0. We could compare the two strings above using the following 80x86 assembly language code:
cld(); mov( AdrsString1, esi ); mov( AdrsString2, edi ); mov( 7, ecx ); repe.cmpsb();
After the execution of the cmpsb
instruction, you can test the flags using the standard (unsigned) conditional jump instructions. This lets you check for equality, inequality, less than, greater than, and so on.
Character strings are usually compared using lexicographical ordering. In lexicographical ordering, the least significant element of a string carries the most weight. This is in direct contrast to standard integer comparisons, where the most significant portion of the number carries the most weight. Furthermore, the length of a string affects the comparison only if the two strings are identical up to the length of the shorter string. For example, Zebra
is less than Zebras
because it is the shorter of the two strings; however, Zebra
is greater than AAAAAAAAAAH!
even though Zebra
is shorter. Lexicographical comparisons compare corresponding elements until encountering a character that doesn't match or until encountering the end of the shorter string. If a pair of corresponding characters do not match, then this algorithm compares the two strings based on that single character. If the two strings match up to the length of the shorter string, we must compare their length. The two strings are equal if and only if their lengths are equal and each corresponding pair of characters in the two strings are identical. Lexicographical ordering is the standard alphabetical ordering you've grown up with.
For character strings, use the cmps
instruction in the following manner:
The direction flag must be cleared before comparing the strings.
Use the cmpsb
instruction to compare the strings on a byte-by-byte basis. Even if the strings contain an even number of characters, you cannot use the cmpsw
or cmpsd
instructions. They do not compare strings in lexicographical order.
You must load the ECX register with the length of the smaller string.
Use the repe
prefix.
The ESI and EDI registers must point at the very first character in the two strings you want to compare.
After the execution of the cmps
instruction, if the two strings were equal, their lengths must be compared in order to finish the comparison. The following code compares a couple of character strings:
mov( AdrsStr1, esi ); mov( AdrsStr2, edi ); mov( LengthSrc, ecx ); if( ecx > LengthDest ) then // Put the length of the // shorter string in ecx. mov( LengthDest, ecx ); endif; repe.cmpsb(); if( @z ) then // If equal to the length of the // shorter string, cmp lengths. mov( LengthSrc, ecx ); cmp( ecx, LengthDest ); endif;
If you're using bytes to hold the string lengths, you should adjust this code appropriately (that is, use a movzx
instruction to load the lengths into ECX). HLA strings use a double word to hold the current length value, so this isn't an issue when using HLA strings.
You can also use the cmps
instruction to compare multiword integer values (that is, extended-precision integer values). Because of the amount of setup required for a string comparison, this isn't practical for integer values less than six or eight double words in length, but for large integer values, it's an excellent way to compare such values. Unlike for character strings, we cannot compare integer strings using lexicographical ordering. When comparing strings, we compare the characters from the least significant byte to the most significant byte. When comparing integers, we must compare the values from the most significant byte (or word/double word) down to the least significant byte, word, or double word. So, to compare two 32-byte (256-bit) integer values, use the following code on the 80x86:
std(); lea( esi, SourceInteger[28] ); lea( edi, DestInteger[28] ); mov( 8, ecx ); rep.cmpsd();
This code compares the integers from their most significant dword down to the least significant dword. The cmpsd
instruction finishes when the two values are unequal or upon decrementing ECX to 0 (implying that the two values are equal). Once again, the flags provide the result of the comparison.
The repne
prefix will instruct the cmps
instruction to compare successive string elements as long as they do not match. The 80x86 flags are of little use after the execution of this instruction. Either the ECX register is 0 (in which case the two strings are totally different), or it contains the number of elements compared in the two strings until a match is found. While this form of the cmps
instruction isn't particularly useful for comparing strings, it is useful for locating the first pair of matching items in a couple of byte, word, or double-word arrays. In general, though, you'll rarely use the repne
prefix with cmps
.
One last thing to keep in mind with using the cmps
instruction: The value in the ECX register determines the number of elements to process, not the number of bytes. Therefore, when using cmpsw
, ECX specifies the number of words to compare. Likewise, for cmpsd
, ECX contains the number of double words to process.
The cmps
instruction compares two strings against each other. You do not use it to search for a particular element within a string. For example, you could not use the cmps
instruction to quickly scan for a 0 throughout some other string. You can use the scas
(scan string) instruction for this task.
Unlike the movs
and cmps
instructions, the scas
instruction requires only a destination string (pointed at by EDI) rather than both a source and destination string. The source operand is the value in the AL (scasb), AX (scasw
), or EAX (scasd
) register. The scas
instruction compares the value in the accumulator (AL, AX, or EAX) against the value pointed at by EDI and then increments (or decrements) EDI by 1, 2, or 4. The CPU sets the flags according to the result of the comparison. While this might be useful on occasion, scas
is a lot more useful when using the repe
and repne
prefixes.
With the repe
prefix (repeat while equal), scas
scans the string searching for an element that does not match the value in the accumulator. When using the repne
prefix (repeat while not equal), scas
scans the string, searching for the first string element that is equal to the value in the accumulator.
You're probably wondering, "Why do these prefixes do exactly the opposite of what they ought to do?" The preceding paragraphs haven't quite phrased the operation of the scas
instruction properly. When using the repe
prefix with scas
, the 80x86 scans through the string while the value in the accumulator is equal to the string operand. This is equivalent to searching through the string for the first element that does not match the value in the accumulator. The scas
instruction with repne
scans through the string while the accumulator is not equal to the string operand. Of course, this form searches for the first value in the string that matches the value in the accumulator register. The scas
instructions take the following forms:
scasb() scasw() scasd() repe.scasb() repe.scasw() repe.scasd() repne.scasb() repne.scasw() repne.scasd()
Like the cmps
and movs
instructions, the value in the ECX register specifies the number of elements, not bytes, to process when using a repeat prefix.
The stos
instruction stores the value in the accumulator at the location specified by EDI. After storing the value, the CPU increments or decrements EDI depending on the state of the direction flag. Although the stos
instruction has many uses, its primary use is to initialize arrays and strings to a constant value. For example, if you have a 256-byte array you want to clear out with zeros, use the following code:
cld(); lea( edi, DestArray ); mov( 64, ecx ); // 64 double words = 256 bytes. xor( eax, eax ); // Zero out eax. rep.stosd();
This code writes 64 double words rather than 256 bytes because a single stosd
operation is faster than four stosb
operations.
The stos
instructions take six forms. They are:
stosb(); stosw(); stosd(); rep.stosb(); rep.stosw(); rep.stosd();
The stosb
instruction stores the value in the AL register into the specified memory location(s), the stosw
instruction stores the AX register into the specified memory location(s), and the stosd
instruction stores EAX into the specified location(s).
Keep in mind that the stos
instruction is useful only for initializing a byte, word, or double-word array to a constant value. If you need to initialize an array with elements that have different values, you cannot use the stos
instruction.
The lods
instruction is unique among the string instructions. You will probably never use a repeat prefix with this instruction. The lods
instruction copies the byte, word, or double word pointed at by ESI into the AL, AX, or EAX register, after which it increments or decrements the ESI register by 1, 2, or 4. Repeating this instruction via the repeat prefix would serve almost no purpose whatsoever because the accumulator register will be overwritten each time the lods
instruction repeats. At the end of the repeat operation, the accumulator will contain the last value read from memory.
Instead, use the lods
instruction to fetch bytes (lodsb
), words (lodsw
), or double words (lodsd
) from memory for further processing. By using the lods
and stos
instructions, you can synthesize powerful string operations.
Like the stos
instruction, the lods
instructions take six forms:
lodsb(); lodsw(); lodsd(); rep.lodsb(); rep.lodsw(); rep.lodsd();
As mentioned earlier, you'll rarely, if ever, use the rep
prefixes with these instructions.[130] The 80x86 increments or decrements ESI by 1, 2, or 4 depending on the direction flag and whether you're using the lodsb
, lodsw
, or lodsd
instruction.
The 80x86 supports only five different string instructions: movs
, cmps
, scas
, lods
, and stos
.[131] These certainly aren't the only string operations you'll ever want to use. However, you can use the lods
and stos
instructions to easily generate any particular string operation you like. For example, suppose you wanted a string operation that converts all the uppercase characters in a string to lowercase. You could use the following code:
mov( StringAddress, esi ); // Load string address into esi. mov( esi, edi ); // Also point edi here. mov( (type str.strRec [esi]).length, ecx ); repeat lodsb(); // Get the next character in the string. if( al in 'A'..'Z' ) then or( $20, al ); // Convert uppercase to lowercase. endif; stosb(); // Store converted char into string. dec( ecx ); until( @z ); // Zero flag is set when ecx is 0.
Because the lods
and stos
instructions use the accumulator as an intermediary location, you can use any accumulator operation to quickly manipulate string elements.
[128] The 80x86 processor support two additional string instructions, ins
and outs
, which input strings of data from an input port or output strings of data to an output port. We will not consider these instructions because they are privileged instructions, and you cannot execute them in a standard 32-bit OS application.
[129] Except for the cmps
instruction, which repeats at most the number of times specified in the ECX register.
[130] They appear here simply because they are allowed. They're not very useful, but they are allowed. About the only use for this form of the instruction is to "touch" items in the cache so they are preloaded into the cache. However, there are better ways to accomplish this.
[131] Not counting ins
and outs
, which we're ignoring here.