The last example in the previous section demonstrates a specific case of a very general problem: counting bits. Unfortunately, that example has a severe limitation: It only counts a single run of 1 bits appearing in the source operand. This section discusses a more general solution to this problem.
Hardly a week goes by that someone doesn't ask on one of the Internet newsgroups how to count the number of bits in a register operand. This is a common request, undoubtedly, because many assembly language course instructors assign this task as a project to their students as a way to teach them about the shift and rotate instructions. Undoubtedly, the solution these instructors expect is something like the following:
// BitCount1: // // Counts the bits in the eax register, returning the count in ebx. mov( 32, cl ); // Count the 32 bits in eax. sub( ebx, ebx ); // Accumulate the count here. CntLoop: shr( 1, eax ); // Shift next bit out of eax and into Carry. adc( 0, bl ); // Add the carry into the ebx register. dec( cl ); // Repeat 32 times. jnz CntLoop;
The "trick" worth noting here is that this code uses the adc
instruction to add the value of the carry flag into the BL register. Because the count is going to be less than 32, the result will fit comfortably into BL.
Tricky code or not, this instruction sequence is not particularly fast. As you can tell with just a small amount of analysis, the loop above always executes 32 times, so this code sequence executes 130 instructions (4 instructions per iteration plus 2 extra instructions). You might ask if there is a more efficient solution; the answer is yes. The following code, taken from the AMD Athlon optimization guide, provides a faster solution (see the comments for a description of the algorithm):
// bitCount // // Counts the number of "1" bits in a dword value. // This function returns the dword count value in eax. procedure bitCount( BitsToCnt:dword ); @nodisplay; const EveryOtherBit := $5555_5555; EveryAlternatePair := $3333_3333; EvenNibbles := $0f0f_0f0f; begin bitCount; push( edx ); mov( BitsToCnt, eax ); mov( eax, edx ); // Compute sum of each pair of bits // in eax. The algorithm treats // each pair of bits in eax as a // 2-bit number and calculates the // number of bits as follows (description // is for bits 0 and 1, it generalizes // to each pair): // // edx = Bit1 Bit0 // eax = 0 Bit1 // // edx-eax = 00 if both bits were 0. // 01 if Bit0=1 and Bit1=0. // 01 if Bit0=0 and Bit1=1. // 10 if Bit0=1 and Bit1=1. // // Note that the result is left in edx. shr( 1, eax ); and( EveryOtherBit, eax ); sub( eax, edx ); // Now sum up the groups of 2 bits to // produces sums of 4 bits. This works // as follows: // // edx = bits 2,3, 6,7, 10,11, 14,15, ..., 30,31 // in bit positions 0,1, 4,5, ..., 28,29 with // zeros in the other positions. // // eax = bits 0,1, 4,5, 8,9, ... 28,29 with zeros // in the other positions. // // edx+eax produces the sums of these pairs of bits. // The sums consume bits 0,1,2, 4,5,6, 8,9,10, ... 28,29,30 // in eax with the remaining bits all containing 0. mov( edx, eax ); shr( 2, edx ); and( EveryAlternatePair, eax ); and( EveryAlternatePair, edx ); add( edx, eax ); // Now compute the sums of the even and odd nibbles in the // number. Because bits 3, 7, 11, etc. in eax all contain // 0 from the above calculation, we don't need to AND // anything first, just shift and add the two values. // This computes the sum of the bits in the 4 bytes // as four separate values in eax (al contains number of // bits in original al, ah contains number of bits in // original ah, etc.) mov( eax, edx ); shr( 4, eax ); add( edx, eax ); and( EvenNibbles, eax ); // Now for the tricky part. // We want to compute the sum of the 4 bytes // and return the result in eax. The following // multiplication achieves this. It works // as follows: // (1) the $01 component leaves bits 24..31 // in bits 24..31. // // (2) the $100 component adds bits 17..23 // into bits 24..31. // // (3) the $1_0000 component adds bits 8..15 // into bits 24..31. // // (4) the $1000_0000 component adds bits 0..7 // into bits 24..31. // // Bits 0..23 are filled with garbage, but bits // 24..31 contain the actual sum of the bits // in eax's original value. The shr instruction // moves this value into bits 0..7 and zeros // out the H.O. bits of eax. intmul( $0101_0101, eax ); shr( 24, eax ); pop( edx ); end bitCount;