Appendix 3: Cost Models for Time and Space

Section 7.2 describes two little programs for estimating the time and space consumed by various primitive operations. This appendix shows how those can grow into useful programs for generating one-page time and space estimates. The complete source code for both programs can be found at this book’s web site.

The program spacemod.cpp produces a model of the space consumed by various constructs in C++. The first part of the program uses a sequence of statements like

cout << "sizeof(char)=" << sizeof(char);
cout << " sizeof(short)=" << sizeof(short);

to give precise measurements of primitive objects:

sizeof(char)=1  sizeof(short)=2  sizeof(int)=4
sizeof(float)=4  sizeof(struct *)=4  sizeof(long)=4
sizeof(double)=8

The program also defines a dozen structures, using the simple naming convention illustrated in these examples:

struct structc { char c; };
struct structic { int i; char c; };
struct structip { int i; structip *p; };
struct structdc { double d; char c; };
struct structc12 { char c[12]; };

The program uses a macro to report the sizeof of each structure and then to estimate the number of bytes that new allocates for it, in a report like this:

structc     1   48 48 48 48 48 48 48 48 48 48
structic    8   48 48 48 48 48 48 48 48 48 48
structip    8   48 48 48 48 48 48 48 48 48 48
structdc    16  64 64 64 64 64 64 64 64 64 64
structcd    16  64 64 64 64 64 64 64 64 64 64
structcdc   24  -3744 4096 64 64 64 64 64 64 64 64
structiii   12  48 48 48 48 48 48 48 48 48 48

The first number is given by sizeof and the next ten numbers report the differences between successive pointers returned by new. This output is typical: most of the numbers are quite consistent, but the allocator skips around every now and then.

This macro makes one line of the report:

#define MEASURE(T, text) {                \
    cout << text << "\t";                 \
    cout << sizeof(T) << "\t";            \
    int lastp = 0;                        \
    for (int i = 0; i < 11; i++) {        \
        T *p = new T;                     \
        int thisp = (int) p;              \
        if (lastp != 0)                   \
            cout << " " << thisp - lastp; \
        lastp = thisp;                    \
    }                                     \
    cout << "\n";                         \
}

The macro is called with the structure name followed by the same name in quotes, so it can be printed:

MEASURE(structc, "structc");

(My first draft used a C++ template parameterized by structure type, but measurements were led wildly astray by artifacts of the C++ implementation.)
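
One caution about the macro: the (int) cast truncates the pointer on systems where pointers are wider than an int, as on most 64-bit machines today. A sketch of a more portable variant (assuming nothing beyond iostream) takes the differences through char pointers instead:

#define MEASURE(T, text) {                        \
    cout << text << "\t" << sizeof(T) << "\t";    \
    char *lastp = 0;                              \
    for (int i = 0; i < 11; i++) {                \
        char *thisp = (char *) new T;             \
        if (lastp != 0)                           \
            cout << " " << (thisp - lastp);       \
        lastp = thisp;                            \
    }                                             \
    cout << "\n";                                 \
}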

This table summarizes the output of the program on my machine:

[Table: sizeof and the typical new-allocated block size for each of the dozen structures]

The left column of numbers helps us to estimate the sizeof of structures. We start by summing the sizeof of the types; that explains the 8 bytes for structip. We must also account for alignment: although its components sum to 10 bytes (two chars and a double), structcdc consumes 24 bytes.

The right column gives us insight into the space overhead of the new operator. It appears that any structure with a sizeof of 12 bytes or less will consume 48 bytes. Structures of 13 through 28 bytes consume 64 bytes. In general, the allocated block size will be a multiple of 16, with 36 to 47 bytes of overhead. This is surprisingly expensive; other systems that I use consume just 8 bytes of overhead to represent an 8-byte record.
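
The padding is easy to see directly. Assuming that structcdc follows the naming convention above (a char, a double, then a second char; the definition here is my reconstruction, not necessarily the program's), a few offsetof lines show where the holes go:

#include <cstddef>
#include <iostream>
using namespace std;

struct structcdc { char c1; double d; char c2; };

int main()
{   // On a machine that aligns double on 8-byte boundaries, c1 is
    // followed by 7 bytes of padding, d occupies bytes 8..15, c2 sits
    // at byte 16, and trailing padding rounds the size up to 24.
    cout << "offsetof d  = " << offsetof(structcdc, d) << "\n";
    cout << "offsetof c2 = " << offsetof(structcdc, c2) << "\n";
    cout << "sizeof      = " << sizeof(structcdc) << "\n";
    return 0;
}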

Section 7.2 also describes a little program for estimating the cost of one particular C operation. We can generalize that to the timemod.c program, which produces a one-page cost model for a set of C operations. (Brian Kernighan, Chris Van Wyk and I built a predecessor of this program in 1991.) The main function of that program consists of a sequence of T (for title) lines followed by M lines that measure the cost of operations:

T("Integer Arithmetic");
M({});
M(k++);
M(k = i + j);
M(k = i - j);
    ...

Those lines (and a few more like them) produce this output:

Integer Arithmetic (n=5000)
 {}           250   261   250   250   251    10
 k++          471   460   471   461   460    19
 k = i + j    491   491   500   491   491    20
 k = i - j    440   441   441   440   441    18
 k = i * j    491   490   491   491   490    20
 k = i / j   2414  2433  2424  2423  2414    97
 k = i % j   2423  2414  2423  2414  2423    97
 k = i & j    491   491   480   491   491    20
 k = i | j    440   441   441   440   441    18

The first column gives the operation that is executed inside the loop

for i = [1, n]
    for j = [1, n]
        op

The next five columns show the raw time in clock clicks (milliseconds on this system) for five executions of that loop (these times are all consistent; inconsistent numbers help to identify suspicious runs). The final column gives the average cost in nanoseconds per operation. The first row of the table says that it takes about ten nanoseconds to execute a loop that contains the null operation. The next row says that incrementing the variable k consumes about 9 additional nanoseconds. All of the arithmetic and logical operations have about the same cost, with the exception of the division and remainder operators, which are an order-of-magnitude more expensive.

This approach gives rough estimates for my machine, and they must not be over-interpreted. I conducted all experiments with optimization disabled. When I enabled that option, the optimizer removed the timing loops and all times were zero.

The work is done by the M macro, which can be sketched in pseudocode as:

#define M(op)
    print op as a string
    timesum = 0
    for trial = [0, trials)
        start = clock()
        for i = [1, n]
            fi = i
            for j = [1, n]
                op
        t = clock()-start
        print t
        timesum += t
    print 1e9*timesum / (n*n * trials * CLOCKS_PER_SEC)

The complete code for this cost model can be found at this book’s web site.
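
As a rough guide to how the pseudocode turns into C, here is one way to write the macro; the trial count of five, the printf formats, and the global variable declarations are assumptions based on the tables in this appendix rather than the program's actual text:

#include <stdio.h>
#include <time.h>

#define n      5000
#define TRIALS 5

int i, j, k;
double fi, fj, fk;

#define T(s) printf("%s (n=%d)\n", s, n)

#define M(op) {                                                   \
    double timesum = 0;                                           \
    printf(" %-22s", #op);                                        \
    for (int trial = 0; trial < TRIALS; trial++) {                \
        clock_t start = clock();                                  \
        for (i = 1; i <= n; i++) {                                \
            fi = i;                                               \
            for (j = 1; j <= n; j++) {                            \
                op;                                               \
            }                                                     \
        }                                                         \
        clock_t t = clock() - start;                              \
        printf(" %5ld", (long) t);                                \
        timesum += t;                                             \
    }                                                             \
    printf(" %6.0f\n",                                            \
        1e9*timesum / ((double) n*n * TRIALS * CLOCKS_PER_SEC));  \
}

As noted above, this must be compiled with optimization disabled, or the compiler will happily delete the timing loops.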

We’ll now survey the output of the program on my particular system. Because the clock clicks are all consistent, we will omit them and report only the average time in nanoseconds.

Floating Point Arithmetic (n=5000)
 fj=j;                 18
 fj=j; fk = fi + fj    26
 fj=j; fk = fi - fj    27
 fj=j; fk = fi * fj    24
 fj=j; fk = fi / fj    78
Array Operations (n=5000)
 k = i + j             17
 k = x[i] + j          18
 k = i + x[j]          24
 k = x[i] + x[j]       27

The floating-point operations first assign the integer j to the floating-point fj (in about 8 nanoseconds); the outer loop assigns i to the floating-point fi. The floating-point operations themselves cost about as much as their integer counterparts, and the array operations are equally inexpensive.

The next tests give us insight into control flow in general and some sorting operations in particular:

Comparisons (n=5000)
 if (i < j) k++           20
 if (x[i] < x[j]) k++     25
Array Comparisons and Swaps (n=5000)
 k = (x[i]<x[j]) ? -1:1   34
 k = intcmp(x+i, x+j)     52
 swapmac(i, j)            41
 swapfunc(i, j)           65
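
The comparison and swap helpers in that table are not listed in this appendix; definitions along the following lines (my guesses, consistent with the inline versions measured above; the real ones are in the complete timemod.c at the book's web site) would be typical:

int x[n+1];   /* the array indexed by i and j; n as defined above */
int t;        /* scratch cell used by the swap macro */

int intcmp(int *a, int *b)
{   return *a - *b; }

#define swapmac(a, b) { t = x[a]; x[a] = x[b]; x[b] = t; }

void swapfunc(int a, int b)
{   int tmp = x[a];
    x[a] = x[b];
    x[b] = tmp;
}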

The function versions of comparing and swapping each cost about 20 nanoseconds more than their inline counterparts. Section 9.2 compares the cost of computing the maximum of two values with functions, macros and inline code:

Max Function, Macro and Inline (n=5000)
 k = (i > j) ? i : j   26
 k = maxmac(i, j)      26
 k = maxfunc(i, j)     54
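
The macro and the function are presumably the obvious ones (again my reconstruction; Section 9.2 gives the versions that were actually timed):

#define maxmac(a, b) ((a) > (b) ? (a) : (b))

int maxfunc(int a, int b)
{   return a > b ? a : b; }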

The rand function is relatively inexpensive (though recall that the bigrand function makes two calls to rand), square root is an order of magnitude greater than basic arithmetic operations (though only twice the cost of a division), simple trigonometric operations cost twice that, and advanced trigonometric operations run into microseconds.

Math Functions (n=1000)
 k = rand()          40
 fk = j+fi           20
 fk = sqrt(j+fi)    188
 fk = sin(j+fi)     344
 fk = sinh(j+fi)   2229
 fk = asin(j+fi)    973
 fk = cos(j+fi)     353
 fk = tan(j+fi)     465

Because those are so pricey, we shrank the value of n. Memory allocation is more expensive yet, and merits an even smaller n:

Memory Allocation (n=500)
 free(malloc(16))       2484
 free(malloc(100))      3044
 free(malloc(2000))     4959
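
Those rows come from M lines of the usual form; with the M macro sketched earlier (and stdlib.h included, with n reduced as the heading indicates), they would read:

M(free(malloc(16)));
M(free(malloc(100)));
M(free(malloc(2000)));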