Performance
It’s not the focus of this book to tell you how to optimize the performance of your implementation or even to tell you whether it’s necessary. Your implementation should be as fast as it needs to be: some APIs are performance critical and must be called many times per second, whereas other APIs are used infrequently and their speed is of less concern. However, a focus of this book is to show you how certain API design decisions can impact performance, and therefore how you can optimize the performance of your interface.
Your implementation may not need to be high performance, but your interface should still be as optimal as possible so that it is not actively undermining performance. Requirements change, and you may find yourself needing to optimize your implementation after the first version of your API has been released. In this situation you will wish that you had considered the performance impact of your API beforehand so that you are not forced to break backward compatibility in order to improve performance.
However, the most important point to make here is that you should strive to never warp your API for performance reasons. Good designs normally correspond with good performance (Bloch, 2008; Tulach, 2008). Your API should continue to provide a clean and logical representation of the problem domain even after you have optimized its implementation. There are cases where this is simply not possible. For example, if you are writing an API that must communicate across a Remote Procedure Call barrier, you may find that the overhead of performing many individual API calls is too great, and you therefore need to introduce a vectorized API that batches up many calls into a single method. However, instances such as these are exceptional. A lot can be optimized behind the API (that is why you are writing an API, after all), so you should endeavor to limit the effect of any performance improvements to your implementation and not let this work change your interface.
There are several components to API performance, and I will consider each of these in the following sections.
1. Compile-time speed. The impact of your API on the time it takes to compile a client program. This can affect the productivity of your users.
2. Run-time speed. The time overhead for calling your API methods. This is important if your methods must be called frequently or must scale well with different input sizes.
3. Run-time memory overhead. The memory overhead for calling your API methods. This is important if you expect many of your objects to be created and kept in memory. It can also affect CPU cache performance.
4. Library size. The size of the object code for your implementation that clients must link into their applications. This affects the total disk and memory footprint of your clients’ applications.
5. Startup time. The time it takes to load and initialize a dynamic library that implements your API. Various factors can affect this, such as template resolution, binding unresolved symbols, calling static initializers, and library path searching.
In addition to specific API factors that affect these performance metrics, you should of course investigate your compiler’s options to see if there are any flags you can turn on or off to give a performance boost. For example, turning off Run-Time Type Information (RTTI) if you don’t need to use the dynamic_cast operator is one common decision (-fno-rtti for the GNU C++ compiler).
One of the most important lessons to learn about performance optimization is that you should never trust your instincts on which parts of your implementation you think will be slow. You should always measure the actual performance profile of your API in real-world situations and then focus your optimization effort on the areas that give you the biggest impact. A corollary to this is that you don’t need to start with the most efficient implementation: do it the easy way first and then figure out which parts of your implementation need to be optimized once everything is working.
At Pixar, we would regularly have various teams work on different components of a particular feature for one of our films. For example, the driving system in Cars involved work by the R&D team to implement a generic simulation plugin system, work by GUI engineers to provide direct manipulation controls for animators, and work by the production teams on the movie to integrate everything into their pipeline. Then, once all of the software was functional and integrated, we convened “speed team” meetings to assess where the bottlenecks were and assign work to the relevant engineers so that the overall system would meet specific performance criteria.
The important point is to always remember Amdahl’s law. This states that the overall performance improvement gained by optimizing a single part of a system is limited by the fraction of time that the improved part is actually used. You may increase the performance of one part of your API by a factor of 10, but if a client’s program spends only 1% of its time in that code, then the overall improvement is less than 1%.
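As a rough sketch of the underlying calculation, using the standard form of Amdahl's law in which p is the fraction of total run time spent in the optimized code and s is the factor by which that code is sped up:

overall speedup = 1 / ((1 - p) + p / s) = 1 / ((1 - 0.01) + 0.01 / 10) ≈ 1.009

That is, a tenfold improvement to code that accounts for only 1% of the run time yields an overall gain of less than 1%.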
In Chapter 6, I recommended that you prefer const references over pointers to pass input parameters, that is, parameters that are not changed by the function. However, you should prefer pointers over non-const references for output parameters so that their mutability is clearly advertised to clients. This section now offers some additional performance reasons to prefer the use of const references to pass input arguments into a function.
By default, function arguments in C++ are passed “by value.” This means that the object being passed into the function is copied and then that copy is destroyed when the function returns. As a result, the original object that is passed into the method can never be modified. However, this involves the overhead of calling the object’s copy constructor and then destructor. Instead you should pass a const reference to the object. This has the effect of only passing the pointer to the object, but also ensuring that the object is not modified by the method. This can be particularly important for embedded systems where you have a very limited stack size.
void SetValue(std::string str); // pass by value
void SetValue(std::string &str); // pass by reference
void SetValue(const std::string &str); // pass by const reference
This rule only applies to objects. It is not necessary for built-in types, such as int, bool, float, double, or char, because these are small enough to fit in a CPU register. In addition, STL iterators and function objects are designed to be passed by value. However, for all other custom types, you should favor references or const references. Let’s look at a specific example.
MyObject(const MyObject &obj);                        // copy constructor
MyObject &operator=(const MyObject &obj);             // assignment operator
void SetObjectByValue(MyObject obj);                  // pass by value
void SetObjectByConstReference(const MyObject &obj);  // pass by const reference
If you assume that the SetObjectByValue() and SetObjectByConstReference() methods both simply assign their argument to the mObject member variable, then the sequence of operations performed by each call is as follows: for SetObjectByValue(), the MyObject copy constructor runs to create the temporary parameter, the assignment operator then copies it into mObject, and finally the destructor runs on the temporary; for SetObjectByConstReference(), only the assignment operator is invoked.
The situation becomes worse if MyObject is derived from some base class because then the copy constructor and destructor of each base class in the object hierarchy would also have to be called for the pass by value case.
There is another reason to avoid passing arguments by value and that’s the “slicing problem” (Meyers, 2005). This is the problem that if a method accepts a base class argument by value and you pass a derived class, then any extra fields of the derived class will be sliced off. This is because the size of the object to be passed by value is determined, at compile time, to be the size of the base class. Passing arguments as const references instead of passing by value avoids the slicing problem.
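A minimal sketch of the slicing problem, using hypothetical Base and Derived classes:

class Base
{
public:
    int mBaseValue;
};

class Derived : public Base
{
public:
    int mExtraValue;   // sliced off when a Derived is passed to ProcessByValue()
};

void ProcessByValue(Base b);             // copies only the Base portion of a Derived argument
void ProcessByConstRef(const Base &b);   // no copy is made, so no slicing occurs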
The time it takes to compile a large project can depend greatly on the number and depth of #include files. As such, one of the common techniques for decreasing build times is to try to reduce the number of #include statements in header files.
Some APIs provide a single large header file that pulls in all of the classes and global definitions for the interface. This can seem like a convenient affordance for your clients; however, it only serves to increase the compile-time coupling between your clients’ code and your API, meaning that even the most minimal use of your API must pull in every public symbol.
For example, the standard Win32 header windows.h pulls in well over 200,000 lines of code (under Visual Studio 9.0). Every .cpp file that includes this header effectively adds over 4 MB of extra code that needs to be loaded from around 90 separate files and compiled for every source file. Similarly, the Mac OS X header Cocoa/Cocoa.h expands to over 100,000 lines of code at over 3 MB.
Precompiled headers can help alleviate this burden by preprocessing these large common include files to a more optimal form, such as a .pch or .gch file. However, a more modular and loosely coupled solution would involve providing a collection of smaller individual headers for each component of your API. Clients can then choose to #include only the declarations for the subset of your API that they are using. This can make for longer lists of #include statements in your client’s code, but the result is an overall reduction in the amount of your API that their code must pull in.
A header file, A, includes another header file, B, in order to pull in the declaration of a class, function, struct, enum, or other entity that is used in header A. The most common situation in an object-oriented program is that header A wants to pull in the declaration of one or more classes from header B. However, in many situations, header A does not actually need to include header B and can instead simply provide a forward declaration for the classes needed. A forward declaration can be used when
1. The size of the class is not required. If you include the class as a member variable or subclass from it, then the compiler will need to know the size of the class.
2. You do not reference any member methods of the class. Doing so would require knowing the method prototype: its argument and return types.
3. You do not reference any member variables of the class; but then, you already know never to make those public (or protected).
For example, you can use forward declarations if header A only refers to the name of classes from header B via pointers or references.
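A minimal sketch of what header A might look like in that case (the member and method names are hypothetical):

// A.h
class B;   // forward declaration instead of #include "B.h"

class A
{
public:
    void SetB(B *value);
    B *GetB() const;

private:
    B *mB;   // a pointer member does not require B's full declaration
};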
However, if you were to change the definition of class A so that the compiler needs to know the actual size of class B, then you must include the actual declaration of class B; that is, you must #include its header. This would be the case, for example, if you store an actual copy of B inside of A.
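A sketch of that situation, reusing the same hypothetical classes:

// A.h
#include "B.h"   // the full declaration is now required

class A
{
public:
    void SetB(const B &value);

private:
    B mB;   // a member stored by value means the compiler must know sizeof(B)
};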
Obviously, you will need to #include the full header in any .cpp file that uses the classes in that header; for example, A.cpp must include B.h. A forward declaration simply tells the compiler to add the name to its symbol table, with a promise that you will provide the full declaration when it is actually needed.
One interesting point is that a forward declaration is sufficient if you pass a variable to a method by value or return it by value (although you should of course prefer a const reference over passing a parameter by value). You might think that the compiler needs to know the size of the class at this point, but in actual fact it is only needed by the code that implements the method and any client code that calls it. So the following example is quite legal:
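(The sketch below reuses the hypothetical A and B classes from the earlier examples.)

// A.h
class B;   // a forward declaration is still sufficient

class A
{
public:
    B TransformB(B b) const;   // legal: only A.cpp and calling code need B's full declaration
};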
Note that it is generally safer to use forward declarations only for your own code. By using a forward declaration, you are embedding knowledge of how a symbol is declared in the header that you are eliding. For example, if you forward declare a class from a foreign header and your API is used in environments with different versions of that header, the declaration of that class could be changed to a typedef or the class could be turned into a template, either of which would break your forward declaration. This is one reason why you should never try to forward declare STL objects: for example, always #include <string>, never attempt to forward declare std::string. (Note, however, that when using streams you can use the <iosfwd> include file to forward declare the various iostream template classes.)
Finally, it is worth noting that a header file should declare all of its dependencies, either as forward declarations or as explicit #include lines. Otherwise, the inclusion of the header in other files may become order dependent. This is at odds with the statement to minimize the number of #include statements, but it is an important exception for the sake of API robustness and elegance. A good way to test this is to ensure that an empty .cpp file that only includes your public header can compile without error.
Another way to reduce the overhead of parsing too many include files is to add redundant preprocessor guards at the point of inclusion. For example, if you have an include file, bigfile.h, that looks like this
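A sketch of the usual internal include guard structure for such a file:

// bigfile.h
#ifndef BIGFILE_H
#define BIGFILE_H

// ... all of the declarations for this header ...

#endif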
then you might include this file from another header by doing the following:
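(A sketch of the redundant, external guard wrapped around the #include.)

// otherfile.h
#ifndef BIGFILE_H
#include "bigfile.h"
#endif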
This saves the cost of pointlessly opening and parsing the entire include file if you’ve already included it. This may seem like a trivial optimization, and indeed it can be for small include hierarchies. However, if you have a large code base with many include files, this optimization can make a significant difference. Back in 1996, John Lakos performed several experiments to demonstrate the degree of performance improvement that this optimization can effect on a large project. The results were striking (Lakos, 1996). However, given that these results are from the mid-1990s, I designed a similar experiment to test this effect on a modern compiler, and the results correlate well with those of Lakos.
For a given number N, I generated N include files that each included the N − 1 other include files. Each include file also contained around 100 lines of class declarations. I also generated N .cpp files, where each .cpp file included only one header. I then timed how long it took to compile every .cpp file. This experiment therefore chooses a worst-case O(N²) include structure, although it also includes the time to run the compiler N times. The experiment was performed for a set of include files that used redundant include guards and a set that did not. Table 7.1 shows averaged results of this experiment using the GNU C++ compiler, version 4.2.1, on an Intel Core 2 Duo processor running Mac OS X 10.6 with 2 GB of RAM.
Table 7.1
Compilation time speedup from using redundant include guards for a worst-case include hierarchy containing N include files
This behavior will, of course, vary by compiler and platform, so in the interest of good experimental technique, I repeated the experiment with the Microsoft C++ compiler, version 14.0, on an Intel Core 2 Quad CPU running Windows XP with 3.25 GB of RAM. Results in this case were even more pronounced, with a speedup of around 18 times for the N = 128 case.
By comparison, I found that the experiment showed almost no effect under Linux (1.03 × speedup for N = 128), where presumably the combination of the GNU compiler and the Linux disk cache produces a more efficient environment. However, users of your API may be using a broad range of platforms so this optimization could have a large impact on many of them. Even a speedup of only 1.29× could make a big difference to the amount of time that they spend waiting for a build to finish. The code to run the experiment is included in the full source code package on the accompanying Web site for this book so you can try it out for your own platforms.
This technique has been used to practical benefit by many large-scale APIs. To give a rather retro example, the Commodore Amiga platform used it to improve the performance of the AmigaOS APIs: headers such as intuition/screens.h in the early 1990s wrapped each of their nested #include directives in a check of that header’s guard symbol so that already-included files were never reopened.
Often you want to define a number of public constants for your API. This is a great technique for avoiding the proliferation of hardcoded values throughout your client’s code, such as maximum values or default strings. For example, you might declare several constants in the global scope of your header in this way.
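A sketch of such a header (the names and values are hypothetical):

#include <string>

const int MAX_NAME_LENGTH = 128;
const float LOG_2E = 1.442695f;
const std::string LOG_FILENAME = "filename.log";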
The issue to be aware of here is that only very simple constants for built-in types will be inlined by your C++ compiler. By default, any variable that you define in this way will cause your compiler to store space for the variable in every module that includes your header. In the aforementioned case, this will likely happen for both the float and the string constant. If you declare many constants and your API headers are included in many .cpp files, then this can cause bloat in the client’s .o object files and the final binary. The solution is to declare the constants as extern.
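A sketch of the same constants declared extern in the header:

#include <string>

extern const int MAX_NAME_LENGTH;
extern const float LOG_2E;
extern const std::string LOG_FILENAME;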
Then define the value of each constant in the accompanying .cpp file. In this way, the space for the variables is only allocated once. This also has the additional benefit of hiding actual constant values from the header file: they are implementation details after all.
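Continuing the sketch, and assuming the header above is called constants.h, the .cpp file would look like:

#include "constants.h"

const int MAX_NAME_LENGTH = 128;
const float LOG_2E = 1.442695f;
const std::string LOG_FILENAME = "filename.log";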
A better way to do this is to declare the constants within a class. Then you can declare them as static const (so they will not count toward the per-object memory size).
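A sketch of this approach, using a hypothetical MyAPI class:

#include <string>

class MyAPI
{
public:
    static const int MAX_NAME_LENGTH;
    static const float LOG_2E;
    static const std::string LOG_FILENAME;
};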
You can then define the value for these constants in the associated .cpp file.
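Continuing the hypothetical MyAPI sketch, the .cpp file would contain:

const int MyAPI::MAX_NAME_LENGTH = 128;
const float MyAPI::LOG_2E = 1.442695f;
const std::string MyAPI::LOG_FILENAME = "filename.log";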
You may think that this is a trivial optimization and will save an insignificant amount of space. However, at Linden Lab, we decided to clean up all of these instances in the Second Life Viewer source code. The end effect was an appreciable 10% reduction in the size of the generated files.
Another option that avoids the bloat issue in certain cases is to use an enum as an alternative to an integral constant variable; you could also use getter methods to return the constant values, such as
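(A sketch of both alternatives, again using the hypothetical MyAPI class.)

class MyAPI
{
public:
    enum { MAX_NAME_LENGTH = 128 };   // an enum keeps the value in the header without allocating storage
    static float GetLog2e();          // defined in the .cpp file, e.g., { return 1.442695f; }
};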
The problem with hiding the actual values in the .cpp file (as the extern constants, static const class members, and getter methods just described all do) is that the compiler can no longer evaluate the constant expression at compile time. For example, clients would not be able to do either of the following:
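(A sketch of client code that fails to compile when the value lives only in the .cpp file.)

void Example(int value)
{
    char buffer[MAX_NAME_LENGTH];   // error: the array size must be a compile-time constant

    switch (value) {
        case MAX_NAME_LENGTH:       // error: case labels must be compile-time constants
            break;
    }
}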
However, as part of the new C++11 specification, a new feature has been added to allow for more aggressive compile-time optimizations: the constexpr keyword. This can be used to indicate functions or variables that you know to be constant so that the compiler can perform greater optimization. For example, consider the following code:
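(A sketch, using a hypothetical GetTableSize() function.)

int GetTableSize() { return 32; }

double table[GetTableSize()];   // error in C++98: the array bound is not a constant expression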
This is illegal according to the C++98 standard because the compiler has no way of knowing that the value returned by GetTableSize() is a compile-time constant. However, under the new C++ specification you would be able to tell the compiler that this is in fact the case:
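(The same sketch, with the constexpr keyword applied.)

constexpr int GetTableSize() { return 32; }

double table[GetTableSize()];   // legal in C++11: the result is a compile-time constant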
The constexpr keyword can also be applied to variables. However, the fact that it can be used to mark the result of a function as a compile-time constant opens up the door to letting us define constants using a function call while still allowing clients to use the constant value at compile time. For example,
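(The sketch below uses hypothetical class and method names.)

class MyAPI
{
public:
    static constexpr int GetMaxNameLength() { return 128; }
};

char name[MyAPI::GetMaxNameLength()];   // legal: evaluated at compile time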
C++ provides constructor initialization lists to let you easily initialize all of the member variables in your class. Using this feature can afford a slight performance increase over simply initializing each member variable in the body of the constructor. For example, instead of writing
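(The sketch below uses a hypothetical Avatar class.)

#include <string>

class Avatar
{
public:
    Avatar(const std::string &name)
    {
        mName = name;   // mName was already default constructed; this is an assignment
    }

private:
    std::string mName;
};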
you could write
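(The same hypothetical class, now using an initialization list.)

#include <string>

class Avatar
{
public:
    Avatar(const std::string &name) :
        mName(name)   // mName is copy constructed directly from the argument
    {
    }

private:
    std::string mName;
};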
Because member variables are constructed before the body of the constructor is called, in the first example the default std::string constructor will be called to initialize the mName member variable, then inside the constructor the assignment operator is called (DeLoura, 2001). However, in the second example, only the copy constructor is invoked. Using an initialization list therefore avoids the cost of calling the default constructor for each member variable that you include in the list.
Of course, in terms of good API design, you should hide as much implementation detail from your header files as possible. So the best approach is to declare the constructor in your header file as follows:
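(A sketch of the header, continuing the hypothetical Avatar example.)

// avatar.h
#include <string>

class Avatar
{
public:
    explicit Avatar(const std::string &name);

private:
    std::string mName;
};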
Then provide the constructor and initialization list in the associated .cpp file:
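(And the corresponding .cpp file.)

// avatar.cpp
#include "avatar.h"

Avatar::Avatar(const std::string &name) :
    mName(name)
{
}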
Here are a few things to be aware of when using initialization lists:
1. The order of entries in the initialization list should match the order in which the member variables are declared in the class. This is because members are always initialized in their declaration order, regardless of the order in the list, and some compilers will warn you when the two differ.
2. You cannot specify arrays in an initialization list. However, you can specify a std::vector, which may be a better choice of data structure anyway.
3. If you are declaring a derived class, the default constructor for any base classes will be called implicitly. You can use the initialization list to call a non-default constructor instead. If specified, a call to a base class constructor must appear before any member variables.
4. If you have declared any of your member variables as references or as const, then you must initialize them via the initialization list (to avoid the default constructor defining their initial, and only, value).
Furthermore, the new C++11 specification includes improvements in object construction that are relevant here. In C++98, constructors cannot call other constructors. However, this constraint has been loosened in the C++11 standard. Specifically, C++11 will let constructors call other constructors in the same class. This lets you avoid copying code between constructors by delegating the implementation of one constructor to another, such as in the following example:
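(A sketch using a hypothetical TextBox class whose default constructor delegates to its four-argument constructor.)

class TextBox
{
public:
    TextBox();
    TextBox(int x, int y, int width, int height);

private:
    int mX, mY, mWidth, mHeight;
};

// In the associated .cpp file:
TextBox::TextBox() :
    TextBox(0, 0, 100, 20)   // delegate to the full constructor (C++11)
{
}

TextBox::TextBox(int x, int y, int width, int height) :
    mX(x), mY(y), mWidth(width), mHeight(height)
{
}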
While the same effect could be achieved by using a single constructor with a default argument, it would bake the default value into your client’s code. The new C++11 syntax lets you hide this value because you can (and should) define the initialization list in the .cpp file.
On modern CPUs, memory latency can be one of the largest performance concerns for a large application. That’s because while processor speeds have been improving at a rate of roughly 55% per year, access times for DRAM have been improving at around 7% per year (Hennessy and Patterson, 2006). This has resulted in the so-called Processor–Memory Performance Gap, as shown in Figure 7.1.
Figure 7.1 The widening gap in CPU performance improvements versus memory improvements. Note that the vertical axis is on a logarithmic scale. Adapted from Hennessy and Patterson (2006). Copyright © Morgan Kaufmann Publishers.
Because of this trend, memory overhead is now a principal factor in the execution time of most programs. This is further exacerbated by the fact that the cost for a cache miss, that is, the cost to access main memory, has increased from a few CPU cycles 30 years ago to over 400 cycles on modern architectures. The effect of this is that a seemingly elegant and demonstrably correct algorithm can behave poorly in real-world situations due to unanticipated cache behavior (Albrecht, 2009). As a result, cache-miss optimization has become an extremely important element of performance optimization activities in recent years.
While it is not the focus of this book to provide techniques to optimize your implementation details in the presence of caches, there are some API-related efforts that you can undertake to improve data cache efficiency. One key technique is to reduce the size of your objects: the smaller your objects are, the more of them can potentially fit into a cache. There are several ways that you can reduce the size of your objects.
1. Cluster member variables by type. Modern computers access memory a single “word” at a time. Your C++ compiler will therefore align certain data members so that their memory addresses fall on word boundaries. A number of unused padding bytes may be added to a structure in order to make this happen. By clustering all member variables of the same type next to each other, you can minimize the amount of memory lost to these padding bytes. Table 7.2 provides example alignment figures for member variables on the Windows platform.
Table 7.2
Typical alignment of member variables of different types under Windows on x86 CPUs (these sizes may vary by platform and processor)
Type | Size (bytes) | Alignment (bytes)
---- | ------------ | -----------------
bool | 1 | 1
char | 1 | 1
short int | 2 | 2
int | 4 | 4
float | 4 | 4
double | 8 | 8
pointer/reference (32 bit) | 4 | 4
pointer/reference (64 bit) | 8 | 8
2. Use bit fields. A bit field is a decorator for a member variable that specifies how many bits the variable should occupy, for example, int tinyInt:4. This is particularly useful for packing several bools into a single byte or for squeezing two or more numbers into the space of a single int. The downside is that there is normally a performance penalty for using bit field sizes that are not a multiple of 8, but if memory is your biggest concern then this may be an acceptable cost. When implementing performance optimizations, you often have to trade speed for size, or vice versa. Remember, when in doubt about the impact of a feature, measure the real-world performance.
3. Use unions. A union is a structure where data members share the same memory space. This can be used to allow multiple values that are never used at the same time to share the same area of memory, thus saving space. The size of a union is the size of its largest member. For an example, see the sketch that follows this list.
4. Don’t add virtual methods until you need them. I recommended this as a way to keep an API minimally complete back in Chapter 2, but there are also performance reasons to do this. Once you add one virtual method to a class, that class needs to have a vtable. Only one copy of the vtable needs to be allocated per class type, but a pointer to the vtable is stored in every instance of your object. This adds the size of one pointer to your overall object size (normally 4 bytes for a 32-bit application or 8 bytes for a 64-bit application).
5. Use explicit size-based types. The size of various types can differ by platform, compiler, and whether you are building a 32-bit or a 64-bit application. If you want to specify the exact size of a member variable, then you should use a type that specifically enforces this rather than assuming that types such as bool, short, or int will be a specific size. Unfortunately, the way to declare a fixed-size variable varies for different platforms. For example, on UNIX-based systems, the stdint.h header file provides types such as int8_t, uint32_t, and int64_t to specify an 8-bit integer, 32-bit unsigned integer, and a 64-bit integer, respectively. However, the Boost library provides platform-independent versions of these types in the boost/cstdint.hpp header.
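Here is the union sketch referred to in point 3 (the names are hypothetical): all of its members share the same storage, so sizeof(Value) equals the size of its largest member.

union Value
{
    int mIntValue;      // 4 bytes
    float mFloatValue;  // 4 bytes
    char mLabel[8];     // 8 bytes: this member determines the size of the union
};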
Let’s look at an example. The following structure defines a collection of variables to describe a fireworks effect. It contains information about the color and color variance of the firework particles, some flags such as whether the effect is currently active, and a screen location for the effect to begin from. A real fireworks class would have a lot more state, but this is sufficient for illustration purposes.
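One possible reconstruction of such a structure is sketched below. The member names are hypothetical, and the members are laid out in rough logical order; the layout is chosen so that the sizes quoted in the following paragraphs hold on a typical 32-bit platform with the alignments from Table 7.2.

class Fireworks_A
{
public:
    bool mIsActive;
    int mOriginX;
    int mOriginY;
    unsigned char mRed;
    int mRedVariance;
    unsigned char mGreen;
    int mGreenVariance;
    unsigned char mBlue;
    int mBlueVariance;
    bool mVaryColor;
    int mTotalParticles;
    bool mRepeatCycle;
};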
The variables in this class are ordered roughly in terms of their logical function, without any consideration to how efficiently they are packed in terms of memory. Most member variables are ordered in this way, or sometimes even more randomly by simply adding new variables to the end of the class. For this particular example, the total size for the structure on a 32-bit computer is 48 bytes, that is, sizeof(Fireworks_A) == 48.
If you simply cluster the member variables based on their type and sort them based on the size of each type (bools, chars, then ints), then the size of the structure can be reduced to 32 bytes, a 33% reduction.
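A sketch of the type-clustered version, using the same hypothetical members:

class Fireworks_B
{
public:
    bool mIsActive;
    bool mVaryColor;
    bool mRepeatCycle;
    unsigned char mRed;
    unsigned char mGreen;
    unsigned char mBlue;
    int mRedVariance;
    int mGreenVariance;
    int mBlueVariance;
    int mOriginX;
    int mOriginY;
    int mTotalParticles;
};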
You can still squeeze a few more bytes out of the structure, however. By using bit fields you can make each of the bool flags occupy a single bit instead of an entire byte. Doing this lets you get the structure size down to 28 bytes, a 42% reduction.
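A sketch of the bit field version:

class Fireworks_C
{
public:
    bool mIsActive : 1;
    bool mVaryColor : 1;
    bool mRepeatCycle : 1;
    unsigned char mRed;
    unsigned char mGreen;
    unsigned char mBlue;
    int mRedVariance;
    int mGreenVariance;
    int mBlueVariance;
    int mOriginX;
    int mOriginY;
    int mTotalParticles;
};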
Finally, you can step back and consider the actual size requirements for each variable rather than simply using ints for all integer values. Doing so, you might decide that the RGB variance values only need to be 1 byte, and the screen space coordinates can be 2 bytes each. I’ll use the types of char and short, respectively, to simplify what follows, but in reality you could use size-specific types such as int8_t and uint16_t.
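A sketch of this final version with size-appropriate types:

class Fireworks_D
{
public:
    bool mIsActive : 1;
    bool mVaryColor : 1;
    bool mRepeatCycle : 1;
    unsigned char mRed;
    unsigned char mGreen;
    unsigned char mBlue;
    unsigned char mRedVariance;
    unsigned char mGreenVariance;
    unsigned char mBlueVariance;
    short mOriginX;
    short mOriginY;
    int mTotalParticles;
};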
This version of our structure occupies only 16 bytes: a reduction of 66% from the original unoptimized size of 48 bytes. That’s quite a massive memory savings, just by rearranging the member variables and thinking a little more about how large they need to be. Figure 7.2 shows you exactly where all your memory is going in each of these four configurations.
Figure 7.2 Memory layout for four different configurations of member variables in a class. Fireworks_A is the original unoptimized version, Fireworks_B employs type clustering, Fireworks_C uses bit fields to compress bool variables, and Fireworks_D uses smaller integer types.
This may seem like strange advice to give in a chapter on performance: don’t inline code! However, this is first and foremost a book about APIs, and inlining code in your header files breaks one of the cardinal rules of API design: don’t expose implementation details. That is not to say that you should never inline code. Sometimes your performance requirements demand it, but you should do so with your eyes wide open and with a full understanding of the implications, such as the following.
1. Exposing implementation details. As just covered, the primary reason for avoiding inlining in public API headers is that it causes you to expose the implementation of your API methods directly in the header. I spent an entire section in Chapter 2 detailing why you should not do that.
2. Code embedded in client applications. Inlined code in your API headers is compiled directly into your clients’ applications. This means that clients must recompile their code whenever you release a new version of the API with any changes to inlined code. They cannot simply drop a new version of your shared library into their installation and expect their application to just work. In other words, inlining breaks binary compatibility.
3. Code bloat. Excessive inlining can grow the size of your object files and resulting binary significantly. This is, of course, because each call to an inlined method is replaced by all of the operations of that method. This larger code size can negatively impact performance by causing more disk access and virtual-memory page faults.
4. Debugging complications. Many debuggers have problems dealing with inlined code. This is perfectly understandable: it’s difficult to put a breakpoint in a function that doesn’t actually exist! The common way to circumvent these problems is to turn off inlining for debug code.
As Donald Knuth famously stated: “Premature optimization is the root of all evil” (Knuth, 1974).
Despite these downsides, there still may be cases where you need to put inlined code into your API’s public headers. The two main reasons to do this are as follows.
1. Performance. Using getter/setter methods to wrap access to member variables can cause a performance impact in your code if those methods are called many times a second. Inlining can recover those performance losses while still allowing you to keep the getter/setter methods (the accompanying source code for this book has a simple program to let you test this for yourself). However, it should be pointed out that marking a function as inline may not necessarily give you performance gains. For one, this is simply a hint to the compiler that can be ignored. Some situations where the request is likely to be ignored are: using loops in the function, calling another inline function, or recursion. Even when the compiler does inline the method, the resulting code could be larger or smaller, and it could be faster or slower, depending on the original size of the method, your CPU’s instruction cache, and your virtual memory system (Cline et al., 1998). Inlining tends to work best for small, simple, frequently called functions.
2. Templates. You may also be using templates in your header and so are forced to inline the template implementation, although, as covered in the previous chapter on C++ usage, you can sometimes use explicit template instantiation to avoid this.
For those cases where you need to use inlined code, I will discuss the best way to do it. One way to inline code is simply to include the implementation of a method in the class body, such as
double GetX() const { return mX; }
double GetY() const { return mY; }
double GetZ() const { return mZ; }
void SetX(double x) { mX = x; }
void SetY(double y) { mY = y; }
void SetZ(double z) { mZ = z; }
This perfectly demonstrates the concern of including implementation details in your header files: a user of your API can look at your header and see exactly how an inlined method is implemented. In this example, the implementation is simple, but it could easily expose much more complexity.
Another way to inline code is by using the C++ inline keyword. This approach offers at least one improvement over the syntax I’ve just shown in that it lets you define the code for a method outside of the class body. Although the code is still in the header, at least you don’t obfuscate the class declaration with code.
inline void Vector::SetX(double x) { mX = x; }
inline void Vector::SetY(double y) { mY = y; }
inline void Vector::SetZ(double z) { mZ = z; }
inline double Vector::GetX() const { return mX; }
inline double Vector::GetY() const { return mY; }
inline double Vector::GetZ() const { return mZ; }
An even better style would be to hide the inline statements in a separate header, where the filename of that header indicates that it contains implementation details. This is the same technique suggested earlier for dealing with templates and is used by several industry-strength APIs, such as Boost headers. Boost uses the convention of a “detail” subdirectory to hold all private details that have to be exposed in header files and then #include those from the public header files. For example,
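(A sketch of this style, using hypothetical file names for the Vector class from the earlier examples.)

// vector.h: the public header exposes only the class declaration
class Vector
{
public:
    void SetX(double x);
    double GetX() const;

private:
    double mX;
};

#include "detail/vector_detail.h"   // pull in the inline definitions

// detail/vector_detail.h: implementation details kept out of the main header
inline void Vector::SetX(double x) { mX = x; }
inline double Vector::GetX() const { return mX; }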
Boost headers also often use the convention of using a “detail” subnamespace to contain all private implementation code, for example, boost::tuples::detail. This is a good practice to further segment necessary private code in public headers.
One of the best ways to save memory is to not allocate any until you really need to. This is the essential goal of copy-on-write techniques. These work by allowing all clients to share a single resource until one of them needs to modify the resource. Only at that point is a copy made. Hence the name: copy on write. The advantage is that if the resource is never modified then it can be shared for all clients. This is related to the Flyweight design pattern, which describes objects that share as much memory as possible to minimize memory consumption (Gamma et al., 1994).
For example, several string objects that store the same text could all share the same memory buffer. Then when one of the strings must modify the text, it creates a copy of the memory buffer so that the modification does not affect the other strings. Figure 7.3 illustrates this concept. Historically, many STL string implementations used copy on write so that passing them by value was fairly inexpensive (Stroustrup, 2000), although the C++11 standard effectively disallows copy-on-write implementations of std::string.
Figure 7.3 Illustrating copy on write: both Object1 and Object2 can share the same resource until Object2 wants to change it, at which point a copy must be made so that the changes do not affect Object1’s state.
There are several ways to implement copy on write. One popular way is to declare a class template that lets you create pointers to objects managed with copy-on-write semantics in the same way that you would create a shared or weak pointer template. The class normally contains a standard shared pointer, to track the reference count of the underlying object, and provides a private Detach() method for operations that modify the object and hence require detaching from the shared object and creating a new copy. The following implementation uses C++11 shared pointers. This is shown with inline methods within the class declaration for reasons of clarity. In real-world applications, you should of course hide these inline definitions in a separate header as they obscure the interface declarations. I have hidden the implementation code in this way for the version of the class in the source code package that accompanies this book.
#include <memory>

template <class T> class CowPtr
{
public:
    typedef std::shared_ptr<T> RefPtr;

    inline CowPtr() {}
    inline explicit CowPtr(T *other) : mPtr(other) {}
    inline CowPtr(const CowPtr<T> &other) : mPtr(other.mPtr) {}

    // const accessors never detach: they simply read the shared object
    inline const T &operator*() const { return *mPtr; }
    inline const T *operator->() const { return mPtr.get(); }
    inline operator const T *() const { return mPtr.get(); }
    inline const T *constData() const { return mPtr.get(); }

    inline bool operator==(const CowPtr<T> &other) const { return mPtr.get() == other.mPtr.get(); }
    inline bool operator!=(const CowPtr<T> &other) const { return mPtr.get() != other.mPtr.get(); }

    // non-const accessors detach first, so the caller receives a private copy
    inline T &operator*() { Detach(); return *mPtr; }
    inline T *operator->() { Detach(); return mPtr.get(); }

    inline CowPtr<T> &operator=(const CowPtr<T> &other) { mPtr = other.mPtr; return *this; }
    inline CowPtr<T> &operator=(T *other) { mPtr = RefPtr(other); return *this; }

private:
    inline void Detach()
    {
        T *temp = mPtr.get();
        if (temp && !mPtr.unique()) {
            mPtr = RefPtr(new T(*temp));   // copy on write: clone the object before modifying it
        }
    }

    RefPtr mPtr;
};

This class can then be used as follows:
CowPtr<std::string> string1(new std::string("Share Me"));
CowPtr<std::string> string2(string1);   // string2 shares the same data as string1
CowPtr<std::string> string3(string1);   // string3 initially shares the data too
string3->append("!");                   // operator-> detaches, so string3 now owns its own copy
In this example, string2 points to the same object as string1, whereas string3 points to a copy of the object because it needed to modify it. As already mentioned, many implementations of std::string will use copy-on-write semantics anyway. I’m simply using this as a convenient example.
There is a loophole that can be exploited in the CowPtr implementation I’ve presented here. It’s possible for users to dig into the copy-on-write pointer and access the underlying object in order to hold onto references to its data. They could then modify data directly, thus affecting all CowPtr variables sharing that object. For example,
CowPtr<std::string> string1(new std::string("Share Me"));
char &char_ref = string1->operator[](1);
CowPtr<std::string> string2(string1);
char_ref = 'p';
In this code, the user takes a reference to a character in the underlying std::string of string1. After string2 is created, which shares the same memory as string1, the user then changes the second character in the shared string directly, causing both string1 and string2 to now equal “Spare Me.”
The best way to avoid this sort of misuse is simply to not expose CowPtr to your clients. In most cases, you don’t need your clients to be aware of the fact that you are using a copy-on-write optimization: it’s an implementation detail after all. Instead, you could use CowPtr to declare member variables in your objects and not change your public API in any way. This is called implicit sharing by the Qt library. For example,
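(A sketch of what MyObject’s header might look like, assuming the CowPtr template above is available from its own header.)

#include <string>

class MyObject
{
public:
    MyObject();
    std::string GetValue() const;
    void SetValue(const std::string &value);

private:
    CowPtr<std::string> mData;   // clients never see this implementation detail
};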
where the implementation of MyObject may look like
MyObject::MyObject() : mData(0) {}
std::string MyObject::GetValue() const { return mData ? *mData : ""; }
void MyObject::SetValue(const std::string &value) { mData = new std::string(value); }
In this way, your clients can use your MyObject API without any knowledge that it uses copy on write, but underneath the covers the object is sharing memory whenever possible and enabling more efficient copy and assignment operations.
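For instance, a sketch of the kind of client code being described:

MyObject obj1;
obj1.SetValue("Hello");
MyObject obj2 = obj1;       // obj2 shares the same underlying string as obj1
MyObject obj3 = obj1;       // obj3 initially shares the string too
obj3.SetValue("Goodbye");   // obj3 now holds its own data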
In this example, obj1 and obj2 will share the same underlying string object, whereas obj3 will contain its own copy because it modified the string.
Iterating over a collection of objects is an extremely common task for client code so it’s worth spending some time looking at alternative strategies that offer different strengths and weaknesses. That way you can choose the best solution for your particular API requirements.
The STL approach to this problem is to use iterators. These are objects that can traverse over some or all elements in a container class (Josuttis, 1999). An iterator points to a single element in a container, with various operators available, such as operator* to return the current element, operator-> to access the members of the container element directly, and operator++ to step forward to the next element. This design intentionally mimics the interface of plain pointer manipulation in C/C++.
Clients can then use the begin() and end() methods on each container class to return iterators that bound all elements in the container or they can use various STL algorithms that return iterators within the set of all elements, such as std::find(), std::lower_bound(), and std::upper_bound(). The following code segment provides a simple example of using an STL iterator to sum all the values in a std::vector:
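(A sketch of such a loop.)

#include <vector>

double Sum(const std::vector<double> &values)
{
    double sum = 0.0;
    std::vector<double>::const_iterator it;
    for (it = values.begin(); it != values.end(); ++it) {
        sum += *it;
    }
    return sum;
}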
This is purely an illustrative example. If you really wanted to calculate the sum of all elements in a container, you should prefer the use of the STL algorithm std::accumulate.
In terms of your own API designs, here are some reasons why you may want to adopt an iterator model to allow your clients to iterate over data.
• Iterators are a well-known pattern that most engineers are already familiar with. As such, using an iterator model in your own APIs will minimize the learning curve for users. This addresses the ease-of-use quality introduced in Chapter 2. For example, most engineers will already be aware of any performance issues, such as knowing that they should prefer the preincrement operator for iterators (++it) as opposed to postincrement (it++), to avoid the construction and destruction of temporary variables.
• The iterator abstraction can be applied to simple sequential data structures, such as arrays or lists, as well as more complicated data structures, such as sets and maps, which are often implemented as self-balancing binary search trees such as red-black trees (Josuttis, 1999).
• Iterators can be implemented quite efficiently, even as simply as a pointer in some cases. In fact, std::vector iterators were actually implemented this way in Visual C++ 6 and GNU C++ 3 (although most modern STL implementations now use dedicated iterator classes).
• Iterators can be used to traverse massive data sets that may not even fit entirely into memory. For example, the iterator could be implemented to page in blocks of data from disk as needed and free previously processed blocks. Of course, the client can also stop traversing at any point without having to visit every element in the container.
• Clients can create multiple iterators to traverse the same data and use these iterators simultaneously. In the case where clients wish to insert or delete elements while traversing the container, there are established patterns for doing this while maintaining the integrity of the iterators.
An iterator allows clients to traverse linearly through each element in a container. However, you may have cases where you wish to support random access to any element, such as accessing a specific element in an array or vector container. STL container classes that support random access provide this in a couple of ways.
1. The [] operator. This is meant to simulate the array indexing syntax of C/C++. Normally this operator is implemented without any bounds checking so that it can be made very efficient.
2. The at() method. This method is required to check if the supplied index is out of range and throw an exception in this case. As a result, this approach can be slower than the [] operator.
To illustrate these concepts, the iterator source example in the previous section can be recast in terms of the [] operator as follows:
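(The same summation, sketched in terms of the [] operator.)

#include <cstddef>
#include <vector>

double Sum(const std::vector<double> &values)
{
    double sum = 0.0;
    for (std::size_t i = 0; i < values.size(); ++i) {
        sum += values[i];
    }
    return sum;
}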
In terms of performance, these two methods are essentially equivalent. Obviously one approach may prove to be marginally more efficient than the other for a given platform and compiler, but in general they should involve an equivalent degree of overhead.
If you plan to add random access functionality to your API, you should strive to adopt the [] operator and at() method to capitalize on consistency with the STL. However, if your API does not need to provide random access to underlying data, you should prefer using the iterator model over the [] operator approach, simply because an iterator expresses the user’s intent more clearly and results in client code that is more obvious and consistent.
As an alternative to iterators, some APIs use an approach where the user passes in an array data structure by reference. The API then fills the array with the requested elements and returns it to the user. The Maya API uses this pattern extensively. Autodesk Maya is a high-end 3D modeling and animation system used extensively in the film and game industry. The package includes a C++ and Python API that provides programmatic access to the underlying 2D and 3D data in a scene.
As an example of this pattern, the MFnDagNode::getAllPaths() method is used to return a sequence of node paths in the Maya scene graph. This is achieved by passing in an MDagPathArray object by reference, which is then populated with MDagPath references. Some reasons for this design, and hence reasons why you may prefer this approach for your own APIs, are as follows.
• The primary purpose of this method is performance. In essence, it is a way to collapse a series of connected nodes of a graph data structure into a sequential array data structure. This provides a data structure that can be very efficiently iterated over, but also locates elements adjacent to each other in memory. The result is a data structure that can take better advantage of CPU caching strategies, as opposed to a tree structure where individual nodes in the tree may be fragmented across the process’s address space.
• This technique is particularly efficient if the client keeps the same array around to service multiple calls to getAllPaths(). Also, any initial performance overhead to fill the array can be compensated if the array is kept around to support multiple iterations over its elements.
• This technique also offers a specific feature that the iterator model does not: support for noncontiguous elements. A traditional iterator cannot present elements in a different order or omit certain elements from the sequence, whereas with the array reference technique you can fill the array with any subset of elements in any order.
This concept can be seen in other languages too, such as the iterator_to_array() function in PHP, which can be used to convert an iterator into an array for faster traversal in certain cases.
As an alternative to consuming a user-supplied array, you could also return a const container of objects and rely on the compiler’s return value optimization to avoid copying data.
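A sketch of that approach, using a hypothetical GetChildNames() function:

#include <string>
#include <vector>

const std::vector<std::string> GetChildNames()
{
    std::vector<std::string> names;
    names.push_back("Child1");
    names.push_back("Child2");
    return names;   // return value optimization typically elides the copy
}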
As the final section in this chapter on performance, I will take a look at some tools and techniques to help you measure the performance of your system. Most of these are aimed at analyzing the performance of your implementation code and as such are not directly related to how you design your API. However, this is obviously still a very important part of producing an efficient API and therefore worthy of focus.
I will consider several different aspects of performance: time-based performance, memory overhead, and multithreading contention. Also, it should be pointed out that while all of the preceding sections in this chapter dealt with stable features of C++, the following text presents software products that may change over time. Products come, go, change ownership, and change focus. However, I have endeavored to make this list (and the related URLs) as up to date as possible at the time of publication. For a more current list of products, please refer to the accompanying Web site, http://APIBook.com/.
The most traditional interpretation of performance is how long it takes your code to perform various operations. For example, if you’re developing an image processing library, how long does your Sharpen() or RedEyeReduction() method take to run on a given image? The implication here is that you must write some sample or test programs that use your API so that you can then time your API’s performance under different real-world scenarios. Assuming that you have written such programs, there are several forms of performance analysis you could consider using:
1. In-house instrumentation. The most targeted and efficient profiling you can perform is the kind that you write yourself. Because every piece of software is different, the performance-critical sections of your code will be specific to your API. It is therefore extremely beneficial to have access to a fast timer class that can be inserted into your code at key points to gather accurate timing information. Results can be output to a file and analyzed off line or your clients could integrate a visual display of the timer results into their end-user applications.
The Second Life Viewer provides this capability via its LLFastTimer class. This works by inserting LLFastTimer() calls into critical sections of the code using an extensible label that identifies the area being analyzed, for example, LLFastTimer(RENDER_WATER). The Second Life Viewer itself then provides a debugging overlay display to view the cumulative result of the timers in real time. See Figure 7.4 for an example of this debugging view.
Figure 7.4 Screenshot of the Second Life Viewer showing its built-in view to display the results of various timers embedded in the code.
2. Binary instrumentation. This technique involves instrumenting a program or shared library by adding code that records details for each function call. Running the instrumented binary then creates an exact trace of the function calls for that particular session. Processing resulting data can then determine the top call stacks where the program spent most of its time.
One drawback of this approach is that the extra instrumentation overhead can slow down the program execution significantly, sometimes by as much as 10–100 times, although relative performance should still be preserved. Finally, this technique will obviously not be able to time functions that do not appear as symbols in the binary file, such as inline functions.
3. Sampling. This involves use of a separate program that samples your test application continually to determine its program counter. This is a low-overhead statistical technique, meaning that it may not log every single function call made by your application, but with a sufficiently high sample rate it can still be useful in telling you where your application is spending most of its time.
Sampling can be performed at a system level (e.g., to see if your application spends a lot of time in system calls, such as when it is I/O bound) or can be isolated to the functions in your application. In addition to recording time samples, this technique can also refer to sampling processor events, such as cache misses, mispredicted branches, and CPU stalls.
4. Counter monitoring. Many commercially available operating systems provide performance counters that report how well various subsystems are performing, such as the processor, memory, network, disk, and so on. For example, Microsoft provides the Performance Counters API to access counter data on the Windows platform. By monitoring these counters while your application is running you can determine system bottlenecks and also evaluate how inadequate system resources can affect your API’s performance.
Given this categorization of performance analyzing techniques, the following list provides a cross section of profiling tools that were on the market at the time of this book’s publication.
• Intel VTune (http://software.intel.com/en-us/intel-vtune/): This commercial performance analysis suite is available for both Microsoft Windows and Linux platforms. It includes a binary instrumentation feature (called call graph), time- and event-based sampling, and a counter monitor, among various other tools. It comes with extensive and powerful graphical tools to visualize resulting performance data.
• gprof (http://www.gnu.org/software/binutils/): gprof is the GNU profiler. It uses binary instrumentation to record the number of calls and time spent within each function. It is integrated with the GNU C++ compiler and is activated via the -pg command line option. Running an instrumented binary creates a data file in the current directory that can be analyzed with the gprof program (or the Saturn application on Mac OS X).
• OProfile (http://oprofile.sourceforge.net/): This is an open source performance tool for Linux. It is a system-wide sampling profiler that can also leverage hardware performance counters. Profile data can be produced at the function or instruction level, and the software includes support for annotating source trees with profile information.
• AMD CodeAnalyst (http://developer.amd.com/cpu/codeanalyst): This freely available profiler from AMD runs on Windows and Linux. It is based on OProfile with specific support for analyzing and visualizing the pipeline stages of AMD processors.
• Open SpeedShop (http://www.openspeedshop.org/): This is an open source performance measurement tool for Linux based on SGI’s IRIX SpeedShop and currently supported by the Krell Institute. Open SpeedShop uses a sampling technique with support for hardware performance counters. It provides support for parallel and multithreaded programs and also includes a Python scripting API.
• Sysprof (http://www.daimi.au.dk/~sandmann/sysprof/): This open source performance profiler for Linux uses a system-wide sampling technique to profile the entire Linux system while your application is running. A simple user interface is provided to browse resulting data.
• CodeProphet Profiler (http://www.codeprophet.co.cc/): This freely available tool uses binary instrumentation to collect timing information as your application runs. It supports 32-bit and 64-bit Windows platforms, as well as Windows Mobile installations. A CodeProphet View program is provided to visualize the resulting .cpg file.
• Callgrind (http://valgrind.org/): This is part of the valgrind instrumentation framework for Linux and Mac OS X. It uses a binary instrumentation technique to collect call graph and instruction data for a given program run. The separate KCachegrind tool can be used to visualize profile data. An optional cache simulator can be used to profile memory access behavior.
• Apple Shark (http://developer.apple.com/tools/sharkoptimize.html): Shark is a system-wide sampling profiler written by Apple and provided for free as part of their developer tools. It can also profile hardware and software events, such as cache misses and virtual memory activity. Shark includes an intuitive and easy-to-use interface to browse the hot spots of your Apple applications.
• DTrace (http://en.wikipedia.org/wiki/DTrace): This unique and powerful tracing framework can be used to monitor applications in real time. This is done by writing custom tracing programs that can define a list of probes and actions to be performed when a probe fires. Probes include opening a file, starting a process, or executing a specific line of code; actions can analyze the run-time context such as the call stack. Apple added DTrace to Mac OS X 10.5 with an accompanying GUI called Instruments. DTrace originated on Solaris and is also available on FreeBSD.
As already stated in this chapter, memory performance can be just as important as time-based performance. Algorithms that allocate and deallocate memory frequently or whose memory allocation profiles do not map well to modern processor caches can end up performing much slower than expected. Also, memory bugs, such as double freeing or accessing unallocated memory, can corrupt data or cause crashes, and memory leaks can build up over time to the point that they consume all available memory and reduce the performance of your clients’ applications to a crawl or cause them to crash.
The following tools can be used to profile your API’s memory performance and to detect memory bugs.
• IBM Rational Purify (http://www.ibm.com/software/awdtools/purify/): This commercial memory debugger uses binary instrumentation to detect memory access errors in C/C++ programs. After a program run, Purify outputs a report file that can be browsed via a graphical interface. It also includes an API that you can access within your programs. Purify is available for Solaris, Linux, AIX, and Windows.
• Valgrind (http://valgrind.org/): Valgrind is an open source instrumentation framework for Linux and Mac OS X that began life as a memory profiling tool. However, it has since matured into a more general performance analysis tool. It works by performing binary instrumentation of your executable file and outputs a textual report when your program ends. Several front-end GUIs are available to browse the output file, such as Valkyrie and Alleyoop.
• TotalView MemoryScape (http://www.totalviewtech.com/): This commercial memory analysis tool available for UNIX and Mac OS X platforms works without binary instrumentation. It provides a real-time graphical view of your heap memory, including memory usage, allocation bounds violations, and leaks. It handles parallel and multithreaded programs and also incorporates a scripting language to perform batch testing.
• Parasoft Insure++ (http://www.parasoft.com/): This is a commercial memory debugger available for Windows, Linux, Solaris, AIX, and HP-UX. Insure++ performs instrumentation at the source code level by prepending your compile line with the insure program. You can even set your debugger to stop whenever it detects an error by adding a breakpoint in __Insure_trap_error(). Of course, there is a GUI tool to let you browse the detected memory errors.
• Coverity (http://www.coverity.com/): Coverity is a different kind of tool from the others listed. It’s a static analysis tool, which means that it checks your source code without actually executing your programs. It records all potential coding errors in a database using a unique ID for each error that is stable across multiple analysis runs. A Web interface is provided to view the results of the static analysis.
• MALLOC_CHECK_: The GNU C library (glibc) supports an alternative memory allocator that is more robust to simple memory errors, such as double frees and single-byte buffer overruns. The trade-off is that this memory allocator is less efficient, so you may not want to use it for production releases, although it can be useful for debugging memory problems. You can turn on this special allocator by setting the MALLOC_CHECK_ environment variable.
The final aspect of performance covered here is multithreaded performance. Writing efficient multithreaded code is a very difficult task, but luckily there are various tools out there to help you find logical threading errors in your code, such as race conditions or deadlocks, as well as profile the performance of your threaded code to find concurrency bottlenecks.
• Intel Thread Checker (http://software.intel.com/en-us/intel-thread-checker/): This is a commercial threading analysis tool available for 32-bit and 64-bit Windows and Linux. It can be used to discover logical threading errors, such as potential deadlocks. You can use it as a command-line tool that outputs a textual report or you can use the accompanying visual GUI that maps potential errors to source code lines.
• Intel Thread Profiler (http://software.intel.com/en-us/intel-vtune/): Thread Profiler lets you visualize your threaded application’s behavior by displaying a timeline that shows what your threads are doing and how they interact. This lets you determine if you’re getting the maximum concurrency from your code. It runs on Windows and Linux. Intel now packages Thread Profiler along with their VTune product.
• Intel Parallel Studio (http://software.intel.com/en-us/intel-parallel-studio-home/): Intel’s Parallel Studio provides a suite of tools to support parallel applications on multicore systems, including a utility to identify candidate functions for parallelizing, the Intel Threading Building Blocks (TBB) library, an inspector tool to detect threading and memory errors, and a performance analysis tool for parallel applications.
• RogueWave ThreadSpotter (http://www.acumem.com/): This tool lets you find performance problems in multithreaded and OpenMP applications on Solaris, Linux, and Windows. It contains all of the functionality of Acumem’s single-threaded profiler SlowSpotter (including profilers for memory bandwidth/latency and data locality), with the addition of a thread communication and interaction module.
• Helgrind and DRD (http://valgrind.org/): Helgrind and DRD are both modules of the open source Valgrind instrumentation framework. They can be used to detect synchronization errors in pthreads-based applications, including misuses of the pthreads API, deadlocks, and race conditions. They can be used on Linux and Mac OS X.