C++ Templates The Complete Guide

Chapter 14 Instantiation

Template instantiation is the process that generates types, functions, and variables from generic template definitions.¹ The concept of instantiation of C++ templates is fundamental but also somewhat intricate. One of the underlying reasons for this intricacy is that the definitions of entities generated by a template are no longer limited to a single location in the source code. The location of the template, the location where the template is used, and the locations where the template arguments are defined all play a role in the meaning of the entity.

In this chapter we explain how we can organize our source code to enable proper template use. In addition, we survey the various methods that are used by the most popular C++ compilers to handle template instantiation. Although all these methods should be semantically equivalent, it is useful to understand basic principles of the compiler’s instantiation strategy. Each mechanism comes with its set of little quirks when building real-life software and, conversely, each influenced the final specifications of standard C++.

14.1 On-Demand Instantiation

When a C++ compiler encounters the use of a template specialization, it will create that specialization by substituting the required arguments for the template parameters.² This is done automatically and requires no direction from the client code (or from the template definition, for that matter). This on-demand instantiation feature sets C++ templates apart from similar facilities in other early compiled languages (like Ada or Eiffel; some of these languages require explicit instantiation directives, whereas others use run-time dispatch mechanisms to avoid the instantiation process altogether). It is sometimes also called implicit or automatic instantiation.

On-demand instantiation implies that the compiler often needs access to the full definition (in other words, not just the declaration) of the template and some of its members at the point of use. Consider the following tiny source code file:

template<typename T> class C;  // #1 declaration only
C<int>* p = 0;                 // #2 fine: definition of C<int> not needed

template<typename T>
class C {
  public:
    void f();                  // #3 member declaration
};                             // #4 class template definition completed

void g (C<int>& c)             // #5 use class template declaration only
{
   c.f();                      // #6 use class template definition;
}                              //    will need definition of C::f()
                               //    in this translation unit
template<typename T>
void C<T>::f()                 // required definition due to #6
{
}

At point #1 in the source code, only the declaration of the template is available, not the definition (such a declaration is sometimes called a forward declaration). As is the case with ordinary classes, we do not need the definition of a class template to be visible to declare pointers or references to this type, as was done at point #2 . For example, the type of the parameter of function g() does not require the full definition of the template C. However, as soon as a component needs to know the size of a template specialization or if it accesses a member of such a specialization, the entire class template definition is required to be visible. This explains why at point #6 in the source code, the class template definition must be seen; otherwise, the compiler cannot verify that the member exists and is accessible (not private or protected). Furthermore, the member function definition is needed too, since the call at point #6 requires C<int>::f() to exist.

Here is another expression that needs the instantiation of the previous class template because the size of C<void> is needed:

C<void>* p = new C<void>;

In this case, instantiation is needed so that the compiler can determine the size of C<void>, which the new-expression needs to determine how much storage to allocate. You might observe that for this particular template, the type of the argument X substituted for T will not influence the size of the template because in any case, C<X> is an empty class. However, a compiler is not required to avoid instantiation by analyzing the template definition (and all compilers do perform the instantiation in practice). Furthermore, instantiation is also needed in this example to determine whether C<void> has an accessible default constructor and to ensure C<void> does not declare member operators new or delete.
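As a minimal sketch of the difference between uses that need only a declaration and uses that need the complete type, consider the following translation unit. The names Box and null_box are hypothetical and not part of the chapter's examples; the static_assert lines do the checking at compile time.

```cpp
#include <cstddef>

// Hypothetical names (Box, null_box) chosen for illustration only.
template<typename T> class Box;              // declaration only

// A pointer to an incomplete specialization is fine: nothing is instantiated.
constexpr Box<int>* null_box() { return nullptr; }

template<typename T>
class Box {
  public:
    T value;
};

// sizeof needs the complete type, so Box<int> is instantiated here.
constexpr std::size_t box_size = sizeof(Box<int>);

static_assert(box_size >= sizeof(int), "Box<int> holds an int");
static_assert(null_box() == nullptr, "pointer use did not need the definition");
```

Moving the definition of box_size above the class template definition should make the translation unit fail to compile, because sizeof cannot be applied to an incomplete type.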

The need to access a member of a class template is not always very explicitly visible in the source code. For example, C++ overload resolution requires visibility into class types for parameters of candidate functions:

template<typename T>
class C {
  public:
    C(int);        // a constructor that can be called with a single parameter
};                 // may be used for implicit conversions

void candidate(C<double>);  // #1
void candidate(int) { }     // #2

int main()
{
    candidate(42);  // both previous function declarations can be called
}

The call candidate(42) will resolve to the overloaded declaration at point #2 . However, the declaration at point #1 could also be instantiated to check whether it is a viable candidate for the call (it is in this case because the one-argument constructor can implicitly convert 42 to an rvalue of type C<double>). Note that the compiler is allowed (but not required) to perform this instantiation if it can resolve the call without it (as could be the case in this example because an implicit conversion would not be selected over an exact match). Note also that the instantiation of C<double> could trigger an error, which may be surprising.
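The outcome of such an overload resolution can be observed at compile time. The following sketch uses hypothetical names (Wrapped, pick) and gives the two overloads different return types purely so that decltype can reveal which one is selected; nothing is actually called.

```cpp
#include <type_traits>

// Hypothetical names; the return types serve only to identify the chosen overload.
template<typename T>
class Wrapped {
  public:
    Wrapped(int);             // converting constructor
};

long pick(Wrapped<double>);   // viable through a user-defined conversion
char pick(int);               // exact match for an int argument

// Overload resolution happens inside decltype without evaluating the call.
static_assert(std::is_same<decltype(pick(42)), char>::value,
              "the exact match pick(int) is selected");
```

Whether the compiler instantiates Wrapped<double> while considering the first overload is not observable here, which is consistent with the latitude described above.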

14.2 Lazy Instantiation

The examples so far illustrate requirements that are not fundamentally different from the requirements when using nontemplate classes. Many uses require a class type to be complete (see Section 10.3.1 on page 154). For the template case, the compiler will generate this complete definition from the class template definition.

A pertinent question now arises: How much of the template is instantiated? A vague answer is the following: Only as much as is really needed. In other words, a compiler should be “lazy” when instantiating templates. Let’s look at exactly what this laziness entails.

14.2.1 Partial and Full Instantiation

As we have seen, the compiler sometimes doesn’t need to substitute the complete definition of a class or function template. For example:

template<typename T> T f (T p) { return 2*p; }
decltype(f(2)) x = 2;

In this example, the type indicated by decltype(f(2)) does not require the complete instantiation of the function template f(). A compiler is therefore permitted to substitute only the declaration of f(), not its “body.” This is sometimes called partial instantiation.
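One way to make partial instantiation visible is to give the function template a body that would be invalid for the chosen argument. The sketch below is a hypothetical variant of f() (named twice here); it should compile only because the body is never substituted for T = int.

```cpp
#include <type_traits>

// Hypothetical variant of f(): the body is invalid for T = int.
template<typename T>
T twice(T p)
{
    typename T::Missing unused{};   // error for T = int, but only if the body is instantiated
    (void)unused;
    return 2 * p;
}

// Only the declaration twice<int>(int) is substituted to find the return type.
decltype(twice(2)) x = 2;

static_assert(std::is_same<decltype(x), int>::value, "x has type int");
```

Replacing the declaration of x with an actual call such as `int y = twice(2);` would require the definition and should then trigger the error.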

Similarly, if an instance of a class template is referred to without the need for that instance to be a complete type, the compiler should not perform a complete instantiation of that class template instance. Consider the following example:

template<typename T> class Q {
    using Type = typename T::Type;
};

Q<int>* p = 0; // OK: the body of Q<int> is not substituted

Here, the full instantiation of Q<int> would trigger an error, because T::Type doesn’t make sense when T is int. But because Q<int> need not be complete in this example, no full instantiation is performed and the code is okay (albeit suspicious).

Variable templates also have a “full” vs. “partial” instantiation distinction. The following example illustrates it:

template<typename T> T v = T::default_value();
decltype(v<int>) s; // OK: initializer of v<int> not instantiated

A full instantiation of v<int> would elicit an error, but that is not needed if we only need the type of the variable template instance.

Interestingly, alias templates do not have this distinction: There are no two ways of substituting them.

In C++, when speaking about “template instantiation” without being specific about full or partial instantiation, the former is intended. That is, instantiation is full instantiation by default.

14.2.2 Instantiated Components

When a class template is implicitly (fully) instantiated, each declaration of its members is instantiated as well, but the corresponding definitions are not (i.e., the members are partially instantiated). There are a few exceptions to this. First, if the class template contains an anonymous union, the members of that union’s definition are also instantiated.³ The other exception occurs with virtual member functions. Their definitions may or may not be instantiated as a result of instantiating a class template. Many implementations will, in fact, instantiate the definition because the internal structure that enables the virtual call mechanism requires the virtual functions actually to exist as linkable entities.

Default function call arguments are considered separately when instantiating templates. Specifically, they are not instantiated unless there is a call to that function (or member function) that actually makes use of the default argument. If, on the other hand, the function is called with explicit arguments that override the default, then the default arguments are not instantiated.

Similarly, exception specifications and default member initializers are not instantiated unless they are needed.
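The rule for default arguments can be sketched as follows. The names Reader and WithFallback are hypothetical; the default argument is deliberately invalid for T = int, so the first static_assert holds only because that default argument is never instantiated.

```cpp
// Hypothetical example: the default argument is invalid for T = int.
template<typename T>
struct Reader {
    // T::fallback() is substituted only if a call relies on the default.
    constexpr int get(int x = T::fallback()) const { return x; }
};

struct WithFallback {
    static constexpr int fallback() { return -1; }
};

// Explicit argument: the default argument is not instantiated for Reader<int>.
static_assert(Reader<int>{}.get(7) == 7, "explicit argument used");

// Default argument used: valid because WithFallback::fallback() exists.
static_assert(Reader<WithFallback>{}.get() == -1, "default argument used");
```

A call such as `Reader<int>{}.get()` would need the default argument and should therefore be rejected.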

Let’s put together some examples that illustrate some of these principles:

details/lazy1.hpp

template<typename T>
class Safe {
};

template<int N>
class Danger {
    int arr[N];                    // OK here, although would fail for N<=0
};

template<typename T, int N>
class Tricky {
  public:
    void noBodyHere(Safe<T> = 3);  // OK until usage of default value results in an error
    void inclass() {
        Danger<N> noBoomYet;       // OK until inclass() is used with N<=0
    }
    struct Nested {
        Danger<N> pfew;            // OK until Nested is used with N<=0
    };
    union {                        // due to anonymous union:
        Danger<N> anonymous;       // OK until Tricky is instantiated with N<=0
        int align;
    };
    void unsafe(T (*p)[N]);        // OK until Tricky is instantiated with N<=0
    void error() {
        Danger<-1> boom;           // always ERROR (which not all compilers detect)
    }
};

A standard C++ compiler will examine these template definitions to check the syntax and general semantic constraints. While doing so, it will “assume the best” when checking constraints involving template parameters. For example, the parameter N in the member Danger::arr could be zero or negative (which would be invalid), but it is assumed that this isn’t the case.⁴ The definitions of inclass(), struct Nested, and the anonymous union are thus not a problem.

For the same reason, the declaration of the member unsafe(T (*p)[N]) is not a problem, as long as N is an unsubstituted template parameter.

The default argument specification (= 3) on the declaration of the member noBodyHere() is suspicious because the template Safe<> isn’t initializable with an integer, but the assumption is that either the default argument won’t actually be needed for the generic definition of Safe<T> or that Safe<T> will be specialized (see Chapter 16) to enable initialization with an integer value. However, the definition of the member function error() is an error even when the template is not instantiated, because the use of Danger<-1> requires a complete definition of the class Danger<-1>, and generating that class runs into an attempt to define an array with negative size. Interestingly, while the standard clearly states that this code is invalid, it also allows compilers not to diagnose the error when the template instance is not actually used. That is, since Tricky<T,N>::error() is not used for any concrete T and N, a compiler is not required to issue an error for this case. For example, GCC and Visual C++ do not diagnose this error at the time of this writing.

Now let’s analyze what happens when we add the following definition:

Tricky<int, -1> inst;

This causes the compiler to (fully) instantiate Tricky<int, -1> by substituting int for T and -1 for N in the definition of template Tricky<>. Not all the member definitions will be needed, but the default constructor and the destructor (both implicitly declared in this case) are definitely called, and hence their definitions must be available somehow (which is the case in our example, since they are implicitly generated). As explained above, the members of Tricky<int, -1> are partially instantiated (i.e., their declarations are substituted): That process can potentially result in errors. For example, the declaration of unsafe(T (*p)[N]) creates an array type with a negative number of elements, and that is an error. Similarly, the member anonymous now triggers an error, because type Danger<-1> cannot be completed. In contrast, the definitions of the members inclass() and struct Nested are not yet instantiated, and thus no errors occur from their need for the complete type Danger<-1> (which contains an invalid array definition as we discussed earlier).
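The same principle can be shown with a smaller, compilable sketch. Holder is a hypothetical class template that is not part of lazy1.hpp; its member count() has a body that is invalid for T = int, yet Holder<int> can be instantiated and used as long as count() is never called.

```cpp
#include <cstddef>

// Hypothetical example; Holder is not part of lazy1.hpp.
template<typename T>
struct Holder {
    T value;

    constexpr T get() const { return value; }

    // Needs T to have a size() member: invalid for T = int if instantiated.
    constexpr std::size_t count() const { return value.size(); }
};

// Holder<int> is instantiated, but only the declaration of count() is substituted.
constexpr Holder<int> h{5};

static_assert(h.get() == 5, "get() is instantiated and usable");
// h.count();   // a call like this would instantiate count() and fail
```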

As noted above, when a class template is instantiated, the definitions of its virtual members should, in practice, also be provided. Otherwise, linker errors are likely to occur. For example:

details/lazy2.cpp

template<typename T>
class VirtualClass {
  public:
    virtual ~VirtualClass() {}
    virtual T vmem();  // Likely ERROR if instantiated without definition
};

int main()
{
   VirtualClass<int> inst;
}
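A minimal sketch of the usual remedy is to supply a definition for the virtual member so that it can be instantiated along with the class. The body shown is an assumption made for illustration (it simply returns a value-initialized T), and use_virtual is a hypothetical helper.

```cpp
#include <type_traits>

template<typename T>
class VirtualClass {
  public:
    virtual ~VirtualClass() {}
    virtual T vmem();
};

// Definition supplied, so the function can be generated when the class is instantiated.
template<typename T>
T VirtualClass<T>::vmem()
{
    return T{};   // assumption: a value-initialized T is an acceptable result
}

// Hypothetical helper that creates an object and calls the virtual member.
int use_virtual()
{
    VirtualClass<int> inst;
    return inst.vmem();
}
```

With the definition visible in the translation unit, a program calling use_virtual() should link without an undefined reference to VirtualClass<int>::vmem().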

Finally, a note about operator->. Consider:

template<typename T>
class C {
  public:
    T operator-> ();
};

Normally, operator-> must return a pointer type or another class type to which operator-> applies. This suggests that the completion of C<int> triggers an error, because it declares a return type of int for operator->. However, because certain natural class template definitions trigger these kinds of definitions,⁵ the language rule is more flexible. A user-defined operator-> is only required to return a type to which another (e.g., built-in) operator-> applies if that operator is actually selected by overload resolution. This is true even outside templates (although the relaxed behavior is less useful in those contexts). Hence, the declaration here triggers no error, even though int is substituted for the return type.

14.3 The C++ Instantiation Model

Template instantiation is the process of obtaining a regular type, function, or variable from a corresponding template entity by appropriately substituting the template parameters. This may sound fairly straightforward, but in practice many details need to be formally established.

14.3.1 Two-Phase Lookup

In Chapter 13 we saw that dependent names cannot be resolved when parsing templates. Instead, they are looked up again at the point of instantiation. Nondependent names, however, are looked up early so that many errors can be diagnosed when the template is first seen. This leads to the concept of two-phase lookup:⁶ The first phase is the parsing of a template, and the second phase is its instantiation:

1. During the first phase, while parsing a template, nondependent names are looked up using both the ordinary lookup rules and, if applicable, the rules for argument-dependent lookup (ADL). Unqualified dependent names (which are dependent because they look like the name of a function in a function call with dependent arguments) are looked up using the ordinary lookup rules, but the result of the lookup is not considered complete until an additional lookup is performed in the second phase (when the template is instantiated).

2. During the second phase, while instantiating a template at a point called the point of instantiation (POI), dependent qualified names are looked up (with the template parameters replaced with the template arguments for that specific instantiation), and an additional ADL is performed for the unqualified dependent names that were looked up using ordinary lookup in the first phase.

For unqualified dependent names, the initial ordinary lookup—while not complete—is used to decide whether the name is a template. Consider the following example:

namespace N {
  template<typename> void g() {}
  enum E { e };
}

template<typename> void f() {}

template<typename T> void h(T p) {
  f<int>(p);  // #1
  g<int>(p);  // #2 ERROR
}

int main() {
  h(N::e);    // calls template h with T = N::E
}

In line #1 , when seeing the name f followed by a <, the compiler has to decide whether that < is an angle bracket or a less-than sign. That depends on whether f is known to be the name of a template or not; in this case, ordinary lookup finds the declaration of f, which is indeed a template, and so parsing succeeds with angle brackets.

Line #2 , however, produces an error because no template g is found using ordinary lookup; the < is thus treated as a less-than sign, which is a syntax error in this example. If we could get past this issue, we’d eventually find the template N::g using ADL when instantiating h for T = N::E (since N is a namespace associated with E), but we cannot get that far until we successfully parse the generic definition of h.
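One possible way past the parsing problem, sketched here under assumptions, is to make some template named g visible to ordinary lookup before h is defined. In this variant N::g takes a parameter of type E (unlike the example above) so that the call is actually viable, and the global declaration of g is a hypothetical helper that is never defined or selected.

```cpp
namespace N {
    enum E { e };
    // Variant of N::g that accepts an E, so the call g<int>(p) can select it.
    template<typename U> constexpr int g(E) { return 1; }
}

// Hypothetical helper declaration: never defined and never selected here.
// Its only purpose is to let ordinary lookup see that g names a template.
template<typename U> void g();

template<typename T>
constexpr int h(T p)
{
    return g<int>(p);   // '<' is now parsed as the start of a template argument list
}

static_assert(h(N::e) == 1, "N::g<int>(N::E) is found through ADL");
```

To our understanding, C++20 relaxed the parsing rule so that an unqualified name followed by < in a call like this can be treated as a template name even when ordinary lookup finds no template; the helper declaration matters for earlier language versions.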

14.3.2 Points of Instantiation

We have already illustrated that there are points in the source of template clients where a C++ compiler must have access to the declaration or the definition of a template entity. A point of instantiation (POI) is created when a code construct refers to a template specialization in such a way that the definition of the corresponding template needs to be instantiated to create that specialization. The POI is a point in the source where the substituted template could be inserted. For example:

class MyInt {
  public:
    MyInt(int i);
};

MyInt operator - (MyInt const&);

bool operator > (MyInt const&, MyInt const&);

using Int = MyInt;

template<typename T>
void f(T i)
{
    if (i>0) {
        g(-i);
    }
}
// #1
void g(Int)
{
    // #2
    f<Int>(42);  // point of call
    // #3
}
// #4

When a C++ compiler sees the call f<Int>(42), it knows the template f will need to be instantiated for T substituted with MyInt: A POI is created. Points #2 and #3 are very close to the point of call, but they cannot be POIs because C++ does not allow us to insert the definition of ::f<Int>(Int) there. The essential difference between point #1 and point #4 is that at point #4 the function g(Int) is visible, and hence the template-dependent call g(-i) can be resolved. However, if point #1 were the POI, then that call could not be resolved because g(Int) is not yet visible. Fortunately, C++ defines the POI for a reference to a function template specialization to be immediately after the nearest namespace scope declaration or definition that contains that reference. In our example, this is point #4 .

You may wonder why this example involved the type MyInt rather than simple int. The answer lies in the fact that the second lookup performed at the POI is only an ADL. Because int has no associated namespace, the POI lookup would therefore not take place and would not find function g. Hence, if we were to replace the type alias declaration for Int with

using Int = int;

the previous example would no longer compile. The following example suffers from a similar problem:

template<typename T>
void f1(T x)
{
    g1(x);  // #1
}

void g1(int)
{
}

int main()
{
    f1(7);  // ERROR: g1 not found!
}
// #2 POI for f1<int>(int)

The call f1(7) creates a POI for f1<int>(int) just outside of main() at point #2 . In this instantiation, the key issue is the lookup of function g1. When the definition of the template f1 is first encountered, it is noted that the unqualified name g1 is dependent because it is the name of a function in a function call with dependent arguments (the type of the argument x depends on the template parameter T). Therefore, g1 is looked up at point #1 using ordinary lookup rules; however, no g1 is visible at this point. At point #2 , the POI, the function is looked up again in associated namespaces and classes, but the only argument type is int, and it has no associated namespaces and classes. Therefore, g1 is never found even though ordinary lookup at the POI would have found g1.
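For contrast, here is a sketch of the case that does work: the argument has a class type with an associated namespace, so the ADL performed at the POI finds a function declared after the template. All names (call_late, double_late, ns::Num) are hypothetical.

```cpp
// Hypothetical names (call_late, double_late, ns::Num) for illustration only.
template<typename T>
constexpr int call_late(T x)
{
    return double_late(x);   // nothing named double_late is visible yet
}

namespace ns {
    struct Num { int v; };
    constexpr int double_late(Num n) { return 2 * n.v; }
}

// ns is associated with ns::Num, so ADL at the POI finds ns::double_late.
static_assert(call_late(ns::Num{21}) == 42, "found through ADL at the POI");
```

Calling call_late(21) instead would run into the same problem as f1(7): int has no associated namespace, so no declaration of double_late is found.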

The point of instantiation for variable templates is handled similarly to that of function templates.⁷ For class template specializations, the situation is different, as the following example illustrates:

template<typename T>
class S {
  public:
    T m;
};
// #1
unsigned long h()
{
    // #2
    return (unsigned long)sizeof(S<int>);
    // #3
}
// #4

Again, the function scope points #2 and #3 cannot be POIs because a definition of a namespace scope class S<int> cannot appear there (and templates generally cannot appear in function scope⁸). If we were to follow the rule for function template instances, the POI would be at point #4, but then the expression sizeof(S<int>) is invalid because the size of S<int> cannot be determined until point #4 is reached. Therefore, the POI for a reference to a generated class instance is defined to be the point immediately before the nearest namespace scope declaration or definition that contains the reference to that instance. In our example, this is point #1.

When a template is actually instantiated, the need for additional instantiations may appear. Consider a short example:

template<typename T>
class S {
  public:
    using I = int;
};

// #1
template<typename T>
void f()
{
    S<char>::I var1 = 41;
    typename S<T>::I var2 = 42;
}

int main()
{
    f<double>();
}
// #2 : #2a , #2b

Our preceding discussion already established that the POI for f<double>() is at point #2 . The function template f() also refers to the class specialization S<char> with a POI that is therefore at point #1 . It references S<T> too, but because this is still dependent, we cannot really instantiate it at this point. However, if we instantiate f<double>() at point #2 , we notice that we also need to instantiate the definition of S<double>. Such secondary or transitive POIs are defined slightly differently. For function templates, the secondary POI is exactly the same as the primary POI. For class entities, the secondary POI immediately precedes (in the nearest enclosing namespace scope) the primary POI. In our example, this means that the POI of f<double>() can be placed at point #2b , and just before it—at point #2a —is the secondary POI for S<double>. Note how this differs from the POI for S<char>.

A translation unit often contains multiple POIs for the same instance. For class template instances, only the first POI in each translation unit is retained, and the subsequent ones are ignored (they are not really considered POIs). For instances of function and variable templates, all POIs are retained. In either case, the ODR requires that the instantiations occurring at any of the retained POIs be equivalent, but a C++ compiler does not need to verify and diagnose violations of this rule. This allows a C++ compiler to pick just one nonclass POI to perform the actual instantiation without worrying that another POI might result in a different instantiation.

In practice, most compilers delay the actual instantiation of most function templates to the end of the translation unit. Some instantiations cannot be delayed, including cases where instantiation is needed to determine a deduced return type (see Section 15.10.1 on page 296 and Section 15.10.4 on page 303) and cases where the function is constexpr and must be evaluated to produce a constant result. Some compilers instantiate inline functions when they’re first used to potentially inline the call right away.⁹ Where instantiation is delayed, the POIs of the corresponding template specializations are effectively moved to the end of the translation unit, which the C++ standard permits as an alternative POI.

14.3.3 The Inclusion Model

Whenever a POI is encountered, the definition of the corresponding template must somehow be accessible. For class specializations this means that the class template definition must have been seen earlier in the translation unit. For the POIs of function and variable templates (and member functions and static data members of class templates) this is also needed, and typically template definitions are simply added to header files that are #included into the translation unit, even when they’re nontype templates. This source model for template definitions is called the inclusion model, and it is the only automatic source model for templates supported by the current C++ standard.¹⁰

Although the inclusion model encourages programmers to place all their template definitions in header files so that they are available to satisfy any POIs that may arise, it is also possible to explicitly manage instantiations using explicit instantiation declarations and explicit instantiation definitions (see Section 14.5 on page 260). Doing so is logistically not trivial and most of the time programmers will prefer to rely on the automatic instantiation mechanism instead. One challenge for an implementation with the automatic scheme is to deal with the possibility of having POIs for the same specialization of a function or variable template (or the same member function or static data member of a class template instance) across different translation units. We discuss approaches to this problem next.
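For orientation, the syntax of the two explicit instantiation forms looks roughly as follows. This is only a single-file sketch with hypothetical names (square, square_of); in a real project the declaration would typically sit in a header and the definition in exactly one source file.

```cpp
#include <type_traits>

// Hypothetical function template used only to show the syntax.
template<typename T>
T square(T x)
{
    return x * x;
}

// Explicit instantiation declaration: states that square<int> is instantiated
// elsewhere, so this translation unit need not instantiate it implicitly.
extern template int square<int>(int);

// Explicit instantiation definition: generates square<int> here.
// When both appear in one translation unit, the definition must follow the declaration.
template int square<int>(int);

// Hypothetical caller that uses the explicitly instantiated specialization.
int square_of(int v)
{
    return square(v);
}
```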

14.4 Implementation Schemes

In this section we review some ways in which C++ implementations support the inclusion model. All these implementations rely on two classic components: a compiler and a linker. The compiler translates source code to object files, which contain machine code with symbolic annotations (cross-referencing other object files and libraries). The linker creates executable programs or libraries by combining the object files and resolving the symbolic cross-references they contain. In what follows, we assume such a model even though it is entirely possible (but not popular) to implement C++ in other ways. For example, one might imagine a C++ interpreter.

When a class template specialization is used in multiple translation units, a compiler will repeat the instantiation process in every translation unit. This poses very few problems because class definitions do not directly create low-level code. They are used only internally by a C++ implementation to verify and interpret various other expressions and declarations. In this regard, the multiple instantiations of a class definition are not materially different from the multiple inclusions of a class definition— typically through header file inclusion—in various translation units.

However, if you instantiate a (noninline) function template, the situation may be different. If you were to provide multiple definitions of an ordinary noninline function, you would violate the ODR. Assume, for example, that you compile and link a program consisting of the following two files:

// ==== a.cpp:
int main()
{
}

// ==== b.cpp:
int main()
{
}

C++ compilers will compile each module separately without any problems because indeed they are valid C++ translation units. However, your linker will most likely protest if you try to link the two together: Duplicate definitions are not allowed.

In contrast, consider the template case:

// ==== t.hpp:
// common header (inclusion model)
template<typename T>
class S {
  public:
    void f();
};

template<typename T>
void S<T>::f()    // member definition
{
}

void helper(S<int>*);

// ==== a.cpp:
#include "t.hpp"
void helper(S<int>* s)
{
    s->f();   // #1 first point of instantiation of S::f
}

// ==== b.cpp:
#include "t.hpp"
int main()
{
    S<int> s;
    helper(&s);
    s.f();    // #2 second point of instantiation of S::f
}

If the linker treats instantiated member functions of class templates just like it does ordinary functions or member functions, the compiler needs to ensure that it generates code at only one of the two POIs: at point #1 or point #2, but not both. To achieve this, a compiler has to carry information from one translation unit to the other, and this is something C++ compilers were never required to do prior to the introduction of templates. In what follows, we discuss the three broad classes of solutions that have been used by C++ implementers.

Note that the same problem occurs with all linkable entities produced by template instantiation: instantiated function templates and member function templates, as well as instantiated static data members and instantiated variable templates.

14.4.1 Greedy Instantiation

The first C++ compilers that popularized greedy instantiation were produced by a company called Borland. It has grown to be by far the most commonly used technique among the various C++ systems.

Greedy instantiation assumes that the linker is aware that certain entities—linkable template instantiations in particular—may in fact appear in duplicate across the various object files and libraries. The compiler will typically mark these entities in a special way. When the linker finds multiple instances, it keeps one and discards all the others. There is not much more to it than that.

In theory, greedy instantiation has some serious drawbacks:

• The compiler may be wasting time on generating and optimizing N instantiations, of which only one will be kept.

• Linkers typically do not check that two instantiations are identical because some insignificant differences in generated code can validly occur for multiple instances of one template specialization. These small differences should not cause the linker to fail. (These differences could result from tiny differences in the state of the compiler at the instantiation times.) However, this often also results in the linker not noticing more substantial differences, such as when one instantiation was compiled with strict floating-point math rules whereas the other was compiled with relaxed, higher-performance floating-point math rules.¹¹

• The sum of all the object files could potentially be much larger than with alternatives because the same code may be duplicated many times.

In practice, these shortcomings do not seem to have caused major problems. Perhaps this is because greedy instantiation contrasts very favorably with the alternatives in one important aspect: The traditional source-object dependency is preserved. In particular, one translation unit generates but one object file, and each object file contains compiled code for all the linkable definitions in the corresponding source file (which includes the instantiated definitions). Another important benefit is that all function template instances are candidates for inlining without resorting to expensive “link-time” optimization mechanisms (and, in practice, function template instances are often small functions that benefit from inlining). The other instantiation mechanisms treat inline function template instances specially to ensure they can be expanded inline. However, greedy instantiation allows even noninline function template instances to be expanded inline.

Finally, it may be worth noting that the linker mechanism that allows duplicate definitions of linkable entities is also typically used to handle duplicate spilled inlined functions¹² and virtual function dispatch tables.¹³ If this mechanism is not available, the alternative is usually to emit these items with internal linkage, at the expense of generating larger code. The requirement that an inline function have a single address makes it difficult to implement that alternative in a standard-conforming way.

14.4.2 Queried Instantiation

In the mid-1990s, a company called Sun Microsystems¹⁴ released a reimplementation of its C++ compiler (version 4.0) with a new and interesting solution to the instantiation problem, which we call queried instantiation. Queried instantiation is conceptually remarkably simple and elegant, and yet it is chronologically the most recent class of instantiation schemes that we review here. In this scheme, a database shared by the compilations of all translation units participating in a program is maintained. This database keeps track of which specializations have been instantiated and on what source code they depend. The generated specializations themselves are typically stored with this information in the database. Whenever a point of instantiation for a linkable entity is encountered, one of three things can happen:

1. No specialization is available: In this case, instantiation occurs, and the resulting specialization is entered in the database.

2. A specialization is available but is out of date because source changes have occurred since it was generated. Here, too, instantiation occurs, but the resulting specialization replaces the one previously stored in the database.

3. An up-to-date specialization is available in the database. Nothing needs to be done.

Although conceptually simple, this design presents a few implementation challenges:

• It is not trivial to maintain correctly the dependencies of the database contents with respect to the state of the source code. Although it is not incorrect to mistake the third case for the second, doing so increases the amount of work done by the compiler (and hence overall build time).

• It is quite common to compile multiple source files concurrently. Hence, an industrial-strength implementation needs to provide the appropriate amount of concurrency control in the database.

Despite these challenges, the scheme can be implemented quite efficiently. Furthermore, there are no obvious pathological cases that would make this solution scale poorly, in contrast, for example, with greedy instantiation, which may lead to a lot of wasted work.
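The three cases listed above can be sketched as a lookup in a toy database. This is only an illustration under simplifying assumptions: the names InstantiationDb, Action, and query() are hypothetical, a specialization is identified by a plain string, and "out of date" is modeled by comparing a single source stamp. It does not describe how Sun's compiler was actually implemented.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy model only: every name in this sketch is hypothetical.
enum class Action { Instantiated, Reinstantiated, Reused };

class InstantiationDb {
  // maps a specialization's name to the stamp of the source it was generated from
  std::map<std::string, unsigned long> stamps_;
 public:
  Action query(std::string const& spec, unsigned long sourceStamp) {
    auto pos = stamps_.find(spec);
    if (pos == stamps_.end()) {         // case 1: no specialization available
      stamps_.emplace(spec, sourceStamp);
      return Action::Instantiated;
    }
    if (pos->second != sourceStamp) {   // case 2: available but out of date
      pos->second = sourceStamp;        // the new result replaces the old one
      return Action::Reinstantiated;
    }
    return Action::Reused;              // case 3: up to date, nothing to do
  }
};

void queriedExample() {
  InstantiationDb db;
  assert(db.query("f<int>", 1) == Action::Instantiated);
  assert(db.query("f<int>", 1) == Action::Reused);
  assert(db.query("f<int>", 2) == Action::Reinstantiated);
}
```

A real implementation would additionally have to track dependencies on several source files and guard the database against concurrent compilations, which is where the challenges mentioned above arise.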

The use of a database may also present some problems to the programmer, unfortunately. The origin of most of these problems lies in the fact that the traditional compilation model inherited from most C compilers no longer applies: A single translation unit no longer produces a single standalone object file. Assume, for example, that you wish to link your final program. This link operation needs not only the contents of each of the object files associated with your various translation units, but also the object files stored in the database. Similarly, if you create a binary library, you need to ensure that the tool that creates that library (typically a linker or an archiver) is aware of the database contents. More generally, any tool that operates on object files may need to be made aware of the contents of the database. Many of these problems can be alleviated by not storing the instantiations in the database, but instead by emitting the object code in the object file that caused the instantiation in the first place.

Libraries present yet another challenge. A number of generated specializations may be packaged in a library. When the library is added to another project, that project’s database may need to be made aware of the instantiations that are already available. If not, and if the project creates some of its own points of instantiation for the specializations present in the library, duplicate instantiation may occur. A possible strategy to deal with such situations is to use the same linker technology that enables greedy instantiation: Make the linker aware of generated specializations and have it weed out duplicates (which should nonetheless occur much less frequently than with greedy instantiation). Various other subtle arrangements of sources, object files, and libraries can lead to frustrating problems such as missing instantiations because the object code containing the required instantiation was not linked in the final executable program.

Ultimately, queried instantiation did not survive in the marketplace, and even Sun’s compiler now uses greedy instantiation.

14.4.3 Iterated Instantiation

The first compiler to support C++ templates was Cfront 3.0, a direct descendant of the compiler that Bjarne Stroustrup wrote to develop the language.¹⁵ An inflexible constraint on Cfront was that it had to be very portable from platform to platform, and this meant that it (1) used the C language as a common target representation across all target platforms and (2) used the local target linker. In particular, this implied that the linker was not aware of templates. In fact, Cfront emitted template instantiations as ordinary C functions, and therefore it had to avoid duplicate instantiations. Although the Cfront source model was different from the standard inclusion model, its instantiation strategy can be adapted to fit the inclusion model. As such, it also merits recognition as the first incarnation of iterated instantiation. The Cfront iteration can be described as follows:

1. Compile the sources without instantiating any required linkable specializations.

2. Link the object files using a prelinker.

3. The prelinker invokes the linker and parses its error messages to determine whether any are the result of missing instantiations. If so, the prelinker invokes the compiler on sources that contain the needed template definitions, with options to generate the missing instantiations.

4. Repeat step 3 if any definitions are generated.

The need to iterate step 3 is prompted by the observation that the instantiation of one linkable entity may lead to the need for another such entity that was not yet instantiated. Eventually the iteration will “converge,” and the linker will succeed in building a complete program.
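The convergence of this loop can be illustrated with a small simulation. The sketch below is a toy model, not Cfront's prelinker: symbols are plain strings, the hypothetical table needs records which further linkable entities each instantiation refers to, and "invoking the linker" is reduced to computing the set of referenced but undefined symbols.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy model only: all names in this sketch are hypothetical.
using Symbol = std::string;
using Needs = std::map<Symbol, std::vector<Symbol>>;

// Simulates the prelinker loop and returns the number of link attempts.
int prelink(std::set<Symbol> referenced, Needs const& needs,
            std::set<Symbol>& defined) {
  int linkAttempts = 0;
  for (;;) {
    ++linkAttempts;
    // "link": collect the symbols that are referenced but not defined
    std::vector<Symbol> missing;
    for (auto const& s : referenced) {
      if (defined.count(s) == 0) missing.push_back(s);
    }
    if (missing.empty()) return linkAttempts;   // the iteration has converged
    // "recompile": generate each missing instantiation, which may
    // introduce references to entities that are not yet instantiated
    for (auto const& s : missing) {
      defined.insert(s);
      auto pos = needs.find(s);
      if (pos != needs.end()) {
        referenced.insert(pos->second.begin(), pos->second.end());
      }
    }
  }
}

void iteratedExample() {
  Needs needs{{"f<int>", {"g<int>"}}, {"g<int>", {"h<int>"}}};
  std::set<Symbol> defined;
  int attempts = prelink({"f<int>"}, needs, defined);
  assert(defined.size() == 3);   // f<int>, g<int>, and h<int> were generated
  assert(attempts == 4);         // three failed links plus the final successful one
}
```

In this model a chain of three dependent instantiations costs four link attempts, which hints at why link times could grow so dramatically with the original scheme.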

The drawbacks of the original Cfront scheme are quite severe:

• The perceived time to link is augmented not only by the prelinker overhead but also by the cost of every required recompilation and relinking. Some users of Cfront-based systems reported link times of “a few days” compared with “about an hour” with the alternative schemes reported earlier.

• Diagnostics (errors, warnings) are delayed until link time. This is especially painful when linking becomes expensive and the developer must wait hours just to find out about a typo in a template definition.

• Special care must be taken to remember where the source containing a particular definition is located (step 1). Cfront in particular used a central repository, which had to deal with some of the challenges of the central database in the queried instantiation approach. In particular, the original Cfront implementation was not engineered to support concurrent compilations.

The iteration principle was subsequently refined both by the Edison Design Group’s (EDG) implementation and by HP’s aC++,¹⁶ eliminating some of the drawbacks of the original Cfront implementation. In practice, these implementations work quite well, and, although a build “from scratch” is typically more time consuming than the alternative schemes, subsequent build times are quite competitive. Still, relatively few C++ compilers use iterated instantiation anymore.

14.5 Explicit Instantiation

It is possible to create explicitly a point of instantiation for a template specialization. The construct that achieves this is called an explicit instantiation directive. Syntactically, it consists of the keyword template followed by a declaration of the specialization to be instantiated. For example:

template<typename T>
void f(T)
{
}

// four valid explicit instantiations:
template void f<int>(int);
template void f<>(float);
template void f(long);
template void f(char);

Note that every instantiation directive is valid. Template arguments can be deduced (see Chapter 15).

Members of class templates can also be explicitly instantiated in this way:

template<typename T>
class S {
  public:
    void f() {
    }
};

template void S<int>::f();

template class S<void>;

Furthermore, all the members of a class template specialization can be explicitly instantiated by explicitly instantiating the class template specialization. Because these explicit instantiation directives ensure that a definition of the named template specialization (or member thereof) is created, the explicit instantiation directives above are more accurately referred to as explicit instantiation definitions. A template specialization that is explicitly instantiated should not be explicitly specialized, and vice versa, because that would imply that the two definitions could be different (thus violating the ODR).

14.5.1 Manual Instantiation

Many C++ programmers have observed that automatic template instantiation has a nontrivial negative impact on build times. This is particularly true with compilers that implement greedy instantiation (Section 14.4.1 on page 256), because the same template specializations may be instantiated and optimized in many different translation units.

A technique to improve build times consists of manually instantiating those template specializations that the program requires in a single location and inhibiting the instantiation in all other translation units. One portable way to ensure this inhibition is to not provide the template definition except in the translation unit where it is explicitly instantiated.¹⁷ For example:

// ===== translation unit 1:
template<typename T> void f(); // no definition: prevents instantiation
                               // in this translation unit
void g()
{
    f<int>();
}

// ===== translation unit 2:
template<typename T> void f()
{
  // implementation
}

template void f<int>();        // manual instantiation

void g();

int main()
{
    g();
}

In the first translation unit, the compiler cannot see the definition of the function template f, so it will not (cannot) produce an instantiation of f<int>. The second translation unit provides the definition of f<int> via an explicit instantiation definition; without it, the program would fail to link.

Manual instantiation has a clear disadvantage: We must carefully keep track of which entities to instantiate. For large projects this quickly becomes an excessive burden; hence we do not recommend it. We have worked on several projects that initially underestimated this burden, and we came to regret our decision as the code matured.

However, manual instantiation also has a few advantages because the instantiation can be tuned to the needs of the program. Clearly, the overhead of large headers is avoided, as is the overhead of repeatedly instantiating the same templates with the same arguments in multiple translation units. Moreover, the source code of template definition can be kept hidden, but then no additional instantiations can be created by a client program.

Some of the burden of manual instantiation can be alleviated by placing the template definition into a third source file, conventionally with the extension .tpp. For our function f, this breaks down into:

// ===== f.hpp:
template<typename T> void f(); // no definition: prevents instantiation

// ===== f.tpp:
#include "f.hpp"
template<typename T> void f()  // definition
{
  // implementation
}

// ===== f.cpp:
#include "f.tpp"

template void f<int>();        // manual instantiation

This structure provides some flexibility. One can include only f.hpp to get the declaration of f, with no automatic instantiation. Explicit instantiations can be manually added to f.cpp as needed. Or, if manual instantiations become too onerous, one can also include f.tpp to enable automatic instantiation.

14.5.2 Explicit Instantiation Declarations

A more targeted approach to the elimination of redundant automatic instantiations is the use of an explicit instantiation declaration, which is an explicit instantiation directive prefixed by the keyword extern. An explicit instantiation declaration generally suppresses automatic instantiation of the named template specialization, because it declares that the named template specialization will be defined somewhere in the program (by an explicit instantiation definition). We say generally, because there are many exceptions to this:

• Inline functions can still be instantiated for the purpose of expanding them inline (but no separate object code is generated).

• Variables with deduced auto or decltype(auto) types and functions with deduced return types can still be instantiated to determine their types.

• Variables whose values are usable as constant-expressions can still be instantiated so their values can be evaluated.

• Variables of reference types can still be instantiated so the entity they reference can be resolved.

• Class templates and alias templates can still be instantiated to check the resulting types.

Using explicit instantiation declarations, we can provide the template definition for f in the header (t.hpp), then suppress automatic instantiation for commonly used specializations, as follows:

// ===== t.hpp:
template<typename T> void f()
{
}

extern template void f<int>();    // declared but not defined
extern template void f<float>();  // declared but not defined

// ===== t.cpp:
template void f<int>();           // definition
template void f<float>();         // definition

Each explicit instantiation declaration must be paired with a corresponding explicit instantiation definition, which must follow the explicit instantiation declaration. Omitting the definition will result in a linker error.

Explicit instantiation declarations can be used to improve compile or link times when certain specializations are used in many different translation units. Unlike with manual instantiation, which requires manually updating the list of explicit instantiation definitions each time a new specialization is required, explicit instantiation declarations can be introduced as an optimization at any point. However, the compile-time benefits may not be as significant as with manual instantiation, both because some redundant automatic instantiation is likely to occur¹⁸ and because the template definitions are still parsed as part of the header.

14.6 Compile-Time if Statements

As introduced in Section 8.5 on page 134, C++17 added a new statement kind that turns out to be remarkably useful when writing templates: compile-time if. It also introduces a new wrinkle in the instantiation process.

The following example illustrates its basic operation:

template<typename T> bool f(T p) {
  if constexpr (sizeof(T) <= sizeof(long long)) {
    return p>0;
  } else {
    return p.compare(0) > 0;
  }
}
bool g(int n) {
  return f(n);  // OK
}

The compile-time if is an if statement, where the if keyword is immediately followed by the constexpr keyword (as in this example).¹⁹ The parenthesized condition that follows must have a constant Boolean value (implicit conversions to bool are included in that consideration). The compiler therefore knows which branch will be selected; the other branch is called the discarded branch. Of particular interest is that during the instantiation of templates (including generic lambdas), the discarded branch is not instantiated. That is necessary for our example to be valid: We are instantiating f(T) with T = int, which means that the else branch is discarded. If it weren’t discarded, it would be instantiated and we’d run into an error for the expression p.compare(0) (which isn’t valid when p is a simple integer).
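The opposite situation can be sketched in the same way. The following example repeats f() to stay self-contained and adds a hypothetical class type Wide (an assumption made for illustration) that is larger than long long and provides compare(). For T = Wide it is the first branch that is discarded, which matters because p>0 would not be valid for a Wide object.

```cpp
#include <cassert>

// f() as in the example above, repeated to keep the sketch self-contained
template<typename T> bool f(T p) {
  if constexpr (sizeof(T) <= sizeof(long long)) {
    return p > 0;
  } else {
    return p.compare(0) > 0;
  }
}

// Hypothetical class type: larger than long long, no operator>,
// but with a compare() member that looks only at the first part
struct Wide {
  long long parts[2];
  int compare(int v) const {
    return parts[0] > v ? 1 : (parts[0] < v ? -1 : 0);
  }
};

void discardedBranchExample() {
  static_assert(sizeof(Wide) > sizeof(long long), "Wide must take the else branch");
  assert(f(5));                  // T = int: else branch discarded
  assert(!f(-3));
  assert(f(Wide{{7, 0}}));       // T = Wide: first branch discarded
  assert(!f(Wide{{-1, 0}}));
}
```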

Prior to C++17 and its constexpr if statements, avoiding such errors required explicit template specialization or overloading (see Chapter 16) to achieve similar effects.

The example above, in C++14, might be implemented as follows:

template<bool b> struct Dispatch {  // only to be instantiated when b is false
  template<typename T>              // (due to next specialization for true)
  static bool f(T p) {
    return p.compare(0) > 0;
  }
};

template<> struct Dispatch<true> {
  template<typename T>
  static bool f(T p) {
    return p > 0;
  }
};

template<typename T> bool f(T p) {
  return Dispatch<sizeof(T) <= sizeof(long long)>::f(p);
}

bool g(int n) {
  return f(n);  // OK
}

Clearly, the constexpr if alternative expresses our intention far more clearly and concisely. However, it requires implementations to refine the unit of instantiation: Whereas previously function definitions were always instantiated as a whole, now it must be possible to inhibit the instantiation of parts of them.

Another very handy use of constexpr if is expressing the recursion needed to handle function parameter packs. To generalize the example introduced in Section 8.5 on page 134:

template<typename Head, typename... Remainder>
void f(Head&& h, Remainder&&... r) {
  doSomething(std::forward<Head>(h));
  if constexpr (sizeof...(r) != 0) {
    // handle the remainder recursively (perfectly forwarding the arguments):
    f(std::forward<Remainder>(r)...);
  }
  }
}

Without constexpr if statements, this requires an additional overload of the f() template to ensure that recursion terminates.
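For comparison, a pre-C++17 formulation might look as follows. This is a sketch under assumptions: doSomething() is given a hypothetical definition that merely records integer arguments in a global vector named seen, so that the order of processing can be checked.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-in for doSomething(): records each value it is given
std::vector<int> seen;
void doSomething(int v) { seen.push_back(v); }

// Additional overload that ends the recursion: it handles the last argument
template<typename Head>
void f(Head&& h) {
  doSomething(std::forward<Head>(h));
}

template<typename Head, typename... Remainder>
void f(Head&& h, Remainder&&... r) {
  doSomething(std::forward<Head>(h));
  // For a single remaining argument, partial ordering prefers the
  // nonvariadic overload above, so the recursion terminates there:
  f(std::forward<Remainder>(r)...);
}

void packExample() {
  seen.clear();
  f(1, 2, 3);
  assert((seen == std::vector<int>{1, 2, 3}));
}
```

Without the single-argument overload, the last recursive step would attempt to call f() with no arguments and fail to compile.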

Even in nontemplate contexts, constexpr if statements have a somewhat unique effect:

void h();
void g() {
  if constexpr (sizeof(int) == 1) {
    h();
  }
}

On most platforms, the condition in g() is false and the call to h() is therefore discarded. As a consequence, h() need not necessarily be defined at all (unless it is used elsewhere, of course). Had the keyword constexpr been omitted in this example, a lack of a definition for h() would often elicit an error at link time.²⁰
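The following variation is a sketch of the same effect with a condition that is false on every conforming implementation, because sizeof(char) is 1 by definition. The function names notDefined() and callMaybe() are hypothetical. Since the call is in a discarded statement, the program is expected to link even though notDefined() is never defined.

```cpp
#include <cassert>

void notDefined();   // declared, but deliberately never defined

int callMaybe() {
  if constexpr (sizeof(char) != 1) {   // always false
    notDefined();    // discarded: no definition is required
  }
  return 1;
}

void nonTemplateExample() {
  assert(callMaybe() == 1);
}
```

Note that, unlike in a template, the discarded statement here is still fully checked: The call must name a declared function and be well formed.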

14.7 In the Standard Library

The C++ standard library includes a number of templates that are only commonly used with a few basic types. For example, the std::basic_string class template is most commonly used with char (because std::string is a type alias of std::basic_string<char>) or wchar_t, although it is possible to instantiate it with other character-like types. Therefore, it is common for standard library implementations to introduce explicit instantiation declarations for these common cases. For example:

namespace std {
  template<typename charT, typename traits = char_traits<charT>,
           typename Allocator = allocator<charT>>
  class basic_string {
   …
  };
  extern template class basic_string<char>;
  extern template class basic_string<wchar_t>;
}

The source files implementing the standard library will then contain the corresponding explicit instantiation definitions, so that these common implementations can be shared among all users of the standard library. Similar explicit instantiations often exist for the various stream classes, such as basic_iostream, basic_istream, and so on.

14.8 Afternotes

This chapter deals with two related but different issues: the C++ template compilation model and various C++ template instantiation mechanisms.

The compilation model determines the meaning of a template at various stages of the translation of a program. In particular, it determines what the various constructs in a template mean when it is instantiated. Name lookup is an essential ingredient of the compilation model.

Standard C++ only supports a single compilation model, the inclusion model. However, the 1998 and 2003 standards also supported a separation model of template compilation, which allowed a template definition to be written in a different translation unit from its instantiations. These exported templates were only ever implemented once, by the Edison Design Group (EDG).²¹ Their implementation effort determined that (1) implementing the separation model of C++ templates was vastly more difficult and time consuming than had been anticipated, and (2) the presumed benefits of the separation model, such as improved compile times, did not materialize due to complexities of the model. As the development of the 2011 standard was wrapping up, it became clear that other implementers were not going to support the feature, and the C++ standards committee voted to remove exported templates from the language. We refer readers interested in the details of the separation model to the first edition of this book ([VandevoordeJosuttisTemplates1st]), which describes the behavior of exported templates.

The instantiation mechanisms are the external mechanisms that allow C++ implementations to create instantiations correctly. These mechanisms may be constrained by requirements of the linker and other software building tools. While instantiation mechanisms differ from one implementation to the next (and each has its trade-offs), they generally do not have a significant impact on day-to-day programming in C++.

Shortly after C++11 was completed, Walter Bright, Herb Sutter, and Andrei Alexandrescu proposed a “static if” feature not unlike constexpr if (via paper N3329). It was, however, a more general feature that could appear even outside of function definitions. (Walter Bright is the principal designer and implementer of the D programming language, which has a similar feature.) For example:

template<unsigned long N>
struct Fact {
  static if (N <= 1) {
    constexpr unsigned long value = 1;
  } else {
    constexpr unsigned long value = N*Fact<N-1>::value;
  }
};

Note how class-scope declarations are made conditional in this example. This powerful ability was controversial, however, with some committee members fearing that it might be abused and others not liking some technical aspects of the proposal (such as the fact that no scope is introduced by the braces and the discarded branch is not parsed at all).

A few years later, Ville Voutilainen came back with a proposal (P0128) that was mostly what would become constexpr if statements. It went through a few minor design iterations (involving tentative keywords static_if and constexpr_if) and, with the help of Jens Maurer, Ville eventually shepherded the proposal into the language (via paper P0292r2).

¹ The term instantiation is sometimes also used to refer to the creation of objects from types. In this book, however, it always refers to template instantiation.

² The term specialization is used in the general sense of an entity that is a specific instance of a template (see Chapter 10). It does not refer to the explicit specialization mechanism described in Chapter 16.

³ Anonymous unions are always special in this way: Their members can be considered to be members of the enclosing class. An anonymous union is primarily a construct that says that some class members share the same storage.

⁴ Some compilers, such as GCC, allow zero-length arrays as extensions and may therefore accept this code even when N ends up being 0.

⁵ Typical examples are smart pointer templates (e.g., the standard std::unique_ptr<T>).

⁶ Besides two-phase lookup, terms such as two-stage lookup or two-phase name lookup are also used.

⁷ Surprisingly, this is not clearly specified in the standard at the time of this writing. However, it is not expected to be a controversial issue.

⁸ The call operators of generic lambdas are a subtle exception to that observation.

⁹ In modern compilers the inlining of calls is typically handled by a mostly language-independent component of the compiler dedicated to optimizations (a “back end” or “middle end”). However, C++ “front ends” (the C++-specific part of the C++ compiler) that were designed in the earlier days of C++ may also have the ability to expand calls inline because older back ends were too conservative when considering calls for inline expansion.

¹⁰ The original C++98 standard also provided a separation model. It never gained popularity and was removed just before publishing the C++11 standard.

¹¹ Current systems have grown to detect certain other differences, however. For example, they might report if one instantiation has associated debugging information and another does not.

¹² When a compiler is unable to “inline” every call to a function that you marked with the keyword inline, a separate copy of the function is emitted in the object file. This may happen in multiple object files.

¹³ Virtual function calls are usually implemented as indirect calls through a table of pointers to functions. See [LippmanObjMod] for a thorough study of such implementation aspects of C++.

¹⁴ Sun Microsystems was later acquired by Oracle.

¹⁵ Do not let this phrase mislead you into thinking that Cfront was an academic prototype: It was used in industrial contexts and formed the basis of many commercial C++ compiler offerings. Release 3.0 appeared in 1991 but was plagued with bugs. Version 3.0.1 followed soon thereafter and made templates usable.

¹⁶ HP’s aC++ grew out of technology from a company called Taligent (later absorbed by International Business Machines, or IBM). HP also added greedy instantiation to aC++ and made that the default mechanism.

¹⁷ In the 1998 and 2003 C++ standards, this was the only portable way to inhibit instantiation in other translation units.

¹⁸ An interesting part of this optimization problem is to determine exactly which specializations are good candidates for explicit instantiation declarations. Low-level utilities such as the common Unix tool nm can be useful in identifying which automatic instantiations actually made it into the object files that make up a program.

¹⁹ Although the code reads if constexpr, the feature is called constexpr if, because it is the “constexpr” form of if.

²⁰ Optimization may nonetheless mask the error. With constexpr if the problem is guaranteed not to exist.

²¹ Ironically, EDG was the most vocal opponent of the feature when it was added to the working paper for the original standard.