Cippi is back at school.
According to the C++ Core Guidelines, “expressions and statements are the lowest and most direct way of expressing actions and computation.” This section covers about sixty-five rules that list best practices for expressions and statements in general, and for declarations and arithmetic expressions in particular.
First of all, I want to give you an informal definition of what expressions and statements are:
An expression evaluates to a value.
A statement does something and is often composed of expressions or statements.
5 * 5;            // expression
std::cout << 25;  // print statement
auto a = 10;      // declaration statement
auto b = 5 * 5;   // declaration statement initialized with an expression
Declarations in a block scope are statements. A block scope is a region of code enclosed in curly braces.
The C++ Core Guidelines have two general rules with a particular focus on expressions and statements.
ES.1: Prefer the standard library to other libraries and to “handcrafted code”
There is no reason to write a raw loop to sum up a vector of doubles:
int max = v.size();
double sum = 0.0;
for (int i = 0; i < max; ++i)
    sum += v[i];
Instead, use the std::accumulate algorithm from the Standard Template Library (STL). This clearly communicates your intent and makes the code more readable.
auto sum = std::accumulate(std::begin(v), std::end(v), 0.0);
Maybe your next task is to build the product of the doubles. Just invoke std::accumulate with a suitable lambda.
auto pro = std::accumulate(std::begin(v), std::end(v), 1.0,
                           [](double fir, double sec){ return fir * sec; });
The solution is good but not perfect. The C++ standard already defines many function objects, such as std::multiplies for multiplication.
auto pro = std::accumulate(std::begin(v), std::end(v), 1.0, std::multiplies<>());
This rule reminds me of a quote from Sean Parent at the C++ Seasoning conference in 2013: “If you want to improve the code quality in your organization, replace all your coding guidelines with one goal: Prefer an algorithm to a raw loop.” Or to say it more directly: If you write a raw loop, you probably don’t know the algorithms of the STL well enough. The STL has more than 100 algorithms.
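To illustrate the point with a second algorithm, here is a small sketch of mine (not from the guidelines): a raw search loop next to its std::any_of counterpart.

#include <algorithm>
#include <cstddef>
#include <iostream>
#include <vector>

int main() {
    std::vector<double> v{1.5, 2.5, 3.5};

    // raw loop: the reader must infer the intent from the loop body
    bool found = false;
    for (std::size_t i = 0; i < v.size(); ++i) {
        if (v[i] > 3.0) { found = true; break; }
    }

    // algorithm: the name states the intent
    bool found2 = std::any_of(v.begin(), v.end(),
                              [](double d){ return d > 3.0; });

    std::cout << found << found2 << '\n';   // 11
}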
ES.2: Prefer suitable abstractions to direct use of language features
This is the next déjà vu. In one C++ seminar, I had a long discussion, followed by an even longer analysis, of a few quite sophisticated, handmade functions for reading and writing std::strstreams. My students had to maintain such a function, and after one week, they had no idea what was going on. The main reason they got confused was that the functionality was not based on the right abstraction.
For example, consider this handmade function for reading a std::istream.
char** read1(istream& is, int maxelem, int maxstring, int* nread) {
    auto res = new char*[maxelem];
    int elemcount = 0;
    while (is && elemcount < maxelem) {
        auto s = new char[maxstring];
        is.read(s, maxstring);
        res[elemcount++] = s;
    }
    *nread = elemcount;
    return res;
}
In contrast, how easy is it to comprehend the following function?
std::vector<std::string> read2(std::istream& is) {
    std::vector<std::string> res;
    for (std::string s; is >> s;)
        res.push_back(s);
    return res;
}
The right abstraction often means that you don’t have to think about ownership, as in the function read2. This concern does hold for the function read1: the caller of read1 owns the result and, therefore, has to delete it.
First of all, here is how a declaration is defined in the C++ Core Guidelines:
A declaration is a statement. A declaration introduces a name into a scope and may cause the construction of a named object.
The rules for declarations are about names, the variables and their initialization, and macros.
On the one hand, the following rules are obvious, and I describe them only briefly. On the other hand, I know many code bases that permanently break these rules. For example, I spoke with a former Fortran programmer who stated the following: Each name should have exactly three characters.
Let me start with the most important point: good names are probably the most important ingredient of good software.
If a scope is small, you can put it on a screen and get an idea of what is going on. If a scope becomes too big, you should structure your code into functions or classes. Identify logical entities and use self-explanatory names in your refactoring process. Afterward, it becomes easier to think about your code.
ES.6: Declare names in for-statement initializers and conditions to limit scope
Since the first C++ standard, we can declare a variable in a for statement. Since C++17, we can also declare variables in an if or a switch statement.
std::map<int, std::string> myMap;

if (auto result = myMap.insert(value); result.second) {
    useResult(*result.first);
    // ...
}
else {
    // ...
}   // result is automatically destroyed
The variable result is only valid inside the if and else branches of the if statement; result does not pollute the outer scope and is automatically destroyed. Before C++17, you had to declare result in the outer scope.
std::map<int, std::string> myMap;
auto result = myMap.insert(value);

if (result.second) {
    useResult(*result.first);
    // ...
}
else {
    // ...
}
ES.7: Keep common and local names short, and keep uncommon and nonlocal names longer
This rule sounds strange, but we are already used to it. Giving a variable the name i or j, or giving a type the name T, makes its intention immediately clear: i and j are indices, and T is a type parameter of a template.
template<typename T>
void print(std::ostream& os, const std::vector<T>& v) {
    for (int i = 0; i < v.size(); ++i)
        os << v[i] << '\n';
}
i is an okay name for a loop control variable, a poor name for a function parameter, and a terrible name for a global variable.
There is a meta-rule underlying this rule. A name should be self-explanatory. In a brief context, you understand at a glance what the variable means. This will not automatically hold for longer contexts; therefore, use longer names.
Can you read this example without any hesitation?
if (readable(i1 + l1 + ol + o1 + o0 + ol + o1 + I0 + l0)) surprise();
For example, I often have problems with the number 0 and the capital letter O. Depending on the font used, they look quite similar. A few years ago, it took me quite some time to log in to a server. My automatically generated password had a letter O.
If you use ALL_CAPS, macro substitution may kick in because ALL_CAPS names are commonly used for macros. The following code snippet may be a little surprising.
// somewhere in some header:
#define NE !=

// somewhere else in some other header:
enum Coord { N, NE, NW, S, SE, SW, E, W };

// third, somewhere in some poor programmer's .cpp:
switch (direction) {
case N:
    // ...
case NE:
    // ...
// ...
}
ES.10: Declare one name (only) per declaration
Let me give you two examples. Do you spot the two issues?
char* p, p2;
char a = 'a';
p = &a;
p2 = a;

int a = 7, b = 9, c, d = 10, e = 3;
p2 is just a char, and c is not initialized. With C++17, we acquired one exception to this rule: structured binding.
Now I can write the if statement with initializer from rule “ES.6: Declare names in for-statement initializers and conditions to limit scope” using a cleaner and more readable syntax.
std::map<int, std::string> myMap;

if (auto [iter, succeeded] = myMap.insert(value); succeeded) {
    useResult(iter);
    // ...
}
else {
    // ...
}   // iter and succeeded are automatically destroyed
If you use auto, changing your code may become a piece of cake.
The following code snippet only uses auto. You do not have to think about the types, and therefore, you cannot make an error. The type of res will be int at the end. Thanks to the typeid operator, you get a string representation of the type.
auto a = 5;
auto b = 10;
auto sum = a * b * 3;
auto res = sum + 10;

std::cout << typeid(res).name() << '\n';   // i
If you decide to change the literal b from int to double (1), or to use the float literal 3.1f instead of the int literal 3 (2), res always has the correct type. The compiler automatically deduces the correct type.
auto a = 5;
auto b = 10.5;             // (1)
auto sum = a * b * 3;
auto res = sum * 10;

std::cout << typeid(res).name() << '\n';   // d

auto a = 5;
auto b = 10;
auto sum = a * b * 3.2f;   // (2)
auto res = sum * 10;

std::cout << typeid(res).name() << '\n';   // f
The GCC and Clang compilers generate the type hints i, d, and f in the three code snippets. The MSVC compiler would write more verbose type hints such as int, double, and float.
For readability and maintenance reasons, you should not reuse names in nested scopes.
// shadow.cpp

#include <iostream>

int shadow(bool cond) {
    int d = 0;
    if (cond) {
        d = 1;
    }
    else {
        int d = 2;   // declares a local d,
                     // hiding the d of the parent scope
        d = 3;
    }                // the local d is removed
    return d;
}

int main() {
    std::cout << '\n';
    std::cout << "shadow(true): " << shadow(true) << '\n';
    std::cout << "shadow(false): " << shadow(false) << '\n';
    std::cout << '\n';
}
What is the output of the program? Confused by all the ds? Figure 8.1 shows the result.
Figure 8.1 Reusing names in nested scopes
This was easy! Right? But the same behavior is quite surprising in a class hierarchy.
// shadowClass.cpp

#include <iostream>
#include <string>

struct Base {
    void shadow(std::string) {                // (1)
        std::cout << "Base::shadow" << '\n';
    }
};

struct Derived: Base {
    void shadow(int) {                        // (2)
        std::cout << "Derived::shadow" << '\n';
    }
};

int main() {
    std::cout << '\n';

    Derived derived;
    derived.shadow(std::string{});            // (3)
    derived.shadow(int{});

    std::cout << '\n';
}
Both structs Base and Derived have a member function shadow. The one in Base accepts a std::string (1), and the one in Derived accepts an int (2). When you invoke shadow on the object derived with a default-constructed std::string (3), you may assume that the base version is called. Wrong! The member function shadow is implemented in the class Derived, and the member function of the base class is not considered during name resolution. Figure 8.2 shows the compilation error of GCC.
Figure 8.2 Hiding member functions of a base
Thanks to a using declaration, the Base variant of shadow becomes visible in Derived.
struct Derived: Base {
    using Base::shadow;
    void shadow(int) {
        std::cout << "Derived::shadow" << '\n';
    }
};
After adding using Base::shadow to Derived, the program behaves as expected. The guideline “C.138: Create an overload set for a derived class and its bases with using” showed the issue of shadowing in a class hierarchy. See Figure 8.3.
Figure 8.3 Change visibility with a using declaration
As in the previous section on names, the rules in this section regarding variables and their initialization are often quite obvious but sometimes provide precious insights. Consequently, I cover the intuitive rules quickly and write about the valuable insights in more depth.
This is one of these elementary techniques that many professional C++ programmers get wrong. The simple question is: Which variable is initialized?
#include <string>

struct T1 {};

class T2 {
public:
    T2() {}
};

int n;              // OK

int main() {
    int n2;         // BAD
    std::string s;  // OK
    T1 t1;          // OK
    T2 t2;          // OK
}
n has global scope and a fundamental type; consequently, it is initialized to 0. This initialization does not happen for n2 because it has local scope. But a variable of a user-defined type such as std::string, T1, or T2 is initialized even in a local scope.
There is a simple fix to prevent this issue: use auto. Now you cannot forget to initialize a variable.
#include <string>

struct T1 {};

class T2 {
public:
    T2() {}
};

auto n = 0;

int main() {
    using namespace std::string_literals;   // enables the ""s literal
    auto n2 = 0;
    auto s = ""s;
    auto t1 = T1();
    auto t2 = T2();
}
ES.21: Don’t introduce a variable (or constant) before you need to use it
In the C standard C89, you must declare all of your variables at the beginning of a scope. We program in C++, not in C89.
ES.22: Don’t declare a variable until you have a value to initialize it with
If you don’t follow this rule, you may have a so-called use-before-set error. Have a look at the example from the guidelines.
int var;

if (cond)                      // some non-trivial condition
    set(&var);
else if (cond2 || !cond3) {
    var = set2(3.14);
}
// use var
If cond3 holds but neither cond nor cond2 does, then var is not initialized when it is used.
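One way to follow the rule is to encapsulate the branch logic in a function and initialize var at its declaration. Here is a small sketch of mine; the helpers setValue and set2 and the fallback value 0 are assumptions, standing in for set and set2 from the example.

// sketch: hypothetical stand-ins for set and set2 from the example above
int setValue() { return 1; }
int set2(double d) { return static_cast<int>(d); }

// encapsulate the branches so that var can be initialized at its declaration
int initialValue(bool cond, bool cond2, bool cond3) {
    if (cond) return setValue();
    if (cond2 || !cond3) return set2(3.14);
    return 0;   // assumed: an explicit fallback instead of an uninitialized read
}

void use(bool cond, bool cond2, bool cond3) {
    const int var = initialValue(cond, cond2, cond3);
    // use var ...
}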
There are many reasons to use {}-initialization.
{}-initialization
Is always applicable
Overcomes the most vexing parse
Prevents narrowing conversion
While the first two arguments make C++ more intuitive, the last argument often prevents undefined behavior.
{}-initialization is always applicable. Here are a few examples:
// uniformInitialization.cpp

#include <map>
#include <vector>
#include <string>

// Initialization of a C-array
class Array {
public:
    Array(): myData{1, 2, 3, 4, 5} {}
private:
    const int myData[5];
};

class MyClass {
public:
    int x;
    double y;
};

class MyClass2 {
public:
    MyClass2(int fir, double sec): x{fir}, y{sec} {};
private:
    int x;
    double y;
};

int main() {
    // Direct initialization of standard containers
    int intArray[] = {1, 2, 3, 4, 5};
    std::vector<int> intArray1{1, 2, 3, 4, 5};
    std::map<std::string, int> myMap{ {"Scott", 1976}, {"Dijkstra", 1972} };
    Array arr;

    // Default initialization of arbitrary objects
    int i{};                  // i becomes 0
    std::string s{};          // s becomes ""
    std::vector<float> v{};   // v becomes an empty vector
    double d{};               // d becomes 0.0

    // Direct initialization of an object with public members
    MyClass myClass{2011, 3.14};
    MyClass myClass1 = {2011, 3.14};

    // Initialization of an object using the constructor
    MyClass2 myClass2{2011, 3.14};
    MyClass2 myClass3 = {2011, 3.14};
}
You should never say always. There is a weird behavior, which is fixed in C++17.
auto: always applicable? Yes, but you have to keep a special rule in mind. If you use automatic type deduction with auto in combination with {}-initialization, you get a std::initializer_list.
auto initA{1};       // std::initializer_list<int>
auto initB = {2};    // std::initializer_list<int>
auto initC{1, 2};    // std::initializer_list<int>
auto initD = {1, 2}; // std::initializer_list<int>
This counterintuitive behavior changes with C++17.
auto initA{1};       // int
auto initB = {2};    // std::initializer_list<int>
auto initC{1, 2};    // error, no single element
auto initD = {1, 2}; // std::initializer_list<int>
The most vexing parse is well known, and almost any professional C++ developer has already fallen into this trap. The following short program demonstrates the trap.
// mostVexingParse.cpp

#include <iostream>

struct MyInt {
    MyInt(int arg = 0): i(arg) {}
    int i;
};

int main() {
    MyInt myInt(2011);
    MyInt myInt2();

    std::cout << myInt.i;
    std::cout << myInt2.i;
}
This simple-looking program does not compile! See Figure 8.4.
Figure 8.4 The most vexing parse
The error message is not very meaningful. The compiler can interpret the expression MyInt myInt2() either as a call of a constructor or as a declaration of a function. When there is an ambiguity, it chooses the function declaration. Consequently, the access myInt2.i is not valid.
Replacing the round braces in MyInt myInt2() with curly braces, MyInt myInt2{}, resolves the ambiguity.
// mostVexingParseSolved.cpp

#include <iostream>

struct MyInt {
    MyInt(int arg = 0): i(arg) {}
    int i;
};

int main() {
    MyInt myInt(2011);
    MyInt myInt2{};

    std::cout << myInt.i;
    std::cout << myInt2.i;
}
Narrowing conversion is an implicit conversion of arithmetic values that includes a loss of accuracy. That sounds extremely dangerous and is a common source of subtle errors.
The following code snippet exemplifies narrowing conversion for the two fundamental types char and int. It doesn’t matter whether I use direct initialization or copy initialization.
// narrowingConversion.cpp

#include <iostream>

int main() {
    char c1(999);
    char c2 = 999;
    std::cout << "c1: " << c1 << '\n';
    std::cout << "c2: " << c2 << '\n';

    int i1(3.14);
    int i2 = 3.14;
    std::cout << "i1: " << i1 << '\n';
    std::cout << "i2: " << i2 << '\n';
}
The output of the program shows both issues. First, the int literal 999 doesn’t fit into the type char; second, the double literal 3.14 doesn’t fit into the int type. See Figure 8.5.
Figure 8.5 Narrowing conversion
Narrowing conversion is not possible with {}-initialization.
// narrowingConversionSolved.cpp

#include <iostream>

int main() {
    char c1{999};
    char c2 = {999};
    std::cout << "c1: " << c1 << '\n';
    std::cout << "c2: " << c2 << '\n';

    int i1{3.14};
    int i2 = {3.14};
    std::cout << "i1: " << i1 << '\n';
    std::cout << "i2: " << i2 << '\n';
}
The program is ill formed because {}-initialization detects the narrowing conversions. The compiler must at least diagnose a warning, and most compilers treat narrowing conversions in {}-initialization as errors. To be on the safe side, turn the diagnostic into an error (with GCC, for example, -Werror=narrowing). Figure 8.6 shows the failing compilation with GCC.
Figure 8.6 Narrowing conversion detected
Do you like the following code?
void use() {
    int i;
    for (i = 0; i < 20; ++i) { /* ... */ }
    for (i = 0; i < 200; ++i) { /* ... */ }   // bad: i recycled
}
I hope not. Put the declaration of i into the for loop and you are fine: i is now bound to the lifetime of the for loop.
void use() {
    for (int i = 0; i < 20; ++i) { /* ... */ }
    for (int i = 0; i < 200; ++i) { /* ... */ }
}
With C++17, you can declare variables directly in an if statement or a switch statement.
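Here is a small sketch of a switch statement with an initializer; the enumeration Status and the helper getStatus are assumed for illustration.

#include <iostream>

enum class Status { ok, error };
Status getStatus() { return Status::ok; }   // assumed helper

int main() {
    // status is scoped to the switch statement (C++17)
    switch (Status status = getStatus(); status) {
        case Status::ok:    std::cout << "ok" << '\n';    break;
        case Status::error: std::cout << "error" << '\n'; break;
    }
}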
ES.28: Use lambdas for complex initialization, especially of const variables
I often hear the question: Why should I invoke a lambda function in place? This rule answers the question. You can put complex initialization steps into a lambda. The in-place invocation of a lambda is particularly valuable if your variable should become const.
If you don’t want to modify your variable after initialization, you should make it const. But sometimes the initialization of the variable consists of more than one step, and consequently, you cannot make the variable const.
The widget x in the following example should be const after its initialization. It cannot be const because it is modified a few times during its initialization.
widget x;   // should be const, but:
for (auto i = 2; i <= N; ++i) {
    x += some_obj.do_something_with(i);
}
// from here, x should be const,
// but we can't say so in code in this style
Now a lambda expression comes to our rescue. Use a technique called Immediately Invoked Lambda Expression (IILE).
Put the initialization into a lambda expression, capture the environment by reference, and initialize your const variable with the in-place invoked lambda.
const widget x = [&]{
    widget val;
    for (auto i = 2; i <= N; ++i) {
        val += some_obj.do_something_with(i);
    }
    return val;
}();
Admittedly, it looks a little strange to invoke a lambda just in place, but from a conceptual view, I like it. You put the whole initialization into the body of a lambda, and the final pair of parentheses invokes the lambda.
If there is one unanimous consensus in the C++ standardization committee, then this is it: Macros must go. Macros are just text substitution without any C++ semantics. They transform the written code so that the compiler sees different code. This transformation is highly error prone and obscures the cause of the error.
But sometimes you have to deal with legacy code, which relies on macros. For completeness, the C++ Core Guidelines have four rules for macros.
Let me start with the don’ts. The following example shows the usage of the function-like macro max. I copied max from the param.h header file, which is part of the GNU C library.
// macro.cpp

#include <stdio.h>

#define max(a, b) ((a) > (b)) ? (a) : (b)

int main() {
    int a = 1, b = 2;
    printf("\nmax(a, b): %d\n", max(a, b));
    printf("a = %d, b = %d\n", a, b);

    printf("\nmax(++a, ++b): %d\n", max(++a, ++b));   // (1)
    printf("a = %d, b = %d\n\n", a, b);               // (2)
}
The output in (2) may surprise you. See Figure 8.7.
Figure 8.7 Usage of the function-like macro max
The variable b is evaluated twice and, therefore, incremented twice. Instead of the function-like macro max, use a constexpr function or a max function template.
template<typename T>
T max(T i, T j) {
    return ((i > j) ? i : j);
}

constexpr int max(int i, int j) {
    return ((i > j) ? i : j);
}
The same argumentation applies to macros as constants.
#define PI 3.14               // bad
constexpr double pi = 3.14;   // good
If, for whatever reason, you have to use or maintain macros, write them in ALL_CAPS and give them unique names. The following code snippet breaks both rules: forever is written in lowercase letters, and the macro CHAR may conflict with someone else using the name CHAR.
#define forever for (;;)
#define CHAR
There are about twenty rules related to expressions. They are quite diverse and overlap with existing rules. Here I focus on the rules applying to complicated expressions, pointers, the order of evaluation, and conversions.
First and foremost, you should avoid complicated expressions.
What does complicated mean? Here is the example from the C++ Core Guidelines, including the explanation:
// bad: assignment hidden in subexpression
while ((c = getc()) != -1)

// bad: two non-local variables assigned in a subexpression
while ((cin >> c1, cin >> c2), c1 == c2)

// better, but possibly still too complicated
for (char c1, c2; cin >> c1 >> c2 && c1 == c2;)

// OK: if i and j are not aliased (names for the same data)
int x = ++i + ++j;

// OK: if i != j and i != k
v[i] = v[j] + v[k];

// bad: multiple assignments "hidden" in subexpressions
x = a + (b = f()) + (c = g()) * 7;

// bad: relies on commonly misunderstood precedence rules
x = a & b + c * d && e ^ f == 7;

// bad: undefined behavior
x = x++ + x++ + ++x;
On one hand, the guidelines say that if you are in doubt about operator precedence, use parentheses. On the other hand, they state that you should know enough not to need parentheses here. Finding the right balance is, therefore, the challenge and depends on the expertise of the users.
const unsigned int flag = 2;
unsigned int a = flag;

if (a & flag != 0)         // bad: means a & (flag != 0)

if (a < 0 || a <= max) {   // good: quite obvious
    // ...
}
For an expert, the expression may be obvious, but for a beginner, it may be a challenge.
I have only two tips in mind:
If in doubt about precedence, use parentheses. The precedence table gives you all the details.
Program for the beginners! Keep the precedence table under your pillow.
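For example, one pair of parentheses fixes the bad expression from the snippet above.

const unsigned int flag = 2;
unsigned int a = flag;

if ((a & flag) != 0) {   // good: the parentheses express the intent:
    // ...               // test the masked bits, not a & (flag != 0)
}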
Let me quote the C++ Core Guidelines: “Complicated pointer manipulation is a major source of errors.” Why should we care? Of course, our legacy code is full of pointer manipulations such as in the following code snippet.
void f(int* p, int count) {
    if (count < 2) return;

    int* q = p + 1;
    int n = *p++;

    if (count < 6) return;

    p[4] = 1;
    p[count - 1] = 2;

    use(&p[0], 3);
}

int myArray[100];
f(myArray, 100);
The main issue with these lines of code is that the caller must provide the correct length of the C-array. If not, undefined behavior kicks in.
Think about the last two lines of the code snippet for a few seconds. We start with a C-array and remove its type information by passing it to the function f. This process is called array-to-pointer decay and is the reason for many errors. Maybe we counted the number of elements wrong, or the size of the C-array changed. The result is the same in both cases: undefined behavior.
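The following small sketch makes the loss of information visible; the concrete value 400 in the comment assumes a 4-byte int.

#include <iostream>

void f(int*, int) {}   // the array argument decays to a plain int*

int main() {
    int myArray[100];
    std::cout << sizeof(myArray) << '\n';   // 100 * sizeof(int); typically 400
    f(myArray, 100);                        // inside f, only the pointer is left;
                                            // the caller must pass the correct count
}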
What should we do? We should use the appropriate data type. C++20 offers std::span.
void f(std::span<int> a) {
    if (a.size() < 2) return;

    int n = a[0];   // OK
    std::span<int> q = a.subspan(1);

    if (a.size() < 6) return;

    a[4] = 1;
    a[a.size() - 1] = 2;

    use(a.data(), a.size());
}
std::span knows its size. I hear your complaint: C++20 is not an option for you. To our rescue, C++ has templates; therefore, it’s easy to overcome this restriction and write bounds-safe code.
1  // at.cpp
2
3  #include <algorithm>
4  #include <array>
5  #include <deque>
6  #include <string>
7  #include <vector>
8
9  template <typename T>
10 void use(T*, int) {}
11
12 template <typename T>
13 void f(T& a) {
14
15     if (a.size() < 2) return;
16
17     int n = a.at(0);
18
19     std::array<typename T::value_type, 99> q;
20     std::copy(a.begin() + 1, a.end(), q.begin());
21
22     if (a.size() < 6) return;
23
24     a.at(4) = 1;
25
26     a.at(a.size() - 1) = 2;
27
28     use(a.data(), a.size());
29 }
30
31 int main() {
32
33     std::array<int, 100> arr{};
34     f(arr);
35
36     std::array<double, 20> arr2{};
37     f(arr2);
38
39     std::vector<double> vec{1, 2, 3, 4, 5, 6, 7, 8, 9};
40     f(vec);
41
42     std::string myString = "123456789";
43     f(myString);
44
45     // std::deque<int> deq{1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
46     // f(deq);
47
48 }
Now the function f works for std::arrays of different sizes and element types (lines 34 and 37) but also for a std::vector (line 40) or a std::string (line 43). These containers have in common that their data is stored in a contiguous memory block. This is not the case for std::deque; therefore, the commented-out call f(deq) (line 46) would fail at a.data(). The key observation in the example is that the at call on a container checks its boundaries and eventually throws a std::out_of_range exception.
The expression T::value_type helps to get the element type of the container. T is a so-called dependent type because T is a type parameter of the function template f. This is the reason I have to give the compiler a hint that T::value_type is actually a type: typename T::value_type.
A symbolic constant is more explicit than a magic constant. The example in the C++ Core Guidelines starts with the magic constants 1 and 12 and ends with the symbolic constants first_month and last_month.
// don't: magic constants 1 and 12
for (int m = 1; m <= 12; ++m)
    std::cout << month[m] << '\n';

// months are indexed 1..12 (symbolic constant)
constexpr int first_month = 1;
constexpr int last_month = 12;

for (int m = first_month; m <= last_month; ++m) {
    std::cout << month[m] << '\n';
}
If you don’t have to check the length of a range, you will not get an off-by-one error. Let’s sum up the elements of a std::vector.
// sumUp.cpp

#include <iostream>
#include <numeric>
#include <vector>

int main() {
    std::vector<int> vec{1, 2, 3, 4, 5, 6, 7, 8, 9, 10};

    // bad
    int sum1 = 0;
    auto sizeVec = vec.size();
    for (int i = 0; i < sizeVec; ++i) sum1 += vec[i];
    std::cout << sum1 << '\n';   // 55

    // better
    int sum2 = 0;
    for (auto v: vec) sum2 += v;
    std::cout << sum2 << '\n';   // 55

    // the best
    auto sum3 = std::accumulate(vec.begin(), vec.end(), 0);
    std::cout << sum3 << '\n';   // 55
}
Iterating explicitly through a container is very error prone. In contrast, iterating implicitly with a range-based for loop is safe. Additionally, the STL algorithm std::accumulate documents its intention.
The rules for pointers start with null pointers and continue with the deletion and dereferencing of pointers.
Why should you not use 0 or NULL to denote a null pointer?
0: The literal 0 can be the null pointer (void*)0 or the number 0, depending on the context. Consequently, what started as a null pointer could end up as a number.
NULL: NULL is a macro, and therefore, you don’t know what’s inside. A possible implementation according to cppreference.com could be the following:
#define NULL 0

// since C++11
#define NULL nullptr
The null pointer nullptr avoids the ambiguity of the number 0 and the macro NULL. nullptr is and remains of type std::nullptr_t. You can assign a nullptr to an arbitrary pointer; the pointer becomes a null pointer and points, therefore, to no data. You cannot dereference a nullptr. A nullptr can, on one hand, be compared with all pointers and can, on the other hand, be converted to all pointer types. You can neither compare nor convert a nullptr to an integral type, with one exception: nullptr can be explicitly or contextually converted to bool. Hence, you can use a nullptr in a logical expression.
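A short sketch of this contextual conversion to bool:

#include <iostream>

int main() {
    int* ptr = nullptr;
    if (ptr) std::cout << "ptr points to data" << '\n';
    else std::cout << "ptr is a null pointer" << '\n';   // nullptr converts to false
}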
Using the three kinds of null pointers in generic code immediately shows the flaws of the number 0 and the macro NULL. Thanks to template argument deduction, the literals 0 and NULL are deduced to integral types; the information that both literals should be null pointer constants is lost.
// nullPointer.cpp

#include <cstddef>
#include <iostream>

template<class P>
void functionTemplate(P p) {
    int* a = p;
}

int main() {
    int* a = 0;
    int* b = NULL;
    int* c = nullptr;

    functionTemplate(0);         // (1)
    functionTemplate(NULL);      // (2)
    functionTemplate(nullptr);
}
You can use 0 and NULL to initialize the int pointers a and b directly. But if you use the values 0 and NULL as arguments to the function template ((1) and (2)), the compiler loudly complains. See Figure 8.8.
Figure 8.8 The null pointers 0, NULL, and nullptr
The compiler deduces 0 in the function template to the type int; it deduces NULL to the type long int. This observation does not hold for nullptr: nullptr preserves its type std::nullptr_t through template argument deduction.
Explicit memory management, rather than using an STL container or a smart pointer such as std::unique_ptr<X[]>, is very error prone:
void f(int n) {
    auto p = new X[n];   // n default constructed Xs
    // ...
    delete p;            // error: just deletes the object p,
                         // rather than deleting the array p[]
}
Deleting a C-array with a nonarray delete is undefined behavior; use the array form delete[] p instead.
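Better still, follow the rule and let a std::unique_ptr<X[]> manage the array. A minimal sketch:

#include <memory>

struct X {};

void f(int n) {
    auto p = std::make_unique<X[]>(n);   // n value-initialized Xs
    // ...
}                                        // the array form of delete runs automatically

int main() {
    f(10);
}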
If you have to manage raw memory, read the rules in the Allocation and Deallocation section of Chapter 7.
If you dereference an invalid pointer, your program has undefined behavior. The only way to avoid this behavior is to check your pointer before its usage.
void func(int* p) {
    if (!p) {
        // do something special
    }
    int x = *p;
}
How can you overcome this issue? Don’t use a naked pointer. Use a smart pointer such as std::unique_ptr or std::shared_ptr if you need pointer-like semantics.
If you make wrong assumptions about the order of evaluation in an expression, your program may end in undefined behavior.
In C++14, the following expression has undefined behavior.
v[i] = ++i; // the result is undefined
This undefined behavior has been addressed in C++17. With C++17, the order of evaluation of the last code snippet is right to left; therefore, the expression has well-defined behavior.
Here are the additional guarantees we have with C++17:
Postfix expressions are evaluated from left to right. This includes function calls and member selection expressions.
Assignment expressions are evaluated from right to left. This includes compound assignments such as +=.
Operands to shift operators are evaluated from left to right.
Here are a few examples:
a.b
a->b
a->*b
a(b1, b2, b3)
b @= a
a[b]
a << b
a >> b
How should you read these examples? First, a is evaluated, and then b.
The function call a(b1, b2, b3) is tricky. With C++17, we have the guarantee that each function argument is entirely evaluated before the other function arguments, but the order of the evaluation of the arguments is still unspecified.
Let me elaborate a little bit more on the last sentence.
In the last few years, I have seen many errors because developers assumed that the order of the evaluation of function arguments is left to right. Wrong! There is no such guarantee!
// unspecified.cpp

#include <iostream>

void func(int fir, int sec) {
    std::cout << "(" << fir << "," << sec << ")" << '\n';
}

int main() {
    int i = 0;
    func(i++, i++);
}
The order of the evaluation of the function arguments is unspecified. Unspecified behavior means that the behavior of the program may vary between implementations and the conforming implementation is not required to document the effects of each behavior.
Consequently, the output from GCC and Clang differs even if both compilers conform to the C++ standard (see Figure 8.9).
Figure 8.9 Unspecified behavior
Casting types is a common cause of undefined behavior. If necessary, use explicit casts.
Let’s see what happens if I screw up the type system and cast a double to a long int and to a long long int.
// casts.cpp

#include <iostream>

int main() {
    double d = 2;
    auto p = (long*)&d;
    auto q = (long long*)&d;
    std::cout << d << ' ' << *p << ' ' << *q << '\n';
}
The result with the Visual Studio compiler is not promising (see Figure 8.10).
Figure 8.10 Wrong casts with the Visual Studio compiler
Nor is the result with the GCC or Clang compiler promising (see Figure 8.11).
Figure 8.11 Wrong casts with the GCC or Clang compiler
What is terrible about the C-cast? You don’t see which cast is actually performed. If you perform a C-cast, a combination of casts is applied if necessary.
Roughly speaking, a C-cast starts with a static_cast, continues with a const_cast, and finally performs a reinterpret_cast.
The principle from The Zen of Python, “explicit is better than implicit,” also applies to casts in C++: Use a named cast if necessary.
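A small sketch of this advice: the named cast makes the conversion visible and searchable.

#include <iostream>

int main() {
    const double d = 3.14;
    // int i = (int)d;              // C-cast: the performed conversion is invisible
    int i = static_cast<int>(d);    // explicit, searchable, checked at compile time
    std::cout << i << '\n';         // 3
}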
With C++11, we have the following six casts:
static_cast: converts between similar types such as pointer types or numeric types
const_cast: adds or removes const or volatile
reinterpret_cast: converts between pointers or between integral types and pointers
dynamic_cast: converts between polymorphic pointers or references in the same class hierarchy
std::move: converts to an rvalue reference
std::forward: converts an lvalue to an lvalue reference and an rvalue to an rvalue reference
I assume you are surprised that I presented std::move and std::forward as casts. Let’s have a closer look at the internals of std::move:
static_cast<std::remove_reference<decltype(arg)>::type&&>(arg)
What’s happening here? First, the type of the argument arg is deduced by decltype(arg). Then all references are removed, and an rvalue reference (&&) is added. std::remove_reference is a metafunction from the type-traits library. In the end, we always get an rvalue reference.
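A short usage sketch; the comments describe the typical effect on a std::string.

#include <string>
#include <utility>

int main() {
    std::string source{"0123456789"};
    std::string target = std::move(source);   // source is cast to an rvalue reference;
                                              // its buffer is typically stolen, not copied
    // source is afterward in a valid but unspecified state
}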
Casting away const is undefined behavior if the underlying object, such as constInt, is const and you try to modify it.
const int constInt = 10;
const int* pToConstInt = &constInt;

int* pToInt = const_cast<int*>(pToConstInt);
*pToInt = 12;   // undefined behavior
You can find the rationale for this rule in the C standard, which is also relevant for the C++ standard: “The implementation may place a const object that is not volatile in a read-only region of storage” (International Organization for Standardization (ISO)/International Electrotechnical Commission (IEC) 9899:2011, subclause 6.7.3, paragraph 4).
Statements fall mainly into two categories: iteration statements and selection statements. The rules for both kinds of statements are quite clear. Consequently, I quote the rule of the C++ Core Guidelines and add a few pieces of information when necessary.
C++ implements three iteration statements: while, do while, and for. With C++11, syntactic sugar was added to the for loop: the range-based for loop.
std::vector<int> vec = {0, 1, 2, 3, 4, 5};

// for loop
for (std::size_t i = 0; i < vec.size(); ++i) {
    std::cout << vec[i] << ' ';
}

// range-based for loop
for (auto ele: vec)
    std::cout << ele << ' ';
A range-based for loop is easier to read, and you cannot make an index error or change the index while looping (“ES.71: Prefer a range-for-statement to a for-statement when there is a choice” and “ES.86: Avoid modifying loop control variables inside the body of raw for loops”).
When you have an obvious loop variable, you should use a for loop instead of a while statement (“ES.72: Prefer a for-statement to a while-statement when there is an obvious loop variable”); if not, you should use a while statement (“ES.73: Prefer a while-statement to a for-statement when there is no obvious loop variable”).
for (auto i = 0; i < vec.size(); ++i) {
    // do work
}

int events = 0;
while (wait_for_event()) {
    ++events;
    // do work
}
You should declare a loop variable in a for loop (“ES.74: Prefer to declare a loop variable in the initializer part of a for-statement”). To remind you: since C++17, declaring a variable such as result can also be done in the initialization part of an if or a switch statement.
std::map<int, std::string> myMap;

if (auto result = myMap.insert(value); result.second) {
    useResult(result.first);
    // ...
}
else {
    // ...
}   // result is automatically destroyed
Avoid do while statements (“ES.75: Avoid do-statements”) and goto statements (“ES.76: Avoid goto”), and minimize the use of break and continue in iteration statements (“ES.77: Minimize the use of break and continue in loops”) because they are difficult to read. If something is difficult to read, it’s error prone and makes refactoring of your code challenging. A break statement ends the iteration statement, and a continue statement ends only the current iteration step.
if and switch are the selection statements of C++ that we inherited from C.
You should prefer a switch statement to an if statement when there is a choice (“ES.70: Prefer a switch-statement to an if-statement when there is a choice”) because a switch statement may be more readable and can be better optimized than an if statement.
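A small sketch contrasting both forms; the handlers f0, f7, and f9 are assumed for illustration.

#include <iostream>

void f0() { std::cout << "f0" << '\n'; }   // assumed handlers
void f7() { std::cout << "f7" << '\n'; }
void f9() { std::cout << "f9" << '\n'; }

void dispatch(int n) {
    // if-else chain: each condition is tested in turn
    if (n == 0) f0();
    else if (n == 7) f7();
    else if (n == 9) f9();

    // switch: more readable; the compiler may emit a jump table
    switch (n) {
        case 0: f0(); break;
        case 7: f7(); break;
        case 9: f9(); break;
    }
}

int main() {
    dispatch(7);   // f7 is called twice, once per variant
}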
The next two rules related to switch statements need more attention than the previous ones.
I have seen switch statements in legacy code with more than 100 case labels. If you use non-empty cases without a break, the maintenance of these switch statements becomes a nightmare. Here is an example from the C++ Core Guidelines:
switch (eventType) {
case Information:
    update_status_bar();
    break;
case Warning:
    write_event_log();
    // Bad - implicit fallthrough
case Error:
    display_error_window();
    break;
}
Maybe you overlooked it: the Warning case has no break statement; therefore, the Error case is automatically executed as well.
Since C++17, we have a cure with the attribute [[fallthrough]]. Now you can explicitly express your intention. [[fallthrough]] has to be on its own line immediately before a case label and indicates to the compiler that a fallthrough is intentional. Consequently, the compiler should not diagnose a warning.
void f(int n) {
    void g(), h(), i();
    switch (n) {
    case 1:
    case 2:
        g();
        [[fallthrough]];   // (1)
    case 3:
        h();               // (2)
    case 4:
        i();
        [[fallthrough]];   // (3)
    }
}
The [[fallthrough]] attribute in (1) suppresses a compiler warning. That does not hold for (2): the compiler may warn. (3) is ill formed because no case label follows.
The program switch.cpp exemplifies the second rule: use default to handle common cases (only).
// switch.cpp

#include <iostream>

enum class Message {
    information, warning, error, fatal
};

void writeMessage() { std::cerr << "message" << '\n'; }
void writeWarning() { std::cerr << "warning" << '\n'; }
void writeUnexpected() { std::cerr << "unexpected" << '\n'; }

void withDefault(Message message) {
    switch (message) {
    case Message::information:
        writeMessage();
        break;
    case Message::warning:
        writeWarning();
        break;
    default:
        writeUnexpected();
        break;
    }
}

void withoutDefaultGood(Message message) {
    switch (message) {
    case Message::information:
        writeMessage();
        break;
    case Message::warning:
        writeWarning();
        break;
    default:
        // nothing can be done
        break;
    }
}

void withoutDefaultBad(Message message) {
    switch (message) {
    case Message::information:
        writeMessage();
        break;
    case Message::warning:
        writeWarning();
        break;
    }
}

int main() {
    withDefault(Message::fatal);
    withoutDefaultGood(Message::information);
    withoutDefaultBad(Message::warning);
}
The implementations of the functions withDefault and withoutDefaultGood are expressive enough. The maintainer of withoutDefaultGood knows from the comment that there is no meaningful default case for this switch statement. Compare the functions withoutDefaultGood and withoutDefaultBad from a maintenance point of view: do you know whether the implementer of withoutDefaultBad forgot the default case, or whether the enumerators Message::error and Message::fatal were added later? To be sure, you have to study the source code or ask the original author of the code, if possible.
The seven arithmetic rules hold significant potential for surprise. They focus on two topics: arithmetic with signed and unsigned integers, and typical arithmetic errors such as overflow, underflow, and division by zero.
Breaking these arithmetic rules often ends in unexpected results.
If you mix signed and unsigned arithmetic, you may not get the expected result.
// mixSignedUnsigned.cpp

#include <iostream>

int main() {
    int x = -3;
    unsigned int y = 7;

    std::cout << x - y << '\n';   // 4294967286
    std::cout << x + y << '\n';   // 4
    std::cout << x * y << '\n';   // 4294967275
    std::cout << x / y << '\n';   // 613566756
}
GCC, Clang, and the Microsoft compiler produce the same result.
Bit manipulations with the bitwise operators (~, >>, >>=, <<, <<=, &, &=, ^, ^=, |, and |=) have implementation-defined behavior when performed on signed operands. Implementation-defined behavior means that the behavior varies between implementations, and the implementation must document the effects of each behavior. Consequently, don’t perform bit manipulations on signed types; use unsigned types instead:
unsigned char x = 0b00110010;
unsigned char y = ~x;   // y == 0b11001101
ES.102: Use signed types for arithmetic
First, you should not do arithmetic with unsigned types because subtraction of two values often gives a negative value. Second, you should not mix signed and unsigned arithmetic according to the previous rule: “ES.100: Don’t mix signed and unsigned arithmetic.” Let’s see what happens when I break the rule.
// signedTypes.cpp

#include <iostream>

template<typename T, typename T2>
T subtract(T x, T2 y) {
    return x - y;
}

int main() {
    int s = 5;
    unsigned int us = 5;

    std::cout << subtract(s, 7) << '\n';        // -2
    std::cout << subtract(us, 7u) << '\n';      // 4294967294
    std::cout << subtract(s, 7u) << '\n';       // -2
    std::cout << subtract(us, 7) << '\n';       // 4294967294
    std::cout << subtract(s, us + 2) << '\n';   // -2
    std::cout << subtract(us, s + 2) << '\n';   // 4294967294
}

GCC, Clang, and the Microsoft compiler produce the same result.
There is an interesting relation: when you assign -1 to an unsigned int, you get the largest unsigned int.
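A one-line demonstration; the wrap-around to the maximum value is guaranteed by modulo arithmetic.

#include <iostream>
#include <limits>

int main() {
    unsigned int u = -1;   // wraps around: modulo arithmetic
    std::cout << (u == std::numeric_limits<unsigned int>::max()) << '\n';   // 1
}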
The behavior of an arithmetic expression may differ between signed and unsigned types.
Let’s start with a simple program.
// modulo.cpp

#include <cstddef>
#include <iostream>

int main() {
    std::cout << '\n';

    unsigned int max{100000};
    unsigned short x{0};
    std::size_t count{0};

    while (x < max && count < 20) {
        std::cout << x << " ";
        x += 10000;   // (1)
        ++count;
    }

    std::cout << "\n\n";
}
The crucial point of the program is that the successive addition to x in (1) does not trigger an overflow but a modulo operation when the value range of x ends. The reason is that x is of type unsigned short.
Making x signed changes the behavior of the program drastically.
// overflow.cpp

#include <cstddef>
#include <iostream>

int main() {
    std::cout << '\n';

    int max{100000};
    short x{0};
    std::size_t count{0};

    while (x < max && count < 20) {
        std::cout << x << " ";
        x += 10000;
        ++count;
    }

    std::cout << "\n\n";
}
The addition now triggers an overflow. In Figure 8.12, I marked the key points with red circles.
Figure 8.12 Modulo versus overflow with unsigned and signed types
The following three rules always result in undefined behavior.
ES.103: Don’t overflow and ES.104: Don’t underflow
Let me combine both rules. The effect of an overflow or an underflow is the same: memory corruption and, therefore, undefined behavior. Let’s make a simple test with an int array. How long will the following program run when I compile it with GCC?
// overUnderflow.cpp

#include <cstddef>
#include <iostream>

int main() {
    int a[0];   // a zero-length array: each access is out of bounds
    int n = 0;

    while (true) {
        if (!(n % 100)) {
            std::cout << "a[" << n << "] = " << a[n] << '\n';
        }
        a[n] = n;
        a[-n] = -n;
        ++n;
    }
}
Disturbingly long. The program writes every one-hundredth array entry to std::cout. See Figure 8.14.
Figure 8.14 Underflow and overflow of a C-array
ES.105: Don’t divide by zero
Dividing by zero will, with high probability, crash your program.
auto res = 5 / 0; // crash
Dividing by zero may be fine in a logical expression.
auto res = false and (5 / 0); // fine
The result of the expression (5 / 0) is not needed for the overall result and is thus not evaluated. This technique is called short-circuit evaluation and is a special case of lazy evaluation.
I write about the rule “ES.25: Declare an object const or constexpr unless you want to modify its value later on” in Chapter 12, Constants and Immutability.
The section Metaprogramming in Chapter 13 provides an introduction to template metaprogramming and constexpr functions as a replacement for function-like macros.
The rules related to expressions have a broad focus. Consequently, some of the rules have a strong overlap with already presented rules. Find more details in the referenced rules:
ES.56: Write std::move() only when you need to explicitly move an object to another scope (see Parameter Passing: Ownership Semantics in Chapter 4)
ES.60: Avoid new and delete outside resource management functions (see R.12: Immediately give the result of an explicit resource allocation to a manager object)
ES.63: Don’t slice (see C.67: A polymorphic class should suppress copying)
ES.64: Use the T{e} notation for construction (see ES.23: Prefer the {}-initializer syntax)