Chapter 8

Expressions and Statements

Cippi is back at school.

According to the C++ Core Guidelines, “expressions and statements are the lowest and most direct way of expressing actions and computation.” This section has about sixty-five rules that list best practices for expressions and statements in general and for declarations and arithmetic expressions in particular.

First of all, I want to give you an informal definition of what expressions and statements are:

General

The C++ Core Guidelines have two general rules with a particular focus on expressions and statements.

ES.1

Prefer the standard library to other libraries and to “handcrafted code”

There is no reason to write a raw loop to sum up a vector of doubles:

int max = v.size();
double sum = 0.0;
for (int i = 0; i < max; ++i) sum += v[i];

Instead, use the std::accumulate algorithm from the Standard Template Library (STL). This clearly communicates your intent and makes the code more readable.

auto sum = std::accumulate(std::begin(v), std::end(v), 0.0);

Maybe your next task is to build the product of the doubles. Just invoke std::accumulate with the suitable lambda.

auto pro = std::accumulate(std::begin(v), std::end(v), 1.0,
                [](double fir, double sec){ return fir * sec; });

The solution is good but not perfect. The C++ standard already defines many function objects such as multiplication.

auto pro = std::accumulate(std::begin(v), std::end(v), 1.0,
                           std::multiplies<>());

This rule reminds me of a quote from Sean Parent’s talk “C++ Seasoning” at the GoingNative conference in 2013: “If you want to improve the code quality in your organization, replace all your coding guidelines with one goal: Prefer an algorithm to a raw loop.” Or to say it more directly: If you write a raw loop, you probably don’t know the algorithms of the STL well enough. The STL has more than 100 algorithms.
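Two more typical raw loops, a loop with a counter and a loop with a flag, also have direct counterparts in the STL. The following lines are only a sketch; the function names countAbove and allNonNegative are hypothetical and chosen for this illustration.

```cpp
#include <algorithm>
#include <vector>

// Counts the elements greater than limit; replaces a loop with a counter.
// countAbove is a hypothetical name chosen for this sketch.
auto countAbove(const std::vector<double>& v, double limit) {
   return std::count_if(v.begin(), v.end(),
                        [limit](double d){ return d > limit; });
}

// Checks that no element is negative; replaces a loop with a flag.
bool allNonNegative(const std::vector<double>& v) {
   return std::all_of(v.begin(), v.end(),
                      [](double d){ return d >= 0.0; });
}
```

The name of the algorithm states the intent, and the lambda carries only the condition.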

ES.2

Prefer suitable abstractions to direct use of language features

This is the next déjà vu. In one C++ seminar, I had a long discussion followed by an even more extended analysis of a few quite sophisticated and handmade functions for reading and writing std::strstreams. My students had to maintain a function, and after one week, they had no idea what was going on. The main reason why they got confused was that the functionality was not based on the right abstraction.

For example, consider this handmade function for reading a std::istream.

char** read1(istream& is, int maxelem, int maxstring, int* nread) {
   auto res = new char*[maxelem];
   int elemcount = 0;
   while (is && elemcount < maxelem) {
      auto s = new char[maxstring];
      is.read(s, maxstring);
      res[elemcount++] = s;
   }
   nread = &elemcount;
   return res;
}

In contrast, how easy is it to comprehend the following function?

std::vector<std::string> read2(std::istream& is) {
   std::vector<std::string> res;
   for (std::string s; is >> s;) res.push_back(s);
   return res;
}

The right abstraction often means that you don’t have to think about ownership such as in the function read2. This concern does hold for the function read1. The caller of read1 is the owner of the result and has, therefore, to delete it.
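To make the ownership argument concrete, here is a small sketch that feeds read2 from a std::istringstream. The helper splitWords is a hypothetical name introduced only for this example.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Same function as read2 above: the vector owns its strings.
std::vector<std::string> read2(std::istream& is) {
   std::vector<std::string> res;
   for (std::string s; is >> s;) res.push_back(s);
   return res;
}

// Hypothetical helper: splits a text into whitespace-separated words.
std::vector<std::string> splitWords(const std::string& text) {
   std::istringstream input{text};
   return read2(input);
}
```

The caller of splitWords receives a value and never has to call delete; the vector and its strings release their memory automatically.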

Declarations

First of all, here is how a declaration is defined in the C++ Core Guidelines:

A declaration is a statement. A declaration introduces a name into a scope and may cause the construction of a named object.

The rules for declarations are about names, the variables and their initialization, and macros.

Names

On the one hand, the following rules are obvious, and I describe them only briefly. On the other hand, I know many code bases that permanently break these rules. For example, I spoke with a former Fortran programmer who stated the following: Each name should have exactly three characters.

Let me start with the key point: Good names are probably the most important rule for good software.

ES.5

Keep scopes small

If a scope is small, you can put it on a screen and get an idea of what is going on. If a scope becomes too big, you should structure your code into functions or classes. Identify logical entities and use self-explanatory names in your refactoring process. Afterward, it becomes easier to think about your code.

ES.6

Declare names in for-statement initializers and conditions to limit scope

Since the first C++ standard, we can declare a variable in a for statement.

Since C++17, we can declare variables also in an if or a switch statement.

std::map<int,std::string> myMap;

if (auto result = myMap.insert(value); result.second) { 
   useResult(*result.first); 
   // ...
}
else {
   // ...
} // result is automatically destroyed

The variable result is only valid inside the if and else branch of the if statement. result does not pollute the outer scope and is automatically destroyed. Before C++17, you had to declare result in the outer scope.

std::map<int,std::string> myMap;
auto result = myMap.insert(value);
if (result.second){ 
   useResult(*result.first); 
   // ...
}
else {
   // ...
}
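The same C++17 syntax works for a switch statement. The following sketch assumes a hypothetical enumeration Level and a hypothetical function classify; both exist only for this illustration.

```cpp
#include <string>

enum class Level { low, medium, high };

// Hypothetical classification, used only for this sketch.
Level classify(int load) {
   if (load < 30) return Level::low;
   if (load < 70) return Level::medium;
   return Level::high;
}

std::string describe(int load) {
   // level is visible only inside the switch statement (C++17)
   switch (auto level = classify(load); level) {
      case Level::low:    return "low";
      case Level::medium: return "medium";
      case Level::high:   return "high";
   }
   return "unknown";   // not reached for valid enumerators
}
```

After the closing brace of the switch statement, the name level is no longer available.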

ES.7

Keep common and local names short, and keep uncommon and nonlocal names longer

This rule sounds strange, but we are already used to it. Giving a variable the name i or j or giving a variable the name T makes its intention immediately clear: i and j are indices, and T is a type for a template parameter.

template<typename T>
void print(std::ostream& os, const std::vector<T>& v) {
   for (int i = 0; i < v.size(); ++i) os << v[i] << '\n';
}

i is an okay name for a loop control variable, a poor name for a function parameter, and a terrible name for a global variable.

There is a meta-rule underlying this rule. A name should be self-explanatory. In a brief context, you understand at a glance what the variable means. This will not automatically hold for longer contexts; therefore, use longer names.

ES.8

Avoid similar-looking names

Can you read this example without any hesitation?

if (readable(i1 + l1 + ol + o1 + o0 + ol + o1 + I0 + l0)) surprise();

For example, I often have problems with the number 0 and the capital letter O. Depending on the font used, they look quite similar. A few years ago, it took me quite some time to log in to a server. My automatically generated password had a letter O.

ES.9

Avoid ALL_CAPS names

If you use ALL_CAPS, macro substitution may kick in because ALL_CAPS are commonly used for macros. The following code snippet may be a little surprising.

// somewhere in some header:
#define NE !=

// somewhere else in some other header:
enum Coord { N, NE, NW, S, SE, SW, E, W };

// third, somewhere in some poor programmer's .cpp:
switch (direction) {
case N:
   // ...
case NE:
   // ...
// ...
}

ES.10

Declare one name (only) per declaration

Let me give you two examples. Do you spot two issues?

char* p, p2;
char a = 'a';
p = &a;
p2 = a; 

int a = 7, b = 9, c, d = 10, e = 3;

p2 is just a char, and c is not initialized. With C++17, we acquired one exception to this rule: structured binding.

Now I can write the if statement with initializer in rule “ES.6: Declare names in for-statement initializers and conditions to limit scope” using a cleaner and more readable syntax.

std::map<int, std::string> myMap;

if (auto [iter, succeeded] = myMap.insert(value); succeeded) { 
   useResult(*iter); 
   // ...
}
else {
     // ...
} // iter and succeeded are automatically destroyed
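Structured binding is also handy when you iterate through a std::map. Here is a minimal sketch; the helper names toString and tryInsert are hypothetical.

```cpp
#include <map>
#include <string>

// Hypothetical helper: concatenates all entries as "key=value;".
std::string toString(const std::map<int, std::string>& m) {
   std::string res;
   for (const auto& [key, value] : m) {   // binds to first and second
      res += std::to_string(key) + "=" + value + ";";
   }
   return res;
}

// Returns true if the pair was inserted, false if the key already existed.
bool tryInsert(std::map<int, std::string>& m, int key,
               const std::string& value) {
   auto [iter, succeeded] = m.insert({key, value});
   return succeeded;
}
```

The names key, value, iter, and succeeded document the meaning of first and second at the place where they are used.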

ES.11

Use auto to avoid redundant repetition of type names

If you use auto, changing your code may become a piece of cake.

The following code snippet only uses auto. You do not have to think about the types, and therefore, you cannot make an error. This means the type of res will be int at the end. Thanks to the typeid operator, you get a string representation of the type.

auto a = 5;
auto b = 10;
auto sum = a * b * 3;
auto res = sum + 10;
std::cout << typeid(res).name() << '\n'; // i

If you decide to change the initializer of b from an int literal to a double literal (1), or to use the float literal 3.2f instead of the int literal 3 (2), res always has the correct type. The compiler automatically deduces the correct type.

auto a = 5;
auto b = 10.5;             // (1)
auto sum = a * b * 3;
auto res = sum * 10;
std::cout << typeid(res).name() << '\n'; // d

auto a = 5;
auto b = 10;
auto sum = a * b * 3.2f;   // (2)
auto res = sum * 10;
std::cout << typeid(res).name() << '\n'; // f

The GCC and the Clang compiler generated the type hints i, d, and f in the three code snippets. The MSVC compiler would write more verbose type hints such as int, double, and float.
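Because the output of typeid depends on the compiler, you may prefer to check the deduced types at compile time. The following sketch uses std::is_same_v (C++17); the function name deducedTypesMatch is hypothetical.

```cpp
#include <type_traits>

// Returns true if auto deduced the types described in the text.
constexpr bool deducedTypesMatch() {
   auto a = 5;
   auto b = 10;
   auto c = 10.5;
   auto intRes    = a * b * 3 + 10;      // only int operands
   auto doubleRes = a * c * 3 * 10;      // one double operand
   auto floatRes  = a * b * 3.2f * 10;   // one float operand
   return std::is_same_v<decltype(intRes), int>
       && std::is_same_v<decltype(doubleRes), double>
       && std::is_same_v<decltype(floatRes), float>;
}

static_assert(deducedTypesMatch(), "auto deduced an unexpected type");
```

The static_assert gives the same answer with every conforming compiler, independent of the strings typeid produces.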

ES.12

Do not reuse names in nested scopes

For readability and maintenance reasons, you should not reuse names in nested scopes.

// shadow.cpp

#include <iostream>

int shadow(bool cond) {

   int d = 0;
   if (cond) {
      d = 1;
   }
   else {
      int d = 2; // declare a local scoped d, 
                 // hiding d of the parent scope
      d = 3;
   }             // the local scoped d is removed
   return d;
}

int main() {
   
   std::cout << '\n';
   
   std::cout << "shadow(true): " << shadow(true) << '\n'; 
   std::cout << "shadow(false): " << shadow(false) << '\n'; 
   
   std::cout << '\n';

}

What is the output of the program? Confused by the ds? Figure 8.1 shows the result.

Figure 8.1 Reusing names in nested scopes

This was easy! Right? But the same behavior is quite surprising in a class hierarchy.

// shadowClass.cpp

#include <iostream>
#include <string>


struct Base {
   void shadow(std::string) {                  // (1) 
      std::cout << "Base::shadow" << '\n'; 
   }
};

struct Derived: Base {
   void shadow(int) {                          // (2)
      std::cout << "Derived::shadow" << '\n'; 
   }
};

int main() {
   
   std::cout << '\n';
   
   Derived derived;
   
   derived.shadow(std::string{});            // (3) 
   derived.shadow(int{}); 
   
   std::cout << '\n';

}

Both structs Base and Derived have a member function shadow. The one in Base accepts a std::string (1) and the one in Derived an int (2). When you invoke shadow on the object derived with a default-constructed std::string (3), you may assume that the base version is called. Wrong! Because Derived declares its own member function shadow, name lookup stops in the class Derived, and the member function of the base class is not considered during overload resolution. Figure 8.2 shows the compilation error of GCC.

Figure 8.2 Hiding member functions of a base

Thanks to the using declaration, the base variant of shadow is visible in Derived.

struct Derived: Base {
   using Base::shadow; 
   void shadow(int) {
      std::cout << "Derived::shadow" << '\n'; 
   }
};

After adding the using Base::shadow into Derived, the program behaves as expected. The guideline “C.138: Create an overload set for a derived class and its bases with using” showed the issue of shadowing in a class hierarchy. See Figure 8.3.

Figure 8.3 Change visibility with a using declaration

Variables and their initialization

As in the previous section on names, the rules in this section regarding variables and their initialization are often quite obvious but sometimes provide precious insights. Consequently, I cover the intuitive rules quickly and write about the valuable insights in more depth.

ES.20

Always initialize an object

This is one of these elementary techniques that many professional C++ programmers get wrong. The simple question is: Which variable is initialized?

struct T1 {};

class T2{
public:
   T2() {}
};

int n;               // OK

int main() {
   int n2;          // BAD
   std::string s;   // OK
   T1 t1;           // OK
   T2 t2;           // OK
}

n has global scope and a fundamental type. Consequently, it is initialized to 0. This does not hold for n2: it has local scope and is, therefore, not initialized. But if you use a user-defined type such as std::string, T1, or T2, it is initialized even in a local scope.

There is a simple fix to prevent this issue: Use auto. Now you cannot forget to initialize a variable.

struct T1 {};

class T2{
public:
   T2() {}
};

auto n = 0;

int main() {
   using namespace std::string_literals; // enables the ""s suffix
   auto n2 = 0;
   auto s = ""s; 
   auto t1 = T1(); 
   auto t2 = T2();
}

ES.21

Don’t introduce a variable (or constant) before you need to use it

In the C standard C89, you must declare all of your variables at the beginning of a scope. We program in C++, not in C89.

ES.22

Don’t declare a variable until you have a value to initialize it with

If you don’t follow this rule, you may have a so-called use-before-set error. Have a look at the example from the guidelines.

int var; 

if (cond) set(&var); // some non-trivial condition
else if (cond2 || !cond3) {
   var = set2(3.14);
}

// use var

If cond3 holds but neither cond nor cond2 does, then var is not initialized when it is used.
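One possible way out, sketched here under assumptions, is to compute the value in a single initialization so that every path has to return something. The functions firstValue and secondValue are hypothetical stand-ins for set and set2, and the fallback value 0 is an arbitrary choice for this illustration. The lambda is invoked in place, a technique that rule ES.28 discusses later in this chapter.

```cpp
// Hypothetical stand-ins for set and set2 from the snippet above.
int firstValue()  { return 1; }
int secondValue() { return 2; }

int chooseValue(bool cond, bool cond2, bool cond3) {
   const int var = [&] {
      if (cond) return firstValue();
      if (cond2 || !cond3) return secondValue();
      return 0;   // explicit fallback for the path the original forgot
   }();
   return var;    // var is always initialized here
}
```

A missing return on one path of the lambda is much easier to spot than a missing assignment spread over several branches.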

ES.23

Prefer the {}-initializer syntax

There are many reasons to use {}-initialization.

{}-initialization

  • is always applicable,

  • overcomes the most vexing parse, and

  • prevents narrowing conversion.

While the first two arguments make C++ more intuitive, the last argument often prevents undefined behavior.

Always applicable

{}-initialization is always applicable. Here are a few examples:

// uniformInitialization.cpp

#include <map>
#include <vector>
#include <string>

// Initialization of a C-array
class Array {
public:
   Array(): myData{1,2,3,4,5} {}

private:
   const int myData[5];
};

class MyClass {
public: 
   int x;
   double y;
};

class MyClass2 {
   public:
      MyClass2(int fir, double sec): x{fir}, y{sec} {};
   private: 
      int x;
      double y;
};

int main() {
   
   // Direct initialization of standard containers
   int intArray[]= {1, 2, 3, 4, 5}; 
   std::vector<int> intArray1{1, 2, 3, 4, 5}; 
   std::map<std::string, int> myMap{ {"Scott", 1976}, 
                                        {"Dijkstra", 1972} };
   
   Array arr;
   
   // Default initialization of arbitrary objects 
   int i{};                        // i becomes 0
   std::string s{};                // s becomes ""
   std::vector<float> v{};         // v becomes an empty vector
   double d{};                     // d becomes 0.0
   
   // Direct initialization of an object with public members
   MyClass myClass{2011, 3.14}; 
   MyClass myClass1 = {2011, 3.14}; 
   
   // Initialization of an object using the constructor
   MyClass2 myClass2{2011, 3.14}; 
   MyClass2 myClass3 = {2011, 3.14}; 

}

You should never say always. There is a weird behavior, which is fixed in C++17.

Type deduction with auto

Always applicable? Yes, but you have to keep a special rule in mind. If you use automatic type deduction with auto in combination with {}-initialization in C++11 or C++14, you get a std::initializer_list.

auto initA{1};          // std::initializer_list<int>
auto initB = {2};       // std::initializer_list<int>
auto initC{1, 2};       // std::initializer_list<int>
auto initD = {1, 2};    // std::initializer_list<int>

This counterintuitive behavior changes with C++17.

auto initA{1};             // int
auto initB = {2};          // std::initializer_list<int>
auto initC{1, 2};          // error, no single element
auto initD = {1, 2};       // std::initializer_list<int>
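
The C++17 rules can be checked at compile time. This is a small sketch, assuming a compiler in C++17 mode; initC is left out because it does not compile.

```cpp
#include <initializer_list>
#include <type_traits>

auto initA{1};          // int since C++17
auto initB = {2};       // std::initializer_list<int>
auto initD = {1, 2};    // std::initializer_list<int>

static_assert(std::is_same_v<decltype(initA), int>);
static_assert(std::is_same_v<decltype(initB), std::initializer_list<int>>);
static_assert(std::is_same_v<decltype(initD), std::initializer_list<int>>);
```

Only the direct form with exactly one element deduces the element type; the copy form with the equals sign still deduces a std::initializer_list.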

Most vexing parse

The most vexing parse is well known, and almost any professional C++ developer has already fallen into this trap. The following short program demonstrates the trap.

// mostVexingParse.cpp

#include <iostream>

struct MyInt {
   MyInt(int arg = 0): i(arg) {}
   int i;
};
   

int main() {
   
   MyInt myInt(2011);
   MyInt myInt2();
   
   std::cout << myInt.i;
   std::cout << myInt2.i;

}

This simple-looking program does not compile! See Figure 8.4.

Figure 8.4 The most vexing parse

The error message is not very meaningful. The compiler can interpret the expression MyInt myInt2() as a call of a constructor or as a declaration of a function. When there is an ambiguity, it selects a function declaration. Consequently, the member access myInt2.i is not valid.

Replacing the parentheses in MyInt myInt2() with curly braces, MyInt myInt2{}, solves the ambiguity.

// mostVexingParseSolved.cpp

#include <iostream>

struct MyInt {
   MyInt(int arg = 0): i(arg) {}
   int i;
};
   

int main() {
   
   MyInt myInt(2011);
   MyInt myInt2{};
   
   std::cout << myInt.i;
   std::cout << myInt2.i;

}

Narrowing conversion

Narrowing conversion is an implicit conversion of arithmetic values that may lose accuracy. That sounds extremely dangerous and is a common cause of undefined behavior.

The following code snippet exemplifies narrowing conversion for the two fundamental types char and int. It doesn’t matter whether I use direct initialization or copy initialization.

// narrowingConversion.cpp

#include <iostream>

int main() {
   
   char c1(999); 
   char c2 = 999;
   std::cout << "c1: " << c1 << '\n';
   std::cout << "c2: " << c2 << '\n';
   
   int i1(3.14); 
   int i2 = 3.14;
   std::cout << "i1: " << i1 << '\n';
   std::cout << "i2: " << i2 << '\n';

}

The output of the program shows both issues. First, the int literal 999 doesn’t fit into the type char; second, the double literal doesn’t fit into the int type. See Figure 8.5.

Figure 8.5 Narrowing conversion

Narrowing conversion is not possible with {}-initialization.

// narrowingConversionSolved.cpp

#include <iostream>

int main() {

   char c1{999}; 
   char c2 = {999};
   std::cout << "c1: " << c1 << '\n';
   std::cout << "c2: " << c2 << '\n';
   
   int i1{3.14}; 
   int i2 = {3.14};
   std::cout << "i1: " << i1 << '\n';
   std::cout << "i2: " << i2 << '\n';

}

The program is ill-formed because {}-initialization detects narrowing conversion. The compiler must issue at least a diagnostic, which may be only a warning. Most compilers treat narrowing conversion as an error. To be on the safe side, always compile your program with the narrowing diagnostic turned into an error, for example, with -Werror=narrowing on GCC. Figure 8.6 shows the failing compilation with GCC.

Figure 8.6 Narrowing conversion detected
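
{}-initialization detects narrowing at compile time. When the value is known only at run time, a checked conversion can help. The following function template checkedNarrow is a hypothetical sketch in the spirit of gsl::narrow from the Guidelines Support Library, not its actual implementation. It only tests whether the value survives the round trip; sign changes and floating-point values far outside the target range are deliberately not handled.

```cpp
#include <stdexcept>

// Hypothetical helper: converts value to To and throws if the
// conversion changed the value. Sketch only: sign changes and
// out-of-range floating-point values are not covered.
template <typename To, typename From>
To checkedNarrow(From value) {
   const To result = static_cast<To>(value);
   if (static_cast<From>(result) != value) {
      throw std::range_error("narrowing changed the value");
   }
   return result;
}
```

With this sketch, checkedNarrow<char>(65) returns 'A', while checkedNarrow<char>(999) and checkedNarrow<int>(3.14) throw a std::range_error.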

ES.26

Don’t use a variable for two unrelated purposes

Do you like the following code?

void use() {
   int i;
   for (i = 0; i < 20; ++i) { /* ... */ }
   for (i = 0; i < 200; ++i) { /* ... */ } // bad: i recycled
}

I hope not. Put the declaration of i into the for loop and you are fine. i is now bound to the lifetime of the for loop.

void use() {
   for (int i = 0; i < 20; ++i) { /* ... */ }
   for (int i = 0; i < 200; ++i) { /* ... */ }
}

With C++17, you can declare variables directly in an if statement or a switch statement.

ES.28

Use lambdas for complex initialization, especially of const variables

I often hear the question: Why should I invoke a lambda function in place? This rule answers this question. You can put complex initialization steps in a lambda. The in-place invocation of a lambda is, in particular, valuable if your variable should become const.

If you don’t want to modify your variable after initialization, you should make it const. But sometimes, the initialization of the variable consists of more than one step. Consequently, you cannot make the variable const.

The widget x in the following example should be const after its initialization. It cannot be const because it is modified a few times during its initialization.

widget x; // should be const, but:
for (auto i = 2; i <= N; ++i) { 
   x += some_obj.do_something_with(i);
} 

// from here, x should be const,
// but we can't say so in code in this style

Now a lambda expression comes to our rescue. Use a technique called Immediately Invoked Lambda Expression (IILE).

Put the initialization stuff into a lambda expression, capture the environment by reference, and initialize your const variable with the in-place invoked lambda function.

const widget x = [&]{
   widget val; 
   for (auto i = 2; i <= N; ++i) {
      val += some_obj.do_something_with(i); 
   } 
   return val;
}();

Admittedly, it looks a little bit strange to invoke a lambda function just in place, but from the conceptual view, I like it. You put the whole initialization stuff just in the body of a lambda. The final pair of parentheses invokes the lambda.
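Because widget and some_obj are not defined in the snippet, here is a self-contained sketch of the same technique with a std::vector. The function makeSquares is a hypothetical example.

```cpp
#include <vector>

// Hypothetical example: returns the squares 1, 4, 9, ... of 1 to count.
std::vector<int> makeSquares(int count) {
   const std::vector<int> squares = [&] {
      std::vector<int> val;
      for (int i = 1; i <= count; ++i) val.push_back(i * i);
      return val;
   }();   // the trailing () invokes the lambda immediately

   // squares is const from here on;
   // squares.push_back(0) would not compile
   return squares;
}
```

All modifications happen on the local variable val inside the lambda; the outer variable squares is never modified after its initialization.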

Macros

If there is one unanimous consensus in the C++ standardization committee, then this is it: Macros must go. Macros are just text substitution without any C++ semantics. They transform the written code so that the compiler sees different code. This transformation is highly error prone and obscures the cause of the error.

But sometimes you have to deal with legacy code, which relies on macros. For completeness, the C++ Core Guidelines have four rules for macros.

Let me start with the don’ts. The following example shows the usage of the function-like macro max. I copied max from the param.h header file, which is part of the GNU C library.

// macro.cpp

#include <stdio.h>

#define max(a, b) ((a) > (b)) ? (a) : (b)

int main() {
   
   int a = 1, b = 2;
   printf("\nmax(a, b): %d\n", max(a, b));
   printf("a = %d, b = %d\n", a, b);
   
   printf("\nmax(++a, ++b): %d\n", max(++a, ++b)); // (1)
   printf("a = %d, b = %d\n\n", a, b);            // (2)

}

The output in (2) may surprise you. See Figure 8.7.

Figure 8.7 Usage of the function-like macro max

The variable b is evaluated twice and, therefore, incremented twice. Instead of the function-like macro max, use a constexpr function or a max function template.

template<typename T>
T max (T i, T j) {
   return ((i > j) ? i : j);
}

constexpr int max (int i, int j){
   return ((i > j) ? i : j);
}
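
The difference to the macro becomes visible with the increment example. In the following sketch, the name maxOf is hypothetical and chosen only to avoid a clash with std::max.

```cpp
// maxOf is a hypothetical name, chosen to avoid a clash with std::max.
template <typename T>
constexpr T maxOf(T i, T j) {
   return (i > j) ? i : j;
}

static_assert(maxOf(1, 2) == 2, "usable at compile time");

// Each argument expression is evaluated exactly once before the call.
int incrementAndCompare(int& a, int& b) {
   return maxOf(++a, ++b);
}
```

Starting with a = 1 and b = 2, incrementAndCompare returns 3 and leaves a = 2 and b = 3. With the macro, b ended up as 4.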

The same argumentation applies to macros as constants.

#define PI 3.14             // bad

constexpr double pi = 3.14; // good

If, for whatever reason, you have to use or to maintain macros, write them ALL_CAPS and give them unique names. The following code snippet breaks both rules. forever is written in lowercase letters and the macro CHAR may conflict with someone else using the name CHAR.

#define forever for (;;) 

#define CHAR

Expressions

There are about twenty rules related to expressions. They are quite diverse and overlap with existing rules. Here I focus on the rules applying to complicated expressions, pointers, the order of evaluation, and conversions.

Complicated expressions

First and foremost, you should avoid complicated expressions.

ES.40

Avoid complicated expressions

What does complicated mean? Here is the example from the C++ Core Guidelines, including the explanation:

// bad: assignment hidden in subexpression
while ((c = getc()) != -1)

// bad: two non-local variables assigned in a subexpression
while ((cin >> c1, cin >> c2), c1 == c2)

// better, but possibly still too complicated
for (char c1, c2; cin >> c1 >> c2 && c1 == c2;)

// OK: if i and j are not aliased (names for the same data)
int x = ++i + ++j; 

// OK: if i != j and i != k
v[i] = v[j] + v[k];

// bad: multiple assignments "hidden" in subexpressions
x = a + (b = f()) + (c = g()) * 7;

// bad: relies on commonly misunderstood precedence rules
x = a & b + c * d && e ^ f == 7;

// bad: undefined behavior
x = x++ + x++ + ++x;

ES.41

If in doubt about operator precedence, parenthesize

On one hand, the guidelines say that if you are in doubt about operator precedence, use parentheses. On the other hand, they state that you should know enough not to need parentheses here. Finding the right balance is, therefore, the challenge and depends on the expertise of the users.

const unsigned int flag = 2;
unsigned int a = flag;

if (a & flag != 0)          // bad: means a&(flag != 0) 

if (a < 0 || a <= max) {   // good: quite obvious 
   // ...
}

For an expert, the expression may be obvious, but for a beginner, it may be a challenge.

I have only two tips in mind:

  1. If in doubt about precedence, use parentheses. The precedence table gives you all the details.

  2. Program for the beginners! Keep the precedence table under your pillow.
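
The flag test from above can be sketched as two small functions. The names isSet and asCompilerReadsIt are hypothetical; the second function spells out how the compiler groups the expression a & flag != 0.

```cpp
// Hypothetical helper: tests whether any bit of flag is set in value.
constexpr bool isSet(unsigned int value, unsigned int flag) {
   return (value & flag) != 0;   // parentheses force the intended grouping
}

// How the compiler reads value & flag != 0: the comparison binds tighter
// than the bitwise and.
constexpr bool asCompilerReadsIt(unsigned int value, unsigned int flag) {
   return value & (flag != 0);
}

static_assert(isSet(2, 2), "bit 1 is set");
static_assert(!asCompilerReadsIt(2, 2), "2 & 1 is 0");
```

The two functions disagree for the values from the example: 2 & 2 is not zero, but 2 & (2 != 0) means 2 & 1, which is zero.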

ES.42

Keep use of pointers simple and straightforward

Let me quote the C++ Core Guidelines: “Complicated pointer manipulation is a major source of errors.” Why should we care? Of course, our legacy code is full of pointer manipulations such as in the following code snippet.

void f(int* p, int count) {
   if (count < 2) return;
   
   int* q = p + 1; 
   
   int n = *p++; 

   if (count < 6) return;
   
   p[4] = 1; 
   
   p[count - 1] = 2; 
   
   use(&p[0], 3);
}

int myArray[100]; 

f(myArray, 100);

The main issue with these lines of code is that the caller must provide the correct length of the C-array. If not, undefined behavior kicks in.

Think about the last two lines of the code snippet for a few seconds. We start with a C-array and remove its type information by passing it to the function f. This process is called an array to pointer decay and is the reason for many errors. Maybe we counted the number of elements wrong or the size of the C-arrays changed. The result is the same in both cases: undefined behavior.

What should we do? We should use the appropriate data type. C++20 offers std::span.

void f(std::span<int> a) {
   if (a.size() < 2) return;
   
   int n = a[0]; // OK
   
   std::span<int> q = a.subspan(1); 
   
   if (a.size() < 6) return;
   
   a[4] = 1; 
   
   a[a.size() - 1] = 2; 
   
   use(a.data(), a.size());
}

std::span knows its size. I hear your complaint. C++20 is not an option for you. To our rescue, C++ has templates; therefore, it’s easy to overcome this restriction and write bounds-safe code.

 1 // at.cpp
 2 
 3 #include <algorithm>
 4 #include <array>
 5 #include <deque>
 6 #include <string>
 7 #include <vector>
 8  
 9 template <typename T>
10 void use(T*, int) {}
11
12 template <typename T>
13 void f(T& a) {
14
15   if (a.size() < 2) return;
16
17   int n = a.at(0);
18
19   std::array<typename T::value_type , 99> q;
20   std::copy(a.begin() + 1, a.end(), q.begin());
21
22   if (a.size() < 6) return;
23
24   a.at(4) = 1;
25
26   a.at(a.size() - 1) = 2;
27
28   use(a.data(), a.size());
29 }
30
31 int main() {
32
33   std::array<int, 100> arr{};
34   f(arr);
35
36   std::array<double, 20> arr2{};
37   f(arr2);
38
39   std::vector<double> vec{1, 2, 3, 4, 5, 6, 7, 8, 9};
40   f(vec);
41
42   std::string myString= "123456789";
43   f(myString);
44
45   // std::deque<int> deq{1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
46   // f(deq);
47
48 }

Now the function f works for std::arrays of different sizes and types (lines 34 and 37) but also for a std::vector (line 40) or a std::string (line 43). These containers have in common that their data is stored in a contiguous memory block. This is not the case for std::deque, which has no member function data; therefore, the commented-out call f(deq) (line 46) would not compile. The key observation in the example is that the at call on a container checks its boundaries and throws a std::out_of_range exception if the index is out of range.

The expression T::value_type helps to get the type of the elements of the container. T::value_type is a so-called dependent name because it depends on T, a type parameter of the function template f. This is the reason I have to give the compiler a hint that T::value_type is actually a type: typename T::value_type.
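
To see the boundary check of at in action, consider the following sketch. The function template tryAssign is a hypothetical helper that turns the exception into a return value.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: writes value at index if the index is valid
// and reports whether the write happened.
template <typename Container>
bool tryAssign(Container& c, std::size_t index,
               typename Container::value_type value) {
   try {
      c.at(index) = value;   // at checks the boundaries
      return true;
   }
   catch (const std::out_of_range&) {
      return false;          // index was not smaller than c.size()
   }
}
```

For a std::vector<int> with three elements, tryAssign succeeds for index 1 and returns false for index 3. The same call with the index operator would have been undefined behavior.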

ES.45

Avoid “magic constants”; use symbolic constants

A symbolic constant is more explicit than a magic constant. The example in the C++ Core Guidelines starts with the magic constants 1 and 12 and ends with the symbolic constants first_month and last_month.

// don't: magic constants 1 and 12
for (int m = 1; m <= 12; ++m) std::cout << month[m] << '\n';

// months are indexed 1..12 (symbolic constants)
constexpr int first_month = 1;
constexpr int last_month = 12;
for (int m = first_month; m <= last_month; ++m) { 
   std::cout << month[m] << '\n';
}

ES.55

Avoid the need for range checking

If you don’t have to check the length of a range, you will not get an off-by-one error. Let’s sum up the elements of a std::vector.

// sumUp.cpp

#include <iostream>
#include <numeric>
#include <vector>

int main() {
   
   std::vector<int> vec{1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
   
   // bad
   int sum1 = 0;
   auto sizeVec = vec.size();
   for (int i = 0; i < sizeVec; ++i) sum1 += vec[i];
   
   std::cout << sum1 << '\n'; // 55
   
   // better
   int sum2 = 0;
   for (auto v: vec) sum2 += v;
   std::cout << sum2 << '\n'; // 55
   
   // the best
   auto sum3 = std::accumulate(vec.begin(), vec.end(), 0);
   std::cout << sum3 << '\n'; // 55

}

Iterating explicitly through a container is very error prone. In contrast, iterating implicitly with a range-based for loop is safe. Additionally, the algorithm std::accumulate of the STL documents its intention.

Pointers

The rules for pointers start with null pointers and continue with the deletion and dereferencing of pointers.

ES.47

Use nullptr rather than 0 or NULL

Why should you not use 0 or NULL to denote a null pointer?

  • 0: The literal 0 can be the null pointer (void*)0 or the number 0. This is dependent on the context. Consequently, what started as a null pointer could end up as a number.

  • NULL: NULL is a macro, and therefore, you don’t know what’s inside. A possible implementation according to cppreference.com could be the following:

#define NULL 0
//since C++11
#define NULL nullptr

The null pointer nullptr avoids the ambiguity of the number 0 and the macro NULL. nullptr is and remains of type std::nullptr_t. You can assign nullptr to an arbitrary pointer. The pointer becomes a null pointer and, therefore, points to no data. You cannot dereference a nullptr. A value of type std::nullptr_t can be compared with all pointers and converted to all pointer types. It cannot be compared with or converted to an integral type. There is one exception to this rule: nullptr can be explicitly or contextually converted to bool. Hence, you can use nullptr in a logical expression.
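
Overload resolution shows the difference between the number 0 and nullptr directly. The following overload set is a hypothetical sketch; NULL is left out on purpose because its behavior depends on the implementation.

```cpp
#include <string>

std::string overload(int)  { return "int"; }
std::string overload(int*) { return "pointer"; }

// The literal 0 is an int first and selects the int overload.
std::string withZero()    { return overload(0); }

// nullptr converts only to pointer types and selects the pointer overload.
std::string withNullptr() { return overload(nullptr); }
```

A caller who writes overload(0) and means a null pointer ends up silently in the int overload; overload(nullptr) cannot go wrong in this way.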

Generic code

Using the three kinds of null pointers in generic code shows immediately the flaws of the number 0 and the macro NULL. Thanks to template argument deduction, the literals 0 and NULL deduce to integral types. The information that both literals should be null pointer constants is lost.

// nullPointer.cpp

#include <cstddef>
#include <iostream>

template<class P >
void functionTemplate(P p) {
   int* a = p;
}

int main() {
   int* a = 0; 
   int* b = NULL; 
   int* c = nullptr;
   
   functionTemplate(0);          // (1)
   functionTemplate(NULL);      // (2)
   functionTemplate(nullptr);
}

You can use 0 and NULL to initialize the int pointers a and b. But if you use the values 0 and NULL as arguments to the function template, as in (1) and (2), the compiler will loudly complain. See Figure 8.8.

Figure 8.8 The null pointers 0, NULL, and nullptr

The compiler deduces 0 in the function template to type int; it deduces NULL to the type long int. This observation does not hold for nullptr. nullptr preserves its type std::nullptr_t through template argument deduction.

ES.61

Delete arrays using delete[] and non-arrays using delete

Explicit memory management and not using a container of the STL or a smart pointer such as std::unique_ptr<X[]> is very error prone:

void f(int n) {
   auto p = new X[n]; // n default constructed Xs
   // ...
   delete p; // error: just delete the object p, 
            // rather than deleting the array p[]
}

Deleting a C-array with a non-array delete is undefined behavior.

If you have to manage raw memory, read the rules in the Allocation and Deallocation section of Chapter 7.
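
Here is a sketch of the two alternatives mentioned at the beginning of this rule. The function names sumWithUniquePtr and sumWithVector are hypothetical; both fill a sequence with the numbers 1 to n and sum them up.

```cpp
#include <cstddef>
#include <memory>
#include <numeric>
#include <vector>

// std::unique_ptr<int[]> calls delete[] automatically.
int sumWithUniquePtr(std::size_t n) {
   auto p = std::make_unique<int[]>(n);   // n value-initialized ints
   std::iota(p.get(), p.get() + n, 1);    // fill with 1, 2, ..., n
   return std::accumulate(p.get(), p.get() + n, 0);
}                                         // delete[] happens here

// std::vector manages its memory and additionally knows its size.
int sumWithVector(std::size_t n) {
   std::vector<int> v(n);
   std::iota(v.begin(), v.end(), 1);
   return std::accumulate(v.begin(), v.end(), 0);
}
```

In both functions, there is no delete expression left that you could get wrong. I would typically prefer the std::vector because it carries its size with it.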

ES.65

Don’t dereference an invalid pointer

If you dereference an invalid pointer, your program has undefined behavior. The only way to avoid this behavior is to check your pointer before its usage.

void func(int* p) {
   if (!p) { 
      // do something special, such as returning early
   }
   int x = *p;
}

How can you overcome this issue? Don’t use a naked pointer. Use a smart pointer such as std::unique_ptr or std::shared_ptr if you need pointer-like semantics.
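
A smart pointer can still be empty, so the check remains necessary; what changes is that ownership and lifetime are explicit. The following helper valueOr is a hypothetical sketch that leaves the function before any dereference takes place.

```cpp
#include <memory>

// Hypothetical helper: returns the pointed-to value or a fallback.
int valueOr(const std::unique_ptr<int>& p, int fallback) {
   if (!p) return fallback;   // leave before any dereference
   return *p;                 // p owns a valid int here
}
```

Because a std::unique_ptr owns its resource, it cannot dangle the way a raw pointer to an already deleted object can.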

Order of evaluation

If you don’t apply the right order of evaluation in an expression, your program may end in undefined behavior.

ES.43

Avoid expressions with undefined order of evaluation

In C++14, the following expression has undefined behavior.

v[i] = ++i; // the result is undefined

This undefined behavior has been addressed in C++17. With C++17, the order of evaluation of the last code snippet is right to left; therefore, the expression has well-defined behavior.
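If the code also has to compile with an older standard, the intent can be spelled out in two statements. The following is a minimal sketch; the function incrementThenStore is a hypothetical name, and it assumes that v holds ints and that the incremented i is a valid index.

```cpp
#include <vector>

// Hypothetical helper: increments i first and then stores the new value
// at the new index. This matches what C++17 guarantees for v[i] = ++i.
void incrementThenStore(std::vector<int>& v, int& i) {
    ++i;
    v[i] = i;
}
```

Written this way, the behavior does not depend on the language standard used.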

Here are the additional guarantees we have with C++17:

  • Postfix expressions are evaluated from left to right. This includes function calls and member selection expressions.

  • Assignment expressions are evaluated from right to left. This includes compound assignments such as +=.

  • Operands to shift operators are evaluated from left to right.

Here are a few examples:

a.b
a->b
a->*b
a(b1, b2, b3)
b @= a
a[b]
a << b
a >> b

How should you read these examples? First, a is evaluated and then b.

The function call a(b1, b2, b3) is tricky. With C++17, we have the guarantee that each function argument is completely evaluated before the evaluation of any other function argument starts, but the order of the evaluation of the arguments is still unspecified.

Let me elaborate a little bit more on the last sentence.

ES.44

Don’t depend on order of evaluation of function arguments

In the last few years, I have seen many errors because developers assumed that the order of the evaluation of function arguments is left to right. Wrong! There is no such guarantee!

// unspecified.cpp

#include <iostream>

void func(int fir, int sec) {
   std::cout << "(" << fir << "," << sec << ")" << '\n';
}

int main(){
   int i = 0;
   func(i++, i++);
}

The order of the evaluation of the function arguments is unspecified. Unspecified behavior means that the behavior of the program may vary between implementations and the conforming implementation is not required to document the effects of each behavior.

Consequently, the output from GCC and Clang differs even if both compilers conform to the C++ standard (see Figure 8.9).


Figure 8.9 Unspecified behavior
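One way to become independent of the evaluation order is to compute the arguments in separate statements. The following is a minimal sketch; this variant of func returns its arguments instead of printing them, and callInFixedOrder is a hypothetical name.

```cpp
#include <utility>

// Hypothetical variant of func that returns its arguments instead of printing them.
std::pair<int, int> func(int fir, int sec) {
    return {fir, sec};
}

std::pair<int, int> callInFixedOrder() {
    int i = 0;
    const int fir = i++;    // evaluated first: fir == 0, i == 1
    const int sec = i++;    // evaluated second: sec == 1, i == 2
    return func(fir, sec);  // the same arguments with every compiler
}
```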

Conversions

Casting types is a common cause of undefined behavior. If necessary, use explicit casts.

ES.48

Avoid casts

Let’s see what happens if I screw up the type system and read a double through a pointer to long int and through a pointer to long long int.

// casts.cpp

#include <iostream>

int main() {
   
   double d = 2;
   auto p = (long*)&d;
   auto q = (long long*)&d;
   std::cout << d << ' ' << *p << ' ' << *q << '\n';

}

The result with the Visual Studio compiler is not promising (see Figure 8.10).


Figure 8.10 Wrong casts with the Visual Studio compiler

Nor is the result with the GCC or Clang compiler promising (see Figure 8.11).


Figure 8.11 Wrong casts with the GCC or Clang compiler

What is terrible about the C-cast? You don’t see which cast is actually performed. If you perform a C-cast, a combination of casts is applied if necessary.

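Roughly speaking, a C-cast first tries a const_cast, then a static_cast (optionally followed by a const_cast), and finally a reinterpret_cast (optionally followed by a const_cast). The first combination that compiles is applied.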
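If the goal of a program such as casts.cpp is to inspect the bits of a double, copying the bytes avoids the pointer cast altogether. The following is a minimal sketch under assumptions: the helper bitsOf is a hypothetical name, and the code assumes 64-bit IEEE 754 doubles, which the static_asserts check.

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(std::numeric_limits<double>::is_iec559, "assumes IEEE 754 doubles");
static_assert(sizeof(double) == sizeof(std::uint64_t), "assumes 64-bit doubles");

// Hypothetical helper: returns the object representation of d as an integer.
std::uint64_t bitsOf(double d) {
    std::uint64_t bits = 0;
    std::memcpy(&bits, &d, sizeof bits);  // copies the bytes; no pointer cast involved
    return bits;
}
```

With C++20, std::bit_cast from the header <bit> expresses the same intent in one call.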

ES.49

If you must use a cast, use a named cast

The principle from The Zen of Python, “explicit is better than implicit,” also applies to casts in C++: Use a named cast if necessary.

With C++11, we have the following six casts:

  • static_cast: converts between similar types such as pointer types or numeric types

  • const_cast: adds or removes const or volatile

  • reinterpret_cast: converts between pointers or between integral types and pointers

  • dynamic_cast: converts between polymorphic pointers or references in the same class hierarchy

  • std::move: converts to an rvalue reference

  • std::forward: converts an lvalue to an lvalue reference and an rvalue to an rvalue reference

I assume you are surprised that I presented std::move and std::forward as casts. Let’s have a closer look at the internals of std::move:

static_cast<std::remove_reference<decltype(arg)>::type&&>(arg)

What’s happening here? First, the type of the argument arg is deduced by decltype(arg). Afterward, all references are removed, and an rvalue reference (&&) is added. The type trait std::remove_reference is from the type-traits library. In the end, we always get an rvalue reference.
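To make the cast tangible, here is a simplified re-implementation. It is only a sketch for illustration, assuming C++17; the name myMove is hypothetical, and production code should use std::move.

```cpp
#include <type_traits>

// Hypothetical re-implementation for illustration; use std::move in real code.
template <typename T>
constexpr std::remove_reference_t<T>&& myMove(T&& arg) noexcept {
    return static_cast<std::remove_reference_t<T>&&>(arg);
}

int value = 5;
static_assert(std::is_same_v<decltype(myMove(value)), int&&>);  // lvalue argument
static_assert(std::is_same_v<decltype(myMove(5)), int&&>);      // rvalue argument
```

The static_asserts show that the result is an rvalue reference regardless of the value category of the argument. Nothing is moved by the cast itself.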

ES.50

Don’t cast away const

Casting away const and then modifying the object is undefined behavior if the underlying object, such as constInt, is const.

const int constInt = 10;
const int* pToConstInt = &constInt;

int* pToInt = const_cast<int*>(pToConstInt);
*pToInt = 12;        // undefined behavior

You can find the rationale for this rule in the C standard, which is also relevant for the C++ standard: “The implementation may place a const object that is not volatile in a read-only region of storage” (International Organization for Standardization (ISO)/International Electrotechnical Commission (IEC) 9899:2011, subclause 6.7.3, paragraph 4).
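For contrast, the following sketch shows the boundary case in which the modification is well-defined because the underlying object was not declared const. The helper setThroughConstPointer is a hypothetical name; the sketch illustrates the language rule and is not a recommendation to cast away const.

```cpp
// Hypothetical helper: writes through a pointer to const.
// This is only well-defined if the object pointed to was not declared const.
void setThroughConstPointer(const int* p, int newValue) {
    int* writable = const_cast<int*>(p);
    *writable = newValue;
}
```

Calling the helper with the address of a plain int is fine; calling it with &constInt from the example above would be undefined behavior.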

Statements

Statements fall mainly into two categories: iteration statements and selection statements. The rules for both kinds of statements are quite clear. Consequently, I quote the rule of the C++ Core Guidelines and add a few pieces of information when necessary.

Iteration statements

C++ implements three iteration statements: while, do while, and for. With C++11, syntactic sugar was added to the for loop: the range-based for loop.

std::vector<int> vec = {0, 1, 2, 3, 4, 5};
   
                     // for loop
for(std::size_t i = 0; i < vec.size(); ++i) {
   std::cout << vec[i] << ' ';
}
   
                     // range-based for loop
for (auto ele : vec) std::cout << ele << ' ';

Selection statements

if and switch are the selection statements of C++ that we inherited from C.

The next two rules related to switch statements need more attention than the ones before.

ES.78

Don’t rely on implicit fallthrough in switch statements

I have seen switch statements in legacy code that had more than 100 case labels. If you use non-empty cases without a break, the maintenance of these switch statements becomes a nightmare. Here is an example from the C++ Core Guidelines:

switch (eventType) {
case Information:
   update_status_bar();
   break;
case Warning:
   write_event_log();
   // Bad - implicit fallthrough

case Error:
   display_error_window();
   break;
}

Maybe you overlooked it. The Warning case has no break statement; therefore, the Error case is automatically executed.

Since C++17, we have a cure with the attribute [[fallthrough]]. Now you can explicitly express your intention. [[fallthrough]] has to be a statement of its own and must appear immediately before a case label. It indicates to the compiler that the fallthrough is intentional. Consequently, the compiler should not issue a warning.

void f(int n) {
   void g(), h(), i();
   switch (n) {
      case 1:
      case 2:
         g();
         [[fallthrough]];  // (1)
      case 3: 
         h();              // (2)
      case 4: 
         i();
         [[fallthrough]];  // (3)
   }
}

The [[fallthrough]] attribute in (1) suppresses a compiler warning. That does not hold for (2). The compiler may warn. (3) is ill formed because no case label follows.
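A variant without the issues in (2) and (3) could look like the following sketch. The function traceSwitch is a hypothetical name; instead of calling g, h, and i, it records a letter for each branch that runs.

```cpp
#include <string>

// Hypothetical variant of f that records which branches ran.
std::string traceSwitch(int n) {
    std::string trace;
    switch (n) {
        case 1:
        case 2:
            trace += 'g';
            [[fallthrough]];  // intentional: continue with case 3
        case 3:
            trace += 'h';
            break;            // no fallthrough into case 4
        case 4:
            trace += 'i';
            break;            // last case: nothing to fall through to
    }
    return trace;
}
```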

ES.79

Use default to handle common cases (only)

The program switch.cpp should exemplify this rule.

// switch.cpp

#include <iostream>

enum class Message{
   information,
   warning,
   error,
   fatal
};

void writeMessage() { std::cerr << "message" << '\n'; }
void writeWarning() { std::cerr << "warning" << '\n'; }
void writeUnexpected() { std::cerr << "unexpected" << '\n'; }

void withDefault(Message message) {
   switch(message) {
      case Message::information:
         writeMessage();
         break;
      case Message::warning:
         writeWarning();
         break;
      default:
         writeUnexpected();
         break;
   }
}

void withoutDefaultGood(Message message) {
   switch(message) {
      case Message::information:
         writeMessage();
         break;
      case Message::warning:
         writeWarning();
         break;
      default:
         // nothing can be done
         break;
   }
}

void withoutDefaultBad(Message message) {
   switch(message) {
      case Message::information:
         writeMessage();
         break;
      case Message::warning:
         writeWarning();
         break;
   }
}

int main() {
   
   withDefault(Message::fatal);
   withoutDefaultGood(Message::information);
   withoutDefaultBad(Message::warning);

}

The implementations of the functions withDefault and withoutDefaultGood are expressive enough. Because of the comment, the maintainer of the function withoutDefaultGood knows that the default case of this switch statement intentionally does nothing. Compare the functions withoutDefaultGood and withoutDefaultBad from a maintenance point of view. Do you know if the implementer of the function withoutDefaultBad forgot the default case or if the enumerators Message::error and Message::fatal were added later? To make sure, you have to study the source code or ask the original author of the code, if possible.
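When every enumerator really needs its own handling, leaving out default and listing all cases lets the compiler help: GCC and Clang can warn about an unhandled enumerator (-Wswitch, which is part of -Wall). The following is a minimal sketch, assuming C++17; the function toString is a hypothetical helper and not part of switch.cpp.

```cpp
#include <string_view>

enum class Message { information, warning, error, fatal };

// Hypothetical helper: every enumerator has its own case and there is no default.
constexpr std::string_view toString(Message message) {
    switch (message) {
        case Message::information: return "information";
        case Message::warning:     return "warning";
        case Message::error:       return "error";
        case Message::fatal:       return "fatal";
    }
    return "unknown";  // only reached for values outside the enumeration
}
```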

Arithmetic

The seven arithmetic rules hold significant potential for surprises. They focus on two topics: arithmetic with signed and unsigned integers, and typical arithmetic errors such as overflow/underflow and division by zero.

Arithmetic with signed/unsigned integers

Breaking these arithmetic rules often ends in unexpected results.

ES.100

Don’t mix signed and unsigned arithmetic

If you mix signed and unsigned arithmetic, you may not get the expected result.

// mixSignedUnsigned.cpp

#include <iostream>

int main() {
   
   int x = -3;
   unsigned int y = 7;
   
   std::cout << x - y << '\n'; // 4294967286
   std::cout << x + y << '\n'; // 4
   std::cout << x * y << '\n'; // 4294967275
   std::cout << x / y << '\n'; // 613566756

}

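GCC, Clang, and the Microsoft compiler produce the same result. In each expression, x is converted to unsigned int before the operation is performed, so -3 becomes 4294967293 with a 32-bit unsigned int.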
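One way to stay in signed arithmetic is to convert the unsigned operand explicitly. The following is a minimal sketch, assuming C++17; the helper signedDifference is a hypothetical name, and it assumes that the value of y fits into an int.

```cpp
// Hypothetical helper: converts the unsigned operand explicitly so that
// the subtraction is done in signed arithmetic.
// Assumption: the value of y fits into an int.
constexpr int signedDifference(int x, unsigned int y) {
    return x - static_cast<int>(y);
}

static_assert(signedDifference(-3, 7) == -10);
// Mixed arithmetic converts -3 to unsigned int first, so the result wraps around.
static_assert(-3 - 7u == static_cast<unsigned int>(-10));
```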

ES.101

Use unsigned types for bit manipulation

Bit manipulations with bitwise operators (~, >>, >>=, <<, <<=, &, &=, ^, ^=, |, and |=) have implementation-defined behavior when performed on signed operands. Implementation-defined behavior means that the behavior varies between implementations, and the implementation must document the effects of each behavior. Consequently, don’t perform bit manipulations on signed types, but use unsigned types instead:

unsigned char x = 0b00110010;
unsigned char y = ~x; // y == 0b11001101
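Typical flag operations on an unsigned type can be sketched as follows. The helpers setBit, clearBit, and testBit are hypothetical names; they assume that pos is smaller than 8.

```cpp
#include <cstdint>

// Hypothetical helpers for an 8-bit flag set stored in an unsigned type.
// Assumption: pos is smaller than 8.
constexpr std::uint8_t setBit(std::uint8_t flags, unsigned pos) {
    return static_cast<std::uint8_t>(flags | (1u << pos));
}

constexpr std::uint8_t clearBit(std::uint8_t flags, unsigned pos) {
    return static_cast<std::uint8_t>(flags & ~(1u << pos));
}

constexpr bool testBit(std::uint8_t flags, unsigned pos) {
    return ((flags >> pos) & 1u) != 0;
}
```

The explicit static_cast is needed because the operands are promoted before the bitwise operation, so the result has to be narrowed back to 8 bits.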

ES.102

Use signed types for arithmetic

First, you should not do arithmetic with unsigned types because the subtraction of two values may mathematically yield a negative value, which an unsigned type cannot represent. Second, you should not mix signed and unsigned arithmetic according to the previous rule: “ES.100: Don’t mix signed and unsigned arithmetic.” Let’s see what happens when I break the rule.

GCC, Clang, and the Microsoft compiler produce the same result for the following program.

// signedTypes.cpp

#include <iostream>

template<typename T, typename T2>
T subtract(T x, T2 y) {
   return x - y;
}

int main() {
   
   int s = 5;
   unsigned int us = 5;
   std::cout << subtract(s, 7) << '\n'; // -2
   std::cout << subtract(us, 7u) << '\n'; // 4294967294
   std::cout << subtract(s, 7u) << '\n'; // -2
   std::cout << subtract(us, 7) << '\n'; // 4294967294
   std::cout << subtract(s, us + 2) << '\n'; // -2
   std::cout << subtract(us, s + 2) << '\n'; // 4294967294

}

ES.106

Don’t try to avoid negative values by using unsigned

There is an interesting relation. When you assign a -1 to an unsigned int, you get the largest unsigned int.

The behavior of arithmetic expressions may differ between signed and unsigned types.

Let’s start with a simple program.

// modulo.cpp

#include <cstddef>
#include <iostream>

int main(){

   std::cout << '\n';
   
   unsigned int max{100000}; 
   unsigned short x{0}; 
   std::size_t count{0};
   while (x < max && count < 20) {
      std::cout << x << " "; 
      x += 10000; // (1) 
      ++count;
   }
   
   std::cout << "\n\n";

}

The crucial point of the program is that the successive addition to x in (1) does not trigger an overflow but a modulo operation when the value range of x is exceeded. The reason is that x is of type unsigned short.
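The modulo operation can be checked at compile time. The following is a minimal sketch, assuming C++17 and a 16-bit unsigned short, which is why it uses std::uint16_t; the helper addWrapping is a hypothetical name.

```cpp
#include <cstdint>

// Hypothetical helper: adds step to x the way (1) does, with an explicit 16-bit type.
constexpr std::uint16_t addWrapping(std::uint16_t x, std::uint16_t step) {
    return static_cast<std::uint16_t>(x + step);  // result is reduced modulo 65536
}

static_assert(addWrapping(60000, 10000) == 4464);  // 70000 - 65536
```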

Making x signed changes the behavior of the program drastically.

// overflow.cpp

#include <cstddef>
#include <iostream>

int main() {
   
   std::cout << '\n';
   
   int max{100000}; 
   short x{0}; 
   std::size_t count{0};
   while (x < max && count < 20) {
      std::cout << x << " ";
      x += 10000; 
      ++count;
   }
   
   std::cout << "\n\n";

}

The addition now triggers an overflow. In Figure 8.12, I marked the key points with red circles.


Figure 8.12 Modulo versus overflow with unsigneds and signeds

Typical arithmetic errors

Violating the following three rules always results in undefined behavior.

ES.103

Don’t overflow

and

ES.104

Don’t underflow

Let me combine both rules. The effect of an overflow or an underflow is the same: memory corruption and, therefore, undefined behavior. Let’s make a simple test with an int array. How long will the following program run when I compile it with GCC?

// overUnderflow.cpp

#include <cstddef>
#include <iostream>

int main() {
   
   int a[0];
   int n = 0;
   
   while (true){
      if (!(n % 100)){
         std::cout << "a[" << n << "] = " << a[n] << '\n';
      }
      a[n] = n;
      a[-n] = -n;
      ++n;
   }

}

Disturbingly long. The program writes every hundredth array entry to std::cout. See Figure 8.14.


Figure 8.14 Underflow and overflow of a C-array
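A container with checked access turns such an out-of-range index into an exception instead of memory corruption. The following is a minimal sketch; the helper tryWrite and the array size 4 are assumptions for illustration.

```cpp
#include <array>
#include <cstddef>
#include <stdexcept>

// Hypothetical helper: writes value to arr[index] if index is valid
// and reports whether the write took place.
bool tryWrite(std::array<int, 4>& arr, std::size_t index, int value) {
    try {
        arr.at(index) = value;  // at() throws std::out_of_range for a bad index
        return true;
    } catch (const std::out_of_range&) {
        return false;
    }
}
```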

ES.105

Don’t divide by zero

Dividing an integer by zero is undefined behavior and, with high probability, crashes your program.

auto res = 5 / 0; // crash

Dividing by zero may be fine in a logical expression.

auto res = false and (5 / 0); // fine

The result of the expression (5 / 0) is not necessary for the overall result and is thus not evaluated. This technique is called short circuit evaluation and is a special case of lazy evaluation.
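When the divisor is only known at run time, an explicit check is the usual remedy. The following is a minimal sketch, assuming C++17; the helper safeDivide is a hypothetical name, and it guards only against a zero divisor.

```cpp
#include <optional>

// Hypothetical helper: returns an empty optional instead of dividing by zero.
constexpr std::optional<int> safeDivide(int numerator, int denominator) {
    if (denominator == 0) return std::nullopt;
    return numerator / denominator;
}
```

The caller has to test the result before using it, which makes the error case visible in the interface.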

Related rules

I write about the rule “ES.25: Declare an object const or constexpr unless you want to modify its value later on” in Chapter 12, Constants and Immutability.

The section Metaprogramming in Chapter 13 provides an introduction to template metaprogramming and constexpr functions as a replacement for a function-like macro.

The rules related to expressions have a broad focus. Consequently, some of the rules have a strong overlap with already presented rules. Find more details in the referenced rules: