I assume that you already know many nonrules and myths about C++. Some of these nonrules and myths predate modern C++ and sometimes even contradict modern C++ techniques. Sometimes these nonrules and myths were best practices for writing good C++ code. The C++ Core Guidelines address the most resistant don’ts but also provide alternatives.
Don’t insist that all declarations should be at the top of a function
This rule is a relic of the C89 standard. C89 doesn’t allow the declaration of a variable after a statement. This results in a significant distance between the variable declaration and its usage. Often the variable is not initialized. This is exactly what happens in the example provided by the C++ Core Guidelines:
int use(int x)
{
    int i;
    char c;
    double d;

    // ... some stuff ...

    if (x < i) {
        // ...
        i = f(x, d);
    }
    if (i < x) {
        // ...
        i = g(x, c);
    }
    return i;
}
I assume that you have already found the issue in this code snippet. The variable i (the same holds true for c and d) is not initialized because it is a variable of a built-in type declared in a local scope. Reading it is, therefore, undefined behavior. If i had a user-defined type such as std::string, all would be fine, because its default constructor would run. So, what should you do?
Place the declaration of i directly before its first usage.
Always initialize a variable, such as in int i{}, or better, use auto. The compiler cannot deduce the type of i from a declaration such as auto i; and, therefore, rejects the program. To put it the other way around: auto forces you to initialize variables.
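Here is a minimal sketch of both recommendations applied to the example above. The original snippet does not define f and g, so the bodies below are placeholders that I assume only to make the sketch compile. The initial values of d and c are likewise my assumption.

```cpp
// Hypothetical stand-ins for f and g; the original example does not define them.
int f(int x, double d) { return static_cast<int>(d) - x; }
int g(int x, char c) { return x + (c == 'a' ? 1 : 0); }

int use(int x) {
    // ... some stuff ...

    const double d{2.0};   // declared and initialized right before its first use
    auto i = f(x, d);      // auto requires an initializer, so i cannot be left uninitialized

    if (i < x) {
        const char c{'a'}; // c lives only in the scope that needs it
        i = g(x, c);
    }
    return i;
}
```

Each variable now has the smallest possible scope, and there is no point in the function at which i holds an indeterminate value.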
Don’t insist to have only a single return-statement in a function
When you insist on a single return statement, you implicitly apply the first nonrule: The result variable has to be declared before the branches that assign to it.
template<class T>
std::string sign(T x) {
    std::string res;
    if (x < 0)
        res = "negative";
    else if (x > 0)
        res = "positive";
    else
        res = "zero";
    return res;
}
Using more than one return statement makes the code easier to read and can also make it faster, because no std::string is default-constructed first and assigned to afterward.
template<class T>
std::string sign(T x) {
    if (x < 0)
        return "negative";
    else if (x > 0)
        return "positive";
    return "zero";
}
What happens if automatic return-type deduction returns different types?
// differentReturnTypes.cpp

template <typename T>
auto getValue(T x) {
    if (x < 0)
        return -1;       // int
    else if (x > 0)
        return 1.0;      // double
    else
        return 0.0f;     // float
}

int main(){

    getValue(5.5);

}
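As expected, the program is not valid: The return statements yield int, double, and float, so the compiler cannot deduce a single return type. See Figure 18.1.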
Figure 18.1 Different return types in a function
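Here is a sketch of two possible repairs, not the only ones. Either every return statement yields the same type, or the function declares its return type explicitly and lets the returned values convert. The name getValueExplicit is my own and is used only for illustration.

```cpp
// Option 1: all return statements yield double, so return-type deduction succeeds.
template <typename T>
auto getValue(T x) {
    if (x < 0)
        return -1.0;
    else if (x > 0)
        return 1.0;
    else
        return 0.0;
}

// Option 2 (hypothetical name): an explicit return type.
// Each returned value is implicitly converted to double.
template <typename T>
double getValueExplicit(T x) {
    if (x < 0)
        return -1;       // int, converted to double
    else if (x > 0)
        return 1.0;      // double
    else
        return 0.0f;     // float, converted to double
}
```

Both variants keep the multiple return statements. Only the mismatch between the deduced types is removed.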
Don’t avoid exceptions
The rule starts by stating the four main reasons against exceptions:
Exceptions are inefficient.
Exceptions lead to leaks and errors.
Exception performance is not predictable.
Exception handling run-time support takes too much space.
The C++ Core Guidelines have profound responses to these statements.
First, the efficiency of exception handling is compared to a program that just terminates or displays the error code. Often the exception-handling implementation is poor. Of course, a comparison makes no sense in such cases. I want to explicitly mention the Technical Report on C++ Performance (TR18015.pdf), which presents two typical ways used by compilers to implement exceptions:
The code approach, where code is associated with each try-block
The table approach, which uses compiler-generated static tables
Simply put, the code approach has the downside that even when no exception is thrown, the bookkeeping of the exception-handling stack must be performed and, therefore, code unrelated to error handling slows down. This downside does not apply to the table approach, because it introduces no stack or run-time costs when no exception is thrown. On the other hand, the table approach seems to be more complicated to implement, and the static tables can get quite big.
I have nothing to add to point two. Exceptions cannot be blamed for a missing resource management strategy.
Third, if you have hard real-time guarantees to fulfill so that an answer that is too late is a wrong answer, an exception implementation based on the table approach will not—as we saw—affect the run time of the program in the good case. Honestly, even if you have a hard real-time system, this hard real-time restriction typically applies to only a small part of your system.
Instead of arguing against the nonrules, here are the reasons for using exceptions:
Exceptions
Clearly differentiate between erroneous return and ordinary return
Cannot be forgotten or ignored
Can be used systematically
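The following sketch contrasts the two styles for the same task. The functions parsePositiveCode and parsePositive are hypothetical and exist only for this illustration. The error-code variant needs an out-parameter for its result, and nothing stops a caller from ignoring the returned status. The exception variant returns its result directly, and an error that is not handled terminates the program instead of passing silently.

```cpp
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical error-code style: 0 means success, -1 means failure.
// The result has to travel through the out-parameter.
int parsePositiveCode(const std::string& text, int& out) {
    try {
        const int value = std::stoi(text);
        if (value <= 0) return -1;
        out = value;
        return 0;
    } catch (const std::exception&) {
        return -1;   // not a number, or out of range for int
    }
}

// Hypothetical exception style: the return value is the result, nothing else.
int parsePositive(const std::string& text) {
    const int value = std::stoi(text);   // throws std::invalid_argument or std::out_of_range
    if (value <= 0) {
        throw std::domain_error("value must be positive: " + text);
    }
    return value;
}
```

A caller of parsePositiveCode can write parsePositiveCode(input, n); and continue with a stale n. A caller of parsePositive either gets a valid value or never reaches the next statement.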
Let me add an anecdote about a situation that I once faced in a legacy code base. The system used error codes to signal the success or failure of a function. The callers did check the error codes, which was fine. But because the return value was reserved for the error code, the functions didn’t return their results. The consequence was that the functions operated on global variables and, consequently, had no parameters because they used the global variables anyway. The end of the story was that the system was not maintainable or testable, and my job was to refactor it.
To get more information about the correct handling of errors, read Chapter 11, Error Handling.
Don’t insist on placing each class declaration in its own source file
The adequate way to structure your code is not to use files; the correct way is to use namespaces. Using a file for each class declaration results in many files and can make your program, therefore, harder to manage and slower to compile.
Don’t use two-phase initialization
Obviously, the job of a constructor is straightforward: After the constructor is executed, you should have a fully initialized object. For that reason, the following code snippet from the C++ Core Guidelines is bad.
class Picture {
    int mx;
    int my;
    char * data;
public:
    Picture(int x, int y) {
        mx = x,
        my = y;
        data = nullptr;
    }

    ~Picture() {
        Cleanup();
    }

    bool Init() {
        // invariant checks
        if (mx <= 0 || my <= 0) {
            return false;
        }
        if (data) {
            return false;
        }
        data = (char*) malloc(mx*my*sizeof(int));
        return data != nullptr;
    }

    void Cleanup() {                                  // (2)
        if (data) free(data);
        data = nullptr;
    }
};

Picture picture(100, 0);
// this will fail..                                   // (1)
if (!picture.Init()) {
    puts("Error, invalid picture");
}
After the constructor call, picture(100, 0) is not fully initialized, and therefore, all operations on picture in (1) operate on an invalid picture. The solution to this problem is as simple as it is effective: Put all initialization into the constructor.
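#include <cstddef>
#include <vector>
#include <gsl/gsl>     // Expects, from the Guidelines Support Library

class Picture {
    std::size_t mx;
    std::size_t my;
    std::vector<char> data;

    // validates one dimension before it is stored
    static std::size_t check_size(std::size_t s) {
        Expects(s > 0);
        return s;
    }

public:
    Picture(std::size_t x, std::size_t y)
        : mx(check_size(x))
        , my(check_size(y))
        , data(mx * my * sizeof(int))
    {
    }
};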
Additionally, in the second example data is a std::vector instead of a raw pointer. This means the cleanup function (2) from the first example is not necessary anymore because the destructor of std::vector releases the memory automatically. Thanks to the static function check_size, the constructor can validate its arguments. But this is not the end of the benefits modern C++ gives us.
Often you use a constructor to set the default behavior of an object. Don’t do it. Directly set the default behavior of an object in the class body. Use constructors to vary the default behavior: “C.45: Don’t define a default constructor that only initializes data members; use member initializers instead.”
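Here is a minimal sketch of C.45, assuming a hypothetical class Widget with made-up default values. The defaults live in the class body. The constructors only state what differs from them.

```cpp
#include <string>

// Hypothetical class; the names and default values are assumptions for illustration.
class Widget {
public:
    Widget() = default;                              // every member takes its default
    explicit Widget(int width) : width_(width) {}    // varies only the width

    int width() const { return width_; }
    int height() const { return height_; }
    const std::string& title() const { return title_; }

private:
    // Default member initializers (since C++11): one place for the default behavior.
    int width_{640};
    int height_{480};
    std::string title_{"untitled"};
};
```

Adding another constructor later cannot forget to initialize a member, because each member already has a value unless the constructor overrides it.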
init member functions are often used to put common initialization or validation routines into one place. You invoke them immediately after the constructor call. Fine, you follow the essential DRY (don’t repeat yourself) principle, but you automatically break another important principle: Objects should be fully initialized after the constructor call. How can you solve this riddle? Quite easily. Since C++11, we have had constructor delegation. This means that you put the common initialization and validation logic into one smart constructor and use the other constructors as a kind of wrapper constructor: “C.51: Use delegating constructors to represent common actions for all constructors of a class.”
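The following sketch shows C.51 with a hypothetical class Degree. The class and its valid range of 0 to 360 are my assumptions. One constructor owns the validation, and the other constructors delegate to it.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical class for illustration.
class Degree {
public:
    // The "smart" constructor: the only place that validates.
    explicit Degree(int value) : value_(value) {
        if (value_ < 0 || value_ > 360) {
            throw std::out_of_range("degree must be in [0, 360]");
        }
    }

    // Wrapper constructors: they delegate and add no logic of their own.
    Degree() : Degree(0) {}
    explicit Degree(const std::string& text) : Degree(std::stoi(text)) {}

    int value() const { return value_; }

private:
    int value_;
};
```

No init member function is needed: Whichever constructor you call, the object is either fully initialized and valid, or the constructor throws and no object exists.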
Don’t place all cleanup actions at the end of a function and goto exit
Okay, we can and should do better than the following code from the C++ Core Guidelines:
void do_something(int n)
{
    if (n < 100) goto exit;
    // ...
    int* p = (int*) malloc(n);
    // ...
exit:
    free(p);
}
By the way, do you spot the error? The jump goto exit bypasses the definition of the pointer p, which C++ does not allow for a variable declared with an initializer.
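Here is one possible rewrite of do_something without goto and without manual cleanup. It is a sketch under two assumptions of mine: A std::vector<char> of n bytes is an acceptable replacement for the malloc call, and the function returns the buffer size only so that its effect can be observed.

```cpp
#include <cstddef>
#include <vector>

// Returns the number of bytes it worked on (an assumption made for illustration;
// the original function returns void).
std::size_t do_something(int n) {
    if (n < 100) return 0;    // early return replaces goto exit

    // ...
    std::vector<char> buffer(static_cast<std::size_t>(n));   // owns n bytes
    // ...

    return buffer.size();
}   // buffer releases its memory on every path that leaves the function
```

There is no exit label left to jump to, because there is no cleanup code left to reach.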
What I often saw in legacy C code was code structured like this.
// lifecycle.c

#include <stdio.h>

void initDevice(const char* mess) {
    printf("\n\nINIT: %s\n",mess);
}

void work(const char* mess) {
    printf("WORKING: %s",mess);
}

void shutDownDevice(const char* mess) {
    printf("\nSHUT DOWN: %s\n\n",mess);
}

int main(void) {

    initDevice("DEVICE 1");
    work("DEVICE1");
    {
        initDevice("DEVICE 2");
        work("DEVICE2");
        shutDownDevice("DEVICE 2");
    }
    work("DEVICE 1");
    shutDownDevice("DEVICE 1");

    return 0;

}
This code is very error prone. Each usage of the device consists of three steps: initialization, usage, and release of the device. This is a job for RAII: “R.1: Manage resources automatically using resource handles and RAII (Resource Acquisition Is Initialization).”
// lifecycle.cpp

#include <iostream>
#include <string>

class Device {
public:
    Device(const std::string& res):resource(res) {
        std::cout << "\nINIT: " << resource << ".\n";
    }
    void work() const {
        std::cout << "WORKING: " << resource << '\n';
    }
    ~Device() {
        std::cout << "SHUT DOWN: "<< resource << ".\n\n";
    }
private:
    const std::string resource;
};

int main() {

    Device resGuard1{"DEVICE 1"};
    resGuard1.work();

    {
        Device resGuard2{"DEVICE 2"};
        resGuard2.work();
    }
    resGuard1.work();

}
Initialize the resource in the constructor and release it in the destructor. First, you cannot forget to initialize the object, and second, the compiler takes care of releasing the resource. The output of both programs is equivalent (see Figure 18.2).
Figure 18.2. Automatic managing of a device
Don’t make data members protected
Protected data makes your program complex and error prone. If you put protected data into a base class, you cannot reason about derived classes in isolation and, therefore, you break encapsulation. You always have to reason about the entire class hierarchy.
Protected data means you have to answer at least these three questions.
Do I have to implement a constructor in a derived class to initialize the protected data?
What is the actual value of the protected data if I use it?
Who is affected if I modify the protected data?
Answering these questions becomes more and more complicated the deeper your class hierarchy becomes.
Protected data is a kind of global data within the scope of the class hierarchy. And you know mutable, shared state is terrible. It makes testing and concurrency quite tricky, for example.
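One common alternative, sketched here with hypothetical classes Account and SavingsAccount and a made-up fee of 3 cents, is to keep the data private and give derived classes a protected member function instead. The base class then stays the only place that can modify the value, so the three questions above have a local answer.

```cpp
// Hypothetical classes for illustration; amounts are in cents.
class Account {
public:
    explicit Account(int balance) : balance_(balance) {}
    virtual ~Account() = default;

    int balance() const { return balance_; }

protected:
    // Derived classes change the balance only through this function,
    // so the invariant (balance never negative) is enforced in one place.
    bool withdraw(int amount) {
        if (amount <= 0 || amount > balance_) return false;
        balance_ -= amount;
        return true;
    }

private:
    int balance_;   // private: no derived class can touch it directly
};

class SavingsAccount : public Account {
public:
    using Account::Account;              // reuse the base-class constructor

    bool payFee() { return withdraw(3); }
};
```

SavingsAccount needs no constructor of its own for the balance, always reads the current value through balance(), and cannot put the base class into an invalid state.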