Cippi admires the ISO Standard.
Despite the standard library’s crucial importance, this section is not exhaustive. Many rules are missing, the rules that are present are often quite concise, and other rules are already the topic of other parts of the C++ Core Guidelines. Consequently, I complement those rules with additional information when necessary.
Let me start with a significant rule.
I assume that you know about std::vector. Why should you prefer std::vector to a C-array?
std::vector
One of the big advantages of a std::vector compared to a C-array is that the std::vector automatically manages its memory. Of course, that holds true for all standard containers. The following program gives a closer look at the automatic memory management provided by std::vector.
// vectorMemory.cpp

#include <iostream>
#include <string>
#include <vector>

template <typename T>
void showInfo(const T& t, const std::string& name) {
    std::cout << name << " t.size(): " << t.size() << '\n';
    std::cout << name << " t.capacity(): " << t.capacity() << '\n';
}

int main() {
    std::cout << '\n';

    std::vector<int> vec;                                        // (1)

    std::cout << "Maximal size: " << '\n';
    std::cout << "vec.max_size(): " << vec.max_size() << '\n';   // (2)
    std::cout << '\n';

    std::cout << "Empty vector: " << '\n';
    showInfo(vec, "Vector");
    std::cout << '\n';

    std::cout << "Initialized with five values: " << '\n';
    vec = {1,2,3,4,5};
    showInfo(vec, "Vector");                                     // (3)
    std::cout << '\n';

    std::cout << "Added four additional values: " << '\n';
    vec.insert(vec.end(),{6,7,8,9});
    showInfo(vec,"Vector");                                      // (4)
    std::cout << '\n';

    std::cout << "Resized to 30 values: " << '\n';
    vec.resize(30);
    showInfo(vec,"Vector");                                      // (5)
    std::cout << '\n';

    std::cout << "Reserved space for at least 1000 values: " << '\n';
    vec.reserve(1000);
    showInfo(vec,"Vector");                                      // (6)
    std::cout << '\n';

    std::cout << "Shrunk to the current size: " << '\n';
    vec.shrink_to_fit();                                         // (7)
    showInfo(vec,"Vector");
}
To spare typing, I wrote the small function showInfo. showInfo prints out the size and the capacity of a vector. The size of a vector is its number of elements; the capacity of a vector is the number of elements it can hold without an additional memory allocation. Therefore, the capacity of a vector has to be at least as big as its size. You can adjust the size of a vector with its member function resize; you can adjust the capacity of a vector with its member function reserve.
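The difference between size and capacity can be sketched in a few lines. The helper name sizeAndCapacity is hypothetical and not part of the program above; the exact capacity after reserve is implementation specific, so the comments state only what the standard guarantees.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: returns {size, capacity} of a vector after one
// resize and one reserve call.
std::pair<std::size_t, std::size_t> sizeAndCapacity(std::size_t newSize,
                                                    std::size_t minCapacity) {
    std::vector<int> vec;
    vec.resize(newSize);       // changes the number of elements
    vec.reserve(minCapacity);  // may change the capacity, never the size
    return {vec.size(), vec.capacity()};
}
```

Calling sizeAndCapacity(30, 1000) yields a size of exactly 30 and a capacity of at least 1000.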
But back to the program from top to bottom. I create an empty vector (1). Afterward, the program displays the maximum number of elements a vector can have (2). After each operation, I output its size and capacity. That happens for the initialization of the vector (3), the addition of four new elements (4), the resizing of the container to 30 elements (5), and the reserving of memory for at least 1,000 elements (6). With C++11, you can shrink a vector with the member function shrink_to_fit (7). That asks the vector to reduce its capacity to its size.
Before I present the output of the program in Figure 16.1, I have a few remarks.
Figure 16.1 Automatic management of memory
The adjustment of the size and the capacity of the container is done automatically. I don’t have to use any memory operations like new and delete.
By using the member function vec.resize(n), the vector vec gets value-initialized elements (zeros in the case of int) if n > vec.size().
By using the member function vec.reserve(n), the container vec gets new memory for at least n elements if n > vec.capacity().
The call shrink_to_fit is nonbinding. That means the C++ run time doesn’t have to adjust the capacity of a container to its size. But my usage so far of the member function shrink_to_fit with GCC, Clang, or cl.exe has always freed unnecessary memory.
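Two of these remarks can be checked in code. The following is a minimal sketch with hypothetical helper names: the first helper shows that the int elements added by resize are zero, and the second relies only on the guarantee that the capacity never drops below the size, because shrink_to_fit is just a request.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: grows a vector with resize and reports whether
// every newly added element is zero.
bool resizeAddsZeros() {
    std::vector<int> vec{1, 2, 3};
    vec.resize(6);  // appends three new int elements
    return std::all_of(vec.begin() + 3, vec.end(),
                       [](int i) { return i == 0; });
}

// Hypothetical helper: after shrink_to_fit, only capacity() >= size()
// is guaranteed; the size itself is untouched.
bool shrinkKeepsInvariant() {
    std::vector<int> vec(30);
    vec.reserve(1000);
    vec.shrink_to_fit();  // nonbinding request to release unused memory
    return vec.size() == 30 && vec.capacity() >= vec.size();
}
```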
std::array
Okay, but what is the difference between a C-array and a C++-array?
std::array combines the best of two worlds. On the one hand, std::array has the size and the efficiency of a C-array; on the other hand, std::array has the interface of a std::vector.
My small program compares the memory efficiency of a C-array, a C++-array (std::array), and a std::vector. See Figure 16.2.
Figure 16.2 sizeof a C-array, a C++-array, and a std::vector
// sizeof.cpp

#include <iostream>
#include <array>
#include <vector>

int main() {
    std::cout << '\n';

    std::cout << "sizeof(int)= " << sizeof(int) << '\n';
    std::cout << '\n';

    int cArr[10] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
    std::array<int, 10> cppArr = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
    std::vector<int> cppVec = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};

    std::cout << "sizeof(cArr)= " << sizeof(cArr) << '\n';       // (1)
    std::cout << "sizeof(cppArr)= " << sizeof(cppArr) << '\n';   // (2)

    // (3)
    std::cout << "sizeof(cppVec) = "
              << sizeof(cppVec) + sizeof(int) * cppVec.capacity() << '\n';
    std::cout << " = sizeof(cppVec): " << sizeof(cppVec) << '\n';
    std::cout << " + sizeof(int)* cppVec.capacity(): "
              << sizeof(int)* cppVec.capacity() << '\n';

    std::cout << '\n';
}
On my platform, both the C-array (1) and the C++-array (2) occupy 40 bytes. That is precisely sizeof(int) * 10. In contrast, the std::vector needs an additional 24 bytes (3) to manage its data on the heap.
This was the C part of a std::array, but the std::array supports to a large extent the interface of a std::vector. Supporting the interface of a std::vector means, in particular, that std::array knows its size.
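What "knows its size" means in practice can be sketched as follows. The function sum is a hypothetical helper; because the length is part of the std::array type, no separate length parameter has to be passed along, as would be necessary for a C-array that decays to a pointer.

```cpp
#include <array>
#include <cstddef>
#include <numeric>

// Hypothetical helper: the length N is deduced from the array's type.
template <std::size_t N>
int sum(const std::array<int, N>& arr) {
    // begin() and end() are part of the vector-like interface
    return std::accumulate(arr.begin(), arr.end(), 0);
}
```

For std::array<int, 10> cppArr = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}, the call sum(cppArr) adds up all ten elements, and cppArr.size() reports 10.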
Prefer using STL |
If you want to add elements to your container or remove elements from your container at run time, use a std::vector; if not, use a std::array. Additionally, a std::vector can be much larger than a std::array because its elements go to the heap. std::array uses a buffer that is local to the context in which it is being used.
std::array and std::vector offer the following advantages:
1. The fastest general-purpose access (random access, including being CPU vectorization friendly)
2. The fastest default access pattern (begin-to-end or end-to-begin is CPU cache prefetcher friendly)
3. The lowest space overhead (contiguous layout has zero per-element overhead, which is CPU cache friendly)
std::array and std::vector support the index operator, which boils down to pointer arithmetic. Consequently, advantage 1 is obvious. Advantage 2 was discussed in the section about performance. Read the details in the rule “Per.19: Access memory predictably.” The last rule already covered advantage 3: “SL.con.1: Prefer using STL array or vector instead of a C array.” std::array is comparable in size to a C-array, and std::vector adds 24 bytes.
In the case of the C-array, there is no support to detect a bounds error, and ignoring the bounds of a C-array can go unnoticed for too long. The rules “ES.103: Don’t overflow” and “ES.104: Don’t underflow” in Chapter 8, Expressions and Statements, clearly demonstrate the risks.
Many of the containers of the STL support an at member function that checks boundaries. In the case of accessing a nonexisting element, a std::out_of_range exception is thrown. The following containers have a boundary-checking at member function:
Sequence container: std::array, std::vector, and std::deque
Associative container: std::map and std::unordered_map
std::string
The std::string in the next example shows the boundary check.
// stringBoundsCheck.cpp

#include <stdexcept>
#include <iostream>
#include <string>

int main() {
    std::cout << '\n';

    std::string str("1123456789");

    str.at(0) = '0';                                   // (1)
    std::cout << str << '\n';

    std::cout << "str.size(): " << str.size() << '\n';
    std::cout << "str.capacity() = " << str.capacity() << '\n';

    try {
        str.at(12) = 'X';                              // (2)
    }
    catch (const std::out_of_range& exc) {
        std::cout << exc.what() << '\n';
    }

    std::cout << '\n';
}
Setting the first character of the string str to ‘0’ (1) is fine, but accessing a character outside the size is an error. This even occurs if the access is within the capacity but outside the size of the std::string.
The size of a std::string str is the number of elements the str has.
The capacity of a str is the number of elements a str could have without allocating additional memory.
Compiling the program with GCC 8.2 and executing it produces a quite explicit error message. See Figure 16.3.
Figure 16.3 Accessing a nonexisting element of a std::string
There are various kinds of text and various ways to present this text. Table 16.1 gives you a preview before I dive into the rules.
Table 16.1 Various kinds of text
Text | Semantic | Rule |
---|---|---|
 | Owns a character sequence | |
 | Refers to a character sequence | |
 | Refers to a single character | |
 | Refers to byte values (not necessarily characters) | |
To summarize, only std::string is an owner. All the others refer to existing text.
Maybe you know another string that owns its character sequence: a C-string. Don’t use a C-string! Why? Because you have to manually take care of the memory management, the string termination character, and the length of the string.
// stringC.c

#include <stdio.h>
#include <string.h>

int main( void ) {
    char text[10];

    strcpy(text, "The Text is too long for text.");   // (1) too long
    printf("strlen(text): %zu\n", strlen(text));      // (2) missing '\0'
    printf("%s\n", text);

    text[sizeof(text)-1] = '\0';
    printf("strlen(text): %zu\n", strlen(text));

    return 0;
}
The simple program stringC.c has undefined behavior in (1) and (2). Compiling it with a rusty GCC 4.8 seems to work. See Figure 16.4.
Figure 16.4 Undefined behavior with a C-string
The C++ equivalent does not have the same issues.
// stringCpp.cpp

#include <iostream>
#include <string>

int main() {
    std::string text{"The Text is not too long."};

    std::cout << "text.size(): " << text.size() << '\n';
    std::cout << text << '\n';

    text +=" And can still grow!";

    std::cout << "text.size(): " << text.size() << '\n';
    std::cout << text << '\n';
}
In the case of a C++-string, you cannot make these errors because std::string takes care of the memory management and the termination character. Additionally, if you access the elements of the C++-string with the at member function instead of the index operator, bounds errors are automatically detected. You can read the details on at in the rule “SL.con.3: Avoid bounds errors.”
A std::string_view refers to the character sequence. To say it more explicitly: A std::string_view does not own the character sequence. It represents a view of a sequence of characters. This sequence of characters can be a C++-string or C-string. A std::string_view needs two pieces of information: the pointer to the character sequence and the length. It supports the reading part of the interface of std::string. In addition to the interface of std::string, std::string_view has two operations that modify the view but not the characters it refers to: remove_prefix and remove_suffix.
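A typical use of remove_prefix and remove_suffix is trimming without copying. The following is a minimal sketch; trimSpaces is a hypothetical helper, and the returned view is valid only as long as the character sequence it refers to is alive.

```cpp
#include <string_view>

// Hypothetical helper: strips leading and trailing spaces by shrinking
// the view. No character is copied or modified.
std::string_view trimSpaces(std::string_view text) {
    while (!text.empty() && text.front() == ' ') text.remove_prefix(1);
    while (!text.empty() && text.back() == ' ') text.remove_suffix(1);
    return text;
}
```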
std::string_view shines brightly when it comes to memory allocation.
// stringView.cpp; C++20

#include <cassert>
#include <cstdlib>
#include <iostream>
#include <string>
#include <string_view>

void* operator new(std::size_t count) {                 // (1)
    std::cout << " " << count << " bytes" << '\n';
    return std::malloc(count);
}

void getString(const std::string& str) {}

void getStringView(std::string_view strView) {}

int main() {
    std::cout << '\n';

    std::cout << "std::string" << '\n';
    // (2)
    std::string large = "0123456789-123456789-123456789-123456789";
    std::string substr = large.substr(10);              // (2)

    std::cout << '\n';

    std::cout << "std::string_view" << '\n';
    // (3)
    std::string_view largeStringView{large.c_str(), large.size()};
    largeStringView.remove_prefix(10);                  // (3)

    assert(substr == largeStringView);

    std::cout << '\n';

    std::cout << "getString" << '\n';
    getString(large);
    getString("0123456789-123456789-123456789-123456789");   // (2)
    const char message []= "0123456789-123456789-123456789-123456789";
    getString(message);                                 // (2)

    std::cout << '\n';

    std::cout << "getStringView" << '\n';
    getStringView(large);                               // (3)
    getStringView("0123456789-123456789-123456789-123456789");
    getStringView(message);                             // (3)

    std::cout << '\n';
}
I overloaded the global operator new (1) to trace each memory allocation. Memory allocations take place in (2) but not in (3). See Figure 16.5.
Figure 16.5 No memory allocation with std::string_view
If you don’t follow this rule and use const char* as a C-string, you may end up with a critical issue such as the following one.
char arr[] = {'a', 'b', 'c'};

void print(const char* p) {
    std::cout << p << '\n';
}

void use() {
    print(arr);   // undefined behavior
}
arr decays to a pointer when used as an argument of the function print. The issue is that arr is not zero terminated. The call print(arr) has undefined behavior.
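One way to avoid this issue, sketched here as an assumption and not as the guidelines' own fix, is to pass the length together with the characters. The hypothetical helper viewOf takes the length from the array type, so no terminating zero is ever searched for.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: builds a view over all N elements of a character
// array. The array does not have to be zero terminated.
// Note: for a string literal, N includes the terminating '\0'.
template <std::size_t N>
std::string_view viewOf(const char (&arr)[N]) {
    return std::string_view{arr, N};
}
```

With const char arr[] = {'a', 'b', 'c'}, the expression std::cout << viewOf(arr) writes exactly three characters.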
Use |
std::byte (C++17) is a distinct type implementing the concept of a byte as specified in the C++ language definition. This means a byte is neither an integer nor a character. Its job is to access object storage. std::byte’s interface consists of operators for bitwise operations.
template <class IntType>
constexpr byte operator << (byte b, IntType shift);
template <class IntType>
constexpr byte operator >> (byte b, IntType shift);
constexpr byte operator | (byte l, byte r);
constexpr byte operator & (byte l, byte r);
constexpr byte operator ~ (byte b);
constexpr byte operator ^ (byte l, byte r);
You can use the function std::to_integer<IntType>(b) to convert a std::byte b to an integer type and the call std::byte{integer} to do it the other way around. integer has to be a non-negative value no greater than std::numeric_limits<unsigned char>::max().
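The operators and the conversion function can be combined as in the following sketch. Both helper names are hypothetical and chosen only for illustration.

```cpp
#include <cstddef>

// Hypothetical helper: extracts the upper four bits of a byte as an int.
int highNibble(std::byte b) {
    return std::to_integer<int>((b >> 4) & std::byte{0x0F});
}

// Hypothetical helper: returns a copy of b with the lowest bit set.
std::byte setLowestBit(std::byte b) {
    return b | std::byte{0x01};
}
```

Arithmetic such as b + 1 does not compile for a std::byte, which is the point of the type: a byte is storage, not a number.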
Use the |
Before C++14, there was no literal for a C++-string; you had to create it from a C-string literal. This is strange because we want to get rid of the C-string. With C++14, we got C++-string literals. They’re C-string literals with the suffix s: "cStringLiteral"s.
Let me show you an example that makes my point: C-string literals and C++-string literals are different.
// stringLiteral.cpp

#include <iostream>
#include <string>
#include <utility>

int main() {
    std::string hello = "hello";

    auto firstPair = std::make_pair(hello, 5);
    auto secondPair = std::make_pair("hello", 15);        // (2) ERROR

    using namespace std::string_literals;                 // (1)
    // auto secondPair = std::make_pair("hello"s, 15);    // (3) OK

    if (firstPair < secondPair) std::cout << "true\n";    // (4)
}
I have to include the namespace std::string_literals (1) to use the C++-string literals. Lines (2) and (3) are the critical lines in the example. In (2), I pass the C-string literal "hello" directly to std::make_pair. This is the reason that the type of firstPair is (std::string, int), but the type of secondPair is (const char*, int). Because the two pairs have different types, the comparison (4), and with it the program, does not compile when I use (2). The program compiles and the comparison works when I use (3).
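The different types can be made visible at compile time. The following sketch assumes C++17 for std::is_same_v; firstIsLess is a hypothetical helper that mirrors variant (3) of the example.

```cpp
#include <string>
#include <type_traits>
#include <utility>

using namespace std::string_literals;

// "hello" is a C-string literal; std::make_pair stores it as const char*.
static_assert(std::is_same_v<decltype(std::make_pair("hello", 15)),
                             std::pair<const char*, int>>);

// "hello"s is a C++-string literal; std::make_pair stores a std::string.
static_assert(std::is_same_v<decltype(std::make_pair("hello"s, 15)),
                             std::pair<std::string, int>>);

// Hypothetical helper: both pairs have the same type, so they compare.
bool firstIsLess() {
    auto firstPair = std::make_pair("hello"s, 5);
    auto secondPair = std::make_pair("hello"s, 15);
    return firstPair < secondPair;  // equal strings, so 5 < 15 decides
}
```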
When you interact with the outside world, two input/output libraries come into play: the stream-based I/O library (iostream library for short) and the C-style I/O functions. Of course, you should prefer the iostream library. The C++ Core Guidelines give a good overview of iostreams: “iostreams is a type safe, extensible, formatted and unformatted I/O library for streaming I/O. It supports multiple (and user extensible) buffering strategies and multiple locales. It can be used for conventional I/O, reading and writing to memory (string streams), and user-defined extensions, such as streaming across networks (asio: not yet standardized).”
First, here is a bad example from the guidelines: using character-level input for more than one character.
char c;
char buf[128];
int i = 0;
while (cin.get(c) && !isspace(c) && i < 128)
    buf[i++] = c;
if (i == 128) {
    // ... handle too long string ....
}
Honestly, this is a terrible solution for a simple job. Here is the right way to do it:
std::string s;
std::cin >> s;
Each stream has a state associated with it, which is represented by flags. See Table 16.2.
Table 16.2 State of the stream
Flag | Query of the flag | Description | Examples |
---|---|---|---|
 | | No bit set | |
 | | End-of-file bit set | |
 | | Error | |
 | | Undefined behavior | |
Operations on a stream have an effect only if the stream is in the std::ios::goodbit state. If the stream is in the std::ios::badbit state, it cannot be reset to the std::ios::goodbit state.
// streamState.cpp

#include <ios>
#include <iostream>

int main() {
    std::cout << std::boolalpha << '\n';

    std::cout << "In failbit-state: " << std::cin.fail() << '\n';
    std::cout << '\n';

    int myInt;
    while (std::cin >> myInt){
        std::cout << "Output: " << myInt << '\n';
        std::cout << "In failbit-state: " << std::cin.fail() << '\n';
        std::cout << '\n';
    }

    std::cout << "In failbit-state: " << std::cin.fail() << '\n';
    std::cin.clear();
    std::cout << "In failbit-state: " << std::cin.fail() << '\n';

    std::cout << '\n';
}
The input of the text wrongInput causes the stream std::cin to be in the std::ios::failbit state. Consequently, the extraction fails, the while loop ends, and wrongInput is not displayed. Before you can read from std::cin again, you have to reset it to the std::ios::goodbit state with std::cin.clear().
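Recovering from the failbit state can be sketched with a string stream instead of std::cin. The helper readInts is hypothetical; it assumes that tokens are separated by single spaces and simply skips every token that does not start with a number.

```cpp
#include <istream>
#include <limits>
#include <sstream>
#include <vector>

// Hypothetical helper: reads all ints from a stream and skips tokens
// that cannot be parsed as a number.
std::vector<int> readInts(std::istream& in) {
    std::vector<int> values;
    int value;
    while (true) {
        if (in >> value) {
            values.push_back(value);
        } else if (in.eof()) {
            break;       // nothing left to read
        } else {
            in.clear();  // leave the failbit state
            // drop the offending token up to the next space
            in.ignore(std::numeric_limits<std::streamsize>::max(), ' ');
        }
    }
    return values;
}
```

Without the call in.clear(), every further extraction would fail, and the values after the wrong token would never be read.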
Why should you prefer iostreams to printf? There is a subtle but critical difference between printf and iostreams. The format string with printf specifies the format and the type of the displayed value, while the format manipulator with iostreams specifies only the format. To say it the other way around: The compiler deduces the correct type automatically in case of iostreams.
The following program makes my point clear. When you specify the wrong type in a format string, you get undefined behavior.
// printfIostreamsUndefinedBehavior.cpp

#include <cstdio>
#include <iostream>

int main() {
    printf("\n");

    printf("2011: %d\n",2011);
    printf("3.1416: %d\n",3.1416);
    printf("\"2011\": %d\n","2011");
    // printf("%s\n",2011);   // segmentation fault

    std::cout << '\n';
    std::cout << "2011: " << 2011 << '\n';
    std::cout << "3.1416: " << 3.1416 << '\n';
    std::cout << "\"2011\": " << "2011" << '\n';

    std::cout << '\n';
}
Figure 16.6 shows how this undefined behavior manifests itself on my computer.
Figure 16.6 Undefined behavior with printf
You may assume that the compiler issues a warning in the case of a wrong format string, but you have no guarantee. Additionally, I know what often happens when the deadline has passed. You ignore the warnings and maybe decide to look into it later. Instead of facing the consequences of those errors later, avoid the errors in the first place.
Unless you use |
By default, operations on the C++ streams are synchronized with the C streams. This synchronization happens after each input or output operation.
C++ streams: std::cin, std::cout, std::cerr, std::clog, std::wcin, std::wcout, std::wcerr, and std::wclog
This synchronization allows mixing C++ and C input or output operations because operations on the C++ streams go unbuffered to the C streams. What is also important to note from the concurrency perspective is that synchronized C++ streams are thread safe. All threads can write to the C++ streams without any need for synchronization. The effect may be an interleaving of characters but not a data race.
When you call std::ios_base::sync_with_stdio(false), the synchronization between C++ streams and C streams no longer happens, and the C++ streams may put their output into their own buffer. Because of the buffering, the input and output operations may become faster. You have to invoke std::ios_base::sync_with_stdio(false) before any input or output operation. If not, the behavior is implementation defined.
Why should you avoid std::endl? Or to say it differently: What is the difference between the manipulator std::endl and the character '\n'?
std::endl: writes a newline and flushes the output buffer
'\n': writes a newline
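The difference can be made visible without any timing. The following sketch uses a hypothetical stream buffer CountingBuf that counts how often the stream asks it to flush; the names CountingBuf and countFlushes are assumptions for illustration only.

```cpp
#include <ostream>
#include <sstream>

// Hypothetical buffer: counts the flush requests it receives.
class CountingBuf : public std::stringbuf {
public:
    int flushes = 0;
protected:
    int sync() override {  // called when the stream is flushed
        ++flushes;
        return std::stringbuf::sync();
    }
};

// Hypothetical helper: writes the given number of lines, each ended with
// end, and returns how many flushes that caused.
template <typename End>
int countFlushes(int lines, End end) {
    CountingBuf buf;
    std::ostream out(&buf);
    for (int i = 0; i < lines; ++i) out << 'x' << end;
    return buf.flushes;
}
```

Passing '\n' as end causes no flush at all, while passing std::endl causes one flush per line.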
Flushing the buffer is an expensive operation and should, therefore, be avoided. If necessary, the buffer is automatically flushed. Honestly, I was curious to see the benchmarks. To simulate the worst case, here is my program, which puts a line break (1) after each character.
// syncWithStdioPerformanceEndl.cpp

#include <chrono>
#include <cstddef>
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <random>
#include <sstream>
#include <string>
#include <utility>

constexpr int iterations = 500;                              // (2)

std::ifstream openFile(const std::string& myFile){
    std::ifstream file(myFile, std::ios::in);
    if ( !file ){
        std::cerr << "Can't open file "+ myFile + "!" << '\n';
        exit(EXIT_FAILURE);
    }
    return file;
}

std::string readFile(std::ifstream file){
    std::stringstream buffer;
    buffer << file.rdbuf();
    return buffer.str();
}

template <typename End>
auto writeToConsole(const std::string& fileContent, End end){
    auto start = std::chrono::steady_clock::now();
    for (auto c: fileContent) std::cout << c << end;         // (1)
    std::chrono::duration<double> dur = std::chrono::steady_clock::now() - start;
    return dur;
}

template <typename Function>
auto measureTime(std::size_t iter, Function&& f){
    std::chrono::duration<double> dur{};
    for (std::size_t i = 0; i < iter; ++i){
        dur += f();
    }
    return dur / iter;
}

int main(int argc, char* argv[]){
    std::cout << '\n';

    // get the filename
    std::string myFile;
    if ( argc == 2 ){
        myFile= argv[1];
    }
    else {
        std::cerr << "Filename missing !" << '\n';
        exit(EXIT_FAILURE);
    }

    std::ifstream file = openFile(myFile);
    std::string fileContent = readFile(std::move(file));

    // (3)
    auto averageWithFlush = measureTime(iterations, [&fileContent] {
        return writeToConsole(fileContent, std::endl<char, std::char_traits<char>>);
    });

    // (4)
    auto averageWithoutFlush = measureTime(iterations, [&fileContent] {
        return writeToConsole(fileContent, '\n');
    });

    std::cout << '\n';
    std::cout << "With flush(std::endl) " << averageWithFlush.count()
              << " seconds" << '\n';
    std::cout << "Without flush(\\n): " << averageWithoutFlush.count()
              << " seconds" << '\n';
    std::cout << "With Flush/Without Flush: "
              << averageWithFlush/averageWithoutFlush << '\n';
    std::cout << '\n';
}
In the first case, I execute the program with std::endl (3); in the second case, I execute it with '\n' (4). When I run the program with 500 iterations (2), I get the expected winner. '\n' is about 10% to 20% faster on Linux (GCC) and Windows (cl.exe) than std::endl.
Here are the concrete numbers.
GCC (see Figure 16.7).
Figure 16.7 Performance with/without flushing on Linux
cl.exe (see Figure 16.8).
Figure 16.8 Performance with/without flush on Windows
The standard library is an important part of the C++ standard: “ES.1: Prefer the standard library to other libraries and to ‘handcrafted code.’ ” This means that the rules in this book address various aspects of the library. Prominent examples are smart pointers in Chapter 7, Resource Management, or the threading components in Chapter 10, Concurrency.
Many rules present the pros of the STL containers over C-arrays. For completeness, here are a few of the rules: