Chapter 16

The Standard Library


Cippi admires the ISO Standard.

Despite the standard library’s crucial importance, this section is not exhaustive: many rules are missing, the rules that are mentioned are often quite concise, and other rules are already covered in other parts of the C++ Core Guidelines. Consequently, I complement those rules with additional information when necessary.

Containers

Let me start with a significant rule.

SL.con.1

Prefer using STL array or vector instead of a C-array

I assume that you know about std::vector. Why should you prefer std::vector to a C-array?

std::vector

One of the big advantages of a std::vector compared to a C-array is that the std::vector automatically manages its memory. Of course, that holds true for all standard containers. The following program gives a closer look at the automatic memory management provided by std::vector.

// vectorMemory.cpp
 
#include <iostream>
#include <string>
#include <vector>
 
template <typename T>
void showInfo(const T& t, const std::string& name) {
 
  std::cout << name << " t.size(): " << t.size() << '\n';
  std::cout << name << " t.capacity(): " << t.capacity() << '\n';
 
}
 
int main() {
 
  std::cout << '\n';
 
  std::vector<int> vec;                                      // (1)
 
  std::cout << "Maximal size: " << '\n';
  std::cout << "vec.max_size(): " << vec.max_size() << '\n'; // (2)
  std::cout << '\n';

  std::cout << "Empty vector: " << '\n';
  showInfo(vec, "Vector");
  std::cout << '\n';
 
  std::cout << "Initialized with five values: " << '\n'; 
  vec = {1,2,3,4,5};
  showInfo(vec, "Vector");                                   // (3)
  std::cout << '\n';
 
  std::cout << "Added four additional values: " << '\n';
  vec.insert(vec.end(),{6,7,8,9});
  showInfo(vec,"Vector");                                    // (4)
  std::cout << '\n';
 
  std::cout << "Resized to 30 values: " << '\n';
  vec.resize(30);
  showInfo(vec,"Vector");                                    // (5)
  std::cout << '\n';
 
  std::cout << "Reserved space for at least 1000 values: " << '\n';
  vec.reserve(1000);
  showInfo(vec,"Vector");                                    // (6)
  std::cout << '\n';
  
  std::cout << "Shrinked to the current size: " << '\n';
  vec.shrink_to_fit();                                       // (7)
  showInfo(vec,"Vector");
 
}

To spare typing, I wrote the small function showInfo, which prints the size and the capacity of a vector. The size of a vector is its number of elements; the capacity of a vector is the number of elements it can hold without an additional memory allocation. Therefore, the capacity of a vector is always at least as big as its size. You can adjust the size of a vector with its member function resize; you can adjust its capacity with its member function reserve.

But back to the program, from top to bottom. I create an empty vector (1). Afterward, the program displays the maximum number of elements a vector can have (2). After each operation, I output its size and capacity. That happens for the initialization of the vector (3), the addition of four new elements (4), the resizing of the container to 30 elements (5), and the reserving of memory for at least 1,000 elements (6). Since C++11, you can shrink a vector with the member function shrink_to_fit (7). This call requests that the vector’s capacity be reduced to its size.

Before I present the output of the program in Figure 16.1, I have a few remarks.


Figure 16.1 Automatic management of memory

  • The adjustment of the size and the capacity of the container is done automatically. I don’t have to use any memory operations like new and delete.

  • By using the member function vec.resize(n), the vector vec gets value-initialized elements (0 for int) if n > vec.size().

  • By using the member function vec.reserve(n), the container vec gets new memory for at least n elements if n > vec.capacity().

  • The call shrink_to_fit is nonbinding. That means the implementation doesn’t have to adjust the capacity of a container to its size. But my usage so far of the member function shrink_to_fit with GCC, Clang, or cl.exe has always freed unnecessary memory.

std::array

Okay, but what is the difference between a C-array and a C++-array?

std::array combines the best of two worlds. On the one hand, std::array has the size and the efficiency of a C-array; on the other hand, std::array has the interface of a std::vector.

My small program compares the memory efficiency of a C-array, a C++-array (std::array), and a std::vector. See Figure 16.2.


Figure 16.2 sizeof a C-array, a C++-array, and a std::vector

// sizeof.cpp
 
#include <iostream>
#include <array>
#include <vector>
 
int main() {
 
  std::cout << '\n';
 
  std::cout << "sizeof(int)= " << sizeof(int) << '\n';
 
  std::cout << '\n';
 
  int cArr[10] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
 
  std::array<int, 10> cppArr = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
 
  std::vector<int> cppVec = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
 
  std::cout << "sizeof(cArr)= " << sizeof(cArr) << '\n';     // (1)
 
  std::cout << "sizeof(cppArr)= " << sizeof(cppArr) << '\n'; // (2)
 
                                                             // (3)
  std::cout << "sizeof(cppVec) = " << sizeof(cppVec) + sizeof(int) 
                                   * cppVec.capacity() << '\n';
  std::cout << "              = sizeof(cppVec): " 
            << sizeof(cppVec) << '\n';
  std::cout << "              + sizeof(int)* cppVec.capacity(): " 
            << sizeof(int)* cppVec.capacity() << '\n';
 
  std::cout << '\n';
 
}

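Both the C-array (1) and the C++-array (2) occupy 40 bytes, which is precisely sizeof(int) * 10. In contrast, the std::vector needs an additional 24 bytes (3) to manage its data on the heap. These 24 bytes are the size of the vector object itself on this platform; the value depends on the implementation.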

So much for the C-array side of std::array. In addition, std::array supports, to a large extent, the interface of a std::vector. This means, in particular, that a std::array knows its size.

SL.con.2

Prefer using STL vector by default unless you have a reason to use a different container

If you want to add elements to your container or remove elements from your container at run time, use a std::vector; if not, use a std::array. Additionally, a std::vector can be much larger than a std::array because its elements go to the heap. std::array uses a buffer that is local to the context in which it is being used.

std::array and std::vector offer the following advantages:

  1. The fastest general-purpose access (random access, including being CPU vectorization friendly)

  2. The fastest default access pattern (begin-to-end or end-to-begin is CPU cache prefetcher friendly)

  3. The lowest space overhead (contiguous layout has zero per-element overhead, which is CPU cache friendly)

std::array and std::vector support the index operator, which boils down to pointer arithmetic. Consequently, advantage 1 is obvious. Advantage 2 was discussed in the section about performance. Read the details in the rule “Per.19: Access memory predictably.” The previous rule already covered advantage 3: “SL.con.1: Prefer using STL array or vector instead of a C array.” A std::array is as big as the corresponding C-array, and a std::vector adds a small constant overhead (24 bytes in the sizeof.cpp measurement).

SL.con.3

Avoid bounds errors

Ignoring the bounds of a C-array can go unnoticed for too long. The rules “ES.103: Don’t overflow” and “ES.104: Don’t underflow” in Chapter 8, Expressions and Statements, clearly demonstrate the risks.

In the case of the C-array, there is no support to detect a bounds error. Many of the containers of the STL support an at member function that checks boundaries. When you access a nonexisting element, a std::out_of_range exception is thrown. The following containers have a boundary-checking at member function:

  • Sequence container: std::array, std::vector, and std::deque

  • Associative container: std::map and std::unordered_map

  • std::string

The std::string in the next example shows the boundary check.

// stringBoundsCheck.cpp
 
#include <stdexcept>
#include <iostream>
#include <string>
 
int main() {
 
   std::cout << '\n';
  
   std::string str("1123456789"); 
 
   str.at(0) = '0';                                // (1)
 
   std::cout << str << '\n';

   std::cout << "str.size(): " << str.size() << '\n';
   std::cout << "str.capacity() = " << str.capacity() << '\n';
 
   try {
      str.at(12) = 'X';                            // (2)
   }
   catch (const std::out_of_range& exc) {
      std::cout << exc.what() << '\n';
   }
 
   std::cout << '\n';
 
}

Setting the first character of the string str to ‘0’ (1) is fine, but accessing a character outside the size (2) is an error and throws a std::out_of_range exception. This happens even if the access is within the capacity but outside the size of the std::string.

  1. The size of a std::string str is the number of elements the str has.

  2. The capacity of a str is the number of elements a str could have without allocating additional memory.

Compiling the program with GCC 8.2 and executing it produces a quite explicit error message. See Figure 16.3.


Figure 16.3 Accessing a nonexisting element of a std::string

Text

There are various kinds of text and various ways to present this text. Table 16.1 gives you a preview before I dive into the rules.

Table 16.1 Various kinds of text

Text              Semantic                                            Rule
std::string       Owns a character sequence                           SL.str.1
std::string_view  Refers to a character sequence                      SL.str.2
char*             Refers to a single character                        SL.str.4
std::byte         Refers to byte values (not necessarily characters)  SL.str.5

To summarize, only std::string is an owner. All the others refer to existing text.

SL.str.1

Use std::string to own character sequences

Maybe you know another string that owns its character sequence: a C-string. Don’t use a C-string! Why? Because you have to manually take care of the memory management, the string termination character, and the length of the string.

// stringC.c
 
#include <stdio.h>
#include <string.h>
 
int main( void ) {
 
  char text[10];
 
  strcpy(text, "The Text is too long for text."); // (1) too long
  printf("strlen(text): %zu\n", strlen(text)); // (2) missing '\0'
  printf("%s\n", text);
 
  text[sizeof(text)-1] = '\0';
  printf("strlen(text): %zu\n", strlen(text));
 
  return 0;
 
}

The simple program stringC.c has undefined behavior in (1) and (2). Compiling it with a rusty GCC 4.8 seems to work. See Figure 16.4.


Figure 16.4 Undefined behavior with a C-string

The C++ equivalent does not have the same issues.

// stringCpp.cpp
 
#include <iostream>
#include <string>
 
int main() {
 
    std::string text{"The Text is not too long."}; 
 
    std::cout << "text.size(): " << text.size() << '\n';
    std::cout << text << '\n';
 
    text +=" And can still grow!";
 
    std::cout << "text.size(): " << text.size() << '\n';
    std::cout << text << '\n';
 
}

In the case of a C++-string, you cannot make these errors because the C++ run time takes care of the memory management and the termination character. Additionally, if you access the elements of the C++-string with the at member function instead of the index operator, bounds errors are automatically detected. You can read the details on the at member function in the rule “SL.con.3: Avoid bounds errors.”

SL.str.2

Use std::string_view to refer to character sequences

A std::string_view refers to the character sequence. To say it more explicitly: A std::string_view does not own the character sequence. It represents a view of a sequence of characters. This sequence of characters can be a C++-string or C-string. A std::string_view needs two pieces of information: the pointer to the character sequence and the length. It supports the reading part of the interface of std::string. Beyond what std::string offers, std::string_view has two modifying operations: remove_prefix and remove_suffix. Both only shrink the view; they never change the underlying characters.

std::string_view shines brightly when it comes to memory allocation.

// stringView.cpp; C++20
 
#include <cassert>
#include <cstdlib>
#include <iostream>
#include <string>
 
#include <string_view>
 
void* operator new(std::size_t count) {                   // (1)
  std::cout << " " << count << " bytes" << '\n';
  return std::malloc(count);
}
 
void getString(const std::string& str) {}
 
void getStringView(std::string_view strView) {}
 
int main() {
 
   std::cout << '\n';
 
   std::cout << "std::string" << '\n';
                                                          // (2)
   std::string large = "0123456789-123456789-123456789-123456789";
   std::string substr = large.substr(10);                 // (2)
 
   std::cout << '\n';
 
   std::cout << "std::string_view" << '\n';
                                                          // (3)

   std::string_view largeStringView{large.c_str(), large.size()};
   largeStringView.remove_prefix(10);                     // (3)
 
   assert(substr == largeStringView);
 
   std::cout << '\n';
 
   std::cout << "getString" << '\n';
 
   getString(large);
   getString("0123456789-123456789-123456789-123456789"); // (2)
   const char message []= "0123456789-123456789-123456789-123456789";
   getString(message);                                    // (2)
 
   std::cout << '\n';
 
   std::cout << "getStringView" << '\n';
 
   getStringView(large);                                  // (3)
   getStringView("0123456789-123456789-123456789-123456789");
   getStringView(message);                                // (3)
 
   std::cout << '\n';
 
}

I overloaded the global operator new (1) to trace each memory allocation. Memory allocations take place in (2) but not in (3). See Figure 16.5.


Figure 16.5 No memory allocation with std::string_view

SL.str.4

Use char* to refer to a single character

If you don’t follow this rule and use const char* as a C-string, you may end up with a critical issue such as the following one.

#include <iostream>
 
char arr[] = {'a', 'b', 'c'};
 
void print(const char* p) {
    std::cout << p << '\n';
}
 
void use() {
    print(arr); // undefined behavior
}

arr decays to a pointer when used as an argument of the function print. The issue is that arr is not zero terminated. The call print(arr) has undefined behavior.

SL.str.5

Use std::byte to refer to byte values that do not necessarily represent characters

std::byte (C++17) is a distinct type implementing the concept of a byte as specified in the C++ language definition. This means a byte is neither an integer nor a character. Its job is to access object storage. std::byte’s interface consists of operators for bitwise logical operations and shifts.

template <class IntType> 
    constexpr byte operator << (byte b, IntType shift); 
template <class IntType> 
    constexpr byte operator >> (byte b, IntType shift); 
constexpr byte operator | (byte l, byte r); 
constexpr byte operator & (byte l, byte r); 
constexpr byte operator ~ (byte b); 
constexpr byte operator ^ (byte l, byte r);

You can use the function std::to_integer<IntType>(b) to convert a std::byte b to an integer type and the call std::byte{integer} to do it the other way around. integer has to be a non-negative value no greater than std::numeric_limits<unsigned char>::max().

SL.str.12

Use the s suffix for string literals meant to be standard-library strings

Before C++14, there was no way to create a C++-string without a C-string. This is strange because we want to get rid of the C-string. With C++14, we got C++-string literals. They’re C-string literals with the suffix s: "cStringLiteral"s.

Let me show you an example that makes my point: C-string literals and C++-string literals are different.

// stringLiteral.cpp
 
#include <iostream>
#include <string>
#include <utility>
 
int main() {
 
    std::string hello = "hello"; 
    auto firstPair = std::make_pair(hello, 5);
 
    auto secondPair = std::make_pair("hello", 15);      // (2)  ERROR 
 
    using namespace std::string_literals;               // (1)
    // auto secondPair = std::make_pair("hello"s, 15);  // (3)  OK 
 
    if (firstPair < secondPair) std::cout << "true\n";  // (4)
 
}

I have to make the namespace std::string_literals available with a using directive (1) to use the C++-string literals. Lines (2) and (3) are the critical lines in the example. In (2), I pass the C-string literal "hello" directly to std::make_pair. This is the reason that the type of firstPair is (std::string, int), but the type of secondPair is (const char*, int). In the end, the program does not compile when I use (2). The program compiles and the comparison works when I use (3).

Input and output

When you interact with the outside world, two input/output libraries come into play: the stream-based I/O library (iostream library for short) and the C-style I/O functions. Of course, you should prefer the iostream library. The C++ Core Guidelines give a good overview of iostreams: “iostreams is a type safe, extensible, formatted and unformatted I/O library for streaming I/O. It supports multiple (and user extensible) buffering strategies and multiple locales. It can be used for conventional I/O, reading and writing to memory (string streams), and user-defined extensions, such as streaming across networks (asio: not yet standardized).”

SL.io.1

Use character-level input only when you have to

First, here is a bad example from the guidelines: using character-level input for more than one character.

char c;
char buf[128];
int i = 0;
while (cin.get(c) && !isspace(c) && i < 128)
    buf[i++] = c;
if (i == 128) {
    // ... handle too long string ....
}

Honestly, this is a terrible solution for a simple job. Here is the right way to do it:

std::string s;
std::cin >> s;

SL.io.2

When reading, always consider ill-formed input

Each stream has a state associated with it, which is represented by flags. See Table 16.2.

Table 16.2 State of the stream

Flag               Query of the flag  Description          Examples
std::ios::goodbit  stream.good()      No bit set
std::ios::eofbit   stream.eof()       End-of-file bit set  • Reading beyond the last valid character
std::ios::failbit  stream.fail()      Error                • False formatted reading
                                                           • Reading beyond the last valid character
                                                           • Opening a file failed
std::ios::badbit   stream.bad()       Undefined behavior   • Size of stream buffer cannot be adjusted
                                                           • Code conversion of stream buffer failed
                                                           • A part of the stream throws an exception

Operations on a stream have an effect only if the stream is in the std::ios::goodbit state. If the stream is in the std::ios::badbit state, it has usually lost its integrity; clearing the flag does not repair the underlying problem.

// streamState.cpp
 
#include <ios>
#include <iostream>
 
int main() {
 
    std::cout << std::boolalpha << '\n';
 
    std::cout << "In failbit-state: " << std::cin.fail() << '\n';
 
    std::cout << '\n';
 
    int myInt;
    while (std::cin >> myInt){
        std::cout << "Output: " << myInt << '\n'; 
        std::cout << "In failbit-state: " << std::cin.fail() << '\n';
        std::cout << '\n';
    }
 
    std::cout << "In failbit-state: " << std::cin.fail() << '\n';
    std::cin.clear();

    std::cout << "In failbit-state: " << std::cin.fail() << '\n';
  
    std::cout << '\n';
 
}

The input of the text wrongInput causes the stream std::cin to be in the std::ios::failbit state. Consequently, the while loop ends, and neither wrongInput nor std::cin.fail() is displayed inside the loop. Before you can read from std::cin again, you have to set the stream back to the std::ios::goodbit state with std::cin.clear().

SL.io.3

Prefer iostreams for I/O

Why should you prefer iostreams to printf? There is a subtle but critical difference between printf and iostreams. The format string with printf specifies the format and the type of the displayed value, while the format manipulator with iostreams specifies only the format. To say it the other way around: The compiler deduces the correct type automatically in the case of iostreams.

The following program makes my point clear. When you specify the wrong type in a format string, you get undefined behavior.

// printfIostreamsUndefinedBehavior.cpp
 
#include <cstdio>
 
#include <iostream>
 
int main() {
 
    printf("\n");
 
    printf("2011: %d\n",2011); 
    printf("3.1416: %d\n",3.1416); 
    printf("\"2011\": %d\n","2011"); 
    // printf("%s\n",2011);    // segmentation fault
 
    std::cout << '\n';
    std::cout << "2011: " <<  2011 << '\n'; 
    std::cout << "3.1416: " << 3.1416 << '\n';

    std::cout << "\"2011\": " << "2011" << '\n'; 
 
    std::cout << '\n';
 
}

Figure 16.6 shows how this undefined behavior manifests itself on my computer.


Figure 16.6 Undefined behavior with printf

You may assume that the compiler issues a warning in the case of a wrong format string, but you have no guarantee. Additionally, I know what often happens when the deadline has passed. You ignore the warnings and maybe decide to look into it later. Instead of facing the consequences of those errors later, avoid the errors in the first place.

SL.io.10

Unless you use printf-family functions call ios_base::sync_with_stdio(false)

Per default, operations on the C++ streams are synchronized with the C streams. This synchronization happens after each input or output operation.

This synchronization allows mixing C++ and C input or output operations because operations on the C++ streams go unbuffered to the C streams. What is also important to note from the concurrency perspective is that synchronized C++ streams are thread safe. All threads can write to the C++ streams without any need for synchronization. The effect may be an interleaving of characters but not a data race.

When you call std::ios_base::sync_with_stdio(false), the synchronization between C++ streams and C streams no longer happens, and the C++ streams may put their output into their own buffer. Because of the buffering, the input and output operations may become faster. You have to invoke std::ios_base::sync_with_stdio(false) before any input or output operation. If not, the behavior is implementation defined.

SL.io.50

Avoid endl

Why should you avoid std::endl? Or to say it differently: What is the difference between the manipulator std::endl and the character '\n'?

  • std::endl: writes a newline and flushes the output buffer

  • '\n': writes a newline

Flushing the buffer is an expensive operation and should, therefore, be avoided. If necessary, the buffer is automatically flushed. Honestly, I was curious to see the benchmarks. To simulate the worst case, here is my program, which puts a line break (1) after each character.

// syncWithStdioPerformanceEndl.cpp
 
#include <chrono>
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <random>
#include <sstream>
#include <string>
#include <utility>
 
constexpr int iterations = 500;                             // (2)

std::ifstream openFile(const std::string& myFile){ 
 
  std::ifstream file(myFile, std::ios::in);
  if ( !file ){
    std::cerr << "Can't open file "+ myFile + "!" << '\n';
    exit(EXIT_FAILURE);
  }
  return file;
 
}
 
std::string readFile(std::ifstream file){ 
 
  std::stringstream buffer;
  buffer << file.rdbuf();
 
  return buffer.str();
 
}
 
template <typename End>
auto writeToConsole(const std::string& fileContent, End end){
 
  auto start = std::chrono::steady_clock::now();
  for (auto c: fileContent) std::cout << c << end;        // (1)
  std::chrono::duration<double> dur = std::chrono::steady_clock::now() 
                                     - start;
  return dur;
} 
 
template <typename Function>
auto measureTime(std::size_t iter, Function&& f){
  std::chrono::duration<double> dur{};
  for (std::size_t i = 0; i < iter; ++i){
    dur += f();
  }
  return dur / iter;
}
 
int main(int argc, char* argv[]){

  std::cout << '\n';
 
  // get the filename
  std::string myFile;
  if ( argc == 2 ){
    myFile= argv[1];
  }
  else {
    std::cerr << "Filename missing !" << '\n';
    exit(EXIT_FAILURE);
  } 
 
  std::ifstream file = openFile(myFile);
 
  std::string fileContent = readFile(std::move(file));
                                                        // (3)
  auto averageWithFlush = measureTime(iterations, [&fileContent] { 
    return writeToConsole(fileContent, 
          std::endl<char, std::char_traits<char>>); 
  }); 
                                                        // (4) 
  auto averageWithoutFlush = measureTime(iterations, [&fileContent] { 
    return writeToConsole(fileContent, '\n'); 
  }); 
 
  std::cout << '\n';
  std::cout << "With flush(std::endl) " << averageWithFlush.count() 
                                        << " seconds" << '\n';
  std::cout << "Without flush(\\n): " << averageWithoutFlush.count() 
                                      << " seconds" << '\n';
  std::cout << "With Flush/Without Flush: " 
            << averageWithFlush/averageWithoutFlush << '\n';
 
  std::cout << '\n';
 
}

In the first case, I write the output with std::endl (3); in the second case, I write it with '\n' (4). When I run the program with 500 iterations (2), I get the expected winner. '\n' is about 10% to 20% faster on Linux (GCC) and Windows (cl.exe) than std::endl.

Here are the concrete numbers.


Figure 16.7 Performance with/without flushing on Linux


Figure 16.8 Performance with/without flush on Windows

Related rules

The standard library is an important part of the C++ standard: “ES.1: Prefer the standard library to other libraries and to ‘handcrafted code.’ ” This means that the rules in this book address various aspects of the library. Prominent examples are smart pointers in Chapter 7, Resource Management, or the threading components in Chapter 10, Concurrency.

Many rules present the pros of the STL containers over C-arrays. For completeness, here are a few of the rules: