Chapter 5.6

E.6: Use RAII to prevent leaks

Deterministic destruction

We have waxed lyrical about deterministic destruction already in this text, but it bears repeating that it is the single greatest feature of C++. Unfortunately, we have a small problem with object storage duration.

There are four classes of storage duration. An object with static storage duration will be created prior to execution of main() and be destroyed after main() returns. An object with automatic storage duration will be created as soon as it is declared and be destroyed the moment its name falls out of scope. An object with thread-local storage duration will be created when a thread begins and be destroyed when the thread ends.

Dynamic storage duration implies user-specified behavior. An object with dynamic storage duration will be created using the new operator and destroyed using the delete operator. These operators are part of the source code, written by the user, which makes the object's lifetime vulnerable to error. There are two types of error: using an object after it has been destroyed and losing an object entirely (by forgetting to use delete). It is the latter error that concerns us here. Losing an object is known as leaking.

Here is the simplest possible leak:

#include <vector>
int main() {
  new std::vector<int>;
}

This is perfectly legal code. I’ve just tried it on Compiler Explorer with several compilers and they all compile without warning. On execution, the new operator will call the allocation function operator new, requesting the amount of memory required to create an instance of std::vector<int>, usually three words. This memory will be allocated and returned to the new operator, which will then call the std::vector<int> default constructor to populate those three words. Then main() will exit.

The object was not bound to a name. It did not need to be. The new operator did its work as instructed. Because the result of the new operator was not bound to a name, the object could not be deleted, so the leak was immediate. The memory was not freed automatically when main() exited, although it is most likely that the operating system performed appropriate cleanup when the process ended. It will not have invoked any destructors, but it will have reclaimed the memory.

Here is the next simplest leak:

#include <vector>
int main() {
  auto vi = new std::vector<int>;
}

As you can see, we bound the result of the new operator to a name. Unfortunately, the name fell out of scope before we deleted the object. The thing that was destroyed when that name fell out of scope was not the std::vector<int> object, but a pointer to it. The type of vi is std::vector<int>* and when such an object falls out of scope, the object pointed to is not destroyed, only the pointer itself.

This was a remarkably common problem prior to C++11 when smart pointers were introduced. Now we could say:

#include <vector>
#include <memory>
int main() {
  auto vi = std::unique_ptr<std::vector<int>>(new std::vector<int>);
}

This is rather a mouthful. Since the introduction of std::make_unique in C++14 we have been able to say:

#include <vector>
#include <memory>
int main() {
  auto vi = std::make_unique<std::vector<int>>();
}

Rather clearer, we hope you agree. The type of vi is no longer std::vector<int>*. It is now an object of type std::unique_ptr<std::vector<int>>.

The life cycle of vi is rather different now. Rather than being initialized with a memory address at creation and simply ceasing to be when the name falls out of scope, it is initialized not only with the memory address but also instructions on how to destroy the object. When vi falls out of scope, it invokes the destructor of the object it is bound to, which in turn destroys the std::vector<int> object.

This idiom, where the creation of an object also includes details of how to destroy it, is known as Resource Acquisition Is Initialization, or RAII. The life cycle of the memory is bound to the lifetime of an object. This phrase was coined by Bjarne Stroustrup and first appeared in his book The C++ Programming Language.

The concept of RAII is an extremely useful benefit of the destructor. It is not only applicable to memory, but also to any resource that has a life cycle that needs to be explicitly managed. We shall spend the rest of this chapter looking at an example.

We should remark that the earlier examples were manufactured to demonstrate memory leaks. All those examples could have been fixed by creating the vector on the stack, giving the object automatic storage duration.

You should always prefer automatic storage duration to dynamic storage duration. You only need to use dynamic storage duration when you are creating objects whose lifetime must persist beyond the current scope. If you are reasonably new to C++ it is to be hoped that this is a little startling and somewhat curious. You may never have been introduced to the new operator and will always have used smart pointers to create large objects which can be cheaply passed around. You might have been told that you avoid memory leaks by avoiding the use of raw pointers. However, a memory leak is not the only kind of leak.

Leaking away files

Windows programmers will be familiar with the function CreateFile. This function creates or opens a file or I/O device and returns a handle to that object. That handle is of type HANDLE which is an alias to a void*. The handle is used with all function calls involving the file: ReadFile, WriteFile, SetFilePointer, and so on. When the object is no longer needed, a call to CloseHandle will release the resource back to the operating system.

The same is true for file handling in the standard library. If you choose to avoid the streams library, the function std::fopen creates or opens a file or I/O device and returns a pointer to an implementation-defined type called FILE. This pointer is used with all function calls involving the file: std::fread, std::fwrite, std::fseek, and so on. When the object is no longer needed, a call to std::fclose will release the resource back to the operating system.

You can see a very similar set of operations here. std::fread takes the FILE pointer as the final argument while ReadFile takes it as the first argument, and std::fread reads a number of objects of a given size while ReadFile reads a number of bytes, but the principle is the same: here is a handle, given by the operating system, for you to use while you engage in file manipulation.

These handles can leak, just as memory does. Returning to our example, here is the simplest possible leak:

#include <cstdio>
int main() {
  std::fopen("output.txt", "r");
}

std::fopen returns a FILE* which is not bound to a name and simply leaks away. We can repeat the second example too:

#include <cstdio>
int main() {
  auto file = std::fopen("output.txt", "r");
}

The FILE still leaks away. In this example it was bound to a name but std::fclose wasn’t called to release the resource back to the operating system.

Fortunately, the C++ Standard Library comes to the rescue with the iostreams library. This library offers a selection of objects with correctly managed life cycles. Just as std::unique_ptr releases the memory resource when it falls out of scope, so do the iostream library objects. For example:

#include <fstream>
int main()
{
  auto file = std::fstream{ "output.txt" };
}

To open a file, you pass a filename to a std::fstream object constructor. This will open the file, allow you to invoke member functions such as read, write, seekp, and seekg, and close the file when the destructor is invoked.

The iostreams library is not everyone’s cup of tea. It is designed with abstract base classes, burdening it with performance inefficiencies. It is a library of its time, that time being the early 1990s. We have learned many things about C++ library design since then, such as the value of composition, and were we to start again I imagine we would take a different approach. It is still a perfectly good library that delivers what it promises, but many programmers are tempted to take their own approach and write their own file-handling library from scratch.

There are easier ways to solve the problem of leaking files. One is to create an object like a std::unique_ptr, but rather than holding a pointer to memory, it holds the file instead. For example:

#include <cstdio>
class FILE_holder {
public:
  FILE_holder(std::FILE* f) : m_f(f) {}
  // std::fclose must not be passed a null pointer, so check for a failed open
  ~FILE_holder() { if (m_f) std::fclose(m_f); }
  operator std::FILE*() { return m_f; }

private:
  std::FILE* m_f;
};

int main()
{
  auto file = FILE_holder(std::fopen("output.txt", "r"));
}

No leaks here. There is another problem of course: assigning away from this object may result in the object being closed prematurely. In fact, what we want is something exactly like std::unique_ptr, but for objects created via std::fopen rather than via the new operator.

Fortunately, the committee thought of that. std::unique_ptr is a class template with not one but two parameters. The first parameter is the type of the object being contained, while the second parameter is the deleter. The second parameter defaults to std::default_delete, a very simple object, with a constructor and a parenthesis operator. A naïve implementation might look like this:

template<class T>
struct default_delete {
  constexpr default_delete() noexcept = default;
  template<class U>
  default_delete(const default_delete<U>&) noexcept {}
  void operator()(T* p) const { delete p; }
};

Rather than writing your own deleter and using it when making std::unique_ptr instances, you can specialize the template for std::FILE. It is simple, as demonstrated below:

#include <memory>
#include <cstdio>

template <>
struct std::default_delete<std::FILE> {
  void operator()(std::FILE* f) { std::fclose(f); }
};

int main()
{
  auto file = std::unique_ptr<std::FILE>(std::fopen("output.txt", "r"));
}

The specialization simply replaces the parenthesis operator with a call to std::fclose, rather than calling the delete operator. When file falls out of scope, the std::unique_ptr object containing the std::FILE* is destroyed, closing the std::FILE* object on its way to oblivion.

Why are we bothering?

Ideally, all your classes that are associated with any resources should have constructors and destructors that correctly manage the life cycle of those resources. As we saw with std::FILE* we have an escape hatch for nonconforming objects, but what do we do when our resource is not exposed as a pointer?

It might have occurred to you that we seem to be going to a lot of trouble here to clean up after ourselves when surely the operating system does all that for us. When a process terminates, all the handles associated with that process are closed, all the memory is released, and everything is left ready for reuse. Why do we care about resource leaks when the environment is going to take care of that for us anyway?

There are a few reasons. The first is that if you leak resources fast enough there will come a point where you will request a resource and the operating system will decline your request, warning you that there are none left of whatever it is you are requesting. This is especially likely if your program is long lived, designed to run for the entire uptime of the computer on which it is deployed.

Second, it is a good habit to get into. If you decide that cleaning up after yourself is optional, that means you have put a decision point into your development cycle. Every time you create something that may leak, you will need to spend time deciding whether to spend time working out how to prevent it from leaking. Be in the habit of always cleaning up after yourself.

Third, in the case of files specifically, on some operating systems if a running application keeps a file open, the user is prevented from modifying, moving, or deleting the file until the program ends. This can be a cause of significant irritation, sometimes even causing users to reboot their machines to delete an unwanted file.

Finally, it is not necessarily safe to presume that the operating system will do all the cleanup. If you are using a legacy third-party library, leaking resources may have long-term consequences. Consider this fragment:

int open_database(const char*);
void close_database(int);

int main()
{
  auto db = open_database("//network/customer.db");
}

There exists a database somewhere remote from your machine. This code opens a connection and then leaks it. The operating system knows nothing about how to clean up after this leak. With any luck, the database server is a well-written piece of software that will hand out a connection and close the connection if it remains unused within a timeout period. That is not a safe assumption to hold, though.

However, we cannot specialize std::default_delete since open_database does not return a pointer. You may be tempted to use reinterpret_cast to turn the int into a pointer, but that would earn you a hard stare at code review time since you are flat-out lying to the compiler. The correct solution is to create a proxy, like this:

#include <memory>

int open_database(const char*);
void close_database(int);

struct DATABASE_PROXY {
  DATABASE_PROXY(int db_) : db(db_) {}
  operator int() { return db; }
  int db;
};

template <>
struct std::default_delete<DATABASE_PROXY> {
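  void operator()(DATABASE_PROXY* p) const {
    close_database(*p);  // release the connection held by the proxy
    delete p;            // then free the proxy itself, which was created with new
  }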
};


int main()
{
  auto db = std::unique_ptr<DATABASE_PROXY>
               (new DATABASE_PROXY(open_database("//network/customer.db")));
}

The DATABASE_PROXY class wraps the returned value, allowing you to allocate a copy from the free store and pass it to the std::unique_ptr constructor. This also works for objects larger than an int, although one would hope that if a struct is being returned from a function, appropriate resource management will be taking place as part of its design.

This all seems a bit much: Future possibilities

Creating a struct just to specialize std::default_delete seems like a disproportionately large chunk of work. However, this comes in the category of “dealing with legacy code.” We have learned many things as a programming community over the past 40 years of C++ development, many of which we have carefully encoded into revisions of the language standard. There will always be a cost to accommodating code that seemed well written at the time, but which did not benefit from the discovery of subsequent idioms and practices.

For example, casting was a perfectly normal, acceptable way of dealing with conflicting types when writing C code. C++ strengthened the type system philosophically and practically with the introduction of casting keywords such as static_cast and reinterpret_cast, both of which are quite ugly and both of which serve to draw attention to the fact that you are subverting an important part of the language, to wit, type safety.

The contemporary way of modeling RAII is through correct use of the constructor and destructor. All resources should be acquired in the constructors, released in the destructor, and managed correctly in the assignment operators. The resources should be abstracted in their own class with their own correct special functions so that client classes are spared the burden of managing them. This promotes the rule of five or zero.

Looking to the future, though, there is more explicit support available in C++ Extensions for Library Fundamentals, Version 3 (see footnote 1), at the section named [scopeguard]. This describes a header named <experimental/scope>, a name that will be modified should this feature be adopted into the standard, which offers four classes:

template <class EF> class scope_exit;
template <class EF> class scope_fail;
template <class EF> class scope_success;
template <class R, class D> class unique_resource;

1. www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/n4873.html

The first three classes are useful for RAII within a single scope. There can be many ways to exit a scope if a function is egregiously long, and these classes ensure that however a scope is exited, cleanup can take place. If you want to differentiate between exceptional exit and successful exit, that option is available with std::experimental::scope_fail and std::experimental::scope_success. For example:

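#include <experimental/scope>
#include <iostream>
#include <vector>

void grow(std::vector<int>& v) {
  // The guard prints only if the scope is left without an exception.
  std::experimental::scope_success guard([]{
    std::cout << "Good!" << std::endl; });
  v.resize(1024);
}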

There are two ways out of this function: either v.resize(1024) is successful or it throws. The std::experimental::scope_success object will write to the standard output only if the resize is successful.

std::experimental::unique_resource (see footnote 2) is very similar to std::unique_ptr. However, unlike std::unique_ptr, std::experimental::unique_resource does not require the resource to be a pointer.

2. The std::experimental namespace is used to keep experimental features approved by the C++ Standards Committee. Entities that start life here may end up in the std namespace if there is enough positive feedback.

Revisiting the std::fopen example:

#include <experimental/scope>
#include <cstdio>

int main()
{

  using std::experimental::unique_resource;
  auto file = unique_resource(
      std::fopen("output.txt", "r"),
      [](auto fp){ std::fclose(fp); });
}

There is a problem, though: what if std::fopen failed? We need a way of signaling an invalid value for the result of std::fopen, or indeed for any resource we want to wrap in this way.

There is also an analogue to the std::make_unique function template, with a somewhat verbose function signature:

template <class R, class D, class S=decay_t<R>>
std::experimental::unique_resource<decay_t<R>, decay_t<D>>
  std::experimental::make_unique_resource_checked
      (R&& resource, const S& invalid, D&& d)
  noexcept(std::is_nothrow_constructible_v<decay_t<R>, R> &&
           std::is_nothrow_constructible_v<decay_t<D>, D>);

This is a function template which takes a resource type, an invalid value for the resource type, and a deleter function object, returning a std::experimental::unique_resource object. If the resource matches the invalid value, then the deleter function object will not be invoked.

Here is how we would rewrite the std::fopen example:

#include <experimental/scope>
#include <cstdio>

int main()
{
  auto file = std::experimental::make_unique_resource_checked(
      std::fopen("potentially_nonexistent_file.txt", "r"),
      nullptr,
      [](auto fptr){ std::fclose(fptr); });
}

Walking through this example, we call std::experimental::make_unique_resource_checked with the result from std::fopen, with the intent of calling std::fclose if the file opens successfully. If the value returned by std::fopen is nullptr, then the call to std::fclose is avoided.

Where can I get this?

These are very useful tools for your toolbox. Unfortunately, they are not necessarily provided by your implementation vendor. My preferred vendor does not ship <experimental/scope>, although there are some entries in the experimental directory.

This does not stop you from implementing it yourself, though. The full specification is provided in the Technical Specification, linked to in an earlier footnote. Search for [scopeguard]; the tag in brackets is known as the stable index. There you will find a complete specification of how these four classes and the nonmember function should work. It should take you less than 15 minutes to read. It will take you less time than you think to implement.

There are three benefits to implementing it yourself. The primary benefit is that you can start using the objects, and if they are adopted into the standard, which is quite likely, you will need to make minimal changes to your source code. The secondary benefit is that you will start to learn how the standard is specified, and how to implement library features. The tertiary benefit is that if you encounter any mistakes or ambiguities in the Technical Specification, you can pass this information back to the editors and enable them to make fixes before it gets adopted into the standard. Once something is in the standard, fixing it is quite hard. It is, of course, preferable to spot all errors ahead of time, prior to deployment.

Finally, if you think this is a useful addition to the language, you should let the committee know and ask them to prioritize it for inclusion. Alternatively, if you think this is overengineered, overly elaborate, unnecessary, or not deserving of a place within the standard, you can also let the committee know by writing a paper presenting your arguments. The committee is made up of representatives from many nations and companies who volunteer their time to improve the C++ standard. They do what the rest of the world asks them to do, within the bounds of reasonable debate on desirability, feasibility, and achievement of consensus, to deliver the C++ standard that supports the needs of the C++ community. Be advised that progress can be glacial: getting agreement from over a hundred people from dozens of companies, industries, and countries is a slow process.

It is important to stress the voluntary nature of this work. There is no membership test, no private invitation, no secret handshake: participation is achieved simply by turning up to committee meetings and helping the process along. A code of conduct moderates all behavior and keeps proceedings open and transparent.

There are several routes to participation. In my case, as a UK resident, I contacted the British Standards Institution and requested details about joining the BSI C++ panel. (In Canada, where Kate lives, it’s the Standards Council of Canada you contact. In the US it is INCITS.) Each country has its own name for its National Standards Development Organization (SDO), as well as its own cost structure with some being free to participate and others charging a membership fee. You may be able to contact your own standards institute and make inquiries. Even if you don’t want to participate in full, you should be able to make your views known to your national body.

Not every country in the world is represented in the C++ committee, but new countries are always welcome, and formal participation can be initiated and undertaken by anyone who is prepared to engage with their own nation’s standards body. Sometimes one person can simply bring a national body into being. In 2017 Hana Dusíková visited CppCon and gave a lightning talk about a Compile Time Regular Expression parser she had developed. She caught the attention of several committee regulars, went on to form and convene the Czech national body, and now chairs Study Group 7, Reflection.

You can find out more about the standardization process by visiting http://isocpp.org/std. There, you can find out how to contact your national body, how to participate in standards development and in committee meetings, how to report defects, and how to submit proposals. You can also find the standing documents that describe How Things Are Done. You can see this page for the ISO Programming Language committee: https://www.iso.org/committee/45202.html. Under the link for Participating Members on this page, you will see which countries participate, the SDO name of each country, and their participation status. In particular, P-members have voting rights and O-members do not. If your country is not on the list, then either your country has no SDO, or the SDO did not join this ISO Standards Committee. Now you will know how much work you have in front of you.

Engaging with the committee and shaping the standard will help C++ continue to be the language you reach for to solve your software engineering problems.