In 2012 I facilitated a code review with a colleague, let’s call her Beth, which I set aside as a useful example of the field. The purpose of the function was to take a selection of icons arranged in a row along the bottom of the screen and spin through them by an arbitrary amount. We had just migrated to C++11.
After half an hour of hacking away at it, the function looked like this, having lost about a dozen spurious lines of code:
UIComponent* spin_cards(UIComponent* first, UIComponent* n_first, UIComponent* last) { if (first == n_first) return last; if (n_first == last) return first; UIComponent* write = first; UIComponent* next_read = first; UIComponent* read = n_first; while (read != last) { if (write == next_read) { next_read = read; } std::iter_swap(write++, read++); } spin_cards(write, next_read, last); return write; }
There were three questions I posed to Beth at this point. The first was, “Have you thought about using auto
for write
, next_read
, and read
?” She rolled her eyes and asked, “Why?” We then had a conversation about programming with interfaces rather than types. Eventually I asked her to humor me and make the change:
UIComponent* spin_cards(UIComponent* first, UIComponent* n_first, UIComponent* last) { if (first == n_first) return last; if (n_first == last) return first; auto write = first; auto next_read = first; auto read = n_first; while (read != last) { if (write == next_read) { next_read = read; } std::iter_swap(write++, read++); } spin_cards(write, next_read, last); return write; }
Then I asked, “What part of the UIComponent
interface are you using in this func-tion?” and she regarded me carefully. Along the way we had replaced a manual swap with std::swap
, and then replaced that with std::iter_swap
. I was waiting for the penny to drop, and for Beth to realize that UIComponent
was nothing more than a for-ward iterator, but I had to lead her to that point. After some umming and ahhing, she replied, “Actually, I could template this over a forward iterator. All I’m doing is assigning, comparing, dereferencing, and writing.” Thirty seconds later we had:
template <typename T> T* spin_cards(T* first, T* n_first, T* last) { if (first == n_first) return last; if (n_first == last) return first; auto write = first; auto next_read = first; auto read = n_first; while (read != last) { if (write == next_read) { next_read = read; } std::iter_swap(write++, read++); } spin_cards(write, next_read, last); return write; }
I suggested she choose a less specific and more descriptive name. While she reflected on the difficulty of naming, I asked her how long it had taken her to write and test the function. “About two hours,” she replied. “There were some stupid edge cases that kept biting me.”
“Mmm,” I said. “Perhaps you could call this function rotate
.” She looked at me incredulously and somewhat crestfallen as she realized that the brief called for an application of std::rotate
. We compared our final incantation with the implemen-tation’s definition, and found they were very, very similar. You can of course find a description of std::rotate
at https://en.cppreference.com/w/cpp/algorithm/rotate.
My favorite part of engineering is eliminating code. Fewer lines of code lead to a smaller cognitive burden and easier apprehension of the codebase. The aim of the engineer should be to express the solution domain as simply and easily as possible. Paying engineers “by the line” is a surefire way to destroy your project.
Consider the difference between the input and output of a compiler. You will typi-cally pass in a translation unit with hundreds of lines of code and be presented with thousands of assembly instructions wrapped up in an object file. Those assembly instructions each trigger activity of greater complexity on the CPU. The assembly is more complex, more detailed than the C++ source code. The CPU activity is yet more complex than the assembly instructions. The entire effort of software engineer-ing and language design is the reduction of complexity and detail.
When I was learning my trade in the 1980s, I was presented with the concept of high-level languages like Sinclair BASIC and BBC BASIC, and then low-level lan-guages like Z80 assembly and 6502 assembly. The level in question is the level of abstraction. The distance between BASIC and assembly is significant. When I first started learning C, having absorbed BASIC and assembly, it was presented as a low-level language, although not as low level as assembly. I treated it like an assembly macro programming language. I could see which assembly instructions would be generated by which C constructs. I had the immediacy of assembly available to me, with the naming clarity of BASIC. I could name variables, like in BASIC, but I could easily infer how much memory they would take up. I could create control loop struc-tures without having to fiddle about with the correct flag testing to decide whether to jump back or continue.
Fewer lines of code lead to a smaller cognitive burden and easier apprehension of the codebase. The aim of the engineer should be to express the solution domain as simply and easily as possible. Paying engineers “by the line” is a surefire way to destroy your project.
The complexity of assembly was ame-liorated by nicer identifiers and clearer structure, both helping to hide that detail. C raised the level of abstraction.
When I first came across C++, it was like a macro programming language for C. Some of the fiddly things I had been doing, like wrapping up malloc
and init
into a single function, were being done for me by the C++ compiler in the form of new key-words. C++ raised the level of abstraction.
In both cases, raising the level of abstraction did not come at a measurable performance cost in the general case. The compilers were not perfect, and on several occasions, I was left open-mouthed at the seeming idiocy of the compiler writer, but fortunately I could resort to a truly nonportable solution and drop back down to a lower level of abstraction by insert-ing hand-rolled artisanal assembly language instructions. The last time I remember outperforming the compiler was in about 1991. Since then, I have been able to rely on it to do a better job than me.
Importantly, if I were to go back to hand-rolling assembly, I would make a much worse job of it, because the complexity of CPUs has increased profoundly. When I was writing assembly, there was one core, and there were no caches and therefore no data prefetching nor branch prediction. The decision to use contiguous data structures rather than node-based data structures was a meaningful one. Memory read/write operations took about the same time as an instruction execution cycle, so big-O notation meant something.
Raising the level of abstraction improves portability. My beautifully cunning assembly language would only work on one processor family. In the early 1980s that was a significant problem because the two competing home computers that were the leading game platforms ran on Z80 and 6502 processors. By the mid-1980s matters were simplified by the 68000 processor being chosen for both the Atari ST and the Commodore Amiga, as well as the early Apple Macs.
The purpose of an API is to reduce complexity by hiding details behind an inter-face, raising the level of abstraction. The purpose of language is to hide the complex-ity of individual comprehension of meaning behind corresponding linguistic tokens. Raising the level of abstraction is fundamental to the role of software engineer.
I hope I have convinced you of the importance of raising the level of abstraction as part of the practice of programming. Let’s look at some examples.
Reviewing the opening story, you can see that we started with a solution target-ing a specific problem, that of rotating the order of a set of icons. During the code review, there was an important moment when we introduced std::swap
. The code had looked like this:
UIComponent* spin_cards(UIComponent* first, UIComponent* n_first, UIComponent* last) { if (first == n_first) return last; if (n_first == last) return first; UIComponent* write = first; UIComponent* next_read = first; UIComponent* read = n_first; while (read != last) { if (write == next_read) { next_read = read; } UIComponent tmp = *write; *write++ = *read; *read++ = tmp; } spin_cards(write, next_read, last); return write; }
You can see the manual swap at the end of the while
loop. I asked Beth the question, “Did you choose not to use std::swap
?” and she replied, “I didn’t see the point. UIComponent
is a trivial object, swapping it is trivial, why call a function?”
My immediate response was “declaration of intent.” There is a Core Guideline for this, P.3: “Express intent.” Yes, it is obvious what was going on; we have all manu-ally written a swap by assigning to a temporary and so on. By replacing that with a single word, swap
, we make it even clearer at no extra cost. The function call will be substituted for the implementation by even the simplest of compilers. Additionally, if UIComponent
stops being a trivial object and merits its own specialization of swap
, the code benefits from that change in detail. Eliminating that little piece of detail and replacing it with a function template reduced complexity, raised the level of abstrac-tion, and made the code more expressive. It also added some future-proofing.
Of course, we already have a large source of useful function templates in the standard library, particularly those in the <algorithm>
header. There is a very high chance that if you find yourself writing a loop, you will be able to replace that loop with one of the function templates defined in this header. Indeed, if you cannot do that, perhaps you have found a new algorithm. You should share it with the world.
We really are quite serious.
There are likely to be some omissions from the <algorithm>
header. For exam-ple, C++20 saw the standardization of std::shift_left
and std::shift_right
. This shifts elements in a range, and it was something I found myself doing quite a lot. It never occurred to me that plenty of other people might be doing something so fun-damental. When I saw the paper come before the committee, I felt quite foolish: there was a lightbulb moment as I realized how much I would use this function.
Some of the function templates in the <algorithm>
header make use of other func-tions from the same header. For example, std::minmax
will make use of std::min
and std::max
. Every opportunity is taken to raise the level of abstraction, resulting in reuse of code and thus as few new lines of code as possible.
Once you start replacing your loops with existing function templates, your code becomes markedly more expressive and less prone to bugs. If you are unwittingly try-ing to write std::find_if
, then replacing your attempt with a tested and debugged version, written by people with a greater understanding of the compilation environ-ment than you, is surely the right thing to do. Of course, you first must realize that you are indeed attempting to rewrite one of the <algorithm>
functions before you can do that. The best indicator of that is the use of a loop.
There are a lot of algorithms. Look at this function:
bool all_matching(std::vector<int> const& v, int i) { for (auto it = std::begin(v); it != std::end(v); ++it){ if (*it != i) { return false; } } return true; }
It’s a simple function to check whether or not all the items in a vector of int
s have the same value. It’s a rather useful function, so useful in fact that a more general version of it already exists in the <algorithm>
header:
template< class InputIt, class UnaryPredicate > constexpr bool all_of( InputIt first, InputIt last, UnaryPredicate p );
or, if you want to be dizzyingly modern and use ranges:
template< ranges::input_range R, class Proj = std::identity, std::indirect_unary_predicate< std::projected<ranges::iterator_t<R>,Proj>> Pred > constexpr bool all_of( R&& r, Pred pred, Proj proj = {} );
The name all_of
is more familiar to the programming community than all_match-ing
. It is also more abstract. Do not reinvent this particular wheel, and watch out for raw loops. I am not the first person to say this: Sean Parent gave a legendary talk in 2013 called Code Seasoning.1
1. https://channel9.msdn.com/Events/GoingNative/2013/Cpp-Seasoning
Particularly, consider Core Guideline T.2: “Use templates to express algorithms that apply to many argument types.” The standard algorithms are immaculate examples of this approach.
I rejoice when I find an abstraction. It means that I have isolated part of the problem domain and that I am in a position to name it. If I am particularly successful it will be a name that crops up in the codebase in all sorts of places.
In addition to function templates, the standard library contains some useful class templates. There are rather fewer of these than there are function templates. The most widely used are probably std::vector
, std::string
, and std::unordered_map
. There are a few proposals in the pipeline that seek to add additional containers, but these are offered rather less frequently than function templates.
In any programming course you should be offered a module on data structures and algorithms. This is the bread and butter of programming. By identifying the broad range of data structures and algorithms in the vocabulary of programmers, you can learn when to create a novel structure or adapt or simply use an existing one. This is a great power, only available to you because you have names for things.
The most cursory of glances through the standard library will reveal that a lot of it is delivered as class and function templates. This is because templates allow you to name things without worrying about types. You may dislike the name “vector,” I certainly do, but it contains no information about what it might contain. It is not relevant to the container. It just contains things in a particular way, guaranteeing that they will be contained contiguously in memory.
Following the conventions of the standard library containers allows you to cre-ate data structures that both are appropriate for your solution domain and will play nicely with the existing algorithms. Define iterators as part of your container so that you will be able to delegate searching and sorting to the standard library, or offer spe-cializations optimized for your data structure. Define the appropriate member types in the same way that std::vector
and std::unordered_map
do. Provide a std::swap
specialization and, from C++20, a three-way comparison operator. This makes your containers easier to reuse and understand. It allows your clients to think about how their data is stored, rather than how to correctly operate the container, or what the details about the type of data being stored are.
This independence from types lets your code describe what it is doing rather than what it is doing it with. This support is also offered by the auto
keyword. This removes unnecessary implementation detail from your reading of the code. In fact, the full definition of std::string
is
namespace std { template< class CharT, class Traits = std::char_traits<CharT>, class Allocator = std::allocator<CharT> > class basic_string; using string = basic_string<char>; }
That is quite a lot of text to hold in your head when all that you are interested in is std::string
. std::char_traits
is not something that you should ever have to write or worry about. Writing allocators is similarly not something you want to become involved with unless you are working closer to the metal than most people, so std::allocator
will ordinarily be entirely sufficient.
The widespread use of std::string
also addresses one of the reasons listed in the Core Guidelines for using templates to raise the level of abstraction of code: reuse. I rejoice when I find an abstraction. It means that I have isolated part of the problem domain and that I am in a position to name it. If I am particularly successful it will be a name that crops up in the codebase in all sorts of places.
std::string
is a great name. There are all sorts of other things it could have been called. For example, the using declarations might have been skipped by the standard-ization committee, and we would be burdened with saying std::basic_string<char>
and then applying our own using declarations, creating an alphabet soup of identical types such as char_string
, ch_string
, string_char
, string_c
, and so on. String is a widely used concept across many languages to indicate a collection of characters for representing text. There are even puns on the name available. A rope, or cord, is a data structure composed of smaller strings that is used to manipulate very long strings such as articles or books. If you have ever written a text editor you will prob-ably have used a rope to enable rapid insertion and deletion.
std::string
is a standout success story when it comes to naming in the standard. Sadly, there are other places where we have fallen short.
It is worth talking about why I dislike the name “vector” and how raising the level of abstraction of your code with a template is a two-step process.
I come from a mathematical background, taking a mathematics degree from Sus-sex University. My strongest field was abstract algebra. I remember the curriculum for my first term, consisting of calculus, analysis, linear algebra, and introduction to philosophy (it was a slightly unusual British university for its time). Linear algebra was taught by Professor Martin Dunwoody, and every lecture was an utter pleasure as the scales fell away from my eyes about how everything works.
The fundamental object of linear algebra is the vector. This is an n-tuple of values with a set of arithmetic operations: addition, scalar multiplication, and vector mul-tiplication, known as the inner product. These arithmetic operations allow you to do all sorts of things: solve simultaneous equations, model Cartesian geometry—in fact, most of modern mathematics derives from linear algebra.
This is a rather different thing from std::vector
. This is a dynamic array of any-thing at all, rather than a fixed array of arithmetic values. There are no operators defined upon it, so you cannot add or multiply them. In fact, the only similarity between a std::vector
and a vector is the containment of several things. This puts the language in an odd place, by misnaming one of the most widely used containers.
In my day job I simulate fictional worlds using linear algebra to model the physi-cal reality of them. To be fair, I can simply define my own vector type. For example, in my codebase gdg::vector_3f
is a vector of three floats, which is used for locating something in three-dimensional space, as well as modeling its velocity and accelera-tion. However, when you see std::vector
in a codebase it doesn’t mean what any mathematician might think it means, and although we have become so used to that as to be blind to it, it is a bump in the road for those coming to C++ from mathemat-ics or other languages. Perhaps dynarray
would have been a better name. Forming an abstraction is a two-step process. First, you identify the abstraction by examining its API, its inputs and outputs, and its relationship to your problem domain. Then you give it a name. Check the literature for prior art and be mindful of the responsibility you bear and consequences you may unleash when you name something.
In summary:
Function and class templates hide detail, and thus complexity, behind mean-ingful names.
Many function templates in the standard library can replace loops.
Many algorithms in the standard library can inform existing data structures.
Expressing abstractions as class templates enables wider reuse, particularly when well named.