Chapter 2.2

I.23: Keep the number of function arguments low

How much should they earn?

Here’s a function declaration:

double salary(PayGrade grade, int location_id);

It’s nice to see “double salary” at the start of a line of code; sadly, the purpose of this function is not to double your salary, but to report a salary for a particular pay grade at a particular location. Also, as remarked earlier, you should use integral types for money, so we shall redeclare this function, thus:

int salary(PayGrade grade, int location_id);

You might see this sort of thing in a government contract: civil servant salaries are often pegged to a pay grade but will vary by location. In the UK, for example, some civil servants in London are paid more because of the increased cost of living in the capital.

This is a perfectly normal, ordinary function. It doesn’t appear to rely on external information and in all likelihood simply queries a database table to retrieve the data. It has the look and feel of a first attempt at a function: simple, unambiguous, quiet, unassuming.

Then one day, by act of parliament or some other irresistible force of progress, the requirements change, and salaries are now calculated based not only on pay grade and location but also on length of service. Perhaps as an aid to retention, it has been decided that a small multiplying factor should be included for long-serving members of staff.

This is not a problem: we can simply add a parameter to the function to represent length of service. This is measured in years, so an int will do. We now have

int salary(PayGrade grade, int location_id, int years_of_service);

You might be looking at the pair of ints and answering the question “what could possibly go wrong” with “parameter inversion.” You make a mental note to see how many different locations there are, with a view to creating an enumeration so that the second parameter can be converted into a more narrowly focused type.

Time passes and, with a crashing inevitability, pay mechanisms change again, and another consideration emerges. This one is based on team size. There has been grumbling that management pay grades don’t reflect the additional burden of managing a team of more than 10 people. Rather than add a new pay grade, which would represent an enormous amount of bureaucratic negotiation, a new rule has been added for employees in the management pay grades that, as with long service, adds a small multiplying factor. This is a simple matter of passing in the number of reports, so we add a fourth parameter.

int salary(PayGrade grade, int location_id, int years_of_service,
           int reports_count);

The reports_count boundary depends on the pay grade. Above a certain pay grade, the multiplier comes into effect for managers of larger teams. However, the information about whether the multiplier is applied is important to other functions. After extensive discussion on a lengthy email thread about the merits of returning std::pair<int,bool> versus adding a pointer to bool to the parameter list, team std::pair loses the argument and the function signature is now amended to

int salary(PayGrade grade, int location_id, int years_of_service,
           int reports_count, bool* large_team_modifier);

It turns out there are several dozen locations whose boundaries are rather flexible, so an enumeration isn’t appropriate for the second parameter type.

This function is now rather overworked. The presence of three consecutive ints in the function signature is a trap for the unwary. If we have five reports after seven years of service, swapping those two values around would likely yield a credible but incorrect value. This kind of bug is desperately hard to spot as word-blindness starts affecting your reading of the code.
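To see how quietly the inversion slips through, here is a minimal sketch of the four-int signature with an invented body. The PayGrade enumerators, the salary figures, and the rules (1% per year of service, 5% for more than 10 reports) are all hypothetical; only the parameter list comes from the text.

```cpp
#include <cassert>

// Hypothetical stand-in for PayGrade; the real enumerators are unknown.
enum class PayGrade { Junior, Manager, SeniorManager };

// Hypothetical rules, invented only so the function computes something:
// a base salary per grade, +1% per year of service, +5% for more than 10 reports.
int salary(PayGrade grade, int location_id, int years_of_service,
           int reports_count, bool* large_team_modifier)
{
    assert(large_team_modifier != nullptr);  // the out-parameter must be supplied
    int base = grade == PayGrade::Junior  ? 30000
             : grade == PayGrade::Manager ? 50000 : 70000;
    (void)location_id;  // location lookup omitted in this sketch
    base += base * years_of_service / 100;
    *large_team_modifier = grade != PayGrade::Junior && reports_count > 10;
    if (*large_team_modifier)
        base += base * 5 / 100;
    return base;
}
```

Calling it as salary(PayGrade::Manager, 55, 7, 5, &flag) gives 53500, while the swapped call salary(PayGrade::Manager, 55, 5, 7, &flag) compiles without complaint and gives 52500: credible, and wrong.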

In addition, not only is it querying a database for information, but it is also performing a couple of additional calculations, one of which is conditional on other state. The function is called salary, which seems innocuous, but in fact a fair amount of activity is going on under the hood. If this is performance-sensitive code, that may be relevant.

Simplifying matters through abstraction

The guideline suggests that the two most common reasons for functions having too many parameters are

Missing an abstraction
Violating “one function, one responsibility”

Let’s look at those in detail.

The purpose of the salary function is to calculate a value given some state. The function started off taking two pieces of state and grew as requirements changed. However, one thing that remained constant was that the function calculated a salary according to an employee’s details. On reflection, once that third parameter made an appearance in the function signature, the smart thing to do would have been to encapsulate the parameters into a single abstraction and call it SalaryDetails.

This is what is meant by missing an abstraction. Once you have a collection of state serving a purpose, possibly with some relationships between them, there is a chance that you have discovered an abstraction. Collect that state together into a single class, give it a name, and form those relationships into class invariants.

Applying this process to the salary function, we now have a struct called SalaryDetails that looks like this:

struct SalaryDetails
{
  SalaryDetails(PayGrade grade_, int location_id_, int years_of_service_,
                int reports_count_);

  PayGrade pay_grade;
  int location_id;
  int years_of_service;
  int reports_count;
};

and a function signature that looks like this:

int salary(SalaryDetails const&);

This is only a partial improvement. There are still three ints in the constructor ready to trap the unwary. Indeed, there is a Core Guideline warning against this practice, I.24: “Avoid adjacent parameters that can be invoked by the same arguments in either order with different meaning.” However, techniques exist, such as strong typing, to mitigate this problem, so all is not lost.
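Strong typing can be sketched in a few lines. The StrongInt template and the LocationId, YearsOfService, and ReportsCount aliases below are hypothetical names for illustration, not a library API; the idea is simply that each meaning gets its own type with an explicit constructor, so swapped or bare int arguments are rejected at compile time.

```cpp
#include <cassert>
#include <type_traits>

// A distinct type per meaning; the Tag parameter exists only to make types differ.
template <typename Tag>
struct StrongInt
{
    explicit constexpr StrongInt(int v) : value(v) {}
    int value;
};

using LocationId     = StrongInt<struct LocationIdTag>;
using YearsOfService = StrongInt<struct YearsOfServiceTag>;
using ReportsCount   = StrongInt<struct ReportsCountTag>;

enum class PayGrade { Junior, Manager, SeniorManager };  // hypothetical enumerators

struct SalaryDetails
{
    SalaryDetails(PayGrade grade_, LocationId location_id_,
                  YearsOfService years_of_service_, ReportsCount reports_count_)
        : pay_grade(grade_), location_id(location_id_.value),
          years_of_service(years_of_service_.value),
          reports_count(reports_count_.value)
    {
        assert(years_of_service >= 0 && reports_count >= 0);  // simple invariants
    }

    PayGrade pay_grade;
    int location_id;
    int years_of_service;
    int reports_count;
};

// Swapped arguments, or plain ints, no longer compile.
static_assert(!std::is_constructible_v<SalaryDetails, PayGrade, LocationId,
                                       ReportsCount, YearsOfService>);
static_assert(!std::is_constructible_v<SalaryDetails, PayGrade, int, int, int>);
```

The call site now spells out each meaning: SalaryDetails(PayGrade::SeniorManager, LocationId{55}, YearsOfService{12}, ReportsCount{17}). The cost is some verbosity at the call site; the benefit is that the compiler, rather than a code reviewer, catches the inversion.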

As the salary requirements changed, those changes could have been reflected in the SalaryDetails struct rather than in the function signature. Indeed, you might decide to make salary a member function of the SalaryDetails abstraction. You could also make large_team_modifier a predicate, that is, a function that returns true or false, and create a class:

class SalaryDetails
{
public:
  SalaryDetails(PayGrade grade_, int location_id_, int years_of_service_,
                int reports_count_);
  int salary() const;
  bool large_team_manager() const;

private:
  PayGrade pay_grade;
  int location_id;
  int years_of_service;
  int reports_count;
};

Client code would now look like this:

auto salary_details = SalaryDetails(PayGrade::SeniorManager, 55, 12, 17);
auto salary = salary_details.salary();
auto large_team_manager = salary_details.large_team_manager();
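For completeness, here is one way the member functions might be implemented. The rules are hypothetical (a base salary per grade, 1% per year of service, 5% for managers of more than 10 people), and a real implementation would look the base figure up by grade and location. Note that salary() can reuse large_team_manager(), so the rule lives in one place.

```cpp
#include <cassert>

enum class PayGrade { Junior, Manager, SeniorManager };  // hypothetical enumerators

class SalaryDetails
{
public:
  SalaryDetails(PayGrade grade_, int location_id_, int years_of_service_,
                int reports_count_)
    : pay_grade(grade_), location_id(location_id_),
      years_of_service(years_of_service_), reports_count(reports_count_)
  {
    assert(years_of_service >= 0 && reports_count >= 0);  // class invariants
  }

  // Hypothetical rule: any management grade with more than 10 reports.
  bool large_team_manager() const
  {
    return pay_grade != PayGrade::Junior && reports_count > 10;
  }

  // Hypothetical rules; location_id is unused because the lookup is omitted.
  int salary() const
  {
    int base = pay_grade == PayGrade::Junior  ? 30000
             : pay_grade == PayGrade::Manager ? 50000 : 70000;
    base += base * years_of_service / 100;
    if (large_team_manager())
      base += base * 5 / 100;
    return base;
  }

private:
  PayGrade pay_grade;
  int location_id;
  int years_of_service;
  int reports_count;
};
```

With these invented rules, SalaryDetails(PayGrade::SeniorManager, 55, 12, 17) reports a large team and a salary of 82320.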

If you decide against the member function approach, then the member data would be public and the client code would look like this:

bool large_team_manager = false;
auto salary_details = SalaryDetails(PayGrade::SeniorManager, 55, 12, 17);
auto salary = calculate_salary(salary_details, &large_team_manager);

Let’s reiterate what went on there. Functionality was required to produce a value from some state. The quantity of state grew. That state was abstracted into a class, and member functions were added to reflect what was originally wanted from the function.

It’s worth taking a moment here to consider where the data being used to call the function came from. We explicitly provided 55, 12, and 17 in the example, but that would be an unlikely use case. It is more likely that there is an Employee class containing this information and that its data was simply being passed to the salary function, perhaps like this:

for (auto const& emp : employees)
{
  auto final_salary = calculate_salary(
      PayGrade::SeniorManager, emp.location, emp.service, emp.reports);
}

When I see a function call like that, I immediately wonder why it isn’t a member function of the data source’s class. In this case I would be asking “why isn’t salary a member function of the Employee class?”

Perhaps the author is unable to modify the Employee class; it may be in third-party code. In that case, it is better to pass the entire Employee object to the salary function via a const reference and let the function, rather than the engineer calling it, query the object, like this:

for (auto const& emp : employees)
{
  auto final_salary = calculate_salary(PayGrade::SeniorManager, emp);
}
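A minimal sketch of the const-reference version follows. The Employee struct stands in for the hypothetical third-party class, with member names taken from the loops above (location, service, reports); the salary rules and the total_payroll helper are invented for illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class PayGrade { Junior, Manager, SeniorManager };  // hypothetical enumerators

// Hypothetical third-party type that we are not allowed to modify.
struct Employee
{
    std::string name;
    int location;
    int service;
    int reports;
};

// The function asks the Employee for what it needs, so callers cannot
// pass the ints in the wrong order. The salary rules are hypothetical.
int calculate_salary(PayGrade grade, Employee const& emp)
{
    assert(emp.service >= 0 && emp.reports >= 0);
    int base = grade == PayGrade::Junior  ? 30000
             : grade == PayGrade::Manager ? 50000 : 70000;
    base += base * emp.service / 100;
    if (grade != PayGrade::Junior && emp.reports > 10)
        base += base * 5 / 100;
    return base;
}

// Hypothetical caller: the loop from the text, accumulating a total.
int total_payroll(std::vector<Employee> const& employees)
{
    int total = 0;
    for (auto const& emp : employees)
    {
        total += calculate_salary(PayGrade::SeniorManager, emp);
    }
    return total;
}
```

If Employee later gains a field the calculation needs, only calculate_salary changes; none of its callers do.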

Both solutions reduce the quantity of parameters that the salary function takes, and that’s the aim of this guideline.

Do as little as possible, but no less

Does this mean that you should always convert a bundle of parameters to a class?

It does not. The guideline says keep the number of function arguments low. If you are working with the Microsoft x64 calling convention, for example, the first four arguments are passed in registers by default. A four-parameter function may therefore execute slightly faster than a function taking a class by reference, whose members must be read from memory. It is up to you to decide if the trade-off is worth it. Of course, if you have a dozen parameters, then creating a class to encapsulate the state is an obvious choice. There is no hard-and-fast rule, just a guideline that should be interpreted in the context in which you are working.

The second part of the guideline discussion focuses on violating the “one function, one responsibility” rule. This is a simple rule that says that a function should do one thing only, which enables it to do it well. The poster child for violation of this principle is the realloc function. There is a good chance that if you are a well-behaved C++ programmer you will never have encountered this beast. It exists in the C Standard Library, where it is declared in the header <stdlib.h> with this signature:

void* realloc(void* p, size_t new_size);

It does a number of things. Principally, it resizes the memory pointed to by p. More precisely, it will grow or shrink the raw memory block pointed to by p to new_size bytes; if it can’t resize the existing block, it allocates a new block, copies the contents of the old block to the new one, and frees the old block. This copy ignores the semantics of copy construction and simply copies the bytes, which is why a well-behaved C++ programmer is unlikely to encounter it.

If you pass zero to the new_size parameter, the behavior is implementation defined. You might think that it would simply free the block entirely, but that is not necessarily the case.

The function returns the address of the resized block, which may be the original address or the address of a newly allocated block. If a bigger block is requested and there is not enough memory to allocate it, the return value is the null pointer and the original block is left untouched.
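That last point is the source of a classic bug: writing p = realloc(p, n) leaks the original block when the call fails, because the only pointer to it has just been overwritten with null. A safe pattern keeps the result in a temporary. The grow helper below is a hypothetical name; it only suits trivially copyable element types such as int, and it omits an overflow check on the size calculation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Grow a malloc'd array of ints with std::realloc without leaking on failure.
bool grow(int*& p, std::size_t new_count)
{
    assert(new_count > 0);  // avoid the zero-size case entirely
    void* q = std::realloc(p, new_count * sizeof(int));
    if (q == nullptr)
        return false;          // p still points at the old, unchanged block
    p = static_cast<int*>(q);  // the block may have moved; the old address is invalid
    return true;
}
```

Notice how much of this small function exists only to untangle the two outcomes that realloc folds into one return value.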

There are two things going on here. The memory is resized, and the contents may be moved. Different levels of abstraction are being mixed up, and the second action is conditional on the first failing. If I were to offer this functionality from scratch, I would offer a single function, called resize, like this:

bool resize(void* p, size_t new_size);

This would simply attempt to grow or shrink the block. If it failed, I could then allocate a new block myself and move everything over. There would be no need for implementation-defined behavior. I would be respecting levels of abstraction by separating reallocation from moving stuff around.
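Portable C++ offers no way to grow a heap block in place, so the following is a toy sketch of the idea rather than a real allocator. The Block type, which records its own capacity, and the resize, move_to_new_block, and ensure_size functions are all hypothetical. The point is the shape: each function has one responsibility, and the caller composes them explicitly.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>

// Toy block that knows its capacity, so "resize in place" is decidable.
struct Block
{
    std::unique_ptr<unsigned char[]> data;
    std::size_t size = 0;
    std::size_t capacity = 0;
};

// Responsibility 1: change the size in place, or report that it cannot.
bool resize(Block& b, std::size_t new_size)
{
    if (new_size > b.capacity)
        return false;
    b.size = new_size;
    return true;
}

// Responsibility 2: allocate a new block and copy the bytes across.
Block move_to_new_block(Block const& old, std::size_t new_size)
{
    Block fresh;
    fresh.data = std::make_unique<unsigned char[]>(new_size);
    fresh.size = new_size;
    fresh.capacity = new_size;
    std::size_t count = std::min(old.size, new_size);
    if (count > 0)
        std::memcpy(fresh.data.get(), old.data.get(), count);
    return fresh;
}

// The caller decides what happens when the in-place attempt fails.
void ensure_size(Block& b, std::size_t new_size)
{
    if (!resize(b, new_size))
        b = move_to_new_block(b, new_size);  // old storage is released here
}
```

There is no hidden conditional behavior: resize never moves anything, and move_to_new_block never tries to resize.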

The principle of “one function, one responsibility” is related to cohesion. In software engineering, cohesion refers to the degree to which things belong together. We hope that the example above demonstrates high cohesion in a function. High cohesion is a good thing. It leads to improved readability and reusability while diminishing complexity. When you cannot give your function a good name, there is a good chance that it is doing too many things. Naming is hard, particularly when you are naming something complicated.

In the case of a class, high cohesion implies that the member functions and data are intimately related. Cohesion is increased when the member functions carry out a small number of related activities with a small set of data rather than with unrelated sets of data. High cohesion and loose coupling often go together. You can read more about them in the book Structured Design,¹ a text over forty years old but still worth reading today.

1. Yourdon, E., and Constantine, L. 1978. Structured Design: Fundamentals of a Discipline of Computer Program and Systems Design, 2nd ed. New York: Yourdon Press.

Real-life examples

If you squint in the right way, you will see that the second guideline discussion is really another way of expressing the first. When a function accumulates responsibilities, it loses abstraction, becoming a very specific entry in the glossary of the problem domain. Let us look at the examples shown in the guideline itself.

The first of these is the merge function. This behemoth has the following signature:

template <class InIt1, class InIt2, class OutIt, class Compare>
constexpr OutIt merge(InIt1 first1, InIt1 last1,
                      InIt2 first2, InIt2 last2,
                      OutIt dest, Compare comp);

While the signature is arguably readable, I had to abbreviate it slightly to fit on the page, type it several times, correct several errors, and proofread it with care and attention before getting it right, even while copying it from cppreference.com.

There is scope for identifying abstractions. The function takes a selection of iterators which mark a pair of ranges. C++20 introduced ranges, which let us treat the iterators marking the beginning and the end of a sequence as a single object. This allows us to simplify the function by bundling the first four iterator parameters into two:

template <class InRng1, class InRng2, class OutIt, class Compare>
constexpr OutIt merge(InRng1 r1, InRng2 r2, OutIt dest, Compare comp);

The detail of the range is at a lower level of abstraction. We are merely interested in merging two ranges. Another way of defining a range is as a pointer to the beginning of some items, along with the number of items. The second example in the guideline offers this function signature:

void f(int* some_ints, int some_ints_length);

Another type introduced in C++20 is std::span. The Core Guidelines are accompanied by a support library, called the Guidelines Support Library. This is a collection of classes defined in the namespace gsl that can be used to support enforcement of some of the Core Guidelines. std::span was developed from gsl::span, and it is precisely as described above: a pointer to some data and a length, bundled together into a single object, yielding the following signature:

void f(std::span<int> some_ints);

In both examples, we have induced abstractions from the parameters.

There is another way of identifying a range, which is with a pointer and a sentinel value. This method is built into the language in the form of the string literal, which is expressed as a pointer to an array of characters with a null terminator. That terminator is the sentinel character. std::string_view, a close relative of std::span specialized for characters, can be constructed from a pointer and a count, from a pointer to a null-terminated string (using the terminator as the sentinel), and, since C++20, from a pair of iterators.

Summary

There are many ways of building abstractions, but often there comes a time when the client wants more from their function. We cannot simply say, “Nope, I’m not going to add another parameter, that violates Core Guideline I.23,” and fold our arms, glaring at them. This is not very kind to your colleagues. What, then, do we do when there are no further meaningful abstractions to be induced?

The presence of a surplus of parameters is a signal that complexity is getting out of control and that something needs attention. If there are no more abstractions to be induced from them, then perhaps the problem lies with the function identifier itself. If a lot is being demanded of it, then it is clearly a very important identifier in the problem domain. Rather than try and encapsulate that importance in a single abstraction, perhaps it is time to consider broadening the function. Perhaps the function should be overloaded according to how its use is intended, or varying function names can be used to accommodate the different types of operation. Maybe a function template is required. There are always other options besides adding additional parameters.

The important thing to remember, whatever the reason for adding additional parameters to a function, is that it is not to be undertaken lightly and should be considered a last resort, and a place to start reconsidering the nature of the function itself. Keep the parameter count low, separate similar parameters, minimize complexity, and rejoice in the discovery of abstractions.

Multiple parameters increase the burden of understanding on the user.
Gather parameters together in structs, perhaps with a view to discovering a latent abstraction.
View multiple parameters as a sign that the function may be trying to do too much.