6
Advanced Object-Oriented Programming

You are ready to learn about more methods and techniques of object-oriented programming in C++. Congratulations! In this chapter, we will present programming topics such as processing regular expression s, implementing state machines, building our own libraries, the filesystem, system clock and time measurement, ranges and functional programming, expression parsing, syntax trees, and the composite and interpreter design patterns, all complete with code examples.

6.1 Functional Objects

We have already implemented many classes and created many objects. Usually, these have been related to real objects expressed with nouns, such as TQuadEq or TLongNumber. However, objects can also represent actions expressed with verbs, such as do_this or compare_that. In such cases, the central part of the object is a function: do_this or compare_that. Sometimes these are accompanied by data. Although not obligatory, if the main goal of a class is to represent a particular action or a function, this is usually implemented using the overloaded functional operator (). Objects of such classes are called functional objects or functors. A call to such an overloaded operator () resembles a call to a simple function, as mentioned in Section 3.14.9. In this section, we will investigate an interrupt-like component to handle various actions that are invoked in response to different external events. Such a component may be a part of an embedded system or, with minor modifications, can serve as a menu-handling module, for instance.

Figure 6.1 shows an event array with attached lists of handlers, performing various actions on an event. The actions are in the form of user-derived class es from the THandler base class with the overloaded virtual operator (). Actions for each event, such as a system interrupt, are arranged as dynamic lists. That is, actions can be added to and also removed from the lists.

Listing 6.1 presents code implementing the event system shown in Figure 6.1. All the definitions are enclosed inside the EventSystem namespace. The THandler class, whose the definition starts on line [6], constitutes the base class for all actions in the system. Not surprisingly, actions are implemented by overloading functional operators in the classes derived from THandler. For this purpose, on line [16], THandler defines the overloaded operator () as a pure virtual function, which is indicated with the = 0 suffix (Section 5.3.1.2). This also means THandler is intended only to be a base class that defines an abstraction for the entire hierarchy (here, this is an abstraction of calls). As we already know, the base class's destructors should also be defined virtual. In THandler, this is achieved on line [11]. But interestingly enough, ~THandler is not only virtual but a pure virtual function, containing a full body implementation on line [12]. This is sometimes useful – and thanks to this, we will be able to trace calls to the THandler destructor.

Schematic illustration of an array of lists of handlers with various actions invoked in response to external events. The actions can be user-defined by inheriting from the THandler base class. Such action handlers can then be attached to or detached from the lists of actions associated with each event in the system. There can be empty lists as well.

Figure 6.1 An array of lists of handlers with various actions invoked in response to external events. The actions can be user-defined by inheriting from the THandler base class. Such action handlers can then be attached to or detached from the lists of actions associated with each event in the system. There can be empty lists as well.

Let's take a closer look at the syntax of the functional operator and its calls. Figure 6.2a shows the syntax of the definition of the functional operator. The syntax of its call is shown in Figure 6.2b, whereas Figure 6.2c shows a call with the explicit operator keyword.

Schematic illustration of the (a) Syntax of the definition of an overloaded functional operator. (b) Short syntax for calling a
functional operator from an object. (c) A qualified call to the functional operator.

Figure 6.2 (a) Syntax of the definition of an overloaded functional operator. (b) Short syntax for calling a functional operator from an object. (c) A qualified call to the functional operator.

Continuing in the EventSystem namespace, on line [18], the definition of TEventHandlers begins. Using the scoped enum class E Event, on line [22], example events are defined (these can represent exceptions, main menus, etc.). Note that kSentinel is not an event name but the last value (a sentinel ) in this enum. Then, the HandlerList and EventVector data structures are defined with familiar Standard Library (SL) containers [25–29]. HandlerList stores smart pointers to objects to facilitate managing their lifetime.

The fEventVector object, defined on line [32], will store the lists of handlers. This object is used by the AttachHandler_4_Event function, defined on line [36], to attach a new_handler to the event_to_attach event. Note that new_handler is passed by value since its held object will be moved by a call to the emplace:back member on line [41]. This means after we exit AttachHandler_4_Event, whatever object was passed as new_handler will be empty.

Next is a series of three definitions for the overloaded virtual void operator (). The first, defined on line [45], takes a reference list _of_handlers to HandlerList as its parameter and calls operator () on each of the handler objects that HandlerList contains. That is, on line [48], first an object is dereferenced with the * operator from the smart pointer, and then its operator () is invoked.

operator () on line [51] does the same thing but provides an identifier for an event. Note that the qualified call on line [54] is to operator () from line [45], as illustrated in Figure 6.2c.

Finally, the third operator (), defined on line [58], traverses all of the events on line [60] and, for each list associated with that event, calls all of the registered handlers. Line [61] calls operator () from line [45], but this time using the this pointer.

Let's now test this code in the EventHandlers_Test function, whose definition begins on line [65]. First, the theHandlerBoard object is created on line [68]. Then, two local handler classes HA and HB, both derived from THandler, are defined on lines [74, 90]. Their structure is straightforward: HA prints its passed text, while HB converts its passed integer from decimal to hex. What is interesting is the way these text and integer objects are passed to HA and HB. Since virtual functions must have the same list of input parameters, it is not possible to change them from class to class. Therefore, additional parameters, if necessary, must be passed through the constructors to the classes. With this technique, we can overcome the limitations associated with passing parameters to virtual functions.

Also note that in both classes, the overloaded operator () functions have the additional override suffix. This relatively new feature of C++ that we saw in Section 4.11 lets us ensure that operator () overrides the operator () from the base THandler class. If, for instance, their formats do not match, then the compiler reports an error (which is easy to see after changing void to, for instance, int ). Hence, this optional override specifier lets us protect against such inconsistency-related errors.

On line [107], the ha object is created with a text argument. Then, on line [108], its functional operator is invoked. Note that line [108] looks exactly like any other call to a function with no arguments. Hence, its name – a functional object.

A temporary HB object (no name) is created and immediately invoked on line [110]. Soon we will see the results of this call.

Then, on lines [116–126], two groups of handler lists are created for the kReset and kExternal events. Each handler object is first created by a call to the std::make_unique function, providing suitable construction parameters, and then immediately passed as an argument to the AttachHandler_4_Event function for the chosen event.

Finally, all actions from the handlers are invoked on line [129] by calling operator () with no arguments.

Let's now examine the output obtained after building and running the previous code:

The handlers' actions are as expected. However, it is interesting to observe the order and type of the called destructors. These are automatically invoked due to the operation of the TH_UP smart pointers (Section 5.3). Notice that for each object, first its destructor and then its base class destructor are called, in that order. Hence, for each of the deleted objects, we register two calls to the destructors. The temporary object HB, created on line [110], is destroyed after exiting this line, rather than waiting until the end of the test function.

Finally, notice that THandler realizes what is called a command design pattern (an action or transaction pattern). Usually it cooperates with such patterns as the factory pattern (Section 5.3.1.2) and composite pattern (Section 6.6.4).

6.2 Example Project – Extending the Currency Search in XML Files, and Using State Machine and Regular Expressions with the regex Library

In Section 3.18, we saw that the TCurrencyExchanger class can store a number of currencies represented by TCurrency objects. It lets us add new currencies to the repository, as well as drive them in and out to any stream, such as the keyboard and screen or a file. However, entering them by hand would be very tedious. Also, currency exchange ratios tend to change quite often, so we need an automatic mechanism for loading their current values. This can be done by downloading a web page or a file containing all the necessary information. Example content of such a file containing currency data, encoded in the Extensible Markup Language (XML) format (see www.w3.org/XML and

https://en.wikipedia.org/wiki/XML) can look like Figure 6.3 (downloaded from www.nbp.pl).

Snapshot of an example XML file from the National Bank of Poland, containing information about currency buying and selling ratios for various world currencies. All information about a single currency is contained in few consecutive lines, starting and ending with special tags as outlined in the red box. Due to international
characters, processing should be done with wide strings.

Figure 6.3 An example XML file from the National Bank of Poland, containing information about currency buying and selling ratios for various world currencies. All information about a single currency is contained in few consecutive lines, starting and ending with special tags as outlined in the red box. Due to international characters, processing should be done with wide strings.

This XML file, although readable, looks quite noisy, partly because most of the words are in Polish. But we can easily spot interesting information placed on consecutive lines in specific XML tags. Because we will be processing general text, rather than std::string, for the correct representation, we need to use std::wstring, which is able to store Unicode characters. However, the main question at this stage is how to find and select interesting information in the XML file. We can do so using the regular expression search and match (regex) package, which is now part of the set of C++ libraries.

6.2.1 Pattern Matching with the Regular Expression Library

Regular expressions are used to specify text patterns: for example, to search for a specific phrase or a file name. These are used, for instance, in the grep Linux command (Dalheimer and Welsh 2006). C++ contains a rich regex library that allows for text pattern matching with regular expressions. To start, we present handy definitions of the most frequently used regex symbols and rules, with suitable examples. These operate with the default ECMA script regex syntax, although other variants, such as Linux grep, are also possible after setting the correct flag.

Returning to our project, let's create regular expression s that match the tag-delimited lines with information about currency. The first line includes a currency name, possibly containing international letters as well as whitespace, enclosed between <nazwa_waluty> and </nazwa_waluty> tags as follows:

To easily find a corresponding regular expression, it is useful to split the pattern into parts:

Schematic illustration of the split patterns of the regular expressions.

This is matched by the following regular expression

The leftmost part will exactly match the series of letters <nazwa_waluty>. Then, the expression (.+), which contains special characters from Tables 6.1 and 6.2, matches any character (denoted with a dot .) repeated one or more times (denoted by the + symbol). This is enclosed in parentheses () to group all matched characters, since after a match, we wish to read them together as a currency name. Finally, </nazwa_waluty> literally matches the end tag for this entry. Taken all together, the <kod_waluty>USD</kod_waluty> pattern should be matched.

A little bit more demanding is the pattern for the selling rate (i.e. “kurs kupna”):

We would like to see it in the following form

Schematic illustration of the split patterns of the regular expressions.

since in this case, we also want to replace the comma (,), used in Poland to separate the integer from the fraction part of a number, with a dot (.) for easier conversion from a string to a value. A possible regular expression for this task may look like the following:

Here, the [[:d:]]+ pattern matches any digit repeated one or more times.

Table 6.1 Special characters and groupings of regular expression s (default ECMA standard).

image
image

Table 6.2 Frequently used regular expression character sets (ECMA).

6.2.2 State Machine Pattern

Each set of lines shown in the red box in Figure 6.3 presents information about a currency.

These lines are consistent and should be matched in order, one after the other. Thus, we need to write code that first matches a given line, then goes on to match the second line, and so on. This order of actions can be hardcoded with nested if - else statements . But a more versatile technique is to use a state machine approach. A state machine is a kind of a mathematical model that builds on states and transitions between them. Usually, special start and final states are also selected. Figure 6.4 depicts a UML state diagram, sometimes called an automaton, for consecutively matching all the lines with information about a currency. Actually, of the five lines in the red box in Figure 6.3, only four convey important information.

Schematic illustration of the UML state diagram for retrieving currency information from an XML file. The system enters an idle state and processes one line after another

Figure 6.4 UML state diagram for retrieving currency information from an XML file . The system enters an idle state and processes one line after another.

6.2.3 Implementing the Extended Class

The previous ideas have been implemented in the XML_CurrencyExchanger class, which extends the TCurrencyExchanger base class explained in the previous section. Here we clearly see the benefits of object-oriented programming and inheritance . XML_CurrencyExchanger inherits all the functionality of the base class TCurrencyExchanger. All it needs to add is the FillCurrencyTableFrom virtual function, whose role is to fill the currency table from a specified XML file . A UML class diagram with the most important members and relations of these classes is shown in Figure 6.5. Since TCurrencyExchanger and its derived XML_CurrencyExchanger aggregate TCurrency objects, we use an arrow that starts with a diamond (see Table 4.4).

Let's now analyze the code shown in Listing 6.2, from the XML_CurrencyExchanger.h header file. Line [1] contains #pragma once to ensure that a given header file is included only once in a given translation unit . The other possibility is to #define a unique preprocessor constant and then check whether it has already been defined, as presented in Appendix A.1. On line [6], the regex is included; and on line [8], std::wgrex is introduced into the scope of this header. Then our CppBook namespace is opened between lines [13–65].

Schematic illustration of the UML class diagram of the TCurrencyExchanger base class and its derived XML_CurrencyExchanger. TCurrencyExchanger aggregates objects of the TCurrency class,
as indicated by an arrow with a diamond. Only the most important class members are shown.

Figure 6.5 UML class diagram of the TCurrencyExchanger base class and its derived XML_CurrencyExchanger. TCurrencyExchanger aggregates objects of the TCurrency class, as indicated by an arrow with a diamond. Only the most important class members are shown.

The definition of the XML_CurrencyExchanger class extends from lines [18–62]. It is publicly inherited from the TCurrencyExchanger class, which was discussed in the previous section. On line [25], the ESearchStates scoped enum (class) is defined. It contains identifiers for states for our state machine, as will be shown.

Lines [31–37] define the four wregex regular expression s . Their operation should be clear at this point. The only thing we need to explain is the construction of the wide raw string literal (constant) using the

construction, which can be a little surprising. Wide characters are identified with the L prefix here. Such a construction can contain characters that are not interpreted. Thus, for example, backslashes need not be preceded by backslashes, etc.

Finally, on line [60], the FillCurrencyTableFrom member function is introduced. It is declared virtual so that it can be overridden in derived class es, if necessary. The definition of this function is contained in the XML_CurrencyExchanger.cpp file, presented in Listing 6.3.

On line [13] of the FillCurrencyTableFrom function, an inFile file is open with the supplied path. Note that in the domain of wide characters, all streams and containers should be set. This is done using a dual set of objects whose names start with the “w” prefix. For instance, instead of string, wstring is used on line [25] and wsmatch on line [28], etc. On line [31], wstring objects are defined, which will be used to store parts of the matched expression.

The main action consists of reading a line of wide characters, checking for potential patterns, and conditionally changing the states based on the matches. This is governed by the while loop, starting on line [34]. As long as there are unprocessed lines in inFile, the getline function copies a consecutive line into the line string and returns true. Inside the loop, the state machine operates.

One of the methods to implement a state machine of a fixed size, such as the one presented in Figure 6.4, is to use the switch statement (Section 3.13.2.1), which will switch between consecutive states until a final state is reached. The initial state, held in the state variable, is set to ESearchStates::kIdle. So, line [41] executes first, checking for a potential match of the f_curr_name_pat pattern. If this pattern is matched by the regex _search function operating on the current line, the state is changed to the next state, as shown in Figure 6.4. The regex_search function takes three arguments:

  1. The input stream of characters (a line object, in our example)
  2. Output results if a match found ( match_res )
  3. Regular expression to be matched, of a wregex type

The results of a successful match are stored in the match_res object. It contains the entire matched pattern, as well as sub-matches arising from the groups defined in the regular expression . The structure of match_res is shown here:

Schematic illustration of the structure of match_res.

m[ 0 ] is the entire matched pattern, m[ 1 ] is the first subexpression, and m.size() denotes the number of subexpressions; m.prefix() and m.suffix() denote all characters in the input stream before or after the matched pattern, respectively.

Operation on substrings can be observed on line [75]. In this case, buy_rate_str is a concatenation of the first and second matched substrings: the integer and fraction parts of a number, joined with a dot instead of a comma as discussed earlier.

Once we enter the final state, ESearchStates::k_SellingRate_Processed, all of the information about the currency has been collected, and the currency object is ready to be added to the database of currencies. The new currency is added on line [102] using the AddCurrency function. Wide strings are converted to floating-point numerical values using the std::stod function. This is one of the very useful string-conversion functions shown in Table 3.5. Since string conversion can throw an exception, e.g. on an empty string, the wrong format, etc., adding a new currency is encompassed by the try - catch statement on line [104]. Finally, if a new currency object is added successfully, the state machine takes on the ESearchStates::kIdle state again on line [109], and the entire mechanism is ready to process data, searching for more currency entries in the input XML file.

A sibling function to regex _search is regex_match . The main difference is that regex_match tries to match the entire input to the pattern.

The C++ regex library has much more functionality than we have outlined here. More information can be found in the literature (Stroustrup 2013; Josuttis 2012) and on the Internet. For more complex state machine implementations, see the Meta State Machine (MSM; www.boost.org/doc/libs/1_70_0/libs/msm/doc/HTML/index.html) from the Boost C++ libraries (Boost.org 2020).

We will summarize what have we learned about in this section:

  1. Basics of the C++ regex library and regular expressions
  2. regex _search functions for regular expression searches and regex_match for matches
  3. Reading and processing XML file s
  4. State machines and UML state diagrams

6.2.4 Project Extension – Loading Currency Information from the Internet

In the previous section, we saw how to find and load information about currencies from an XML file . However, it would be good to have an automatic mechanism for downloading such an XML file from a bank via an Internet connection. Such an operation is certainly possible with a call to a specific HTTP or FTP library. But a subtle problem is that these differ from system to system. So, if we stick to one platform, such as Windows or Linux, that's fine; but what if we wish to write multisystem software that will work equally well on Windows, Linux, Mac, or a mobile? We can imagine a solution that would conditionally compile various versions of the code, depending on the platform type detected at compile-time . A more flexible solution would be to create two software components: one that defines an unchanging interface used by all clients, and a second containing specific implementation suitable on a specific platform. Such a code structure, although it requires slightly more code, gives a very desirable separation of abstraction (i.e. a common calling framework) from implementation (i.e. all the code necessary to perform actions). Figure 6.6 shows a UML class diagram presenting the proposed solution.

Such decoupling of abstraction from implementation is known as the handle-body (or bridge) design pattern (Gamma et al. 1994).

The external call can be done exclusively to an object of the HTTP _File_Handle class through the exposed Load_HTTP_File function, whose role is to download a given file from a given web page. But the class does not know how to do this for all the possible OSs. For this purpose, it has an associated hierarchy of body classes that specialize in HTTP file downloads for specific systems. In a concrete system, a body class specialized for this system is created and maintained by HTTP_File_Handle. Whenever the HTTP_File_Handle::Load_HTTP_File function is called, the action is delegated to the corresponding Load_HTTP_File function in one of the selected body classes, as illustrated in the UML sequence diagram in Figure 6.7. Implementation of the hierarchy of the body classes shown in Figure 6.6 is shown in Listing 6.4.

Schematic illustration of the UML class diagram of the handle-body design pattern used to decouple abstraction, represented by the handle class HTTP_File_Handle, from the platform-dependent implementation by the body class hierarchy with the HTTP_File_Body base class and its derived classes. An interface for external calls is available only through the handle part.

Figure 6.6 UML class diagram of the handle-body (bridge) design pattern used to decouple abstraction, represented by the handle class HTTP_File_Handle, from the platform-dependent implementation by the body class hierarchy with the HTTP _File_Body base class and its derived class es. An interface for external calls is available only through the handle part.

On line [16], we see a declaration of a pure virtual Load_HTTP _File function (see Section 4.11). This means it has to be overridden in the derived class es, as is done for HTTP_File_Body_Windows on line [28], HTTP_File_Body_Linux on line [37], and HTTP_File_Body_MacOS on line [47]. Each class is responsible for implementing Load_HTTP_File in a different OS. An overridden function must be an exact copy of its prototype in the base class: a copy-and-paste. However, since in practice functions take dozens of parameters of various types, and the hierarchies frequently are deep, it is not unusual to mistakenly break a function override due to inconsistencies in function names or parameters. To remedy this situation, the override (contextual) keyword is used. Its use is not compulsory but is recommended, since it forces the compiler to check whether a function decorated with this keyword truly overrides a function in one of the base classes, as discussed in Section 6.1. From the other end, there is the final (contextual) keyword, which lets us stop a chain of virtual functions and overriding (Stroustrup 2013).

Since we are designing a hierarchy of classes with virtual functions, which will also allow us to manipulate derived objects via pointers to the base class, it is important to define the destructors of the base classes as virtual, as on line [20]. We can also declare such virtual destructors in derived class es, but this is not necessary when a virtual destructor is defined in the base class. The = default instructs the compiler to generate a default implementation.

Schematic illustration of the UML sequence diagram of the handle-body design pattern.

Figure 6.7 UML sequence diagram of the handle-body design pattern.

Neither class in the HTTP _File_Body hierarchy needs to know what will be using it. On the other hand, HTTP_File_Handle needs to know its body hierarchy and properly create and maintain a necessary body object that will be used for further actions. An implementation of the HTTP_File_Handle class is shown in Listing 6.5.

On line [54], the using directive is used to define BodyPtr, which is a smart unique_ptr, discussed in Section 5.3. A pointer fBodyObj of this type is defined on line [56] to point at the body object from the HTTP _File_Body class hierarchy. However, which one is chosen is decided in the HTTP_File_Handle class constructor, defined on lines [62–83]. This is an example of polymorphism . A parameter of the EHandledSystems class enumerated type, defined on line [60], tells what system should be chosen. For instance, for the Windows OS, line [67] is executed, which creates the HTTP_File_Body_Windows object and assigns it to the fBodyObj pointer. The benefit of a smart pointer is that once it is attached to an object, we do not need to bother how to delete the owned object – it will be deleted automatically together with fBodyObj when the object itself is deleted. This has an obvious consequence – the lifetime of the HTTP_File_Body object is strictly coupled with the lifetime of the HTTP_File_Handle handle object. Thus, this relation is called a composite aggregation, as shown in Figure 6.6 and discussed in Section 4.9.

On line [88] of the HTTP _File_Handle class, the Load_HTTP_File function is defined. It simply takes two wstring parameters and, on line [92], delegates an action to the analogous Load_HTTP_File function – but in the associated body object, held in the fBodyObj member. For this to be safe, the latter needs to be properly initialized. For the debug version, this is checked on line [90] in the precondition, as postulated by the “programming by contract” methodology (Section 4.7.3). Take advantage of such debug checks as often as possible since in practice they can save a lot of trouble when it comes to making code running properly. Take our word for it!

Once again, since HTTP _File_Handle contains a virtual function and can be derived from in the future, its destructor is defined as virtual on line [97]. Also, = default is used.

Finally, let's take a look at the implementation of one of the body objects. More precisely, the only thing to implement is the HTTP _File_Body_Windows ::Load_HTTP_File function in this case. On the Windows OS, the urlmon library and header need to be included – this is done on lines [3–11]: #include for the header and #pragma comment(lib, "urlmon.lib") for the library. The latter is one of the ways a library can be added to Microsoft Visual C++ projects. However, this happens only if the _WIN32 flag is set, which is true only when compiling for Windows.

Finally, on line [21], the Windows -specific URLDownloadToFile function is called to download the file. Since this function expects C-style string pointers, c_str needs to be called from each of the supplied wstring objects. This also happens if the _WIN32 flag is set, i.e. exclusively in the Windows OS. In other systems, the code simply boils down to the return false statement . Other systems have their own compile flags set. This is how the entire selection mechanism works in our project.

The downside of this handle-body (bridge) based solution is that in a given OS, only one path is compiled and executed. All of the other N − 1 classes, where N denotes the total number of supported OSs, are inactive and protected from attempted compilation by the #if - #else - #endif constructions. But the solution is clear and easily extensible without touching the rest of the software components, which communicate only through the handle class.

The previous solution might look like overkill for a simple file-download problem. Indeed, very simple tasks can be done succinctly with only a few lines, and proliferating classes might seem not necessary. But code and interdependencies usually tend to grow, and such a separation of abstraction from implementation almost is always beneficial. Various other techniques are comparably good: for instance, instead of the dynamic object-oriented inheritance used in this example, a static solution employing class templates and compile-time computations offers many benefits.

Nevertheless, our handle-body design pattern (bridge) is one of the most frequently used constructions in many software packages. It is useful to understand its structure and properties in order to spot it in new software designs or software refactoring tasks. Separating an abstraction (handle) from the implementation (body) in general layered architectures increases the number of classes but also adds a lot of flexibility to the solution. A related pattern is the strategy pattern, which lets us dynamically choose an algorithm based on external conditions. For example, there are many sorting algorithms that perform differently on small and large databases. So, the correct sorting algorithm is chosen based on the amount of work to do. Also interesting in this context is the Pimpl (pointer to implementation) pattern (Meyers 2014). However, its main purpose is to decouple source code compilation dependencies by hiding the implementation.1

In this section, we learned about the following:

  1. The handle-body (bridge) design pattern to decouple abstraction from implementation
  2. Elements of multiple-platform programming
  3. Loading HTTP files from the Internet on different platforms

6.2.5 Launching the Extended Version of CurrencyCalc

In Section 3.18.5, we discussed the CurrencyExchanger component that let us launch our first version of the CurrencyExchanger application. It operates in a terminal window and allows a user to do basic currency calculations. In previous sections, we extended this version to allow for automatic downloads of current exchange ratios from the Internet. So, let's make this upgraded CurrencyExchanger 2.0 version run.

We will further exploit the technique of grouping component elements in namespace(s) rather than classes, as discussed in Section 3.18.5. We have already created the CurrExchanger namespace for the basic version of CurrencyExchanger. Hence, in Listing 6.6, line [5], we continue in this direction. But since we wish to distinguish the two versions from each other, we separate them by opening a new OnLine_CurrExchanger namespace on line [9], nested inside CurrExchanger. To some extent, this is an alternative to create a CurrExchanger class from which OnLine_CurrExchanger would be derived. The two approaches can be seen as alternatives with drawbacks and benefits on both sides. In the former, we simply do not explicitly create the objects.

Lines [12, 15] define the wstring objects local to the OnLine_CurrExchanger namespace. They simply hold the Internet and local default addresses of the XML “from” and “to” files. As already pointed out, the downloaded XML file provides access to current exchange ratios. Moreover, the two addresses can be easily accessed and changed if necessary. On line [18], our local OS is indicated by defining the local variable kMyOS.

The CurExchanger_DownloadCurrencies function is defined on lines [19–23]. Its role is to create an HTTP _File_Handle object named http_client and initialize it appropriately on the selected platform. It is then used to download the XML file from the Internet location and save it locally.

The downloaded XML file is then used to initialize XML_CurrencyExchanger, as we will see. But at this point, it is wise to save the current currency data in a separate initialization file. It can be used in cases when e.g. there is no Internet connection and an XML file cannot be downloaded. This is accomplished by the CurExchanger_SaveInitFile function, defined on line [24].

In the previous code, the std::transform function on lines [30–32] needs some explanation (see also Table 3.6). As we already know, each element of the std::map object contains key and value parts. The role of std::transform is to copy only the value part of each of the CurrencyMap objects to the output file, identified with the outFile object. This CurrencyMap object is accessed via the local reference cur_map. The range to copy from is passed by the first two parameters, i.e. cur_map.cbegin() and cur_map.cend(). These return constant iterators (which here begin with the letter c in front) to the beginning and end of this map collection. The output goes to the outFile using the ostream_iterator, set on line [31] to process TCurrency and wide characters. Each currency is saved using an automatic call to operator << (discussed earlier) on lines [7–26] in Listing 3.33. Each such entry is separated by the wide-new-line symbols L"\n", as provided to the ostream_iterator. Last but not least is the lambda function supplied on line [32] as the last parameter to std::transform. It is called from inside std::transform with the vt reference to consecutive elements of cur_map. The role of this lambda is simple: skip the key and return only the value part of the element, identified by vt.second.

The definition of the CreateCurExchanger function starts on line [40]. It is similar to the function with the same name presented in Section 3.18.5. But the two differ in the type of object they return, which in this case is XML_CurrencyExchanger rather than CurrencyExchanger. Also, they differ in the way they are initialized, in this case through the XML file downloaded from the Internet. However, as previously, since the object creation process can fail, an std::optional object defined on line [34] is returned from CreateCurExchanger. It lets us pass a valid object if an object is created successfully, or it is empty if for some reason an object cannot be supplied.

The two parameters passed to CreateCurExchanger are http_XML_FileName, which holds the Internet address of the XML file with current data for currency exchange ratios, as well as initFileFullPath, holding the full path to the local initialization file. This path is used on line [47] to create the name of another local file – the one that will store the downloaded XML file. These files, i.e. the initialization and local XML files, contain data about currencies, but in different formats. To create a new path, we use the filesystem path object and its parent_path, as well as the overloaded / (slash) operator, as presented in Table 6.3.

On lines [49, 50], the XML file is downloaded using the CurExchanger_DownloadCurrencies function. If and only if this succeeds, the XML file is used to initialize the currencyExchangerObj object by calling its FillCurrencyTableFrom member. If the file download fails, FillCurrencyTableFrom is not called due to the evaluation order of logical expressions (see Section 3.19.2). If these two steps succeed, then on line [55], the reference currency is added (the Polish zloty in this example). Next, on line [58], the new version of the initialization file is saved. This can be used in the future if an Internet download is not possible. Finally, on line [60], this branch returns a valid object: a copy of the local currencyExchangerObj object created on line [42].

If the Internet download fails, then on line [67], we try to use a local initialization file to initialize the currencyExchangerObj object. If such a file cannot be opened, then we signal the situation by returning an empty std::optional object on line [68]. Assuming the initialization file is available, it is used on line [71] to read currency data. This is exactly like the previous version of the CreateCurExchanger function (Section 3.18.5). Then, a valid XML_CurrencyExchanger object is returned on line [74]: a copy of the local currencyExchangerObj object created on line [42].

On line [77], a new version of the Run function is defined. On line [84], the CreateCurExchanger function is called, which returns the XML_CurrencyExchanger object. However, on line [87], the CurrExchanger ::Run function is called, which we defined in Section 3.18.5. Interestingly, it is passed the new object of the XML_CurrencyExchanger type, whereas this version of Run was defined to accept a constant reference to the TCurrencyExchanger object. In an object-oriented framework, this is fine, since an object of the derived class can be used any place an object of its base is used. This is an example of polymorphism and the Liskov substitution rule, discussed in Section 4.1. Also note that this “old” version of Run calls other “old” functions, such as DisplayAllCurrencies and UserInterface. These also work with this new version of the passed object.

On the other hand, if CreateCurExchanger fails to create the XML_CurrencyExchanger object, then an empty all_my_options is returned, and an error message is printed on line [91].

In this section, we learned about the following:

  1. Using the filesystem path object
  2. Composing a namespace inside another namespace (nested namespaces )
  3. Using std::transform to copy elements of std::map to a file
  4. Polymorphism when passing a derived object to a function defined to accept an object of the base class

6.2.6 Building a Static Library and a Terminal Window Application

We could easily add the main function to the previous example to create a terminal window version of our application. But since our project is growing and we have further plans to use the CurrencyCalc component from Figure 3.29 in various projects, it is a good idea to split it from the main function and put into a separate static library . The new arrangement of the software components is shown in UML component diagram in Figure 6.8.

To build a static library out of an application project, we need to do the following:

  1. Remove the main function (i.e. the entire file containing main). It can then be used in other projects, as we will discuss.
  2. Change the CMakeLists.txt file to generate a static library project,2 as discussed in Appendix A.6.1.
  3. Run CMake .exe, build a library project, and then build the library.
Schematic illustration of the UML component diagram of the basic CurrencyExchanger.

Figure 6.8 UML component diagram of the basic CurrencyExchanger.

This way, we can build our first library, named CurrencyCalc _Lib . The new CurrencyCalc_Terminal component is linked to CurrencyCalc_Lib, as shown in Figure 6.8. It contains only the main function, as shown in Listing 6.7.

To enter the previous functions and namespaces into scope of main, a #include would be appropriate. However, since inside main we wish to call the Run function, from line [77], it is also OK to provide its declaration, as provided on lines [103–109]. Here we see the structure of the nested namespaces, i.e. OnLine_CurrExchanger with Run, declared inside CurrExchanger . This enables us to call Run from line [115] inside main, leaving to the linker the task of finding and gluing these functions together. The role of the setmode function called on line [114] is to prepare the terminal to output Unicode wide characters, commonly used by the XML_CurrencyExchanger object. Last but not least, we must remember to return an integer value from the main, as on line [116].

In this section, we learned about the following:

  1. Introducing function declarations in nested namespaces
  2. Creating a static library
  3. Creating a terminal window application that links to the static library

6.2.7 C++ Filesystem

File processing belongs to the basic services of every OS running on a computer equipped with data storage devices, such as hard or solid-state drives, flash memory, etc. However, each system defines these operations in a different way, which makes writing multiplatform software cumbersome. We discussed this problem in Section 6.2.4 when trying to write a universal software component to download files using the Internet HTTP protocol. But in C++17+, a common platform was defined for file and directory processing in the std::filesystem namespace. To use it, we need to #include the filesystem header (Cppreference.com 2018). The most common objects and operations of the C++ filesystem library are summarized in Table 6.3.

Most of the objects and functions presented in Table 6.3 have additional options. Also, there are many subtleties related to parameters of the processed path, file, and directory objects. These can be further explored e.g. in (Cppreference.com 2018).

Table 6.3 Common objects and functions of the C++ filesystem library.

Object Description Examples
path path is a class that represents paths to files, directories, and their links in various systems. It behaves like a container and supports many useful functions for splitting the path and concatenating path parts.
Selected members of the path class:
image
To use the new filesystem features of C++, the filesystem header needs to be included. fs is a short name for the std::filesystem, defined as follows:
1   #include <filesystem>
2
3   namespace fs = std::filesystem;

The following function on line [6] converts a wide string std::wstring into a fs::path object. Then, on lines [9–10], each part of the path is printed on the screen.
4   void DisplayPathParts( const wstring & inPath_str )
5   {
6     const fs::path inPath( inPath_str );
7
8     // We can iterate through different parts of a path
9     for( const auto & p : inPath )
10        wcout << p << "\n";
11  }
For the example input pathb
D:\Research\BC++\Projects\CCppBookCode\ReadMe.txt
the following output is obtained:
D:
\
Research
BC++
Projects
CCppBookCode
ReadMe.txt
directory_iterator

recursive _directory_iterator
Allows for a straightforward iteration through the file tree. The second version does an automatic recursive traversal. Order of iteration is not specified. However, it is guaranteed that each object in the filesystem is accessed only once.
image
The following function performs a recursive traversal of a directory provided as an input wstring object. Its action is to print out a file directory structure, although the output can easily be changed to fulfill other requirements.
 1   void RecursivelyTraverseDirectory( const wstring & inDirPath_str )
 2   {
 3      const fs::path inDirPath( inDirPath_str );
 4
 5      if( fs::exists( inDirPath ) && fs::is_directory( inDirPath ) )
 6
 7         for( auto iter =  fs::recursive_directory_iterator( inDirPath );
 8                 iter !=  fs::recursive_directory_iterator(); ++ iter )
 9
10           wcout << std::wstring( 3 * iter.depth(), L' ' )
11                 << ( fs::is_directory( * iter ) ? L"[+]" : L"|--" )
12                 << iter->path().filename() << endl;
13
14   }

The input directory path name, passed as a wstring object, on line [3] is converted to the inDirPath object of the fs::path class. Line [5] checks to see whether inDirPath exists and whether it represents a directory. If this is true, then on lines [7–8], a loop recursively traverses the directory, advancing the iter object of the fs::recursive_directory_iterator class, and initialized with the inDirPath. Finally, on lines [10–12], the directory structure is printed out. iter.depth returns the depth of the part of the directory tree that is currently being processed. An object of the directory tree is accessed with * iter. If it is a directory, then [+] is printed; otherwise, |-- is printed. The name of a directory tree object is obtained by accessing path and then calling filename functions.
For an example directory with a C++ project, the previous function produces output like the following:
|--CMakeLists.txt
[+]include
   |--range.h
|--ReadMe.txt
[+]src
   |--CCppBookCode.cpp
   |--ClassRelations.cpp
   |--Dec2Roman.cpp
exists Returns true if a file or directory exists or false otherwise.
is_directory Returns true if a path represents a directory.
file_size Returns the file size in bytes. The following function prints basic information about the supplied path to a directory or a file:
 1   void DisplayDirFileInfo( const wstring & inPath_str )
 2   {
 3       const fs::path inPath( inPath_str );
 4
 5       wcout << L"exists - \t\t" << fs::exists( inPath ) << endl;
 6       
 7       // Let's call all important members of fs::path 
 8       wcout << L"filename - \t\t"<< inPath.filename() << endl;
 9       wcout << L"stem - \t\t"              << inPath.stem() << endl;
10       wcout << L"extension - \t\t"          << inPath.extension() << endl;
11
12       // If a file, then print its size in bytes
13       if( fs::is_regular_file( inPath ) ) o << val << sep;
14       std::cout << "file size - \t\t"   << fs::file_size( inPath ) << endl;
15
16       wcout << L"parent_path - \t\t"    << inPath.parent_path() << endl;
17       wcout << L"relative_path - \t\t"  << inPath.relative_path() << endl;
18
19       wcout << L"root_directory - \t\t" << inPath.root_directory() << endl;
20       wcout << L"root_name - \t\t"      << inPath.root_name() << endl;
21       wcout << L"root_path - \t\t"<< inPath.root_path() << endl;
22   }

To call file_size on line [14], an object is checked on line [13] with is_regular_file to see if it is a valid file. On lines [8–10], as well as [16–21], numerous members of path are called to get parts of the path. For an example C:\Temp\Readme.txt input file in the test system, the following is obtained:
 id="c06-line-0087">exists -            1
 id="c06-line-0088">filename -          ReadMe.txt
 id="c06-line-0089">stem -              ReadMe
 id="c06-line-0090">extension -         .txt
 id="c06-line-0091">file size -         44
 id="c06-line-0092">parent_path -       C:\Temp
 id="c06-line-0093">relative_path -     Temp\ReadMe.txt
 id="c06-line-0094">root_directory -    \
 id="c06-line-0095">root_name -         C:
 id="c06-line-0096">root_path -         C:\
is_regular_file Returns true if a path represents a regular file (i.e. not a directory and not a link).
file_status Returns an object with information about a file.
copy Copies files and/or directories with their contents. The following function shows how to create two directories [13, 16] and then create a file in one of them [19], copy that file to the second directory [22], move that file back to the first directory [27], and finally remove all files and directories [31]. The last function should be used with great care. To concatenate elements of a path, on line [9], the overloaded /= operator of the path class is used.
1   void CreateDirAndFiles(const wstring& inDirPath_str,
                           const wstring& subDir)
2   {
3      try
4      {
5         // There are conversions std::(w)string <==> fs::path
6         fs::path inDirPath( inDirPath_str );
7
8         // Use overloaded operator /= to
9         inDirPath /= subDir;   // add sub directory
10
11        // Create directories
12        fs::path sub_1_path { inDirPath / L"SubDir_1" };
13        fs::create_directories( sub_1_path );  // create subDir_1
14
15        fs::path sub_2_path { inDirPath / L"SubDir_2" };
16        fs::create_directories( sub_2_path );  // create subDir_2
17
18        // Create a new file
19        wofstream( sub_1_path / L"file_1.txt" ) << L"Fox";
20
21        // Create a second directory & file by fs::copy
22        fs::copy(sub_1_path / L"file_1.txt",   // from
23                 sub_2_path / L"file_2.txt",   // to
24                 fs::copy_options::overwrite_existing );
25
26        // Move file_2.txt to the first directory
27        fs::rename( sub_2_path / L"file_2.txt",  // from
28                    sub_1_path / L"file_2.txt" );// to
29
30        // Remove all dir and files - be careful!
31        fs::remove_all( inDirPath );
32    }
33    catch( fs::filesystem_error & error )
34    {
35    wcout << error.what() << endl;
36    }
37  }

Certain fs functions, such as copy, rename, and remove_all, have several different call versions. Some of them can also throw fs::filesystem_error type exceptions related to error situations. Therefore, the try-catch statement should be used, as on line [33].
The previous function can be called to print information about the current file path.
38  CreateDirAndFiles( fs::current_path(), L"Playground" );
remove
remove_all
Removes a file or a directory with its contents.
create_directory
create_directories
Creates a directory or directories.
rename Renames or moves a file.
current_path Returns a current path (system dependent).
space:info A class to convey space information for a directory. Data members are as follows:
image
The fs::space:info object holds information about capacity, free space, and available space of a directory. For a given directory path, the space information can be obtained with the fs::space function.
 1 void DisplaySpaceInfo( const wstring & inPath_str )
 2 {
 3    fs::space:info dir_space = fs::space( inPath_str );
 4
 5    wcout << L"Space info for: "   << inPath_str << endl;
 6
 7    wcout << L"capacity - \t\t"    << dir_space.capacity << endl;
 8    wcout << L"free - \t\t"          <<  dir_space.free << endl;
 9    wcout << L"available - \t\t"  <<  dir_space.available << endl;
10 }
For the C:\Temp directory, the following information was output on a test system:
Space info for:  C:\Temp
capacity -       987150917632
free -           638753398784
available -      638753398784

a See the DisplayDirFileInfo function.

b To create a wstring object, use wstring( L"D:\\Research\\BC++\\Projects\\CCppBookCode\\ReadMe.txt" ). Remember that to enter one backslash symbol \ in the text, you need to type \\ in the string literal.

6.2.8 User Interface

In the previous sections, we saw the evolution of the CurrencyExchange project: its two versions, both encompassed in the hierarchical namespaces CurrExchanger and its nested OnLine_CurrExchanger . CurrExchanger defines very basic functionality, whereas OnLine_CurrExchanger extends the functionality to automatically load currency information from the Internet. Both were tested as standalone terminal applications. We also saw how to split the project into a static library named CurrencyCalc _Lib and another component, CurrencyCalc_Terminal, which builds a terminal window application (Section 6.2.6). Now it is time to take the next step and to add a graphical user interface (GUI ) to this project. Hence, we will create a third software component to launch CurrencyExchanger as a GUI application: CurrencyCalc_GUI. The UML component diagram in Figure 6.8 will now change as shown in Figure 6.9.

In this section, we are mostly interested in the CurrencyCalc_GUI component, which will be able to draw graphical widgets, as well as process system events, as will be shown. However, the GUI components depend heavily on the type of OS we are working in. This means if we tried to write our CurrencyCalc _GUI for two or three different OSs, we would need to write two or three quite different versions of the component. On the other hand, even if our task was much simpler and only one OS was our target, using the raw API would not be the most productive option. A much better solution is to use one of the GUI libraries that create a type of interface between a user application like CurrencyCalc and one or many different OSs. Appendix A.4 contains a brief overview of the GUI libraries for C++ projects. Alas, every rose has its thorns, and each has its own peculiarities and limitations. In our example, we choose to test the FLTK library mostly because it is free, it is written entirely in C++, and it runs on many OSs.

Schematic illustration of the UML component diagram of the various software components used in our CurrencyCalc project. Because the currency exchange engine is extracted into a separate library, we can easily construct a terminal window or GUI application without interference.

Figure 6.9 UML component diagram of the various software components used in our CurrencyCalc project. Because the currency exchange engine is extracted into a separate library, we can easily construct a terminal window or GUI application without interference.

So, let's take a closer look at the UML component diagram shown in Figure 6.9. The most important change is that we have separated all the components strictly related to currency exchange from user-related input-output operations. Now these components are enclosed in a separate library, whereas user actions are moved to different application projects. As a result, we achieve the following important goals:

  1. Code readability – It is now much easier to concentrate on various levels of our system. That is, when working on further mechanisms for currency exchange, we will touch only the CurrencyCalc _Lib library. When working on user interface s, we will not need to change anything in the currency exchange library
  2. Code reusability – It is much easier to reuse our CurrencyCalc _Lib library in any other project (with or without a user interface) that requires currency exchange actions. It is also much easier to extend the functionality of the library

Hence, a rule of thumb is not to get involved with the cout and cin objects in classes. Input/output operations should be implemented using the external overloaded operator << and operator >>.

Our code will finally display graphics. To do so, we need to install the FLTK library. Details of this operation are described on the FLTK web page (FLTK 2019) and depend on our OS and installed tools. However, the following short guide can be useful:

  1. Download FLTK from its website and unpack it with the FLTK and CurrencyCalc directories at the same level.
  2. Go into the FLTK directory, and launch cmake . [dot] to build the FLTK project in the current directory.
  3. Open the project, and build the library. When finished, the FLTK/Lib/Debug directory should contain the fltkd.lib library (verify its build date and time).

6.2.8.1 Definition of the CC_GUI Class

Listing 6.8 shows the definition of the CC_GUI class that defines a user interface for our CurrencyCalc application, and Listing 6.9 contains its full implementation. We will briefly explain what these do.

As usual, at the beginning of a new definition, we should inform the compiler about the classes we wish to use. In this example, this is done in two forms. First, on lines [1–5], the header files are included, which contain the definitions of some SL containers as well as the definitions of the XML_CurrencyExchanger and functions defined in the CurrencyCalcPlayGround header. As a result, all classes defined in this unit will know about XML_CurrencyExchanger but not vice versa, as shown by the arrows in Figure 6.9. Lines [8–12] contain the forward declarations. This is the second way to introduce externally defined classes into the scope of the current translation unit . But in this case only pointers and references to these external classes can be used, since their definitions are not provided. Then, on line [14], a namespace shortcut OL_CE is introduced, to increase readability and save on typing.

The definition of the CC_GUI class starts on line [18]. Thanks to the aforementioned forward declarations, on lines [23–26] we can define pointers to components (widgets) of the GUI. After initialization, these will point at the real widget objects. Although using pointers is not the preferable indirect access technique in modern C++, it is the only option in legacy code like the FLTK library. In many real situations, we face the problem of connecting the two worlds; what is important is to do so safely.

Unlike indirect access by pointers, a connection to the XML_CurrencyExchanger component is achieved by the fXML_CurrencyExchanger reference defined on line [33] (see Figure 6.9). Once again, we use the using declaration to locally shorten the external name to XCE. A real XCE object is provided in the class constructor, as shown on line [40]. Its second, default parameter constitutes the default currency for the application. On line [42], the virtual destructor is defined to allow for inheritance from the CC_GUI class.

The following code in CC_GUI contains declarations of interesting member functions, which will play a role in the actions of our GUI interface. We will introduce them briefly here, and their definitions are provided in Listing 6.9. First is the theButtonCallback static function, declared on line [61]. A static member function in a class is defined for the class and not for its objects (Section 4.5). In other words, there is one such function instance regardless of the number of object instances of this class. Hence, it is defined for a class and not for a particular object. This means it can be passed (via a function pointer) to interfaces that expect a bare function pointer. However, static members in a class should be used with the utmost care since they break class safety if used in a multitask environment (thread safety ). So, why do we define such a function in the CC_GUI class? Because this function needs to be provided as a parameter to one of the objects of the FLTK interface as a callback function . It is a function that is called by a component if an action occurs, such as a button click in our case. Fortunately, this is the only role of theButtonCallback: once called, it passes control to the Action_On_Button function declared on line [78]. Unlike theButtonCallback, this function is declared virtual, and it is executed on behalf of the current object (Section 4.11).

Lines [96, 112] declare helper functions. These constitute interfaces between the FLTK legacy code and our classes. Finally, on line [131], a third virtual function Create_GUI is declared. This is the primary function whose role is to join all of the necessary actions to display a running user interface on the computer's screen.

6.2.8.2 Definitions of Members of the CC_GUI Class and the Callback Mechanism

Listing 6.9 starts a series of definitions of the member functions of the CC_GUI class. In this case, we have a long list of FLTK header files, which we omitted here; in the CurCalc_GUI.cpp file, these appear instead of line [1]. Since we plan to use some of the SL containers, lines [4–6] provide useful using declarations. This is almost always a much better approach than simply opening everything for everybody by writing using namespace std (see our comment in Section 3.14.6). Then, on line [9], the CurCalc_GUI.h header is included. It contains the definition introduced in Listing 6.8. On line [10], the StringConverters.h header is included. It contains possible conversions between the std::string and std::wstring objects (Josuttis 2012). These are necessary to adapt FLTK (which operates with std::string ) for our XML_CurrencyExchanger component, which utilizes std::wstring and allows for internationalization.

Lines [13–15] define the class constructor, and line [20] is reserved for the class virtual destructor, for the reasons we have discussed.

Let's take a closer look at the theButtonCallback static function whose definition starts on line [24]. As already mentioned, this is a function that will be registered to the FLTK framework to be called in response to the event of a button click by the user. It accepts two parameters: widgetPtr, which will point at the widget object from FLTK; and obj, which conveys whatever parameter we pass when registering theButtonCallback to the FLTK callback mechanism . And here is a trick: for the future obj, we will pass a pointer this to the CC_GUI object. This way, a double dispatch mechanism will be achieved, as illustrated in Figure 6.10.

Such a communication mechanism based on passing a function through its pointer may be encountered in older legacy code such as FLTK. Other platforms rely on similar ideas of signals and/or message passing, but embedded in the object-oriented framework (Appendix A.4).

Programmatically, to perform the previously described action, the passed obj pointer needs to be cast to the CC_GUI pointer type. This is achieved on line [27] with the reinterpret_cast operator, described in Table 3.15 (GROUP 2). Then, the call to the virtual Action_On_Button occurs on line [30].

The Action_On_Button function, defined on line [35], performs the action associated with currency exchange computations. Since an exception can happen here, the code is encompassed within the try-catch statement (Section 3.13.2.5), starting on line [37]. On line [42], the value to be converted is read as a raw C string. Hence, it needs to be converted, first to std::string and then to double with the stod function, as on line [45]. The next piece of information for the currency exchange is the code of the destination currency. This is retrieved on line [49] by reading information stored along with the menu items of the fChoiceWidget. The Code_2_CurrencyKey function performs the correct conversion from a numerical code into std::wstring, which is necessary to call the interface of our CurrencyExchanger component.

Schematic illustration of the explanation of the callback mechanism in the FLTK framework. A callback function
theButtonCallback is registered that also provides a pointer to its CC_GUI object. The OS notifies FLTK about a mouse click, which FLTK translates into a button click; FLTK launches the registered theButtonCallback function, providing a pointer to the FLTK button widget and previously provided pointer to the CC_GUI object. This object is used to call the virtual function Action_On_Button, which
finally performs our defined action on a button click.

Figure 6.10 Explanation of the callback mechanism in the FLTK framework. A callback function theButtonCallback is registered that also provides a pointer to its CC_GUI object. The OS notifies FLTK about a mouse click, which FLTK translates into a button click; FLTK launches the registered theButtonCallback function, providing a pointer to the FLTK button widget and previously provided pointer to the CC_GUI object. This object is used to call the virtual function Action_On_Button, which finally performs our defined action on a button click.

With two input values in the from_val and to_code local variable s, on line [53] the main conversion function Convert is called from the fXML_CurrencyExchanger object. If it is successful, the results are written to the corresponding widgets on lines [56, 59], after which the main window is redrawn on line [61] to display the results.

If an error occurs somewhere in the computations, it is caught thanks to the catch statement on line [67], and the whole function returns false on line [69].

CurrencyKey_2_Code on line [83] and Code_2_CurrencyKey on line [99] are examples of helper interfacing functions that convert information between the formats specific to the FLTK and CurrencyExchange components. Such interfacing functions or interfacing objects, called adapters, are frequently encountered in real programming due to various platforms and data formats. Here, for instance, on line [85], std::wstring is converted to std::string. Such a conversion is not always possible since std::wstring can contain more information than std::string. But in this case, a compromise is to use only Latin letters, which can usually be converted. Then, the code letters in plain ASCII format are encoded on line [86] into one unsigned long value stored in the menu items of the FLTK menu widget.

Code_2_CurrencyKey performs the opposite action: on lines [102–105], consecutive bytes from the unsigned long value are converted to consecutive code letters of the arr array . These are converted on line [106] to std::string and finally to std::wstring using the to_wstring function.

The main role of the Create_GUI function, which begins on line [111], is to create and position various widgets and associate data with them. To simplify things, the positions and dimensions of the widgets are hardcoded in this function. We start with the dimensions of the main window, as set on lines [115, 116]. But in more advanced programs, it is a good idea to store such data in a different place, called application resources, which can be read and processed separately if necessary. As a result, the code is not cluttered with data, and the data values can be changed without affecting the code. However, the price is additional code to maintain such flexibility.

The main window object main_win of the Fl_Windows class is created on line [118]. On line [120], it is assigned to the pointer fMainWindow. As a result, the function Action_On_Button can redraw the window. This works fine, but we have to be sure the object that is pointed to, such as main_win, exists (i.e. is not destroyed) at least as long as the reference to it, such as fMainWindow, is in use. This is OK here since as long as the GUI is in action, control does not go out of the Create_GUI function. But in larger systems, and especially in parallel programming, maintaining proper object lifetimes is an important and complicated task.

Beginning on line [125], we encounter constants that define the spacing and dimensions of widgets, such as theEdit of type Fl_Float_Input, created and initialized on lines [136, 137]. This widget is responsible for entering and displaying the value of a currency to be exchanged. On line [139], a pointer is assigned to it to facilitate further access.

A little more work is necessary to create and properly initialize the combo widget, whose position is defined starting at line [143]. Once again, it is relative to the position of the theEdit widget. As a result, if we change the position of the previous widgets, the others will be adjusted automatically.

The task we face now is how to set up a combo widget in FLTK, represented by the Fl_Choice class, with the currency names stored in the CurrencyCalc component. If we succeed, then a user who clicks on the combo will see the names of the currencies they can choose from for exchange. We can access our XML_CurrencyExchanger object using the fXML_CurrencyExchanger reference. With this at hand, on line [148], the XCE::CurrencyMap object is accessed and referenced with the curMap object, which should not be empty.

At this point, two vectors are created: on line [152], menuItemVec to store FLTK's Fl_Menu_Item objects; and on line [155], menuItemTexts to store the std::string objects with the currency names. The problem is that the Fl_Menu_Item object stores only a C-style pointer to the text, whereas the text itself has to be stored somewhere else. Again, we face the problem of synchronizing object lifetimes . In other words, we have to ensure that a pointer stored by a menu item always points at a valid object. Another problem is that menu items expect ASCII text, whereas our CurrencyExchanged object holds Unicode characters (i.e. std::wstring ). Hence, on lines [157–159], we copy the currency names from curMap to the menuItemTexts vector, at the same time transforming std::wstring objects to std::string. Since menuItemTexts is empty at first, we have to use the back_inserter function on line [157]. On the other hand, text conversion is done with the lambda function (Section 3.14.6) defined on lines [158, 159].

Then, on lines [162–165], the Fl_Menu_Item objects are created, feeding their constructors with the pointer to the previously prepared currency name and currency code returned on line [165] by the CurrencyKey_2_Code function. Note that on line [162], to access the key and currency simultaneously, we use structured binding (Section 3.17). However, only key is used in the loop.

Finally, menuItemVec has to be delimited with a sentinel object to indicate its end. Such sentinels are a common technique to indicate the end of a series of objects. The simplest examples are the C-like strings ending with the value 0 to indicate an end-of-string (EOS), as discussed in Appendix A.2.2. Here, on line [169], a sentinel object (the one with all its members set to 0) is pushed back.

Starting at line [172], the theChoiceWidget object of type Fl_Choice is created and initialized with the carefully created menuItemVec object.

In the last part of the Create_GUI function, starting on line [180], the theStaticEdit object is created. Then, lines [195–202] create theButton, of Fl_Return_Button type. To set up the callback mechanism s on a button click, on line [205], theButton is connected to the theButtonCallback and this objects (Figure 6.10).

All the widgets have been set up, so we are almost done. Lines [209, 210, 214] let the FLTK framework render them. After their execution, the window appears.

6.2.8.3 Launching the GUI-Based Application

The last thing to do is to launch our application in the main function, as shown in the Listing 6.10.

Lines [1–5] have the necessary #include s. The definition of the main function starts on line [8]. CloseConsoleWindow is the first function, called on line [10]; its role is to hide the black terminal window. (Its code can be found in the source file.) Then, line [12] creates an alias fs to the filesystem namespace. This is used on lines [14, 15] to create a path to the initialization file, which is expected in the same directory as the executed application, conveyed by the current_path function.

On line [18], the curExchObj object of the XML_CurrExch_Optional type is returned. If the operation was successful, it contains a valid XML_CurrencyExchanger object. But if for some reason the curExchObj optional object is empty, such as if there is no initialization file or no Internet connection, as verified on line [19], the application exits with a short message issued by the fl_alert on line [20]. Note that to shorten the code, the return statement yields the error code -1, which is provided as the rightmost value by the comma ( , ) operator (see GROUP 16, Table 3.15).

Finally, on line [23], the CC_GUI object is created and linked with the XML_CurrencyExchanger object by dereferencing curExchObj. The GUI is launched on line [26]; the results are shown in Figure 6.11. The last call returns if the application is closed by a user.

Snapshot depicts the main window of the CurrencyCalc application rendered by the FLTK interface.

Figure 6.11 Main window of the CurrencyCalc application rendered by the FLTK interface.

In this section, we learned about the following:

  1. Creating simple GUI applications with the FLTK library
  2. Using the callback mechanism based on a simple function pointer registration
  3. Managing object lifetime and object inter-dependencies
  4. Writing interface functions to legacy code

6.3 System Clocks and Time Measurements

Time measurement is one of the most intrinsic features of every microprocessor system. Precise timing is necessary for all digital circuits, as well as for task scheduling by an OS. We also saw current time measurement in the DisplayCurrentTime function (Section 3.18.5). But the facilities available to access and measure time depend on the platform (i.e. the OS), as well as on the programming language and its libraries. C++ offers several time-related objects, mostly in the std::chrono namespace. The most frequently used members and their properties are listed in Table 6.4. More detailed information can be found in the literature (Stroustrup 2013) or on the Web (Cppreference.com 2019a).

Table 6.4 Basic time operations from the std::chrono namespace (in the headers <ctime> and <chrono> ).

Object Description Examples
std::chrono: : system_clock A wall clock time from the system-wide realtime clock. It may not be monotonic (time can be adjusted at any moment). The following shows a function to return a string with the current date and time:
1   // Returns a string with current date & time
2   string  GetCurrentTime( void )
3   {
4      using timer = std::chrono::system_clock;   // Short name for a timer
5
6      std::time_t time_point = timer::to_time_t( timer::now() );
7
8      return std::ctime( & time_point );        // Converts to string
9   }
On line [4], various types of clock can be chosen, as shown at left. The choice of a clock is application-driven: for a precise time duration, use high_resolution_clock . For a time display, system_clock can be used. The functions now and to_time_t are members of a chosen clock class. The first function records a current time point. The second function converts it to an older object std::time_t, which is easier to convert to a string via construction of the std::ctime object on line [8]. The result of calling GetCurrentTime can look like the following:
Wed Jan 2 23:45:46 2019
std::chrono: : steady_clock A monotonic clock (not adjustable). The time points of this clock cannot decrease as physical time progresses. Hence, this clock is suitable for measuring intervals.
std::chrono: : high_resolution_clock The clock with the shortest tick period available on a platform.a
Clock members:
image
std::time_t An arithmetic type to hold time as a number of seconds since 00:00, Jan 1 1970 UTC.
std::ctime Used to convert time expressed in epochs to calendar local time.
std::chrono::
duration
A class to represent time intervals. The function fun_perform_timer launches a function supplied as its parameter and measures its performance time. To pass the function, we use the template class std::function [1, 4]. In this version, the mechanism is set to process functions taking an int parameter and not returning a value. Hence, void( int ) is set as a template parameter of std::function (Section 3.14.9). But C++ is flexible enough to convey a function with any set of parameters, as we will show.
 1  using fun_void_int = std::function< void( int ) >;
 2
 3  // Launches fun( fun_param ) and measures its time
 4  void fun_perform_timer( fun_void_int fun, int fun_param )
 5  {
 6   using timer = std::chrono::high_resolution_clock;  // == steady_clock
 7
 8   // ----------------------------
 9   auto timer_start = timer::now();    // Catch start of time
10   
11   fun( fun_param );                   // run function here …
12
13   auto timer_end = timer::now();     // Catch the end of time
14     // ----------------------------
15
16   std::chrono::duration< double > comp_period = timer_end - timer_start;
17
18  cout << "Computation time = " << comp_period.count() << " [s]" << endl;
19
20   cout << "Computation time = " << std::chrono::duration_cast
     < std::chrono::seconds >( comp_period ).count() << " [is]" << endl;
21
22   cout << "Computation time = " << std::chrono::duration_cast
     < std::chrono::microseconds >( comp_period ).count()
     << " [us]" << endl;
23  }
std::chrono::
duration_cast
Converts a time duration to other units. On line [6], high_resolution_clock is selected. The start time point is caught on line [9], the test function is called on line [11], and then the end time is caught on line [13]. A time duration object is created on line [16]. This is displayed on lines [18, 20, 22] in different units.The previous time-measuring function is verified in the following context:
24   void arithm_fun( int p ) // A test function
25   {
26      for( auto a : range( p ) )
27          cout << a * a << " ";
28   }
29
30   void fun_perform_timer_test( void )
31   {
32       fun_perform_timer( arithm_fun, 23 ); // Call a measure with p = 23
33   }
After calling fun_perform_timer_test on a test computer, the last three lines of the output look like this:
Computation time = 0.00313335 [s]
Computation time = 0 [is]
Computation time = 3133 [us]

a In Visual Studio 2019 on Windows 10, this is the same as std::chrono ::steady_clock .

6.4 (image) Time Measurement for Function Execution

Measuring the time required for function execution looks simple – we need to mark start and end times and compute their difference. We will show how to write a wrapper function that encompasses these operations for any function we wish to measure. However, precisely measuring the time consumed by the execution of a few lines of code or a function is not as easy as it might look at first glance. First of all, a function does not run by itself; other OS processes are running in the background, to operate the mouse and connect to the Internet, for example. The background conditions may change from run to run, so in practice, we measure the overall time taken to execute our function and whatever runs with it.

In this section, we extend the fun_perform_timer function presented in Table 6.4. The main issue with this function is that it can measure the execution time of any function as long as the function takes an int as a parameter and returns void. But we'd like to measure the execution time of any function – a function that can take and return whatever arguments it wants. C++ comes with mechanisms that allow for passing an arbitrary number of arguments of arbitrary types.

In this section, we will learn about:

  1. Variadic arguments (bonds)
  2. Forwarding universal references with std::forward < decltype ( ) >

As we know, since C++17, a namespace can be created inside another namespace, such as in Listing 6.11, lines [1, 4]. In this nested namespace, a constant kIterations is defined on line [6] that will control the number of function calls during its measurement of execution time s. The definition of the measureFuncAvTiming lambda function for execution time measurement starts on line [22]. The kIteration constant is passed to the iter in the caption. Then two interesting arguments are passed:

  1. func – A forward (universal) reference conveying a function object to be executed iter number of times to measure performance time
  2. func_params – A variadic list of parameters that will be passed to the func. The variadic list of parameters to a function or template lets us pass any number of parameters

A compile-time invariant is checked on line [24]. It ensures that the number of iterations is larger than zero. However, as we already know, unlike the familiar assert fuse, which is checked at runtime, static_assert is verified during compilation. So, if the condition of static_assert evaluates to false, an error will be reported, and compilation will stop.

On line [28], the current timestamp is caught by the time_start object by calling the timer::now function. This and other time-related functions are summarized in Section 6.3.

The function undergoing the performance test is launched iter number of times, as orchestrated by the range for loop on line [31]. Its range is provided by the auxiliary range class, which will be presented in Section 6.5.

On lines [32, 34], func is executed with func_params as its arguments. However, many variants of func and func_params are possible; therefore, we use std::forward with a deduced type of func, obtained by a call to decltype < func >. A similar construction is used on line [34] to pass the variadic parameters. This is perfect forwarding, as explained in Section 4.5.1.

Briefly, remember that any function func can be passed with any number of arguments of any type. What we want is to pass each lvalue-ref argument as an lvalue-ref, and each rvalue-ref as an rvalue-ref. Perfect forwarding does exactly this (Meyers 2014). Otherwise, the rvalue parameters could be passed through lvalue-refs, what we want to avoid.

By taking the difference between the end and start timestamps, the measureFuncAvTiming function returns the duration in milliseconds of an average call to func. A value in milliseconds or other time measurement unit can be obtained by calling the std::chrono ::duration_cast operator (Section 6.3).

What remains is to write a simple function like the following

and, with the following code, measure its average execution time per single call:

On line [62], the function funTimer, which is the same as the CppBook ::LTimer::measureFuncAvTiming lambda function but with a shorter name, launches MathFun_RandTest with its parameters several times. MathFun_RandTest simply computes an average of the 1 234 567 random numbers drawn from the Bernoulli distribution. For this purpose, the Mersenne random engine is initialized on line [49] with the value returned by the functional operator called on behalf of the random_device temporary object (see Section 3.17.3). Finally, the last average value (should be close to p, right?) and the average single execution timing are printed on the screen, as follows:

As we can appreciate, the returned value 0.699677 is close to the 0.7 set on line [50].

6.5 Range Class

C++ 11 introduced a very useful new form of the for loop, the range for, as discussed in Section 3.13.2.2. In conjunction with the auto keyword (Section 3.7), it greatly facilitates writing loops, especially when accessing elements of the containers, such as std::vector or and std::list . It is now very easy to traverse each element from a given set of values, such as

But things get slightly more cumbersome if we wish to traverse a set of values generated by a more complicated formula. For example, starting from 0.0, we wish to generate all consecutive values not exceeding 100.0 and spread by 2.56. With the “standard” for loop, this is as simple as writing the following lines of code:

However, it would be much more convenient if we could harness the new form of the for loop and still be able to generate all types of series, like the previous one. For this purpose, we want an iterator with the following syntax

where from stands for a starting value, to is the end value (exclusive), and step is a value that increases for each step in the iteration. In addition, shortcuts can be introduced; for example, to step by 1, we can use

or

to generate all values in the set [ 0, to ) with a step of 1. Interestingly, SL does not provide such a helper (in some frameworks it is called a linspace). We can use only std::iota to fill an SL container with consecutive values incremented by 1 each time. To remedy this, the range class was developed. Before we go to its definition, let's see what it can do by examining the examples in Listing 6.12.

On line [1], we include the range class via its header file. On lines [6–11], a file is created; and if that happens successfully, some text is put into it. Then, a vector of float s is created and initialized with the consecutive values 0, …, 255 (inclusive). This is done by assigning to the vector the object range( 256.0f ). This works since range implements a conversion operator to std::vector, as we will discuss. Finally, on line [14], the entire vector is copied, i.e. streamed out to the created file represented by the ostr object. In this case, the output looks as follows ([…] inserted here to save space):

Let's move to the second example, shown on lines [17–21]:

The main difference is the range generated by calling range<>( 0, 100, 13 ). In this case, the values 0, 13, 26, …, 91 are copied to the v_size_t vector. A slight difference is using the range<> type instead of the pure range. Adding the empty type <> allows us to use range with its default type for iterations: size_t. The output for the second example is as follows:

In the next example, we use the simple range object assigned to the vvv object with the auto type. In the consecutive for loop, another auto variable a iterates through all values of vvv :

The example shows the operation of the range in two loops, iterating through the float and int values. The first loop traverses the values from 0 up to but not including 256.0, with a step of 16.5. The inner loop goes from the integer −2 up to +16 with a step of 3. In other words, at each step, the value of step is added to the current value of the internal state of the range iterator:

The shortened output of the fourth example is as follows:

Let's now examine the internals of the range class, shown in Listing 6.13. It is a template class, where a template parameter T defines the type of an iterator. Thus, if we wish to traverse integer values, we can substitute int for T ; for floating-point values, T can be float or double, as shown in the previous examples. But the default value is size_t, which denotes a type of nonnegative integer values with a range that can represent the size of any object in bytes (Stroustrup 2013).

range contains three private data members, defined on line [4]. They represent the constant start, end, and step values for the range. Then, on lines [8–30], two constructors are defined; due to the default 1 for the step parameter, we can create the three types of range objects, as already described. Note that for proper operation of the range class, the step parameter cannot be 0; otherwise, the iterations will be infinite. This, in addition to notes in the class comments, is checked in the build version of the software using the assert command. If it happens at runtime, a std::out_of_range exception is thrown.

The important part of the range class is its nested private class range_iter . It resembles the proxy pattern defined for the TLongNumberFor class in Section 4.15.5.1. But in this case, range_iter provides a suitable interface required for the iterators. When constructed on line [39], it accepts two values, which are then copied to its fVal and kSteps member data. fVal stores the current state (a “position”) of an iterator. kSteps, being constant, is used when an iterator is progressing to the next position. range_iter overloads a number of operators. The first two, defined on lines [40, 41], are simple conversions to T and its reference, which is a common type for the range and range_iter classes. The overloaded dereferencing operator, shown on line [42], lets us read values. On the other hand, the increment operator on line [43] lets us advance the iterator. The most intriguing is the comparison operator defined on lines [51–58]. Its operation is divided into two directions: negative and positive values of the kStep member. In either case, true is returned if and only if our fVal state is before than fVal in the object we compare to (e.g. the end iterator), in accordance with the direction of iterations. Also note that for iteration with floating-point values, we cannot use the simple equation condition.

range_iter objects are returned by two members of the range class: the begin (line [61]) and end (line [62]) functions, which are the standard way of expressing the beginning and end of any of the SL collections. begin sets the internal state of the iterator to the kFrom value, whereas end sets its state to the kEnd value of the range class.

Here we will make a small digression on iterators . These are ubiquitous programming components whose primary role is to traverse collections of data, whatever it may be. Basically, there are two types of iterators: internal and external . We have already seen and used SL iterators that let us traverse vectors, strings, and maps, for instance. These are external iterators whose primary benefit is that they can be nested over a collection. On the other hand, the range_iter inner class is an example of an internal iterator. Its role here is as expected – it serves as a return (proxy ) value to the interface composed of the begin and end members, which lets us use the range class, for instance, in the range for loop (Section 3.13.2.2).

The last overloaded operator is a conversion to vector< T >, defined in range on lines [67–73]. With the help of the two lambda functions it lets us convert the range class implicitly to std::vector with consecutive values as generated by the iteration process. Hence, range can also be used to initialize vectors with iterated values, as shown in the previous examples.

6.5.1 Functional Programming and the Ranges Library

C++ has is a more profound concept of a range, which differs from what we presented in the previous section. We talk in this section about ranges, i.e. the relatively new programming method aimed mostly at improving chained operations on collections using pipes . We have seen many SL operations in which the common interface is a pair of iterators indicating the beginning and end of the elements to be processed, i.e. the range: for instance, std::find, std::copy, and std::sort, presented in Table 3.6. So, what is wrong with passing the range as two separate values? The main obstacle is encountered when one function tries to pass another a range that is the result of an operation. In many contexts, it would be beneficial to have one structure containing all the information instead of two separate iterators passed as the result of one function as an input to the other function without creating local objects to store the intermediate results. For example, the following code

in the range framework could be simplified as follows:

Such a compact structure is also less error-prone: for example, it disallows erroneously passing iterators to different collections. Such functionality has been developed in the form of libraries. One of them is range-v3, developed by Eric Niebler (2019), which was recently accepted to become part of the new standard C++20. Together with the C++ concepts, it will allow us to write clearer, more compact code. Some exercises with these new techniques are included in the “Questions and Exercises” section of this chapter.

More information on functional programming, ranges, and other inspiring new programming methods in C++ can be found in the book by Ivan Čukić (2019).

6.6 Example Project – Parsing Expressions

In Section 3.19, we discussed C++ operators and expressions . We have used them extensively as building blocks of larger language constructs. But how does the compiler, which is itself a program, recognize whether a+b*c is a valid expression and, if so, how it should be interpreted? Certainly, compilers constitute the art of software design; but in this section, we will look at how to write a simple interpreter that can tell if an expression is correct and another interpreter that can evaluate the value of an expression. For this purpose, we will examine methods of formal language description called formal grammars . Then, we will see how to parse an expression in order to tell if it does or does not comply with the grammar rules. We will also learn how to build special data structures that represent expressions in the form of parsing (syntax) trees . Finally, we will learn how to traverse such a tree with a visitor pattern to draw a tree structure or to evaluate the value of an expression. The discussion will allow us to practice previously learned and new programming techniques that involve object-oriented features of modern C++. Summarizing, in this section, we will learn:

  1. What a formal grammar is, and how to describe arithmetic expressions
  2. How to get rid of left recursion in a grammar to facilitate its implementation with the interpreter design pattern
  3. What a parsing tree is and how it can be implemented with the composite design pattern
  4. How to extend the interpreter to build a parsing tree that reflects the structure of an expression
  5. How to traverse a tree with the visitor pattern to evaluate an expression (double-dispatch technique)
  6. How to build a stack data structure to store smart pointers
  7. Non-owning access to objects with pointers and owning access through the smart pointers
  8. Reverse Polish notation (RPN) to facilitate expression evaluation
  9. The prototype pattern to clone compound objects
  10. Runtime type identification (RTTI) in action with dynamic_cast

This plan includes many design patterns, of which the composite and visitor can be especially useful in applications not necessarily related to computer language analyzers. Let's start with a short introduction to the methods of defining grammatical rules that govern computer language constructs.

6.6.1 Defining Language Expressions with Formal Grammar Rules

The C++ operators were presented in Section 3.19. We observed that in addition to the function of an operator, what is important is its precedence as well as its associativity. These are arbitrarily defined by the standard to be unambiguous and to obey the rules indicated in Table 3.15. However, how can these and other language rules be described formally? For this purpose, formal grammars have been devised. These are necessary to obtain a common language definition that us to implement compilers that operate as expected by the users. The subjects of formal grammars and compiler design constitute the core of computer science; they are also very advanced topics treated in many papers and books (Aho et al. 2006). They are outside the scope of this book, but implementing a simple interpreter that copes with basic expressions will demonstrate many useful programming techniques.

A grammar is a collection of grammar productions described using terminal and nonterminal symbols.3 The following productions

let us formally describe any expression containing single digits and four binary operators, such as the following

as well as to tell whether an expression like this one fulfills the rules:

If so, then we say that the expression is compliant with the syntax. Otherwise, we have a syntax error . In the previous grammar productions, red symbols denote terminal symbols, i.e. those that are directly typed on a terminal (hence their name). Nonterminal symbols are denoted with black capital letters. They correspond to the formal, or abstract, concepts that form the grammar. Finally, the | symbol denotes a logical alternative between the productions. Hence, D -> 0 | 1 means either 0 or 1 fulfills the D production, and so on.

These grammar rules reflect operator precedence and associativity . The process of fitting grammar rules to an expression is called parsing. An example of parsing one of the previous expressions with the previous grammar rules is shown in Figure 6.12.

Parsing is preceded by a lexical analysis that splits an expression into meaningful language categories such as numbers and operators called tokens . There are special programs, such as lex and flex, that do the job. But the good news is that for most simple grammars, any regular expression engine, such as the C++ regex library discussed in Section 6.2.1, can be used if provided with proper pattern definitions.

We are ready to implement a program that performs the lexical analysis and parsing in accordance with the previous grammar rules. In the first attempt, we may consider implementing nonterminal symbols as functions. On the other hand, the terminal symbols can be directly matched from a string representing the input expression. Hence, the first rule from the expression group could be implemented as follows:

Schematic illustration of the parsing the expression 3+(4/5-6). An expression is syntax-compliant if the entire expression can be completely derived from the grammar rules. Otherwise there is a syntax error. The grammar reflects operator precedence and associativity. Parsing is preceded by lexical analysis, which splits an expression
into meaningful language categories such as numbers and operators.

Figure 6.12 Parsing the expression 3+(4/5-6). An expression is syntax-compliant if the entire expression can be completely derived from the grammar rules. Otherwise there is a syntax error . The grammar reflects operator precedence and associativity . Parsing is preceded by lexical analysis, which splits an expression into meaningful language categories such as numbers and operators.

Everything would be fine except that the previous function E would break almost immediately due to infinite recursive call to itself in the third line. This is an inherent problem arising from the structure of our simple grammar . The way around it is to transform the rules in order to eliminate the left recursion. The new version of the grammar, with no left recursion, looks as follows:

Grammar transformation techniques can be found in the books on formal grammars, such as the classical Compilers: Principles, Techniques, and Tools by Aho et al. (2006) and Modern Compiler Design by Grune et al. (2012). Nevertheless, it is easy to experimentally verify that the two sets of rules are equivalent: that is, that they describe the same types of expressions . However, eliminating the left recursion introduced two more productions, as well as the mystery e symbol. The latter stands for the epsilon production, which simply boils down to no action, as we will observe soon when analyzing the implementation.

6.6.2 Design of the Expression-Processing Framework

Equipped with the short introduction to grammars, we can draw a complete picture of the components building up our interpreter framework. This is shown in Figure 6.13.

Schematic illustration of the steps of expression interpretation and post-processing. An expression is lexically analyzed and put through the parser, which, based on the grammar rules, checks the syntax and builds the syntax tree. The tree can then be traversed by visitors such as the printing and evaluating visitors. In effect, we can
obtain a view of the tree on the screen, or simply its value.

Figure 6.13 Steps of expression interpretation and post-processing. An expression is lexically analyzed and put through the parser, which, based on the grammar rules, checks the syntax and builds the syntax tree. The tree can then be traversed by visitors such as the printing and evaluating visitors. In effect, we can obtain a view of the tree on the screen, or simply its value.

As mentioned earlier, the interpreter consists of the lexical analyzer and parser . Thanks to our simple grammar, these will be joined together under the roof of a simple interpreter design pattern. We will implement two of them: the base class will be responsible for checking the syntax of an expression, and the other class will do the same but building the syntax, or parsing, tree during the grammar checking. A syntax tree is a separate data structure that will be implemented using the composite design pattern. The structure of a syntax tree exactly reflects the hierarchy and associativity of all the subexpressions . Hence, a syntax tree can be used to analyze the expression further, evaluate its value, simplify the expression, print its structure, or even generate assembly code for its implementation. To achieve these tasks, we will refer to another software component – the visitor design pattern. Together with the composite, these two implement double dispatching .4 That is, there can be many different visitors that know how to traverse and what to do in different nodes of a tree. On the other hand, the nodes are aware of the visitor(s) and allow their operation. Such cooperation is very useful not only for syntax trees but for many programming tasks. Imagine, for instance, programming a set of tasks to be performed on a selected subtree of the filesystem, such as searching for a file with a given context, accumulating the total size of all files in the subtree, etc. These can also be implemented using the composite-visitor duet.

Note that remembering examples of design patterns can help with spotting their potential application in many other software components . This, in turn, greatly facilitates their implementation since design patterns provide ready solutions to, or at least approximations of, many common programming problems encountered in various contexts.

6.6.3 The First Expression Interpreter

The first component shown in Figure 6.13 is implemented in the TSimpleExpressionInterpreter class, shown in Listing 6.14. Its main actions consist of realizing the productions from the modified grammar with no left recursion, as presented in the previous section. The productions are implemented as the virtual member functions of the TSimpleExpressionInterpreter class, whose definition starts on line [3]. But the class has interesting features that we will discuss as well.

On lines [8, 9], the two private data members are defined. The first, fExpressionString, simply stores the input expression. The other, fCurrPos, contains the index of a character in fExpressionString, which is currently matched. Note that the type of fCurrPos is set to be compatible with the size_type expected by the std::string object.

On line [18], the default constructor is defined, whereas on line [19], we see the declaration of the virtual destructor to allow for class inheritance . Again, the = default is used to instruct the compiler to put its default implementation, which introduces the virtual table in this case.

As we know, the default implementations can also be ordered by the compiler for the copy constructor, as well as for the assignment operator . This is done by typing = default on lines [21] and [23].

Also, the moving versions of the copy constructor and an assignment operator can be defaulted, as follows.

These instruct the compiler to automatically apply move semantics on all data members, if possible. To do so, the noexcept keyword has to be inserted to indicate that the moving operations do not throw any exceptions.

The class TSimpleExpressionInterpreter contains the overloaded operator () function, defined on line [47]. Hence, TSimpleExpressionInterpreter is a functional object (Section 6.1). Its main action takes place on line [52] and consists of calling the Expr_Fun member function. Soon we will see that this is the function corresponding to the expression produced by the grammar . If it returns true, then the second condition is checked to verify if the entire input expression has been processed. The fulfillment of the two means the expression has passed the grammar rules. Otherwise, a syntax error occurred, and false is returned.

Before we analyze the grammar production functions, let's take a look at the Match function, defined on line [70]. This is probably the simplest possible lexical analyzer, which compares a parameter c with a character at the current position in the fExpressionString, as indicated by the fCurrPos member. If these two are the same, then Match returns true, as shown on line [78]. However, before that, fCurrPos is advanced to the next position in the input string. In all other cases, Match returns false.

Now we encounter a number of virtual member functions that directly implement the grammar rules with no left recursion. The previously mentioned main production is implemented by Expr_Fun, defined on line [89]. On line [91], it simply calls Term_Fun and Expr_Prime_Fun, exactly as defined in the E -> T E' production. A true value, meaning a syntactically correct expression, is returned only if the two components have been fulfilled.

On the other hand, the Expr_Prime_Fun function, defined on line [96], is responsible for checking the second grammar production: E' -> + T E' | - T E' | e. Since there are alternative rules, separated by the | operator, each branch is implemented as a separate branch of the if-else statements on lines [98, 104]. Considering only the first one, + T E', if + is matched on line [98], then the Term_Fun and Expr_Prime_Fun functions are called in order, as defined in the production. Note that there is also a recursive call to the Expr_Prime_Fun function. But this time we avoid an infinite calling loop, which was the main obstacle when trying to implement the original version of the grammar rules. If the two previous functions return true, then that production is fulfilled, and true is returned to the caller. The final empty production, denoted by e, is implemented on line [110] simply by returning true. The other productions are implemented exactly in the same manner, directly following the grammar rules.

Although the grammar allows any expression, the lexical analyzer has been simplified for clarity of discussion. Therefore, the first module to be extended is the lexical analyzer – this task is left as an exercise (number 15 in the “Questions and Exercises” section).

6.6.4 Building the Syntax Tree with the Composite Design Pattern

Passing an expression through the TSimpleExpressionInterpreter interpreter returns a binary answer indicating whether the expression does or does not fulfill the grammar rules. However, even more useful would be to generate a data structure whose inherent structure reflects the formation of the input expression. Such a structure is called a syntax tree, and examples are presented in Figure 6.14.

Schematic illustration of the syntax trees after parsing the expressions (a) 3+4*5 and (b)(1+2)*(3+(4+5))/(6-7). Each tree reflects the rules of operator precedence and associativity with no parentheses.

Figure 6.14 Syntax trees after parsing the expressions (a) 3+4*5 and (b) (1+2)*(3+(4+5))/(6-7). Each tree reflects the rules of operator precedence and associativity with no parentheses.

In the expression shown in Figure 6.14b, note the construction of the (3+(4+5)) subexpression. Since the + operator is left-associative, without the parentheses 3+4 is executed first, and then the result is added to 5. However, in our case we have this order set differently, as reflected by the tree stucture. The entire factor (3+(4+5)) is then taken to the left side to be multiplied by (1+2), due to the left associativity of the * operator.

Before we find an algorithm for building syntax trees from expressions, let's think of the building blocks for their construction. We have two different categories of nodes: leaf nodes storing numeric values (for simplicity, we assume only digits); and binary nodes, which have two links to the left and right subtrees.

6.6.4.1 The Composite Design Pattern to Define the Nodes of a Tree

Figure 6.15 presents a UML class diagram with the hierarchy of node classes that will be used to build our syntax trees.

This class relation is known as the composite design pattern (Gamma et al. 1994). We discussed the composition relation in Section 4.9. Obviously there are common points between the two concepts, but the main difference is that the simple composition relation does not assume any “familial” relation between classes, whereas in the composite design pattern the most characteristic feature is that the composition happens between classes from the same hierarchy. Concretely, BinOperator is derived from the TNode class; hence it is TNode -like. On the other hand, it also contains two TNode class exemplars, fLeftChild and fRightChild. These, in turn, can point at any possible object from the TNode hierarchy, whether a ValueLeafNode or even another BinOperator – OK, maybe not directly BinOperator, since it is a pure virtual class but one of its derived class es, such as PlusOperator. Such constructions are common in computer science. An example is the filesystem discussed in Section 6.2.7. In this case, a node can be either a file or a directory (a folder). A directory contains other nodes: files and/or directories, and so on. Another example is a command framework where simple commands coexist with macro commands. Macros, being commands by themselves, consist of a series of prerecorded commands to be executed, and so on.

Schematic illustration of the UML class diagram of the composite design pattern containing building blocks for the parsing tree. The composite structure arises if the two relations can be observed in a hierarchy. The first
is inheritance, outlined with a green circle. The second is the composition between objects in the same hierarchy, outlined by a red circle. The composite structure is common and can be used to express relations
between objects in a file-directory tree or between simple and macro commands in a command framework,
for instance.

Figure 6.15 UML class diagram of the composite design pattern containing building blocks for the parsing tree . The composite structure arises if the two relations can be observed in a hierarchy. The first is inheritance, outlined with a green circle. The second is the composition between objects in the same hierarchy, outlined by a red circle. The composite structure is common and can be used to express relations between objects in a file-directory tree or between simple and macro commands in a command framework, for instance.

6.6.4.2 Implementation of the TNode Hierarchy and Cooperation with Visitors

Before we dive into the code, we will explain a special relation between all the nodes and the visitor design pattern that will be used to traverse the tree. The two are interrelated: nodes call visitors, and visitors call functions from the nodes. Thus the two groups know about each other. But we cannot simply add their header files to each other – one has to be first. To overcome this problem, something needs to be defined first. In C++, this can be solved by properly arranging the header files and by using only prior declarations of the interrelated classes, with their full definitions provided elsewhere. In the Nodes.h header shown in Listing 6.15, we see such declarations at the beginning on lines [9, 13]. These are the definitions of TVisitor and TNode. The first will be used later. But why do we need a declaration of TNode if the definition follows in a few lines? This is necessary to define two important and different pointer types. The first, defined on line [17], is Node_UP, which is an alias to the smart pointer std::unique_ptr that defines the owning relation . That is, the Node_UP type will be used to hold pointers to TNode -like objects, which they are responsible for deleting at the proper time, as discussed in Section 5.3. On the other hand, on line [20], a simple NodePtr non-owning pointer to TNode is defined. This type is used whenever access to a TNode object is required to perform an action, access its data, and so on, but not to delete it.

The complete definition of the TNode base class starts on line [28]. This is a pure virtual class since it contains pure virtual member functions that define an abstract interface for the entire hierarchy, as shown in Figure 6.15. Hence, it is not possible to create a TNode object. This fact is reflected by the class constructor placed in the protected section of the class on line [36]. By using = default, we instruct the compiler to provide its default implementation. This makes our intentions clearer than simply providing an empty definition like {}. As we know, when designing a class hierarchy, it is important to define the base class destructor as virtual (Section 4.11). Here it is declared as pure virtual on line [44] by adding the = 0 specifier. In this case, though, it is redundant, since there are other pure virtual functions . Nevertheless, declaring a class that has no functions as pure virtual is a useful technique. Basically, for pure virtual functions, we do not even need to provide definitions (body). But some of them will be called anyway. This is the case for the base class destructor ~TNode ; therefore, it needs a complete definition as well. Although it might seem that on line [44] we could easily add the empty {} after = 0, it is forbidden by the standard (though accepted by some compilers). Therefore, this empty definition goes in the Node.cpp source file. Fortunately, the linker reports whether a pure virtual function needs a full body – i.e. if a body is required but cannot be found.

In the TNode class and its hierarchy are two distinctive pure virtual functions whose roles we need to explain, as follows:

  1. The Accept pure virtual function, declared on line [53], provides a common interface for the broad group of visitors. These are objects that will let us traverse a tree by visiting each of its nodes in a certain order. Accept is called, providing one of the visitor objects as its parameter. This, in turn, is passed alongside the tree to perform its job, whatever it is. The good news is that due to dynamic polymorphism, one Accept per TNode class is enough to accept the entire hierarchy of various visitors, even those that will be devised in the future. We will return to this issue when discussing the visitor hierarchy
  2. The purpose of the second pure virtual function Clone, declared on line [59], is to define a mechanism for creating a complete duplicate of any object from the TNode hierarchy based on a simple pointer to its prototype . Such behavior can be achieved by providing Clone in each class derived from the TNode. Due to its behavior, this mechanism is called a prototype design pattern (Gamma et al. 1994)

We will see these two in all classes derived from TNode.

6.6.4.3 Implementation of the ValueLeafNode Class

Now let's visit another member of the TNode hierarchy. This is the template ValueLeafNode class, whose definition starts in Listing 6.16, lines [64, 65]. It is responsible for storing the value fVal of type V, as shown on line [70]. Objects of this class represent leaves, such as those shown in Figure 6.14. Being a template class makes it easy to define leaves that store non-numeric values. This can be an abstract name of a variable whose value is stored in a look-up table, for instance.

On lines [74–82], we see a series of definitions for the constructors and assignment operator s of the ValueLeafNode class. Since we do not know what type will be used for V in the future, the move constructor and assignment operator are also added with the default implementation. This means the compiler will try to use move operations for the fVal object, if the V class has move operations and they are not throwing exceptions. For this purpose, our implementations of the move semantics are declared noexcept . ValueLeafNode is not a pure virtual class, so in the source Visitors.cpp file are full implementations of the Accept and Clone virtual functions declared on lines [90] and [95].

Finally, the NumberLeafNode alias is defined as the ValueLeafNode class instantiated with the double type. We can extend this idea when implementing the tree with leaves holding other types.

6.6.4.4 Implementation of the BinOperator Class

The BinOperator class defined in Listing 6.17, line [103], constitutes a base class for all the binary operators such as PlusOperator. The most characteristic part of BinOperator is that it owns two branches representing subexpressions of the syntax tree. These are held in the owning pointers fLeftChild and fRightChild, defined on lines [108, 109]. As alluded to previously, owning the two subexpressions, each the TNode type, makes the entire class setup a composite design pattern. But such ownership makes its copy operations much more difficult, as we will discuss shortly. Fortunately, due to the automation of the smart pointers, we do not need to worry about object disposal. The parametric constructor defined on line [114] takes two smart pointers, which are then adopted by moving their held objects to the fLeftChild and fRightChild members. Moving the objects between the unique_ptrs is a standard technique, as discussed in Section 5.3.

However, having the two objects representing subexpressions makes the copying operations nontrivial. This problem was discussed in Section 4.6.3. If we are to copy such an object, how do we copy objects held by the fLeftChild and fRightChild members? Probably the best way is to make a complete image, which doing so involves creating a series of objects. We will return to this problem when discussing the Clone member. To disallow a shallow copy, the copy and move constructor s, as well as the assignment operator s, are declared = delete, as shown on lines [124–128]. Such a declaration in a member function disallows its usage.

Lines [138, 139] define the AdoptLeftChild and AdoptRightChild functions. The role of each is similar to the ubiquitous class setters, with an important difference: they take over the objects provided by the smart pointers passed as their arguments. To reflect this fact, the names of these functions start with the Adopt prefix. On the other hand, names of functions giving up their held objects usually start with the Orphan prefix. Although not required, such naming conventions systematically increase code readability.

On the other hand, GetLeftChild and GetRightChild, defined on lines [132–134], are simple constant getters that return the “ordinary” non-owning pointer NodePtr.

Finally, notice that similar to the base class TNode, BinOperator defines two pure virtual functions: Accept on line [143] and Clone on line [150]. Interestingly, we do not need to provide their definitions since they will be ensured by the derived class es.

6.6.4.5 Implementation of the PlusOperator Class

All classes representing binary operations are derived from BinOperator, as shown in Figure 6.15. As an example, PlusOperator is shown in Listing 6.18; its definition starts on line [154]. Lines [160, 161] define its parametric constructor . All it has to do is call its base class, passing the provided arguments left and right.

Because PlusOperator is a “real” class, it also needs to provide full definitions of the Accept [170] and Clone [175] virtual functions.

The implementation of the Accept member is very simple and the same for all objects of the TNode hierarchy. It is shown in Listing 6.19 for PlusOperator.

Accept reverts the call to a given visitor object, providing a pointer this to this node object as its parameter. Such call reverting is enough for visitor v since, based on the type of this, an appropriate overloaded Visit function will be invoked due to polymorphism . This is the double-dispatching mechanism; we will return to it when describing visitor objects in Section 6.6.7. See also the class diagram in Figure 6.17.

On the other hand, the role and implementation of the Clone member function are covered in the next section.

6.6.4.6 Deep Copying Node Objects – The Prototyping Mechanism

Whereas copying objects containing data – such as NumberLeafNode – is not a problem, and we can even leave code generation for this action to the compiler by specifying = default, there is an inherent problem with copying objects containing pointers to other objects, such as PlusOperator. Simply copying pointers is usually undesirable since we end up with two different objects with references to the same data (Section 4.6.3). Therefore, all binary operations implemented with classes derived from BinOperator are disallowed any copying and moving operations by defining their copy and move members as = delete. So, in the following code snippet, only the first line compiles:

Imagine that we need to duplicate a tree, composed of a number of nodes of different types, having only a pointer to that tree's root node. Such a task might be needed to launch two independent tree annotators, for instance. How do we traverse such a tree, and create a copy of each node and each child node? Even worse, if we have a pointer to a child, we store only a pointer to the base class TNode, so how can we know which node exactly to re-create? This can be done using a special visitor that knows everything about the node hierarchy. But there is a much simpler and more general solution to this ubiquitous problem: the prototype pattern, in the form of special virtual Clone members implemented in each class of the TNode hierarchy. The best way to understand its operation is to analyze the simple code in Listing 6.20.

Each Clone member,5 such as the NumberLeafNode::Clone whose definition starts on line [1], dynamically creates its clone object, as on line [3], providing itself as a creation pattern in the form of the * this argument to the copy constructor . Then a smart pointer to this newly created object is returned, so we do not need to worry about the lifetime of that object. That's it. The only thing to observe is that Node_UP was defined to store a pointer to the base class TNode. Again, this is an instructive example of C++ polymorphism and the Liskov rule, which states that any derived object can be substituted in place of the base. However, when we take such a pointer, whatever it is, and call its virtual Clone member (being virtual is a key point here), the Clone will know precisely what object it needs to create – this is the mechanism behind the prototype pattern.

A slightly more advanced action is performed by Clone members in the classes representing binary operators, such as PlusOperator::Clone, whose the definition starts on line [5]. It needs to create another PlusOperator by calling std::make_unique on line [7], which invokes the PlusOperator parametric constructor . But its two parameters also need to be new subtrees, since we re-create the entire tree and all of its nodes are brand new. Again, we are lucky to have the Clone operation defined for every node – to re-create a subtree, we simply take the child node by calling GetLeftChild and GetRightChild on line [8], and calling their Clone members.

This way, the whole cloning operation will recursively pass down the tree, and the new PlusOperator node will be created with two branches with new subtrees.

6.6.5 Interpreter to Build a Syntax Tree

We already defined a simple interpreter, TSimpleExpressionInterpreter, that follows the grammar rules of simple expressions and can tell whether an expression can be derived from these rules (Listing 6.14). We have also defined building blocks in the form of syntax tree nodes, as shown in Figure 6.15. Now we will join the two, so that when parsing the expression and matching a grammar production after a grammar production, the interpreter can also generate the syntax tree that reflects the internal structure of the expression, as shown in Figure 6.14. Thanks to the object-oriented features of C++, we can achieve this task by deriving a new interpreter class from the base class TSimpleExpressionInterpreter and tweaking some of the functions.

The new ExpressionTreeBuilderInterpreter class is publicly derived from TSimpleExpressionInterpreter, as shown in Listing 6.21, line [19]. Hence, in addition to a number of important #include s, we need to include the header with the base class, as on line [13].

The constructor and virtual destructor on lines [24, 25] are declared as defaults. Since the class will store a unique_ptr to the resulting syntax tree, copying and moving are forbidden by defining the appropriate members as deleted on lines [29–36].

Instantiations of the ExpressionTreeBuilderInterpreter class are functional objects due to the presence of the overloaded functional operator (), whose definition starts on line [54]. Its action on line [56] is to call the base functional operator and, if successful, copy an object from the top of fNodeStack to the fRoot member, as done on line [57]. These are defined on lines [70, 72], preceded by the type alias es on lines [64, 66]. The role and implementation of the node stack will be explained in the next section.

As mentioned earlier, in the derived interpreter class, we will reuse some member functions representing the grammar productions. More specifically, we can reuse those that call only other nonterminal actions, whereas those that involve terminal symbols need to be modified slightly.

Let's start with Digit_Fun, whose role is to try to match each of the digits from 0 up to 9. In the base class, such matching was enough; and if it was successful, true was returned. In the derived overridden version of Digit_Fun, whose definition starts on line [78], the first matching task is identical to that in the base class. However, after a successful match on line [80] and before returning true on line [83], a new NumberLeafNode is created on line [82], which stores the numerical value corresponding to the matched digit symbol. Moreover, the entire node object is pushed onto the node stack fNodeStack, where it waits for further processing.

These actions are repeated for each digit from 0–9, so we skip the implementations here.

Let's now examine the following Expr_Prime_Fun function – whose definition starts on line [97] – and its first production, E' -> + T E'. After matching the + symbol on line [99], the term branch is evaluated by calling Term_Fun on line [101]. So far, we are performing the same action as in the base class. But after matching + and returning from Term_Fun, it is possible to join the nodes that were previously created and saved on the stack nodes into a subtree bound by the PlusOperator node. This is done by calling the auxiliary CreateSubTree function on line [103] with a new, and still not connected, PlusOperator as its argument . The CreateSubTree function will be explained shortly. Then Expr_Prime_Fun is recursively called on line [104] to verify the E' -> + T E' rule.

Nearly identical calling schemes are performed for all the other operators. In each case, CreateSubTree is called, with only one difference – the type of the created node object, which reflects the operator just matched by the Match function.

Figure 6.16 illustrates the operation of the CreateSubTree function when processing the 3+4*5 expression. The NumberLeafNode objects with values 3, 4, and 5 are directly pushed onto the stack in the order of their arrival from the parser (Section 4.8.4). However, when a new binary operator node is created, its two child subexpressions are popped off the top of fNodeStack. These are then connected to that operator as its left and right subtrees. Finally, the operator itself is pushed back on top of the stack, and the operations proceed until the entire expression has been parsed or a syntax error has been encountered.

Schematic illustration of the stack operations of the CreateSubTree function when processing the expression 3+4*5. When a new binary operator node is created, its two child subexpressions are popped off the top of the
stack. These are then connected to that operator as its left and right subtrees, and finally the operator is
pushed back on top of the stack. On the other hand, the NumberLeafNode objects are directly pushed
onto the stack in the order of their parsing.

Figure 6.16 Stack operations of the CreateSubTree function when processing the expression 3+4*5. When a new binary operator node is created, its two child subexpressions are popped off the top of the stack . These are then connected to that operator as its left and right subtrees, and finally the operator is pushed back on top of the stack. On the other hand, the NumberLeafNode objects are directly pushed onto the stack in the order of their parsing.

Implementation of CreateSubTree, whose definition starts on line [176], simply follows the previously described algorithm. Its parameter bin_op is a newly created binary operator with empty left and right subexpressions . First the precondition is checked to assert that the stack has at least two nodes that can be processed. Its two topmost nodes need to be removed from the stack, as done on lines [185, 186]: these constitute the right and left subexpressions. In this order, they are connected to the passed bin_op by calling its AdoptLeftChild and AdoptRightChild members on lines [188, 189].

Note that the subtrees are passed from the left and right smart pointers using the std::move helper. After that, bin_op owns the passed objects, and left and right are empty. On line [191], bin_op is pushed back on top of the stack, where it waits for another such action. Finally, on line [192], the postcondition is verified, which asserts that bin_op has been passed onto the stack.

The BUILD_THE_TREE preprocessor flag, if set to 0 on line [178], turns off the code on lines [179–190]. In this case, all the nodes are pushed onto fNodeStack, and no tree is built. We will use this mode of operation when discussing stack operations in Section 6.6.9.

6.6.6 Stack for Smart Pointers

In this section, we will specifically focus on the following issues:

  1. Template specializations with a specialized stack class to store std::unique_ptr
  2. Proper use of ownership and move semantics
  3. Dependent and independent names in template definitions

We discussed the template TStackFor class in Section 4.8.4. Its main purpose is to provide a stack interface (LIFO functionality) with an internal vector for data support. But TStackFor, in its general form as shown in Listing 4.17, assumes that objects pushed onto or popped from the stack are copied. However, when manipulating objects with strong owning semantics, such as the Node_UP or any other object held by std::unique_ptr, such a strategy cannot work since the objects need to be moved instead. Hence, we need a slightly more specialized version of TStackFor. As we pointed out in Section 4.8.2, C++ is ready for such challenges with the template specialization mechanism, which allows us to write a special version of a template class for a special occasion: that is, for a specific instantiating type. Such a version of TStackFor, specialized to cope with std::unique_ptr objects, is presented in Listing 6.22. The class definition starts on lines [2, 3] with TStackFor< std::unique_ptr< T >, MaxElems >, which tells the compiler that this a is specialized version if it happens to instantiate a TStackFor for any type T being held within std::unique_ptr. We can freely specify type T, as long as it can be held by std::unique_ptr, as well as the second numeric parameter MaxElems, which stands for the maximum allowable size of the stack.

Lines [7, 9, 12] contain a series of type alias es. These provide shortcuts for the types used in this specialization. In the third case, an additional typename was necessary. Such a helper is needed to assure the compiler that the construction refers to a type, and not to a variable in a namespace, for instance. Let's briefly describe what is going on. In C++ templates, there are two types of names:

  1. Dependent name – A name that in some way depends on a template parameter
  2. Non-dependent name – A name that does not depend on a template parameter

The problem is with the first group, i.e. dependent names, since being dependent on a particular type of template parameter means they can represent totally different language categories. For example, the template dependent name T::my_type can denote a type, whereas T::my_static_data can denote a name of a static object. However, in the two-phase name-lookup process, the C++ compiler needs to figure out precisely what group a particular name belongs to. In the first pass, before a template is provided, the compiler selects non-dependent names. It looks for dependent names when an instantiation happens with a particular type. Hence we can use the typename keyword in front of a name that is dependent on a template parameter to tell the compiler that this is the name of a type. If typename is not placed in front of a dependent name, the compiler treats it as a non-type name and sifts it out in the first pass. There are many additional rules involved in template parameter matching, particularly concerning the dependent and non-dependent types (https://stackoverflow.com/q/610245; Stroustrup 2013; Cppreference.com 2019b; Vandevoorde et al. 2017).

Returning to the rest of the code, on line [16], the main underlying data container is defined. Recall that this is a std::vector storing std::unique_ptr data. Hence, TStackFor can be seen as a kind of adapter that superimposes a special stack-like interface over a flat vector structure. As expected, there are two member functions. The definition of the Push operation starts on line [42]. Note that its parameter new_elem is passed by value. Because this is a unique_ptr, its held object is taken over and – after checking on line [44] to be sure there is enough space – placed on top of the stack. For this purpose, emplace:back is used on line [47], rather than the simple push_back. The former has the advantage of directly using the move operation and hence avoiding unnecessary object copying. Note that after Push returns, the passed new_elem smart pointer is empty (Section 5.3.1.1).

The Pop function is defined on line [66]. In our realization, it is perfectly OK to call Pop from the empty stack . Such conditions are checked on line [68], and if the stack is empty, then false is returned. Otherwise, an object is first read out on line [71] from the last element in the underlying vector using the fData.pop_back method. The returned object is a unique_ptr holding an object of type T. Therefore, in the same line, it is first released and reset to the ret_elem. Then, on line [72], an empty now smart pointer is removed from the underlying fData vector ( pop_back only reads the last element, leaving it untouched).

Finally, on lines [78, 79], a useful template alias is created that defines a group of stacks to hold std::unique_ptr s of any type T with the maximum number of elements set to 1000.

Such an alias with partial specialization is possible only with the using directive, since the older typedef can be used only to specify complete types. Therefore, in new code, the former is a preferred.

Let's also mention that SL offers the std::stack class. But our needs regarding the interface are slightly different than that provided in std::stack, so we used the previous short implementation.

6.6.7 Traversing Trees with the Visitor Design Pattern

Having now created syntax trees, we can perform actions on them. For example, we can traverse a tree and compute its value. We can also try to simplify an expression by detecting common factors, or we can generate code that will perform whatever semantics are expressed by a tree. Such actions can be coded directly into the classes defining the tree. However, if we had many different actions, this would clutter the code. In such cases, we can use the visitor design pattern.

As shown in Figure 6.13, a tree is composed of N different nodes and there are V different visitors. Each visitor has to traverse the tree, accessing each of its nodes in a certain order. Hence we must implement a rectangular structure of N × V different actions. The problem now is figuring out which function to call. This is known as the double-dispatching problem. As we will see, the correct design of the two hierarchies underpinned with the polymorphic mechanism helps us keep order and easily add new visitors and nodes to this mechanism if necessary. The UML class diagram in Figure 6.17 shows the two class hierarchies and their interaction realized by the TNode::Accept and TVisitor::Visit complementary functions. As we will see, they solve the double-dispatching puzzle.

Before we go to the code analysis, we will explain a slight problem related to file organization: because the definition of TNode depends on TVisitor and the definition of TVisitor depends on TNode, there is a cross-dependency between these two. In C++, this can be resolved using the forward declarations of classes and by properly organizing the header and source files. In our projects, these are organized as shown in Figure 6.18.

Let's consider the organization of the Nodes.h and Nodes.cpp files. Nodes.h can contain only the forward declaration of TVisitor, which lets us use references and pointers to TVisitor. But this is not sufficient for the compiler to call a member function or access member data from TVisitor. Hence the definitions in Nodes.h of TNode and its related classes can use only references and/or pointers to TVisitor – nothing more. Because of this, all actions requiring full access to the definition of TVisitor need to be postponed to the Nodes.cpp source file. A similar situation with respect to the definition of TNode happens in the Visitors.h and Visitors.cpp files. In other words, Nodes.h does not include Visitors.h, and Visitors.h does not include Nodes.h. However, both source files, i.e. Nodes.cpp and Visitors.cpp, can freely include both headers, Hodes.h and Visitors.h.

Schematic illustration of the UML class diagram of the TNode and TVisitor class hierarchies and relations between them. The
TNode composite is a structural pattern. Objects of the TNode hierarchy can be used to build binary trees representing
expressions. TVisitor is a behavioral pattern. Concrete visitors are responsible for traversing a tree and performing particular actions. Each TNode is equipped with the Accept member function, which can accept any representative of the TVisitor hierarchy. Complementarily, any TVisitor has a series of Visit member functions, which are
overloaded for each concrete node. These build the double-dispatching calling relation.

Figure 6.17 UML class diagram of the TNode and TVisitor class hierarchies and relations between them. The TNode composite is a structural pattern. Objects of the TNode hierarchy can be used to build binary trees representing expressions . TVisitor is a behavioral pattern. Concrete visitors are responsible for traversing a tree and performing particular actions. Each TNode is equipped with the Accept member function, which can accept any representative of the TVisitor hierarchy. Complementarily, any TVisitor has a series of Visit member functions, which are overloaded for each concrete node. These build the double-dispatching calling relation.

Schematic illustration of the UML artifact diagram showing relations of the header and source files with nodes and
visitors. A double-dispatch relation requires strict separation of definitions from implementations since
both groups of definitions depend on each other. Hence, header files contain only forward declarations of
other classes. Sources include both headers.

Figure 6.18 UML artifact diagram showing relations of the header and source files with nodes and visitors. A double-dispatch relation requires strict separation of definitions from implementations since both groups of definitions depend on each other. Hence, header files contain only forward declarations of other classes. Sources include both headers.

There are many resemblances between the two class hierarchies. The base class TVisitor is defined starting from line [16] in Listing 6.23. Exactly as for the TNode classes, the definition of TVisitor is preceded on lines [4–12] by a number of forward declarations of the classes whose names need to be known in this context.

On lines [21, 22] are default definitions of the constructor and virtual destructor, which are necessary in the case of class hierarchies. For clarity, we omit other default constructor s here.

On lines [27–31], we encounter a number of declarations of the pure virtual Visit functions, each reflecting access to a different node object that can be created. There is no Visit for BinOperator or TNode since we will never directly visit those nodes.

6.6.7.1 The Expression-Evaluating Visitor

EvalVisitor, defined on line [35] in Listing 6.24, is a visitor that aims to evaluate the value of the entire expression represented by a syntax tree. Since it is a very simple class, we omitted the definitions of the constructor and destructor . This is acceptable since the virtual destructor defined in the base class is sufficient. There is one data member fValue, defined on line [39], which upon successful traversal of a tree will contain the value of the entire expression.

Lines [66–70] declare the entire series of overridden Visit functions. They strictly follow the corresponding definition from the base class, which is underpinned with the override keyword. Also note that at the same time, these Visit s are overloaded for each potential node.

Listing 6.25 contains an implementation of the most representative members of EvalVisitor : the Visit methods. The definition of the first Visit for the NumberLeafNode object starts on line [76]. In this case, the only action is to copy the value from the NumberLeafNode object to the visitor 's inner data member fValue, on line [78].

On the other hand, an overloaded version of Visit, this time for PlusOperator, is defined on line [81]. For this and other binary operators, first the LeftRightValues member function is called on line [83]. It returns a tuple with two values that correspond to the values of the left subtree and right subtree. All that remains is to add these values on line [84] and assign them to fValue, which always reflects the last value of the subexpression.

A slightly different implementation can be observed in the Visit for DivOperator, defined on line [86]. In this case, evaluating division can throw an exception on division by zero, as encoded on lines [90, 91]. In practice, we should react not only on zero but also on division by a value that is too small (Section 3.16). In EvalVisitor, this it is controlled by the kDivThresh constant member.

Finally, let's analyze the LeftRightValues function defined on line [103]. As alluded to previously, it is called by each of the binary operators to assess the values of its left and right subtrees. This is accomplished by a call on line [105] to evaluate the left subtree, after which on line [106] the current content of fValue is stored in the local variable left_val.

A similar action is repeated on line [108, 109] for the right subtree. Then the values of these two subtrees are returned on line [111] as a tuple.

There are many possible scenarios for implementing the interaction between TNode::Accept and TVisitor::Visit. But in this project we decided that as much as possible, actions would be implemented in the group of visitor objects, relieving the nodes of that burden. This adds to the code's clarity and makes it painless to add new visitors.

6.6.7.2 The Expression-Printing Visitor

The second visitor, PrintVisitor, is defined starting on line [114]. Its role is to print a syntax tree in a terminal window. Each level of the tree should be properly indented to reflect the tree's structure. Like other visitors, on lines [138–142], this one declares the entire suite of overridden and overloaded Visit members for each different type of node. All of them call the auxiliary LeftRightProcess function declared on line [133].

Listing 6.26 shows the two representative functions of PrintVisitor. The first Visit for the NumberLeafNode objects, on lines [3, 4], simply prints its value after properly indenting the output as controlled by the fDepthLevel data member. The version of Visit for the PlusOperator objects, shown on line [7], calls the LeftRightProcess function on line [9] with the '+' character as its second parameter. Exactly the same action, but with a different second argument, is performed in the case of the remaining binary operators.

When traversing a tree, the LeftRightProcess function, whose definition starts on line [12], is called repeatedly from different nodes on different levels of the tree. Each level is expressed by different indentation value. Therefore, at each call to LeftRightProcess, the indentation value is advanced by a constant value, as on line [16]. But before this happens, on line [14], the current indentation value is preserved. It is restored on line [27] at the end of this function. As a result, the deeper levels' indentation is greater by the fDL_Step value.

On line [18], the Accept from the right node is invoked. On line [20–23], the symbol of the current operation is called with a number of formatting characters. Finally, on line [25], the Accept on the left child is called, and the last indentation value is restored.

6.6.8 Testing the Interpreters

In this project, we have designed and implemented classes that together form the simple expression interpreter framework shown in Figure 6.13. Basically, we have two modes of operation, each related to the functionality of the two interpreters: TSimpleExpressionInterpreter, which verifies the syntax of an expression; and ExpressionTreeBuilderInterpreter, which, when parsing, builds the syntax tree that reflects the structure of the processed expression. In this section, we present two test functions to verify the performance of these two objects.

In SimpleExpression_Test, shown in Listing 6.27, the definition starts on line [14] and the theInterpreter object of the TSimpleExpressionInterpreter class is created on line [17]. It is then used in the loop on lines [28, 29] to test a number of expressions contained in the vector of strings, as defined and initialized on lines [19–25].

The only answer we can get from this interpreter is whether an expression does or does not comply with the grammar, as shown in the following output message:

The SyntaxTree_Test function in Listing 6.28, line [1], operates with the ExpressionTreeBuilderInterpreter class from Listing 6.21. The exprParser object of this class is created on line [4]. The expression to be processed is defined on line [6].

Line [8] invokes operator () with good_expr as its argument. This does the parsing, as described in Section 6.6.5. If false is returned, then a syntax error was encountered. In this case, on lines [10, 11], the correct message is output, indicating a possible place where the interpreter got stuck, and the procedure ends.

On the other hand, if the expression is parsed with success, then on line [18], theRoot pointer is attached to the root of the syntax tree. We do not know what the type of this node is, but this does not stop us from calling the Accept virtual function on line [23] with the PrintVisitor object as its parameter. This is polymorphism again. As a result, the structure of the syntax tree is printed in the terminal window, as we will show.

Although the currently processed tree could be used again, on line [27] its exact copy is created by calling the Clone method on the node pointed to by theRoot pointer. This is used by evalVisitor, created on line [31], to compute and print the value of the entire expression, as coded on line [34, 35]. But since the division can throw an exception, the whole structure is enclosed with the try-catch statement starting on line [32].

After executing the previous SyntaxTree_Test, the following output is obtained from PrintVisitor :

After that, EvalVisitor outputs

In the next section, we will see how the value of an expression can be computed without a syntax tree using a direct operation on the stack with nodes.

Finally, it is worth noticing that the elements of formal grammars, parsing tree s, and expression evaluations that we have discussed constitute the salt of the earth in compiler design. Despite their advanced theoretical background, areas such as embedded systems, production automation, and robotics can use these techniques to implement a simple language – a command-like system – a to program or control a device. If the grammar for our command language is simple, after the left-recursion elimination, we can try to implement it, as shown in this section. All of the parsers presented in TSimpleExpressionInterpreter, as well as its derived ExpressionTreeBuilderInterpreter, belong to the group of top-down LR parsers. These can be used for many simple grammars and also for more complex grammars used for expressions . A characteristic feature of the parsers presented here is that the grammar rules are hardcoded. That is, to change or modify a grammar, we need to change the code and recompile the solution. Usually this is not a problem, but some solutions let us feed a grammar to the parser as an input. In such cases, and also for more complicated grammars, libraries such as Flex for lexical analysis and Bison for parser construction are recommended (successors of Lex and Yacc). Bison relies on the bottom-up RR parsers and lets us process a broader group of grammars. An interesting C++ implementation of the recursive -descent parser is the Boost Spirit library (www.boost.org/doc/libs/1_67_0/libs/spirit/doc/html/spirit/introduction.html): a template-based code that lets us parse a grammar expressed in the BNF notation directly in C++.

When learning C++, skimming Compilers: Principles, Techniques, and Tools by Aho et al. (2006) and the slightly more up-to-date Modern Compiler Design by Grune et al. (2012) is highly recommended. They provide a much broader view not only of computer language definitions, compilation, code generation, memory management, assembly, and linking, but also of C++ itself. Particular programming techniques used in compilers are frequently encountered in other programming assignments.

6.6.9 Representing Expressions on a Stack in Reverse Polish Notation

We saw the structure of an interpreter that, when parsing an expression, builds the syntax (parsing) tree, which is then traversed by EvalVisitor to finally provide the value of an expression. This seems like a lot of constructions. Indeed, if evaluating the value is the ultimate goal, then a much simpler solution can be found. A possible method is to first convert the infix expression into reverse Polish notation (RPN), which lets us represent the expression on a stack without using parentheses and then compute the value of that expression. This problem attracted much attention in the early years of computer science. For example, in 1961, Edsger Dijkstra published a report describing his shunting yard algorithm (https://en.wikipedia.org/wiki/Shunting-yard_algorithm), which does the job. It may be a surprise, but we followed almost the same idea, as we will soon show. However, before that, let's briefly explain what RPN is.

6.6.9.1 Reverse Polish Notation

RPN (https://en.wikipedia.org/wiki/Reverse_Polish_notation) is a way of writing mathematical expressions, fulfilling the precedence and associativity rules of operators with using parentheses. The name comes from the idea of prefix notation proposed by the Polish logician Jan Łukasiewicz in 1924 to facilitate understanding of mathematical expressions. But in computer science, its reversed form, as proposed in 1954 by Arthur Burks and his colleagues, has found many applications. The idea behind RPN is simple. The following example expression in common infix notation

can be equivalently written in RPN as follows:

That is, in RPN, the operators follow their operands in a series. The previous formula can be evaluated by reading from left to right. When an operator is encountered, its two preceding operands need to be taken, and the place of this triple in the series is filled with their result. The algorithm continues until only one value is left in the series – this is the value of the entire expression. In our example, working from the left, we find multiplication * first. This and its two predecessors 4 and 5 are taken, and their result 20 is written back. Continuing to the right, the plus operator + is encountered. Again, its two predecessor operands are taken, this time 3 and 20, and the final result 23 is written back. If the expression was

then its RPN version would be

which evaluates to 35. Interestingly enough, RPN was found by many engineers to be a faster form of evaluating expressions . This was reflected in the famous series of engineering calculators by Hewlett-Packard, such as the HP35.

Based on this description, notice that the most natural way to process infix expressions into their RPN equivalent form, after which they can be easily evaluated, is the stack data structure discussed in Section 4.8.4. Let's trace the evaluation of the last RPN expression using the stack, as shown next.

6.6.9.2 Algorithm for Evaluating an RPN Expression

In the previous section, we outlined the algorithm for evaluating values of RPN expressions . Its steps require an auxiliary stack to hold temporary values, as shown in Figure 6.19. The expression is read from left to right. The operands, such as 5, 3, and 4 in the previous example, are pushed onto the stack. When an operator is encountered, the two topmost values are popped off the stack, such as 4 and 3; and their value – 7, in this case – is pushed onto the stack. This continues until the end of the RPN expression is encountered and only one value is left on the stack. This is the result of the expression: 35 in our example.

The ComputeValueFrom function, whose definition starts on line [8] in Listing 6.29, implements this idea. But before we delve into the code, let's summarize the novel C++ techniques that will be used on the way:

  1. Basics of RTTI for investigating the types of objects when executing code
  2. Using typeid
  3. Using dynamic_cast with references and pointers
  4. Explicitly calling base member functions
Schematic illustration of evaluating the RPN expression 5 3 4 + *, The expression is read from left to right. There are
two categories of symbols: operands and operators. The operands are pushed onto the stack. When
an operator is encountered, then the two topmost values are popped from the top of the stack and used
together to compute the value, which is then pushed back onto the stack. The steps continue until the end of the RPN expression is encountered and there is only one final value on top of the stack.

Figure 6.19 Evaluating the RPN expression 5 3 4 + *, The expression is read from left to right. There are two categories of symbols: operands (values) and operators. The operands are pushed onto the stack . When an operator is encountered, then the two topmost values are popped from the top of the stack and used together to compute the value, which is then pushed back onto the stack. The steps continue until the end of the RPN expression is encountered and there is only one final value on top of the stack.

Our input to the ComputeValueFrom function is a vector of nodes, defined by NodeVec on line [1]. This corresponds to the input RPN expression, 5 3 4 + * in Figure 6.19, but the elements are node objects from the TNode hierarchy (Section 6.6.4). The returned value will be represented by the LeafValType whose alias is defined on line [4].

If there are any nodes in the input node_vec, the auxiliary stack auxStack holding LeafValType objects is created on line [21]. Then the nodes from node_vec are processed from left to right in the range loop that starts on line [25], as shown in Figure 6.19. But the problem is as follows: although all elements of node_vec are smart pointers to the TNode base class, the actual pointers are to different objects from the TNode hierarchy. For example, with our simple expression, we will have three LeafValType s followed by one PlusOperator and one MultOperator. We need to process them differently, as described in the previous RPN processing algorithm, but we have only TNode pointers. So, we have to figure out how, at runtime, to differentiate one TNode pointer to LeafValType from the other TNode pointer to PlusOperator, for instance. In C++, the RTTI mechanism comes to the rescue, as we mentioned in Section 3.19.1 when we discussed GROUP 2 of the C++ operators. The two RTTI constructions are the typeid and dynamic_cast operators. typeid allows us to check the type of an object referred to with a pointer or a reference, even if that pointer or reference is to the base class from which the class of an object of interest was derived. On the other hand, at runtime, dynamic_cast will try to do a checked conversion of a pointer or a reference to the requestedtype. If the converted types are up, down, or sideways along the inheritance hierarchy, then the conversion will succeed. Otherwise, if dynamic_cast< a_pointer > is attempted, a nullptr will be returned. In the case of dynamic_cast< a_reference >, a std::bad_cast exception is thrown. We will use these features in our code. First, to distinguish the LeafValType objects, line [29] verifies the typeid of an object held by node, using the typeid of NumberLeafNode. If they agree, then the value of the object is pushed onto the auxStack. Concretely, on line [32] node is cast to the pointer to the NumberLeafNode object, from which the value is read and pushed onto the stack. But how is this done? Because node is a std::unique_ptr, it is not compatible with the NumberLeafNode class. However, it has an overloaded cast operator built in to a pointer of the held class. As a result, std::unique_ptr can be used in all code requiring a pointer or reference to the held object. Hence, GetValue can be called through the node cast to the NumberLeafNode object.

If a node is not the NumberLeafNode type, it is probably one of the four binary operators. So, the only thing left to do is to figure out which one we currently hold in the node smart pointer . This is done on lines [46–61] by a chain of if-else statements . In each condition, a dynamic_cast to a pointer to one of the four operator classes is attempted; if the types agree, then the cast succeeds. In this case, the corresponding operation is performed on the two operands previously popped off the stack on lines [42, 43], and the result is pushed onto auxStack. For instance, on line [47], a summation takes place because PlusOperator was identified. In this construction, we use the fact that dynamic_cast returns nullptr if the types do not agree. In such a situation, an attempt to cast to a reference would throw an exception, which we try to avoid when resolving “normal” conditional constructions. typeid could also be used instead of dynamic_cast.

This construction is used for all operators except DivOperator. It is additionally preceded by checking for division by zero, in which case std::overflow_error is thrown,6 as shown on line [59].

Finally, the algorithm stops with only one value left on the stack. This is the final result, which is taken off the stack on line [69] and returned on line [71].

Notice that RTTI was not necessary in the ExpressionTreeBuilderInterpreter class or the visitor mechanism. By correctly designing our polymorphic operations, we can avoid using RTTI. There is nothing wrong with it, but the need to check object types in polymorphic code can make us think twice about whether our design could be better. However, sometimes there is no better way. In our example, we decided to keep the TNode hierarchy as free as possible from operations related to evaluation and other actions that are left to the visitors. Hence, e.g. the PlusOperator on line [47] will not add the two operands for us: it simply represents the plus operation.

The RPN_Value_Test function on line [5] glues together parsing the expression and evaluating the expression value, which is done directly from the stack structure by the ComputeValueFrom function. But since we are reusing the same ExpressionTreeBuilderInterpreter class, we need to change the code slightly to push all the tokens onto the stack and stop building the syntax tree. For this purpose, in the code shown in Listing 6.21, the BUILD_THE_TREE preprocessor flag on line [178] should be set to 0. This simply excludes a few lines from the action, and only the last Push onto the fNodeStack is executed. To verify the proper compilation setting, a simple checkpoint is inserted on lines [7–9]. However, what is more interesting is understanding that by pushing the TNode objects onto fNodeStack, we obtain the RPN structure of the expression, exactly as shown in the example in Figure 6.19. In other words, our interpreter, fueled by the grammar rules, does infix-to-postfix conversion!

To make our exprParser – defined on line [11] as an object of the ExpressionTreeBuilderInterpreter class – perform parsing without any final postprocessing of the syntax tree, line [16] invokes operator() from the base class. To accomplish this, we have to put the name of the base class before the name exprParser of the object and the name of the member function. In our case, the name of the base class is TSimpleExpressionInterpreter, for which an alias BaseClass was created on line [64] in Listing 6.21. But we need to provide a qualified name (see Figure 6.2) for the member function, which in our case is operator (). Hence, analogous to the member call scheme in Figure 3.8, a general scheme for calling members of the base class(es) is as shown in Figure 6.20.

Schematic illustration of the syntax of the base class member call from an object or a reference to the object (a) and by the pointer to an object (b).

Figure 6.20 Syntax of the base class member call from an object or a reference to the object (a) and by the pointer to an object (b).

Alternatively, line [16] can be written as follows:

16   if( ( & exprParser )->BaseClass::operator()( good_expr ) )

Then, on lines [20, 21], ComputeValueFrom from Listing 6.29 is called with the input vector reflecting the RPN structure of the input expression. Since an exception can be thrown, the call is enclosed by the try-catch statement on line [18–23] (Section 3.13.2.5).

Summarizing, in this section, we have learned about the following:

  1. Basics of formal grammars, lexical analysis, and expression parsing
  2. Building class hierarchies with interfaces expressed with pure virtual functions
  3. Expression parsing with the interpreter design pattern
  4. Implementing syntax trees with the composite design pattern
  5. Object access with non-owning pointers and owning access with smart pointers
  6. Stack specialization to store std::unique_ptr objects
  7. Traversing tree structures with the visitor design pattern
  8. The cloning technique and prototype design pattern to create objects dynamically
  9. Converting infix expressions into RPN and stack operations
  10. RTTI with the typeid and dynamic_cast operators

6.7 Summary

Questions and Exercises

  1. Refactor the system shown in Figure 6.1 to handle menu item actions. Using the composite design pattern, as shown in Figure 6.15, implement a macro command. This command is also derived from the THandler class from Listing 6.1. But at the same time, it contains other pre-recorded commands.
  2. Implement a cyclic buffer with THandler objects (Section 6.1). Use smart pointers.
  3. Using the range-v3 library (https://github.com/ericniebler/range-v3), implement the letter histogram component discussed in Section 2.5.
  4. Write regular expression s to match the following:
    1. A date with its fields: the two-digit day number, followed by a dot, then the two-digit month number, a dot, and the four-digit year
    2. A C++ comment: // a C++ comment …
    3. A C-like comment: /* a C comment */
    4. A sentence, i.e. text starting with a capital letter and ending with a dot, semicolon, exclamation mark, or question mark

    Then use the following function to test these regular expression s:

  5. Write unit test s for the currency exchanger project.
  6. The exchange project can be arranged in many different ways. For instance, instead of a new XML_CurrencyExchanger class, you could write a class to convert an XML file with currency data into a format for the initialization file suitable for the base class. Draw the relations of this approach. What are its benefits and drawbacks? What would happen if you had more different formats for the XML-coded information?
  7. Remake the component in the basic version of CurrencyExchanger presented in Section 3.18.5 to enclose it in a class instead of the CurrExchanger namespace . What are the consequences of the two solutions?
  8. In Section 6.2.4, we mentioned the strategy design pattern. It is similar to the handle-body pattern but lets us dynamically change the body object depending on the runtime conditions. Implement a strategy pattern to efficiently sort vectors of different lengths. A suitable sorting algorithm – the one that best fits the current vector length – should be instantiated before the sorting action. Hint: read about sorting algorithm performance with respect to vector length.
  9. Based on the example function RecursivelyTraverseDirectory presented in Section 6.2.7, write function(s) to traverse a directory and create a sorted list of the files' dimensions.
  10. Write a software mechanism to recursively traverse a directory and all of its subdirectories and perform an action supplied in the form of a functional object passed as a parameter (this can be a composite pattern).
  11. Refactor the CC_GUI class to keep all the positions and dimensions of the controls in a separate resource file (see Listings 6.8 and 6.9). In the class initialization procedure, open that file, read the data, and properly set up the widgets. Do not forget about controlling data correctness.
  12. Design and implement an extended version of the CurrencyCalc_GUI project to allow for currency type conversions in both directions, as well as choosing an arbitrary reference currency.
  13. Find out how to create a dynamically loaded library in your OS. Remake the static library presented in Section 6.2.6 into a dynamic one, and test its behavior. What are the advantages and disadvantages of both types of libraries?
  14. Refactor the SyntaxTree_Test function defined in Listing 6.27 to accept user-supplied expressions.
  15. Extend the lexical analysis to allow for white symbols (space, tabs, etc.) and C++ numbers. For this purpose, define and use the grep patterns discussed in Section 6.2.1.
  16. Derive a new interpreter class that accepts not only numbers but also symbols, so such expressions as a + 3.14 * b are recognizable. Then construct a symbol table containing the associations of symbols with their values and write a visitor to evaluate expressions.
  17. Extend the interpreter to accept function calls with various arguments.
  18. Design and implement a code-generation visitor for runtime evaluation of any expression. The visitor should generate text, which will be C++ statements . These will form a new program that, after a separate compilation, will provide the final code for evaluating runtime expressions. See “Simple code generation for a stack machine” in (Dick et al. 2012). This idea can be further extended to create your own simple language generator.
  19. Implement a simple calculator using the Boost Spirit library (Boost.org 2020).
  20. Reflection is a technique that lets us inspect a class without having any information about it. Especially when implementing template functions and classes, for example, we might be interested in verifying whether a type for template instantiation contains certain data or the method for a proper call. Such features will be available soon in the C++ standard. But until then, we can try to implement at least some of this functionality. For example, a reflection method can be added to a class, which returns a std.::tuple conveying all of its data (Section 3.17). Such a reflection-aware class can be easily compared, streamed out, etc. This technique is shown in the following code, which presents a partial definition of the TBook class:
    1. Augment the TBook class with the remaining special functions.
    2. Add the overloaded operator== and operator <, as well as the streaming operators, taking advantage of the reflection method.
    3. Extend the reflection method to operate in the class hierarchy (make it virtual).
  21. When writing generic components using templates it is sometimes important to assure that the supplied types fulfill specific properties. For example, we may allow only signed types or types that can be compared with the == and != operators, etc. Such generic expectations of the template arguments can be verified with the concepts predicates, the new powerful feature of the C++20 standard.
    1. Read and learn about the concepts (e.g. B. Stroustrup: Concepts: The Future of Generic Programming,2017, https://www.stroustrup.com/good_concepts.pdf; https://en.cppreference.com/w/cpp/concepts). Write your own concept, say exclusively_unsigned, to assure that a type comes exclusively from the unsigned types.
    2. Refactor the RevertEndianess template function from Listing 4.16 to use your exclusively_unsigned concept. Verify the compiler messages when a signed type is tried.
    3. Design and implement concepts for the TStackFor template class used in Section 6.6.6.
  22. The std::optional, which we used in Section 6.2.5, has two companions:
    1. The template std::variant that can represent many objects of potentially different types, from which only one is held at a time. Read about the std::variant (https://www.bfilipek.com/2018/06/variant.html) and refactor the function GetRoots from the code from Section 3.16 to return the std::variant holding either one or two roots or an error code.
    2. The non-template std::any which can hold an object of any type which can be further changed in the run-time for example to pass ownership of arbitrary values (https://www.bfilipek.com/2018/06/any.html). Consider refactoring the fVal member of the ValueLeafNode from Listing 6.16 to the std::any to stand for any object representing a value such as a int, double or std::string. What are the differences between these two designs?
  23. In Section 6.2 we processed the XML file with the grep library. However, XML files can be processed with one of many C++ libraries, such as Xerces-C++ (http://xerces.apache.org/xerces-c/) or RapidXML (http://rapidxml.sourceforge.net/ ). With the help of one of them write a program to process XML file to extract information on the currency exchange ratios.
  24. JSON is another popular format for data storage and representation. There are also some C++ libraries for processing JSON formats, e.g. by Niels Lohmann (https://github.com/nlohmann/json) or RapidJSON (https://github.com/Tencent/rapidjson). Using one of these header-only libraries write a module to write and read currency information from Section 6.2 to and from the currency.json file.

Notes

  1. 1   A nice summary of Pimpl can be found on Bartek Filipek's coding blog: www.bfilipek.com/2018/01/pimpl.html.
  2. 2   It is also possible to build a library without using the CMake tool. That is, we can create a new project in our programming environment, select its type as a static library, and manually add the files. However, CMake gives us flexibility between various platforms and programming tools.
  3. 3   To describe the syntax of computer languages, the Backus–Naur Form (BNF) has been used for years. It is a metalanguage devised for the definition of the syntax of ALGOL 60 . BNF rules are composed of terminal symbols, nonterminal symbols, and meta-symbols such as the alternative |.
  4. 4   A double-dispatching technique but with a callback mechanism s was presented in Section 6.2.8.2.
  5. 5   Since Clone produces and abandons an object, it should be named something like Orphan_. However, in the C++ community, it is common to call it Clone or clone, so we also follow this nomenclature.
  6. 6   In STL, there is no predefined div_by_zero exception, so we can either use std::overflow_error or std::underflow _error or create a custom exception class.