2
Introduction to Programming

To create a computer program to solve a given task, we need to specify the operations a computer must perform for us. This knowledge comes from the problem analysis, and sometimes it is the most difficult stage of the software creation process. Once we know these operations, they need to be written in a form a computer – that is, its processor – can understand. Although each processor speaks its own language, called assembly or machine languages, these are too primitive to be used efficiently for coding by humans. Instead, many high-level computer languages have been developed for this purpose. C++ is one of them: it is well known; it has the best possible tools and libraries; and it has significant support from the C++ community, which has years of experience using it.

C++ has hundreds of features that can be used in various ways to translate our expectations into an optimal representation that a computer can understand. Learning about C++'s features is an interesting endeavor, but it can take time. However, knowing selected parts of C++ will enable us to write a large number of programs. In this chapter, we will present an introduction to programming with C++. To begin, we will discuss the hardware-software ecosystems . More precisely, we will present the computational models that constitute a foundation for understanding how the software operates and what its development process is.

2.1 Hardware Model

Computers are highly integrated but also complicated systems, composed of dozens of chips, each composed of thousands of transistors, etc. Modeling – that is, the creation of models – is the process of simplifying reality to expose some of the most important features that are necessary to explain part of that reality. Figure 2.1 depicts a computational model of a computer system equipped with a multi-core processor, main memory, a system support chipset, as well as a number of connected external devices, such as a keyboard, a screen, disks, and network devices.

The central part constitutes a processor with a number of cores, which themselves are independent computational units. The most important external module is the main memory, which contains both code and data. This type of organization follows the well-established von Neumann computer architecture.

In addition to the main processor, computations can be done by cores on the graphics processing unit (GPU ), which is also equipped with its own on-board memory. Although we will not directly address this platform in the programming exercises, it is good to know that its native programming languages are C and C++.

Schematic illustration of the model of a computer system. The central part consists of a processor with a number of cores that are independent computational units. A processor is usually augmented by a system support chipset, which is responsible for interfacing with various modules external to the processor. The most important is the main
memory that contains code and data. In addition to the main processor with its cores, computations can be performed by cores in the graphics processing unit.

Figure 2.1 Model of a computer system. The central part consists of a processor with a number of cores that are independent computational units. A processor is usually augmented by a system support chipset, which is responsible for interfacing with various modules external to the processor. The most important is the main memory that contains code and data. In addition to the main processor with its cores, computations can be performed by cores in the graphics processing unit (GPU ).

Figure 2.2 depicts a model of a multi-core processor with various types of memory. The multi-core processor has a series of onboard memory caches and has an external main memory module attached. We will now explain the functions of these hardware components from the computational point of view.

The arithmetic logic unit (ALU) is the heart of every core and performs basic low-level operations. These are in the form of binary-encoded commands stored in memory, which, after decoding, tell the ALU what to do: for example, with values stored in other memory regions. These commands are in an encoded form called machine language and are not human-readable. They can be represented in an equivalent assembly language that can be understood and even, in rare cases, used to write software; however, each processor has its own machine language, so writing in assembly is tedious, error-prone, and decidedly hardware dependent. Fortunately, these days we can avoid such inconvenience with the help of high-level languages such as C++. These are much easier for humans to understand and are independent of processors and operating systems. In addition, code written with their help can be easily translated into almost any known machine representation with the help of a compiler.

As shown in Figure 2.2, the ALU usually has relatively limited access to a particular block of the core's registers. Depending on the processor, these are a few memory locations that are able to store only a very limited number of computer word s, mostly for ALU operations. In addition to the universal registers, there are two special ones: PC, the program counter, which indicates the current position of the executed code; and SP, the stack pointer, which points at an external memory region called the stack that contains return addresses to procedures and sometimes call parameters (see Section 3.14.3). Figure 2.2 also shows two specific flags: zero (Z) and carry (C). In practice, these are two of many flags, usually encoded as specific bits in one of the control registers . The reason we have highlighted them is that they help to understand some numerical operations discussed in Chapter 7 of this book.

Schematic illustration of a model of a multi-core processor with various types of memory. The arithmetic logic unit performs low-level machine operations. The closest memory storage consists of registers. In addition to the
universal registers, there are two special ones: PC and SP. L1, L2, and L3 are memory caches with different sizes and access times. The cores can operate independently, sharing
only part of the upper-level memory. The processor is connected with external main memory that stores both
code and data.

Figure 2.2 A model of a multi-core processor with various types of memory. The arithmetic logic unit (ALU) performs low-level machine operations. The closest memory storage consists of registers. In addition to the universal registers, there are two special ones: PC (the program counter that indicates the current position of the executed code) and SP (the stack pointer, which points at the memory region called the stack, containing return addresses to the procedures and some parameters). L1, L2, and L3 are memory caches with different sizes and access times (1c denotes a single processor cycle). The cores can operate independently, sharing only part of the upper-level memory. The processor is connected with external main memory that stores both code and data (von Neumann architecture).

L1, L2, and L3 are memory blocks called memory caches and have different sizes and access times. Their characteristic feature is that the further from the ALU such a memory cache is, the larger its capacity – but its access time is also longer. At the very end of this specific capacity-speed triangle is the main memory: its capacity is a few orders of magnitude greater than that of the memory caches, but it also has the longest access times. This is due to the hardware technologies used to manufacture it. Registers and caches are internal processor memory (cores). But this is the external main memory, which, in the von Neumann architecture,1 contains the entire body of compiled executable code for a program as well as most of its data. More specifically, this is operational memory, which, unlike disk storage, can be directly addressed by a processor in order to execute a program. From the programming point of view, it is important to realize that to access any cell in the operational memory, the following two specialized signal buses need to be handled by the processor and the chipset:

  1. Address – All of this bus's signals specify the location of a memory cell to access. For historical reasons, the smallest possible cell to address is a byte, i.e. 8 bits of data. The logical addresses correspond to the C/C++ pointers
  2. Data – This bidirectional bus contains data to be read from or written to memory

Let's stress that this flat memory model greatly simplifies what is really going on in the hardware. Briefly, in many contemporary architectures, data buses are either 32 or 64 bits long. These even multiples of bytes specify what is called a computer word . Although, as mentioned, a logical address can refer to a single byte, due to the width of the data bus, memory accesses are related to operations on words. For this reason, the lower address bits need not be directly set on the bus. Together with the mechanisms of memory paging and address translation lookahead buffers (TLBs), this makes the physical number of lines in common architectures usually in the range of 40–52. Nevertheless, such a flat memory model is a fair trade-off between the variability of hardware architectures and what is seen from the software perspective. For example, using this approach, object placement and initialization in a memory space are explained as shown in Figure 3.1.

On the other hand, the much faster cache memory cannot be directly addressed from a C++ program. There is no need to do this, since caches only store copies of data from the main memory to allow for faster access. Nevertheless, they can be efficiently employed in the execution of C++ programs by proper data placement in the operational memory. We will address these issues in Chapter 8 when discussing memory access and concurrent operations of the cores.

After this brief introduction to modern computer architectures, notice that older and simpler processor architectures were usually based on an unsophisticated ALU, few registers, no cores, and no memory caches . All of these features allow us to run C++ programs even in a multithreading fashion. Hence, simple processors connected with external memory constitute an even simpler, but still highly usable, computational model. Such approaches are still in use on some embedded platforms, due to their lower cost and frequently lower energy consumption. The latter is also an issue when it comes to software design.

In this introduction, we have only touched on the domains of computer architectures and the computational programming model. Nevertheless, this discussion will be useful for understanding the programming techniques discussed in the following chapters of this book. For further reading, we recommend Patterson and Hannessy (2018) and Bryant and O'Hallaron (2015).

2.2 Software Development Ecosystem

Software makes computers run. On the other hand, computers are used in software development . This dichotomy can be observed in Figure 2.3: it presents the software development ecosystem, elements of which will be the topics of later chapters of this book. Let's briefly characterize each of them.

The first steps shown in Figure 2.3 are software design and coding. Although these are the first steps in this diagram, we will soon discover that they are the end of a very important – perhaps the most important – conceptual phase of software development . After part of the code is completed, it needs to be translated into a representation that the processor can understand, as discussed in the previous section. This process of translating from the C++ language is done by a program commonly known as a compiler. Shoulder to shoulder with operating systems, compilers are very complex and sophisticated programs, as we will have many occasions to observe. In the C++ environment, software components can be created independently; and to build a final program, these pieces need to be joined together. This task is performed by a linker . After its successful operation, we have a code component that can be executed by a computer.

Schematic illustration of the software development ecosystem. To execute software written in C++, it needs to be translated by
the compiler into a machine representation. Since software can be created in parts and can use previously created libraries, all of them need to be joined into one executable. This is done by the linker. These are the software build stages.

Figure 2.3 Software development ecosystem. To execute software written in C++, it needs to be translated by the compiler into a machine representation. Since software can be created in parts and can use previously created libraries, all of them need to be joined into one executable. This is done by the linker . These are the software build stages. Next, the executable needs to be properly tested and debugged to remove malfunctions. Then, the operation of the current version can be further optimized by launching the profiler . Its role is to measure the performance of software components to discover hot spots – code fragments that consume the most execution time. After that, the entire software development process can be repeated to build the next, better version. During this process, we cannot forget to protect and control the versions of our software. This can be accomplished by using a source control and versioning platform. After the software is ready, it can be deployed on a concrete hardware platform.

However, we are still a few steps ahead of saying that we have finished our job. In fact, new development stages commence: code testing and debugging. Since even short programs can implement complicated concepts, we should have no illusion that we will write the correct version from the very beginning. Therefore, we should also write software that controls what other software is doing. This can be, for instance, in the form of unit tests, which we will also discuss. When problems are spotted in software, it can be run step by step with a debugger . This is yet a third program in our toolchain ; in addition to the line-by-line code execution, it lets us add breakpoints, examine the values of variables, look into memory regions and call stacks, and even examine the registers of a processor, if necessary. Debugging is the best way to learn what code is doing. Therefore, we use debuggers extensively when working on the code examples presented in this book.

Software can have different configurations and versions. Although many configurations are possible, usually there are two distinctive modes:

  1. Debug – A software version that usually is not optimized and is used for “in house” testing and debugging
  2. Release – A stable version of the software after intensive testing, which is fully operational, optimized, and intended to be distributed to users

Let's assume we have attained a stable version of our software, so we can run real computations. In many cases, performance still is not satisfactory due to long execution times or stalls. This can be amended with the help of a profiler . Its role is to measure the performance of software-building components and discover hot spots at different levels. Such reported places in the code – i.e. those that consume the most execution time – can be further optimized with the help of other algorithms or refactoring (i.e. redesigned and reimplemented) into a parallel version, if possible. After that, we can repeat the entire software development process in order to build the software's next, better version.

During the software development cycle, we cannot forget to protect and control the versions of our software. This, in turn, can be accomplished by one of the source control and versioning platforms discussed in the Appendix, Section A.6.2.

After the software is ready, it can be deployed on a concrete hardware platform. A degree of software-hardware (S/H) dependence is a function of software features. That is, some programs can easily run on almost any hardware platform, whereas others may require significant computational power or can be tuned for specific displays, for instance. Hence, if appropriate, such features need to be taken into account during the software development steps.

Thus the software development process resembles a spiral: the same tasks are repeated over and over until a satisfactory solution is obtained. In other words, we operate in cycles with larger and larger diameters. Indeed, we enter a spiral development model, which is a recognized software development strategy used by many companies. If properly organized, such an approach can be highly efficient in practice. In the next section, we will look in more detail at the software development steps.

Last but not least, note that the programming ecosystem in Figure 2.3 shows only one person – but software development and construction usually requires teamwork. The human factor, which frequently is overlooked in books on programming, can have a significant impact on the success or failure of any project. Although this serious issue is out of the scope of this book, smooth operation entails project organization and cooperation (McConnell 2004). To some extent, these are supported by the modular structure of C++ projects, as well as by tools that perform source versioning and maintaining, such as Git and GitHub in Figure 2.3.

2.3 Software Development Steps

Just as there is more than one way to design a house, a bridge, or a car, there is also more than one way to design software. However, in all of these cases, the ultimate criterion indicating success or failure is the performance of the final product. As in other engineering fields, based on years of experience and thousands of projects, some common rules can be recognized that help us avoid common pitfalls in future designs. Such software development steps are depicted in Figure 2.4, and we will take a closer look at each of them.

Schematic illustration of the UML activity diagram presenting the steps of the software development process.

Figure 2.4 UML activity diagram presenting the steps of the software development process.

In this book, we focus mostly on correct implementations in C++. However, there can be no good implementation of a flawed design. Also, even good design and implementation do not guarantee a successful solution if software does not undergo extensive testing. Therefore, although C++ is our main subject, we will also pay attention to other software development steps. Let's describe each of them briefly in the order shown in Figure 2.4:

  1. Requirements specification – A process of defining the main tasks to be solved, as well as functionality and applications of the product. Involves cooperation between many parties, such as end users, system architects, managers, etc. A proper statement of the allowable costs and foreseen deadlines is also essential at this stage (Dick et al. 2017).
  2. Problem analysis – Based on the previous step: the process of establishing the main relations in the system, recognizing the governing rules of the critical system processes, identifying the most difficult components, and sketching the main stages of the critical algorithms with special focus on the most problematic ones. In some cases, this can be one of the most expensive and tedious steps, involving experts from other domains.
  3. Solution design – Based on the previous steps: the process of specifying the architecture of the whole system, considering all of its components and their interfaces. Includes specification of tools and implementation strategies. Frequently, this process involves the preparation of Gantt charts that capture interdependencies between system implementation stages, human resources, deadlines, and milestones.
  4. Implementation – The process of software creation based on the previous steps. The main product of this stage is the source code. The executable can be built, as shown in Figure 2.3.
  5. Testing and debugging – Joint processes of discovering and fixing malfunctioning software. Tests should be well designed using formal methodologies such as test-driven development (TDD) and unit testing (Appendix A.7). However, problems arise when a malfunctioning implementation is due to flaws in the design or other upper levels of the development process. Therefore, the role of the design steps is paramount.
  6. System deployment – The process of system integration and installation to ensure proper operation on a specifically configured hardware platform. This usually requires adjustments to specific hardware components that, for example, are missing or have different parameters (Sommerville 2016).
  7. System maintenance – The practice of ensuring uninterruptable system operation and allowing for system evolution and seamless system changes.

Certainly, these software development steps are not the only possible path to follow in this highly demanding engineering domain; they are simply one of the possible models . Design strategies developed in other domains, such as electronics and architecture, can also be of help in software design. Interestingly, design patterns – which for us are well-established software components with specific characteristics – were first postulated as common patterns encountered in building construction (Alexander 1977).

The previous steps should also be reflected in the project documentation, the structure of which could be the topic of a much longer discussion. Needless to say, such documentation should as precisely and concisely as possible describe all the steps necessary to solve a given problem. In our projects, we will deal mostly with code documentation. In this respect, the principle of self-documenting code has proven useful in many practical realizations. This is based on the assumption that proper comments are added during software implementation. These should succinctly convey the most important concepts and, most importantly, be up-to-date. We will return to this issue in future sections.

Finally, let's stress that the development process shown in Figure 2.4 is just one of the possible processes and does not reflect all of the important aspects of software development, such as the human factor. A design that follows all of these steps is still not guaranteed to be successful. However, as a heuristic finding, not following any design strategy, especially when developing complex systems, is a good recipe for disaster. There are many more comprehensive strategies for IT project development, such as PRINCE2, Scrum, and Agile, to name a few. Further investigation of these subjects is definitely worth the effort; see, for example, Hinde 2018, Cohn 2019, and Sommerville 2016.

2.4 Representing and Running Algorithms

At last, in this section, we will implement a C++ program. It will ask a user to enter a number and, after checking whether it is a positive value, display the number's square root. That is, if 16 is entered, 4 will be displayed, and so on. However, before we go to the C++ code, which is one of the forms of representing algorithms, let's first reflect on what an algorithm is and the ways to express it.

In this section, we will learn the following:

  1. What an algorithm is, and how to represent it
  2. What the Unified Modeling Language (UML) is, and what can be represented with UML activity diagrams
  3. How to write a simple C++ program that computes the square root of a value entered by a user
  4. How to compile and execute simple C++ code using online platforms
  5. How to write a C++ source file, compile it, and run its executable in a Linux environment

2.4.1 Representing Algorithms

An algorithm describes steps for execution. Algorithms are frequently encountered not only in computer science but also in daily life. For example, fire alarm instructions in every office provide the steps to be followed in the event of a fire; selecting a given program on a dishwasher specifies the steps to be performed by the machine to clean our precious glass; etc.

Much depends on the abstraction level at which we are operating. The concept of abstraction in computer science is frequently encountered: when representing algorithms, it simply tells what level of details is necessary. In some cases, especially if we wish to view the entire form of an algorithm, a more general coarse level is preferred, and an image can be used: more specifically, a diagram that uses commonly agreed-on symbols. Such diagrams are used in many domains, such as electric engineering and architecture. In computer science, we are lucky to have a common system of diagrams under the umbrella of UML (see http://uml.org and www.visual-paradigm.com). UML is much more than a simple collection of common symbols – this framework facilitates the entire software development process shown in Figure 2.4. We will return to it many times. However, let's now present the UML activity diagram that represents the steps required by our program, shown in Figure 2.5.

Schematic illustration of the UML activity diagram presenting an algorithm for computing the square root of a value x entered from the keyboard. All actions can be expressed by two symbols: activities and decision blocks.

Figure 2.5 UML activity diagram presenting an algorithm for computing the square root of a value x entered from the keyboard. All actions can be expressed by two symbols: activities (procedures) and decision blocks.

Basically, there are two types of symbols in Figure 2.5. Actions are in rounded rectangles, and decision blocks are in diamonds. Lines ending in arrows show the control flow. Therefore, such diagrams are sometimes called flow charts . Usually, in simple diagrams, the main flow proceeds from top to bottom. But observe that the flow can be easily redirected, forming loops in execution. There are also black circular symbols denoting the beginning and end of an action, respectively. Although simple and straightforward, the ending symbol reminds us about a very important problem in algorithm design: the stop problem. Put simply, this means a well-designed algorithm should have well-defined stop conditions, i.e. it should not hang up forever in an undefined state. Some software, such as operating systems, operates in an indefinite loop; however, it should never find itself in an undefined state or behavior.

Note that in Figure 2.5, the whole program can be seen in one glance. But at the same time, the steps express certain general actions, such as “Compute sqrt(x),” without in-depth details. This is what we call the coarse level or, more generally, a top-down approach. On the other hand, going into detail means entering finer and finer levels: that is, “going down” the design. Usually, this involves more symbols; to avoid clutter, these are usually presented in separate diagrams. Nevertheless, in some tasks, it is possible to start from the details. Such a bottom-up approach is much less useful in our domain and usually results in designs in which upper-level components do not interface well.

Yet another way to present algorithms is in the form of the pseudo-code. This is a mixture of simple computer commands interspersed with more general statements, usually written in the form of human-readable commands or short sentences. Algorithm 2.1 presents this approach.

The ultimate representation comes in the form of computer code, such as that presented in Listing 2.1. This can be translated into a computer-executable format and launched.

The code in Listing 2.1 does what was outlined in Figure 2.5 and in Algorithm 2.1.

2.4.2 Using Online Compilers

Before we explain each line of code in Listing 2.1, we can compile it and run its executable with the help of an online platform with a C++ compiler, as shown in Figure 2.6.2 However, when typing, we may make a mistake, in which case the code will not compile, and the platform will display an error message from the compiler. This should be carefully read to see what the problem is, in order to fix it. But sometimes the precise position of an error can be difficult to determine, and the real culprit may be a few lines above the place indicated by the compiler. So, we should also check the preceding lines.

Snapshot of the web page http://ideone.com online C++ compiler environment running in the Mozilla Firefox viewer with the code from Listing 2.1 (a). The source code has been copied into the editor pane. Under it is the input pane, simulating std::cin. After we click Run, the source is compiled and executed. Results are shown in (b).

Figure 2.6 The http://ideone.com online C++ compiler environment running in the Mozilla Firefox viewer with the code from Listing 2.1 (a). The source code has been copied into the editor pane. Under it is the input pane, simulating std::cin . After we click Run, the source is compiled and executed. Results are shown in (b).

Online compilers are very useful when it comes to testing short code snippets, as frequently shown in this book. However, for larger, multi-file projects, we will frequently use an IDE to build executables.

2.4.3 Structure of a C++ Program

To understand the roles of the various code components, a simplified version of the C++ program has been extracted from Listing 2.1 and is shown and explained in Figure 2.7. A detailed description of the code from Listing 2.1 will be presented in the next section.

A program consists of a series of statements, separated by semicolons ; . Forgetting this ending symbol is one of the most frequent errors in early programming. However, notice that there is no semicolon at the end of the block of the main function, which continues inside a region enclosed by braces {} .

Snapshot of the structure of a simplified C++ program that consists of a main function. Header files containing definitions of other components are brought in by the #include directive. Each statement, such as the
double x {0.0} definition, ends with a semicolon;. However, there is no semicolon at the end of the function block, delineated with {} braces. The std::cin object represents a keyboard, whereas std::cout
represents the screen for output. Inside main, other functions can be called to produce results, such as
std::sqrt( x ), which computes the square root of a value x.

Figure 2.7 Structure of a simplified C++ program that consists of a main function. Header files containing definitions of other components are brought in by the #include directive. Each statement, such as the double x {0.0} definition, ends with a semicolon ; . However, there is no semicolon at the end of the function block, delineated with {} braces. The std::cin object represents a keyboard, whereas std::cout represents the screen for output. Inside main , other functions can be called to produce results, such as std::sqrt ( x ) , which computes the square root of a value x .

2.4.4 Code Analysis

Let's return to the code presented in Listing 2.1. The first line [1] contains a simple comment: that is, text that is intended only as additional information for the programmers. In C++, comments start with two slash symbols // and end at the end of the current line. We will see more such symbols that are composed of multiple single characters – these are necessary due to the limited set of special symbols on the computer keyboard. Lines [3, 4] contain two #include statements, each followed by a file name in <> brackets. These are preprocessor directives, discussed in Appendix A.1, that effectively copy and paste the contents of the indicated file into the current position. This is done to introduce programming components already written by other programmers, which we can easily reuse rather than bothering with self-implementation. Examples are the std::cout object and the std::sqrt function, which we will use in upcoming lines. Fortunately, the whole copy-and-paste is done invisibly when compilation starts, so we do not see any clutter in our code.

In our example, the main function is defined on line [6], and its body – a set of code lines – is contained between the curly braces on lines [7–17]. int in front of main means it is supposed to return a value, but this is irrelevant in this case.

The first object, a variable named x , is defined and initialized on line [8]. We see that it is of the double type. It is used to represent floating-point numbers, which approximate real numbers, as presented in Section 7.4. The curly braces { 0.0 } mean x is initialized with the value 0, which in the floating-point domain is precisely expressed as the 0.0 constant.

Line [10] instructs the computer to display the text "Enter x=" on the screen, i.e. in a window. This is accomplished with the help of the std::cout object, representing the output stream (a screen), and the << operator (again, two characters that together have a new meaning). The latter, in its default version, means a left-shift of bits in an integer; but we will see that in C++, operators can be assigned different meanings, depending on the context they operate in. This is quite a useful feature.

The opposite action is encoded on line [11]: the computer will wait for the user to enter a value of x . This is accomplished with the help of the std::cin object and >> operator, respectively. In effect, just after the user hits the Enter button, x will change its contents to whatever value was typed in.

However, not all values of x can be used to compute a square root. Therefore, on line [13], the condition x >= 0.0 (that is, whether x is greater than or equal to 0.0) is checked with the if-else statement (Section 3.13.2.1). Then there are two options: either the condition is fulfilled, in which case the code from line [14] is executed; or it is false, and the alternative line [16], just after the else part on line [15], is executed.

Line [14] deserves a short explanation. It is a bit more complicated than line [10], since, in addition to text, it displays a series of objects. All of them are separated by the << operator. std::sqrt ( x ) is a call to a function named sqrt , contained in the Standard Library3 – hence the std:: prefix and the #include <cmath>  − with x as its parameter. That is, once we know that x is positive, we call an already-written function that computes the square root for us. The question arises, what would happen if x was negative? It depends, but definitely nothing good. Such a situation is known as undefined behavior (UB), which we must avoid.

The last object sent to the screen on line [14] is std::endl . It instructs the computer (i) to put the caret on a new line and (ii) to flush the output, with the result that all of the symbols will be transferred to the screen. The first action, passing to the new line, can also be achieved by inserting \n to the text constant, as shown on lines [14, 16].

Finally, let's explain the ubiquitous scope resolution operator :: (two colons). It is used to access members of namespaces, such as data and functions, classes, as well as structures, as will be discussed later in the book. Its syntax is shown in Figure 2.8.

For instance, std::cout in the previous code means the cout object from the std namespace. The latter denotes the namespace of the Standard Library.

Schematic illustration of the syntax of the scope-resolution operator :: to access members of namespaces, structures, and
classes.

Figure 2.8 Syntax of the scope-resolution operator :: to access members of namespaces, structures, and classes.

2.4.5 (image) Building a Linux Executable

In this section, we will show how to build an application by typing a few commands in the Linux OS.4 Not surprisingly, actions to perform are also summarized in the form of an algorithm, as follows.

  1. Open a terminal window (in Linux, use the Ctrl + Alt + T shortcut)
  2. Run the ls –l command to list files and directories. The following is some example output:
  3. With the mkdir command, create a new directory, such as Projects, and list its contents:
  4. Go into Projects by calling cd Pro* (to save typing, the special grep character * is used, which substitutes for zero or more characters):
  5. Now type, or copy-paste, the code from Listing 2.1. To do so, we can call one of the installed text editors, such as GNU nano or emacs. However, to create a sq.cpp file, a simple cat command can be used. Just after typing cat > sq.cpp , paste the code from the clipboard (or type it), as follows

    To indicate that we wish to stop redirecting what we type or paste to sq.cpp, press Ctrl + Z to terminate the redirection pipe.

  6. Now we are ready to compile our program by calling the GNU gcc compiler, as follows:

    To make gcc use the C++ compiler, rather than C, the g++ command is invoked, as shown. Its second argument is the source file sq.cpp, after which the –o option and the name of the executable (sq) are provided. However, before doing that, make sure gcc is already installed.

  7. We list our files with ls –l :

    We see that, in addition to our source file sq.cpp, a new sq file appears with an executable x attribute turned on. Congratulations! This means we have successfully created the executable.

  8. Run the program and see its actions, as follows:
  9. If we made an error when typing or simply wish to work further to modify the source, a text editor is a convenient solution. Open it, for instance, by typing this:

    After making any changes and saving sq.cpp, repeat the previous compilation process.

Certainly, some practice in managing the Linux environment is necessary. There are ample online resources (such as www.linux.org, or https://askubuntu.com for Ubuntu users), as well as many books (Dalheimer and Welsh 2006). However, even more fascinating is Linux programming. This has gained further attention in recent years, mostly due to embedded systems powered by Linux. There are also plenty of online resources for this subject, including the outstanding book Linux Programming Interface by Michael Kerrisk (Kerrisk 2010). Since the Linux OS is written in C, examples of Linux programming are also provided with C. To understand them, reading Appendix A.2 will be beneficial. Linux is also a perfect environment for modern C++, as will be shown in this book.

Although Linux is a great operating system with hundreds of useful features and applications, sometimes it is good to have Linux and Windows coexist on a single computer. Windows 10 comes with excellent facilities for Linux programming. First, the Ubuntu and Debian Linux distributions can be downloaded and installed from the Windows Store, and allow us to run the Windows Subsystem for Linux (WSL) like any other Windows application, providing full Linux services at the same time. Second, a Linux development plug-in is available in Microsoft Visual 20195 (MV'19).

The MV'19 environment is also our main development platform. Therefore, unless stated otherwise, the majority of the projects – that is, the whole platform, with all files and settings to create a complete program – presented in this book were first developed and tested with MV'19 in Windows 10. However, the projects were generated automatically with the help of the CMake program. This greatly facilitates maintaining up-to-date projects in environments with multiple operating systems and various compiling platforms, as described in Appendix A.6.1. Organization of C++ projects and the roles of various files are discussed in Section 3.15.

Summarizing, in this section, we have learned the following:

  1. What an algorithm is, and how to represent it with a UML activity diagram, as pseudo-code, or as C++ code
  2. How to implement the main function that constitutes the entry to any C++ program
  3. Defining and initializing a variable of the double type to represent real numbers
  4. Using the std::cin object for input and the std::cout object for output
  5. Using the conditional if-else statement to check the value of a variable and, depending on the result, execute a code path
  6. Editing a C++ source file, compiling it, and running it online, as well as in the Linux environment

2.5 Example Project – Compound Interest Calculator

In this example, we will make a simple program that can facilitate financial investments. Let's assume we have $1000 (the original principal amount), and we want to safely invest it in a bank at an interest rate of 3% per year. It is easy to determine that after a year, we will gain $1000 · 3% = $30, so our final principal amount will be $1030. What happens if we repeat the same thing? Now our initial principal amount is $1030, so after a year, we will have a gain of $1030 · 3% = $30.90. This time we earn slightly more than in the first year, since we started with a slightly higher principal amount. Let's write a C++ program that does these computations for us. As discussed in Section 2.3, these are our requirements. However, before writing a single line of code, we should spend some time on the problem analysis, as shown in Figure 2.4. In this case, we will begin by writing some simple equations that will guide our implementation.

2.5.1 Compound Interest Analysis

Given initial capital denoted as C0, interest rate r, and a number of periods t, the interest is computed as follows:

(2.1)equation

Hence, the new capital after t is C1, which is the sum of C0 and I1. Then C1 can be invested again, and so on. The entire process can be written as follows

(2.2)equation

where r denotes an income rate in the compounding period t. However, banks usually provide the value of r with respect to a yearly period, i.e. 12 months, whereas there can be many compounding periods per year. Hence, we need to be more granular and set as our basic unit the compounding period. In other words, our index i will count the number of fully finished compounding periods in the total investment time. Say we invest for 24 months, and the compounding period is 4 months; in this time, we will have 6 periods. How did we compute this? 24/4 = 6. In general, the periods are i = m/t, where m denotes the total investment time expressed in months. On the other hand, in each period, the investment rate is r/12 · t. For example, if r is 3% and t again is 4 months, then the interest in this period is 0.03/12 · 4 = 0.01. Thus, we obtain the following equation that will serve as the basis for our implementation:

where

  • C0 is the initial capital.
  • Ci is the final capital after m months of investment (i.e. i = m/t compounding periods).
  • r is the annual investment rate (expressed as a fraction, so if provided as a percentage, we need to divide by 100).
  • t denotes the compounding period in months (e.g. for quarterly compounding, t is three months).
  • m is the total investment time in months.

Let's see how this works in practice. Assume we have $2500 to invest, the annual rate is 3.4%, and the investment and compounding periods both are six months. What will our income be? If we put all the data into Eq. (2.3), we obtain C1 = 2500$ · (1 + 0.034/12 · 6)(6/6) ≈ 2542.50. Thus, the gain is C1C0 = $42.50. Unfortunately, in some countries, this gain is taxed, so our investment will be slightly less lucrative.

2.5.2 Implementation of the Interest Calculator

We are almost ready to start our implementation. The only thing we need to know is how to write an equation like (2.3) in C++. Writing expressions is similar, more or less, to writing equations in mathematics. However, we have to keep two obstacles in mind. First, for clarity and to address various terminals, expressions have to fit in a line or lines of plain text. In other words, subscripts and superscripts are not allowed. The second limitation is the number of symbols available to represent operators. Hence, to form some operators in C++, terminal symbols are glued together: += , && , :: , etc. A complete table of C++ operators is presented in Section 3.19.

This is how we write Eq. (2.3) as a C++ expression:

First, all the symbols, such as C_i, r, t, etc. need to be defined – that is, they must have an associated type. In our case, the built-in float or double type should be chosen to best represent fractions. Also, variables need to be initialized. Second, there is no power operator in C++. Therefore, we need to call the function std::pow from the C++ mathematical library. It accepts two arguments and then returns as a result its first argument raised to the power of its second argument. This function comes in a ready-to-use library, but we need to let the compiler know that we are going to use it. Finally, note that for the constant values, instead of 1 we wrote 1.0, and instead of 12 we wrote 12.0. We did so to use the floating-point format and avoid converting from an integer representation.

Now we are ready to start our implementation. The result looks like the following code, which is split into parts for easier reading.

On lines [1] and [2], two header files from C++ libraries are included. Thanks to this, we can use the already mentioned std::pow function, as well as the std::cout and std::cin objects representing the output screen and the input keyboard, respectively. To avoid repeating the std:: prefix over and over again, on line [5], the names of the input-output objects from the std namespace are introduced with the help of the using directive (see Section 3.5).

Line [9] begins defining the main function. C++ applications are required to contain exactly one definition of main . Further details of various forms of main are presented in Appendix A.2.2.

On line [12], we output text on the screen using the cout object, the << operator, as well as a text constant in quotation marks. (We will see that the action of an operator such as << can differ, depending on the types of objects it is used to act on.) endl means to move the cursor to the next line and update the screen.

On line [13], a variable named C_0 is created to hold an initial amount. The names of variables are very important for conveying useful information. So, this variable could be named something like initial_amount . However, when implementing equations, such as Eq. (2.3), it is usually better to keep the names of the equation's elements; and mathematical equations are usually written with single letters and subscripts. The variable (object) C_0 can hold any positive value, including fractions. Because of this, double was chosen for its type. C_0 is created and, on the same line immediately initialized to 0 by typing {} .

Next, on line [14], we wait for the user to enter the real value of the initial capital. After that, we check this value on line [16]. If it is not correct, the program writes out a message and, on line [19], the error code −1 is returned. Checking for incorrect values is one of the very important, but frequently forgotten, actions that need to be performed in real code – always try to stay on the safe side. In C++, there are more advanced mechanisms to deal with errors than returning an error code; these are discussed in Section 3.13.2.5.

This process of printing, defining variables, waiting, and then checking the correctness of the user input data is repeated a few times. This is shown in the following code and should be clear by now:

Having read all the data, we start our computations and define two constants on lines [58] and [59], respectively. Their names convey information about their role. As we already know, adding the keyword const ahead of double makes an object read-only. Omitting const here would not change the computations, but it would harm the code's resistance to unwanted errors. So, if a value should remain constant, we should declare it const .

Given a total number of investment months m and a compounding period t, also in months, line [61] computes a number i of compounding periods. Then, on line [63], the final sum is computed, exactly as in Eq. (2.3). On lines [65–66], the computed values are displayed on the screen. Finally, on lines [69–73], our income and also income after tax deductions are computed and displayed.

2.5.3 Building and Running the Software

We can easily launch the previous code in one of the online C++ compiler platforms, as mentioned in Section 2.4.2. However, building an application that can be launched like any other program in our system is a more convenient option in this case. To build the CompInterest application for Linux, we can follow the steps presented in Section 2.4.5. But because the code in Listing 2.2 uses only standard C++ mechanisms, this software component is rather general. Hence, it could work on any platform if we had the tools to create and maintain multi-platform projects. For each software development environment, we can manually create a suitable project and build the application. However, considering the number of such platforms and operating systems, doing so would be tedious. Instead, we can use the CMake project generator. With its support, not only this but all future projects can be automatically generated to fit different development platforms and various operating systems. All we need to do is install this free tool and write the CMakeList.txt file, which contains commands that tell CMake what to do. (These are described in Appendix A.6.1.) Fortunately, a suitable CMakeList.txt file is attached to the example projects and can be downloaded with the other sources from GitHub (https://github.com/BogCyg/BookCpp). Also fortunately, all of our projects use an almost-identical CMakeList.txt file.

Having built the CompInterest application, to test it, let's assume we have $1000, the annual investment rate is 2.7%, and we wish to invest our capital for four months. After launching CompInterest, our interest can be easily computed as follows:

The previous main function can act as a simple template for similar programs that read data, do some simple computations, and output the result. Also, this project can be used as a startup to build an application with a graphical user interface (GUI ) for a mobile platform or as a web component. We will discuss some of these ideas later.

In this section, we learned:

  1. How to define and initialize integers with int , values with fractions with double , and constant strings with ""
  2. The common arithmetical operators + , , * , / , and how to write simple expressions
  3. How to print text and variables on the screen with the built-in cout object and << operator, and how to enter their values with the cin object and >> operator

2.6 Example Project – Counting Occurrences of Characters in Text

We have already seen how to represent a simple algorithm and how to build it and run it in an online platform, as well as how to create an executable. We have also learned about a few useful features of C++, such as the main function, creating and initializing variables, the input and output objects, as well as the conditional if-else statement . Here, we will extend this list of C++ features by adding the for statement to implement software loops, as well as containers to store objects of the same type. In this section's example, we will show how to compute a histogram of occurrences of the letters az in a sentence.

2.6.1 Problem Analysis and Implementation

Assume that we wish to compute a histogram of occurrences of the letters az in a given sentence. How do we set about this problem? Let's start with a simple example and analyze the process of building such a histogram from the word Alcatraz, as shown in Figure 2.9.

Schematic illustration of building a histogram of characters: a use case analysis. Each bin contains the number of occurrences of a given letter.

Figure 2.9 Building a histogram of characters: a use case analysis. Each bin contains the number of occurrences of a given letter.

A histogram is a data structure composed of N bins, each containing a number of occurrences of an event associated with its bin. Each bin corresponds to a particular letter in our example or other event in general. Thus, histograms are commonly used to measure the frequency with which events occur. They can also be used to measure probability – the higher the frequency, the higher the probability. Initially, all numbers in the histogram are set to 0. Then, when traversing the text to be measured, an occurrence of a given letter increases the corresponding bin by 1. Hence, in our example, the bin that corresponds to the letter a gets the number 3, since there are 3 occurrences of a in Alcatraz, and so on (upper- and lowercase letters are treated the same). The following C++ code performs these actions.

2.6.2 Running the C++ Code with the Online Compiler

Before we explain how this code operates, let's see what it does. How do we run the code? The simplest way is to open one of the web pages with an online C++ compiler, such as Coliru, copy and paste the code, and click the “Compile, link and run…” button. See the screenshot in Figure 2.10. Given the examples from the previous sections, building a standalone application also should not be a problem.

Schematic illustration of a 10 cross 10 grid board to implement the Secret Chamber Game.

Figure 2.10 The Coliru online compiler environment (http://coliru.stacked-crooked.com) running in the Mozilla Firefox viewer with the code from Listing 2.3 copied and pasted into the editor pane. Under it is the output pane. The lower-left pane contains the command line to build and run the program. At lower right are the action buttons.

We obtain the following output:

That is, each letter is associated with the frequency of its occurrence in the input sentence.

2.6.3 Histogram Code, Explained

Let's now split up the code from Listing 2.3 and explain what it does, line by line. On lines [1–3], #include <file_name> is called, where file_name corresponds to a header file. As alluded to earlier, headers contain already-written code that we can use to simplify our lives as programmers. Figuring out what header to choose for a given language component is sometimes difficult – the easiest way is to look it up on the Internet. Here we include iostream for the std::cout and std::cin objects, which represent the screen and the keyboard, respectively. Then the string file is introduced, to use the std::string class to store plain text. Finally, a vector that contains the definition of the std::vector class that allows us to store the histogram is included, as shown in Figure 2.9.

Lines [6–7] contain using directive s. These are optional but let us save on typing, since instead of std::cout we can omit the prefix std:: and succinctly write cout :

The actual program starts on line [10]. A program in C++ is composed of the main function and all the functions it calls from inside. A function is outlined with braces, such as { on line [11] and } on line [34]. main on line [10] returns an object of type int , but we are not using this option here (Appendix A.2.2 presents more about main ).

Now we are inside main . On line [14], the object histogram of the vector type is defined and initialized with two parameters (both forms, std::vector and vector , are OK in this context). The first one reads 'z' − 'a' + 1 . Since 'z' and 'a' are numeric code values of the last and first letters in the English alphabet, subtracting the two and adding 1 gives a total number of 26 letters. Later, we will see how to treat upper- and lowercase letters the same way. Thanks to this, a structure of 26 counters is created, as shown in Figure 2.9. The second initialization parameter for the histogram is 0 , which sets the initial counters to 0. vector is a very useful data structure that can contain a number of other objects of the same type.

On line [16], a string object named in_str is created (we can skip the std:: prefix again). This is also a kind of vector, but it is specialized to store and process vectors of characters (text):

The object in_str is empty at first, but not for long. On line [18], it waits until the user types a sentence in the terminal window and presses the Enter key. This would be sufficient; but since we are running the program in an online compiler that does not easily accept user input, on line [19] we hard-coded some example text to in_str :

Now it is time for some action. We need to access all the letters in the in_str object and increment the histogram at the bin positions that correspond to each letter. So, if the letter is 'a' , then the index should be 0, if 'b' , then the index is 1, if 'z' then 25, and so on. Note that for N elements, the indices go from 0 to N − 1. To access each letter, on line [21], a for loop is created. In it, auto c means c will take consecutive letters of in_str , which is on the other side of the colon : . Thanks to auto , we do not need to bother with the type of c – it will be inferred automatically by the compiler from the type of letters stored in in_str .

Line [23] operates under the supervision of the closest for loop. Hence, for each letter c , we find the corresponding index in the histogram vector by subtracting the code 'a' . Then that entry is increased by 1 by calling the ++ operator. To treat upper- and lowercase letters the same way, each letter c is converted to lowercase by calling the std::tolower function (here we have to explicitly write the prefix std:: since std::tolower has not been listed with using ). To reset the prompt position in the output window to the beginning of the next line, on line [25], the endl object is sent to the screen object cout .

Lines [27–28] show the next for loop. Its purpose is to print all the letters on the screen. However, this time for is composed of three fields, separated with a semicolon ; . Its first part, auto k { 'a' } , is executed only once – it creates an object k with the initial value 'a' . The second part, the condition k <= 'z' , is checked during each iteration; if it is fulfilled, line [28] is executed, which simply prints a letter represented by k , followed by a space. Just after that, the third part of the for loop is executed. In our example, this is the ++ k expression, which advances k by 1: that is, to the next letter code. Then, the loop repeats from the point of condition checking. However, if at any iteration the condition is not fulfilled, i.e. it evaluates to false , the loop breaks, and the next line of code just after the loop is executed. If more than one statement needs to be executed at each iteration of the loop, the statements can be grouped into a block with braces {} .

Finally, the loop on lines [32–33] displays computed values of the histogram . Again, the first form of the for loop is used, so at each iteration, h gets the value of a consecutive element stored in the histogram object.

That's it. How can we use this code further? Just copy it, change something, and play. And keep reading, since all the constructs used here will be explained in the following chapters.

In this section, we learned the following elements of C++ programming :

  1. How to implement a simple C++ program
  2. What the int main( ) function is used for
  3. What header files to #include , and why
  4. What using directive s need to be placed at the beginning of a program
  5. How to define a simple array of elements with std::vector
  6. How to input text from the keyboard, and how to store it in the std::string object
  7. How to automate object type deduction with the auto keyword
  8. How to implement a loop with the for statement
  9. How to check logical conditions with the if statement
  10. How to call predefined functions for text manipulation, such as std::isalpha and std::tolower

In the following chapters, we will polish these techniques in further examples.

2.7 Summary

Questions and Exercises

  1. The greatest common divisor (GCD) of two positive integers a and b is the largest integer that divides each of them without any remainder. The solution to this provides the Euclidean algorithm (https://en.wikipedia.org/wiki/Euclidean_algorithm):
    1. On a sheet of paper, work out an example: for instance, setting a = 255 and b = 221 .
    2. Implement Algorithm 2.2. Hint: To implement the loop use the statement while (b != 0) , or for( ; b != 0 ; ) { loop_statements 2-4 } .
    3. Verify the operation of your program by setting different values of a and b .
  2. Analyze the following bisection algorithm for computing an approximation of the square root from an integer value.
    1. Write an implementation of Algorithm 2.3.
    2. Compare the returned values with those produces by the code in Listing 2.1.
  3. The bisection algorithm has other applications. It is helpful in search tasks over monotonic values (https://en.wikipedia.org/wiki/Bisection_method). Use the bisection algorithm to find the root of a function (i.e. its zero-crossing point).
  4. There can be problems when entering values using the std::cin object. For example, on line [14] of Listing 2.2, a numeric value is expected, but the user can enter abc . To prevent such errors, the status of the reading operation can be checked as in the following code snippet:

    This works because the cin >> C_0 expression converts to a bool value of true or false , depending on whether the operation succeeded or not, respectively. Then, the ! operator negates this result, so if false was returned, the code inside if is executed, and the program terminates.

    Using this method, update the value-reading lines of code in Listings 2.2 and 2.3.

  5. Refactor the CharHistogram source code in Listing 2.3 to print the histogram as vertical bars composed of the * symbol, rather than numbers of occurrences.
  6. A second-order polynomial can be expressed as follows

    where a, b, and c denote real coefficients. The question is whether there are values of x that fulfill this equation. If they exist, they are called roots of the second-order polynomial. Recall that to answer this question, it is sufficient to compute the delta coefficient d as follows:

    (2.5)equation

    If d ≥ 0, we can compute the solution to Eq. (2.4) (its roots) as follows:

    (2.6)equation

    Otherwise, there are no roots. Write a C++ program that asks the user for the three coefficients a, b, and c; resolves whether there are roots; and, if possible, computes them. First rewrite the previous equations into equivalent C++ expressions, and then adapt them.

  7. Implement the Secret Chamber game – its 10×10 grid board looks as follows.
    image

    You can move a player ‘P’ up, down, left, or right but only one position at a time (e.g. by pressing the buttons ‘U’, ‘D’, ‘L’ and ‘R’). You win if you step ‘P’ onto the treasure ‘T’. However, you lose if you step onto one of the traps ‘X’. Moreover, at each turn ‘X’ can randomly change its position by one spot. Pay attention not to step outside of the game board range.

  8. Compute a sum of two positive integer values. However, the sum to compute is given as a string, e.g. “123 + 37”, “78+ 99”, etc. Write a program that performs the following steps:
    1. Prompt a user to enter a string;
    2. Eliminate extra spaces from the string;
    3. Extract the two substrings representing the two arguments, such as “123” and “37”;
    4. Compute integer values of these two substrings, so “123” is transformed to the value 123, etc.
    5. Print the sum of the two values;
    6. Extend your solution to handle potential errors.
  9. There are many ways to compare texts. A simple one is to compare the two histograms of their letter occurrences. Two histograms of the same length can be compared by summing up squares of differences of the corresponding bin values.
    1. Extend the program from Listing 2.3 to compute and compare histograms of two texts.
    2. Test your procedure comparing few paragraphs.
    3. Better results will be obtained if the histograms are normalized before the comparison. After normalization, the sum of all bins in a histogram is 1.0. To normalize the histogram, divide each its bin value by the total sum of all bins in that histogram. Hint: make the histogram vector to store floating-point values, e.g. in line [14] of Listing 2.3 change the initialization value from 0 to 0.0.

Notes

  1. 1  The second distinct type is the Harvard architecture, in which memory is strictly divided into code memory and data memory. Such architectures are sometimes used in digital signal processors (DSP) to implement the single instruction, multiple data (SIMD) strategy at the cost of additional signal buses. See https://en.wikipedia.org/wiki/Von_Neumann_architecture.
  2. 2  For example, to quickly evaluate short programs, we can use one of the following platforms: http://ideone.com, https://wandbox.org, http://coliru.stacked-crookd.com, https://repl.it, www.onlinegdb.com, or http://cpp.sh.
  3. 3  For more on the Standard Library, see https://en.wikipedia.org/wiki/C%2B%2B_Standard_Library and https://en.cppreference.com/w/cpp/header.
  4. 4  In this example, we use Ubuntu 18.04 installed as a Linux bash in Windows 10. However, it can be any distribution of Linux with gcc installed (see, for instance, https://linuxize.com/post/how-to-install-gcc-compiler-on-ubuntu-18-04).
  5. 5  For more information, see https://devblogs.microsoft.com/cppblog/linux-development-with-c-in-visual-studio and https://devblogs.microsoft.com/cppblog/c-with-visual-studio-2019-and-windows-subsystem-for-linux-wsl.