Testing is a dynamic analysis verification and validation (V&V) method. ISO/IEC/IEEE Systems and Software Engineering—Vocabulary (ISO/IEC/IEEE 2010) defines testing as “the process of operating a system or component under specified conditions, observing or recording the results, and making an evaluation of some aspect of the system or component.”
One of the objectives of testing is to find yet undiscovered, relevant defects in the software. Relevant defects are defects that will negatively impact the stakeholders. Both software developers and software testers are responsible for the quality of the delivered software. It has to be a team effort. The job of a software developer is to prevent defects and to build quality into the software product. Software developers are also responsible for V&V activities early in the life cycle, including static analysis techniques discussed in Chapter 20. As the software moves into testing, however, a mental shift must occur. While software developers make the software work, testers do everything they can to find all the possible ways to “break” the software. Even when developers do their own testing, they switch into the tester role and must make this mental shift. Testers are successful every time they uncover a yet undiscovered, relevant defect, because that means one fewer defect will make its way to the users of the software and impact the stakeholders’ satisfaction with that product (assuming that any uncovered defects are corrected before release).
Given that testing, like all other development activities, has a limited amount of resources (time, money, people, other resources), the goal of the tester is to select a set of test cases that are most likely to uncover as many different relevant defects as possible within those constraints. Therefore, the tester is part detective, hunting for clues as to how the software might fail. The tester is part amateur psychologist, trying to look into the mind of the programmer to figure out what kinds of mistakes they might have made when developing the software. The tester is also part psychic, trying to predict how the users might use the software inappropriately or in ways not accounted for by the software being tested. Testers must understand how interfacing entities (hardware, other software applications, and databases) might fail, have their communications corrupted or interrupted, or interface differently than defined in the software requirements.
The actual act of testing, a dynamic analysis technique, requires the execution of a software system or one of its components. Therefore, test execution cannot be performed until executable software exists. With the exception of prototypes and simulations, this means that, in Waterfall life cycle models, most test execution starts during or after the coding phase of the software development life cycle. However, while test execution cannot start until these later phases, test planning and design can and should be done as early as possible in the life cycle, as illustrated in Figure 21.1. In fact, test planning and design can start as soon as the software requirements are identified. Simply designing test cases (even without actually executing them) can help identify defects in the software work products, and the earlier those defects are discovered, the less expensive they will be to correct. The V-model, discussed in Chapter 9 and illustrated in Figure 9.2, highlights the relationship between the testing phases and the products produced in the early life cycle phases.
Testers are also excellent candidates for participating in peer reviews. This can be especially true if the work product being reviewed will provide the basis for the test design. For example, a tester who will write test cases to validate the requirements should participate in the peer review of the requirements specification, and the integration tester should participate in the peer review of the architectural design. The testing effort must be ongoing and managed throughout the software life cycle.
In agile, similar test activities occur, but they are conducted iteratively throughout software development. In fact, in test-driven development, the test cases are created from the stakeholder level requirements and take the place of product-level software requirements. Testers also participate in peer reviews, as appropriate. As source code is written, it is continuously integrated into the software executable and tested.
Figure 21.1 Test activities throughout Waterfall life cycles.
1. TEST STRATEGIES
Select and analyze test strategies (e.g., test-driven design, good-enough, risk-based, time-box, top-down, bottom-up, black-box, white-box, simulation, automation, etc.) for various situations. (Analyze)
BODY OF KNOWLEDGE VI.B.1
There are many different strategies that can be used when testing software. Each of these strategies has its own strengths and weaknesses, so a combination of different strategies is typically the best approach for uncovering a variety of defect types.
White-Box Testing
White-box testing, also known as structural, structure-based, clear-box, or glass-box testing, is testing that is based on the internal structure of the software and looks for issues in the framework, construction, or logic of the software. White-box testing is typically executed early in the testing cycle, starting at the module level, and may also be performed as source code modules are combined into components. White-box testing explores the internals of each source code module/component, looking at the control flow and/or data flow through the individual lines of source code.
White-box testing can find certain types of control or logic-flow defects that are all but invisible using a black-box testing strategy. It is also much easier to thoroughly investigate a suspicious source code module/component when the tester can clearly see into its internal structure. Early structural testing allows the tester to more easily trace defects back to their origin by isolating them to individual source code modules or paths through that code. Other strengths of white-box testing include the ability to look at:
Gray-Box Testing
When larger numbers of source code modules/components are integrated together, it usually becomes unwieldy to perform pure white-box testing. In any complex system, there are just too many possible paths through the software. As illustrated in Figure 21.2 , at some point testing typically progresses into various levels (shades) of gray-box testing, which is a blending of the white-box and black-box testing strategies. During development, source code modules/components are integrated into programs, programs into subsystems, and subsystems into the software system (note that the number of levels of integration can vary based on the needs of the project). At the lowest level of gray-box testing, the individual source code modules/components are treated as gray boxes, where the tester peeks into the internals of each source code module/component just enough to determine how they interact and interface with each other, ignoring the rest of the internal details. At each subsequent level of integration, the individual programs/subsystems are treated as gray boxes, where the tester peeks into their internals just enough to determine how those programs/subsystems interact and interface.
There are two basic strategies for conducting gray-box testing. The first strategy, called top-down , is illustrated in Figure 21.3 at the source code module to component integration level. The top-down strategy tests one or more of the highest-level software items (source code modules, components, programs, subsystems) in the tree, using stubs for lower-level called software items that have not yet been integrated (or developed). These stubs simulate the required actions of those lower-level software items. The lower-level software items and their stubs are then integrated, one or more at a time, replacing these stubs. Gray-box testing focuses on the interfaces of the newly integrated software items as they are added, and their interactions with the rest of the already integrated software. The top-down strategy is typically chosen when:
Figure 21.2 White-box testing and the transition to gray-box testing.
The second gray-box testing strategy is called bottom-up , as illustrated in Figure 21.4 . This strategy tests one or more of the lowest-level software items in the tree using drivers for higher-level calling software items that have not yet been integrated (or developed).
Figure 21.3 Top-down testing strategy.
Figure 21.4 Bottom-up testing strategy.
These drivers simulate the required actions of those higher-level software items. The higher-level software items and their drivers are then integrated one or more at a time, replacing the drivers. Testing again focuses on the interfaces to the newly integrated software items as they are added, and their interactions with already integrated software. The bottom-up strategy is typically chosen when:
Selection of an integration testing strategy depends on software characteristics, and sometimes project schedule. Different approaches can also be used for different parts of the software. For example, some parts of the software can be integrated using a bottom-up strategy, and other parts using a top-down strategy. The resulting software items could then be integrated using a top-down strategy.
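The following is a minimal sketch, in Python, of how a stub stands in for a lower-level item during top-down integration (a driver for bottom-up integration would be built the same way, just inverted). The order_total component and tax_service interface are hypothetical names used only for illustration.

```python
# Hypothetical illustration: top-down integration of a high-level "order_total"
# item whose lower-level "tax_service" dependency has not yet been integrated.
from unittest import mock
import unittest


def order_total(subtotal, tax_service):
    """High-level item under test; delegates tax lookup to a lower-level item."""
    return round(subtotal + tax_service.tax_for(subtotal), 2)


class TopDownIntegrationTest(unittest.TestCase):
    def test_total_uses_stubbed_tax_service(self):
        # The stub simulates the required actions of the missing lower-level item.
        tax_stub = mock.Mock()
        tax_stub.tax_for.return_value = 0.80
        self.assertEqual(order_total(10.00, tax_stub), 10.80)
        # Gray-box focus: verify the interface call, not the stub's internals.
        tax_stub.tax_for.assert_called_once_with(10.00)


if __name__ == "__main__":
    unittest.main()
```

When the real tax component is integrated, the stub is replaced and the same test refocuses on the actual interface between the two items.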
Black-Box Testing
Black-box testing, also known as specification-based, data-driven, input/output-driven, or functional testing, ignores the internal structure of the software and tests the behavior of the software from the perspective of the users. Black-box testing is focused on inputting values into the software, under known conditions and/or states, and evaluating the resulting software outputs against expected values, while treating the software itself as a black-box. When the software is treated as a black-box, its internal structure is not considered when designing and executing the tests. This helps maintain an external focus on the software requirements and stakeholder needs.
Since black-box testing does not require an intimate knowledge of the software’s internal structure, individuals other than those with knowledge of the source code can perform black-box testing. That allows testing to be performed by people who are independent of the developers and know nothing about how the software was designed. Black-box testers may not even know how to read/interpret the coding language used to write source code. Developers have a natural bias and will test the software based on the same set of assumptions they used to create that software. Independent testing allows for more objectivity and a different perspective that can result in more defects being identified. Developers are also focused on making the software work. It is easier for an independent tester to focus on how to “break” the software.
The entire assembled software system is often too large and complex to test using pure white-box testing strategies. Black-box testing can be used to test the entire system, as well as its smaller individual source code modules and components, as illustrated in Figure 21.5 . Black-box testing is also capable of finding defects that are very difficult or impossible to identify using a white-box strategy. For example:
Figure 21.5 Black-box testing.
In fact, many of the strengths of black-box testing are weaknesses of white-box testing, and many of the strengths of white-box testing are weaknesses of black-box testing. Therefore, the best strategy is usually a balancing of both approaches.
Test-Driven Design
Test-driven design (TDD), also called test-driven development , is an agile iterative software development methodology. As discussed in Chapter 9 , TDD implements software functionality based on writing the test cases that the code must pass. Those test cases become the product-level requirements used as the basis for implementing the code. Those test cases are then run often to verify that additional changes have not broken any existing capability (regression testing). While TDD obviously has a testing component, it is in no way limited to being a testing strategy. TDD addresses the entire software development process.
TDD requires the maintenance of a set of automated test cases, written to exhaustively test the existing code before it goes to the independent testers for their testing (and potentially into operations). As illustrated in Figure 21.6 , the software is considered in the “green” state when it passes all of these automated test cases. When new functionality is identified, the first step in TDD is to write test cases for that new functionality. Since code does not yet exist to implement these test cases, they fail, and the software goes into the “red” state. The developer then writes just enough code to pass these new test cases and move the software back to the “green” state. According to Astels (2003), “that means you do the simplest thing that could possibly work.” Refactoring is then done on the changed code, as necessary, to improve the quality of that new code. If changes made during the refactoring cause any of the automated test cases to fail, the code is corrected until the software returns to the “green” state (all test cases pass).
Figure 21.6 Test-driven design model—example.
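A minimal sketch of one red-green cycle is shown below in Python. The leap_year feature and its tests are hypothetical examples, not taken from the text: the tests are written first (red), and only then is just enough code written to make them pass (green), after which the code may be refactored while keeping the tests passing.

```python
# Hypothetical TDD cycle: these tests are written before leap_year() exists,
# so they fail ("red"); the function below is the simplest code that passes
# them ("green") and is then kept passing through any refactoring.
import unittest


def leap_year(year):
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


class LeapYearTests(unittest.TestCase):
    def test_divisible_by_four_is_leap(self):
        self.assertTrue(leap_year(2024))

    def test_century_is_not_leap(self):
        self.assertFalse(leap_year(1900))

    def test_divisible_by_400_is_leap(self):
        self.assertTrue(leap_year(2000))


if __name__ == "__main__":
    unittest.main()
```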
Risk-Based Testing, Good-Enough Software and Time-Box Testing
According to Myers (2004), “it is impractical, often impossible, to find all the errors in a program.” One reason for this is that exhaustive testing is impossible. For example, consider the black-box testing of a function that accepts a positive integer as one of its inputs. How many possible test cases are there? To exhaustively test this single input, there are literally an infinite number of possible test cases. First, there are an infinite number of positive integers to test. There is always the possibility that inputting the number 3,256,319 will cause some strange anomaly to happen. Then there are also an infinite number of invalid inputs that could be tested, including negative numbers and non-numbers (character strings including one or more alpha or other nonnumeric characters). Since exhaustive testing is impossible, all testing in the real world is based on sampling. Answering the question “How much sampling is needed?” leads to the concepts of risk-based testing, good-enough software, and time-box testing.
Risk-based testing focuses on identifying software items (for example, work products, product components, features, functions) with the highest risk (see the discussion of risk-based V&V in Chapter 20 ) or that require the highest integrity levels. Limited testing resources are then proportionally distributed to spend more resources testing higher-risk/integrity items and fewer resources testing lower-risk/integrity items. Risk-based testing also embraces the “law of diminishing returns.” At the start of testing a large number of problems and anomalies are discovered with little effort. As testing proceeds, discovering additional problems and anomalies requires more and more effort. At some point, the return on investment in discovering those last few defects, if they even exist, is outweighed by the cost of additional testing.
The concept of good-enough software recognizes the fact that not all software applications are created equal and not all software applications, or even all components within a single software application, have an equal amount of risk. For example, word processing software does not require the same level of integrity as banking software, and banking software does not require the same level of integrity as biomedical software that impacts a person’s health or even their life. Doing good-enough software analysis has to do with making conscious, logical decisions about the trade-offs between the level of quality and integrity that the stakeholders need in the software, and the basic economic fact that increasing software quality and integrity will cost more and take longer. This is not meant to discount the impacts of long-term continual improvement initiatives—but right now, on today’s project, with current skills and capabilities, this is reality. Hard business decisions need to be made about how much testing (and other V&V activities) a project can afford, what the stakeholders are willing to pay for it and what the acceptable level of risk is that the developers, acquirers and other stakeholders are willing to accept.
In project management, a time-box is a fixed amount of calendar time allocated to complete a given task, like testing. In time-box testing, the calendar time for testing is fixed, and the scope of the testing effort must be adjusted to fit inside that time-box. This can be accomplished by prioritizing test activities and tests based on risk and benefit, and then executing the activities and/or tests in priority order. If time runs out before all of the activities and tests are accomplished, at least the lowest-priority ones are the ones left unfinished.
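A minimal sketch of this priority-ordered, time-boxed execution is shown below in Python. The scheduling function and its parameters are illustrative assumptions, not a prescribed implementation.

```python
# Hypothetical sketch: execute tests in risk/benefit priority order until the
# time-box expires, so only the lowest-priority tests are left unfinished.
import time


def run_time_boxed(tests, time_box_seconds):
    """tests: list of (priority, callable) pairs; lower number = higher priority."""
    deadline = time.monotonic() + time_box_seconds
    executed, skipped = [], []
    for priority, test in sorted(tests, key=lambda t: t[0]):
        if time.monotonic() >= deadline:
            skipped.append(test.__name__)   # ran out of time-box
            continue
        test()
        executed.append(test.__name__)
    return executed, skipped


# Example usage (hypothetical tests):
# executed, skipped = run_time_boxed([(1, test_login), (3, test_reports)], 3600)
```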
Simulation
The test environment often cannot duplicate the real-world environment where the software will actually be executed. For example, in the test bed, the testers performing capacity testing for a telecommunications switch do not have access to 10,000 people all calling in to the switch at once. One way of bridging this gap is to create a simulator that imitates the real world by mimicking those 10,000 people. Another example of when simulators might be needed is when other interfacing software applications or hardware are being developed in parallel with the software being tested and are not ready for use, or are unavailable for use, during testing.
Since a simulator only mimics the other software and hardware in the real-world environment, strategic decisions need to be made about when it is appropriate to spend testing resources creating simulators and utilizing simulation. These are again risk-based decisions because:
Test Automation
Test automation is the use of software to automate the activities of test design, test execution, and the capturing and analysis of test results. Strategic decisions here involve:
Automated tests can typically be run much more quickly, and therefore more often, than manual tests, providing increased visibility into the quality of the software at any given time. This can be particularly beneficial during regression testing, especially with iterative or agile development life cycles. Automation can also eliminate human error when comparing large amounts of output data with expected results.
As a trade-off, however, test automation requires an initial investment in tools and resources to create the automation, and a long-term investment to maintain the automated test suites as the software being developed changes over time. For software items that are expected to change frequently, more time may be spent maintaining the test automation than manual testing would require. Test automation also requires a different skill set than testing itself. Therefore, additional staff or additional training for existing staff may be necessary to implement test automation.
Finally, test automation has limitations, in that it can only compare test results with a finite set of expected results. Since automated tests only look for what they are programmed to check, they cannot identify other anomalous or unexpected behaviors that may result from software defects. For example, if the input to the test case is a number and the expected output is the square root of that number, then an automated test case would check that the square root value is correct for the selected input value. However, the automated test case would not identify that the background screen erroneously changed colors before it displayed the square root value. A human tester, conducting manual testing, would probably catch this type of anomaly easily.
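The square-root example above can be sketched as an automated check; the feature function and expected values here are hypothetical. Note that the test verifies only the programmed expectation, so a display anomaly would go unnoticed.

```python
# Hypothetical automated check for the square-root example: it verifies only
# the numeric result it was programmed to compare, so an unrelated defect such
# as the background color changing would escape detection.
import math
import unittest


def square_root_feature(value):
    # Stand-in for the feature under test.
    return math.sqrt(value)


class SquareRootAutomatedTest(unittest.TestCase):
    def test_square_root_of_selected_inputs(self):
        expected = {0: 0.0, 1: 1.0, 2.25: 1.5, 4: 2.0, 9: 3.0}
        for value, root in expected.items():
            self.assertAlmostEqual(square_root_feature(value), root)
        # Nothing here inspects the screen, so a display defect is not caught.


if __name__ == "__main__":
    unittest.main()
```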
As with other testing strategies, a mix of manual and automated testing strategies may provide the most effective approach to testing.
2. TEST PLANS
Develop and evaluate test plans and procedures, including system, acceptance, validation, etc., to determine whether project objectives are being met and risks are appropriately mitigated. (Create)
BODY OF KNOWLEDGE VI.B.2
V&V Plan
Test planning starts with the overall V&V plans as discussed in Chapter 20 , where decisions are made about the levels of testing that will be done, the testing objectives, strategies, and approaches that will be used at each level, and the allocation of testing staff and other resources. V&V planning determines what V&V activities, including testing, will be used to evaluate the implementation of each requirement. The V&V plan is one of several test planning documents, as illustrated in Figure 21.7 .
Test Plans
For each of the major testing levels (for example, integration testing, system testing, acceptance testing, beta testing), additional test plans may be created to refine the approach and define additional details for that level. Each test plan defines the specific strategies and tactics that will be used to make certain that project/program objectives associated with that level of testing are being met, and that the requirements that are to be verified and/or validated at that level (as allocated by the V&V plan) are adequately tested. Each test plan may be a separate document, or the test plans can be incorporated as part of:
Figure 21.7 Types of testing documentation.
The test planning process includes developing each test plan and then reviewing and approving it. These test plans may be separate stand-alone documents or be included in the V&V plan or other planning documents, as appropriate to the needs of the project. According to IEEE Standard for Software Test Documentation (IEEE 2008a) and ISO/IEC/IEEE Software and Systems Engineering—Software Testing—Part 3: Test Documentation (ISO/IEC/IEEE 2013), test plans should include:
Test Design Specification
A test design specification is a further refinement of a test plan and is only needed if the complexity of the system is high or there is a need for additional levels of information. According to IEEE (IEEE 2008a), a test design specification is an optional document that defines the features to be tested, the approach refinements, test identification, and feature pass/fail criteria.
3. TEST DESIGNS
Select and evaluate various test designs, including fault insertion, fault-error handling, equivalence class partitioning, boundary value. (Evaluate)
BODY OF KNOWLEDGE VI.B.3
Test planning activities include performing requirements analysis to determine the approach needed to evaluate the implementation of each requirement. Test planning also includes performing risk, criticality, and hazard analysis to determine how extensively to test each requirement. Based on this analysis, test design is performed to then design, document, and verify test cases and procedures for each requirement. Test design also includes performing traceability analysis to confirm the completeness of the set of tests. Finally, test design activities may include the automation of tests that will be repetitively executed.
There are techniques the tester can use during test design to systematically minimize the number of tests while maximizing the probability of finding undiscovered relevant defects.
Equivalence Class Partitioning
Equivalence class partitioning is a testing technique that takes a set of inputs and/or outputs and divides them into valid and invalid sets of data that are expected to be treated in the same way by the software. This allows the tester to systematically sample from each equivalence class, also called a partition, during testing, using a minimum set of tests while still achieving coverage. The assumption is that if one sampled value from an equivalence class uncovers a defect, then any other sample from the same equivalence class would probably catch the same defect. If the sampled value does not catch a defect, other samples probably would not catch it either. Rules of thumb when selecting equivalence classes include:
Test cases and procedures should be written generically for each equivalence class (input a number from 1 to 100, or input a number less than 1) so that the tester selects the sample at the time of test execution. Testers then strive to select a different sample from the equivalence class with every execution of the test case. If test execution is automated, this may require the use of a random number generator or some other mechanism for varying the input with each execution.
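A minimal sketch of this approach is shown below in Python, using the 1-to-100 range mentioned above. The accept_quantity function is a hypothetical stand-in for the feature under test; each run draws a fresh sample from every equivalence class.

```python
# Hypothetical sketch: one test per equivalence class, with a different sample
# drawn from that class on every execution via a random number generator.
import random
import unittest


def accept_quantity(value):
    # Stand-in for the feature under test: valid inputs are integers 1 through 100.
    return isinstance(value, int) and 1 <= value <= 100


class EquivalenceClassTests(unittest.TestCase):
    def test_valid_class_1_to_100(self):
        self.assertTrue(accept_quantity(random.randint(1, 100)))

    def test_invalid_class_below_1(self):
        self.assertFalse(accept_quantity(random.randint(-1000, 0)))

    def test_invalid_class_above_100(self):
        self.assertFalse(accept_quantity(random.randint(101, 1000)))


if __name__ == "__main__":
    unittest.main()
```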
Boundary Value
Boundary value analysis explores the values on or around the boundaries of the equivalence classes for each input or output. This technique helps identify errors where the programmer uses < instead of ≤, uses > instead of ≥, or forgets or mishandles the boundary in some other way. In boundary value testing, tests are created for the values at the boundaries (the minimum boundary value and the maximum boundary value) and for the values just below the minimum and just above the maximum. For example, for inputs in the range of 1 to 100, boundary value tests would include the inputs of 0, 1, 100, and 101.
Another example of boundaries would be to consider an input dialog box, as illustrated in Figure 21.8 . Assuming that the box displays up to 25 characters and that it is a scrolling box that actually allows an input string of up to 255 characters, boundary value testing would test the null input, a single character, 25 characters, 26 characters, 255 characters, and 256 characters. There may also be invisible boundaries for this input, for example, the input buffer size used by the operating system that the tester might want to explore.
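Both examples can be generated mechanically, as in the Python sketch below. The helper functions are hypothetical; they simply enumerate the boundary inputs described in the two examples above.

```python
# Hypothetical sketch: generate boundary-value inputs for a numeric range
# (1 to 100) and for the 25-character-visible, 255-character-maximum input field.
def numeric_boundaries(minimum, maximum):
    # Just below the minimum, the minimum, the maximum, and just above the maximum.
    return [minimum - 1, minimum, maximum, maximum + 1]


def string_length_boundaries(visible, maximum):
    # Null input, one character, the visible width and one past it,
    # the maximum length and one past it.
    lengths = [0, 1, visible, visible + 1, maximum, maximum + 1]
    return ["x" * n for n in lengths]


print(numeric_boundaries(1, 100))          # [0, 1, 100, 101]
assert [len(s) for s in string_length_boundaries(25, 255)] == [0, 1, 25, 26, 255, 256]
```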
Other boundaries to consider when writing tests include:
Figure 21.8 Input field boundary—example.
Fault Insertion
Fault insertion, also called fault seeding or defect salting, is a mechanism for estimating the number of undetected defects that escaped from a testing process. Fault insertion intentionally inserts known defects into the software (this is typically done by people independent of either the developers or the testers). The software is then tested, and the defects that are found are analyzed to determine whether they are among the intentionally inserted, known defects or whether they are new, unknown defects. The number of unfound, unknown defects is then estimated using a simple ratio: the proportion of inserted (known) defects found by testing is assumed to approximate the proportion of unknown defects found.
For example, assume 100 known defects were inserted into the software prior to system test. The software is tested, 72 of the inserted defects are found, and 165 other (previously unknown) defects are found. That means that 28 (100 minus 72) of the inserted (known) defects were not found during system test. The number of unfound, unknown defects (UUD) can then be estimated:
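The original equation is not reproduced here; the Python sketch below reconstructs the standard seeded-defect ratio under the equal-detection-rate assumption discussed in the next paragraph, using the numbers from the example above.

```python
# Hedged sketch of the seeded-defect ratio: assumes seeded (known) and native
# (unknown) defects are found at the same rate during testing.
def estimate_unfound_unknown(seeded, seeded_found, unknown_found):
    total_unknown = unknown_found * seeded / seeded_found   # 165 * 100 / 72 ≈ 229
    return round(total_unknown - unknown_found)             # ≈ 64 still undetected


print(estimate_unfound_unknown(seeded=100, seeded_found=72, unknown_found=165))  # 64
```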
This technique assumes that the known defects are being found at the same rate as the unknown defects. Care must be taken to “seed” the software with a representative set of defects for this assumption to hold true. For example, if only logic errors are inserted into the code, then only logic type errors can be reasonably analyzed and predicted.
Good configuration management is required to make certain that the software, as it existed prior to fault insertion, can be retrieved for use moving forward. If it cannot, care must be taken when removing the known defects to ensure that every inserted defect is removed and that no new defects are introduced by mistake. To verify this, regression testing must be performed after the removal activities are complete.
Fault-Error Handling
Testing techniques that examine exception handling, also called fault-error handling or negative testing, look at how well the software handles errors, invalid conditions, invalid or out-of-sequence inputs, and invalid data.
Testing should check to confirm that exception handling is triggered whenever an exception occurs. Exception handlers should allow graceful software recovery or shutdown if necessary. They should prevent abandonment of control to the operating system (obscure crashes). Testing should also check that any reserved system resources (for example, buffers, allocated memory, and network channels) are appropriately released. Creating tests that produce multiple exceptions shows how the software handles repeated exception conditions.
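A minimal sketch of such negative tests is shown below in Python. The set_premium_age function is a hypothetical stand-in; the tests confirm that invalid and repeated invalid inputs raise exceptions and leave the software able to continue normally.

```python
# Hypothetical fault-error handling (negative) tests: invalid, out-of-range, and
# wrongly typed inputs must raise exceptions, and repeated exceptions must not
# leave the software in a bad state.
import unittest


def set_premium_age(age):
    # Stand-in for the feature under test: rejects invalid ages instead of crashing.
    if not isinstance(age, int):
        raise TypeError("age must be an integer")
    if age < 0 or age > 130:
        raise ValueError("age out of range")
    return age


class ExceptionHandlingTests(unittest.TestCase):
    def test_non_numeric_input_is_rejected(self):
        with self.assertRaises(TypeError):
            set_premium_age("twenty")

    def test_out_of_range_input_is_rejected(self):
        with self.assertRaises(ValueError):
            set_premium_age(-1)

    def test_repeated_exceptions_do_not_corrupt_state(self):
        for bad in (-1, 200, -5):
            with self.assertRaises(ValueError):
                set_premium_age(bad)
        # The software should still behave normally after repeated exceptions.
        self.assertEqual(set_premium_age(30), 30)


if __name__ == "__main__":
    unittest.main()
```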
One technique for selecting fault-error handling tests is called error guessing. The main idea behind error guessing is to “attack” the software with inputs that expose potential programmer mistakes or represent the kinds of invalid inputs that the software would receive if user or hardware mistakes occurred. It is hard to describe this technique precisely, since it is largely intuitive and experience-based. One approach to error guessing is to brainstorm probable types of faults or errors that might occur, given the specific software. Another is to create and use checklists of common errors found in past software products. For example:
Cause-Effect Graphing
Cause-effect graphing (CEG) is a model used to help design productive test cases by using a simplified digital-logic circuit (combinatorial logic network) graph. Its origin is in hardware engineering, but it has been adapted for use in software. The CEG technique takes into consideration the combinations of causes that affect the system’s behavior. The commonly used process for CEG can be described in six steps.
Step 1: In the first step, the functional requirements are decomposed and analyzed. To do this, the functional requirements are partitioned into logical groupings, for example, commands, actions, and menu options. Each logical grouping is then further analyzed and decomposed into a list of detailed functions, sub-functions, and so forth.
Step 2: In the second step, the causes and effects are identified. The first part of step 2 is to identify the causes and assign each cause a unique identifier. A cause can also be referred to as an input, as a distinct input condition, or as an equivalence class of input conditions. Examining the specification, or other similar artifact, word-by-word and underlining words or phrases that describe inputs helps to identify the causes. Causes (input conditions) are events that are generated outside an application that the application must react to in some fashion. Examples of causes include hardware events (for example, keystrokes, pushed buttons, mouse clicks, and sensor activations), API calls, and return codes.
The second part of step 2 is to identify the effects, and assign each effect a unique identifier. An effect can also be referred to as an output action, as a distinct output condition, as an equivalence class of output conditions, or as an output such as a confirmation message or error message. Examples of outputs include a message printed on the screen, a string sent to a database, a command sent to the hardware, and a request to the operating system. System transformations, such as file or database record updates are considered effects as well. As with causes, examining the specification, or other similar artifact, word-by-word and underlining words or phrases that describe outputs or system transformations helps to identify the effects.
For example, Table 21.1 lists the causes and effects for the following example set of requirements for calculating car insurance premiums:
Table 21.1 Causes-and-effects—example.
Causes (input conditions):
  1. Sex is male
  2. Sex is female
  3. Age is < 25
  4. Age is ≥ 25 and < 65
  5. Age is ≥ 65
  6. Sex is not male or female
  7. Age is < ?
  8. Age is > ?
Effects (output conditions or system transformations):
  1. Premium is $1000
  2. Premium is $3000
  3. Premium is $1500
  4. Premium is $500
  5. Invalid input error message
Listing the causes and effects can help identify the completeness of the requirements and identify possible problem areas. For example, these requirements do not list a minimum or maximum age, or what should happen if an entry other than “male” or “female” is entered into the sex field. The tester can use these potential problem areas to design test cases to make certain that the software either does not allow the entry of invalid values or responds appropriately with invalid input error messages. Of course, the tester should also ask the requirements analyst about these issues when they are identified so that potential problems can be corrected as soon as possible in the product life cycle.
Step 3: In the third step, cause-effect graphs are created. The semantic content of the specification is analyzed and transformed into Boolean graphs linking the causes and effects. These are the cause-effect graphs. It is easier to derive the Boolean function for each effect from its own separate CEG. Table 21.2 illustrates the individual cause-effect graphs from the example list of requirements.
Step 4: In the fourth step, the graphs are annotated with constraints describing combinations of causes and/or effects that are impossible because of syntactic or environmental constraints or considerations. For example, for the purpose of calculating the insurance premium in the above example, a person cannot be both a “male” and a “female” simultaneously, as illustrated in Figure 21.9. To show this, the CEG is annotated, as appropriate, with the constraint symbols shown in Table 21.3.
Table 21.2 Cause-and-effect graphs—example.
Figure 21.9 Cause-and-effect graph with constraints—example.
Table 21.3 Cause effect graph constraint symbols—example (based on Myers [2004]).
Step 5: In the fifth step, state conditions in the graphs are methodically traced and converted into a limited-entry decision table. The ones (1) in the limited-entry decision table column indicate that the cause (or effect) is true in the CEG and zeros (0) indicate that it is false. Table 21.4 illustrates the limited-entry decision table created by converting the CEG from the “calculating car insurance premiums” example. For example, the CEG #1 from step 3 converts into test case column 1 in Table 21.4. From CEG #1, causes 1 and 3 being true result in effect b being true.
Some CEGs may result in more than one test case being created. For example, because of the one and only one constraint in the annotated CEG #3 from step 4, this CEG results in test cases 3 and 4 in Table 21.4 .
Step 6: In the sixth step, the columns in the decision table are converted into test cases, as illustrated in Table 21.5. It should be noted that the example used above to illustrate the basic steps of CEG was kept very simple. An astute software developer could probably jump right to this set of test cases from the requirements without using the CEG method. However, for large, complex systems with multiple causes (inputs) and effects (outputs or transformations), this method is a systematic way to analyze them and create test cases. If CEG is performed early in the project, it can help in developing and verifying the completeness of the specification.
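A minimal, hypothetical sketch of step 6 in Python follows: each decision table column becomes one executable test case with concrete inputs and an expected effect. The input combinations, premium amounts, and the stand-in calculate_premium function are illustrative assumptions only; they are not the actual contents of Table 21.5.

```python
# Hedged sketch: turn limited-entry decision table columns into executable test
# cases. The cause/effect mapping below is an illustrative assumption.
DECISION_TABLE = [
    # (sex, age, expected_effect) -- one tuple per decision table column
    ("male",   24, "Premium is $3000"),
    ("male",   40, "Premium is $1000"),
    ("female", 24, "Premium is $1500"),
    ("other",  40, "Invalid input error message"),
]


def calculate_premium(sex, age):
    # Trivial stand-in for the system under test (illustrative only).
    if sex not in ("male", "female"):
        return "Invalid input error message"
    if sex == "male" and age < 25:
        return "Premium is $3000"
    if sex == "male":
        return "Premium is $1000"
    return "Premium is $1500"


def run_test_cases(feature):
    failures = []
    for sex, age, expected in DECISION_TABLE:
        actual = feature(sex, age)
        if actual != expected:
            failures.append((sex, age, expected, actual))
    return failures


print(run_test_cases(calculate_premium))   # [] when all columns pass
```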
Table 21.4 Limited-entry decision table—example.
Table 21.5 Test cases from cause-effect graphing—example.
4. SOFTWARE TESTS
Identify and use various tests, including unit, functional, performance, integration, regression, usability, acceptance, certification, environmental load, stress, worst-case, perfective, exploratory, system. (Apply)
BODY OF KNOWLEDGE VI.B.4
Levels of Testing
There are various levels of software testing that can be done, as illustrated in Figure 21.10 . These different levels of testing are known by different names in different organizations. Even the number of levels of testing may vary from organization to organization. The discussion below should be considered as an example of the different levels of testing and include some of the more common names for these levels.
At the lowest level, during unit testing, also called module testing or component testing, the individual source code modules that make up the software product are tested separately. A source code module is the smallest software element that can be separately compiled. The author of a source code module usually performs this unit testing. One of the advantages of unit testing is fault isolation. Any problems that are found during unit test execution are isolated to the individual source code module being tested. Unit testing is typically the earliest level of test execution and therefore identifies defects earlier, when they are less expensive to fix, than if they were found during later levels of test execution. At the unit test level it is also easier to confirm that every line of code or every branch through the source code module is tested during test execution. An analogy for unit testing is to make sure that the individual bricks are good before the workers start building the wall out of them.
Figure 21.10 Levels of testing.
During software integration, two or more source code modules or other components (constituent configuration items) are aggregated to create an integrated software component (composite configuration item). With large software development projects, integration testing often includes software from many developers. Integration testing may be accomplished in several levels, for example:
Depending on the organization and the size of the project, integration testing may be performed by the developers or by a separate independent test team. The focus of integration testing is on the interfaces and interactions between the integrated components or source code modules. Since the individual source code modules are already tested at the module-level, integration testing can maintain this focus without the distraction of an excessive amount of module-level defects. To continue the analogy used above, integration testing focuses on making sure the mortar between the bricks is good and that the wall is solid as it is being built.
Software system testing, also called software qualification testing, is testing conducted on a complete, integrated software application or product to evaluate the software’s compliance with its specified requirements.
During system integration , if the software is part of a larger system, there may be other levels of integration testing where the software, hardware, and other system components are integrated into the system and testing focuses on the interfaces and interactions between those system components.
System testing, also called system qualification testing, is “testing conducted on a complete, integrated system to evaluate the system’s compliance with its specified requirements” (ISO/IEC/IEEE 2010). The system may be an aggregation of software, hardware, or both that is treated as a single entity for the purpose of system testing, as well as operational use after release. To finish the analogy, system testing focuses on the functionality and quality of the entire house.
If the system is part of a system of systems, yet another level of integration and testing may occur followed by system of system testing and so on as needed.
Acceptance testing, also called user acceptance testing (UAT), is a special type of testing performed by or for the acquirer of the software/system to demonstrate that the as-built product performs in accordance with its acceptance criteria. Typically, the acceptance test criteria are defined in a set of acceptance tests and agreed to as part of the contract or other early agreement between the development organization and the acquirers. Acceptance tests are designed to provide a mechanism for formal acceptance of the software as it transitions into operations.
Certification testing is also a special type of testing that is typically done by a third party (an organization other than the supplier or acquirer of the software/system). Certification test criteria, and any standards that the software/system must meet, should be defined and agreed to in advance. Contractual agreements between the supplier and acquirer may call for certification testing in place of, or in addition to, acceptance testing. A software development organization may also voluntarily seek some form of certification for its products for marketing or other purposes. Commercial off-the-shelf (COTS) software may also be put through certification testing as needed. For example, if the COTS software is used in, or used to build, safety-critical or security-critical systems, certification testing may be a cost-effective, efficient way of performing V&V.
Functional Testing
Functional testing focuses on testing the functional requirements of the software—“what the software is supposed to do.” Functional testing strategies include:
To test each individual function, the tester decomposes and analyzes functional requirements and partitions the functionality into logical components (commands, actions, menu options). For each component, a list is made of detailed functions and sub-functions that need to be tested, as illustrated in the example in Figure 21.11 .
According to Whittaker (2003), “functional software testing is about intelligence.” First, the testers must become familiar with the environment in which each function/sub-function operates. This involves identifying “users” of that function or sub-function from outside of the software being tested. As illustrated in Figure 21.12 , there are four classes of environmental users that need to be considered.
Figure 21.11 Function and sub-function list—example.
Figure 21.12 Environment in which the function operates.
Once the environmental users for the function/sub-function are identified, the tester must explore the capabilities of each function/sub-function as it relates to each of those users. For example:
To test usage scenarios, the tester chains together individual tests, from testing of the functions/sub-functions, into test scenarios that test start-to-finish user interactions with the system. Each feature may work independently, but they may not “play well together” to get real work accomplished for the users.
Testers perform operational profile testing in order to thoroughly evaluate the areas where defects are most likely to impact the reliability of the software. To perform operational profile testing, the tester must identify the different threads through the software. A thread is a sequential set of usage steps through the software that a user takes to accomplish a start to finish task. For example, as illustrated in the decision tree in Figure 21.13 , one thread would be to enter the credit card correctly the first time, enter the PIN correctly the first time, and then select and pump regular gas. A different thread would repeat these steps except for selecting and pumping premium gas. A third thread would be to enter the card correctly but take two tries before entering the PIN correctly and selecting and pumping regular gas. Once the threads have been identified, probabilities must be assigned to the paths out of any given state. The sum of all of the probabilities (percentages) for paths out of any given state must be 1 (100%). For example, in the decision tree in Figure 21.13 :
Figure 21.13 Decision tree—example.
The probability of a given thread is calculated by multiplying the assigned probabilities on that thread together. Those threads with the highest probability are the most frequently traversed threads. For example, again using the example in Figure 21.13 :
Using this example, if 1000 tests are going to be executed, operational profile testing would sample 556 (1000 × 55.58%) of those tests as variations on thread #1, for example, using different types of cards, different valid PINs and pumping different amounts of gas. A sample of 214 tests would be executed as variations on thread #2, 86 tests would be executed as variations on thread #3, and so on through each possible thread.
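The allocation arithmetic can be sketched as below in Python. The thread probabilities used are those implied by the example above (55.58%, 21.4%, and 8.6%); for other threads they would be computed by multiplying the branch probabilities along each thread.

```python
# Hedged sketch: allocate a fixed test budget across usage threads in proportion
# to each thread's probability, as in the 1000-test example above.
def allocate_tests(thread_probabilities, total_tests):
    return {thread: round(total_tests * p) for thread, p in thread_probabilities.items()}


profile = {"thread 1": 0.5558, "thread 2": 0.214, "thread 3": 0.086}
print(allocate_tests(profile, 1000))   # {'thread 1': 556, 'thread 2': 214, 'thread 3': 86}
```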
Performance, Environmental Load, Volume, and Stress Testing
The objective of performance testing is to determine if the system has any problems meeting its performance requirements including throughput (number of transactions per time unit), response time, or capacities (for example, the number of simultaneous users, terminals, or transactions). Performance testing is usually done at the system level under full environmental load. Typically, software can perform to specification when not much is going on. However, the performance of some software applications may degrade at full volume, with multiple users interacting with the software, and other applications running in the background taking up system resources. Performance tests should be performed after other testing is complete and the software is relatively problem free.
Environmental load testing evaluates the software’s performance capabilities (throughputs, response times, and capacities) under normal load conditions, as illustrated in Figure 21.14 .
Figure 21.14 Load, stress, and volume testing.
Volume testing, also called endurance testing or soak testing, is a special type of environmental load testing that subjects the software to heavy loads over long periods of time and evaluates the software’s capability. For example, does the software have any problems handling the maximum required volumes of data, transactions, users, and peripherals over several days or weeks?
Stress testing is another special type of environmental load testing that subjects the software to surges or spikes in load over short periods of time, and evaluates the software’s performance. For example, is the software able to appropriately deal with jumping from no load to a spike of maximum data, transactions, or users? Stress testing can also involve testing for excess load. For example, if the requirement states that the maximum capacity that a telecommunication switch can handle is 10,000, does the software appropriately handle a spike to 11,000 calls by reporting the appropriate messages (“all lines are currently busy, please call back later”) or does it result in an inappropriate failure condition?
Worst-Case Testing
One type of worst-case testing is an extension of boundary testing. Boundary testing investigates the boundaries of each input variable separately by exploring the minimum value, one below the minimum, the maximum value, and one above the maximum. During routine boundary testing, if multiple variables are needed as inputs to the software, each variable is boundary tested individually, while all of the other variables are input at their normal values. For example, if a software function has two inputs x with boundaries at 0 and 5 and y with boundaries at 20 and 50, boundary testing would first boundary-test variable x with y set to its typical value. Then variable x would be set to its typical value and variable y would be boundary-tested. This example would require eight tests.
Worst-case testing explores the boundaries of multiple input variables in combination, exercising their boundary values together. Continuing the example above with the two inputs x and y, worst-case testing would require the following 16 test cases (a generation sketch follows the list):
  1. x = −1 and y = 19
  2. x = −1 and y = 20
  3. x = −1 and y = 50
  4. x = −1 and y = 51
  5. x = 0 and y = 19
  6. x = 0 and y = 20
  7. x = 0 and y = 50
  8. x = 0 and y = 51
  9. x = 5 and y = 19
  10. x = 5 and y = 20
  11. x = 5 and y = 50
  12. x = 5 and y = 51
  13. x = 6 and y = 19
  14. x = 6 and y = 20
  15. x = 6 and y = 50
  16. x = 6 and y = 51
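These 16 combinations are simply the cross product of each variable’s boundary values, as the Python sketch below shows (the variable names follow the example above).

```python
# Hypothetical sketch: generate the 16 worst-case combinations above by taking
# the cross product of each variable's boundary values (min-1, min, max, max+1).
from itertools import product

x_boundaries = [-1, 0, 5, 6]      # x has boundaries at 0 and 5
y_boundaries = [19, 20, 50, 51]   # y has boundaries at 20 and 50

worst_case_inputs = list(product(x_boundaries, y_boundaries))
assert len(worst_case_inputs) == 16
for i, (x, y) in enumerate(worst_case_inputs, start=1):
    print(f"{i}. x = {x} and y = {y}")
```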
Another type of worst-case testing is to execute testing under worst-case resource conditions, for example, with the minimum allowable amount of memory, bandwidth, processing speed or network speed, or with the maximum volume of users, peripherals, or other applications running in the background. Testing performance requirements is a particular concern when testing under worst-case conditions.
Resource Utilization Testing
The objective of resource utilization testing is to determine if the system uses resources (for example, memory or disk space) at levels that exceed requirements, or if the software has any problems when needed resource levels fluctuate or when resources are busy or unavailable. To perform resource utilization testing, the testers must evaluate objectives or requirements for resource availability. By testing the software with inadequate resources or by saturating resources with artificially induced overloads, the tester can force resource-related exception conditions. Other resource-related testing considerations include whether the software can appropriately handle hardware or resource failures like:
Usability Testing
The objective of usability testing is to make certain that the software matches the user’s actual work style and determines whether the software has any areas that will be difficult or inconvenient for the users. The characteristics of usability include:
When designing tests for usability, the tester has to consider the different potential users of the system, including:
These different types of users may interpret usability very differently. Features that make the system user-friendly to novice or occasional users may drive power users crazy.
One method that is often employed in usability testing is to have different types of actual users work with the software, and to observe their interactions. Usability tests can include having these users perform freeform, unplanned tasks as part of the testing process. This includes allowing users to work in a manner that reflects how they actually expect to work with the system. Alternatively, the users may be asked to follow predefined, written scripts containing step-by-step instructions. This method facilitates better coverage but may not effectively reflect the users’ actual work patterns. Using a preliminary prototype or mock-up rather than the final product allows usability testing to be performed earlier in the life cycle. Conducting and observing beta testing or field trials on the software at the actual user’s site, or in the actual environment where the software will be used, is another way to perform usability testing.
Exploratory Testing
In exploratory testing, the testers design the test cases and procedures at the same time they are testing the software. Unlike preplanned test design, where test cases and procedures are written well in advance based on the software specifications, in exploratory testing the testers use the information they learn about the product as they are testing it to create additional tests. Unlike ad hoc testing, exploratory testing is not just randomly wandering around the software. It is a systematic exploration of part of the software. The testers consciously think about what they do not know but want to find out. Also unlike ad hoc testing, the exploratory tester should “always write down what you do and what happens when you run exploratory tests” (Kaner 1999).
Exploratory testing is based on the tester’s intuition, experience, and observations. For example, a tester runs the preplanned set of test cases for a function and everything matches the expected results. Normally the tester would just move on to the next set of tests. However, if things just do not seem right, the tester uses exploratory testing to create additional test cases that further investigate the feature. Another example would be when the tester does find several problems in an area of the software, but believes there are still more to find, even though all the preplanned test cases have been executed. Again, the tester creates additional test cases to explore the feature further, trying to confirm or refute their suspicions.
Exploratory testing can also build on risk-based testing. For example, if a feature is expected to be low-risk, few planned test cases may be written for that feature. The exploratory testers can quickly investigate the low-risk feature to find out if they can find problems that might cause a reassessment of that feature’s risk. On the other hand, for features that are expected to be high-risk, time may be reserved in the schedule to plan for additional exploratory testing of those features.
Regression Testing
Regression testing is concerned with how changes (fixes or updates) to software affect the unchanged portions of the software or hardware. Regression testing is “selective retesting of a software or software component to verify that modifications have not caused unintended effects and that the system or component still complies with its specified requirements” (ISO/IEC/IEEE 2010).
Regression analysis is the activity of examining a proposed software change to determine the depth and breadth of the effects the proposed change could have on the software. It is the determination of how extensively a change needs to be tested and how many already-executed test cases will need to be re-executed. Regression analysis must balance the risk of releasing the software with undiscovered defects, and the software’s quality and integrity requirements, against the cost of a more extensive testing effort.
Of course, the first testing step when software has been changed is to retest what changed. Both white-box and black-box testing strategies can be used to test the individual source code modules, components, and features that changed.
Regression analysis is then used to determine how extensively other parts of the software should be retested. When deciding what other white-box and gray-box regression tests to execute, consider source code modules/components that share the same local or global variables, or that directly call, or are called by, those source code modules/components that were changed. More rigorous regression testing might also consider going to a second level, as illustrated in Figure 21.15. This more rigorous testing includes testing the source code modules/components that call, or are called by, the source code modules/components that directly call, or are called by, the source code modules/components that were changed. Regression analysis also includes deciding what other black-box regression tests to execute when changes are made to the software. Black-box regression analysis considers functions that:
The final step in regression testing is to execute the regression test suite. The regression test suite includes white-box, gray-box, and black-box tests that are always repeated no matter what changes in the software. Candidate tests for the regression test suite are tests for the source code modules/components, and functions that:
Figure 21.15 Test what might be impacted by the changes.
Tests that are included in the regression test suite are prime candidates for automation because those tests will be rerun many times (every time the software is changed).
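A minimal sketch of such an automated regression suite is shown below in Python. The discount feature and its tests are hypothetical; the point is that the suite is kept executable so it can be rerun on every change.

```python
# Hypothetical sketch: the always-run regression test suite kept as an automated
# unittest suite so it can be re-executed every time the software changes.
import unittest


def discount(total, code):
    # Stand-in for a high-risk, frequently changed feature guarded by the suite.
    return round(total * 0.9, 2) if code == "SAVE10" else total


class RegressionSuite(unittest.TestCase):
    def test_discount_code_still_applies(self):
        self.assertEqual(discount(100.00, "SAVE10"), 90.00)

    def test_unknown_code_is_still_ignored(self):
        # Guards a previously corrected defect: unknown codes must not change the total.
        self.assertEqual(discount(100.00, "BOGUS"), 100.00)


def regression_suite():
    return unittest.TestLoader().loadTestsFromTestCase(RegressionSuite)


if __name__ == "__main__":
    unittest.TextTestRunner().run(regression_suite())
```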
5. TESTS OF EXTERNAL PRODUCTS
Determine appropriate levels of testing for integrating supplier, third-party, and subcontractor components and products. (Apply)
BODY OF KNOWLEDGE VI.B.5
When an externally provided software component or product is incorporated into a software system, it must be tested to the same level of rigor as other parts of that software system. That is, decisions about testing external software products should be based on risk analysis and the level of integrity required. If an externally provided component or product is used to develop, build, or test the software system, it must also be tested to confirm that it does not cause problems in the software being developed. The suppliers of externally provided software components or products should be treated as an extended part of the primary development organization. The adequacy of the external supplier’s development and test processes should be prequalified and then monitored throughout the effort. Supplier processes do not necessarily have to mirror the acquirer’s processes, but they need to achieve the same quality criteria for development and testing.
Throughout the software development process, the supplier of the external product should be performing V&V activities. The amount of visibility the acquirer needs into those V&V activities may vary, depending on the acquirer’s relationship to that supplier. For example, for commercial off-the-shelf (COTS) software or open source software, the acquirer may have no way of determining the extent to which the supplier tested their software. On the other hand, with custom-built software, the acquirer might be an active participant in some supplier testing activities, or at least have access to the test results and reports.
As part of product acceptance for custom-built software, the acquirer should conduct acceptance testing against negotiated acceptance criteria, which are included in the supplier contract or agreement to confirm that the delivered products meet all agreed-to requirements. The acquirer may also want to conduct alpha or beta testing. If the supplier’s software is being integrated into a larger product, it should be integration tested and system tested as part of that larger system. Black-box strategies are usually necessary for this testing since access to the acquired software source code is typically not available.
As with other testing activities, risk should be considered when scoping any COTS testing effort. For example, for COTS software where only the mainstream functionality is being utilized, less testing may be required because this functionality has already been validated extensively through operational use. However, if more fringe functionality is being used, or if this is a newly released COTS product, more extensive testing may be appropriate. If the COTS software is being integrated into the software system, it should be integration tested and system tested as part of that larger system. As with any supplier-provided software, black-box strategies are usually necessary for this testing since access to the software source code is typically not available. If the COTS software is being used to develop or test the software being developed, risk-based validation and/or certification testing may be used to confirm that it performs as required. Any customization or wraparounds that are done for the COTS software should also be tested.
For open source software, risk should likewise be considered when scoping any testing effort. If the open source software is being integrated into the software system, it should again be integration tested and system tested as part of that larger system. If the open source software is being used to develop or test the software being developed, risk-based validation and/or certification testing may be used to confirm that it performs as required, just as with COTS software. Any modifications, customizations, or wraparounds that are done for the open source software should also be tested. However, there may be legal issues associated with modifying open source software, so legal advice should be obtained before using it. Open source software can also be problematic when it comes to keeping up with changes and performing retesting and regression testing.
For some projects, the customers may also supply software that will be integrated into the software system. If this is the case, this customer-supplied software must also be tested to the same level of rigor as other parts of the software system.
6. TEST COVERAGE SPECIFICATIONS
Evaluate the adequacy of test specifications such as functions, states, data and time domains, interfaces, security, and configurations that include internationalization and platform variances. (Evaluate)
BODY OF KNOWLEDGE VI.B.6
Test coverage looks at the completeness of the testing. Test coverage maps the tests (test cases and test procedures) to some attribute of the source code modules/components or software product being tested. These attributes can be attributes of the specification, as discussed in this section, or attributes of the code, as discussed in the next section.
Requirements Coverage
Requirements coverage looks at the mapping of tests to each of the uniquely identified requirements in the specification. Traceability is one of the primary mechanisms used to make sure that all of the functional and other product requirements from the specification are covered. Every requirement should trace forward to one or more tests that will be used to evaluate the complete and correct implementation of that requirement. Requirements coverage metrics can be used to monitor progress toward complete coverage during the testing activities. The requirements coverage metric is the percentage of requirements that trace forward to at least one test case. If the value of this metric is less than 100 percent, then additional test cases need to be written to test the uncovered requirements. However, a value of 100 percent does not guarantee thorough testing, since more than one test case may be needed to comprehensively test each requirement.
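As a simple illustration (a sketch, not part of the handbook), the requirements coverage metric can be computed directly from a traceability mapping; the requirement and test case identifiers below are hypothetical.

    def requirements_coverage(trace):
        # trace maps each requirement ID to the test cases that cover it.
        covered = [req for req, tests in trace.items() if tests]
        return 100.0 * len(covered) / len(trace)

    trace = {
        "R10": ["TC-01", "TC-02"],
        "R20": ["TC-03"],
        "R30": [],                     # uncovered requirement: needs a new test case
    }

    print(round(requirements_coverage(trace), 1))   # 66.7 -> below 100 percent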
A test matrix, also called a system verification matrix , is another tool for making certain that each requirement is appropriately tested. A test matrix traces each requirement to the method that will be used to test it. Initially, a test matrix is used to plan the level of testing and test strategy for each requirement, as shown in Table 21.6 . As test design progresses, additional detail can be added to the matrix to include more information about the specific approach to be taken. In this example, the test strategies include:
Table 21.6 Test matrix—example.
An advantage of creating a test matrix early in the life cycle is the identification of non-testable requirements like R50 in Table 21.6 , before they propagate into other software work products. Non-testable requirements and other issues discovered during the creation of the test matrix should be reported to the requirements analyst for resolution. It should be remembered that any requirement that is not tested is a source of risk, because the implementation of that requirement may include undiscovered defects.
Functional Coverage
Of course, 100 percent requirements coverage automatically achieves 100 percent coverage of all of the functional requirements. Another way to measure functional coverage from a user perspective is to look at the percentage of threads through the program that map to test procedures.
State Coverage
State coverage looks at the mapping of tests to the various states that the software can be in, and the various transitions between those states, to make certain that they are thoroughly tested. State testing is done to determine if the software:
Data Domain Coverage
Data domain coverage looks at the mapping of tests to the various data domains in the software to confirm that they are thoroughly tested. Once the equivalency classes and boundaries for the inputs and outputs have been identified, as defined previously, data domain coverage evaluates the percentage of those classes and boundaries that have test cases associated with them.
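The following is a minimal sketch of tracking data domain coverage for a single input, assuming its equivalence classes and boundary values have already been identified; the specific classes (an input valid from 40 through 60) and the executed inputs are illustrative assumptions.

    # Illustrative equivalence classes and boundary values for an input that is
    # valid from 40 through 60 (these values are assumptions, not from the text).
    classes = {
        "below_valid": lambda v: v < 40,
        "valid":       lambda v: 40 <= v <= 60,
        "above_valid": lambda v: v > 60,
    }
    boundaries = [39, 40, 60, 61]

    executed_inputs = {40, 55, 61}     # inputs actually exercised by the test cases

    class_hits = {name: any(rule(v) for v in executed_inputs)
                  for name, rule in classes.items()}
    boundary_hits = {b: b in executed_inputs for b in boundaries}

    covered = sum(class_hits.values()) + sum(boundary_hits.values())
    total = len(classes) + len(boundaries)
    print(class_hits, boundary_hits, f"{100.0 * covered / total:.0f}%")
    # Reports which classes and boundaries still lack tests (57% coverage here).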
Another data domain coverage technique creates a list of all of the data items in the software. This can be done through evaluating the data dictionary, or by looking at the specification to identify data items and structures. The CRUD acronym can then be used to investigate data domain coverage, by making certain that tests exist to evaluate any requirements or stakeholder needs for:
Some organizations use the alternative CRUDL acronym, where the “L” stands for listing, that is, the tracking of each data item or structure in reports.
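A CRUD (or CRUDL) checklist can be tracked very simply; the sketch below, with hypothetical data items, reports which operations still lack tests.

    operations = ["create", "read", "update", "delete", "list"]   # CRUDL
    data_items = ["customer", "order"]                            # hypothetical items

    # Operations on data items that existing tests already exercise (illustrative).
    tested = {("customer", "create"), ("customer", "read"),
              ("order", "create"), ("order", "read"), ("order", "update")}

    for item in data_items:
        missing = [op for op in operations if (item, op) not in tested]
        print(item, "still needs tests for:", missing)
    # customer still needs tests for: ['update', 'delete', 'list']
    # order still needs tests for: ['delete', 'list']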
Date and Time Domain Coverage
Date and time domain coverage looks at the mapping of tests to the various date and/or time domains in the software to confirm that they are thoroughly tested. Date and time domain testing considers:
Date and time domain testing also considers whether the software appropriately handles special dates and/or times. For example:
Interface Coverage
An interface is a shared boundary across which information is passed. There are interfaces internal to the software between its source code modules/components. There are also interfaces between the software and external entities (user/hardware, operating systems, file systems, and other software applications), as illustrated in Figure 21.12 . Interface coverage looks at the mapping of tests to the various identified interfaces to make certain that they are thoroughly tested. Integration testing typically focuses on confirming interface coverage.
Security Testing
The objective of security testing is to determine whether the security of the system can be attacked and breached, and whether the software can handle or recover from any breaches that do occur. Security coverage looks at the mapping of tests to the various identified security issues, with penetration testing then performed to confirm that they are thoroughly tested. As our digital infrastructure becomes increasingly complex and interconnected, the difficulty of achieving application security grows rapidly, and so does the difficulty of the associated security test efforts. As with other types of testing, checklists are often used to make sure that key security areas are tested. Testers also use penetration testing techniques to attempt to break through access control and software security controls to identify weaknesses along specific paths of attack. (See Chapters 6, 11, 13, and 17 for additional discussions about software security.)
Platform Configuration Coverage
The objective of platform configuration testing is to determine whether the software has any problems handling all of the required hardware platforms and software configurations (for example, various servers, browsers, operating systems, and so on). Platform configuration coverage looks at the mapping of tests to the various possible platforms to confirm that the software is thoroughly tested on each platform. One method for assessing platform configuration coverage is to use a platform configuration test matrix. In a platform configuration test matrix, all of the possible platforms are listed as column headers. The example in Table 21.7 includes various operating systems that the software needs to run on, but other configurations can be included as needed. The left column lists the tests to be conducted. Each cell lists the status of those tests on that platform:
Table 21.7 Platform configuration test matrix—example.
This matrix only tests a single configuration entity's variance per column. However, if combinations of configuration entities are expected to interact with the software in different ways, it may be necessary to test configuration combinations as well. For example, if there are multiple possible hardware platforms and multiple possible operating systems, configuration testing may require testing the software on each hardware platform/operating system pairing.
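A sketch of enumerating such configuration combinations follows; the hardware, operating system, and browser names are illustrative, and each generated combination would become one column of the configuration test matrix.

    from itertools import product

    hardware = ["server_a", "server_b"]            # illustrative configuration entities
    operating_systems = ["linux", "windows"]
    browsers = ["chrome", "firefox"]

    # Each combination becomes one column of the platform configuration test matrix.
    for column, combination in enumerate(
            product(hardware, operating_systems, browsers), start=1):
        print(column, combination)
    # Eight columns in total; all-pairs (pairwise) selection is one way to reduce
    # the count when exhaustive combination testing is not feasible.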
Internationalization Testing
The objective of internationalization testing , also called localization testing, is to determine whether the software has any problems related to transitioning it to other geographic locations. This includes adaptation to different languages, cultures, customs, and standards. To confirm proper translation of screens, reports, error messages, help text, and other items, the tester must be fluent in each language in which the system is tested.
Internationalization configuration coverage looks at the mapping of tests to the various possible geographic locations to confirm that the software is thoroughly tested on each location. Internationalization configuration coverage can also be tracked using a configuration test matrix, similar to the one illustrated in Table 21.7 . However, in the case of internationalization, all of the possible locations are listed as column headers instead of the platforms. Test considerations in internationalization testing include:
7. CODE COVERAGE TECHNIQUES
Use and identify various tools and techniques to facilitate code coverage analysis techniques such as branch coverage, condition, domain, and boundary. (Apply)
BODY OF KNOWLEDGE VI.B.7
Unlike threads, which are usage/functionality focused, paths refer to control flow sequences through the internal structure of the software source code. There are typically many possible paths between the entry and exit of a typical software application. A source code module containing nothing but straight-line code has a single path. Every Boolean decision (IF, or IF THEN ELSE) doubles the number of potential paths, every case statement multiplies the number of potential paths by the number of cases, and every loop multiplies the number of potential paths by the number of different iteration values possible for the loop. For example, a source code module with a loop that can be iterated from one to 100 times has 100 possible paths. Add an if-then-else statement inside that loop and the number of paths increases to 200. Add a case statement with four possible choices inside the loop as well, and there are 800 possible paths. Moving from the individual source code modules to the integration level, if this source code module with 800 paths is integrated with a source code module that has only two sequential if-then-else statements (four paths), there are now 3200 paths through these two source code modules in combination.
Since there are rarely enough resources to test every path through a complex software application, or even a complex individual source code module, the tester uses white-box logic coverage techniques to systematically select the tests that are the most likely to help identify the yet undiscovered, relevant defects.
Statement, Decision, and Condition Coverage
To demonstrate the different statement, decision, and condition coverage techniques, the piece of nonsense code shown in Figure 21.16 will be used.
A statement is an instruction or a series of instructions that a computer carries out. Statement coverage is the extent to which the statements of a given source code module/component are exercised by a set of tests. Statement coverage is the least rigorous type of code coverage technique. To have complete statement coverage, each statement must be executed at least once. Table 21.8 illustrates that it takes only one test case to achieve statement coverage of the code in Figure 21.16. As long as input variables B and C are selected so that both decisions in this code are true, every statement is executed.
Figure 21.16 Code—example.
Table 21.8 Statement coverage—example.
A decision determines the branch path that the code takes. To have decision coverage , also called branch coverage , each decision takes all possible outcomes at least once. For example, if the decision is a Boolean expression, decision coverage requires test cases for both the true and false branches. If the decision is a case statement, decision coverage requires test cases that take each case branch. Table 21.9 illustrates the test cases needed to have decision coverage of the code in Figure 21.16 :
A condition is a state that a decision is based on. To have condition coverage , each condition in a decision takes all possible outcomes at least once. For the code in Figure 21.16 there are three conditions that the input variable B can have:
There are also three conditions that input variable C can have:
Table 21.10 illustrates one choice of test cases that combines these into condition coverage of the code in Figure 21.16 . Note that condition coverage does not always imply decision coverage. For example, in the set of test cases in Table 21.10 :
Table 21.9 Decision coverage—example.
Table 21.10 Condition coverage—example.
Table 21.11 Decision/condition coverage—example.
Table 21.12 Multiple condition coverage—example.
The next level of rigor is decision/condition coverage, where each decision takes all possible outcomes at least once, and each condition in a decision takes all possible outcomes at least once. If decision/condition coverage exists, then condition coverage and decision coverage also exist. Table 21.11 illustrates one choice of test cases that provides decision/condition coverage of the code in Figure 21.16.
Modified condition/decision coverage (MC/DC) expands on decision/condition coverage by adding the requirement that each condition be shown to independently affect the outcome of the decision. The requirement for independence makes certain that the effect of each condition is tested relative to the other conditions (Hayhurst 2001).
To have multiple condition coverage , each statement is executed at least once and all possible combinations of condition outcomes in each decision occur at least once. Multiple condition coverage always results in condition, decision, and statement coverage as well. Multiple condition coverage is the most rigorous type of structural coverage testing. Table 21.12 illustrates a choice of test cases that provides multiple condition coverage of the code in Figure 21.16 .
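Since Figure 21.16 is not reproduced here, the following Python sketch uses a different, hypothetical decision with two conditions to contrast the test sets required by decision, condition, and multiple condition coverage. Note how the condition coverage pair never makes the decision true, echoing the earlier observation that condition coverage does not always imply decision coverage.

    # Hypothetical decision with two conditions (not the Figure 21.16 example).
    def grant_discount(is_member, total):
        if is_member and total > 100:
            return True
        return False

    # Decision (branch) coverage: the decision evaluates both true and false.
    decision_tests = [(True, 150), (False, 50)]

    # Condition coverage: each condition takes both outcomes at least once.
    # Note that this particular pair never makes the whole decision true.
    condition_tests = [(True, 50), (False, 150)]

    # Multiple condition coverage: all four combinations of the two conditions.
    multiple_condition_tests = [(True, 150), (True, 50), (False, 150), (False, 50)]

    for args in multiple_condition_tests:
        print(args, grant_discount(*args))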
Domain and Boundary Testing
For structural testing, the conditions for the input variables define their domains , or equivalence classes . Once these domains are defined, sampling from these domains can be done when executing test cases.
Boundary value testing can also be done by sampling values on the boundaries of these domains. For example, for input variable B in the code in Figure 21.16 , the boundary values would be 39, 40, 60, and 61.
Boundary testing can also be performed on loop counters when loops are included in the software source code. Boundary loop testing includes test cases for the minimum and maximum numbers of times through the loop (if there is a maximum), one less than the minimum (which skips the loop), and one more than the maximum. One more than the maximum is an error condition that the software should catch and handle appropriately, because the loop should never execute more than its maximum number of times; if it can, a problem should be reported.
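A minimal sketch of selecting loop-boundary iteration counts follows, assuming an illustrative loop that may execute between 1 and 100 times.

    def loop_boundary_values(minimum, maximum):
        # Iteration counts to test: one less than the minimum (skips the loop),
        # the minimum, the maximum, and one more than the maximum (an error
        # condition the software should catch and handle).
        return [minimum - 1, minimum, maximum, maximum + 1]

    print(loop_boundary_values(1, 100))   # [0, 1, 100, 101]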
8. TEST ENVIRONMENTS
Select and use simulations, test libraries, drivers, stubs, harnesses, etc., and identify parameters to establish a controlled test environment. (Analyze)
BODY OF KNOWLEDGE VI.B.8
Test Beds
A test bed is an environment established and used for the execution of tests. The goal is to be able to create a test environment that matches the actual operating environment as closely as possible, while providing the testers with a known platform to test from, and visibility into test results. The test bed includes:
Multiple test environments may be used throughout the testing processes. There may be different test environments for performing different levels of testing (unit, integration, system) or for performing different types of testing (performance, security, safety, usability). Requirements for the test environment may be documented in the V&V plans, test plans, test specifications, or even in the individual test cases.
Stubs
A stub is a software source code module/component that can be used during top-down testing to minimally simulate the actions of a called (lower-level) source code module/component that has not yet been integrated. Stubs can also be used in unit testing. Some stubs need to do more than return a minimal response. For example, if the calling source code module/component under test expects a calculation from the invoked stub, that calculation capability may need to be programmed into the stub, or the stub can simply set the returned variable(s). Table 21.13 shows an example of a very simple stub and its associated calling routine.
A stub can also be used to replace an external device that the software needs to talk to. For example, a stub could intercept the data that would normally be sent to a hardware device and automatically analyze that data against the expected values. This analysis can often be done more quickly and accurately by a stub than it can be manually, especially if the output is large or complex; for example, humans are not good at finding a single missing or incorrect data point in a long stream of data.
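The following is a minimal sketch of such a device stub; the DeviceStub class and its send interface are hypothetical, not from the handbook, and simply capture and compare the data the software under test would send to the hardware.

    class DeviceStub:
        # Stands in for the real hardware device during testing, capturing the
        # data the software under test sends and comparing it to expected values.
        def __init__(self, expected):
            self.expected = expected
            self.received = []

        def send(self, data):
            self.received.append(data)

        def mismatches(self):
            return [(i, got, want)
                    for i, (got, want) in enumerate(zip(self.received, self.expected))
                    if got != want]

    stub = DeviceStub(expected=[0.0, 0.5, 1.0])
    for value in [0.0, 0.5, 0.9]:          # values produced by the software under test
        stub.send(value)
    print(stub.mismatches())               # [(2, 0.9, 1.0)] -> one incorrect data point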
Drivers
A driver is a source code module/component that can be used during bottom-up testing to minimally simulate the actions of a calling (higher-level) source code module/component that has not yet been integrated. Drivers can also be used in unit testing. The driver typically establishes the values that will be passed to the source code module/component under test before calling it. Table 21.14 shows an example of a very simple driver and its associated called routine.
Table 21.13 Stub—example.
Stub:

    Procedure CALCULATE_TAX (taxes) Return taxes
    begin;
        return $1.32;
    end CALCULATE_TAX;

Calling (higher-level) source code module:

    Procedure SALE_PROCESS
    begin;
        statement 1;
        statement 2;
        TAX = CALCULATE_TAX (taxes);
        statement 3;
    end SALE_PROCESS
Table 21.14 Driver—example.
Driver:

    Procedure EXEC_DRV
    begin;
        NUM1 = 15
        NUM2 = 25
        MEAN_VALUE(NUM1, NUM2);
    end EXEC_DRV;

Called (lower-level) source code module:

    Function MEAN_VALUE (X1, X2 : FLOAT) return FLOAT
    begin
        return (X1 + X2) / 2.0;
    end MEAN_VALUE
More-complex test drivers can be used to mimic large amounts of data being entered manually. For example, a driver could mimic data entry streams coming from multiple keyboards simply by reading a predefined set of inputs from a data file and using them to emulate the keyboard entries.
Harnesses
At its simplest, a test harness is a set of test stubs and drivers used to automate unit testing. At its most sophisticated, a test harness, also called an automated test framework, can be used as a scaffold for software, tests, and test data. This harness can be used to automate the testing of a source code module/component, by executing a set of test cases and/or scripts under a variety of conditions, and collecting, monitoring, and even evaluating the test results. The two primary parts of a test harness are the test execution engine and the repository of test scripts.
A test execution engine , also called a test executive, test manager , or test sequencer , is a test automation tool that selects, loads, and executes a test specification from the repository of test scripts. The test execution engine then monitors the execution of that test specification, and reports and stores the results (pass/fail/abort status, time stamps, duration of the test, and other information) for every step in the test sequence. The test execution engine may also have an interface that allows the testers to interact with the automated tests.
A repository of test scripts is a collection of test specifications, also called test sequences , which constitute the specific test steps required to execute the tests and their associated test data.
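A minimal sketch of these two parts follows; the repository structure, sequence name, and result record fields are illustrative assumptions rather than any particular tool's API.

    import time

    # Hypothetical repository of test scripts: each test sequence is a list of
    # (step name, check) pairs, where each check returns True on success.
    repository = {
        "smoke_sequence": [
            ("addition works", lambda: 1 + 1 == 2),
            ("upper case works", lambda: "abc".upper() == "ABC"),
        ],
    }

    def run_sequence(name):
        # Select, load, and execute one test sequence, recording status,
        # time stamp, and duration for every step.
        results = []
        for step, check in repository[name]:
            start = time.time()
            try:
                status = "pass" if check() else "fail"
            except Exception:
                status = "abort"
            results.append({"step": step, "status": status,
                            "timestamp": start,
                            "duration": time.time() - start})
        return results

    for record in run_sequence("smoke_sequence"):
        print(record)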
Controlling Test Environments
Sometimes the time to set up the test environment is trivial. However, according to Black (2004), “A complex test environment can take a long time and involve a lot of effort to set up and configure.” Whatever the initial investment, it must be protected by managing and controlling the test environment as it is being utilized by the testers to execute their tests. The key here is to be able to restore the test environment to a known state with minimal effort because, as tests are being executed, unexpected events may occur that corrupt the data or the software being tested, or cause issues with some other part of the test bed. After the software is released into operations, the test bed may also need to be reproduced to test the software after maintenance changes are made.
Test environment control policies and processes should be established and followed to make sure that the test bed does not introduce false positives (issues that look like software problems but are actually test bed problems) or false negatives (software problems that should have been identified but are missed) into the test execution processes. The policies and procedures should include mechanisms for:
9. TEST TOOLS
Identify and use test utilities, diagnostics, automation and test management tools. (Apply)
BODY OF KNOWLEDGE VI.B.9
There are many different tools that can be used to make software testing more efficient, consistent, and repeatable. Testing tools also reduce the tedium of repetitive work, provide objective evaluation, and provide visibility into the testing processes, test results, and test sufficiency.
Like any other software product, testing tools should go through verification and validation (V&V) prior to use. Requirements for the V&V of supplier-provided testing tools should be specified as part of the acquisition plans for those tools. This may include requirements for supplier V&V activities, as well as plans for acquirer-conducted V&V (joint reviews, alpha, beta, and acceptance testing). Both supplier-provided and in-house-developed software testing tools should go through a level of V&V rigor appropriate to the required integrity level and/or risk of the software they will be testing. This helps make sure that a tool does not indicate that the software passed tests that it actually failed. It also helps eliminate the waste caused when software developers spend time debugging reported software problems that are actually defects in the testing tools and not in the software.
Test Planning and Management Tools
Test planning and management tools can range from simple templates in a word processor to sophisticated test management tools. Test planning and management tools include:
Test Design and Development Tools
Test design and development tools analyze requirements and design models, software source code, and graphical user interfaces to help:
Test data preparation tools include features that provide support to (Graham 2008):
Test Execution Tools
There are many test execution tools on the market. These include:
Test Support Tools
Support tools from other disciplines are also used in the testing effort, including:
Other Tools Used by Testers
There are also many simple tools used by testers to perform their work. These tools may or may not be part of the test bed discussed previously. These tools include:
10. TEST DATA MANAGEMENT
Ensure the integrity and security of test data through the use of configuration controls. (Apply)
BODY OF KNOWLEDGE VI.B.10
Test Data Requirements
Once test cases, test procedures, and automated test scripts have been designed, the requirements for the specific test data items must be identified. Test data items include specific input data for test cases and may also include “operations-like” databases, standing data tables, and global data (Graham 2008).
Test data item requirements describe the data items and the characteristics and properties needed to execute the tests, including:
Testing should never directly use data items in the operational environment. If operational data are needed as the basis for test data items, that data should always be copied from the operational environment into the test environment in order to maintain the integrity and confidentiality of the operational data items.
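A minimal sketch of copying operational records into the test environment is shown below; the record fields are hypothetical, and the masking of the name field is one common way to protect confidentiality, not something prescribed here.

    import copy

    # Hypothetical operational records; in practice these would be extracted from
    # the operational environment, never used in place.
    operational_records = [
        {"account": "1001", "name": "A. Customer", "balance": 250.00},
        {"account": "1002", "name": "B. Customer", "balance": 75.50},
    ]

    def copy_for_test(records):
        # Copy the records into the test environment and mask the field that
        # could expose confidential information (masking is a common practice,
        # not something mandated by the text).
        test_records = copy.deepcopy(records)
        for i, record in enumerate(test_records, start=1):
            record["name"] = f"TEST_USER_{i}"
        return test_records

    print(copy_for_test(operational_records)[0])
    # {'account': '1001', 'name': 'TEST_USER_1', 'balance': 250.0}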
Test Data Management
Test data management involves the management and control of the test data after they are initially created. The most important methodology in test data management is the utilization of a central repository for test data items. This allows test data items to be reusable, so that:
This can be accomplished through identifying test data items as configuration items, and placing them into a test library (central repository for test data items) in a SCM library tool, or by creating the central repository through other means, as appropriate to the project. As test data items are needed over time for new levels of testing, they can be reused if they are already in the central repository. If they are not in the repository, they can be created and added to the repository for future reuse. The central repository also acts as a single source of content for:
Test data management also includes mechanisms for: