15
Security Quality Assurance Testing
In this chapter you will
•  Explore the aspects of testing software for security
•  Learn about standards for software quality assurance
•  Discover the basic approaches to functional testing
•  Examine types of security testing
•  Explore the use of the bug bar and defect tracking in an effort to improve the SDL process
Testing is a critical part of any development process, and testing in a secure development lifecycle (SDL) environment is an essential part of the security process. Designing in security is one step, coding is another, and testing provides the assurance that what was desired and planned becomes reality. Validation and verification have been essential parts of quality efforts for decades, and software is no exception. This chapter looks at how and what to test to obtain an understanding of the security posture of software.
Standards for Software Quality Assurance
Quality is defined as fitness for use according to certain requirements. This can be different from security, yet there is tremendous overlap in the practical implementation and methodologies employed. In this regard, lessons can be learned from international quality assurance standards, for although they may be more expansive in goals than just security, they can make sense there as well.
ISO 9126
The international standard ISO/IEC 9126 provides guidance for establishing quality in software products. With respect to testing, this standard focuses on a quality model built around functionality, reliability, and usability; efficiency, maintainability, and portability are also included in the model. With respect to security and testing, it is important to remember the differences between quality and security. Quality is defined as fitness for use, or conformance to requirements. Security is less cleanly defined, but can be defined by requirements. One issue addressed by the standard is the human side of quality, where requirements can shift over time or be less clear than the development team needs in order to address them properly. These are common issues in all projects, and the standard works to ensure a common understanding of the goals and objectives of a project as described by its requirements. This is equally applicable to security concerns and requirements.
SSE-CMM
The Systems Security Engineering Capability Maturity Model (SSE-CMM) is also known as ISO/IEC 21827, and is an international standard for the secure engineering of systems. The SSE-CMM addresses security engineering activities that span the entire trusted product or secure system lifecycle, including concept definition, requirements analysis, design, development, integration, installation, operations, maintenance, and decommissioning. The SSE-CMM is designed to be employed as a tool to evaluate security engineering practices and assist in the definition of improvements to them. The SSE-CMM is organized into processes and corresponding maturity levels. There are 11 processes that define what needs to be accomplished by security engineering. The maturity level is a standard CMM metric representing how well each process achieves a set of goals. As a model, the SSE-CMM has become a de facto standard for evaluating security engineering capability in an organization.
OSSTMM
The Open Source Security Testing Methodology Manual (OSSTMM) is a peer-reviewed system describing security testing. OSSTMM provides a scientific methodology for assessing operational security built upon analytical metrics. It is broken into five sections: data networks, telecommunications, wireless, human, and physical security, as shown in Table 15-1. The purpose of the OSSTMM is to create a system that can accurately characterize the security of an operational system in a consistent and reliable fashion.
Table 15-1    OSSTMM Sections and Test/Audit Areas
OSSTMM provides a scientific methodology that can be used in the testing of security. The Institute for Security and Open Methodologies (ISECOM), the developer of OSSTMM, has developed a range of training classes built around the methodology. OSSTMM can also be used to assist in auditing, as it highlights what is important to verify with respect to functional operational security.
Functional Testing
Functional software testing is performed to assess the level of functionality associated with the software as expected by the end user. Functional testing is used to determine compliance with requirements in the areas of reliability, logic, performance, and scalability. Reliability measures whether the software functions as expected by the customer at all times. It is not just a measure of availability, but of functionally complete availability. Resiliency is a measure of how strongly the software can perform when it is under attack by an adversary.
Steps for Functional Testing
Functional testing involves the following steps in order:
1.  Identifying the functions (requirements) that the software is expected to perform
2.  Creating input test data based on the function’s specifications
3.  Determining expected output test results based on the function’s specifications
4.  Executing the test cases corresponding to functional requirements
5.  Comparing actual and expected outputs to determine functional compliance
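The five steps above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed harness: `apply_discount`, its rounding rule, and the test values are hypothetical stand-ins for a real requirement.

```python
def apply_discount(total_cents: int, percent: int) -> int:
    """Step 1 (hypothetical requirement): price after a whole-percent
    discount, with the discount amount rounded down to a whole cent."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return total_cents - (total_cents * percent) // 100


# Steps 2 and 3: input data and expected outputs, both taken from the
# specification rather than from the code.
TEST_CASES = [
    ((10_000, 0), 10_000),
    ((10_000, 25), 7_500),
    ((999, 10), 900),
    ((10_000, 100), 0),
]


def run_functional_tests(func, cases):
    """Steps 4 and 5: execute each case and compare actual with expected."""
    failures = []
    for args, expected in cases:
        actual = func(*args)
        if actual != expected:
            failures.append((args, expected, actual))
    return failures


failures = run_functional_tests(apply_discount, TEST_CASES)
print("functionally compliant:", not failures)
```

Keeping the expected outputs in a table separate from the function makes it easy to see that they come from the specification, which is what lets the comparison in step 5 mean something.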
Unit Testing
Unit testing is conducted by developers as they develop the code. This is the first level of testing and is essential to ensure that logic elements are correct and that the software under development meets the published requirements. Unit testing is essential to the overall stability of the project, as each unit must stand on its own before being connected to others. At a minimum, unit testing will ensure functional logic, understandable code, and a reasonable level of vulnerability control and mitigation.
   
EXAM TIP    One of the principal advantages of unit testing is that it is done by the development team and catches errors early, before they leave the development phase.
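As an illustration of a unit test that checks both functional logic and basic vulnerability control, the following sketch uses Python's standard `unittest` module. The function `parse_port` and its rules are hypothetical; the point is that the developer tests hostile input alongside the normal case.

```python
import unittest


def parse_port(text: str) -> int:
    """Hypothetical unit under test: parse a TCP port from user input."""
    if not (text.isascii() and text.isdigit()):
        raise ValueError("port must contain only digits")
    port = int(text)
    if not 1 <= port <= 65535:
        raise ValueError("port out of range")
    return port


class ParsePortTests(unittest.TestCase):
    def test_valid_port(self):
        self.assertEqual(parse_port("443"), 443)

    def test_rejects_out_of_range(self):
        for bad in ("0", "65536"):
            with self.assertRaises(ValueError):
                parse_port(bad)

    def test_rejects_non_numeric(self):
        # Empty, signed, and injection-style inputs must all be refused.
        for bad in ("", "-1", "80; rm -rf /", "8o"):
            with self.assertRaises(ValueError):
                parse_port(bad)


def run_unit_tests() -> unittest.TestResult:
    """Load and run the test case, returning the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ParsePortTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling `run_unit_tests()` runs the three tests and reports any failures while the code is still in the developer's hands.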
Integration or Systems Testing
Even if each unit tests properly per the requirements and specifications, a system is built up of many units that work together to achieve a business objective. There are emergent properties that occur in systems, and integration (or systems-level) testing should be designed to verify that the correct form and level of the emergent properties exist in the system. A system can be more than just the sum of the parts, and if part of the “more” involves security checks, these need to be verified.
Systems or integration testing is needed to ensure that the overall system is compliant with the system-level requirements. It is possible for one module to be correct and another module to also be correct but for the two modules to be incompatible, causing errors when connected. System tests need to ensure that the integration of components occurs as designed and that data transfers between components are secure and proper.
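The case of two individually correct but incompatible modules can be sketched as follows. Everything here is hypothetical: one module records session expiry in milliseconds, the other assumes seconds. Each passes its own unit tests, yet together they yield sessions that effectively never expire, which only an integration test reveals.

```python
SESSION_LIFETIME_S = 15 * 60


def issue_session(user: str, now_s: int) -> dict:
    """Module A (hypothetical): stores expiry in MILLISECONDS."""
    return {"user": user, "expires": (now_s + SESSION_LIFETIME_S) * 1000}


def is_session_valid(session: dict, now_s: int) -> bool:
    """Module B (hypothetical): assumes `expires` is in SECONDS."""
    return now_s < session["expires"]


def is_session_valid_fixed(session: dict, now_s: int) -> bool:
    """Module B once the interface unit is agreed: compare in milliseconds."""
    return now_s * 1000 < session["expires"]


def integration_test_expiry(issue, validate) -> bool:
    """A session must be accepted after 1 minute and rejected after 16."""
    start = 1_700_000_000
    session = issue("alice", start)
    accepted_early = validate(session, start + 60)
    rejected_late = not validate(session, start + 16 * 60)
    return accepted_early and rejected_late


print(integration_test_expiry(issue_session, is_session_valid))
print(integration_test_expiry(issue_session, is_session_valid_fixed))
```

The first call fails because the expired session is still accepted; the second passes once both modules agree on the unit.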
Performance Testing
Part of the set of requirements for the software under development should be the service levels that can be expected from the software. Typically, these are expressed in the terms of a service level agreement (SLA). The typical objective in performance testing is not to find specific bugs, but to determine bottlenecks and performance factors for the systems under test. These tests are frequently referred to as load testing and stress testing. Load testing involves running the system under a controlled load that represents its expected operating point. Stress testing takes the system past this operating point to see how it responds to overload conditions.
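The difference between load and stress testing can be shown with a small deterministic simulation. This is only a sketch: `SimulatedService`, its capacity of 50 requests per tick, and its queue limit are invented for illustration, and a real test would drive the actual system with a load generation tool.

```python
from collections import deque


class SimulatedService:
    """Hypothetical service: serves `capacity` requests per tick, queues up
    to `queue_limit` waiting requests, and drops anything beyond that."""

    def __init__(self, capacity: int, queue_limit: int):
        self.capacity = capacity
        self.queue_limit = queue_limit
        self.queue = deque()
        self.dropped = 0
        self.waits = []

    def tick(self, now: int, arrivals: int) -> None:
        for _ in range(arrivals):
            if len(self.queue) < self.queue_limit:
                self.queue.append(now)      # remember the arrival time
            else:
                self.dropped += 1
        for _ in range(min(self.capacity, len(self.queue))):
            self.waits.append(now - self.queue.popleft())


def run_profile(rate: int, ticks: int = 100) -> dict:
    """Drive the service at a constant arrival rate and summarize it."""
    service = SimulatedService(capacity=50, queue_limit=200)
    for now in range(ticks):
        service.tick(now, rate)
    return {"rate": rate, "max_wait": max(service.waits),
            "dropped": service.dropped}


load_result = run_profile(rate=40)     # at the expected operating point
stress_result = run_profile(rate=80)   # past the operating point
print(load_result)
print(stress_result)
```

Under the load profile every request is served in the tick it arrives. Under the stress profile the queue fills, waits grow, and requests are dropped, which is the overload behavior a stress test is meant to expose.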
   
EXAM TIP    Recoverability is the ability of an application to restore itself to expected levels of functionality after the security protection is breached or bypassed.
Security Testing
Testing includes white-box testing, where the test team has access to the design and coding elements; black-box testing, where the team does not have access; and grey-box testing, where information is greater than black-box testing but short of white-box testing. This nomenclature does not describe the actual tests being performed, but rather indicates the level of information present to the tester before the test.
White-Box Testing
White-box testing is performed on a system with full knowledge of the working components, including the source code and its operation. This is commonly done early in the development cycle. The advantage of white-box testing is that the attacker has knowledge of how the system works and can employ their time on compromising it. The unit testing of a section of code by the development team is an example of white-box testing. White-box testing, by design, provides the attacker with complete documentation, including source code and configuration parameters. This information can then be used to devise potential methods of attacking the software. Thus, white-box testing can focus on the structural basis of the software and the operational deployment considerations with respect to its use or misuse.
   
EXAM TIP    When testers have access to full knowledge of a system, including source code, it is referred to as white-box testing.
Black-Box Testing
Black-box testing is where the attacker has no knowledge of the inner workings of the software under test. This is common in more advanced system-level tests, such as penetration testing. The lack of knowledge of the specific implementation is not as important as one may think at times, for the attacker still has the same knowledge that an end user would possess, so they know what inputs are requested. Using their knowledge of how things work and what patterns of vulnerabilities are likely to exist, an attacker is not as blind in black-box testing as you might think. Black-box testing focuses on the behavioral characteristics of the application.
   
EXAM TIP    When testers have access to no knowledge of how a system works, including no knowledge of source code, it is referred to as black-box testing.
Grey-Box Testing
Grey-box testing is so named because the attacker has more knowledge of the inner workings than in black-box testing, but less than total access to source code. Grey-box testing is relatively rare outside of internal testing.
Environment
Software applications operate within a specific environment, which also needs to be tested. Trust boundaries, described earlier in the book, are devices used to demark the points where data moves from one module set to another. Testing of data movement across trust boundaries from end to end of the application is important. When the complete application, from end to end, is more than a single piece of code, interoperability issues may arise and need to be tested for. When security credentials, permissions, and access tokens are involved, operations across trust boundaries and between modules become areas of concern. Verifying that all dependencies across the breadth of the software are covered, both logically and from a functional security credential point of view, is important.
Comparison of Common Testing Types
White-Box Testing                         Black-Box Testing
Full knowledge, including source code     Zero knowledge
Assesses software structure and design    Assesses software behavior
Low false positives                       High false positives
Logic flaws are detected                  Logic flaws are typically not visible
Bug Tracking
Software will always have errors or bugs, and these bugs come in a variety of shapes and sizes. Some are from design issues, some from coding, and some from deployment. If the development team is going to manage these issues, they need to be collected, enumerated, and prioritized. Tracking the defects as they become known will allow for better access and management. Remediation of bugs can take many forms, but typically four states are used:
•  Removal of defect
•  Mitigation of defect
•  Transfer of responsibility
•  Ignoring the issue
Sometimes, the removal of the defect is not directly possible. This could be because other functionality would be lost in the removal process, or because returning to design or another previous step in the development process would be too costly at this point in production. These four states mirror the options associated with risk, and this makes sense, as bugs create risk in the system.
The goal of tracking bugs is to ensure that at some point they get addressed by the development team. As it may not be feasible to correct all bugs at or near the time of discovery, logging and tracking them provide a means of ensuring that what is found is eventually addressed. Logging them also provides a metric as to code quality. By comparing the defect rate during development to other systems of similar size and complexity, it is possible to get a handle on the development team’s efficiency.
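One common way to make such a comparison is defect density, expressed as defects per thousand lines of code (KLOC). The text does not prescribe a particular metric, so treat the following as one reasonable choice; the project figures are hypothetical.

```python
def defect_density(defect_count: int, lines_of_code: int) -> float:
    """Defects per thousand lines of code (KLOC)."""
    if lines_of_code <= 0:
        raise ValueError("lines_of_code must be positive")
    return defect_count / (lines_of_code / 1000)


# Hypothetical figures for two systems of similar size and complexity.
current_project = defect_density(defect_count=42, lines_of_code=60_000)
comparable_project = defect_density(defect_count=30, lines_of_code=60_000)

print(f"current: {current_project:.2f} defects/KLOC")
print(f"comparable: {comparable_project:.2f} defects/KLOC")
```

A single number like this is only meaningful when the systems being compared really are similar and defects are logged consistently in both.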
Software defects, or bugs, can be characterized in different ways. One method is by the source or effect of the defect. Defects can be broken into five categories:
    •   Bugs    Errors in coding
    •   Flaws    Errors in design
    •   Behavioral anomalies    Issues in how the application operates
    •   Errors and faults    Outcome-based issues from other sources
    •   Vulnerabilities    Items that can be manipulated to make the system operate improperly
Defects
A defect database can be built to contain the information about defects as they occur. Details such as where the defect occurred, in what part of the code, in what build, who developed it, who discovered it, how it was discovered, and whether it is exploitable can be logged. Then, additional disposition data can be tracked against these elements, providing information for security reviews.
Tracking all defects, even those that have been closed, provides a wealth of information to developers. What has gone wrong in the past, where, and how? The defect database is a tremendous place to learn what not to do, and in some cases, what not to repeat. This database provides testers with ammunition to go out hunting for defects.
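A defect record of the kind described above might look like the following sketch. The field names, the `Disposition` values (taken from the four remediation states), and the sample records are illustrative assumptions rather than a standard schema.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Disposition(Enum):
    """The four remediation states used for tracked defects."""
    REMOVED = "removed"
    MITIGATED = "mitigated"
    TRANSFERRED = "transferred"
    IGNORED = "ignored"


@dataclass
class Defect:
    defect_id: int
    module: str            # part of the code where the defect occurred
    build: str             # build in which it was found
    developed_by: str
    discovered_by: str
    discovered_how: str    # e.g. "unit test", "fuzzing", "code walkthrough"
    exploitable: bool
    disposition: Optional[Disposition] = None   # None means still open


# Hypothetical sample records.
DEFECTS = [
    Defect(1, "auth", "1.0.3", "dev-a", "qa-1", "fuzzing", True),
    Defect(2, "auth", "1.0.3", "dev-a", "dev-b", "unit test", False,
           Disposition.REMOVED),
    Defect(3, "reports", "1.0.4", "dev-c", "qa-2", "code walkthrough", True,
           Disposition.MITIGATED),
]


def open_exploitable(defects: List[Defect]) -> List[Defect]:
    """Defects a security review should look at first."""
    return [d for d in defects if d.exploitable and d.disposition is None]


def defects_per_module(defects: List[Defect]) -> Counter:
    """Closed defects count too: history shows where problems cluster."""
    return Counter(d.module for d in defects)
```

For the sample data, `open_exploitable(DEFECTS)` returns only defect 1, and `defects_per_module(DEFECTS)` shows the `auth` module with two of the three defects, a hint about where testers might hunt next.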
Errors
Errors are examples of things gone wrong. They can be of varying levels of severity and impact. Some errors are not a significant issue at the present time, for they do not carry immediate operational risk. But like all other issues, they should be documented and put into the database. This allows them to be included in quality assurance (QA) counts and can help provide an honest assessment of code quality over time. Errors can be found through a wide variety of testing efforts, from automated tests to unit tests to code walkthroughs. The important issue with errors is collecting the information associated with them and monitoring the metrics.
If testing is a data collection effort aimed at improving the SDL process, then error data collection should not be an effort aimed at punitive results. The collection should enable feedback mechanisms to provide information to the development team, so that over time, fewer errors are made, as the previously discovered and now-understood problems are not repeated. Monitoring error levels as part of a long-term security performance metric provides meaningful, actionable information to improve the efforts of the development team.
Vulnerabilities
Vulnerabilities are special forms of errors, in that they can be exploited by an adversary to achieve an unauthorized result. As with all other types of defects, vulnerabilities can range in severity, and this is measured by the potential impact on the overall system. Vulnerabilities are frequently found during activities such as penetration testing and fuzz testing. The nature of these testing environments and the types of results make vulnerability discovery their target of opportunity. By definition, these types of errors are potentially more damaging, and they will score higher on bug bar criteria than many other error types.
Bug Bar
The concept of a bug bar is an operational measure for what constitutes a minimum level of quality in the code. The bug bar needs to be defined at the beginning of the project as a fixed security requirement. Doing this establishes an understanding of the appropriate level of risk with security issues and establishes a level of understanding as to what must be remediated before release. During the testing phase, it is important to hold true to this objective and not let the bar slip because of production pressures.
A detailed bug bar will list the types of errors that cannot go forward into production. For instance, bugs labeled as critical or important may not be allowed into production. These could include bugs that permit access violations, elevation of privilege, denial of service, or information disclosure. The specifics of what constitutes each level of bug criticality need to be defined by the security team in advance of the project so that the testing effort will have concrete guidance to work from when determining level of criticality and associated go/no-go status for remediation.
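A bug bar can be turned into an automated go/no-go check. In the sketch below, the four severity levels and the choice of "important" as the bar are assumptions for illustration; a real project would use the levels and definitions its security team set in advance.

```python
from enum import IntEnum
from typing import List, Tuple


class Severity(IntEnum):
    LOW = 1
    MODERATE = 2
    IMPORTANT = 3
    CRITICAL = 4


# Assumed bar: anything rated important or critical blocks release.
BUG_BAR = Severity.IMPORTANT

Bug = Tuple[str, Severity]


def release_blockers(open_bugs: List[Bug], bar: Severity = BUG_BAR) -> List[Bug]:
    """Return the open bugs that meet or exceed the bug bar."""
    return [(title, severity) for title, severity in open_bugs
            if severity >= bar]


def go_no_go(open_bugs: List[Bug], bar: Severity = BUG_BAR) -> str:
    return "no-go" if release_blockers(open_bugs, bar) else "go"


# Hypothetical open bugs at the end of a test cycle.
OPEN_BUGS = [
    ("Elevation of privilege in admin API", Severity.CRITICAL),
    ("Verbose error reveals stack trace", Severity.MODERATE),
    ("Typo in help text", Severity.LOW),
]

print(go_no_go(OPEN_BUGS))
```

Because the bar is a fixed constant defined up front, lowering it under schedule pressure requires a visible change rather than a quiet exception.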
Detailed requirements for testing may include references to the bug bar when performing tests. For instance, fuzzing involves numerous iterations, so how many is enough? Microsoft has published guidelines that indicate fuzzing should be repeated until there are 100,000 to 250,000 clean samples, depending upon the type of interface, since the last bug bar issue. These types of criteria ensure that testing is thorough and does not get stopped prematurely by a few low-hanging fruit–type errors.
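The idea of fuzzing until a required number of clean samples has accumulated since the last qualifying issue can be sketched as a loop with a resettable counter. The target `parse_record`, the random-bytes input generator, and the rule that any exception other than `ValueError` counts as a bug bar issue are all simplifying assumptions; production fuzzers are far more sophisticated.

```python
import random


def parse_record(data: bytes) -> int:
    """Hypothetical target: first byte declares the payload length."""
    if not data:
        raise ValueError("empty input")
    length, payload = data[0], data[1:]
    if length > len(payload):
        raise ValueError("declared length exceeds payload")
    return sum(payload[:length])


def fuzz_until_clean(target, required_clean: int, max_iterations: int,
                     seed: int = 0):
    """Fuzz `target` until `required_clean` consecutive inputs pass.

    Returns (reached_goal, iterations_run, findings).
    """
    rng = random.Random(seed)          # seeded so runs are repeatable
    clean_streak = 0
    findings = []
    for iteration in range(1, max_iterations + 1):
        size = rng.randrange(0, 16)
        data = bytes(rng.randrange(256) for _ in range(size))
        try:
            target(data)
        except ValueError:
            pass                       # documented rejection: still clean
        except Exception as exc:       # anything else is treated as an issue
            findings.append((data, repr(exc)))
            clean_streak = 0           # the count restarts after each issue
            continue
        clean_streak += 1
        if clean_streak >= required_clean:
            return True, iteration, findings
    return False, max_iterations, findings


print(fuzz_until_clean(parse_record, required_clean=1000, max_iterations=5000))
```

The reset of `clean_streak` is what keeps a run from being declared finished shortly after a serious finding.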
Attack Surface Validation
The attack surface evaluation was extensively covered in the design portions of this book. During the design phase, an estimate of the risks and the mitigation efforts associated with the risks is performed. Based on the results of this design, the system is developed, and during development, the actual system design goals may or may not have been met. Testing the code for obvious failures at each step along the way provides significant information as to which design elements were not met.
It is important to document the actual attack surface throughout the development process. Testing the elements and updating the attack surface provide the development team with feedback, ensuring that the design attack surface objectives are being met through the development process. Testing of elements such as the level of code accessible by untrusted users, the quantity of elevated privilege code, and the implementation of mitigation plans detailed in the threat model is essential in ensuring that the security objectives are being met through the development process.
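Comparing the designed attack surface with what testing actually finds can be as simple as a set difference. The entry points and privilege levels below are hypothetical; in practice the measured set would come from test results or scanning tools.

```python
# Each entry: (entry point, privilege required to reach it).
DESIGNED_SURFACE = {
    ("POST /login", "anonymous"),
    ("GET /reports", "user"),
    ("POST /admin/users", "admin"),
}

MEASURED_SURFACE = {
    ("POST /login", "anonymous"),
    ("GET /reports", "anonymous"),        # privilege check lost in development
    ("POST /admin/users", "admin"),
    ("GET /debug/status", "anonymous"),   # never part of the design
}


def attack_surface_creep(designed: set, measured: set) -> dict:
    """Report entries found in testing but not designed, and vice versa."""
    return {
        "unplanned": sorted(measured - designed),
        "missing": sorted(designed - measured),
    }


def untrusted_entry_points(surface: set) -> list:
    """Entry points reachable without authentication."""
    return sorted(entry for entry, privilege in surface
                  if privilege == "anonymous")


creep = attack_surface_creep(DESIGNED_SURFACE, MEASURED_SURFACE)
print(creep)
print(untrusted_entry_points(MEASURED_SURFACE))
```

Here the report flags both an endpoint that was never designed and a designed endpoint whose required privilege has changed, two different kinds of attack surface growth.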
Testing Artifacts
Testing is a multifaceted process that should occur throughout the development process. Beginning with requirements, use and misuse cases are created and used to assist in the development of the proper testing cases to ensure requirements coverage. As software is developed, testing can occur at various levels, from the unit level where code is first created to the final complete system and at multiple stages in between. To ensure appropriate and complete testing coverage, it is important for the testing group to work with the rest of the development team, creating and monitoring tests for each level of integration to ensure that the correct properties are examined at the correct intervals of the secure development process.
Test Data Lifecycle Management
Testing can require specific useful data to perform certain types of tests. Whether for error conditions or verification of correct referential integrity, test data must be created to mimic actual production data and specific process conditions. One manner of developing usable data, especially in complex environments with multiple referential integrity constraints, is to use production data that has been anonymized. This is a difficult task, as truly anonymizing data can be more complex than just changing a few account numbers and names. Managing test data and anonymization efforts are not trivial tasks and can require planning and process execution on the part of the testing team.
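One way to replace identifiers while keeping referential integrity is to derive each replacement from a keyed hash, so the same original value always maps to the same substitute across tables. The sketch below uses Python's standard `hmac` and `hashlib` modules; the key, table layouts, and field names are hypothetical. Note that this is pseudonymization only, and, as the text cautions, true anonymization usually demands more than swapping names and account numbers.

```python
import hashlib
import hmac

# Assumption: a key used only for generating test data, kept out of production.
TEST_DATA_KEY = b"test-data-key-not-for-production"


def pseudonym(value: str, prefix: str) -> str:
    """Deterministic replacement: the same input always yields the same output."""
    digest = hmac.new(TEST_DATA_KEY, value.encode("utf-8"),
                      hashlib.sha256).hexdigest()
    return f"{prefix}-{digest[:12]}"


# Hypothetical production extracts linked by account number.
CUSTOMERS = [
    {"account": "ACC-1001", "name": "Alice Smith"},
    {"account": "ACC-1002", "name": "Bob Jones"},
]
ORDERS = [
    {"order": 1, "account": "ACC-1001", "total": 250},
    {"order": 2, "account": "ACC-1002", "total": 75},
    {"order": 3, "account": "ACC-1001", "total": 120},
]


def anonymize(customers, orders):
    """Replace names and account numbers consistently in both tables."""
    anon_customers = [
        {"account": pseudonym(c["account"], "ACC"),
         "name": pseudonym(c["name"], "NAME")}
        for c in customers
    ]
    anon_orders = [
        {**o, "account": pseudonym(o["account"], "ACC")} for o in orders
    ]
    return anon_customers, anon_orders


anon_customers, anon_orders = anonymize(CUSTOMERS, ORDERS)
```

Every order in `anon_orders` still points at an account that exists in `anon_customers`, so tests that depend on the relationship between the two tables continue to work.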
Chapter Review
This chapter opened with a look at some standards associated with software quality assurance. ISO 9126 details quality in software products, while ISO 21827 (SSE-CMM) details the processes of secure engineering of systems. The OSSTMM, a scientific methodology for assessing operational security built upon analytical metrics, was presented as an aid to testing and auditing. Functional testing, including reliability and resiliency testing, was covered. The functional testing elements of unit testing, systems testing, and performance testing, including load and stress testing, were presented. Security testing can be performed in white-, grey-, or black-box modes, depending upon the amount of information possessed by the tester. Testing of the operational environment was covered, as it is associated with the trust boundaries and sets many security conditions on the application. The tracking of bugs, including the various forms of bugs and the establishment of a bug bar, was presented. The chapter closed with a discussion on validation of the attack surface as part of testing.
Quick Tips
•  ISO 9126 details quality in software products.
•  ISO 21827 (SSE-CMM) details the processes of secure engineering of systems.
•  OSSTMM is a scientific methodology for assessing operational security built upon analytical metrics.
•  Reliability is not just a measure of availability, but functionally complete availability.
•  Resiliency is a measure of how strongly the software can perform when it is under attack by an adversary.
•  Functional testing is used to determine compliance with requirements in the areas of reliability, logic, performance, and scalability.
•  Unit testing is the first level of testing and is essential to ensure that logic elements are correct and that the software under development meets the published requirements.
•  Systems, or integration, testing is needed to ensure that the overall system is compliant with the system-level requirements.
•  White-box testing is performed on a system with full knowledge of the working components.
•  Black-box testing is where the attacker has no knowledge of the inner workings of the software under test.
•  Software defects can be classified as bugs, flaws, anomalies, errors, and vulnerabilities.
•  Bug bar is a term for classifying the severity of bugs and having rules as to what levels must be remediated before release.
•  During testing, it is important to revisit and confirm the attack surface of the project to note any creep.
Questions
To further help you prepare for the CSSLP exam, and to provide you with a feel for your level of preparedness, answer the following questions and then check your answers against the list of correct answers found at the end of the chapter.
   1.   Testing without knowledge of the inner workings of a system is called:
          A.   Pen testing
          B.   White-box testing
          C.   Black-box testing
          D.   Vulnerability scanning
   2.   OSSTMM is used for:
          A.   Assessing operational security using analytical metrics
          B.   Security engineering
          C.   Quality assurance for software
          D.   Evaluating security engineering practices
   3.   Functional testing includes all of the following except:
          A.   System testing
          B.   Attack surface area testing
          C.   Unit testing
          D.   Performance testing
   4.   Functional testing is used to determine which of the following characteristics?
          A.   Reliability, bugs, performance, and scalability
          B.   Resiliency, logic, security, and testability
          C.   Resiliency, bugs, requirements, and scalability
          D.   Reliability, logic, performance, and scalability
   5.   The ability of an application to restore itself to expected functionality after the security protection is breached or bypassed is called:
          A.   Resilience
          B.   Recoverability
          C.   Reliability
          D.   Restoration
   6.   When testing is done with complete knowledge of the source code, it is called:
          A.   Unit testing
          B.   Functional testing
          C.   White-box testing
          D.   Code walkthrough
   7.   What type of testing is used to assess software behavior, albeit with significant false-positive results because of no system knowledge?
          A.   OSSTMM testing
          B.   Environmental testing
          C.   Trust boundary testing
          D.   Black-box testing
   8.   A list of the types of errors that are not allowed to go forward as part of the SDL process is called a(n):
          A.   Bug bar
          B.   Attack surface validation
          C.   Security requirements
          D.   SDL security gate
   9.   An operational measure of what constitutes the minimum level of quality with respect to security in code is a description of:
          A.   ISO 9126 process element
          B.   OSSTMM report
          C.   Bug bar
          D.   SDL process requirement
  10.   Defect tracking is important for all of the following reasons except:
          A.   SDL process-based improvement metric over time
          B.   Manage programmer workload
          C.   Application code quality metric
          D.   Ensure all defects are eventually addressable
  11.   The following elements are part of performance testing except:
          A.   Penetration testing
          B.   SLA achievement testing
          C.   Stress testing
          D.   Load testing
  12.   Environmental testing can be used to:
          A.   Test data movement across trust boundaries from end to end of the application
          B.   Ensure the code will run in the cloud
          C.   Ensure code compiles completely
          D.   Verify mutual authentication functions in the application
  13.   Functional testing steps include:
          A.   Requirements, test data creation, expected output results, execute test cases, comparison of actual and expected outputs
          B.   Create test data, perform functional test, score output
          C.   Requirements, create test data, perform functional test
          D.   Requirements, perform functional test, score output
  14.   An international standard for establishing quality in software products is:
          A.   ISO 9000
          B.   ISO 27001
          C.   ISO 21827
          D.   ISO 9126
  15.   The objective of tracking bugs is to:
          A.   Determine the source of bugs
          B.   Ensure that they get addressed by the development team
          C.   Track where bugs are originating in software
          D.   Score developers’ coding ability
Answers
   1.  C. When testers have access to no knowledge of how a system works, including no knowledge of source code, it is referred to as black-box testing.
   2.  A. OSSTMM is a scientific methodology for assessing operational security built upon analytical metrics.
   3.  B. Attack surface area calculations are part of the SDL process, not the actual function of the application.
   4.  D. Functional testing is used to determine compliance with requirements in the areas of reliability, logic, performance, and scalability.
   5.  B. Recoverability is the ability of an application to restore itself to expected levels of functionality after the security protection is breached or bypassed.
   6.  C. Testing with full knowledge of source code is called white-box testing.
   7.  D. Black-box testing is characterized by no knowledge of the system and can examine system behaviors, although it can have a higher false-positive rate due to lack of specific knowledge.
   8.  A. The concept of a bug bar is an operational measure for what constitutes a minimum level of quality in the code.
   9.  C. A bug bar is a predetermined level of security defect that must be fixed prior to release. Errors of less significance can either be fixed or deferred. Errors that exceed the bug bar threshold must be fixed prior to software release.
  10.  B. Programmer workload is based on other items, not defect tracking.
  11.  A. Penetration testing is a method of searching for vulnerabilities.
  12.  A. When the complete application, from end to end, is more than a single piece of code, interoperability issues may arise and need to be tested before use.
  13.  A. The steps are: 1) The identification of the functions (requirements) that the software is expected to perform; 2) The creation of input test data based on the function’s specifications; 3) The determination of expected output test results based on the function’s specifications; 4) The execution of the test cases corresponding to functional requirements; and 5) The comparison of actual and expected outputs to determine functional compliance.
  14.  D. The international standard ISO/IEC 9126 provides guidance for establishing quality in software products.
  15.  B. Bugs are not always fixed at the time of discovery. Documenting and tracking them is a way to ensure they get put into work cycles for correction at a later point in time.