Risk management decides what risks to try to control; risk mitigation is how SSCPs take those decisions to the operational level. Senior leadership and management must drive this activity, supporting it with both resources and their sustained attention. These stakeholders, the business's or organization's leadership and decision makers, must lead by setting priorities and determining acceptable cost–benefit trade-offs. SSCPs, as they grow in knowledge and experience, can provide information, advice, and insight to organizational decision makers and stakeholders as they deliberate the organization's information risk strategy and needs. Chapter 3, "Integrated Information Risk Management," showed that this is a strategic set of choices facing any organization.
Risk mitigation is what SSCPs do, day in and day out. This is a tactical, near-term activity, as well as a set of tasks that translate tactical planning into operational processes and procedures. Risk mitigation delivers on the decision assurance and information security promises made by risk management, and SSCPs turn those promises and expectations into operational reality. SSCPs participate in this process in many ways, as you'll see in this chapter. First, we'll focus on the "what" and the "why" of integrated defense in depth and examine how SSCPs carry out its tactics, techniques, and procedures. Then we'll look in more detail at how organizational leadership and management need the SSCP's assistance in planning, managing, administering, and monitoring ongoing risk mitigation efforts as part of carrying out the defense-in-depth strategic plan (discussed in Chapter 3). The SSCP's role in developing, deploying, and sustaining the "people power" component of organizational information security will then demonstrate how all of these seemingly disparate threads can and should come together. We'll close by looking at some of the key measurements used to plan, achieve, and monitor risk management and mitigation efforts.
Chapter 3 showed how organizations can use risk management frameworks, such as NIST SP 800-37 Rev. 2 or ISO 31000:2018, to guide their assessment of risks that face the organization’s information and information technology systems. Making such assessments guides the organization from the strategic consideration of longer-term goals and objectives to the tactical planning necessary to implement information risk mitigation as a vital part of the organization’s ongoing business processes. One kind of assessment, the impact assessment, characterizes how important and vital some kinds of information are to the organization. This prioritization may be because of the outcomes, assets, or processes that use or produce that information, or because of how certain kinds of threats or vulnerabilities inherent in those information processes put the organization itself at risk.
The next step in the assessment process is seeking out the critical vulnerabilities and determining what it takes to mitigate the risks they pose to organizational goals, objectives, and needs. This vulnerability assessment is not to be confused with having a "vulnerability-based" or "threat-based" perspective on risks overall. The impact assessment has identified outcomes, processes, or assets that must be kept safe, secure, and resilient. Even if we started the impact assessment by thinking about the threats, or the kinds of vulnerabilities (in broad terms) that such threats could exploit, we now have to roll up our sleeves and get into the details of just how the information work actually gets done day by day, week by week. Four key ideas help SSCPs keep a balanced perspective on risk as they look to translate strategic thinking about information risk into action plans that implement, operate, and assess the use of risk management controls:
Chapter 3 also showed something that is vital to the success of information security efforts: they must be integrated and proactive if they are to be even reasonably successful when facing the rapidly evolving threat space of the modern Internet-based, Web-enabled world. By definition, an integrated system is one that its builders, users, and maintainers manage. More succinctly: unmanaged systems are highly vulnerable to exploitation; well-managed systems are still vulnerable, but less so. We’ll look further into this paradigm both here and in subsequent chapters.
We are now ready to cross the boundary between strategic risk management and tactical risk mitigation. For you to fully grasp the speed and agility of thought that this requires, let’s borrow some ideas from the way military fighter pilots train to think and act in order to survive and succeed.
Let’s focus on the key difference between planning and operations. Planning is a deliberate, thoughtful process that we engage in well in advance of the time we anticipate we’ll need to do what our plans prescribe. It asks us to investigate; gather data; understand the stated and unstated assumptions, needs, constraints, and ideals—all of which we try to bring together into our plan. Planning is a balancing act; we identify tasks we need to do; we estimate the people, money, material, and time we’ll need to accomplish those tasks; and then we trim our plan to fit the resources, time, and people available to us. It’s an optimization exercise. Planning prepares us to take action; as Dwight D. Eisenhower, 34th president of the United States and Supreme Allied Commander, Europe, during World War II, famously said, “Plans are worthless, but planning is indispensable.” Making plans, reviewing them, exercising them, and evaluating and modifying them trains your mind to think about how tasks and resources, time, and space fit together. By planning, replanning, reviewing, and updating your plans as part of your “security normal,” you build an innate sense of how to make that “fit” achieve your objectives—and what to do when things just don’t fit!
Plans should lead to process engineering and design tasks, in which we thoughtfully create the procedures, processes, and tools that our workforce will use day to day. Planning should reveal the need for training and human resources development. Planning should bring these needs together and show us how to recognize the moment in which all of the physical, logical, and administrative steps have been taken, our people are trained, and testing has verified that we’re ready to open our doors and start doing business. Once that “initial operational capability” milestone has been reached, and once we’ve delivered the “minimum operational increment of capability” that our users can accept, we switch from planning to operations. We do what’s been planned.
Plans are a set of predictions that rest on assumptions. Plans address the future, and to date, none of us has 100% perfect foresight. Think about all of the assumptions made during the business impact analysis (BIA) process, which we worked through in Chapter 3, and ask, "What if most of them are wrong?" A clear case in point is the underlying assumption of cryptography: we can protect our information today by encoding it in ways that will cost adversaries more time, money, and effort to crack than the result would be worth to them. (This is sometimes called "sprinkling a little crypto dust" over your systems, as if, by magic, it will fix everything.) Your super-secure password that might take a million years of CPU time just might crack on the first guess! (It's not very probable…but not impossible!) Your thorough audit of your IT infrastructure just might miss a backdoor that a developer or engineer put in and "forgot" to tell you about. Your penetration testing contractor might have found a few more vulnerabilities than they've actually told you about. The list of surprises like this is, quite frankly, endless.
Since your plans cannot be perfect, you have to be able to think your way through a surprising situation. This requires you to take the time to think, especially in the heat of battle during an IT security incident.
And if your adversary can deny you that “thinking time,” if they can push you to react instead of thoughtfully considering the situation and the facts on hand and considering the situation in the context of your own objectives, you fall prey to your adversary outthinking you.
How do you avoid this?
The four steps of observe, orient, decide, and act, known as the OODA loop, provide a process by which you can keep from overreacting to circumstances. Developed by Colonel John Boyd, USAF, from his studies of U.S. combat fighter pilots during the Vietnam War, it has become a fundamental concept in fields as diverse as law enforcement training, business and leadership, cybernetics and control systems design, artificial intelligence, and information systems design and use. If you can master the OODA loop and make it part of your day-to-day operational kit bag, you can be the kind of SSCP who keeps their head "when all about you are losing theirs and blaming it on you," as Kipling put it so adroitly.
Figure 4.1 shows the OODA loop, its four major steps, and the importance of feedback loops within the OODA loop itself. It shows how the OODA loop is a continually learning, constantly adjusting, forward-leaning decision-making and control process.
FIGURE 4.1 John Boyd’s OODA loop
Think about Figure 4.1 in the context of two or more decision systems working in the same decision space, such as a marketplace. Suppliers and purchasers all are using OODA loops in their own internal decision making, whether they realize it or not. When the OODA loops of customers and suppliers harmonize with one another, the marketplace is in balance; no one party has an information advantage over the other. Now imagine if the customers can observe the actions of multiple suppliers, maybe even ones located in other marketplaces in other towns. If such customers can observe more information and think “around their OODA loop” more quickly than the suppliers can, the customers can spot better deals and take advantage of them faster than the suppliers can change prices or deliveries to the markets.
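To make the loop's structure concrete in code terms, here is a purely schematic sketch; the function names, the "watch list," and the signals are invented placeholders, not any real monitoring tool's interface. The point it illustrates is the feedback: the results of acting become part of what you observe on the next pass.

```python
# A schematic sketch of the OODA loop as a continuous, feedback-driven cycle.
# Function names and the watch list are invented placeholders for illustration.

def observe(queue):
    # Pull whatever new signals have arrived: log entries, alerts, user reports.
    signals, queue[:] = list(queue), []
    return signals

def orient(model, observations, watch_list):
    # Fold new observations into our evolving picture of the situation.
    model["relevant"] = [o for o in observations if o in watch_list]
    return model

def decide(model):
    # Choose a course of action based on the current orientation.
    return "investigate" if model["relevant"] else "keep monitoring"

def act(decision, queue):
    # Acting produces results that show up as the next round of observations.
    queue.append(f"result of: {decision}")
    return decision

incoming = ["normal traffic", "failed logins spike"]
model, watch_list = {}, {"failed logins spike"}
for _ in range(3):                       # three turns around the loop
    model = orient(model, observe(incoming), watch_list)
    print(act(decide(model), incoming))
```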
Let's shift this to a less-than-cooperative situation and look at a typical adversary intrusion into an organization's IT systems. On average, the IT industry worldwide reports that it takes businesses about 190 days to first observe that a threat actor has discovered a previously unknown or unreported vulnerability and exploited it to gain unauthorized access to the business's systems. It also takes about 170 days, on average, to find a vulnerability, develop a fix (or patch) for it, apply the fix, and validate that the fix has removed or reduced the risk of harm that the vulnerability could allow to occur. Best case, one cycle around the OODA loop takes the business from observing the penetration to fixing it; that's 190 plus 170 days, or roughly 12 months of being at the mercy of the intruder and any potential copycat attackers. By contrast, the intruder is probably running on an OODA loop that might take a few days to go from initially seeking a new target, through initial reconnaissance, to choosing to target a specific business. Once inside the target's systems, the decision cycle time to seek information assets that suit the attacker's objectives, to formulate actions, to carry out those actions, and then to cover their tracks might run into days or weeks. It's conceivable that the attacker could have executed multiple exploits per week over those 12 months of "once-around-the-OODA" that the business world seems to find acceptable.
It’s worth emphasizing this aspect of the zero day exploit in OODA loop terms. The attacker does not need to find the vulnerability before anybody else does; she needs to develop a way to exploit it, against your systems, before that vulnerability has been discovered and reported through the normal, accepted vulnerability reporting channels, and before the defenders have had reasonable opportunity to become aware of its existence. Once you, as one of the white hats, could have known about it, it’s no longer a zero day exploit—just one you hadn’t implemented a control for yet.
In Chapter 1, "The Business Case for Decision Assurance and Information Security," we introduced the concept of the value chain, which shows each major set of processes a business uses to go from raw inputs to finished products that customers have bought and are using. Each step in the value chain creates value—it creates greater economic worth, or creates more of something else that is important to customers. Business uses what it knows about its methods to apply energy (do work) to the input of each stage in the value chain. The value chain model helps business focus on improving the individual steps, the lag time or latency within each step and between steps, and the wastage or costs incurred in each step. But business and the making of valuable products are not the only places where value chain thinking can be applied.
Modern military planners adapted the value chain concept as a way to focus on optimally achieving objectives in warfare. The kill chain is the set of activities that show, step by step, how one side in the conflict plans to achieve a particular military objective (usually a “kill” of a target, such as neutralizing the enemy’s air defense systems). The defender need not defeat every step in that kill chain—all they have to do is interrupt it enough to prevent the attacker from achieving their goals, when their plans require them to.
It’s often said that criminal hackers and cyber threat actors only have to be lucky once, in order to achieve their objectives, but that the cyber defender must be lucky every day to prevent all attacks. This is no doubt true if the defender’s OODA loops run slower than those of their attackers. As you’ll see, it takes more than just choosing and applying the right physical, logical, and administrative risk treatments or controls to achieve this.
Let’s start by taking apart our definition of risk mitigation (from Chapter 3), and see what it reveals in the day-to-day of business operations.
Risk mitigation is the process of implementing risk management decisions by carrying out actions that contain, transfer, reduce, or eliminate risk to levels the organization finds acceptable, which can include accepting a risk when it simply is not practical to do anything else about it.
Figure 4.2 shows the major steps in the risk mitigation process we’ll use here, which continues to put the language of NIST SP 800-37 and ISO 31000:2018 into more pragmatic terms. These steps are:
FIGURE 4.2 Risk mitigation major steps
The boundary between planning and doing, as we cross from Step 3 into Step 4, is the point where the SSCP helps the organization fit its needs for risk treatment and control into its no-doubt very constrained budget of people, money, resources, and time. In almost all circumstances, the SSCP will have to operate within real constraints. No perfect solution will exist; after all of your effort to put in place the best possible risk treatments and controls, there will be residual risk that the organization has by default chosen to accept. If you and your senior leaders have done your jobs well, that residual risk should be within the company’s risk tolerance. If it is not, that becomes the priority for the next round of risk mitigation planning!
Let's continue to peel back the onion of defense in depth, layer by layer, as we put information risk mitigation into action. We started with context and culture; now, we need to draw a key distinction between the organization's information architecture (how people share information to make decisions and carry them out) and the information technology architecture (the hardware, software, and communications tools) that supports that people-centric sharing of information and decisions.
Other chapters will look in greater technical and operational depth at specific layers of the information architecture or the technologies they depend on. In Chapter 5, “Communications and Network Security,” you’ll learn how the SSCP needs to address both the human and technological aspects of these important infrastructures. In Chapter 7, “Cryptography,” you’ll see how to apply and manage modern cryptographic techniques to almost every element of an information architecture. Chapter 8, “Hardware and Systems Security,” provides a closer look at systems security.
But before we get into the technological details, we first must map out the systems, processes, and information assets that are in use, right now, today, within our organization. All of those elements taken together are what we call the information architecture of the organization or business. Whether that architecture was well planned using the best design standards and templates, or it grew organically or haphazardly as users responded to changing needs and opportunities, is beside the point. The information architecture is what exists, and you as the SSCP must know it and understand it if you are to protect and preserve it. And if this statement holds for the information architecture, for that set of purposes, plans, ideas, and data, it holds doubly so for the underlying information technology architectures (note that many organizations don’t realize how many such architectures they really have!) that embody, support, and enable it.
The information architecture largely consists of the human and administrative processes that are the culture, context, process, and even the personality of the organization. You learned in Chapter 3 how vital it is to get this human-centric information architecture focused on the issue of information risk management. Now we need to consider how to take the results of that preparation activity, make them useful, and put them to use as we start developing risk mitigation plans.
No organization exists in a vacuum. It is a player in a marketplace; it is subject to written and unwritten norms and expectations that govern, shape, or constrain its actions and its choices. Laws and regulations also dictate what the organization can and cannot do, especially when it comes to how it keeps its information and decision systems safe, reliable, and secure. Laws and regulations may also require reporting or public disclosure of information, including information about information security incidents.
Organizational culture is the sum of all of the ways, written and unwritten, in which organizations make decisions and carry them out. Quite often, the organizational culture reflects the personalities and personal preferences of its founders, stakeholders, leaders, or key investors. Two key aspects of organizational culture that affect information security planning and operations are its willingness to accept or take risks and its need for control.
Being risk-averse or risk-tolerant is a measure of an appetite for risk, whether that risk is involved with trying something new or with dealing with vulnerabilities or threats. The higher the risk appetite, the more likely the organization’s decision makers are to accept risk or to accept higher levels of residual risk.
The need for control shows up in how organizations handle decision making. Hierarchically structured, top-down, tightly controlled organizations may insist that decisions be made “at the top” by senior leaders and managers, with rigidly enforced procedures dictating how each level of the organization carries out its parts of making those decisions happen. By contrast, many organizations rely on senior leaders to make strategic decisions, and then delegate authority and responsibility for tactical and operational decision making to those levels where it makes best sense. It is within the C-suite of officials (those with duty titles such as chief executive officer, chief financial or operations or human resources officer, or chief information officer) where critical decisions can and must be made if the organization is to attempt to manage information and information systems risk—let alone successfully mitigate those risks. The SSCP may advise those who advise the C-suite; more importantly, the SSCP will need to know what decisions were made and have some appreciation as to the logic, the criteria, and the assumptions that went into those decisions. Some of that may be documented in the BIA; some may not.
Let’s look at this topic by way of an example. Suppose you’re working for a manufacturing company that makes hydraulic actuators and mechanisms that other companies use to make their own products and systems. The company is organized along broad functional lines—manufacturing, sales and marketing, product development, purchasing, customer services, finance, and so on.
The company may be optimized along “just-in-time” lines so that purchasing doesn’t stockpile supplies and manufacturing doesn’t overproduce products in excess of reasonable customer demand forecasts. Nevertheless, asks the SSCP, should that mean that sales and marketing have information systems access to directly control how the assembly line equipment is manufacturing products today?
Let’s ask that question at the next level down—by looking at the information technologies that the organization depends on to make its products, sell them, and make a profit.
One approach might be that the company makes extensive use of computer-aided design and manufacturing systems and integrated planning and management tools to bring information together rapidly, accurately, and effectively. This approach can optimize day-to-day, near-term, and longer-term decision making, since it improves efficiency.
Another information architecture approach might rely more on departmental information systems that are not well integrated into an enterprise-level information architecture. In these situations, organizations must depend on their people as the “glueware” that binds the organization together.
Once again, the SSCP is confronted with needing insight and knowledge about what the organization does, how it does it, and why it does it that way—and yet much of that information is not written down. For many reasons, much of what organizations really do in the day-to-day of doing their business isn’t put into policies, procedures, or training manuals; it’s not built into the software that helps workers and managers get jobs done. This tacit, implied knowledge of who to go to and how to get things done can either make or break the SSCP’s information security plans and efforts. The SSCP is probably not going to be directly involved in what is sometimes called business process engineering, as the company tries to define (or redefine) its core processes. Nor will the SSCP necessarily become a knowledge engineer, who tries to get the tacit knowledge inside coworkers’ heads out and transform it into documents, procedures, or databases that others can use (and thereby transform it into explicit knowledge). It’s possible that the BIA provides the insights and details about major elements of the information technology architecture, in which case it provides a rich starting point to begin mitigation planning. Nonetheless, any efforts the company can make in these directions, to get what everybody knows actually written down into forms that are useful, survivable, and repeatable, will have a significant payoff. Such process maturity efforts can often provide the jumping-off point for innovation and growth. It’s also a lot easier to do process vulnerability assessments on explicit process knowledge than it is to do them when that knowledge resides only inside someone’s mind (or muscle memory).
With that “health warning” in mind, let’s take a closer look at what the organization uses to get its jobs done. In many respects, the SSCP will need to reverse engineer how the organization does what it does—which, come to think of it, is exactly what threat actors will do as they try to discover exploitable vulnerabilities that can lead to opportunities to further their objectives, and then work their way up to choosing specific attack tools and techniques.
The information technology architecture of an organization is more than just the sum of the computers, networks, and communications systems that the business owns, leases, or uses. The IT architecture is first and foremost a plan—a strategic and tactical plan—that defines how the organization will translate needs into capabilities; capabilities into hardware, software, systems, and data; and then manage how to deliver, support, and secure those systems and technologies. Without that plan—and without the commitment of senior leadership to keep that plan up to date and well supported—the collection of systems, networks, and data is perhaps little more than a hobby shop of individually good choices that more or less work together.
The development of an IT architecture (as a plan and as a system of systems) is beyond the scope of what SSCPs will need to know; there are, however, a few items in many IT architectures and environments that are worthy of special attention from the SSCP.
One good way of understanding what the organization’s real IT architecture is would be to do a special kind of inventory of all the hardware, software, data, and communications elements of that architecture, paying attention to how all those elements interact with one another and the business processes that they support. Such an information technology baseline provides the foundation for the information security baseline—in which the organization documents its information security risks, its chosen mitigation approaches, and its decisions about residual risk. The good news is that many software tools can help the SSCP discover, identify, and validate the overall shape and elements of the information technology baseline, and from that start to derive the skeleton of your information security baseline. The bad news? There will no doubt be lots of elements of both baselines that you will have to discover the old-fashioned way: walk around, look everywhere, talk with people, take notes, and ask questions.
Think back to the security baseline we defined in Chapter 3, which was a living repository of information asset classification, categorization, compliance, and controls information. Whether that baseline is expanded to include systems implementation and procedural details, as in the previous paragraph, or those details are kept separately but linked to the security baseline, is something that organizations should consider.
Key elements of the IT architecture that this baseline inventory should address would include:
Whether the SSCP is building the organization’s first-ever IT architecture baseline or updating a well-established one, the key behavior that leads to success is asking questions. What is it? Where is it? Why is it here (which asks, “How does it support which business process”)? Who is responsible for it? Who uses it? What does it connect to? When is it used? Who built it? Who maintains it? What happens when it breaks?
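One practical way to capture the answers to those questions is as a structured record per asset. The sketch below is only an illustration; the field names simply restate the questions above, and the example entry is invented, so adapt it to whatever asset-management or configuration-management tooling your organization already uses.

```python
# A minimal sketch of one entry in an IT architecture baseline inventory.
# Field names restate the questions in the text; values are invented examples.

from dataclasses import dataclass

@dataclass
class BaselineAsset:
    name: str                  # What is it?
    location: str              # Where is it?
    business_process: str      # Why is it here / which process does it support?
    owner: str                 # Who is responsible for it?
    users: list[str]           # Who uses it?
    connects_to: list[str]     # What does it connect to?
    maintainer: str            # Who built it / who maintains it?
    failure_impact: str        # What happens when it breaks?

orders_db = BaselineAsset(
    name="Customer orders database",
    location="Hosted server, primary data center",
    business_process="Order entry and fulfillment",
    owner="Director of e-commerce",
    users=["order entry web app", "fulfillment team"],
    connects_to=["web storefront", "warehouse management system"],
    maintainer="IT operations",
    failure_impact="Orders cannot be taken or shipped until restored",
)
print(orders_db.name, "->", orders_db.business_process)
```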
Let’s take a closer look at some of the special cases you may encounter as you build or update your organization’s IT architectural baseline.
Many organizations consider all of their IT assets, systems, software, and tools to be part of one large system, regardless of whether they are all plugged in together into one big system or network. Other organizations will have their IT systems reflect the way that work groups, departments, and divisions interact with one another. How the organization manages that system of systems often reflects organizational culture, decision-making styles, and control needs. Sadly, many organizational systems just grow organically, changing to meet the needs, whims, and preferences of individual users, departments, and stakeholders.
Let’s look at two classes of systems that might pose specific information risks:
Chapter 5 will address the key technical concepts and protocols involved with modern computer and communications networks. At this point, the key concept that SSCPs should keep in mind is that networks exist because they allow one computer, at one location, to deliver some kind of service to users at other locations. Those users may be people, software tasks running on other computers, or a combination of both people and software. Collectively, we refer to anything requesting a service from or access to an information asset as a subject; the service or asset they are requesting we call an object.
This idea or model of subjects requesting services is fundamental to all aspects of modern information technology systems—even standalone computers that support only a single person’s needs make use of this model. Once an organization is providing services over a network, the problem of knowing who is requesting what be done, when and how, and with what information assets, becomes quite complicated.
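In code terms, the model boils down to checking whether a given subject may perform a given action on a given object. Here is a deliberately bare-bones sketch; the subjects, objects, and permission table are invented for illustration, and real systems delegate this decision to the operating system, directory service, or application access controls.

```python
# A bare-bones sketch of the subject/object model: every request pairs a
# subject (person or process) with an object (service or information asset),
# and the system decides whether that pairing is allowed for that action.

PERMISSIONS = {
    ("payroll_clerk", "payroll_records"): {"read", "update"},
    ("backup_service", "payroll_records"): {"read"},
}

def is_allowed(subject: str, obj: str, action: str) -> bool:
    return action in PERMISSIONS.get((subject, obj), set())

print(is_allowed("payroll_clerk", "payroll_records", "update"))   # True
print(is_allowed("backup_service", "payroll_records", "update"))  # False
```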
End users need to get work done to satisfy the needs of the organization. End users rely on services that are provided and supported by the IT architecture; that architecture is made up of service providers, who rely on services provided by other levels of providers, and on it goes. (“It’s services all the way down,” you might say.) Over time, many different business models have been developed and put into practice to make the best use of money, time, people, and technology when meeting all of these service needs.
The IT architecture baseline needs to identify any external person, agency, company, or organization that fulfills a service provider role. Most of these external relationships should have written agreements in place that specify responsibilities, quality of service, costs, billing, and support for problem identification, investigation, and resolution. These agreements, whether by contract, memoranda of understanding, or other legal forms, should also lay out each party’s specific information security responsibilities. In short, every external provider’s CIA roles and responsibilities should be spelled out, in writing, so that the SSCP can include them in the security baseline, monitor their performance and delivery of services, and audit for successful implementation and compliance with those responsibilities.
"Doing it in the cloud" is the most recent revolution in information technology, and like many such revolutions, it's about ideas and approaches as much as it is about technologies and choices. At the risk of oversimplifying this important and complex topic just now, let's consider that everything we do with information technology is about getting services performed. Furthermore, all services involve using software (which runs on some hardware, somewhere) to make other software (running on some other hardware, perhaps) do what we need done and give us back the results we need. We'll call this the service provision model, and it is at the heart of how everything we are accustomed to when we use the Web actually works. Let's operationalize that model to see how it separates what the end user cares about from the mere details of making the service happen.
Users care most about the CIA aspects of the service, seen strictly from their own point of view:
By contrast, the service provider has to care about CIA from both its users’ perspective and its own internal needs:
This brings us to consider the “as-a-service” set of buzzwords, which we can think of in increasing order of how much business logic they implement or provide to the organization:
We’ll examine this topic in more detail in Chapter 9, “Applications, Data, and Cloud Security”; for right now, it’s important to remember that ultimately, the responsibilities of due care and due diligence always remain with the owners, managing directors, or chief executives of the organization or business. For the purposes of building and updating the IT architecture baseline, it’s important to be able to identify and specify where, when, how, and by whom these buzzwords are implemented in existing service provider relationships.
With the baseline in hand, the SSCP is ready to start looking at vulnerability assessments.
We’ve looked at how badly it will hurt when things go wrong; now, let’s look at how things go wrong.
The IT architecture baseline links IT systems elements to business processes; the planning we’ve done so far then links key business processes to prioritized business goals and objectives. That linking of priorities to architectural elements helps the SSCP focus on which information assets need to be looked at first to discover or infer what possible vulnerabilities may be lurking inside. It’s time for you as the SSCP to use your technical insight and security savvy to look at systems and ask, “How do these things usually fail? And then what happens?”
“How do things fail?” should be asked at two levels: how does the business process fail, and how does the underlying IT element or information asset fail to support the business process, and thus cause that business process to fail?
This phase of risk mitigation is very much like living the part of a detective in a whodunit. The SSCP will need to interview people who operate the business process, as well as the people who provide input to it and depend on its outputs to do their own jobs. Examining any trouble reports (such as IT help ticket logs) may be revealing. Customer service records may also start to show some patterns or relationships—broken or failing processes often generate customer problems, and out-of-the-ordinary customer service needs often stress processes to the breaking point.
It’s all about finding the cause-and-effect logic underneath the “what could go wrong” parts of our systems. Recall from Chapter 3 our discussion of proximate cause and root cause:
Both are valuable ideas to keep in mind as we look through our systems for “Where can it break and why?” While you’re doing that, it’s also important to keep asking “how would we know this has failed?” In this way you combine your search for possible vulnerabilities with identifying candidate indicators of compromise (IOCs).
Think about the chain of events that can lead from proximate to root cause—from the first stone shaken loose on the mountaintop to the avalanche it triggers. Each of those events in a failure of your systems gives off signals; some of those signals may be clear enough to be useful as warning flags of an imminent or ongoing compromise, intrusion, attack, or failure of a security system. These are the IOCs you need to pay prompt attention to. These become the alarms that trigger your response systems and people into action.
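As a simple illustration of turning IOCs into alarms, the sketch below scans a handful of event records for known indicator patterns and flags any matches for prompt attention. The indicator names and events are invented, and a real deployment would rely on your SIEM or log analysis tooling rather than hand-rolled code.

```python
# A toy sketch of matching event records against candidate indicators of
# compromise (IOCs) and raising alarms. All names here are invented examples.

IOCS = {
    "logins_from_new_country": "possible credential theft",
    "outbound_transfer_after_hours": "possible data exfiltration",
}

def check_events(events):
    alarms = []
    for event in events:
        for ioc, meaning in IOCS.items():
            if ioc in event["tags"]:
                alarms.append((event["source"], ioc, meaning))
    return alarms

events = [
    {"source": "vpn-gateway", "tags": ["logins_from_new_country"]},
    {"source": "file-server", "tags": ["routine_backup"]},
]
for source, ioc, meaning in check_events(events):
    print(f"ALERT {source}: {ioc} ({meaning})")
```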
In many respects, vulnerability assessment is looking at the as-built set of systems, processes, and data and discovering where and how quality design was not built into them in the first place! As a result, a lot of the same tools and processes we can use to verify correct design and implementation can help us identify possible vulnerabilities:
Even the company suggestion box should be examined for possible signs that particular business processes don’t quite work right or are in need of help.
That’s a lot of information sources to consider. You can see why the SSCP needs to use prioritized business processes as the starting point. A good understanding of the information architecture and the IT architectures it depends on may reveal some critical paths—sets of processes, software tools, or data elements that support many high-priority business processes. The software, procedural, data, administrative, and physical assets that are on those critical paths are excellent places to look more deeply for evidence of possible vulnerabilities.
As you saw in Chapter 2, you and your organization are not alone in the effort to keep your information systems safe, secure, resilient, and reliable. There are any number of communities of practice with which you can share experience, insight, and knowledge:
https://cybersecurity.ieee.org/center-for-secure-design/
for ideas and information that might help your business or organization. You also have resources such as Mitre's Common Vulnerabilities and Exposures (CVE) system and NIST's National Vulnerability Database that you can draw upon as you assess the vulnerabilities in your organization's systems and processes. Many of these make use of the Common Vulnerability Scoring System (CVSS), which is an open industry standard for assessing a wide variety of vulnerabilities in information and communications systems. CVSS makes use of the CIA triad of security needs (introduced in Chapter 1, "The Business Case for Decision Assurance and Information Security") by providing guidelines for making quantitative assessments of a particular vulnerability's overall score. Scores run from 0 to 10, with 10 being the most severe of the CVSS scores. Although the details are beyond the scope of the SSCP exam, it's good to be familiar with the approach CVSS uses—you may find it useful in planning and conducting your own vulnerability assessments.
As you can see at https://nvd.nist.gov/vuln-metrics/cvss, CVSS consists of three areas of concern:
Each of these uses a simple scoring process—impact assessment, for example, defines four values from Low to High (and "not applicable or not defined"). Using CVSS is largely a matter of making these assessments and then combining the resulting values according to its published scoring equations.
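To make that concrete, here is a rough sketch of a CVSS v3.1 base-score calculation for the simplest case (Scope: Unchanged), using the metric weights and equations published in the open CVSS v3.1 specification. Treat it purely as an illustration, and rely on the current specification or the NVD calculator for real assessments.

```python
# Sketch of a CVSS v3.1 base score, Scope: Unchanged only, per the public spec.

import math

AV = {"network": 0.85, "adjacent": 0.62, "local": 0.55, "physical": 0.2}
AC = {"low": 0.77, "high": 0.44}
PR = {"none": 0.85, "low": 0.62, "high": 0.27}   # values for unchanged scope
UI = {"none": 0.85, "required": 0.62}
CIA = {"high": 0.56, "low": 0.22, "none": 0.0}

def roundup(x):
    # CVSS "roundup": ceiling to one decimal place
    return math.ceil(x * 10) / 10

def base_score(av, ac, pr, ui, c, i, a):
    iss = 1 - (1 - CIA[c]) * (1 - CIA[i]) * (1 - CIA[a])
    impact = 6.42 * iss
    exploitability = 8.22 * AV[av] * AC[ac] * PR[pr] * UI[ui]
    return 0.0 if impact <= 0 else roundup(min(impact + exploitability, 10))

# Example: network-reachable, low-complexity flaw needing no privileges or user
# interaction, with a high confidentiality impact only.
print(base_score("network", "low", "none", "none", "high", "none", "none"))  # 7.5
```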
Note that during reconnaissance, hostile threat actors use CVE and CVSS information to help them find, characterize, and then plan their attacks. The benefits we gain as a community of practice by sharing such information outweigh the risks that threat actors can be successful in exploiting it against our systems if we do the rest of our jobs with due care and due diligence.
As you can see, the risk and vulnerability assessment process is an example of knowledge engineering or knowledge discovery in action: you and your associates are mining every source of information available to you to learn how your IT and OT systems can be broken, and what kind of loss or impact might result. Many organizations use a risk register as a central repository of all of this data, information, and knowledge. It is (or should be) a living repository, not a static document that's produced as a report and then left to go out of date. Risk registers can take on many forms, and do need to be tailored to meet the organization's needs. More powerful security systems such as SIEM (security information and event management) and SOAR (security orchestration and automated response), as well as managed security services, usually provide the capabilities you need to build a risk register to suit your purposes.
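What a risk register entry holds will vary by organization and by tool, but a minimal sketch might look like the following; every field name and value here is illustrative only.

```python
# A minimal sketch of one risk register entry. Real registers (spreadsheets,
# GRC tools, SIEM/SOAR add-ons) will carry more fields than shown here.

from dataclasses import dataclass

@dataclass
class RiskRegisterEntry:
    risk_id: str
    description: str
    affected_process: str
    likelihood: str          # qualitative here: low / medium / high
    impact: str              # qualitative here: low / medium / high
    treatment: str           # accept, transfer, remediate, avoid...
    owner: str
    review_date: str

entry = RiskRegisterEntry(
    risk_id="R-0042",
    description="Unpatched remote-access gateway exposed to the Internet",
    affected_process="Remote access for field sales staff",
    likelihood="high",
    impact="high",
    treatment="remediate: apply vendor patch; add monitoring rule",
    owner="Network operations manager",
    review_date="next quarterly risk review",
)
print(entry.risk_id, "-", entry.treatment)
```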
If you picture a diagram of your information architecture (or IT architecture), you’ll notice that you probably can draw boundaries around groups of functions based on the levels of trust you must require all people and processes to have in order to cross that boundary and interact with the components inside that space. The finance office, for example, handles all employee payroll information, company accounting, and accounts payable and receivable, and would no doubt be the place you’d expect to have access to the company’s banking information. That imaginary line that separates “here there be finance office functions” from the larger world is the threat surface—a boundary that threats (natural, accidental, or deliberate) have to cross in order to access the “finance-private” information inside the threat surface. The threat surface is the sum total of all the ways that a threat can cross the boundary:
You see the dilemma here: authorized users and uses cross the threat surface all the time, and in fact, you cannot achieve the “A” in CIA without providing that right of way. Yet the threat actors need to be detected when they try to cross the threat surface and prevented from getting across it—and if prevention fails, you need to limit how much damage they can do.
Threat modeling is the broad, general term given to the art and science of looking at systems and business processes in this way. It brings a few thoughts into harmony with one another in ways that the SSCP should be aware of. First, it encourages you to encapsulate complex functions inside a particular domain, boundary, or threat surface. In doing so, it also dictates that you look to minimize ways that anything can cross a threat surface. It then focuses your attention on how you can detect attempts to cross, validate the ones that are authenticated and authorized, and prevent the ones that aren't. Threat modeling also encourages you to account for such attempts and to use that accounting data (all of those log files and alarms!) both in real-time alert notification and incident response, and as a source of analytical insight.
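A simple way to operationalize that thinking is to enumerate every crossing point on a threat surface and check that each one is authenticated and logged, flagging any that are not for review. The crossing points in this sketch are invented examples.

```python
# A toy sketch of threat modeling a boundary: list each way across the threat
# surface and verify that each crossing is authenticated and logged.

finance_office_surface = [
    {"crossing": "badge-controlled office door", "authenticated": True,  "logged": True},
    {"crossing": "accounts-payable web portal",  "authenticated": True,  "logged": True},
    {"crossing": "shared printer on open VLAN",  "authenticated": False, "logged": False},
]

for point in finance_office_surface:
    gaps = [need for need in ("authenticated", "logged") if not point[need]]
    if gaps:
        print(f"Review needed: {point['crossing']} is missing {', '.join(gaps)}")
```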
As you grow as an SSCP, you’ll need to become increasingly proficient in seeing things at the threat surface.
“Trust, but verify” applies to the human element of your organization’s information processes too! You need to remember that every organization, large or small, can fall afoul of the disgruntled employee, the less-than-honorable vendor or services provider, or even the well-intended know-it-all on its staff who thinks that they don’t need to follow all of those processes and procedures that the rest of the team needs. The details of how such a personnel reliability program should be set up and operated are beyond the scope of the SSCP exam or this book. Part of this is what information security practitioners call the “identity and access control problem,” and Chapter 6 will delve into this in greater depth. From a vulnerability assessment perspective, a few key points are worth highlighting now.
The information security impact assessment is the starting point (as it is for all vulnerability assessments). It should drive the design of jobs so that users do not have capabilities or access to information beyond what they really need to have and use. In doing so, it also indicates the trustworthiness required for each of those jobs; a scheduling clerk, for example, would not have access to company proprietary design information or customer financial data, and so may not need to be as trustworthy as the firm’s intellectual property lawyers or its accountants. With the job defining the need for capabilities and information, the processes designed for each job should have features that enforce these constraints and notify information security officials when attempts to breach those constraints occur. The log files, alerts, and alarms or other outputs that capture these violations must be inspected, analyzed, and assessed in ways that give timely opportunity for a potential security breach (deliberate or accidental) to be identified and corrected before it is harmfully exploited.
Beyond (but hand in hand with) separation of duties, the business process owners and designers must ensure that no task is asking more of any system component—especially the human one—than it can actually be successful with. As with any computer-based part of your business logic, tasks that systems designers allocate to humans to perform must be something humans can learn how to do. The preconditions for the task, including the human's training, prior knowledge, and experience, must be identified and achievable. Any required tools (be they hammers or database queries) must be available; business logic for handling exceptions, out-of-limits conditions, or special needs has to be defined and people trained in its use. Finally, the saying "if you can't measure it, you can't manage it" applies to assessing the reliability of the human component as much as it does to the software, systems, and other components of a business process.
This combination of ingredients—separation of duties, proper task design, meaningful performance monitoring and assessment, and ongoing monitoring to detect errors or security concerns—reduces the risks that an employee is overstressed, feels that they are undervalued, or is capable of taking hostile action if motivated to do so.
Chapter 9 will address other aspects of how the human resources the organization depends on can be more active and effective elements in keeping the organization’s information safe, secure, and resilient.
Even the most well-designed information system will have gaps—places where the functions performed by one element of the system do not quite meet the expectations or needs of the next element in line in a process chain. When we consider just how many varied requirements we place on modern IT systems, it's no wonder that such gaps creep in! In general terms, gap analysis is a structured, organized way to find these gaps. In the context of information systems security, you do gap analysis as part of vulnerability assessment.
Several different kinds of activities can generate data and insight that feed into a gap analysis:
This last brings up an interesting point about the human element: as any espionage agency knows, it’s quite often the lowest-level employees in the target organization who possess the most valuable insight into its vulnerabilities. Ask the janitors, or the buildings and grounds maintenance staff; talk with the cafeteria workers or other support staff who would have no official duties directly involved in the systems you’re doing the gap analysis for. Who knows what you may find out?
A strong word of caution is called for: the results of your gap analysis could be the most sensitive information that exists in the company! Taken together, it is a blueprint for attack—it makes targets of opportunity easily visible and may even provide a step-by-step pathway through your defenses. You’d be well advised to gain leadership’s and management’s agreement to the confidentiality, integrity, and availability needs of the gap analysis findings before you have to protect them.
We’ve mentioned before that the SSCP needs to help the organization find cost-effective solutions to its risk mitigation needs. Here’s where that happens. Let’s look at our terms more closely first.
Risk treatment involves all aspects of taking an identified risk and applying a set of chosen methods to eliminate or reduce the likelihood of its occurrence, the impacts it has on the organization when (not if) it occurs, or both. Note that we say “eliminate or reduce,” both for probability of occurrence and for the impact aspects of a given risk. The set of methods taken together constitute the risk controls that we are applying to that particular risk.
Unfortunately, the language about dealing with risks is not very precise. Many different books, official publications, and even widely accepted risk management frameworks like NIST SP 800-37 can leave some confusion. Let’s see if some simple language can help un-muddy these waters:
Chapter 3 introduced the need for risk managers to decide whether a particular risk should be deterred, prevented, detected, avoided, or treated. That’s not a one-time decision made only at strategic levels. As risk assessment looks more closely at each risk, those same types of decisions get made again.
When thinking about risk treatment at a more tactical or operational level, security and risk professionals often talk about whether to accept, transfer (or share), mitigate (or treat), and avoid or eliminate. These similar lists of terms, used at different levels of decision making, highlight that the same concept (deterring a risk, for example) may be a strategic choice first, followed later on by more granular choices about how deterrence as a security control approach is put to use to achieve that objective.
With that in mind, let’s take a closer look at the broad categories of risk treatment strategies, tactics, and techniques.
This risk treatment strategy means that you simply decide to do nothing about the risk. You recognize it is there, but you make a conscious decision to do nothing differently to reduce the likelihood of occurrence or the prospects of negative impact. This is known as being self-insuring—you assume that what you save on paying risk treatment costs (or insurance premiums) will exceed the annual loss expectancy over the number of years you choose to self-insure or accept this risk.
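A quick, invented numerical illustration shows the self-insurance logic: compare the annualized loss expectancy (the standard formulation of single loss expectancy times annualized rate of occurrence) against the yearly cost of treating the risk.

```python
# An illustrative self-insurance comparison using invented numbers.
# Annualized loss expectancy (ALE) = single loss expectancy (SLE) x
# annualized rate of occurrence (ARO).

sle = 40_000          # estimated cost of one occurrence of the loss
aro = 0.1             # expected occurrences per year (one every ten years)
ale = sle * aro       # 4,000 per year of expected loss if we do nothing

annual_control_cost = 15_000   # yearly cost of the proposed risk treatment

if annual_control_cost > ale:
    print(f"Accepting the risk saves {annual_control_cost - ale:,.0f} per year on average")
else:
    print(f"Treating the risk is cheaper by {ale - annual_control_cost:,.0f} per year on average")
```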
The vast majority of vulnerabilities in the business processes and context of a typical organization involve negligible damages, very low probabilities of occurrence, or both. As a result, it’s just not prudent to spend money, time, and effort to do anything about such risks. In some cases, however, the vulnerabilities can be extensive and the potential loss significant, even catastrophic, to the organization, but the costs involved to deal with the risk by means of mitigation or transfer are simply unachievable.
Another, more practical example can be found in many international business situations. Suppose your company chooses to open wholesale supply operations in an area where the telecommunications and transportation infrastructures can be unreliable. When these infrastructures deliver the services you need, your organization makes a profit and earns political and community support as nontangible rewards. That reliable delivery doesn’t happen all of the time, however. You simply cannot spend the money to install and operate your own alternative infrastructures. Even if you could afford to do it, you might risk alienating the local infrastructure operators and the larger political community, and you need all the goodwill from these people that you can get! As a result, you just decide to accept the risk.
Note that accepting a risk is not taking a gamble or betting that the risks won’t ever materialize. That would be ignoring the risk. A simple example of this is the risk of having your business (or your hometown!) completely destroyed by a meteor falling in from outer space. We know it could happen; we’ve even had some spectacular near misses in recent years, such as what happened over Chelyabinsk, Russia in February 2013. The vast majority of us simply choose to ignore this risk, believing it to be of vanishingly small probability of occurrence. We do not gather any data; we do not estimate probabilities or losses; we don’t even make a qualitative assessment about it. We simply ignore it, relegate it to the realm of big-box-office science fiction thrillers, and go on with our lives with nary another thought about it.
Proper risk acceptance is an informed decision by organizational leaders and stakeholders.
Transferring or sharing a risk means that rather than spend our own money, time, and effort to reduce, contain, or eliminate the risk, we assign responsibility for some or all of it to someone else. For example:
Other ways of transferring risk might involve taking the process itself (the one that could incur the risk) and transferring it to others to perform as a service. Pizza tonight? Carry-out pizza incurs the risk that you might get into an accident while driving to or from the pizza parlor, but having the pizza delivered transfers that risk of accident (and injury or damage) to the pizza delivery service.
In almost all cases, transferring a risk is about transforming the risk into something somebody else can deal with for you. You save the money, time, and effort you might have spent to treat the risk yourself and instead pay others to assume the risk and deal with it.
There is a real moral hazard in some forms of risk transference, and the SSCP should be on alert for these. Suppose your company says that it doesn't need to spend a lot of money dealing with information security, because it has a really effective liability insurance plan that covers it against losses. If thousands (or millions!) of customers' personally identifying information is stolen by a hacker, this insurance policy may very well pay for losses that the company incurs; the customers would need to sue the company or otherwise file a claim against it to recover their direct losses from having their identity compromised or stolen. The insurance may pay all of those claims or only a portion of them, but only after each customer discovers the extent of the damages they've suffered, goes through the turmoil, effort, and expense of repairing those losses, and then files a claim with the company. Perhaps the better, more ethical (and usually far less costly!) solution would have been to find and fix the vulnerabilities that could be exploited in ways that lead to such a data breach in the first place.
Simply put, this means that we find and fix the vulnerabilities to the best degree that we can; failing that, we put in place other processes that shield, protect, augment, or bridge around the vulnerabilities. Most of the time this is remedial action—we are repairing something that either wore out during normal use or was not designed and built to be used the way we’ve been using it. We are applying a remedy, a cure, either total or partial, for something that went wrong.
Do not confuse taking remedial action to mitigate or treat a risk with making the repairs to a failed system itself. Mitigating the risk is something you aim to do before a failure occurs, not after! Such remediation measures might therefore include the following:
Some vulnerabilities are best mitigated or treated by applying the right corrective fix—for example, by updating a software package to the latest revision level so that you are reasonably assured that it now has all the right security features and fixes included in it. Providing uninterruptible power supplies or power conditioning equipment may eliminate or greatly reduce the intermittent outages that plague some network, communications, and computing systems. The first (applying the software update) might be directly treating the vulnerability (by replacing a faulty algorithm with a more robustly designed one); providing power conditioning equipment is making up for shortcomings in the quality and reliability of the commercial power system and is a good example of bridging around or augmenting a known weakness.
The logical opposite of accepting a risk is to make the informed decision to stop doing business in ways or in places that expose you to that risk. Closing a store in a neighborhood with a high crime rate eliminates the exposure to risk (a store you no longer operate cannot be robbed, and your staff who no longer work there are no longer at risk of physical assault during such a robbery).
You avoid a risk either by eliminating the activity that incurs the risk or moving the affected assets or processes to locations or facilities where they are not exposed to the risk. Suppose you work for a small manufacturing company in which the factory floor has some processing steps that could cause fire, toxic smoke, and so forth to spread rapidly through the building. The finance office probably does not need to be in this building—avoid the risks to your accountants, and avoid the possible financial disruption of your business, by moving those functions and those people to another building. Yet the safety systems that are part of your manufacturing facility probably can’t be moved away from the equipment they monitor and the people they protect; at some point, the business may have to decide that the risk of injury, death, destruction, and litigation just aren’t worth the profits from the business in the long run.
This term refers to the never-ending effort to identify risks, characterize them, select the most important ones to mitigate, and then deal with what’s left. As we’ve said before, most risk treatments won’t deal with 100% of a given risk; there will be some residual risk left over. Recasting the risk usually requires that first you clearly state what the new residual risk is, making it more clearly address what still needs to be dealt with. From the standpoint of the BIA, the original risk has been reduced—its nature, frequency, impact, and severity have been recast or need to be described anew so that future cycles of risk management and mitigation can take the new version of the risk into consideration.
This has been defined as the risk that’s left over, unmitigated, after you have applied a selected risk treatment or control. Let’s look at this more closely via the following example.
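Since numbers make the idea clearer, here is an invented illustration, not drawn from any real organization: suppose a risk carries an annualized loss expectancy of $250,000 and the chosen control is expected to eliminate about 80 percent of that exposure.

```python
# An invented worked example of residual risk: the portion of the original
# annualized loss expectancy that remains after a control takes effect.

ale_before = 250_000        # expected yearly loss with no treatment applied
control_effectiveness = 0.8 # fraction of that loss the chosen control removes

ale_after = ale_before * (1 - control_effectiveness)
print(f"Residual annualized loss expectancy: {ale_after:,.0f}")  # 50,000

# If 50,000 per year is still above the organization's risk tolerance, that
# residual risk becomes a priority for the next round of mitigation planning.
```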
Once again, you see the trio of physical, logical (also called technical), and administrative (PLA) actions as possible controls you can apply to a given risk or set of risks. These controls are often put to best use in combinations that reflect some fundamental security architectural concepts: least privilege, need to know, and separation of duties being among the most effective and most commonly used approaches. You'll see in Chapter 6 that this same set of concepts has important roles to play as you strive to ensure that only authenticated users are authorized to take actions with your information systems. In that respect, a physical access control, such as a locked door that requires multifactor authentication to be verified before permitting entry, is also a physical risk control.
Physical controls are combinations of hardware, software, electrical, and electronic mechanisms that, taken together, prevent, delay, or deter somebody or something from physically crossing the threat surface around a set of system components you need to protect. They do this by guiding, directing, and controlling the movement of physical items (people, machines, vehicles, containers, or property) both across the outermost perimeter of a facility or location and within it. Large-scale architectural features, such as the design of buildings, their location in an overall facility, surrounding roads, driveways, fences, perimeter lighting, and so forth, are visible, real, and largely static elements of physical control systems. You must also consider where within the building to put high-value assets, such as server rooms, wiring closets, network and communication provider points of presence, routers and Wi-Fi hotspots, library and file rooms, and so on. Layers of physical control barriers, suitably equipped with detection and control systems, can both detect unauthorized access attempts and block their further progress into your safe spaces within the threat surface.
Network and communications wiring, cables, and fibers are also physical system components that need some degree of physical protection. Some organizations require them to be run through steel pipes that are installed in such a way as to make it impractical or nearly impossible to uncouple a section of pipe to surreptitiously tap into the cables or fibers. Segmenting communications, network, and even power distribution systems also provides a physical degree of isolation and redundancy, which may be important to an organization’s CIANA+PS needs.
Note the important link here to other kinds of controls. Physical locks require physical keys, or actuators that are controlled by information systems; multifactor authentication requires logical and physical systems; both require “people power” to create and then run the policies and procedures (the administrative controls) that glue it all together, and keep all of the parts safe, secure, and yet available when needed.
Here is where you use software and the parameter files or databases that direct that software to implement and enforce policies and procedures that you've administratively decided are important and necessary. It can be a bit confusing that a "policy" can be both a human-facing set of rules, guidelines, and instructions and a set of software features and their control settings. Many modern operating systems, and identity-as-a-service provisioning systems, refer to these internal implementations of rules and features as policy objects, for example. So we write our administrative "acceptable use" policy document and use it to train our users so that they know what is proper and what is not; our systems administrators then "teach" it to the operating system by setting parameters and invoking features that implement the software side of that human-facing policy.
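To see how a human-facing policy becomes a machine-enforced one, consider this minimal, purely illustrative sketch in Python. The field names, thresholds, and check function are our own assumptions for the example; they are not any particular operating system's or identity service's policy-object format.

```python
from dataclasses import dataclass

@dataclass
class PasswordPolicy:
    """Machine-readable settings derived from the written acceptable-use policy."""
    min_length: int = 12
    require_mixed_case: bool = True
    require_digit: bool = True
    max_failed_logons: int = 5   # lockout threshold the administrators configure

def complies(candidate: str, policy: PasswordPolicy) -> bool:
    """Return True if a proposed password satisfies the administrative policy."""
    if len(candidate) < policy.min_length:
        return False
    if policy.require_mixed_case and not (any(c.islower() for c in candidate)
                                          and any(c.isupper() for c in candidate)):
        return False
    if policy.require_digit and not any(c.isdigit() for c in candidate):
        return False
    return True

# The written policy says "use strong passwords"; the logical control enforces it.
print(complies("Winter2025rains", PasswordPolicy()))   # True
print(complies("short", PasswordPolicy()))              # False
```

The written document and the policy object must stay in step: when leadership changes the administrative rule, someone has to change the settings that enforce it.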
In general terms, anything that human organizations write, state, say, or imply that dictates how the humans in that organization should do business (and also what they should not do) can be considered an administrative control. Policy documents, procedures, process instructions, training materials, and many other forms of information all are intended to guide, inform, shape, and control the way that people act on the job (and to some extent, too, how they behave off the job).
Administrative controls are typically the easiest to create—but because they often require the sign-off of very senior leadership, they can, ironically, be the most difficult to update in some organizational cultures. It usually takes a strong sense of the underlying business logic to create good administrative controls.
Administrative controls can cover a wide range of intentions, from informing people about news and useful information, to offering advice, and from defining the recommended process or procedure to dictating the one accepted way of doing a task or achieving an objective.
For any particular risk mitigation need, an organization may face a bewildering variety of competing alternative solutions, methods, and choices. Do we build the new software fix in house or get a vendor to provide it? Is there a turn-key hardware/software system that will address a lot of our needs, or are we better off doing it internally one risk at a time? What’s the right mix of physical, logical, and administrative controls to apply?
It’s beyond the scope of this book, and the SSCP exam, to get into the fine-grain detail of how to compare and contrast different risk mitigation control technologies, produces, systems, or approaches. The technologies, too, are constantly changing. As you gain more experience as an SSCP, you’ll have the opportunity to become more involved in specifying, selecting, and implementing risk mitigation controls.
Controls, also called countermeasures, are the active steps we take to put technologies, features, and procedures in place to help prevent a vulnerability from being exploited and causing a harmful or disruptive impact. Controls can perform one or more functions, which we’ll express in their adjective form (such as reactive). The most common functions needed for controls include:
One further functional control type, compensating controls, is actually a set of three different types of functions, each of which in some way assists the actions of other controls in mitigating the potential impact of a vulnerability (or set of vulnerabilities) or recovering from those impacts. Generally, compensating controls are used:
We must remember that with each new control we install or each new countermeasure we adopt, we must also make it part of the command, control, and communications capabilities of our integrated information security and assurance systems. For example:
In many organizations, a spiral development process is used to manage risk mitigation efforts. A few high-priority risks are identified, and the systems that support them are examined for underlying vulnerabilities. Suitable risk mitigation controls are chosen and implemented; they are tested to ensure proper operation and correct results. End users are trained about the presence, purpose, and use of these controls, and the controls are declared operational. Then the next set of prioritized risks, and perhaps residual risks from the first set, is addressed in much the same way.
Note that even in this spiral or cyclic fashion, there really is a risk mitigation implementation plan! It may only exist as an agreed-to schedule by which the various builds or releases of risk mitigation controls will be specified, installed, tested, and made operational. The SSCP assists management by working to ensure that each increment of risk mitigation (each set of mitigation controls being installed, tested, and delivered to operational use) is logically consistent, that each control is installed correctly, and that users and security personnel know what to expect from it.
As with any implementation project, the choice to implement a particular set of risk mitigation controls should carry with it the documented need it is fulfilling. What is this new control required to actually do once we start using it? This statement of functional requirements forms the basis for verification and validation of our implementation, and it is also a basis for ongoing system security monitoring and assessment. The risk mitigation implementation plan should address these issues.
The implementation plan should also show how you’ll engage with the routine configuration management and change control processes that are used in the business. In many businesses and organizations, policies direct that changes to business processes, operational software, or security systems have to be formally requested and then reviewed by the right set of experts, who then recommend to a formal change control board that the request be approved. Configuration management board approval usually includes the implementation plan and schedule so that this change can be coordinated with other planned activities throughout the organization.
This step includes all activities to get the controls into day-to-day routine operational use. User training and awareness needs identified in the implementation plan must be met; users, security personnel, and the rest of the IT staff must be aware of the changes and how to deal with anything that seems strange in the “new normal” that the new controls bring with them. In most organizations, some level of senior leadership or management approval may be required to declare that the new controls are now part of the regular operational ways of doing business.
Detailed implementation of specific controls will be covered in subsequent chapters. For example, Chapter 5 will go into greater depth about technologies and techniques to use when securing voice, video, and public and internal social media, as well as how physical and logical segmentation of networks and systems should be achieved.
Keep in mind that “control” is just the middle element of command, control, and communications. The control devices or procedural elements have to communicate with the rest of the system so that we know what is going on. Some types of data that must be shared include but are not limited to:
All of those types of control data must be exchanged between systems elements, if the system is to accomplish its assigned tasks. Even systems that are purely people-powered exchange information as part of the protocols that bring those people together to form a team. (Think about a baseball game: the catcher signals to the pitcher, but the runner on second is trying to see the signals too, to see if now’s the time to attempt to steal third base.)
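As a purely illustrative sketch (the field names and states here are our own assumptions, not any standard message format), one element's status or "heartbeat" report to a central monitoring point might be packaged something like this:

```python
import json
import time

def build_heartbeat(element_id: str, state: str, alarms: list[str]) -> str:
    """Package one element's status, state, and health for the monitoring system."""
    message = {
        "element": element_id,      # which sensor, server, or control is reporting
        "timestamp": time.time(),   # when the report was generated
        "state": state,             # e.g., "operational", "degraded", "failed"
        "alarms": alarms,           # any alert-worthy conditions detected locally
    }
    return json.dumps(message)

# A door controller reports normal operation; an edge firewall reports trouble.
print(build_heartbeat("door-ctrl-07", "operational", []))
print(build_heartbeat("fw-edge-01", "degraded", ["rule update failed"]))
```

Whatever the actual format, the point is the same: if an element cannot report its state and health, the rest of the system is flying blind with respect to it.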
Recall that command is the process of deciding what to do and issuing directives or orders to get it done; control, on the other hand, takes commands and breaks them down into the step-by-step directions to work units, while it monitors those work units for their performance of the assigned task. All systems have some kind of command and control function, and the OODA loop model presented earlier in this chapter provides a great mental model of such control systems. Most human-built systems exist to get specific jobs done or needs met, but those systems also have to have internal control processes that keep the system operating smoothly, set off alarms when it cannot be operated safely, or initiate corrective actions if they can. We can think of command and control of systems as happening at three levels of abstraction: getting the job done, keeping the system working effectively, and keeping it safe from outside corruption, damage, or attack.
Industrial control systems give us a great opportunity to see the importance of effective command, control, and communications in action at the first two levels. Most industrial machinery is potentially dangerous to be around—if it moves the wrong way at the wrong time, things can get broken and people can be killed. Industrial control system designers and builders have wrestled with this problem for almost three centuries. Consider the systems that control an oil refinery or an electric power generating station: command systems translate current inputs (such as demands for electricity and price bids for its wholesale purchase) into production or systems throughput goals, and then further refine those into device-by-device, step-by-step manipulation of elements of the overall system. Most of the time, this is done by exchanging packets of parameter settings rather than specific device commands (such as "increase temperature to 450 degrees F" rather than "open up the gas valve some more"). Other control loops keep the system and its various subsystems operating within well-understood safety constraints. Supervisory Control and Data Acquisition (SCADA) systems are a special class of network and systems devices, data-sharing protocols, and command and control protocols used throughout the world for industrial process control. Much of this marketplace is dominated by special-purpose computers known as programmable logic controllers (PLCs), although Internet of Things devices and systems are becoming more commonplace in industrial control environments. NIST Special Publication 800-82 Rev. 2, Guide to Industrial Control Systems (ICS) Security, is an excellent starting point for SSCPs who need to know more about ICS security challenges and how they relate to information system risk management concepts in broader terms. It also maps ICS and SCADA security needs onto the security controls catalog of NIST SP 800-53 Rev. 4.
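The following sketch is only a thought experiment in Python, not real PLC or SCADA code; the setpoint, safety limit, and adjustment rule are assumptions chosen to show the idea of supervisory parameter-setting with a local safety constraint:

```python
# Supervisory layer sends a parameter (a temperature setpoint); the local control
# loop decides how to move the valve, bounded by an assumed engineering limit.

SAFETY_LIMIT_F = 500.0   # assumed safety constraint for this process unit

def apply_setpoint(requested_f: float, current_f: float, valve_position: float):
    """Clamp the requested setpoint to the safety limit, then nudge the valve."""
    setpoint = min(requested_f, SAFETY_LIMIT_F)
    # Simple proportional adjustment: open the valve if below setpoint, close it
    # if above. Real PLC control logic is far more sophisticated than this.
    error = setpoint - current_f
    valve_position = max(0.0, min(1.0, valve_position + 0.01 * error))
    return setpoint, valve_position

setpoint, valve = apply_setpoint(requested_f=450.0, current_f=430.0, valve_position=0.40)
print(f"setpoint={setpoint} valve={valve:.2f}")   # setpoint=450.0 valve=0.60
```

Notice that the supervisory message carries intent ("hold 450 degrees"), while the device-level details stay local—exactly the separation the paragraph above describes.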
Since the early 1990s, however, more and more industrial equipment operators and public utility organizations have had to deal with a third kind of command, control, and communications need: the need to keep their systems safe when faced with deliberate attacks directed at their SCADA or other command, control, and communications systems. It had become painfully clear that the vast majority of the lifeblood systems that keep a modern nation alive, safe, secure, well fed, and in business were hosted on systems owned and operated by private business, most of them using the Internet or the public switched telephone network (PSTN) as the backbone of their command, control, and communications system. In the United States, the President’s Commission on Critical Infrastructure Protection (PCCIP) was created by President Bill Clinton to take on the job of awakening the nation to the need for this third level of C3 systems—the ones that keep modern information-driven economies working correctly and safe from hostile attacks via those information infrastructures.
Security professionals and industrial systems engineers use the term operational technology (OT) as a collective way of referring to this fusion of information systems and physically sensing or changing the world around us. Many of the cyber attacks in 2019 through 2021 demonstrated that there was no air gap between the IT and SCADA or ICS systems—there was no effective isolation between the Internet world and the operational technologies that modern life relies upon so heavily.
In many respects, the need for SSCPs and the standards we need people to uphold as SSCPs was given birth by the PCCIP.
As we said in Chapter 3, risk management must start with the senior leaders of the organization taking full responsibility for everything related to risk management. "The captain goes down with the ship" may not literally require that the ship's commander drown when the ship sinks, but it does mean that no matter what happens, when it happens, ultimately that captain or commander has full responsibility. Captains of ships or captains of industry (as we used to call such senior leaders) may share their due care and due diligence responsibilities, and they usually must delegate the authority and responsibility to achieve them. Regardless, the C-suite and the board of directors are the ones who operate the business in the names of the owners and stakeholders. They "own" the bad news when due diligence fails to protect the stakeholders' interests.
This has two vital spin-offs for risk management programs, plans, and processes:
That last does need a bit of clarification. Obviously, the best way to keep a secret is to not share it with anyone; the next-best way is to not tell anyone else that you have a secret. If senior leaders or stakeholders are making a lot of public noise about “our successful efforts to eliminate information risk,” for example, that might be just the attractive nuisance that a threat actor needs to come and do a little looking around for something exploitable that’s been overlooked or oversold.
Statements by senior leaders, and their appearance at internal and external events, all speak loudly. Having the senior leaders formally sign off on acceptance testing results or on the results of audits and operational evaluation testing are opportunities to confirm to everyone that these things are important. They’re important enough to spend the senior leadership’s time and energy on. The CEO and the others in the C-suite do more than care about these issues. They get involved with them; they lead them. That’s a very powerful silver bullet to use internally; it can pay huge dividends in gaining end-user acceptance, understanding, and willing compliance with information security measures. It can open everyone’s eyes—maybe just a little; perhaps just enough to spot something out of the ordinary before it becomes an exploited vulnerability.
There’s been a lot of hard work accomplished to get to where a set of information risk controls have been specified, acquired (or built), installed, tested, and signed off by the senior leaders as meeting the information security needs of the business or organization. The job thus far has been putting in place countermeasures and controls so that the organization can roll with the punches, and weather the rough seas that the world, the competition, or the willful threat actors out there try to throw at it. Now it’s on to the really hard part of the job—keeping this information architecture and its IT architectures safe, secure, and resilient so that confidentiality, integrity, and authorization requirements are met and stay met. How do we know all of those safety nets, countermeasures, and control techniques are still working the way we intended them to and that they’re still adequate to keep us safe?
The good news is that this is no different than the work we did in making our initial security assessments of our information architecture, the business logic and business processes, and the IT architectures and systems that make them possible. The bad news is that this job never ends. We must continually monitor and assess the effectiveness of those risk controls and countermeasures, and take or recommend action when we see they no longer are adequate. Putting the controls in place was taking due care; due diligence is achieved through constant vigilance.
More good news: the data sources you used originally, to gain the insight you needed to make your first assessments, are still there, just waiting for you to come around, touch base, and ask for an update. Let’s take a closer look at some of them.
As you selected and implemented each new or modified information risk mitigation control, you had to identify the training needs for end users, their managers, and others. You had to identify what users and people throughout the organization needed to know and understand about this control and its role in the bigger picture. Achieving this minimum set of awareness and understanding is key to acceptance of the control by everyone concerned. This need for acceptance is continual, and depending on the nature of the risk control itself, the need for ongoing refresher training and awareness may be quite great. Let’s look at how different risks might call for different approaches to establish initial user awareness and maintain it over time:
The key to keeping users engaged with risk management and risk mitigation controls is simple: align their own, individual interests with the interests the controls are supporting, protecting, or securing. Chapter 11, “Business Continuity via Information Security and People Power,” will show you some strategies and techniques for achieving and maintaining this alignment by bringing more of your business’s “people power” to bear on everybody’s CIA needs.
By this time, our newly implemented risk mitigation controls have gone operational. Day by day, users across the organization are using them to stay more secure, (hopefully) achieving improved levels of CIA in their information processing tasks. The SSCP and the information security team now need to shift their mental gears and look to ongoing monitoring and assessment of these changes. In one respect, this seems easy; the identified risk, and therefore the related vulnerability, focused us on changing something in our physical, logical, or administrative processes so that our information could be more secure, resilient, reliable, and confidential; our decisions should now be more assured.
Are they?
The rest of the world did not stand still while we were making these changes. Our marketplace continued to grow and change; no doubt other users in other organizations were finding problems in the underlying hardware, software, or platforms we use; and the vendors who build and support those systems elements have been working to make fixes and patches available (or at least provide a procedural workaround) to resolve these problems. Threat actors may have discovered new zero-day exploits. And these or other threat actors have been continuing to ping away at our systems.
We do need to look at whether this new fix, patch, control, or procedural mitigation is working correctly, but we’ve got to do that in the context of today’s system architecture and the environment it operates in…and not just in the one in which we first spotted the vulnerability or decided to do something about the risk it engendered.
The SSCP may be part of a variety of ongoing security assessment activities, such as penetration testing or operational test and evaluation (OT&E), all intended to help understand the organization's security posture at the time the tests or evaluations are conducted. Let's take a closer look at some of these types of testing. This kind of test and evaluation is not to be confused with the acceptance testing or verification that was done when a new control was implemented—that verification test is necessary to prove that you made the fix correctly. It should also be kept distinct in your mind from regression testing, the verification that a fix to one systems element did not break others. Ongoing security test and evaluation looks to see whether things are still working correctly now that the users—and the threat actors—have had some time to put the changes and the total system through their paces.
OT&E, in its broadest sense, is attempting to verify that a given system and the people-powered processes that implement the overall set of business logic and purpose actually get work done correctly and completely, when seen from the end users’ or operators’ perspective. That may sound straightforward, but quite often, it is a long, complex process that produces some insight rather than clear, black-and-white “succeed” or “fail” scorecard results. Without going into too much detail, this is mainly because unavoidable differences exist between the system that business analysts thought was needed and what operational users in the organization are actually doing, day by day, to get work done. Some of those differences are caused by the passage of time; if it takes months to analyze a business’s needs, and more months to build the systems, install, test, and deliver them, the business has continued to move on. Some reflect different perceptions or understanding about the need; it’s difficult for a group of systems builders to understand what a group of systems users actually have to do in order to get work done. (And quite often, users are not as clear and articulate as they think they are when they try to tell the systems analysts what they need from the new system. Nor are the analysts necessarily the good listeners that they pride themselves on being.)
OT&E in security faces the same kind of lags in understanding, since quite often the organization doesn’t know it has a particular security requirement until it is revealed (either by testing and evaluation, or by enemy action via a real incident). This does create circular logic: we think we have a pretty solid system that fulfills our business logic, so we do some OT&E on it to understand how well it is working and where it might need to be improved—but the OT&E results cause us (sometimes) to rethink our business logic, which leads to changes in the system we just did OT&E on, and in the meantime, the rest of the world keeps changing around us.
The bottom line is that operational test and evaluation is one part of an ongoing learning experience. It has a role to play in continuous quality improvement processes; it can help an organization understand how mature its various business processes and systems are. And it can offer a chance to gain insight into potentially exploitable vulnerabilities in systems, processes, and the business logic itself.
Ethical penetration testing is security testing focused on trying to actively find and exploit vulnerabilities in an organization’s information security posture, processes, procedures, and systems. There are some significant legal and ethical issues that the organization and its testers must address, however, before proceeding with even the most modest of controlled pen-testing. In most jurisdictions around the world, it is illegal for anyone to attempt to gain unauthorized entry into someone else’s information systems without their express written permission; even with that permission in hand, mistakes in the execution of pen-testing activities can expose the requesting company or the penetration testers to legal or regulatory sanctions. To avoid legal complications, organizations typically use third-party penetration test organizations and use specific, detailed contracts and test plans that clearly identify responsibilities, identify the purpose(s) of the testing, place boundaries or constraints on how the testing is carried out, and determine how liability or responsibility for damage or disruption will be dealt with. Contracts also include nondisclosure requirements and direct that the testers cannot retain any data, interim analysis results, and findings they gather or generate as a result of the test. By tightly specifying the nature and extent of the test, the parties keep it legal; as it’s all about strengthening the organization’s cyber defenses, this keeps it ethical.
The first major risk to consider is that pen testers are, first and foremost, trying to actively and surreptitiously find exploitable vulnerabilities in your information security posture and systems. This activity could disrupt normal business operations, which in turn could disrupt your customers' business operations. For this reason, the scope of pen-testing activities should be clearly defined. Reporting relationships between the people doing the pen-testing, their line managers, and management and leadership within your own organization must be clear and effective.
Another risk comes into play when using external pen-testing consulting firms to do the testing, analyze the results, and present these results to you as the client. Sometimes, pen-testing firms hire reformed former criminal hackers (or hackers who narrowly escaped criminal prosecution), because they’ve got the demonstrated technical skills and hacker mindset to know how to conduct all aspects of such an attack. Yet, you are betting your organization’s success, if not survival, on how trustworthy these hackers might be. Can you count on them actually telling you about everything they find? Will they actually turn over all data, logs, and so forth that they capture during their testing and not retain any copies for their own internal use? This is not an insurmountable risk, and your contract with the pen-testing firm should be adamant about these sorts of risk containment measures. That said, it is not a trivial risk.
The SSCP exam will not go into much detail as it pertains to operational testing and evaluation or to penetration testing. You should, however, understand what each kind of ongoing or special security assessment, evaluation, and testing activities might be; have a realistic idea of what they can accomplish; and be aware of some of the risks associated with them.
Whether security assessments are done via formalized penetration testing, as part of normal operational test and evaluation, or by any of a variety of informal means, each provides the SSCP an opportunity to identify ways to make end users more effective in the ways they contribute to the overall information security posture. Initial training may instill a sense of awareness, while providing a starter set of procedural knowledge and skills; this is good, but as employees or team members grow in experience, they can and should be able to step up and do more as members of the total information security team.
End user questions and responses during security assessment activities, or during debriefs of them, can illuminate such opportunities to improve awareness and effectiveness. Make note of each "why" or "how" that surfaces during such events, during your informal walk-arounds to work spaces, or during other dialogue you have with others in the organization. Each represents a chance to improve awareness of the overall information security need; each is an opportunity to further empower teammates to be more intentional in strengthening their own security hygiene habits.
A caution is in order: some organizational cultures may believe that it’s more cost-effective to gather up such questions and indicators, and then spend the money and time to develop and train with new or updated training materials when a critical mass of need has finally arisen. You’ll have to make your own judgment, in such circumstances, whether this is being penny-wise but pound-foolish.
Think back to how much work it was to discover, understand, and document the information architecture that the organization uses, and then the IT architectures that support that business logic and data. Chances are that during your discovery phase, you realized that a lot of elements of both architectures could be changed or replaced by local work unit managers, group leaders, or division directors, all with very little if any coordination with any other departments. If that’s the case, you and the IT director, or the chief information security officer and the CIO, may have an uphill battle on your hands as you try to convince everyone that proper stewardship does require more central, coordinated change management and control than the company is accustomed to.
The definitions of these three management processes are important to keep in mind:
Since the IT and information security communities talk a lot about assets and asset management, let’s take a closer look at just what these concepts are and at how the security professional needs to be aware of and engaged with them.
In information technology (IT) and operational technology (OT) terms, an asset is an identifiable element of hardware, software, data, or a set of interfaces; by deciding that this element has some value to the organization, or is critical to achieving one of the organization’s goals or objectives (large or small), that element becomes an asset. Assets are not supplies, nor are they raw materials; they are not consumed by use, but can be reused many times. Many assets are actually large collections of lower-level assets themselves: a laptop computer is an asset, consisting of its hardware, its operating system, the many applications that are installed on it, and the contents of data files and databases that are also loaded onto that laptop. Thus, a single laptop identified with its property tag ID might actually consist of 50 or more assets bundled together and then issued to an employee for them to use.
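A minimal sketch (in Python, with field names of our own invention) shows how one property-tagged laptop can bundle many lower-level assets beneath it:

```python
from dataclasses import dataclass, field

@dataclass
class Asset:
    """One identifiable element of hardware, software, or data (illustrative fields)."""
    asset_id: str
    kind: str                                   # "hardware", "software", "data", ...
    components: list["Asset"] = field(default_factory=list)

    def count(self) -> int:
        """Total number of assets bundled under this one, including itself."""
        return 1 + sum(child.count() for child in self.components)

laptop = Asset("TAG-0042", "hardware", components=[
    Asset("TAG-0042-os", "software"),
    Asset("TAG-0042-office-suite", "software"),
    Asset("TAG-0042-vpn-client", "software"),
    Asset("TAG-0042-customer-db-extract", "data"),
])
print(laptop.count())   # 5 assets travel with this one property tag
```

Each of those nested elements has its own vulnerabilities, its own update cycle, and its own value to the organization, even though the property system sees only one tag.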
Generally speaking, the lifecycle of an asset consists of the following major activities:
We might call that the planned IT/OT asset management lifecycle model. Many organizations face a slightly different asset lifecycle situation, one in which the planning and management steps are markedly different. We’ll call this the unplanned or discovery IT asset management process. For many reasons, organizations may find that they have many different instances of unplanned, unmanaged hardware, software, and data elements that are part of their operational business activities. This can come about in numerous ways:
Whatever the source of that unplanned asset, once it’s been discovered, management has to decide: do we bring it in under formal security and configuration management, or do we let it continue to operate as is, “out in the wild” but still within the boundaries of the organization? Either way involves a trade-off of risks and costs.
System security assessments and audits, network scans, and systems enumerations can often discover some of these undocumented and unmanaged assets. Security professionals then need to manage their way through the process of identifying them, discovering their owners and users, and making an initial triage-style security assessment. After all, until you actually look closely enough at that surprise package, you have no idea whether it’s part of an attacker’s set of hostile agents or a valuable contributor to your organization’s business activities.
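A simple, hypothetical reconciliation between scan results and the formal inventory illustrates how these surprise packages surface; the addresses and inventory contents below are placeholders only:

```python
# Compare a network scan's discovered hosts against the formal asset inventory
# to flag unmanaged ("shadow") elements for triage.

inventory = {"10.0.1.10", "10.0.1.11", "10.0.1.20"}                     # formally managed assets
scan_results = {"10.0.1.10", "10.0.1.11", "10.0.1.20", "10.0.1.77", "10.0.1.93"}

unmanaged = scan_results - inventory      # discovered but not under management
missing = inventory - scan_results        # managed but not seen on the network

for host in sorted(unmanaged):
    print(f"{host}: undocumented asset -- owner, purpose, and risk posture unknown; triage needed")
for host in sorted(missing):
    print(f"{host}: in inventory but not responding -- retired, moved, or misconfigured?")
```

Both lists are findings: the first may hide an attacker's foothold or a valuable but unmanaged business tool, and the second may reveal inventory records that no longer match reality.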
Every element of data, information, or knowledge that the organization uses has a similar security lifecycle, which follows that data item from its creation through all the steps of storage, usage, modification, sharing, archiving, and then disposal at end of life. Unlike hardware or software elements, however, data elements may be "touched" by users or by software (and hardware) entities thousands of times per second throughout their useful life. Some data is never archived or modified; other data may not even be stored except in the transient working memory of an instrument, sensor, or other endpoint.
One important way to view this lifecycle is to think about the three states that any data item may be in, at any given moment:
The information classification and categorization processes identified types or classes of data items that need special protection (whether they are transient or not).
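One illustrative way to connect classification and data state to required protections is a simple lookup table; the classification labels and protections shown here are assumptions for the example, not a prescriptive standard:

```python
# Map (classification, state) to the minimum protection policy might require.
REQUIRED_PROTECTION = {
    ("confidential", "at rest"):   "encrypt on disk; restrict access by role",
    ("confidential", "in motion"): "TLS or VPN between endpoints",
    ("confidential", "in use"):    "endpoint and screen controls; least privilege",
    ("public",       "at rest"):   "integrity checks only",
    ("public",       "in motion"): "integrity checks only",
    ("public",       "in use"):    "no special handling",
}

def protection_for(classification: str, state: str) -> str:
    return REQUIRED_PROTECTION.get((classification, state),
                                   "policy gap -- escalate to the data owner")

print(protection_for("confidential", "in motion"))
print(protection_for("internal", "at rest"))   # an unmapped combination exposes a policy gap
```

The unmapped lookup in the last line makes a useful point: gaps in the classification scheme show up quickly once you try to express it in an enforceable form.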
Conceptually, this should be where the organization’s risk management activities come together. An inward-looking fusion center approach would provide cross-links and navigation tools that allow security personnel, risk managers, systems and applications builders, and other managers and leaders the ability to understand the inter-relationships of their many different business activities, the types of information they depend upon and produce, and the always-changing risk environment and threat landscape the organization faces.
Organizations that are just starting out on their IT and OT security management journeys need to grow toward this type of introspective fusion approach. One of the best ways to do that is to keep asking questions about your systems, the risks they face, and your dependence upon them. Most organizations cannot easily identify which of their systems or processes are highly susceptible to an insider threat, perhaps one where an employee might attempt to commit fraud or exfiltrate sensitive data. Answering such questions is akin to reverse engineering the organization from top to bottom. Yet, this is what a determined attacker does, as they fingerprint your systems, characterize your defenses, find a way inside your outer walls, and then move about searching for assets to target that meet their needs.
Keeping this knowledge base alive and useful—and keeping it safe and secure—is best done as an ongoing process, one with its own set of workflows and procedures. These workflows can position this knowledge base as a strong supporting element in configuration management and change control, in security assessment, and even in help desk support processes.
As an SSCP, consider asking (or looking yourself for the answers to!) the following kinds of questions:
If you’re unable to get good answers to those kinds of questions, from policy and procedural directives, from your managers, or from your own investigations, you may be working in an environment that is ripe for disaster.
To be effective, any management system or process must collect and record the data used to make decisions about changes to the systems being managed; it must also include ways to audit those records against reality. For most business systems, we need to consider three different kinds of baselines: recently archived, current operational, and ongoing development. Audits against these baselines should be able to verify that:
Audits of configuration management and control systems should be able to verify that the requirements and design documentation, source code files, builds and control systems files, and all other data sets necessary to build, test, and deploy the baseline contain authorized content and changes only.
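A minimal sketch of one such audit check, assuming the approved baseline is simply a table of file names and their expected hashes (the file names and hash values below are placeholders):

```python
import hashlib
from pathlib import Path

def file_fingerprint(path: Path) -> str:
    """SHA-256 hash of a file's contents, used as its baseline fingerprint."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def audit_against_baseline(baseline: dict) -> list:
    """Report files whose current contents no longer match the approved baseline."""
    findings = []
    for name, approved_hash in baseline.items():
        path = Path(name)
        if not path.exists():
            findings.append(f"{name}: missing from the operational system")
        elif file_fingerprint(path) != approved_hash:
            findings.append(f"{name}: contents differ from the approved baseline")
    return findings

# The baseline itself would come from the last approved build; these entries are
# placeholders for illustration only.
example_baseline = {"app/config.ini": "0" * 64, "app/service.py": "1" * 64}
for finding in audit_against_baseline(example_baseline):
    print(finding)
```

Every finding is a question for the change control records: was this difference authorized, or is it evidence of an unapproved (or hostile) change?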
We’ll address this in more detail in Chapters 9 and 10.
Prudent risk managers have been doing this for thousands of years. Guards would patrol the city and randomly check to see that doors were secured at the end of the workday and that gates were closed and barred. Tax authorities would select some number of taxpayers’ records and returns for audit, to look for both honest mistakes and willful attempts to evade payment. Merchants and manufacturers, shipping companies, and customers make detailed inventory lists and compare those lists after major transactions (such as before and after a clearance sale or a business relocation). Banks and financial institutions keep detailed transaction ledgers and then balance them against statements of accounts. These are all examples of regular operational use, inspection, audit, and verification that a set of risk mitigation controls are still working correctly.
We monitor our risk mitigation controls so that we can answer a simple question: are we safe, or are we not? Coming to a well-supported answer requires information and analysis, and that can mean a lot of data just to answer "Are we safe today?" Trend analysis (to see whether safety or security has changed over time, with an eye to discovering why) requires even more data. The nature of our business, our risk appetite (or tolerance), and the legal and regulatory compliance requirements we face may also dictate how often we have to collect such data and for how long we have to keep it available for analysis, audit, or review.
Where does the monitoring data come from? This question may seem to have an obvious answer, but it bears thinking about the four main types of information that we deliberately produce with each step of a business process:
Notice one important fact: no useful data gets generated unless somebody, somewhere, decided to create a process to get the data generated by the system, output in a form that is useful, and then captured in some kind of document, log file, or other memory device. When we choose to implement controls and countermeasures, we choose systems and components that help us deal with potential problems and inform us when problems occur.
All of that monitoring data does you absolutely no good at all unless you actually look at it. Analyze it. Extract from it the stories it is trying to tell you. This is perhaps the number one large-scale set of tasks that many cybersecurity and information security efforts fail to adequately plan for or accomplish. Don’t repeat this mistake.
Mistake number two is not having somebody on watch to whom the results of monitoring and event data analysis are sent; without one, when (not if) a potential emergency situation develops, the company may not find out about it until the Monday morning after the long holiday weekend is over. Those watch-standers can be on call (and receive alerts via SMS or other mobile communications means) or on site; each business will make that decision based on its mission needs and its assessment of the risks. Don't repeat this mistake either.
Mistake number three is to not look at the log data at all unless some other problem causes you to think, “Maybe the log files can tell me what’s going on.”
These three mistakes suggest that we need what emergency medicine calls a triage process: a way to sort out patients with life-threatening conditions needing immediate attention from the ones who can wait a while (or should go see their physician during office hours).
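A toy triage sketch makes the idea concrete; the severity scheme, queues, and sample events are assumptions for illustration, not a recommended scoring model:

```python
# Sort incoming events so the few that need immediate attention surface first.
SEVERITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3}

events = [
    {"id": 101, "severity": "low",      "summary": "single failed logon"},
    {"id": 102, "severity": "critical", "summary": "privileged account created outside change window"},
    {"id": 103, "severity": "medium",   "summary": "unusual outbound traffic volume"},
]

for event in sorted(events, key=lambda e: SEVERITY_ORDER[e["severity"]]):
    queue = ("wake the on-call watch-stander"
             if event["severity"] in ("critical", "high")
             else "review during business hours")
    print(f'#{event["id"]} [{event["severity"]}] {event["summary"]} -> {queue}')
```

The hard part, of course, is not the sorting; it is deciding, in advance and with management's agreement, what conditions count as "critical" and who gets the call.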
Let’s look at the analysis problem from the point of view of those who need the analysis done and work backward from there to develop good approaches to the analytical tasks themselves. But let’s not repeat mistake number four, often made by the medical profession—that more often than not, when the emergency room triage team sends you back home and says “See your doctor tomorrow,” their detailed findings don’t go to your doctor with you.
The alert team is watching over the deployed, in-use operational IT systems and support infrastructures. That collection of systems elements is probably supporting ongoing customer support, manufacturing, shipping, billing and finance operations, and website and public-facing information resources, as well as the various development and test systems used by different groups in the company. Their job is to know the status, state, and health of these in-use IT systems, but not necessarily the details of how or for what purpose any particular end user or organization is using those systems.
Who is the alert team? It might be a part of the day shift help desk team, the people everybody calls whenever any kind of IT issue comes up. In other organizations, the alert team is part of a separate IT security group, and their focus is on IT security issues and not normal user support activities.
What does this alert team do? The information security alert team has as their highest priority being ready and able to receive alerts from the systems they monitor and respond accordingly. That response typically includes the following:
What we can see from that list of alert team tasks is that we’re going to need the help of our systems designers, builders, and maintainers to help figure out
The immediacy of the alert team’s needs suggests that lots of data has to be summarized up to some key indicators, rather like a dashboard display in an automobile or an airplane. There are logical places on that dashboard for “idiot lights,” the sort of red-yellow-green indicators designed to get the operator’s attention and then direct them to look at other displays to be better informed. There are also valid uses on this dashboard for indicator gauges, such as throughput measures on critical nodes and numbers of users connected.
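A trivial sketch of how one such "idiot light" might be computed from a raw metric; the thresholds are assumptions that a security team would tune to its own systems:

```python
def status_light(value: float, yellow_at: float, red_at: float) -> str:
    """Map a metric to a traffic-light color; higher values are worse here."""
    if value >= red_at:
        return "red"
    if value >= yellow_at:
        return "yellow"
    return "green"

# Example: failed logons per minute on a critical server (hypothetical numbers).
print(status_light(value=3, yellow_at=5, red_at=20))    # green
print(status_light(value=27, yellow_at=5, red_at=20))   # red: direct the operator to the detail displays
```

The light's only job is to get attention; the detailed displays and timelines behind it carry the information the alert team actually needs to act.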
The alert team may also need to be able to see the data about an incident shown in some kind of timeline fashion, especially if there are a number of systems elements that seem to be involved in the incident. Timeline displays can call attention to periods that need further investigation and may even reveal something about cause and effect.
Before we jump to a conclusion and buy a snazzy new security information management dashboard system, however, let's take a look at what the other monitoring and event data analysis customers in our organization might need.
The IT support team is actually looking at a different process: the process of taking user needs, building systems and data structures to meet those needs, deploying those systems, and then dealing with user issues, problems, complaints, and ideas for improvements with them. That process lends itself to a fishbone or Ishikawa diagram that takes the end users’ underlying value chain and reveals all of the inputs, the necessary preconditions, the processing steps, the outputs, and how outputs relate to outcomes. This process may have many versions of the information systems and IT baselines that it must monitor, track, and support at any one time. In some cases, some of those versions may be subsets of the entire architecture, tailor-made to support specific business needs. IT and the configuration management and control board teams will be controlling these many different product baseline versions, which includes keeping track of which help desk tickets or requests for changes are assigned to (scheduled to be built into) which delivery. The IT staff must also monitor and be able to report on the progress of each piece of those software development tasks.
Some of those “magic metrics” may lend themselves to a dashboard-style display. For large systems with hundreds of company-managed end-user workstations, for example, one such status indicator could be whether all vendor-provided updates and patches have been applied to the hardware, operating systems, and applications platform systems. Other indicators could be an aggregate count of the known vulnerabilities that are still open and in need of mitigation and the critical business logic affected by them.
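As a purely illustrative example of aggregating one such indicator (the workstation records and counts below are placeholders, not real inventory data):

```python
workstations = [
    {"host": "ws-001", "missing_patches": 0},
    {"host": "ws-002", "missing_patches": 3},
    {"host": "ws-003", "missing_patches": 0},
    {"host": "ws-004", "missing_patches": 1},
]
open_vulnerabilities = 12   # known, unmitigated findings from the last assessment

fully_patched = sum(1 for w in workstations if w["missing_patches"] == 0)
compliance = 100.0 * fully_patched / len(workstations)
print(f"Patch compliance: {compliance:.0f}% of workstations fully patched")
print(f"Open vulnerabilities awaiting mitigation: {open_vulnerabilities}")
```

Rolled up this way, a single number can sit on the dashboard; the per-host detail stays available for the IT support team when the number starts to slip.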
Trend lines are also valuable indicators for the IT support staff. Averages of indicators such as system uptime, data or user logon volumes, accesses to key information assets, or transaction processing time can be revealing when looked at over the right timeframe—and when compared to other systems, internal, or external events to see if cause-and-effect relationships exist.
What end users require may vary a lot depending on the needs of the organization and which users are focused on which parts of its business logic. That said, end users tend to need traffic-light kind of indications that tell them whether systems, components, platforms, or other elements they need are ready and available, down for maintenance, or in a “hands-off” state while a problem is being investigated. They may also appreciate being able to see the scheduled status of particular changes that are of interest to them. Transparent change management systems are ones in which end users or other interested parties in the business have this visibility into the planned, scheduled builds and the issues or changes allocated to them.
We might rephrase “What do leadership and management need?” and ask how the analysis of monitoring and event data can help management and leadership fulfill their due care and due diligence responsibilities. Depending on the management and leadership style and culture within the organization, the same dashboard and summary displays used by the alert team and IT support staff may be just what they need. (This is sometimes called a “high-bandwidth-in” style of management, where the managers need to have access to lots of detailed data about what’s going on in the organization.) Other management and leadership choose to work with high-level summaries, aggregates, or alarm data as their daily feeds.
One key lesson to remember is suggested by the number of alert team tasks that lead to notifying management and leadership of an incident or alarm condition. Too many infamous data breach incidents became far too costly for the companies involved because the company culture discouraged late-night or weekend calls to senior managers for “mere” IT systems problems. (The data breach at retail giant Target, in 2013, suffered in part from this failure to properly notify and engage senior leadership before such incidents happen so that the company could respond properly when one occurred.)
At some point, the SSCP must determine that an incident of interest has occurred. Out of the millions of events that a busy datacenter’s logging and monitoring systems might take note of every 24 hours, only a handful might be worthy of sounding an alarm:
That’s a pretty substantial list, but in a well-managed and well-secured datacenter, most of those kinds of incidents shouldn’t happen often. When they do (not if they do), several important things have to occur properly and promptly:
Part of that initial triage kind of response involves determining whether the incident is sufficiently serious or disruptive that the organization should activate its incident response plans and procedures. We’ll cover these in Chapter 11 in more detail; for now, recognize that businesses have an abiding due diligence responsibility to think through what to do in an emergency well before that emergency first occurs!
Immediate response to an incident may mean that the first person to notice it has to make an immediate decision: is this an emergency that threatens life or property and thus requires initiating emergency alarms and procedures? Or is it “merely” an information systems incident not requiring outside emergency responders? Before you take on operational responsibilities, make sure you know how your company wants to handle these decisions.
We said at the outset of this book that the commitment of senior business leadership and management is pivotal to the success of the company's information risk management and mitigation efforts. As an SSCP, you and the rest of the team went to great efforts to get those senior leaders involved and to gain their understanding and acceptance of your risk assessments. You then gained their pledges to properly fund, staff, and support your risk mitigation strategies, as well as your chosen risk countermeasures and controls.
Much like any other accountable, reportable function in the company, information security must make regular reports to management and leadership. The good news (no incidents of interest) as well as the bad news about minor or major breaches of security must be brought to the attention of senior leaders and managers. They need to see that their investments in your efforts are still proving successful—and if they are not, then they need to understand why and be well enough informed to consider alternative actions to take in the face of new threats or newly discovered vulnerabilities.
Management and leadership may also have legal and regulatory reporting requirements of their own to meet, and your abilities to manage security systems event data, incident data, and the results of your investigations may be necessary for them to meet these obligations. These will, of course, vary as to jurisdiction; a multinational firm with operating locations in many countries may face a bewildering array of possibly conflicting reporting requirements in that regard.
Whatever the reporting burden, the bottom line is that the information security team must report its findings to management and leadership. Whether those findings are routine good news about the continued secure good health of the systems or dead-of-night emergency alarms when a serious incident seems to be unfolding, management and leadership have an abiding and enduring need to know.
No bad news about information security incidents will ever get better by waiting until later to tell management about it.
We’ve spent Chapters 3 and 4 learning how to defend our information, our information systems (the business logic that uses information), and our information technology architectures from harm due to accident, Mother Nature, or hostile action by insiders or external actors alike. That has taken us from risk management through risk mitigation, as we’ve seen how the leadership, management, systems, and security teams must work together to make smart trade-offs between the possible pain of a risk becoming reality and the real costs incurred to purchase, install, and operate a control or countermeasure that prevents or reduces that possible loss.
Throughout, we have applied the basic concepts of confidentiality, integrity, and availability as the characteristics by which we assess our information security measures. In broad terms, this CIA triad helps us manage the risks. We've seen that without knowing and controlling our systems baselines, we have very little opportunity to detect a vulnerability becoming a disruptive event; thus, we've seen how managing our systems baselines and exerting a reasonable amount of change control keeps them safer. The underlying software and hardware of an unmanaged and uncontrolled system may have the same vulnerabilities as a well-managed, tightly controlled system using the same technologies; it is the lack of people-centric management and control processes that exposes the unmanaged system to a greater probability that an exploit will be attempted or will succeed.
Finally, we’ve seen that the understanding and involvement of all levels of organizational leadership and management are vital to making risk management pay off. Risk management is not free; it takes valuable staff time, intellectual effort, and analysis to pull all of the data; understand the business logic, processes, and architecture; and find the high-priority vulnerabilities. It takes more money, time, and effort to make changes that contain, fix, or eliminate the risks that those vulnerabilities bring with them. But by the numbers, we see that there are ways to make quantitative as well as qualitative assessments about risks, and which ones to manage or mitigate.
Know the important security differences between the information architecture and the information technology architecture. The information architecture focuses on how people use information to accomplish business objectives; thus, its principal security issues are involved with guiding, shaping, or constraining human behavior. Well-considered workforce education and training programs that align business objectives with information security and decision assurance needs are solid investments to make. By contrast, the IT architecture is perceived as needing predominantly logical or technical controls that require significant expertise and knowledge to deploy and maintain effectively. This perception is true as far as it goes, but the technical controls applied to the IT architecture must be driven by the security needs of the information architecture.
Know how to conduct an architecture assessment. The architecture assessment is both an inventory of all systems elements and a map or process flow diagram that shows how these elements are connected to form or support business processes and thereby achieve the needs of required business logic. This requires a thorough review and analysis of existing physical asset/equipment inventories, network and communications diagrams, contracts with service providers, software and systems change control logs, error reports, and change requests. It also should include data-gathering interviews with end users and support personnel.
Explain the purpose and value of a systems or architecture baseline for security purposes. The systems or architecture baseline, which the assessment documents, is both the reality we have to protect and the model or description of that reality. The baseline as documentation reflects the as-built state of the system today, and versions of the baseline can reflect the “build-to” state of the system for any desired set of changes that are planned. These provide the starting point for vulnerability assessments, change control audits, and problem analysis and error correcting.
Explain the importance of assessing "shadow IT" systems, standalone systems, and cloud-hosted services as part of a security assessment. Many organizations are increasingly dependent on IT systems elements that are not under their direct configuration management and control. As such, formal IT management may not have detailed design information, and hence vulnerability insight, about such systems elements. The information security assessment needs to identify each instance of such systems elements and, based on the BIA, determine how much inspection, investigation, or analysis of these systems (and contracts related to them, if any) needs to be part of the security assessment.
Know how to perform a vulnerabilities assessment. The vulnerabilities assessment gathers data about the information architecture and the IT architecture, including Common Vulnerabilities and Exposures (CVE) data from public sources. This data is analyzed in the context of the BIA’s prioritized impacts to determine critical vulnerabilities in these architectures. Threat modeling may also be useful in this process. The result is a list of known or suspected vulnerabilities, collated with the BIA’s priorities, for use in risk mitigation implementation planning.
Explain the role of threat modeling in vulnerability assessment. Threat modeling focuses your attention on the boundaries that separate systems from one another, and from the outside world, and thus on how any request for access, service, or information can cross such boundaries. These crossing points are where legitimate users and threat actors can conceivably enter your systems. These may be tunnels (VPN or maintenance trapdoors) left open by accident, for example. Threat modeling is an important component in a well-balanced vulnerability assessment.
Know how to include human elements in the architecture and vulnerability assessments. As the vulnerability assessment reviews business processes and the systems elements that support them, this may indicate process steps where end-user, manager, or other staff actions present vulnerabilities. These may be due to training deficiencies, or to weaknesses in administrative controls (such as a lack of policy direction and guidance), or they may indicate significant risks in need of physical or logical controls and countermeasures.
Explain the basic risk treatment options of accept, transfer or share, remediate, avoid, and recast. Once you’ve identified a vulnerability, you deal with (or treat) its associated risk with a combination of control options as required. Accepting the risk means you choose to go ahead and continue doing business in this way. Transferring the risk usually involves paying someone else to take on the work of repairs, reimbursements, or replacement of damaged systems if the risk event occurs; sharing a risk means that you transfer a portion of it to another, while the remaining (residual) risk stays with you to deal with. Remediation includes repairing or replacing the vulnerable system and is often called “fixing” or “mitigating” the risk. Avoiding a risk means to change a business process so that the risk no longer applies. The application of any risk controls may reduce the probability of occurrence or the nature of the impact of the risk, and thus you have recast (reassessed) the risk.
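To see how treatment recasts a risk in quantitative terms, consider a simple annualized loss expectancy calculation (ALE = single loss expectancy × annualized rate of occurrence). The dollar figures and reduction factors below are invented purely for illustration.

```python
# Illustrative figures only: recasting a risk after treatment using ALE = SLE x ARO.

sle = 50_000   # single loss expectancy: cost of one occurrence, in dollars
aro = 0.4      # annualized rate of occurrence before treatment

ale_before = sle * aro
print(f"ALE before treatment: ${ale_before:,.0f}")          # $20,000 per year

# Remediate: assume a control cuts the likelihood of occurrence in half.
aro_after = aro * 0.5
ale_residual = sle * aro_after
print(f"Recast (residual) ALE after remediation: ${ale_residual:,.0f}")   # $10,000 per year

# Transfer/share: assume insurance reimburses 80% of a loss; the rest stays with us.
sle_after_transfer = sle * 0.2
print(f"Residual single-loss impact after transfer: ${sle_after_transfer:,.0f}")
```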
Know how to determine residual risk and relate it to information security gap analysis. Residual risk is the risk remaining after applying treatment options, and thus it is a recasting of the original risk. Residual risks are, in essence, gaps in our defenses; gap analysis uses the same approach as vulnerability assessment but focuses on these gaps to see which, if any, present unacceptable levels of exposure to risk.
Know how and why to perform an information security gap analysis. A gap analysis is similar to auditing a system’s requirements list against the as-built implementation; both seek to discover any needs (requirements) that are not addressed by an effective combination of system features, functions, and elements. An information security gap analysis can reveal missing or inadequate security coverage, and it is useful during vulnerability assessment and after mitigations have been implemented. It is performed by reviewing the baselined set of information security requirements (which should meet or exceed BIA requirements) against the baseline information and IT architectures, noting any unsatisfied or partially satisfied requirements.
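A minimal sketch of the review step, using made-up requirement identifiers: compare the baselined security requirements against what the architecture baselines show as implemented, and report anything unsatisfied or only partially satisfied.

```python
# Hypothetical sketch of an information security gap analysis:
# compare baselined requirements against what the architecture shows as implemented.

required = {"REQ-01 encrypt data at rest", "REQ-02 MFA for remote access", "REQ-03 daily backups"}
implemented = {"REQ-01 encrypt data at rest", "REQ-03 daily backups"}
partially_satisfied = {"REQ-03 daily backups": "backups exist but are never test-restored"}

gaps = required - implemented
for gap in sorted(gaps):
    print(f"Unsatisfied requirement: {gap}")
for req, note in partially_satisfied.items():
    print(f"Partially satisfied: {req} ({note})")
```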
Know how the physical, logical, and administrative aspects of risk controls work together. Each of these types of controls takes a decision about security policy and practice and implements it so that the behavior of people, information technology, and physical systems stays within security-approved bounds. An acceptable use policy, for example, may state that employee-owned devices cannot be brought into secure work areas; a physical search of handbags and so forth might enforce this, and logical controls that detect such devices when they attempt to connect to the networks provide a further layer of detection and prevention. Almost all security starts with making decisions about risks; we then write requirements, objectives, plans, or other administrative (people-facing) documents to cause those decisions to be carried out and to monitor their effectiveness.
Explain the requirements for integrated command, control, and communications of risk treatments and countermeasures. Each element of our controls and countermeasures needs to be part of an interlocking, self-reinforcing whole in which elements constantly communicate information about their status, state, and health, or about any alert or alarm-worthy conditions. Systems security managers should have near-seamless, real-time visibility into this information, as well as the ability to remotely manage or command systems elements in response to a suspected or actual information security event. Without this, gaps become blind spots.
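The “gaps become blind spots” point can be illustrated with a toy status check: each control element is expected to report in periodically, and any element silent past its reporting window is itself treated as an alert-worthy condition. The element names and intervals below are hypothetical.

```python
# Hypothetical sketch: a security manager's view of control-element heartbeats.
# Any element that has not reported within its expected window is a blind spot.

import time

EXPECTED_INTERVAL = 300  # seconds between expected status reports

# Last report time (epoch seconds) for each monitored control element.
last_report = {
    "perimeter-firewall": time.time() - 60,      # reported a minute ago
    "door-badge-reader":  time.time() - 1800,    # silent for 30 minutes
    "av-management-srv":  time.time() - 120,
}

now = time.time()
for element, reported_at in last_report.items():
    if now - reported_at > EXPECTED_INTERVAL:
        print(f"ALERT: {element} has not reported status in {int(now - reported_at)} seconds")
```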
Explain the various uses of testing and verification for information assurance and security. Testing and verification confirm that systems meet specified requirements. Testing is typically conducted in test environments, whereas verification can involve observations collected during testing or during ongoing operational use. Security testing and verification aim to establish how completely the information security requirements are satisfied in the deployed systems, including any risk mitigations, controls, or countermeasures that have been added to them since deployment. They validate that the confidentiality, integrity, and availability of the information systems meet or exceed requirements in the face of ongoing risks and threats. They can also indicate that new threats, vulnerabilities, or risks are in need of attention, decision making, and possibly mitigation.
Know why we gather, analyze, and interpret event and monitoring data. Almost all systems are built around the principle of “trust, but verify.” Due diligence requires that we be able to monitor, inspect, or oversee a process and be able to determine that it is working correctly—and when it is not, to be able to make timely decisions to intervene or take corrective action. Due diligence dictates that systems be built in such ways that they provide not only outputs that serve the needs of business logic but also suitable diagnostic, malfunction, or other alarm indicators. Much of this information is captured in event log files by the systems themselves. IT security personnel need to gather these event logs and other monitoring data and collate, analyze, and assess them to (a) be able to recognize that an event of interest is occurring or has occurred, and (b) verify that interventions or responses to this incident are having the desired effect.
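As a small illustration of “trust, but verify,” the sketch below scans a handful of hypothetical log lines for one event of interest (repeated failed logins on the same account) so that a human can decide whether to intervene. The log format, account names, and threshold are invented for the example.

```python
# Hypothetical sketch: scan collected event-log lines for an event of interest.
# Here, more than three failed logins for the same account raises a flag.

from collections import Counter

log_lines = [
    "2024-05-01T10:01:02 LOGIN_FAIL user=alice",
    "2024-05-01T10:01:09 LOGIN_FAIL user=alice",
    "2024-05-01T10:01:15 LOGIN_FAIL user=alice",
    "2024-05-01T10:01:22 LOGIN_FAIL user=alice",
    "2024-05-01T10:05:40 LOGIN_OK   user=bob",
]

failures = Counter(
    line.split("user=")[1] for line in log_lines if "LOGIN_FAIL" in line
)

for user, count in failures.items():
    if count > 3:
        print(f"Event of interest: {count} failed logins for {user}; escalate for review")
```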
Know the importance of elevating alerts and findings to management in a timely manner. Two time frames of interest dictate how information security teams elevate alerts and findings to management. The first is in real time or near-real time, when an event of possible interest is being detected and characterized. If such an event requires emergency responses, which quite often are disruptive to normal business operations, then the right levels of management should be engaged in this decision. When not faced with an emerging situation, management needs to be apprised when ongoing monitoring, assessment, or analysis suggests that the systems are behaving either in abnormal ways or in ways indicative of previously unrecognized risks. Further investigation may involve additional staff or other resources or be disruptive to normal operations; thus, management should be engaged in a timely manner.
Explain the role of incident management in risk mitigation. Risks express a probability of an event whose outcome we will likely find disruptive, if not damaging, to achieving our goals and objectives. Risk mitigation attempts to limit or contain risks and to notify us when a risk event seems to be imminent or is occurring. Incident management provides the ability in real time to decide when and how to intervene to prevent further damage, halt the incident, restore operational capabilities, and possibly request support from other emergency responders. All of those incident management actions help mitigate the effects of risk on our organization and its business processes.
Explain the importance of including operational technology (OT) systems in risk management and mitigation activities. Operational technology (OT) is the broad term for any kind of information systems device that physically interacts with the real world. These can include industrial control systems (ICS), supervisory control and data acquisition (SCADA) systems, smart building and environmental management systems, and safety and security systems. Internet of Things (IoT) devices, along with autonomous and mobile robotic devices, are also considered OT. If the organization has invested in (or allows the use of) any of these technologies in its buildings, vehicles, processes, or products and services, it is dependent upon them to one degree or another; that means the vulnerabilities inherent in the OT systems (and in the IT systems that monitor and control them, and inform management about their operation) put the organization at risk.
Explain the different functional types of risk controls. Risks can be controlled or mitigated by means of applying one or more functional controls to them. Directive controls issue commands or guidance, and along with deterrent controls, seek to change the behavior of users and potential attackers or intruders. Preventative controls place barriers or obstacles in the way of an intruder, which both delay the intrusion and raise the effort required to accomplish it. Detective controls observe signals from systems elements and raise an alarm if those signals indicate a potential incident (an intrusion, attack, or out-of-limits condition) needs attention. Reactive controls take separate, independent action to respond to the incident, such as shutting down servers or closing fireproof doors. Corrective controls, similar to reactive controls, take actions that attempt to nullify, contain, or limit the impacts of an incident. Recovery controls work to restore systems, facilities, or locations back to normal operating condition. Compensating controls may provide workarounds during recovery, act as a full or partial substitute for some other required control, or augment the mitigation efforts of another control. All of these functions work together, and many individual control devices or techniques may provide any or all of these functions in combination.
Explain the use of asset inventories and the risk register in risk mitigation. An asset inventory is a list of all information assets, including IT and OT elements, that the organization depends upon to achieve its goals and objectives. If the inventory is kept complete, then any hardware, software, communications pathway, or data discovered in or on the systems infrastructure but not listed in the inventory is potentially suspect and needs to be investigated. Risk assessment, whether done as asset-based, outcomes-based, or process-based, will ultimately link risks (and prioritized goals and objectives) to assets, producing the risk register. Vulnerabilities associated with each asset, such as those found in published CVE data or in internal, proprietary findings, are then added to the risk register. Together with the security baseline (reflecting classification and categorization decisions), this knowledge base informs change management, security assessment, and ongoing risk mitigation and security operations.
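A hedged sketch of how these pieces join up: each risk register entry links an inventoried asset, its BIA-derived priority and classification, a known vulnerability, and the treatment decision still to be made. All asset names, identifiers, and values are illustrative only.

```python
# Hypothetical sketch: joining the asset inventory, BIA priorities, and
# vulnerability data into risk register entries.

assets = {
    "orders-db":    {"classification": "confidential", "bia_priority": 1},
    "web-frontend": {"classification": "internal",     "bia_priority": 2},
}

vulnerabilities = [
    {"asset": "orders-db",    "id": "CVE-XXXX-0002", "source": "public CVE feed"},
    {"asset": "web-frontend", "id": "INT-0007",      "source": "internal pen test"},
]

# Build the risk register by linking each vulnerability to its asset's BIA data.
risk_register = [
    {
        "asset": v["asset"],
        "vulnerability": v["id"],
        "priority": assets[v["asset"]]["bia_priority"],
        "classification": assets[v["asset"]]["classification"],
        "treatment": "TBD",
    }
    for v in vulnerabilities
]

for entry in sorted(risk_register, key=lambda e: e["priority"]):
    print(entry)
```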