Risk management decides what risks to try to control; risk mitigation is how SSCPs take those decisions to the operational level. Senior leadership and management must drive this activity, supporting it with both resources and their sustained attention. These stakeholders, the business's or organization's leadership and decision makers, must lead by setting priorities and determining acceptable cost–benefit trade-offs. SSCPs, as they grow in knowledge and experience, can provide information, advice, and insight to organizational decision makers and stakeholders as they deliberate the organization's information risk strategy and needs. Chapter 3, "Integrated Information Risk Management," showed that this is a strategic set of choices facing any organization.
Risk mitigation is what SSCPs do, day in and day out. This is a tactical, near-term activity, as well as a set of tasks that translate tactical planning into operational processes and procedures. Risk mitigation delivers on the decision assurance and information security promises made by risk management, and SSCPs turn those promises and expectations into operational reality. SSCPs participate in this process in many ways, as you'll see in this chapter. First, we'll focus on the "what" and the "why" of integrated defense in depth and examine how SSCPs carry out its tactics, techniques, and procedures. Then we'll look in more detail at how organizational leadership and management need the SSCP's assistance in planning, managing, administering, and monitoring ongoing risk mitigation efforts as part of carrying out the defense-in-depth strategic plan (discussed in Chapter 3). The SSCP's role in developing, deploying, and sustaining the "people power" component of organizational information security will then demonstrate how all of these seemingly disparate threads can and should come together. We'll close by looking at some of the key measurements used to plan, achieve, and monitor risk management and mitigation efforts.
Chapter 3 showed how organizations can use risk management frameworks, such as NIST SP 800-37 Rev. 2 or ISO 31000:2018, to guide their assessment of risks that face the organization’s information and information technology systems. Making such assessments guides the organization from the strategic consideration of longer-term goals and objectives to the tactical planning necessary to implement information risk mitigation as a vital part of the organization’s ongoing business processes. One kind of assessment, the impact assessment, characterizes how important and vital some kinds of information are to the organization. This prioritization may be because of the outcomes, assets, or processes that use or produce that information, or because of how certain kinds of threats or vulnerabilities inherent in those information processes put the organization itself at risk.
The next step in the assessment process is seeking out the critical vulnerabilities and determining what it takes to mitigate the risks they pose to organizational goals, objectives, and needs. This vulnerability assessment is not to be confused with having a "vulnerability-based" or "threat-based" perspective on risks overall. The impact assessment has identified outcomes, processes, or assets that must be kept safe, secure, and resilient. Even if we started the impact assessment by thinking about the threats, or the kinds of vulnerabilities (in broad terms) that such threats could exploit, we now have to roll up our sleeves and get into the details of just how the information work actually gets done day by day, week by week. Four key ideas help SSCPs keep a balanced perspective on risk as they look to translate strategic thinking about information risk into action plans that implement, operate, and assess the use of risk management controls:
Chapter 3 also showed something that is vital to the success of information security efforts: they must be integrated and proactive if they are to be even reasonably successful when facing the rapidly evolving threat space of the modern Internet-based, Web-enabled world. By definition, an integrated system is one that its builders, users, and maintainers manage. More succinctly: unmanaged systems are highly vulnerable to exploitation; well-managed systems are still vulnerable, but less so. We’ll look further into this paradigm both here and in subsequent chapters.
We are now ready to cross the boundary between strategic risk management and tactical risk mitigation. For you to fully grasp the speed and agility of thought that this requires, let’s borrow some ideas from the way military fighter pilots train to think and act in order to survive and succeed.
Let’s focus on the key difference between planning and operations. Planning is a deliberate, thoughtful process that we engage in well in advance of the time we anticipate we’ll need to do what our plans prescribe. It asks us to investigate; gather data; understand the stated and unstated assumptions, needs, constraints, and ideals—all of which we try to bring together into our plan. Planning is a balancing act; we identify tasks we need to do; we estimate the people, money, material, and time we’ll need to accomplish those tasks; and then we trim our plan to fit the resources, time, and people available to us. It’s an optimization exercise. Planning prepares us to take action; as Dwight D. Eisenhower, 34th president of the United States and Supreme Allied Commander, Europe, during World War II, famously said, “Plans are worthless, but planning is indispensable.” Making plans, reviewing them, exercising them, and evaluating and modifying them trains your mind to think about how tasks and resources, time, and space fit together. By planning, replanning, reviewing, and updating your plans as part of your “security normal,” you build an innate sense of how to make that “fit” achieve your objectives—and what to do when things just don’t fit!
Plans should lead to process engineering and design tasks, in which we thoughtfully create the procedures, processes, and tools that our workforce will use day to day. Planning should reveal the need for training and human resources development. Planning should bring these needs together and show us how to recognize the moment in which all of the physical, logical, and administrative steps have been taken, our people are trained, and testing has verified that we’re ready to open our doors and start doing business. Once that “initial operational capability” milestone has been reached, and once we’ve delivered the “minimum operational increment of capability” that our users can accept, we switch from planning to operations. We do what’s been planned.
Plans are a set of predictions that rest on assumptions. Plans address the future, and to date, none of us has 100% perfect foresight. Think about all of the assumptions made during the business impact analysis (BIA) process, which we worked through in Chapter 3, and ask, "What if most of them are wrong?" A clear case in point is the underlying assumption of cryptography: we can protect our information today by encoding it in ways that will cost adversaries more time, money, and effort to crack than the result would be worth to them. (This is sometimes called "sprinkling a little crypto dust" over your systems, as if, by magic, it will fix everything.) Your super-secure password that might take a million years of CPU time just might crack on the first guess! (It's not very probable…but not impossible!) Your thorough audit of your IT infrastructure just might miss a backdoor that a developer or engineer put in and "forgot" to tell you about. Your penetration testing contractor might have found a few more vulnerabilities than they've actually told you about. The list of surprises like this is, quite frankly, endless.
Since your plans cannot be perfect, you have to be able to think your way through a surprising situation. This requires you to take the time to think, especially in the heat of battle during an IT security incident.
And if your adversary can deny you that “thinking time,” if they can push you to react instead of thoughtfully considering the situation and the facts on hand and considering the situation in the context of your own objectives, you fall prey to your adversary outthinking you.
How do you avoid this?
The four steps of observe, orient, decide, and act, known as the OODA loop, provide a process by which you can keep from overreacting to circumstances. Developed by Colonel John Boyd, USAF, from his studies of U.S. combat fighter pilots during the Vietnam War, it has become a fundamental concept in fields as diverse as law enforcement training, business and leadership, cybernetics and control systems design, artificial intelligence, and information systems design and use. If you can master the OODA loop and make it part of your day-to-day operational kit bag, you can be the kind of SSCP who keeps their head "when all about you are losing theirs and blaming it on you," as Kipling put it so adroitly.
Figure 4.1 shows the OODA loop, its four major steps, and the importance of feedback loops within the OODA loop itself. It shows how the OODA loop is a continually learning, constantly adjusting, forward-leaning decision-making and control process.
FIGURE 4.1 John Boyd’s OODA loop
Think about Figure 4.1 in the context of two or more decision systems working in the same decision space, such as a marketplace. Suppliers and purchasers all are using OODA loops in their own internal decision making, whether they realize it or not. When the OODA loops of customers and suppliers harmonize with one another, the marketplace is in balance; no one party has an information advantage over the other. Now imagine if the customers can observe the actions of multiple suppliers, maybe even ones located in other marketplaces in other towns. If such customers can observe more information and think “around their OODA loop” more quickly than the suppliers can, the customers can spot better deals and take advantage of them faster than the suppliers can change prices or deliveries to the markets.
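To make the loop's structure concrete in code terms, here is a purely schematic sketch; the function names, the "watch list," and the signals are invented placeholders, not any real monitoring tool's interface. The point it illustrates is the feedback: the results of acting become part of what you observe on the next pass.

```python
# A schematic sketch of the OODA loop as a continuous, feedback-driven cycle.
# Function names and the watch list are invented placeholders for illustration.

def observe(queue):
    # Pull whatever new signals have arrived: log entries, alerts, user reports.
    signals, queue[:] = list(queue), []
    return signals

def orient(model, observations, watch_list):
    # Fold new observations into our evolving picture of the situation.
    model["relevant"] = [o for o in observations if o in watch_list]
    return model

def decide(model):
    # Choose a course of action based on the current orientation.
    return "investigate" if model["relevant"] else "keep monitoring"

def act(decision, queue):
    # Acting produces results that show up as the next round of observations.
    queue.append(f"result of: {decision}")
    return decision

incoming = ["normal traffic", "failed logins spike"]
model, watch_list = {}, {"failed logins spike"}
for _ in range(3):                       # three turns around the loop
    model = orient(model, observe(incoming), watch_list)
    print(act(decide(model), incoming))
```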
Let's shift this to a less-than-cooperative situation and look at a typical adversary intrusion into an organization's IT systems. On average, the IT industry worldwide reports that it takes businesses about 190 days to first observe that a threat actor has discovered a previously unknown or unreported vulnerability and exploited it to gain unauthorized access to the business's systems. It also takes about 170 days, on average, to find a vulnerability, develop a fix (or patch) for it, apply the fix, and validate that the fix has removed or reduced the risk of harm that the vulnerability could allow to occur. Best case, one cycle around the OODA loop takes the business from observing the penetration to fixing it; that's 190 plus 170 days, or roughly 12 months of being at the mercy of the intruder and any potential copycat attackers. By contrast, the intruder is probably running on an OODA loop that might take a few days to go from initially seeking a new target, through initial reconnaissance, to choosing to target a specific business. Once inside the target's systems, the decision cycle time to seek information assets that suit the attacker's objectives, to formulate actions, to carry out those actions, and then to cover their tracks might run into days or weeks. It's conceivable that the attacker could have executed multiple exploits per week over those 12 months of "once-around-the-OODA" that the business world seems to find acceptable.
It’s worth emphasizing this aspect of the zero day exploit in OODA loop terms. The attacker does not need to find the vulnerability before anybody else does; she needs to develop a way to exploit it, against your systems, before that vulnerability has been discovered and reported through the normal, accepted vulnerability reporting channels, and before the defenders have had reasonable opportunity to become aware of its existence. Once you, as one of the white hats, could have known about it, it’s no longer a zero day exploit—just one you hadn’t implemented a control for yet.
In Chapter 1, "The Business Case for Decision Assurance and Information Security," we introduced the concept of the value chain, which shows each major set of processes a business uses to go from raw inputs to finished products that customers have bought and are using. Each step in the value chain creates value—it creates greater economic worth, or creates more of something else that is important to customers. Business uses what it knows about its methods to apply energy (do work) to the input of each stage in the value chain. The value chain model helps business focus on improving the individual steps, the lag time or latency within each step and between steps, and the wastage or costs incurred in each step. But business and the making of valuable products are not the only places where value chain thinking can be applied.
Modern military planners adapted the value chain concept as a way to focus on optimally achieving objectives in warfare. The kill chain is the set of activities that show, step by step, how one side in the conflict plans to achieve a particular military objective (usually a “kill” of a target, such as neutralizing the enemy’s air defense systems). The defender need not defeat every step in that kill chain—all they have to do is interrupt it enough to prevent the attacker from achieving their goals, when their plans require them to.
It’s often said that criminal hackers and cyber threat actors only have to be lucky once, in order to achieve their objectives, but that the cyber defender must be lucky every day to prevent all attacks. This is no doubt true if the defender’s OODA loops run slower than those of their attackers. As you’ll see, it takes more than just choosing and applying the right physical, logical, and administrative risk treatments or controls to achieve this.
Let’s start by taking apart our definition of risk mitigation (from Chapter 3), and see what it reveals in the day-to-day of business operations.
Risk mitigation is the process of implementing risk management decisions by carrying out actions that contain, transfer, reduce, or eliminate risk to levels the organization finds acceptable, which can include accepting a risk when it simply is not practical to do anything else about it.
Figure 4.2 shows the major steps in the risk mitigation process we’ll use here, which continues to put the language of NIST SP 800-37 and ISO 31000:2018 into more pragmatic terms. These steps are:
FIGURE 4.2 Risk mitigation major steps
The boundary between planning and doing, as we cross from Step 3 into Step 4, is the point where the SSCP helps the organization fit its needs for risk treatment and control into its no-doubt very constrained budget of people, money, resources, and time. In almost all circumstances, the SSCP will have to operate within real constraints. No perfect solution will exist; after all of your effort to put in place the best possible risk treatments and controls, there will be residual risk that the organization has by default chosen to accept. If you and your senior leaders have done your jobs well, that residual risk should be within the company’s risk tolerance. If it is not, that becomes the priority for the next round of risk mitigation planning!
Let's continue to peel back the onion of defense in depth, layer by layer, as we put information risk mitigation into action. We started with context and culture; now, we need to draw a key distinction between the organization's information architecture (how people share information to make decisions and carry them out) and the information technology architecture (the hardware, software, and communications tools) that supports that people-centric sharing of information and decisions.
Other chapters will look in greater technical and operational depth at specific layers of the information architecture or the technologies they depend on. In Chapter 5, “Communications and Network Security,” you’ll learn how the SSCP needs to address both the human and technological aspects of these important infrastructures. In Chapter 7, “Cryptography,” you’ll see how to apply and manage modern cryptographic techniques to almost every element of an information architecture. Chapter 8, “Hardware and Systems Security,” provides a closer look at systems security.
But before we get into the technological details, we first must map out the systems, processes, and information assets that are in use, right now, today, within our organization. All of those elements taken together are what we call the information architecture of the organization or business. Whether that architecture was well planned using the best design standards and templates, or it grew organically or haphazardly as users responded to changing needs and opportunities, is beside the point. The information architecture is what exists, and you as the SSCP must know it and understand it if you are to protect and preserve it. And if this statement holds for the information architecture, for that set of purposes, plans, ideas, and data, it holds doubly so for the underlying information technology architectures (note that many organizations don’t realize how many such architectures they really have!) that embody, support, and enable it.
The information architecture largely consists of the human and administrative processes that are the culture, context, process, and even the personality of the organization. You learned in Chapter 3 how vital it is to get this human-centric information architecture focused on the issue of information risk management. Now we need to consider how to take the results of that preparation activity, make them useful, and put them to use as we start developing risk mitigation plans.
No organization exists in a vacuum. It is a player in a marketplace; it is subject to written and unwritten norms and expectations that govern, shape, or constrain its actions and its choices. Laws and regulations also dictate what the organization can and cannot do, especially when it comes to how it keeps its information and decision systems safe, reliable, and secure. Laws and regulations may also require reporting or public disclosure of information, including information about information security incidents.
Organizational culture is the sum of all of the ways, written and unwritten, in which organizations make decisions and carry them out. Quite often, the organizational culture reflects the personalities and personal preferences of its founders, stakeholders, leaders, or key investors. Two key aspects of organizational culture that affect information security planning and operations are its willingness to accept or take risks and its need for control.
Being risk-averse or risk-tolerant is a measure of an appetite for risk, whether that risk is involved with trying something new or with dealing with vulnerabilities or threats. The higher the risk appetite, the more likely the organization’s decision makers are to accept risk or to accept higher levels of residual risk.
The need for control shows up in how organizations handle decision making. Hierarchically structured, top-down, tightly controlled organizations may insist that decisions be made “at the top” by senior leaders and managers, with rigidly enforced procedures dictating how each level of the organization carries out its parts of making those decisions happen. By contrast, many organizations rely on senior leaders to make strategic decisions, and then delegate authority and responsibility for tactical and operational decision making to those levels where it makes best sense. It is within the C-suite of officials (those with duty titles such as chief executive officer, chief financial or operations or human resources officer, or chief information officer) where critical decisions can and must be made if the organization is to attempt to manage information and information systems risk—let alone successfully mitigate those risks. The SSCP may advise those who advise the C-suite; more importantly, the SSCP will need to know what decisions were made and have some appreciation as to the logic, the criteria, and the assumptions that went into those decisions. Some of that may be documented in the BIA; some may not.
Let’s look at this topic by way of an example. Suppose you’re working for a manufacturing company that makes hydraulic actuators and mechanisms that other companies use to make their own products and systems. The company is organized along broad functional lines—manufacturing, sales and marketing, product development, purchasing, customer services, finance, and so on.
The company may be optimized along “just-in-time” lines so that purchasing doesn’t stockpile supplies and manufacturing doesn’t overproduce products in excess of reasonable customer demand forecasts. Nevertheless, asks the SSCP, should that mean that sales and marketing have information systems access to directly control how the assembly line equipment is manufacturing products today?
Let’s ask that question at the next level down—by looking at the information technologies that the organization depends on to make its products, sell them, and make a profit.
One approach might be that the company makes extensive use of computer-aided design and manufacturing systems and integrated planning and management tools to bring information together rapidly, accurately, and effectively. This approach can optimize day-to-day, near-term, and longer-term decision making, since it improves efficiency.
Another information architecture approach might rely more on departmental information systems that are not well integrated into an enterprise-level information architecture. In these situations, organizations must depend on their people as the “glueware” that binds the organization together.
Once again, the SSCP is confronted with needing insight and knowledge about what the organization does, how it does it, and why it does it that way—and yet much of that information is not written down. For many reasons, much of what organizations really do in the day-to-day of doing their business isn’t put into policies, procedures, or training manuals; it’s not built into the software that helps workers and managers get jobs done. This tacit, implied knowledge of who to go to and how to get things done can either make or break the SSCP’s information security plans and efforts. The SSCP is probably not going to be directly involved in what is sometimes called business process engineering, as the company tries to define (or redefine) its core processes. Nor will the SSCP necessarily become a knowledge engineer, who tries to get the tacit knowledge inside coworkers’ heads out and transform it into documents, procedures, or databases that others can use (and thereby transform it into explicit knowledge). It’s possible that the BIA provides the insights and details about major elements of the information technology architecture, in which case it provides a rich starting point to begin mitigation planning. Nonetheless, any efforts the company can make in these directions, to get what everybody knows actually written down into forms that are useful, survivable, and repeatable, will have a significant payoff. Such process maturity efforts can often provide the jumping-off point for innovation and growth. It’s also a lot easier to do process vulnerability assessments on explicit process knowledge than it is to do them when that knowledge resides only inside someone’s mind (or muscle memory).
With that “health warning” in mind, let’s take a closer look at what the organization uses to get its jobs done. In many respects, the SSCP will need to reverse engineer how the organization does what it does—which, come to think of it, is exactly what threat actors will do as they try to discover exploitable vulnerabilities that can lead to opportunities to further their objectives, and then work their way up to choosing specific attack tools and techniques.
The information technology architecture of an organization is more than just the sum of the computers, networks, and communications systems that the business owns, leases, or uses. The IT architecture is first and foremost a plan—a strategic and tactical plan—that defines how the organization will translate needs into capabilities; capabilities into hardware, software, systems, and data; and then manage how to deliver, support, and secure those systems and technologies. Without that plan—and without the commitment of senior leadership to keep that plan up to date and well supported—the collection of systems, networks, and data is perhaps little more than a hobby shop of individually good choices that more or less work together.
The development of an IT architecture (as a plan and as a system of systems) is beyond the scope of what SSCPs will need to know; there are, however, a few items in many IT architectures and environments that are worthy of special attention from the SSCP.
One good way of understanding what the organization’s real IT architecture is would be to do a special kind of inventory of all the hardware, software, data, and communications elements of that architecture, paying attention to how all those elements interact with one another and the business processes that they support. Such an information technology baseline provides the foundation for the information security baseline—in which the organization documents its information security risks, its chosen mitigation approaches, and its decisions about residual risk. The good news is that many software tools can help the SSCP discover, identify, and validate the overall shape and elements of the information technology baseline, and from that start to derive the skeleton of your information security baseline. The bad news? There will no doubt be lots of elements of both baselines that you will have to discover the old-fashioned way: walk around, look everywhere, talk with people, take notes, and ask questions.
Think back to the security baseline we defined in Chapter 3, which was a living repository of information asset classification, categorization, compliance, and controls information. Whether that baseline is expanded to include systems implementation and procedural details, as in the previous paragraph, or those details are kept separately but linked to the security baseline, is something that organizations should consider.
Key elements of the IT architecture that this baseline inventory should address would include:
Whether the SSCP is building the organization’s first-ever IT architecture baseline or updating a well-established one, the key behavior that leads to success is asking questions. What is it? Where is it? Why is it here (which asks, “How does it support which business process”)? Who is responsible for it? Who uses it? What does it connect to? When is it used? Who built it? Who maintains it? What happens when it breaks?
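One practical way to capture the answers to those questions is as a structured record per asset. The sketch below is only an illustration; the field names simply restate the questions above, and the example entry is invented, so adapt it to whatever asset-management or configuration-management tooling your organization already uses.

```python
# A minimal sketch of one entry in an IT architecture baseline inventory.
# Field names restate the questions in the text; values are invented examples.

from dataclasses import dataclass

@dataclass
class BaselineAsset:
    name: str                  # What is it?
    location: str              # Where is it?
    business_process: str      # Why is it here / which process does it support?
    owner: str                 # Who is responsible for it?
    users: list[str]           # Who uses it?
    connects_to: list[str]     # What does it connect to?
    maintainer: str            # Who built it / who maintains it?
    failure_impact: str        # What happens when it breaks?

orders_db = BaselineAsset(
    name="Customer orders database",
    location="Hosted server, primary data center",
    business_process="Order entry and fulfillment",
    owner="Director of e-commerce",
    users=["order entry web app", "fulfillment team"],
    connects_to=["web storefront", "warehouse management system"],
    maintainer="IT operations",
    failure_impact="Orders cannot be taken or shipped until restored",
)
print(orders_db.name, "->", orders_db.business_process)
```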
Let’s take a closer look at some of the special cases you may encounter as you build or update your organization’s IT architectural baseline.
Many organizations consider all of their IT assets, systems, software, and tools to be part of one large system, regardless of whether they are all plugged in together into one big system or network. Other organizations will have their IT systems reflect the way that work groups, departments, and divisions interact with one another. How the organization manages that system of systems often reflects organizational culture, decision-making styles, and control needs. Sadly, many organizational systems just grow organically, changing to meet the needs, whims, and preferences of individual users, departments, and stakeholders.
Let’s look at two classes of systems that might pose specific information risks:
Chapter 5 will address the key technical concepts and protocols involved with modern computer and communications networks. At this point, the key concept that SSCPs should keep in mind is that networks exist because they allow one computer, at one location, to deliver some kind of service to users at other locations. Those users may be people, software tasks running on other computers, or a combination of both people and software. Collectively, we refer to anything requesting a service from or access to an information asset as a subject; the service or asset they are requesting we call an object.
This idea or model of subjects requesting services is fundamental to all aspects of modern information technology systems—even standalone computers that support only a single person’s needs make use of this model. Once an organization is providing services over a network, the problem of knowing who is requesting what be done, when and how, and with what information assets, becomes quite complicated.
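In code terms, the model boils down to checking whether a given subject may perform a given action on a given object. Here is a deliberately bare-bones sketch; the subjects, objects, and permission table are invented for illustration, and real systems delegate this decision to the operating system, directory service, or application access controls.

```python
# A bare-bones sketch of the subject/object model: every request pairs a
# subject (person or process) with an object (service or information asset),
# and the system decides whether that pairing is allowed for that action.

PERMISSIONS = {
    ("payroll_clerk", "payroll_records"): {"read", "update"},
    ("backup_service", "payroll_records"): {"read"},
}

def is_allowed(subject: str, obj: str, action: str) -> bool:
    return action in PERMISSIONS.get((subject, obj), set())

print(is_allowed("payroll_clerk", "payroll_records", "update"))   # True
print(is_allowed("backup_service", "payroll_records", "update"))  # False
```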
End users need to get work done to satisfy the needs of the organization. End users rely on services that are provided and supported by the IT architecture; that architecture is made up of service providers, who rely on services provided by other levels of providers, and on it goes. (“It’s services all the way down,” you might say.) Over time, many different business models have been developed and put into practice to make the best use of money, time, people, and technology when meeting all of these service needs.
The IT architecture baseline needs to identify any external person, agency, company, or organization that fulfills a service provider role. Most of these external relationships should have written agreements in place that specify responsibilities, quality of service, costs, billing, and support for problem identification, investigation, and resolution. These agreements, whether by contract, memoranda of understanding, or other legal forms, should also lay out each party’s specific information security responsibilities. In short, every external provider’s CIA roles and responsibilities should be spelled out, in writing, so that the SSCP can include them in the security baseline, monitor their performance and delivery of services, and audit for successful implementation and compliance with those responsibilities.
"Doing it in the cloud" is the most recent revolution in information technology, and like many such revolutions, it's about ideas and approaches as much as it is about technologies and choices. At the risk of oversimplifying this important and complex topic just now, let's consider that everything we do with information technology is about getting services performed. Furthermore, all services involve using software (which runs on some hardware, somewhere) to make other software (running on some other hardware, perhaps) do what we need done and give us back the results we need. We'll call this the service provision model, and it is at the heart of how everything we are accustomed to when we use the Web actually works. Let's operationalize that model to see how it separates what the end user cares about from the mere details of making the service happen.
Users care most about the CIA aspects of the service, seen strictly from their own point of view:
By contrast, the service provider has to care about CIA from both its users’ perspective and its own internal needs:
This brings us to consider the “as-a-service” set of buzzwords, which we can think of in increasing order of how much business logic they implement or provide to the organization:
We’ll examine this topic in more detail in Chapter 9, “Applications, Data, and Cloud Security”; for right now, it’s important to remember that ultimately, the responsibilities of due care and due diligence always remain with the owners, managing directors, or chief executives of the organization or business. For the purposes of building and updating the IT architecture baseline, it’s important to be able to identify and specify where, when, how, and by whom these buzzwords are implemented in existing service provider relationships.
With the baseline in hand, the SSCP is ready to start looking at vulnerability assessments.
We’ve looked at how badly it will hurt when things go wrong; now, let’s look at how things go wrong.
The IT architecture baseline links IT systems elements to business processes; the planning we’ve done so far then links key business processes to prioritized business goals and objectives. That linking of priorities to architectural elements helps the SSCP focus on which information assets need to be looked at first to discover or infer what possible vulnerabilities may be lurking inside. It’s time for you as the SSCP to use your technical insight and security savvy to look at systems and ask, “How do these things usually fail? And then what happens?”
“How do things fail?” should be asked at two levels: how does the business process fail, and how does the underlying IT element or information asset fail to support the business process, and thus cause that business process to fail?
This phase of risk mitigation is very much like living the part of a detective in a whodunit. The SSCP will need to interview people who operate the business process, as well as the people who provide input to it and depend on its outputs to do their own jobs. Examining any trouble reports (such as IT help ticket logs) may be revealing. Customer service records may also start to show some patterns or relationships—broken or failing processes often generate customer problems, and out-of-the-ordinary customer service needs often stress processes to the breaking point.
It’s all about finding the cause-and-effect logic underneath the “what could go wrong” parts of our systems. Recall from Chapter 3 our discussion of proximate cause and root cause:
Both are valuable ideas to keep in mind as we look through our systems for “Where can it break and why?” While you’re doing that, it’s also important to keep asking “how would we know this has failed?” In this way you combine your search for possible vulnerabilities with identifying candidate indicators of compromise (IOCs).
Think about the chain of events that can lead from proximate to root cause—from the first stone shaken loose on the mountaintop to the avalanche it triggers. Each of those events in a failure of your systems gives off signals; some of those signals may be clear enough to be useful as warning flags of an imminent or ongoing compromise, intrusion, attack, or failure of a security system. These are the IOCs you need to pay prompt attention to. These become the alarms that trigger your response systems and people into action.
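As a simple illustration of turning IOCs into alarms, the sketch below scans a handful of event records for known indicator patterns and flags any matches for prompt attention. The indicator names and events are invented, and a real deployment would rely on your SIEM or log analysis tooling rather than hand-rolled code.

```python
# A toy sketch of matching event records against candidate indicators of
# compromise (IOCs) and raising alarms. All names here are invented examples.

IOCS = {
    "logins_from_new_country": "possible credential theft",
    "outbound_transfer_after_hours": "possible data exfiltration",
}

def check_events(events):
    alarms = []
    for event in events:
        for ioc, meaning in IOCS.items():
            if ioc in event["tags"]:
                alarms.append((event["source"], ioc, meaning))
    return alarms

events = [
    {"source": "vpn-gateway", "tags": ["logins_from_new_country"]},
    {"source": "file-server", "tags": ["routine_backup"]},
]
for source, ioc, meaning in check_events(events):
    print(f"ALERT {source}: {ioc} ({meaning})")
```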
In many respects, vulnerability assessment is looking at the as-built set of systems, processes, and data and discovering where and how quality design was not built into them in the first place! As a result, a lot of the same tools and processes we can use to verify correct design and implementation can help us identify possible vulnerabilities:
Even the company suggestion box should be examined for possible signs that particular business processes don’t quite work right or are in need of help.
That’s a lot of information sources to consider. You can see why the SSCP needs to use prioritized business processes as the starting point. A good understanding of the information architecture and the IT architectures it depends on may reveal some critical paths—sets of processes, software tools, or data elements that support many high-priority business processes. The software, procedural, data, administrative, and physical assets that are on those critical paths are excellent places to look more deeply for evidence of possible vulnerabilities.
As you saw in Chapter 2, you and your organization are not alone in the effort to keep your information systems safe, secure, resilient, and reliable. There are any number of communities of practice with which you can share experience, insight, and knowledge:
https://cybersecurity.ieee.org/center-for-secure-design/
for ideas and information that might help your business or organization. You also have resources such as Mitre's Common Vulnerabilities and Exposures (CVE) system and NIST's National Vulnerability Database that you can draw upon as you assess the vulnerabilities in your organization's systems and processes. Many of these make use of the Common Vulnerability Scoring System (CVSS), which is an open industry standard for assessing a wide variety of vulnerabilities in information and communications systems. CVSS makes use of the CIA triad of security needs (introduced in Chapter 1, "The Business Case for Decision Assurance and Information Security") by providing guidelines for making quantitative assessments of a particular vulnerability's overall score. Scores run from 0 to 10, with 10 being the most severe of the CVSS scores. Although the details are beyond the scope of the SSCP exam, it's good to be familiar with the approach CVSS uses—you may find it useful in planning and conducting your own vulnerability assessments.
As you can see at https://nvd.nist.gov/vuln-metrics/cvss, CVSS consists of three areas of concern:
Each of these uses a simple scoring process—impact assessment, for example, defines four values from Low to High (and "not applicable or not defined"). Using CVSS is largely a matter of making these assessments and then combining the resulting values according to its published scoring equations.
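To make that concrete, here is a rough sketch of a CVSS v3.1 base-score calculation for the simplest case (Scope: Unchanged), using the metric weights and equations published in the open CVSS v3.1 specification. Treat it purely as an illustration, and rely on the current specification or the NVD calculator for real assessments.

```python
# Sketch of a CVSS v3.1 base score, Scope: Unchanged only, per the public spec.

import math

AV = {"network": 0.85, "adjacent": 0.62, "local": 0.55, "physical": 0.2}
AC = {"low": 0.77, "high": 0.44}
PR = {"none": 0.85, "low": 0.62, "high": 0.27}   # values for unchanged scope
UI = {"none": 0.85, "required": 0.62}
CIA = {"high": 0.56, "low": 0.22, "none": 0.0}

def roundup(x):
    # CVSS "roundup": ceiling to one decimal place
    return math.ceil(x * 10) / 10

def base_score(av, ac, pr, ui, c, i, a):
    iss = 1 - (1 - CIA[c]) * (1 - CIA[i]) * (1 - CIA[a])
    impact = 6.42 * iss
    exploitability = 8.22 * AV[av] * AC[ac] * PR[pr] * UI[ui]
    return 0.0 if impact <= 0 else roundup(min(impact + exploitability, 10))

# Example: network-reachable, low-complexity flaw needing no privileges or user
# interaction, with a high confidentiality impact only.
print(base_score("network", "low", "none", "none", "high", "none", "none"))  # 7.5
```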
Note that during reconnaissance, hostile threat actors use CVE and CVSS information to help them find, characterize, and then plan their attacks. The benefits we gain as a community of practice by sharing such information outweigh the risks that threat actors can be successful in exploiting it against our systems if we do the rest of our jobs with due care and due diligence.
As you can see, the risk and vulnerability assessment process is an example of knowledge engineering or knowledge discovery in action: you and your associates are mining every source of information available to you to learn how your IT and OT systems can be broken, and what kind of loss or impact might result. Many organizations use a risk register as a central repository of all of this data, information, and knowledge. It is (or should be) a living repository, not a static document that's produced as a report and then left to go out of date. Risk registers can take on many forms, and do need to be tailored to meet the organization's needs. More powerful security systems such as SIEM (security information and event management) and SOAR (security orchestration and automated response), as well as managed security services, usually provide the capabilities you need to build a risk register to suit your purposes.
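What a risk register entry holds will vary by organization and by tool, but a minimal sketch might look like the following; every field name and value here is illustrative only.

```python
# A minimal sketch of one risk register entry. Real registers (spreadsheets,
# GRC tools, SIEM/SOAR add-ons) will carry more fields than shown here.

from dataclasses import dataclass

@dataclass
class RiskRegisterEntry:
    risk_id: str
    description: str
    affected_process: str
    likelihood: str          # qualitative here: low / medium / high
    impact: str              # qualitative here: low / medium / high
    treatment: str           # accept, transfer, remediate, avoid...
    owner: str
    review_date: str

entry = RiskRegisterEntry(
    risk_id="R-0042",
    description="Unpatched remote-access gateway exposed to the Internet",
    affected_process="Remote access for field sales staff",
    likelihood="high",
    impact="high",
    treatment="remediate: apply vendor patch; add monitoring rule",
    owner="Network operations manager",
    review_date="next quarterly risk review",
)
print(entry.risk_id, "-", entry.treatment)
```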
If you picture a diagram of your information architecture (or IT architecture), you’ll notice that you probably can draw boundaries around groups of functions based on the levels of trust you must require all people and processes to have in order to cross that boundary and interact with the components inside that space. The finance office, for example, handles all employee payroll information, company accounting, and accounts payable and receivable, and would no doubt be the place you’d expect to have access to the company’s banking information. That imaginary line that separates “here there be finance office functions” from the larger world is the threat surface—a boundary that threats (natural, accidental, or deliberate) have to cross in order to access the “finance-private” information inside the threat surface. The threat surface is the sum total of all the ways that a threat can cross the boundary:
You see the dilemma here: authorized users and uses cross the threat surface all the time, and in fact, you cannot achieve the “A” in CIA without providing that right of way. Yet the threat actors need to be detected when they try to cross the threat surface and prevented from getting across it—and if prevention fails, you need to limit how much damage they can do.
Threat modeling is the broad, general term given to the art and science of looking at systems and business processes in this way. It brings a few thoughts into harmony with one another in ways that the SSCP should be aware of. First, it encourages you to encapsulate complex functions inside a particular domain, boundary, or threat surface. In doing so, it also dictates that you look to minimize ways that anything can cross a threat surface. It then focuses your attention on how you can detect attempts to cross, validate the ones that are authenticated and authorized, and prevent the ones that aren't. Threat modeling also encourages you to account for such attempts and to use that accounting data (all of those log files and alarms!) both in real-time alert notification and incident response, and as a source of analytical insight.
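A simple way to operationalize that thinking is to enumerate every crossing point on a threat surface and check that each one is authenticated and logged, flagging any that are not for review. The crossing points in this sketch are invented examples.

```python
# A toy sketch of threat modeling a boundary: list each way across the threat
# surface and verify that each crossing is authenticated and logged.

finance_office_surface = [
    {"crossing": "badge-controlled office door", "authenticated": True,  "logged": True},
    {"crossing": "accounts-payable web portal",  "authenticated": True,  "logged": True},
    {"crossing": "shared printer on open VLAN",  "authenticated": False, "logged": False},
]

for point in finance_office_surface:
    gaps = [need for need in ("authenticated", "logged") if not point[need]]
    if gaps:
        print(f"Review needed: {point['crossing']} is missing {', '.join(gaps)}")
```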
As you grow as an SSCP, you’ll need to become increasingly proficient in seeing things at the threat surface.
“Trust, but verify” applies to the human element of your organization’s information processes too! You need to remember that every organization, large or small, can fall afoul of the disgruntled employee, the less-than-honorable vendor or services provider, or even the well-intended know-it-all on its staff who thinks that they don’t need to follow all of those processes and procedures that the rest of the team needs. The details of how such a personnel reliability program should be set up and operated are beyond the scope of the SSCP exam or this book. Part of this is what information security practitioners call the “identity and access control problem,” and Chapter 6 will delve into this in greater depth. From a vulnerability assessment perspective, a few key points are worth highlighting now.
The information security impact assessment is the starting point (as it is for all vulnerability assessments). It should drive the design of jobs so that users do not have capabilities or access to information beyond what they really need to have and use. In doing so, it also indicates the trustworthiness required for each of those jobs; a scheduling clerk, for example, would not have access to company proprietary design information or customer financial data, and so may not need to be as trustworthy as the firm’s intellectual property lawyers or its accountants. With the job defining the need for capabilities and information, the processes designed for each job should have features that enforce these constraints and notify information security officials when attempts to breach those constraints occur. The log files, alerts, and alarms or other outputs that capture these violations must be inspected, analyzed, and assessed in ways that give timely opportunity for a potential security breach (deliberate or accidental) to be identified and corrected before it is harmfully exploited.
Beyond (but hand in hand with) separation of duties, the business process owners and designers must ensure that no task is asking more of any system component—especially the human one—than it can actually be successful with. As with any computer-based part of your business logic, tasks that systems designers allocate to humans to perform must be something humans can learn how to do. The preconditions for the task, including the human's training, prior knowledge, and experience, must be identified and achievable. Any required tools (be they hammers or database queries) must be available; business logic for handling exceptions, out-of-limits conditions, or special needs has to be defined and people trained in its use. Finally, the saying "if you can't measure it, you can't manage it" applies to assessing the reliability of the human component as much as it does to the software, systems, and other components of a business process.
This combination of ingredients—separation of duties, proper task design, meaningful performance monitoring and assessment, and ongoing monitoring to detect errors or security concerns—reduces the risks that an employee is overstressed, feels that they are undervalued, or is capable of taking hostile action if motivated to do so.
Chapter 9 will address other aspects of how the human resources the organization depends on can be more active and effective elements in keeping the organization’s information safe, secure, and resilient.
Even the most well-designed information system will have gaps—places where the functions performed by one element of the system do not quite meet the expectations or needs of the next element in line in a process chain. When we consider just how many varied requirements we place on modern IT systems, it's no wonder that such gaps creep in! In general terms, gap analysis is a structured, organized way to find these gaps. In the context of information systems security, you do gap analysis as part of vulnerability assessment.
Several different kinds of activities can generate data and insight that feed into a gap analysis:
This last brings up an interesting point about the human element: as any espionage agency knows, it’s quite often the lowest-level employees in the target organization who possess the most valuable insight into its vulnerabilities. Ask the janitors, or the buildings and grounds maintenance staff; talk with the cafeteria workers or other support staff who would have no official duties directly involved in the systems you’re doing the gap analysis for. Who knows what you may find out?
A strong word of caution is called for: the results of your gap analysis could be the most sensitive information that exists in the company! Taken together, it is a blueprint for attack—it makes targets of opportunity easily visible and may even provide a step-by-step pathway through your defenses. You’d be well advised to gain leadership’s and management’s agreement to the confidentiality, integrity, and availability needs of the gap analysis findings before you have to protect them.
We’ve mentioned before that the SSCP needs to help the organization find cost-effective solutions to its risk mitigation needs. Here’s where that happens. Let’s look at our terms more closely first.
Risk treatment involves all aspects of taking an identified risk and applying a set of chosen methods to eliminate or reduce the likelihood of its occurrence, the impacts it has on the organization when (not if) it occurs, or both. Note that we say “eliminate or reduce,” both for probability of occurrence and for the impact aspects of a given risk. The set of methods taken together constitute the risk controls that we are applying to that particular risk.
Unfortunately, the language about dealing with risks is not very precise. Many different books, official publications, and even widely accepted risk management frameworks like NIST SP 800-37 can leave some confusion. Let’s see if some simple language can help un-muddy these waters:
Chapter 3 introduced the need for risk managers to decide whether a particular risk should be deterred, prevented, detected, avoided, or treated. That’s not a one-time decision made only at strategic levels. As risk assessment looks more closely at each risk, those same types of decisions get made again.
When thinking about risk treatment at a more tactical or operational level, security and risk professionals often talk about whether to accept, transfer (or share), mitigate (or treat), and avoid or eliminate. These similar lists of terms, used at different levels of decision making, highlight that the same concept (deterring a risk, for example) may be a strategic choice first, followed later on by more granular choices about how deterrence as a security control approach is put to use to achieve that objective.
With that in mind, let’s take a closer look at the broad categories of risk treatment strategies, tactics, and techniques.
This risk treatment strategy means that you simply decide to do nothing about the risk. You recognize it is there, but you make a conscious decision to do nothing differently to reduce the likelihood of occurrence or the prospects of negative impact. This is known as being self-insuring—you assume that what you save on paying risk treatment costs (or insurance premiums) will exceed the annual loss expectancy over the number of years you choose to self-insure or accept this risk.
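A quick, invented numerical illustration shows the self-insurance logic: compare the annualized loss expectancy (the standard formulation of single loss expectancy times annualized rate of occurrence) against the yearly cost of treating the risk.

```python
# An illustrative self-insurance comparison using invented numbers.
# Annualized loss expectancy (ALE) = single loss expectancy (SLE) x
# annualized rate of occurrence (ARO).

sle = 40_000          # estimated cost of one occurrence of the loss
aro = 0.1             # expected occurrences per year (one every ten years)
ale = sle * aro       # 4,000 per year of expected loss if we do nothing

annual_control_cost = 15_000   # yearly cost of the proposed risk treatment

if annual_control_cost > ale:
    print(f"Accepting the risk saves {annual_control_cost - ale:,.0f} per year on average")
else:
    print(f"Treating the risk is cheaper by {ale - annual_control_cost:,.0f} per year on average")
```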
The vast majority of vulnerabilities in the business processes and context of a typical organization involve negligible damages, very low probabilities of occurrence, or both. As a result, it’s just not prudent to spend money, time, and effort to do anything about such risks. In some cases, however, the vulnerabilities can be extensive and the potential loss significant, even catastrophic, to the organization, but the costs involved to deal with the risk by means of mitigation or transfer are simply unachievable.
Another, more practical example can be found in many international business situations. Suppose your company chooses to open wholesale supply operations in an area where the telecommunications and transportation infrastructures can be unreliable. When these infrastructures deliver the services you need, your organization makes a profit and earns political and community support as nontangible rewards. That reliable delivery doesn’t happen all of the time, however. You simply cannot spend the money to install and operate your own alternative infrastructures. Even if you could afford to do it, you might risk alienating the local infrastructure operators and the larger political community, and you need all the goodwill from these people that you can get! As a result, you just decide to accept the risk.
Note that accepting a risk is not taking a gamble or betting that the risks won’t ever materialize. That would be ignoring the risk. A simple example of this is the risk of having your business (or your hometown!) completely destroyed by a meteor falling in from outer space. We know it could happen; we’ve even had some spectacular near misses in recent years, such as what happened over Chelyabinsk, Russia in February 2013. The vast majority of us simply choose to ignore this risk, believing it to be of vanishingly small probability of occurrence. We do not gather any data; we do not estimate probabilities or losses; we don’t even make a qualitative assessment about it. We simply ignore it, relegate it to the realm of big-box-office science fiction thrillers, and go on with our lives with nary another thought about it.
Proper risk acceptance is an informed decision by organizational leaders and stakeholders.
Transferring or sharing a risk means that rather than spend our own money, time, and effort to reduce, contain, or eliminate the risk, we assign responsibility for some or all of it to someone else. For example:
Other ways of transferring risk might involve taking the process itself (the one that could incur the risk) and transferring it to others to perform as a service. Pizza tonight? Carry-out pizza incurs the risk that you might get into an accident while driving to or from the pizza parlor, but having the pizza delivered transfers that risk of accident (and injury or damage) to the pizza delivery service.
In almost all cases, transferring a risk is about transforming the risk into something somebody else can deal with for you. You save the money, time, and effort you might have spent to treat the risk yourself and instead pay others to assume the risk and deal with it.
There is a real moral hazard in some forms of risk transference, and the SSCP should be on alert for these. Suppose your company says that it doesn't need to spend a lot of money dealing with information security, because it has a really effective liability insurance plan that covers it against losses. If thousands (or millions!) of customers' personally identifying information is stolen by a hacker, this insurance policy may very well pay for losses that the company incurs; the customers would need to sue the company or otherwise file a claim against it to recover their direct losses from having their identity compromised or stolen. The insurance may pay all of those claims or only a portion of them, but only after each customer discovers the extent of the damages they've suffered, goes through the turmoil, effort, and expense of repairing those losses, and then files a claim with the company. Perhaps the better, more ethical (and usually far less costly!) solution would have been to find and fix the vulnerabilities that could be exploited in ways that lead to such a data breach in the first place.
Simply put, this means that we find and fix the vulnerabilities to the best degree that we can; failing that, we put in place other processes that shield, protect, augment, or bridge around the vulnerabilities. Most of the time this is remedial action—we are repairing something that either wore out during normal use or was not designed and built to be used the way we’ve been using it. We are applying a remedy, a cure, either total or partial, for something that went wrong.
Do not confuse taking remedial action to mitigate or treat a risk with making the repairs to a failed system itself. Mitigating the risk is something you aim to do before a failure occurs, not after! Such remediation measures might therefore include the following:
Some vulnerabilities are best mitigated or treated by applying the right corrective fix—for example, by updating a software package to the latest revision level so that you are reasonably assured that it now has all the right security features and fixes included in it. Providing uninterruptible power supplies or power conditioning equipment may eliminate or greatly reduce the intermittent outages that plague some network, communications, and computing systems. The first (applying the software update) might be directly treating the vulnerability (by replacing a faulty algorithm with a more robustly designed one); providing power conditioning equipment is making up for shortcomings in the quality and reliability of the commercial power system and is a good example of bridging around or augmenting a known weakness.
The logical opposite of accepting a risk is to make the informed decision to stop doing business in ways or in places that expose you to that risk. Closing a store in a neighborhood with a high crime rate eliminates the exposure to risk (a store you no longer operate cannot be robbed, and your staff who no longer work there are no longer at risk of physical assault during such a robbery).
You avoid a risk either by eliminating the activity that incurs the risk or moving the affected assets or processes to locations or facilities where they are not exposed to the risk. Suppose you work for a small manufacturing company in which the factory floor has some processing steps that could cause fire, toxic smoke, and so forth to spread rapidly through the building. The finance office probably does not need to be in this building—avoid the risks to your accountants, and avoid the possible financial disruption of your business, by moving those functions and those people to another building. Yet the safety systems that are part of your manufacturing facility probably can’t be moved away from the equipment they monitor and the people they protect; at some point, the business may have to decide that the risk of injury, death, destruction, and litigation just aren’t worth the profits from the business in the long run.
This term refers to the never-ending effort to identify risks, characterize them, select the most important ones to mitigate, and then deal with what’s left. As we’ve said before, most risk treatments won’t deal with 100% of a given risk; there will be some residual risk left over. Recasting the risk usually requires that first you clearly state what the new residual risk is, making it more clearly address what still needs to be dealt with. From the standpoint of the BIA, the original risk has been reduced—its nature, frequency, impact, and severity have been recast or need to be described anew so that future cycles of risk management and mitigation can take the new version of the risk into consideration.
This has been defined as the risk that’s left over, unmitigated, after you have applied a selected risk treatment or control. Let’s look at this more closely via the following example.
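Since numbers make the idea clearer, here is an invented illustration, not drawn from any real organization: suppose a risk carries an annualized loss expectancy of $250,000 and the chosen control is expected to eliminate about 80 percent of that exposure.

```python
# An invented worked example of residual risk: the portion of the original
# annualized loss expectancy that remains after a control takes effect.

ale_before = 250_000        # expected yearly loss with no treatment applied
control_effectiveness = 0.8 # fraction of that loss the chosen control removes

ale_after = ale_before * (1 - control_effectiveness)
print(f"Residual annualized loss expectancy: {ale_after:,.0f}")  # 50,000

# If 50,000 per year is still above the organization's risk tolerance, that
# residual risk becomes a priority for the next round of mitigation planning.
```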
Once again, you see the trio of physical, logical (also called technical), and administrative (PLA) actions as possible controls you can apply to a given risk or set of risks. These controls are often put to best use in combinations that reflect some fundamental security architectural concepts: least privilege, need to know, and separation of duties being among the most effective and most commonly used approaches. You'll see in Chapter 6 that this same set of concepts has important roles to play as you strive to ensure that only authenticated users are authorized to take actions with your information systems. In that respect, a physical access control, such as a locked door that requires multifactor authentication to be verified before permitting entry, is also a physical risk control.
Physical controls are combinations of hardware, software, electrical, and electronic mechanisms that, taken together, prevent, delay, or deter somebody or something from physically crossing the threat surface around a set of system components you need to protect. They do this by guiding, directing, and controlling the movement of physical items (people, machines, vehicles, containers, or property) both across the outermost perimeter of a facility or location and within it. Large-scale architectural features, such as the design of buildings, their location in an overall facility, surrounding roads, driveways, fences, perimeter lighting, and so forth, are visible, real, and largely static elements of physical control systems. You must also consider where within the building to put high-value assets, such as server rooms, wiring closets, network and communication provider points of presence, routers and Wi-Fi hotspots, library and file rooms, and so on. Layers of physical control barriers, suitably equipped with detection and control systems, can both detect unauthorized access attempts and block their further progress into your safe spaces within the threat surface.
Network and communications wiring, cables, and fibers are also physical system components that need some degree of physical protection. Some organizations require them to be run through steel pipes that are installed in such a way as to make it impractical or nearly impossible to uncouple a section of pipe to surreptitiously tap into the cables or fibers. Segmenting communications, network, and even power distribution systems also provides a physical degree of isolation and redundancy, which may be important to an organization’s CIANA+PS needs.
Note the important link here to other kinds of controls. Physical locks require physical keys, or actuators that are controlled by information systems; multifactor authentication requires logical and physical systems; both require “people power” to create and then run the policies and procedures (the administrative controls) that glue it all together, and keep all of the parts safe, secure, and yet available when needed.
Here is where you use software and the parameter files or databases that direct that software to implement and enforce policies and procedures that you've administratively decided are important and necessary. It can be a bit confusing that a "policy" can be both a human-facing set of rules, guidelines, and instructions and a set of software features and their control settings. Many modern operating systems, and identity-as-a-service provisioning systems, refer to these internal implementations of rules and features as policy objects, for example. So we write our administrative "acceptable use" policy document and use it to train our users so that they know what is proper and what is not; our systems administrators then "teach" it to the operating system by setting parameters and invoking features that implement the software side of that human-facing policy.
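To see how a human-facing policy becomes a machine-enforced one, consider this minimal, purely illustrative sketch in Python. The field names, thresholds, and check function are our own assumptions for the example; they are not any particular operating system's or identity service's policy-object format.

```python
from dataclasses import dataclass

@dataclass
class PasswordPolicy:
    """Machine-readable settings derived from the written acceptable-use policy."""
    min_length: int = 12
    require_mixed_case: bool = True
    require_digit: bool = True
    max_failed_logons: int = 5   # lockout threshold the administrators configure

def complies(candidate: str, policy: PasswordPolicy) -> bool:
    """Return True if a proposed password satisfies the administrative policy."""
    if len(candidate) < policy.min_length:
        return False
    if policy.require_mixed_case and not (any(c.islower() for c in candidate)
                                          and any(c.isupper() for c in candidate)):
        return False
    if policy.require_digit and not any(c.isdigit() for c in candidate):
        return False
    return True

# The written policy says "use strong passwords"; the logical control enforces it.
print(complies("Winter2025rains", PasswordPolicy()))   # True
print(complies("short", PasswordPolicy()))              # False
```

The written document and the policy object must stay in step: when leadership changes the administrative rule, someone has to change the settings that enforce it.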
In general terms, anything that human organizations write, state, say, or imply that dictates how the humans in that organization should do business (and also what they should not do) can be considered an administrative control. Policy documents, procedures, process instructions, training materials, and many other forms of information all are intended to guide, inform, shape, and control the way that people act on the job (and to some extent, too, how they behave off the job).
Administrative controls are typically the easiest to create—but because they often require the sign-off of very senior leadership, they can, ironically, be the most difficult to update in some organizational cultures. It usually takes a strong sense of the underlying business logic to create good administrative controls.
Administrative controls can cover a wide range of intentions, from informing people about news and useful information, to offering advice, and from defining the recommended process or procedure to dictating the one accepted way of doing a task or achieving an objective.
For any particular risk mitigation need, an organization may face a bewildering variety of competing alternative solutions, methods, and choices. Do we build the new software fix in house or get a vendor to provide it? Is there a turn-key hardware/software system that will address a lot of our needs, or are we better off doing it internally one risk at a time? What’s the right mix of physical, logical, and administrative controls to apply?
It’s beyond the scope of this book, and the SSCP exam, to get into the fine-grain detail of how to compare and contrast different risk mitigation control technologies, produces, systems, or approaches. The technologies, too, are constantly changing. As you gain more experience as an SSCP, you’ll have the opportunity to become more involved in specifying, selecting, and implementing risk mitigation controls.
Controls, also called countermeasures, are the active steps we take to put technologies, features, and procedures in place to help prevent a vulnerability from being exploited and causing a harmful or disruptive impact. Controls can perform one or more functions, which we’ll express in their adjective form (such as reactive). The most common functions needed for controls include:
One further functional control type, compensating controls, is actually a set of three different types of functions, each of which in some way assists the actions of other controls in mitigating the potential impact of a vulnerability (or set of vulnerabilities) or recovering from those impacts. Generally, compensating controls are used:
We must remember that with each new control we install or each new countermeasure we adopt, we must also make it part of the command, control, and communications capabilities of our integrated information security and assurance systems. For example:
In many organizations, a spiral development process is used to manage risk mitigation efforts. A few high-priority risks are identified, and the systems that support them are examined for underlying vulnerabilities. Suitable risk mitigation controls are chosen and implemented; they are tested to ensure proper operation and correct results. End users are trained about the presence, purpose, and use of these controls, and the controls are declared operational. Then the next set of prioritized risks, and perhaps residual risks from the first set, is addressed in much the same way.
Note that even in this spiral or cyclic fashion, there really is a risk mitigation implementation plan! It may only exist as an agreed-to schedule by which the various builds or releases of risk mitigation controls will be specified, installed, tested, and made operational. The SSCP assists management by working to ensure that each increment of risk mitigation (each set of mitigation controls being installed, tested, and delivered to operational use) is logically consistent, that each control is installed correctly, and that users and security personnel know what to expect from it.
As with any implementation project, the choice to implement a particular set of risk mitigation controls should carry with it the documented need it is fulfilling. What is this new control required to actually do once we start using it? This statement of functional requirements forms the basis for verification and validation of our implementation, and it is also a basis for ongoing system security monitoring and assessment. The risk mitigation implementation plan should address these issues.
The implementation plan should also show how you’ll engage with the routine configuration management and change control processes that are used in the business. In many businesses and organizations, policies direct that changes to business processes, operational software, or security systems have to be formally requested and then reviewed by the right set of experts, who then recommend to a formal change control board that the request be approved. Configuration management board approval usually includes the implementation plan and schedule so that this change can be coordinated with other planned activities throughout the organization.
This step includes all activities to get the controls into day-to-day routine operational use. User training and awareness needs identified in the implementation plan must be met; users, security personnel, and the rest of the IT staff must be aware of the changes and how to deal with anything that seems strange in the “new normal” that the new controls bring with them. In most organizations, some level of senior leadership or management approval may be required to declare that the new controls are now part of the regular operational ways of doing business.
Detailed implementation of specific controls will be covered in subsequent chapters. For example, Chapter 5 will go into greater depth about technologies and techniques to use when securing voice, video, and public and internal social media, as well as how physical and logical segmentation of networks and systems should be achieved.
Keep in mind that “control” is just the middle element of command, control, and communications. The control devices or procedural elements have to communicate with the rest of the system so that we know what is going on. Some types of data that must be shared include but are not limited to:
All of those types of control data must be exchanged between systems elements, if the system is to accomplish its assigned tasks. Even systems that are purely people-powered exchange information as part of the protocols that bring those people together to form a team. (Think about a baseball game: the catcher signals to the pitcher, but the runner on second is trying to see the signals too, to see if now’s the time to attempt to steal third base.)
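As a purely illustrative sketch (the field names and states here are our own assumptions, not any standard message format), one element's status or "heartbeat" report to a central monitoring point might be packaged something like this:

```python
import json
import time

def build_heartbeat(element_id: str, state: str, alarms: list[str]) -> str:
    """Package one element's status, state, and health for the monitoring system."""
    message = {
        "element": element_id,      # which sensor, server, or control is reporting
        "timestamp": time.time(),   # when the report was generated
        "state": state,             # e.g., "operational", "degraded", "failed"
        "alarms": alarms,           # any alert-worthy conditions detected locally
    }
    return json.dumps(message)

# A door controller reports normal operation; an edge firewall reports trouble.
print(build_heartbeat("door-ctrl-07", "operational", []))
print(build_heartbeat("fw-edge-01", "degraded", ["rule update failed"]))
```

Whatever the actual format, the point is the same: if an element cannot report its state and health, the rest of the system is flying blind with respect to it.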
Recall that command is the process of deciding what to do and issuing directives or orders to get it done; control, on the other hand, takes commands and breaks them down into the step-by-step directions to work units, while it monitors those work units for their performance of the assigned task. All systems have some kind of command and control function, and the OODA loop model presented earlier in this chapter provides a great mental model of such control systems. Most human-built systems exist to get specific jobs done or needs met, but those systems also have to have internal control processes that keep the system operating smoothly, set off alarms when it cannot be operated safely, or initiate corrective actions if they can. We can think of command and control of systems as happening at three levels of abstraction: getting the job done, keeping the system working effectively, and keeping it safe from outside corruption, damage, or attack.
Industrial control systems give us a great opportunity to see the importance of effective command, control, and communications in action at the first two levels. Most industrial machinery is potentially dangerous to be around—if it moves the wrong way at the wrong time, things can get broken and people can be killed. Industrial control system designers and builders have wrestled with this problem for almost three centuries. Consider the systems that control an oil refinery or an electric power generating station: command systems translate current inputs (such as demands for electricity and price bids for its wholesale purchase) into production or systems throughput goals, and then further refine those into device-by-device, step-by-step manipulation of elements of the overall system. Most of the time, this is done by exchanging packets of parameter settings rather than specific device commands (such as "increase temperature to 450 degrees F" rather than "open up the gas valve some more"). Other control loops keep the system and its various subsystems operating within well-understood safety constraints. Supervisory Control and Data Acquisition (SCADA) systems are a special class of network and systems devices, data-sharing protocols, and command and control protocols used throughout the world for industrial process control. Much of this marketplace is dominated by special-purpose computers known as programmable logic controllers (PLCs), although Internet of Things devices and systems are becoming more commonplace in industrial control environments. NIST Special Publication 800-82 Rev. 2, Guide to Industrial Control Systems (ICS) Security, is an excellent starting point for SSCPs who need to know more about ICS security challenges and how they relate to information system risk management concepts in broader terms. It also maps ICS and SCADA security needs onto the security controls catalog of NIST SP 800-53 Rev. 4.
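The following sketch is only a thought experiment in Python, not real PLC or SCADA code; the setpoint, safety limit, and adjustment rule are assumptions chosen to show the idea of supervisory parameter-setting with a local safety constraint:

```python
# Supervisory layer sends a parameter (a temperature setpoint); the local control
# loop decides how to move the valve, bounded by an assumed engineering limit.

SAFETY_LIMIT_F = 500.0   # assumed safety constraint for this process unit

def apply_setpoint(requested_f: float, current_f: float, valve_position: float):
    """Clamp the requested setpoint to the safety limit, then nudge the valve."""
    setpoint = min(requested_f, SAFETY_LIMIT_F)
    # Simple proportional adjustment: open the valve if below setpoint, close it
    # if above. Real PLC control logic is far more sophisticated than this.
    error = setpoint - current_f
    valve_position = max(0.0, min(1.0, valve_position + 0.01 * error))
    return setpoint, valve_position

setpoint, valve = apply_setpoint(requested_f=450.0, current_f=430.0, valve_position=0.40)
print(f"setpoint={setpoint} valve={valve:.2f}")   # setpoint=450.0 valve=0.60
```

Notice that the supervisory message carries intent ("hold 450 degrees"), while the device-level details stay local—exactly the separation the paragraph above describes.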
Since the early 1990s, however, more and more industrial equipment operators and public utility organizations have had to deal with a third kind of command, control, and communications need: the need to keep their systems safe when faced with deliberate attacks directed at their SCADA or other command, control, and communications systems. It had become painfully clear that the vast majority of the lifeblood systems that keep a modern nation alive, safe, secure, well fed, and in business were hosted on systems owned and operated by private business, most of them using the Internet or the public switched telephone network (PSTN) as the backbone of their command, control, and communications system. In the United States, the President’s Commission on Critical Infrastructure Protection (PCCIP) was created by President Bill Clinton to take on the job of awakening the nation to the need for this third level of C3 systems—the ones that keep modern information-driven economies working correctly and safe from hostile attacks via those information infrastructures.
Security professionals and industrial systems engineers use the term operational technology (OT) as a collective way of referring to this fusion of information systems and physically sensing or changing the world around us. Many of the cyber attacks in 2019 through 2021 demonstrated that there was no air gap between the IT and SCADA or ICS systems—there was no effective isolation between the Internet world and the operational technologies that modern life relies upon so heavily.
In many respects, the need for SSCPs and the standards we need people to uphold as SSCPs was given birth by the PCCIP.
As we said in Chapter 3, risk management must start with the senior leaders of the organization taking full responsibility for everything related to risk management. "The captain goes down with the ship" may not literally require that the ship's commander drown when the ship sinks, but it does mean that no matter what happens, when it happens, ultimately that captain or commander has full responsibility. Captains of ships or captains of industry (as we used to call such senior leaders) may share their due care and due diligence responsibilities, and they usually must delegate the authority and responsibility to achieve them. Regardless, the C-suite and the board of directors are the ones who operate the business in the names of the owners and stakeholders. They "own" the bad news when due diligence fails to protect the stakeholders' interests.
This has two vital spin-offs for risk management programs, plans, and processes:
That last does need a bit of clarification. Obviously, the best way to keep a secret is to not share it with anyone; the next-best way is to not tell anyone else that you have a secret. If senior leaders or stakeholders are making a lot of public noise about “our successful efforts to eliminate information risk,” for example, that might be just the attractive nuisance that a threat actor needs to come and do a little looking around for something exploitable that’s been overlooked or oversold.
Statements by senior leaders, and their appearance at internal and external events, all speak loudly. Having the senior leaders formally sign off on acceptance testing results or on the results of audits and operational evaluation testing are opportunities to confirm to everyone that these things are important. They’re important enough to spend the senior leadership’s time and energy on. The CEO and the others in the C-suite do more than care about these issues. They get involved with them; they lead them. That’s a very powerful silver bullet to use internally; it can pay huge dividends in gaining end-user acceptance, understanding, and willing compliance with information security measures. It can open everyone’s eyes—maybe just a little; perhaps just enough to spot something out of the ordinary before it becomes an exploited vulnerability.
There’s been a lot of hard work accomplished to get to where a set of information risk controls have been specified, acquired (or built), installed, tested, and signed off by the senior leaders as meeting the information security needs of the business or organization. The job thus far has been putting in place countermeasures and controls so that the organization can roll with the punches, and weather the rough seas that the world, the competition, or the willful threat actors out there try to throw at it. Now it’s on to the really hard part of the job—keeping this information architecture and its IT architectures safe, secure, and resilient so that confidentiality, integrity, and authorization requirements are met and stay met. How do we know all of those safety nets, countermeasures, and control techniques are still working the way we intended them to and that they’re still adequate to keep us safe?
The good news is that this is no different than the work we did in making our initial security assessments of our information architecture, the business logic and business processes, and the IT architectures and systems that make them possible. The bad news is that this job never ends. We must continually monitor and assess the effectiveness of those risk controls and countermeasures, and take or recommend action when we see they no longer are adequate. Putting the controls in place was taking due care; due diligence is achieved through constant vigilance.
More good news: the data sources you used originally, to gain the insight you needed to make your first assessments, are still there, just waiting for you to come around, touch base, and ask for an update. Let’s take a closer look at some of them.
As you selected and implemented each new or modified information risk mitigation control, you had to identify the training needs for end users, their managers, and others. You had to identify what users and people throughout the organization needed to know and understand about this control and its role in the bigger picture. Achieving this minimum set of awareness and understanding is key to acceptance of the control by everyone concerned. This need for acceptance is continual, and depending on the nature of the risk control itself, the need for ongoing refresher training and awareness may be quite great. Let’s look at how different risks might call for different approaches to establish initial user awareness and maintain it over time:
The key to keeping users engaged with risk management and risk mitigation controls is simple: align their own, individual interests with the interests the controls are supporting, protecting, or securing. Chapter 11, “Business Continuity via Information Security and People Power,” will show you some strategies and techniques for achieving and maintaining this alignment by bringing more of your business’s “people power” to bear on everybody’s CIA needs.
By this time, our newly implemented risk mitigation controls have gone operational. Day by day, users across the organization are using them to stay more secure, (hopefully) achieving improved levels of CIA in their information processing tasks. The SSCP and the information security team now need to shift their mental gears and look to ongoing monitoring and assessment of these changes. In one respect, this seems easy; the identified risk, and therefore the related vulnerability, focused us on changing something in our physical, logical, or administrative processes so that our information could be more secure, resilient, reliable, and confidential; our decisions should now be more assured.
Are they?
The rest of the world did not stand still while we were making these changes. Our marketplace continued to grow and change; no doubt other users in other organizations were finding problems in the underlying hardware, software, or platforms we use; and the vendors who build and support those systems elements have been working to make fixes and patches available (or at least provide a procedural workaround) to resolve these problems. Threat actors may have discovered new zero-day exploits. And these or other threat actors have been continuing to ping away at our systems.
We do need to look at whether this new fix, patch, control, or procedural mitigation is working correctly, but we’ve got to do that in the context of today’s system architecture and the environment it operates in…and not just in the one in which we first spotted the vulnerability or decided to do something about the risk it engendered.
The SSCP may be part of a variety of ongoing security assessment activities, such as penetration testing or operational test and evaluation (OT&E), all intended to help understand the organization's security posture at the time the tests or evaluations are conducted. Let's take a closer look at some of these types of testing. This kind of test and evaluation is not to be confused with the acceptance testing or verification that was done when a new control was implemented—that verification test is necessary to prove that you made the fix correctly. It should also be kept distinct in your mind from regression testing, the verification that a fix to one systems element did not break others. Ongoing security test and evaluation looks to see whether things are still working correctly now that the users—and the threat actors—have had some time to put the changes and the total system through their paces.
OT&E, in its broadest sense, is attempting to verify that a given system and the people-powered processes that implement the overall set of business logic and purpose actually get work done correctly and completely, when seen from the end users’ or operators’ perspective. That may sound straightforward, but quite often, it is a long, complex process that produces some insight rather than clear, black-and-white “succeed” or “fail” scorecard results. Without going into too much detail, this is mainly because unavoidable differences exist between the system that business analysts thought was needed and what operational users in the organization are actually doing, day by day, to get work done. Some of those differences are caused by the passage of time; if it takes months to analyze a business’s needs, and more months to build the systems, install, test, and deliver them, the business has continued to move on. Some reflect different perceptions or understanding about the need; it’s difficult for a group of systems builders to understand what a group of systems users actually have to do in order to get work done. (And quite often, users are not as clear and articulate as they think they are when they try to tell the systems analysts what they need from the new system. Nor are the analysts necessarily the good listeners that they pride themselves on being.)
OT&E in security faces the same kind of lags in understanding, since quite often the organization doesn’t know it has a particular security requirement until it is revealed (either by testing and evaluation, or by enemy action via a real incident). This does create circular logic: we think we have a pretty solid system that fulfills our business logic, so we do some OT&E on it to understand how well it is working and where it might need to be improved—but the OT&E results cause us (sometimes) to rethink our business logic, which leads to changes in the system we just did OT&E on, and in the meantime, the rest of the world keeps changing around us.
The bottom line is that operational test and evaluation is one part of an ongoing learning experience. It has a role to play in continuous quality improvement processes; it can help an organization understand how mature its various business processes and systems are. And it can offer a chance to gain insight into potentially exploitable vulnerabilities in systems, processes, and the business logic itself.
Ethical penetration testing is security testing focused on trying to actively find and exploit vulnerabilities in an organization’s information security posture, processes, procedures, and systems. There are some significant legal and ethical issues that the organization and its testers must address, however, before proceeding with even the most modest of controlled pen-testing. In most jurisdictions around the world, it is illegal for anyone to attempt to gain unauthorized entry into someone else’s information systems without their express written permission; even with that permission in hand, mistakes in the execution of pen-testing activities can expose the requesting company or the penetration testers to legal or regulatory sanctions. To avoid legal complications, organizations typically use third-party penetration test organizations and use specific, detailed contracts and test plans that clearly identify responsibilities, identify the purpose(s) of the testing, place boundaries or constraints on how the testing is carried out, and determine how liability or responsibility for damage or disruption will be dealt with. Contracts also include nondisclosure requirements and direct that the testers cannot retain any data, interim analysis results, and findings they gather or generate as a result of the test. By tightly specifying the nature and extent of the test, the parties keep it legal; as it’s all about strengthening the organization’s cyber defenses, this keeps it ethical.
The first major risk to consider is that pen testers are, first and foremost, trying to actively and surreptitiously find exploitable vulnerabilities in your information security posture and systems. This activity could disrupt normal business operations, which in turn could disrupt your customers' business operations. For this reason, the scope of pen-testing activities should be clearly defined. Reporting relationships between the people doing the pen-testing, their line managers, and management and leadership within your own organization must be clear and effective.
Another risk comes into play when using external pen-testing consulting firms to do the testing, analyze the results, and present these results to you as the client. Sometimes, pen-testing firms hire reformed former criminal hackers (or hackers who narrowly escaped criminal prosecution), because they’ve got the demonstrated technical skills and hacker mindset to know how to conduct all aspects of such an attack. Yet, you are betting your organization’s success, if not survival, on how trustworthy these hackers might be. Can you count on them actually telling you about everything they find? Will they actually turn over all data, logs, and so forth that they capture during their testing and not retain any copies for their own internal use? This is not an insurmountable risk, and your contract with the pen-testing firm should be adamant about these sorts of risk containment measures. That said, it is not a trivial risk.
The SSCP exam will not go into much detail as it pertains to operational testing and evaluation or to penetration testing. You should, however, understand what each kind of ongoing or special security assessment, evaluation, and testing activities might be; have a realistic idea of what they can accomplish; and be aware of some of the risks associated with them.
Whether security assessments are done via formalized penetration testing, as part of normal operational test and evaluation, or by any of a variety of informal means, each provides the SSCP an opportunity to identify ways to make end users more effective in the ways they contribute to the overall information security posture. Initial training may instill a sense of awareness, while providing a starter set of procedural knowledge and skills; this is good, but as employees or team members grow in experience, they can and should be able to step up and do more as members of the total information security team.
End user questions and responses during security assessment activities, or during debriefs of them, can illuminate such opportunities to improve awareness and effectiveness. Make note of each "why" or "how" that surfaces during such events, during your informal walk-arounds to work spaces, or during other dialogue you have with others in the organization. Each represents a chance to improve awareness of the overall information security need; each is an opportunity to further empower teammates to be more intentional in strengthening their own security hygiene habits.
A caution is in order: some organizational cultures may believe that it’s more cost-effective to gather up such questions and indicators, and then spend the money and time to develop and train with new or updated training materials when a critical mass of need has finally arisen. You’ll have to make your own judgment, in such circumstances, whether this is being penny-wise but pound-foolish.
Think back to how much work it was to discover, understand, and document the information architecture that the organization uses, and then the IT architectures that support that business logic and data. Chances are that during your discovery phase, you realized that a lot of elements of both architectures could be changed or replaced by local work unit managers, group leaders, or division directors, all with very little if any coordination with any other departments. If that’s the case, you and the IT director, or the chief information security officer and the CIO, may have an uphill battle on your hands as you try to convince everyone that proper stewardship does require more central, coordinated change management and control than the company is accustomed to.
The definitions of these three management processes are important to keep in mind:
Since the IT and information security communities talk a lot about assets and asset management, let’s take a closer look at just what these concepts are and at how the security professional needs to be aware of and engaged with them.
In information technology (IT) and operational technology (OT) terms, an asset is an identifiable element of hardware, software, data, or a set of interfaces; by deciding that this element has some value to the organization, or is critical to achieving one of the organization’s goals or objectives (large or small), that element becomes an asset. Assets are not supplies, nor are they raw materials; they are not consumed by use, but can be reused many times. Many assets are actually large collections of lower-level assets themselves: a laptop computer is an asset, consisting of its hardware, its operating system, the many applications that are installed on it, and the contents of data files and databases that are also loaded onto that laptop. Thus, a single laptop identified with its property tag ID might actually consist of 50 or more assets bundled together and then issued to an employee for them to use.
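A minimal sketch (in Python, with field names of our own invention) shows how one property-tagged laptop can bundle many lower-level assets beneath it:

```python
from dataclasses import dataclass, field

@dataclass
class Asset:
    """One identifiable element of hardware, software, or data (illustrative fields)."""
    asset_id: str
    kind: str                                   # "hardware", "software", "data", ...
    components: list["Asset"] = field(default_factory=list)

    def count(self) -> int:
        """Total number of assets bundled under this one, including itself."""
        return 1 + sum(child.count() for child in self.components)

laptop = Asset("TAG-0042", "hardware", components=[
    Asset("TAG-0042-os", "software"),
    Asset("TAG-0042-office-suite", "software"),
    Asset("TAG-0042-vpn-client", "software"),
    Asset("TAG-0042-customer-db-extract", "data"),
])
print(laptop.count())   # 5 assets travel with this one property tag
```

Each of those nested elements has its own vulnerabilities, its own update cycle, and its own value to the organization, even though the property system sees only one tag.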
Generally speaking, the lifecycle of an asset consists of the following major activities:
We might call that the planned IT/OT asset management lifecycle model. Many organizations face a slightly different asset lifecycle situation, one in which the planning and management steps are markedly different. We’ll call this the unplanned or discovery IT asset management process. For many reasons, organizations may find that they have many different instances of unplanned, unmanaged hardware, software, and data elements that are part of their operational business activities. This can come about in numerous ways:
Whatever the source of that unplanned asset, once it’s been discovered, management has to decide: do we bring it in under formal security and configuration management, or do we let it continue to operate as is, “out in the wild” but still within the boundaries of the organization? Either way involves a trade-off of risks and costs.
System security assessments and audits, network scans, and systems enumerations can often discover some of these undocumented and unmanaged assets. Security professionals then need to manage their way through the process of identifying them, discovering their owners and users, and making an initial triage-style security assessment. After all, until you actually look closely enough at that surprise package, you have no idea whether it’s part of an attacker’s set of hostile agents or a valuable contributor to your organization’s business activities.
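A simple, hypothetical reconciliation between scan results and the formal inventory illustrates how these surprise packages surface; the addresses and inventory contents below are placeholders only:

```python
# Compare a network scan's discovered hosts against the formal asset inventory
# to flag unmanaged ("shadow") elements for triage.

inventory = {"10.0.1.10", "10.0.1.11", "10.0.1.20"}                     # formally managed assets
scan_results = {"10.0.1.10", "10.0.1.11", "10.0.1.20", "10.0.1.77", "10.0.1.93"}

unmanaged = scan_results - inventory      # discovered but not under management
missing = inventory - scan_results        # managed but not seen on the network

for host in sorted(unmanaged):
    print(f"{host}: undocumented asset -- owner, purpose, and risk posture unknown; triage needed")
for host in sorted(missing):
    print(f"{host}: in inventory but not responding -- retired, moved, or misconfigured?")
```

Both lists are findings: the first may hide an attacker's foothold or a valuable but unmanaged business tool, and the second may reveal inventory records that no longer match reality.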
Every element of data, information, or knowledge that the organization uses has a similar security lifecycle, which follows that data item from its creation through all the steps of storage, usage, modification, sharing, archiving, and then disposal at end of life. Unlike hardware or software elements, however, data elements may be "touched" by users or by software (and hardware) entities thousands of times per second throughout their useful life. Some data is never archived or modified; other data may not even be stored except in the transient working memory of an instrument, sensor, or other endpoint.
One important way to view this lifecycle is to think about the three states that any data item may be in, at any given moment:
The information classification and categorization processes identified types or classes of data items that need special protection (whether they are transient or not).
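One illustrative way to connect classification and data state to required protections is a simple lookup table; the classification labels and protections shown here are assumptions for the example, not a prescriptive standard:

```python
# Map (classification, state) to the minimum protection policy might require.
REQUIRED_PROTECTION = {
    ("confidential", "at rest"):   "encrypt on disk; restrict access by role",
    ("confidential", "in motion"): "TLS or VPN between endpoints",
    ("confidential", "in use"):    "endpoint and screen controls; least privilege",
    ("public",       "at rest"):   "integrity checks only",
    ("public",       "in motion"): "integrity checks only",
    ("public",       "in use"):    "no special handling",
}

def protection_for(classification: str, state: str) -> str:
    return REQUIRED_PROTECTION.get((classification, state),
                                   "policy gap -- escalate to the data owner")

print(protection_for("confidential", "in motion"))
print(protection_for("internal", "at rest"))   # an unmapped combination exposes a policy gap
```

The unmapped lookup in the last line makes a useful point: gaps in the classification scheme show up quickly once you try to express it in an enforceable form.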
Conceptually, this should be where the organization’s risk management activities come together. An inward-looking fusion center approach would provide cross-links and navigation tools that allow security personnel, risk managers, systems and applications builders, and other managers and leaders the ability to understand the inter-relationships of their many different business activities, the types of information they depend upon and produce, and the always-changing risk environment and threat landscape the organization faces.
Organizations that are just starting out on their IT and OT security management journeys need to grow toward this type of introspective fusion approach. One of the best ways to do that is to keep asking questions about your systems, the risks they face, and your dependence upon them. Most organizations cannot easily identify which of their systems or processes are highly susceptible to an insider threat, perhaps one where an employee might attempt to commit fraud or exfiltrate sensitive data. Answering such questions is akin to reverse engineering the organization from top to bottom. Yet, this is what a determined attacker does, as they fingerprint your systems, characterize your defenses, find a way inside your outer walls, and then move about searching for assets to target that meet their needs.
Keeping this knowledge base alive and useful—and keeping it safe and secure—is best done as an ongoing process, one with its own set of workflows and procedures. These workflows can position this knowledge base as a strong supporting element in configuration management and change control, in security assessment, and even in help desk support processes.
As an SSCP, consider asking (or looking yourself for the answers to!) the following kinds of questions:
If you’re unable to get good answers to those kinds of questions, from policy and procedural directives, from your managers, or from your own investigations, you may be working in an environment that is ripe for disaster.
To be effective, any management system or process must collect and record the data used to make decisions about changes to the systems being managed; it must also include ways to audit those records against reality. For most business systems, we need to consider three different kinds of baselines: recently archived, current operational, and ongoing development. Audits against these baselines should be able to verify that:
Audits of configuration management and control systems should be able to verify that the requirements and design documentation, source code files, builds and control systems files, and all other data sets necessary to build, test, and deploy the baseline contain authorized content and changes only.
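A minimal sketch of one such audit check, assuming the approved baseline is simply a table of file names and their expected hashes (the file names and hash values below are placeholders):

```python
import hashlib
from pathlib import Path

def file_fingerprint(path: Path) -> str:
    """SHA-256 hash of a file's contents, used as its baseline fingerprint."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def audit_against_baseline(baseline: dict) -> list:
    """Report files whose current contents no longer match the approved baseline."""
    findings = []
    for name, approved_hash in baseline.items():
        path = Path(name)
        if not path.exists():
            findings.append(f"{name}: missing from the operational system")
        elif file_fingerprint(path) != approved_hash:
            findings.append(f"{name}: contents differ from the approved baseline")
    return findings

# The baseline itself would come from the last approved build; these entries are
# placeholders for illustration only.
example_baseline = {"app/config.ini": "0" * 64, "app/service.py": "1" * 64}
for finding in audit_against_baseline(example_baseline):
    print(finding)
```

Every finding is a question for the change control records: was this difference authorized, or is it evidence of an unapproved (or hostile) change?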
We’ll address this in more detail in Chapters 9 and 10.
Prudent risk managers have been doing this for thousands of years. Guards would patrol the city and randomly check to see that doors were secured at the end of the workday and that gates were closed and barred. Tax authorities would select some number of taxpayers’ records and returns for audit, to look for both honest mistakes and willful attempts to evade payment. Merchants and manufacturers, shipping companies, and customers make detailed inventory lists and compare those lists after major transactions (such as before and after a clearance sale or a business relocation). Banks and financial institutions keep detailed transaction ledgers and then balance them against statements of accounts. These are all examples of regular operational use, inspection, audit, and verification that a set of risk mitigation controls are still working correctly.
We monitor our risk mitigation controls so that we can answer a simple question: are we safe, or are we not? Coming to a well-supported answer requires information and analysis, and that can mean a lot of data just to answer "Are we safe today?" Trend analysis (to see whether safety or security has changed over time, with an eye to discovering why) requires even more data. The nature of our business, our risk appetite (or tolerance), and the legal and regulatory compliance requirements we face may also dictate how often we have to collect such data and for how long we have to keep it available for analysis, audit, or review.
Where does the monitoring data come from? This question may seem to have an obvious answer, but it bears thinking about the four main types of information that we deliberately produce with each step of a business process:
Notice one important fact: no useful data gets generated unless somebody, somewhere, decided to create a process to get the data generated by the system, output in a form that is useful, and then captured in some kind of document, log file, or other memory device. When we choose to implement controls and countermeasures, we choose systems and components that help us deal with potential problems and inform us when problems occur.
All of that monitoring data does you absolutely no good at all unless you actually look at it. Analyze it. Extract from it the stories it is trying to tell you. This is perhaps the number one large-scale set of tasks that many cybersecurity and information security efforts fail to adequately plan for or accomplish. Don’t repeat this mistake.
Mistake number two is not having somebody on watch to whom the results of monitoring and event data analysis are sent; without one, when (not if) a potential emergency situation develops, the company may not find out about it until the Monday morning after the long holiday weekend is over. Those watch-standers can be on call (and receive alerts via SMS or other mobile communications means) or on site; each business will make that decision based on its mission needs and its assessment of the risks. Don't repeat this mistake either.
Mistake number three is to not look at the log data at all unless some other problem causes you to think, “Maybe the log files can tell me what’s going on.”
These three mistakes suggest that we need what emergency medicine calls a triage process: a way to sort out patients with life-threatening conditions needing immediate attention from the ones who can wait a while (or should go see their physician during office hours).
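A toy triage sketch makes the idea concrete; the severity scheme, queues, and sample events are assumptions for illustration, not a recommended scoring model:

```python
# Sort incoming events so the few that need immediate attention surface first.
SEVERITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3}

events = [
    {"id": 101, "severity": "low",      "summary": "single failed logon"},
    {"id": 102, "severity": "critical", "summary": "privileged account created outside change window"},
    {"id": 103, "severity": "medium",   "summary": "unusual outbound traffic volume"},
]

for event in sorted(events, key=lambda e: SEVERITY_ORDER[e["severity"]]):
    queue = ("wake the on-call watch-stander"
             if event["severity"] in ("critical", "high")
             else "review during business hours")
    print(f'#{event["id"]} [{event["severity"]}] {event["summary"]} -> {queue}')
```

The hard part, of course, is not the sorting; it is deciding, in advance and with management's agreement, what conditions count as "critical" and who gets the call.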
Let’s look at the analysis problem from the point of view of those who need the analysis done and work backward from there to develop good approaches to the analytical tasks themselves. But let’s not repeat mistake number four, often made by the medical profession—that more often than not, when the emergency room triage team sends you back home and says “See your doctor tomorrow,” their detailed findings don’t go to your doctor with you.
The alert team is watching over the deployed, in-use operational IT systems and support infrastructures. That collection of systems elements is probably supporting ongoing customer support, manufacturing, shipping, billing and finance operations, and website and public-facing information resources, as well as the various development and test systems used by different groups in the company. Their job is to know the status, state, and health of these in-use IT systems, but not necessarily the details of how or for what purpose any particular end user or organization is using those systems.
Who is the alert team? It might be a part of the day shift help desk team, the people everybody calls whenever any kind of IT issue comes up. In other organizations, the alert team is part of a separate IT security group, and their focus is on IT security issues and not normal user support activities.
What does this alert team do? The information security alert team has as their highest priority being ready and able to receive alerts from the systems they monitor and respond accordingly. That response typically includes the following:
What we can see from that list of alert team tasks is that we’re going to need the help of our systems designers, builders, and maintainers to help figure out
The immediacy of the alert team’s needs suggests that lots of data has to be summarized up to some key indicators, rather like a dashboard display in an automobile or an airplane. There are logical places on that dashboard for “idiot lights,” the sort of red-yellow-green indicators designed to get the operator’s attention and then direct them to look at other displays to be better informed. There are also valid uses on this dashboard for indicator gauges, such as throughput measures on critical nodes and numbers of users connected.
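A trivial sketch of how one such "idiot light" might be computed from a raw metric; the thresholds are assumptions that a security team would tune to its own systems:

```python
def status_light(value: float, yellow_at: float, red_at: float) -> str:
    """Map a metric to a traffic-light color; higher values are worse here."""
    if value >= red_at:
        return "red"
    if value >= yellow_at:
        return "yellow"
    return "green"

# Example: failed logons per minute on a critical server (hypothetical numbers).
print(status_light(value=3, yellow_at=5, red_at=20))    # green
print(status_light(value=27, yellow_at=5, red_at=20))   # red: direct the operator to the detail displays
```

The light's only job is to get attention; the detailed displays and timelines behind it carry the information the alert team actually needs to act.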
The alert team may also need to be able to see the data about an incident shown in some kind of timeline fashion, especially if there are a number of systems elements that seem to be involved in the incident. Timeline displays can call attention to periods that need further investigation and may even reveal something about cause and effect.
Before we jump to a conclusion and buy a snazzy new security information management dashboard system, however, let's take a look at what the other monitoring and event data analysis customers in our organization might need.
The IT support team is actually looking at a different process: the process of taking user needs, building systems and data structures to meet those needs, deploying those systems, and then dealing with user issues, problems, complaints, and ideas for improvements with them. That process lends itself to a fishbone or Ishikawa diagram that takes the end users’ underlying value chain and reveals all of the inputs, the necessary preconditions, the processing steps, the outputs, and how outputs relate to outcomes. This process may have many versions of the information systems and IT baselines that it must monitor, track, and support at any one time. In some cases, some of those versions may be subsets of the entire architecture, tailor-made to support specific business needs. IT and the configuration management and control board teams will be controlling these many different product baseline versions, which includes keeping track of which help desk tickets or requests for changes are assigned to (scheduled to be built into) which delivery. The IT staff must also monitor and be able to report on the progress of each piece of those software development tasks.
Some of those “magic metrics” may lend themselves to a dashboard-style display. For large systems with hundreds of company-managed end-user workstations, for example, one such status indicator could be whether all vendor-provided updates and patches have been applied to the hardware, operating systems, and applications platform systems. Other indicators could be an aggregate count of the known vulnerabilities that are still open and in need of mitigation and the critical business logic affected by them.
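As a purely illustrative example of aggregating one such indicator (the workstation records and counts below are placeholders, not real inventory data):

```python
workstations = [
    {"host": "ws-001", "missing_patches": 0},
    {"host": "ws-002", "missing_patches": 3},
    {"host": "ws-003", "missing_patches": 0},
    {"host": "ws-004", "missing_patches": 1},
]
open_vulnerabilities = 12   # known, unmitigated findings from the last assessment

fully_patched = sum(1 for w in workstations if w["missing_patches"] == 0)
compliance = 100.0 * fully_patched / len(workstations)
print(f"Patch compliance: {compliance:.0f}% of workstations fully patched")
print(f"Open vulnerabilities awaiting mitigation: {open_vulnerabilities}")
```

Rolled up this way, a single number can sit on the dashboard; the per-host detail stays available for the IT support team when the number starts to slip.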
Trend lines are also valuable indicators for the IT support staff. Averages of indicators such as system uptime, data or user logon volumes, accesses to key information assets, or transaction processing time can be revealing when looked at over the right timeframe—and when compared to other systems, internal, or external events to see if cause-and-effect relationships exist.
What end users require may vary a lot depending on the needs of the organization and which users are focused on which parts of its business logic. That said, end users tend to need traffic-light kind of indications that tell them whether systems, components, platforms, or other elements they need are ready and available, down for maintenance, or in a “hands-off” state while a problem is being investigated. They may also appreciate being able to see the scheduled status of particular changes that are of interest to them. Transparent change management systems are ones in which end users or other interested parties in the business have this visibility into the planned, scheduled builds and the issues or changes allocated to them.
We might rephrase “What do leadership and management need?” and ask how the analysis of monitoring and event data can help management and leadership fulfill their due care and due diligence responsibilities. Depending on the management and leadership style and culture within the organization, the same dashboard and summary displays used by the alert team and IT support staff may be just what they need. (This is sometimes called a “high-bandwidth-in” style of management, where the managers need to have access to lots of detailed data about what’s going on in the organization.) Other management and leadership choose to work with high-level summaries, aggregates, or alarm data as their daily feeds.
One key lesson to remember is suggested by the number of alert team tasks that lead to notifying management and leadership of an incident or alarm condition. Too many infamous data breach incidents became far too costly for the companies involved because the company culture discouraged late-night or weekend calls to senior managers for “mere” IT systems problems. (The data breach at retail giant Target, in 2013, suffered in part from this failure to properly notify and engage senior leadership before such incidents happen so that the company could respond properly when one occurred.)
At some point, the SSCP must determine that an incident of interest has occurred. Out of the millions of events that a busy datacenter’s logging and monitoring systems might take note of every 24 hours, only a handful might be worthy of sounding an alarm:
That’s a pretty substantial list, but in a well-managed and well-secured datacenter, most of those kinds of incidents shouldn’t happen often. When they do (not if they do), several important things have to occur properly and promptly:
Part of that initial triage kind of response involves determining whether the incident is sufficiently serious or disruptive that the organization should activate its incident response plans and procedures. We’ll cover these in Chapter 11 in more detail; for now, recognize that businesses have an abiding due diligence responsibility to think through what to do in an emergency well before that emergency first occurs!
Immediate response to an incident may mean that the first person to notice it has to make an immediate decision: is this an emergency that threatens life or property and thus requires initiating emergency alarms and procedures? Or is it “merely” an information systems incident not requiring outside emergency responders? Before you take on operational responsibilities, make sure you know how your company wants to handle these decisions.
We said at the outset of this book that the commitment of senior business leadership and management is pivotal to the success of the company's information risk management and mitigation efforts. As an SSCP, you and the rest of the team went to great efforts to get those senior leaders involved and to gain their understanding and acceptance of your risk assessments. You then gained their pledges to properly fund, staff, and support your risk mitigation strategies, as well as your chosen risk countermeasures and controls.
Much like any other accountable, reportable function in the company, information security must make regular reports to management and leadership. The good news (no incidents of interest) as well as the bad news about minor or major breaches of security must be brought to the attention of senior leaders and managers. They need to see that their investments in your efforts are still proving successful—and if they are not, then they need to understand why and be well enough informed to consider alternative actions to take in the face of new threats or newly discovered vulnerabilities.
Management and leadership may also have legal and regulatory reporting requirements of their own to meet, and your abilities to manage security systems event data, incident data, and the results of your investigations may be necessary for them to meet these obligations. These will, of course, vary as to jurisdiction; a multinational firm with operating locations in many countries may face a bewildering array of possibly conflicting reporting requirements in that regard.
Whatever the reporting burden, the bottom line is that the information security team must report its findings to management and leadership. Whether those findings are routine good news about the continued secure good health of the systems or dead-of-night emergency alarms when a serious incident seems to be unfolding, management and leadership have an abiding and enduring need to know.
No bad news about information security incidents will ever get better by waiting until later to tell management about it.
We’ve spent Chapters 3 and 4 learning how to defend our information, our information systems (the business logic that uses information), and our information technology architectures from harm due to accident, Mother Nature, or hostile action by insiders or external actors alike. That has taken us from risk management through risk mitigation, as we’ve seen how the leadership, management, systems, and security teams must work together to make smart trade-offs between the possible pain of a risk becoming reality and the real costs incurred to purchase, install, and operate a control or countermeasure that prevents or reduces that possible loss.
Throughout, we have applied the basic concepts of confidentiality, integrity, and availability as the characteristics by which we assess our information security measures. In broad terms, this CIA triad helps us manage the risks. We've seen that without knowing and controlling our systems baselines, we have very little opportunity to detect a vulnerability becoming a disruptive event; thus, we've seen how managing our systems baselines and exerting a reasonable amount of change control keeps them safer. The underlying software and hardware of an unmanaged and uncontrolled system may have the same vulnerabilities as a well-managed, tightly controlled system using the same technologies; it is the lack of people-centric management and control processes that exposes the unmanaged system to a greater probability that an exploit will be attempted or will succeed.
Finally, we’ve seen that the understanding and involvement of all levels of organizational leadership and management are vital to making risk management pay off. Risk management is not free; it takes valuable staff time, intellectual effort, and analysis to pull all of the data; understand the business logic, processes, and architecture; and find the high-priority vulnerabilities. It takes more money, time, and effort to make changes that contain, fix, or eliminate the risks that those vulnerabilities bring with them. But by the numbers, we see that there are ways to make quantitative as well as qualitative assessments about risks, and which ones to manage or mitigate.
Know the important security differences between the information architecture and the information technology architecture. The information architecture focuses on how people use information to accomplish business objectives; thus, its principal security issues are involved with guiding, shaping, or constraining human behavior. Well-considered workforce education and training programs that align business objectives with information security and decision assurance needs are solid investments to make. By contrast, the IT architecture is perceived as needing predominantly logical or technical controls that require significant expertise and knowledge to deploy and maintain effectively. This perception is true as far as it goes, but the technical controls applied to the IT architecture must be driven by the security needs of the information architecture.
Know how to conduct an architecture assessment. The architecture assessment is both an inventory of all systems elements and a map or process flow diagram that shows how these elements are connected to form or support business processes and thereby achieve the needs of required business logic. This requires a thorough review and analysis of existing physical asset/equipment inventories, network and communications diagrams, contracts with service providers, software and systems change control logs, error reports, and change requests. It also should include data-gathering interviews with end users and support personnel.
Explain the purpose and value of a systems or architecture baseline for security purposes. The systems or architecture baseline, which the assessment documents, is both the reality we have to protect and the model or description of that reality. The baseline as documentation reflects the as-built state of the system today, and versions of the baseline can reflect the “build-to” state of the system for any desired set of changes that are planned. These provide the starting point for vulnerability assessments, change control audits, and problem analysis and error correcting.
Explain the importance of assessing "shadow IT" systems, standalone systems, and cloud-hosted services as part of a security assessment. Many organizations are increasingly dependent on IT systems elements that are not under their direct configuration management and control. As such, formal IT management may not have detailed design information, and hence vulnerability insight, about such systems elements. The information security assessment needs to identify each instance of such systems elements and, based on the BIA, determine how much inspection, investigation, or analysis of these systems (and contracts related to them, if any) needs to be part of the security assessment.
Know how to perform a vulnerabilities assessment. The vulnerabilities assessment gathers data about the information architecture and the IT architecture, including Common Vulnerabilities and Exposures (CVE) data from public sources. This data is analyzed in the context of the BIA’s prioritized impacts to determine critical vulnerabilities in these architectures. Threat modeling may also be useful in this process. The result is a list of known or suspected vulnerabilities, collated with the BIA’s priorities, for use in risk mitigation implementation planning.
Explain the role of threat modeling in vulnerability assessment. Threat modeling focuses your attention on the boundaries that separate systems from one another, and from the outside world, and thus on how any request for access, service, or information can cross such boundaries. These crossing points are where legitimate users and threat actors can conceivably enter your systems. These may be tunnels (VPN or maintenance trapdoors) left open by accident, for example. Threat modeling is an important component in a well-balanced vulnerability assessment.
Know how to include human elements in the architecture and vulnerability assessments. As the vulnerability assessment reviews business processes and the systems elements that support them, this may indicate process steps where end-user, manager, or other staff actions present vulnerabilities. These may be due to training deficiencies, or to weaknesses in administrative controls (such as a lack of policy direction and guidance), or they may indicate significant risks in need of physical or logical controls and countermeasures.
Explain the basic risk treatment options of accept, transfer or share, remediate, avoid, and recast. Once you’ve identified a vulnerability, you deal with (or treat) its associated risk with a combination of control options as required. Accepting the risk means you choose to go ahead and continue doing business in this way. Transferring the risk usually involves paying someone else to take on the work of repairs, reimbursements, or replacement of damaged systems if the risk event occurs; sharing a risk means that you transfer a portion of it to another, while the remaining (residual) risk stays with you to deal with. Remediation includes repairing or replacing the vulnerable system and is often called “fixing” or “mitigating” the risk. Avoiding a risk means to change a business process so that the risk no longer applies. The application of any risk controls may reduce the probability of occurrence or the nature of the impact of the risk, and thus you have recast (reassessed) the risk.
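To see how treatment recasts a risk in quantitative terms, consider a simple annualized loss expectancy calculation (ALE = single loss expectancy × annualized rate of occurrence). The dollar figures and reduction factors below are invented purely for illustration.

```python
# Illustrative figures only: recasting a risk after treatment using ALE = SLE x ARO.

sle = 50_000   # single loss expectancy: cost of one occurrence, in dollars
aro = 0.4      # annualized rate of occurrence before treatment

ale_before = sle * aro
print(f"ALE before treatment: ${ale_before:,.0f}")          # $20,000 per year

# Remediate: assume a control cuts the likelihood of occurrence in half.
aro_after = aro * 0.5
ale_residual = sle * aro_after
print(f"Recast (residual) ALE after remediation: ${ale_residual:,.0f}")   # $10,000 per year

# Transfer/share: assume insurance reimburses 80% of a loss; the rest stays with us.
sle_after_transfer = sle * 0.2
print(f"Residual single-loss impact after transfer: ${sle_after_transfer:,.0f}")
```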
Know how to determine residual risk and relate it to information security gap analysis. Residual risk is the risk remaining after applying treatment options, and thus it is a recasting of the original risk. Residual risks are, in essence, gaps in our defenses; gap analysis uses the same approach as vulnerability assessment but focuses on these gaps to see which, if any, present unacceptable levels of exposure to risk.
Know how and why to perform an information security gap analysis. A gap analysis is similar to auditing a system’s requirements list against the as-built implementation; both seek to discover any needs (requirements) that are not addressed by an effective combination of system features, functions, and elements. An information security gap analysis can reveal missing or inadequate security coverage, and it is useful during vulnerability assessment and after mitigations have been implemented. It is performed by reviewing the baselined set of information security requirements (which should meet or exceed BIA requirements) against the baseline information and IT architectures, noting any unsatisfied or partially satisfied requirements.
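A minimal sketch of the review step, using made-up requirement identifiers: compare the baselined security requirements against what the architecture baselines show as implemented, and report anything unsatisfied or only partially satisfied.

```python
# Hypothetical sketch of an information security gap analysis:
# compare baselined requirements against what the architecture shows as implemented.

required = {"REQ-01 encrypt data at rest", "REQ-02 MFA for remote access", "REQ-03 daily backups"}
implemented = {"REQ-01 encrypt data at rest", "REQ-03 daily backups"}
partially_satisfied = {"REQ-03 daily backups": "backups exist but are never test-restored"}

gaps = required - implemented
for gap in sorted(gaps):
    print(f"Unsatisfied requirement: {gap}")
for req, note in partially_satisfied.items():
    print(f"Partially satisfied: {req} ({note})")
```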
Know how the physical, logical, and administrative aspects of risk controls work together. Each of these types of controls takes a decision about security policy and practice and implements it so that the behavior of people, information technology, and physical systems stays within security-approved bounds. An acceptable use policy, for example, may state that employee-owned devices cannot be brought into secure work areas; a physical search of handbags and so forth might enforce this, and logical controls that detect such devices when they attempt to connect to the networks provide a further layer of detection and prevention. Almost all security starts with making decisions about risks; we then write requirements, objectives, plans, or other administrative (people-facing) documents to cause those decisions to be carried out and to monitor their effectiveness.
Explain the requirements for integrated command, control, and communications of risk treatments and countermeasures. Each element of our controls and countermeasures needs to be part of an interlocking, self-reinforcing whole in which elements constantly communicate information about their status, state, and health, or about any alert or alarm-worthy conditions. Systems security managers should have near-seamless, real-time visibility into this information, as well as the ability to remotely manage or command systems elements in response to a suspected or actual information security event. Without this, gaps become blind spots.
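The “gaps become blind spots” point can be illustrated with a toy status check: each control element is expected to report in periodically, and any element silent past its reporting window is itself treated as an alert-worthy condition. The element names and intervals below are hypothetical.

```python
# Hypothetical sketch: a security manager's view of control-element heartbeats.
# Any element that has not reported within its expected window is a blind spot.

import time

EXPECTED_INTERVAL = 300  # seconds between expected status reports

# Last report time (epoch seconds) for each monitored control element.
last_report = {
    "perimeter-firewall": time.time() - 60,      # reported a minute ago
    "door-badge-reader":  time.time() - 1800,    # silent for 30 minutes
    "av-management-srv":  time.time() - 120,
}

now = time.time()
for element, reported_at in last_report.items():
    if now - reported_at > EXPECTED_INTERVAL:
        print(f"ALERT: {element} has not reported status in {int(now - reported_at)} seconds")
```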
Explain the various uses of testing and verification for information assurance and security. Testing and verification confirm that systems meet specified requirements. Testing is typically conducted in test environments, whereas verification can involve observations collected during testing or during ongoing operational use. Security testing and verification aim to establish how completely the information security requirements are satisfied in the deployed systems, including any risk mitigations, controls, or countermeasures that have been added to them since deployment. They validate that the confidentiality, integrity, and availability of the information systems meet or exceed requirements in the face of ongoing risks and threats. They can also indicate that new threats, vulnerabilities, or risks are in need of attention, decision making, and possibly mitigation.
Know why we gather, analyze, and interpret event and monitoring data. Almost all systems are built around the principle of “trust, but verify.” Due diligence requires that we be able to monitor, inspect, or oversee a process and be able to determine that it is working correctly—and when it is not, to be able to make timely decisions to intervene or take corrective action. Due diligence dictates that systems be built in such ways that they provide not only outputs that serve the needs of business logic but also suitable diagnostic, malfunction, or other alarm indicators. Much of this information is captured in event log files by the systems themselves. IT security personnel need to gather these event logs and other monitoring data and collate, analyze, and assess them to (a) be able to recognize that an event of interest is occurring or has occurred, and (b) verify that interventions or responses to this incident are having the desired effect.
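As a small illustration of “trust, but verify,” the sketch below scans a handful of hypothetical log lines for one event of interest (repeated failed logins on the same account) so that a human can decide whether to intervene. The log format, account names, and threshold are invented for the example.

```python
# Hypothetical sketch: scan collected event-log lines for an event of interest.
# Here, more than three failed logins for the same account raises a flag.

from collections import Counter

log_lines = [
    "2024-05-01T10:01:02 LOGIN_FAIL user=alice",
    "2024-05-01T10:01:09 LOGIN_FAIL user=alice",
    "2024-05-01T10:01:15 LOGIN_FAIL user=alice",
    "2024-05-01T10:01:22 LOGIN_FAIL user=alice",
    "2024-05-01T10:05:40 LOGIN_OK   user=bob",
]

failures = Counter(
    line.split("user=")[1] for line in log_lines if "LOGIN_FAIL" in line
)

for user, count in failures.items():
    if count > 3:
        print(f"Event of interest: {count} failed logins for {user}; escalate for review")
```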
Know the importance of elevating alerts and findings to management in a timely manner. Two time frames of interest dictate how information security teams elevate alerts and findings to management. The first is in real time or near-real time, when an event of possible interest is being detected and characterized. If such an event requires emergency responses, which quite often are disruptive to normal business operations, then the right levels of management should be engaged in this decision. When not faced with an emerging situation, management needs to be apprised when ongoing monitoring, assessment, or analysis suggests that the systems are behaving either in abnormal ways or in ways indicative of previously unrecognized risks. Further investigation may involve additional staff or other resources or be disruptive to normal operations; thus, management should be engaged in a timely manner.
Explain the role of incident management in risk mitigation. Risks express a probability of an event whose outcome we will likely find disruptive, if not damaging, to achieving our goals and objectives. Risk mitigation attempts to limit or contain risks and to notify us when a risk event seems to be imminent or is occurring. Incident management provides the ability in real time to decide when and how to intervene to prevent further damage, halt the incident, restore operational capabilities, and possibly request support from other emergency responders. All of those incident management actions help mitigate the effects of risk on our organization and its business processes.
Explain the importance of including operational technology (OT) systems in risk management and mitigation activities. Operational technology (OT) is the broad term for any kind of information systems device that physically interacts with the real world. These can include industrial control systems (ICS), supervisory control and data acquisition (SCADA) systems, smart building and environmental management systems, and safety and security systems. Internet of Things (IoT) devices, along with autonomous and mobile robotic devices, are also considered OT. If the organization has invested in (or allows the use of) any of these technologies in its buildings, vehicles, processes, or products and services, it is dependent upon them to one degree or another; that means the vulnerabilities inherent in the OT systems (and in the IT systems that monitor and control them, and inform management about their operation) put the organization at risk.
Explain the different functional types of risk controls. Risks can be controlled or mitigated by means of applying one or more functional controls to them. Directive controls issue commands or guidance, and along with deterrent controls, seek to change the behavior of users and potential attackers or intruders. Preventative controls place barriers or obstacles in the way of an intruder, which both delay the intrusion and raise the effort required to accomplish it. Detective controls observe signals from systems elements and raise an alarm if those signals indicate a potential incident (an intrusion, attack, or out-of-limits condition) needs attention. Reactive controls take separate, independent action to respond to the incident, such as shutting down servers or closing fireproof doors. Corrective controls, similar to reactive controls, take actions that attempt to nullify, contain, or limit the impacts of an incident. Recovery controls work to restore systems, facilities, or locations back to normal operating condition. Compensating controls may provide workarounds during recovery, act as a full or partial substitute for some other required control, or augment the mitigation efforts of another control. All of these functions work together, and many individual control devices or techniques may provide any or all of these functions in combination.
Explain the use of asset inventories and the risk register in risk mitigation. An asset inventory is a list of all information assets, including IT and OT elements, that the organization depends upon to achieve its goals and objectives. If the inventory is kept complete, then any hardware, software, communications pathway, or data discovered in or on the systems infrastructure but not listed in the inventory is potentially suspect and needs to be investigated. Risk assessment, whether done as asset-based, outcomes-based, or process-based, will ultimately link risks (and prioritized goals and objectives) to assets, producing the risk register. Vulnerabilities associated with each asset, such as those found in published CVE data or in internal, proprietary findings, are then added to the risk register. Together with the security baseline (reflecting classification and categorization decisions), this knowledge base informs change management, security assessment, and ongoing risk mitigation and security operations.
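A hedged sketch of how these pieces join up: each risk register entry links an inventoried asset, its BIA-derived priority and classification, a known vulnerability, and the treatment decision still to be made. All asset names, identifiers, and values are illustrative only.

```python
# Hypothetical sketch: joining the asset inventory, BIA priorities, and
# vulnerability data into risk register entries.

assets = {
    "orders-db":    {"classification": "confidential", "bia_priority": 1},
    "web-frontend": {"classification": "internal",     "bia_priority": 2},
}

vulnerabilities = [
    {"asset": "orders-db",    "id": "CVE-XXXX-0002", "source": "public CVE feed"},
    {"asset": "web-frontend", "id": "INT-0007",      "source": "internal pen test"},
]

# Build the risk register by linking each vulnerability to its asset's BIA data.
risk_register = [
    {
        "asset": v["asset"],
        "vulnerability": v["id"],
        "priority": assets[v["asset"]]["bia_priority"],
        "classification": assets[v["asset"]]["classification"],
        "treatment": "TBD",
    }
    for v in vulnerabilities
]

for entry in sorted(risk_register, key=lambda e: e["priority"]):
    print(entry)
```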