Improving your processes and systems
Keeping your insurance costs down
Discovering who your leaders are
Following standards and regulations
Putting your business ahead of the pack
NASA’s work over the past forty years has supposedly led to the development of everyday products and services that millions of people enjoy today. Similarly, organizations that undertake disaster recovery planning enjoy a number of spin-off benefits that help the organization, even if a disaster never occurs.
No organization is immune from the effects of natural and man-made disasters. The only question is, does the organization invest in processes and systems that can assure its survival from any disaster?
Although I can’t guarantee that every organization with a solid DR plan can survive any disaster, an organization with a DR plan is far more likely to survive than an organization that doesn’t have such a plan. Contingency Planning and Management Magazine indicated that 40 percent of companies that had to shut down for three days or more failed within 36 months. This statistic should remind you that failure in your DR planning isn’t an option.
Carnegie Mellon University measures the maturity of business processes by using the Capability Maturity Model (CMM), developed by Carnegie Mellon’s Software Engineering Institute (SEI).
The CMM has five levels. According to the SEI, “Predictability, effectiveness, and control of an organization’s software processes are believed to improve as the organization moves up these five levels. While not rigorous, the empirical evidence to date supports this belief.”
Here are the CMM’s five levels:
Level 1 — Initial: Processes are usually ad hoc. You don’t even write them down, or you document them in informal ways. The organization abandons processes in times of stress. Success in the organization depends on heroics, not processes.
Level 2 — Repeatable: You more formally document processes and can repeat those processes, even in times of stress. You manage projects by using project plans.
Level 3 — Defined: You establish and improve processes over time. Processes drive consistency throughout the organization. Processes are only qualitatively predictable.
Level 4 — Quantitatively managed: You establish, periodically improve, and measure processes. You also establish objectives for process performance. Process performance is quantitatively predictable.
Level 5 — Optimizing: Incremental improvements and technology innovation improve processes. You can set process improvement objectives. You can address and measure causes of process variations, instead of simply considering those variations aberrations.
For more information about the Capability Maturity Model, check out www.sei.cmu.edu/cmmi.
Disaster recovery planning puts the microscope to the processes and procedures that support the most critical activities in an organization. Throughout the process of analyzing and developing disaster recovery plans for a process, you may experience one or more instances of, “Hey, we can be doing this better!”
The activities in which you most likely find process improvements are
Business Impact Analysis (BIA): Analysts take a close look at a business process to determine its participants, critical assets and systems that support the process, critical suppliers, and how you measure and manage the process. Indeed, a business process may get its closest scrutiny during a BIA. If the people performing the BIA are trained to look for improvement opportunities, they’ll find them.
Risk analysis: During the BIA, analysts seek and identify process risk areas, and they look for ways to reduce the risks that they find. Analysts probably identify the most significant improvements to processes and architecture during this process.
Recovery plan development: When you develop the actual recovery procedures for a process, you need to take a very close look at that process and how it’s performed in normal situations. If you’re paying attention, you might find flaws or opportunities for improvement in the process you’re working with.
Recovery plan walkthroughs: Staff members who perform walkthroughs may, as they ponder and discuss individual steps in recovery procedures, discover opportunities for improving day-to-day processes and procedures. You might hear someone say, “If we do this in a recovery procedure, we can do that during normal periods and save time (or money, mistakes, and so on).”
Simulations, parallel tests, and cutover tests: While performing actual recovery procedures in testing exercises, staff members may make crucial discoveries about the ways in which your organization performs processes, thus finding ways to optimize processes and reduce mistakes. One of the reasons that you need a scribe or record keeper during tests is to capture these opportunities.
One of the objectives of disaster recovery planning is to improve the resiliency of the IT systems that support the organization’s most critical business processes, which reduces disasters that stem from flaws in systems and architectures. When the DR planning team performs the Business Impact Analysis (BIA) and establishes key metrics, including the Maximum Tolerable Downtime (MTD), Recovery Time Objective (RTO), and Recovery Point Objective (RPO), team members should improve the IT systems and infrastructure so your organization can meet these objectives. Here are the changes you can make:
Improve storage systems, including RAID, mirroring, replication, and better backups.
Improve servers, including greater server consistency.
Improve hardware, including redundant power supplies.
Establish or improve the change management process to better manage changes that you make to servers and infrastructure.
Establish or improve configuration management capabilities to better track changes that you make to systems and other components.
Establish or improve server cluster utilization.
Establish or improve power management systems, including Uninterruptible Power Supplies (UPSs) and generators.
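As a rough illustration of how time-related objectives can drive these decisions, the following Python sketch compares a system's current design against its MTD, RTO, and RPO. The class name, function name, and sample figures are assumptions made for this example; they don't come from any standard or tool.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class RecoveryObjectives:
    """Time-related objectives set during the BIA (hypothetical structure)."""
    mtd: timedelta  # Maximum Tolerable Downtime
    rto: timedelta  # Recovery Time Objective
    rpo: timedelta  # Recovery Point Objective


def check_design(objectives, backup_interval, measured_recovery_time):
    """Return a list of gaps between the current design and the objectives."""
    gaps = []
    # The recovery target has to fit inside the downtime the business tolerates.
    if objectives.rto > objectives.mtd:
        gaps.append("RTO exceeds MTD")
    # Worst-case data loss is roughly the time since the last backup or replica.
    if backup_interval > objectives.rpo:
        gaps.append("backup interval exceeds RPO")
    # Recovery time as observed in a test, compared with the target.
    if measured_recovery_time > objectives.rto:
        gaps.append("measured recovery time exceeds RTO")
    return gaps


# Illustrative values only: 24-hour MTD, 4-hour RTO, 1-hour RPO.
objectives = RecoveryObjectives(
    mtd=timedelta(hours=24),
    rto=timedelta(hours=4),
    rpo=timedelta(hours=1),
)
nightly_backups = check_design(objectives, timedelta(hours=24), timedelta(hours=8))
replicated = check_design(objectives, timedelta(minutes=15), timedelta(hours=2))
```

In this sketch, a system that relies on nightly backups and takes eight hours to rebuild is flagged on both the RPO and the RTO, which points toward changes such as replication or clustering. The replicated design reports no gaps.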
More resilient IT systems mean fewer disruptions to critical business processes when these events occur:
Hard drive failure: No problem, you have one of these contingencies:
• You have RAID technology now and will replace the hard drive during the next maintenance period.
• You have on-site spares.
• You can recover from backups.
• The other server in the cluster can take over operations.
Power supply failure: No problem, either:
• You have systems with redundant power supplies.
• The other server in the cluster can take over operations.
Short power outage: No problem, the UPS takes over instantaneously and can support critical systems and HVAC for up to 30 minutes.
Extended power outage: No problem, the UPS takes over instantaneously, and the electric generator starts within moments and can supply power for up to two days. After that, you can get deliveries of fuel, if needed.
Fire in the data center: Quite a problem, but you have options:
• You can recover data in your alternate processing center tomorrow.
• Servers in the alternate processing center can take over within moments.
Which option you choose depends on how quickly you need those recovery servers to be online.
Earthquake: Quite a problem, but you can set up one of two plans:
• You can recover data in your alternate processing center tomorrow.
• Servers in the alternate processing center can take over within moments.
Your disaster scenario: Your systems architecture provides a defined level of resilience, and your recovery procedures can guide you to a predictable recovery well within established timelines.
When these disruptive events occur, your organization can survive them because you developed response and recovery plans that get your critical systems back online in whatever timelines you establish.
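The power outage scenarios above amount to a simple tiered lookup. Here is a minimal Python sketch of that reasoning, using the 30-minute UPS runtime and two-day generator runtime from the examples. The constant and function names are hypothetical, and real runtimes depend on your equipment and load.

```python
from datetime import timedelta

# Runtimes taken from the example scenarios; substitute your own figures.
UPS_RUNTIME = timedelta(minutes=30)
GENERATOR_RUNTIME = timedelta(days=2)


def power_coverage(outage):
    """Describe which power sources an outage of the given length calls on."""
    if outage <= UPS_RUNTIME:
        return "UPS only"
    if outage <= GENERATOR_RUNTIME:
        return "UPS, then generator"
    # Beyond the stored fuel supply, the generator needs refueling.
    return "UPS, then generator, plus fuel deliveries"


short_outage = power_coverage(timedelta(minutes=10))
extended_outage = power_coverage(timedelta(hours=12))
prolonged_outage = power_coverage(timedelta(days=3))
```

Writing the tiers down this way makes it easier to spot which assumption (battery capacity, fuel on hand, delivery contracts) a given outage length depends on.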
If your organization has purchased one or more policies that insure the organization against the losses associated with disasters, the insurance company that issues the policy probably offers discounts if you take certain measures to reduce the likelihood and impact of common disaster scenarios.
Having an insurance policy against disasters is an essential part of the overall plan because your organization may need the infusion of cash such a policy provides to get you through the events that unfold in a disaster. But you need the preparation and resilience that you develop during the DR project just as much.
A vital part of the disaster recovery planning lifecycle involves testing disaster response and recovery procedures. When you involve all the key personnel in testing, you may find some of the results pleasantly surprising:
Simulation testing: Sequester the recovery team for a day or longer, and put them through their paces in a realistic disaster scenario. The participants perform the procedures, starting with disaster declaration and continuing with emergency communications, disaster assessment, and commencement of recovery operations. Throughout this exercise, you may witness the natural leadership abilities of one or more people as they go through the paces.
Parallel testing: In this challenging endeavor, the disaster response team follows procedures to get recovery systems up and running. This test isn’t easy: The team has to deal with problems and challenges that no one anticipated during walkthrough testing. Often, the entire team needs to cooperate to overcome these barriers. Stress and challenge provide opportunities for leadership: Natural leaders step up and help the entire team successfully complete its objectives.
Cutover testing: This is the most stressful DR test because recovery systems actually support critical business processes. In this DR test, failure or delay isn’t an option. If you have any born leaders on the team, you may see their leadership in action as the team pushes through the barriers together to get recovery systems running and supporting the business.
I cover DR testing in Chapter 10.
Disaster recovery planning has historically been an optional endeavor for organizations that develop the will to survive a disaster. Increasingly, disaster recovery planning has progressed from being a good idea to being required by standards and regulations.
These common standards require a measure of business continuity planning and disaster recovery planning:
PCI DSS (Payment Card Industry Data Security Standard): Version 1.1 of PCI DSS states in section 12.9.1, “Create the incident response plan to be implemented in the event of system compromise. Ensure the plan addresses, at a minimum, specific incident response procedures, business recovery and continuity procedures, data backup processes, roles and responsibilities, and communication and contact strategies (for example, informing the Acquirers and credit card associations).”
HIPAA (Health Insurance Portability and Accountability Act): HIPAA’s Security Rule contains many requirements to protect electronic protected health information (EPHI) from unauthorized access. HIPAA also requires the availability of EPHI (to health care workers when needed), including a disaster recovery plan to ensure its availability. Section 164.308(a)(7)(i) includes the following language:
“Standard: Contingency plan. Establish (and implement as needed) policies and procedures for responding to an emergency or other occurrence (for example, fire, vandalism, system failure, and natural disaster) that damages systems that contain electronic protected health information.”
Section 164.308(a)(7)(ii) specifies how you must carry out this requirement:
“Implementation specifications:
(A) Data backup plan (Required). Establish and implement procedures to create and maintain retrievable exact copies of electronic protected health information.
(B) Disaster recovery plan (Required). Establish (and implement as needed) procedures to restore any loss of data.
(C) Emergency mode operation plan (Required). Establish (and implement as needed) procedures to enable continuation of critical business processes for protection of the security of electronic protected health information while operating in emergency mode.
(D) Testing and revision procedures (Addressable). Implement procedures for periodic testing and revision of contingency plans.
(E) Applications and data criticality analysis (Addressable). Assess the relative criticality of specific applications and data in support of other contingency plan components.”
ISO 27001: Section A.14 of this internationally known standard for information security management contains five distinct requirements for the establishment and testing of business continuity and disaster recovery plans.
Other regulations dealing with data privacy and protection imply the need for disaster recovery planning as a means to protect information from corruption and loss. Over time, I believe national, regional, and state/provincial laws in many countries will include more requirements concerning disaster recovery planning.
You can find the PCI standard at www.pcisecuritystandards.org.
HIPAA is available online at www.cms.hhs.gov/SecurityStandard/Downloads/securityfinalrule.pdf.
ISO 27001 is available only by purchase. You can buy it from the International Organization for Standardization (ISO) at www.iso.org. Type 27001 into the Search text box and click Search.
Organizations are locked in an endless competitive struggle against their peers in the marketplace. Because more organizations are going global and adopting just-in-time processes that require continuous availability and service levels, organizations need to always be available and functioning.
Even though businesses need to be more and more available, disasters continue to occur, many of which you have no control over. Organizations that become more resilient do so through disaster recovery planning and business continuity planning.
Organizations that have mature DR plans can truly say to their customers, “We will be there for you, whenever you need us, even if a disaster strikes. You can rely on us — no matter what.” Thus, disaster recovery can become one more competitive differentiator. Organizations with strong DR plans can say, “We are better than our competitors because we have DR plans that will make us available in any circumstance.”