Creating a successful DR plan
Understanding why you’re never done with a DR project
Trying your luck (or not)
You need to understand many factors that lead to a successful project and disaster recovery plan before you begin a DR project. I start with executive sponsorship, which is probably the most vital factor. Then I cover other up-front formalities so your organization can understand the level of effort you need from the DR planning team to complete a successful DR project.
As go its leaders, so goes the organization.
An organization undertakes disaster recovery planning because company shareholders want the organization to survive through difficult times, including disasters that threaten its very existence. A DR project needs executive sponsorship in two key areas:
Prioritization of key subject matter experts: Disaster recovery planning requires the best and brightest minds in the organization. You need the employees who are the most familiar with business operations to perform the Business Impact Analysis and risk analysis, as well as develop disaster recovery procedures. They also need to perform walkthroughs, simulations, and parallel/cutover testing.
Without executive sponsorship, these individuals are pulled in too many directions at the same time, which can threaten to stall the entire effort.
Spending priorities: You need executive sponsorship to ensure that you can improve IT systems to support established Recovery Time Objectives (RTOs) and Recovery Point Objectives (RPOs). You may need significant investments in IT systems, infrastructure, and software.
Before a DR project can get under way, you need to define the precise scope of the project. Spell out exactly which business processes are in and which are out.
You can best define the scope and other key points of your DR project in a charter document. A charter is a formal document that defines the project as follows:
Project definition: A statement, usually not exceeding a paragraph or two, that describes the project at a high level. A generic definition for a DR project might be something like, “Determine the priority of key business processes, including their Maximum Tolerable Downtime (MTD), Recovery Time Objective (RTO), and Recovery Point Objective (RPO). Define and fund necessary capital improvements in IT systems, and develop and test recovery procedures.”
Executive sponsors: The names of the executives who are sponsoring the project. These individuals are responsible for allocating resources (staff and budget) to support the DR project. They commit to its completion according to the key milestones that the charter defines.
Project objectives: The desired outcomes from the project.
Project scope: Defines which parts of the business you include in the project and which parts you don’t. When you complete this project successfully, you can start new DR projects that include other parts of the business.
Key milestones: Dates by which you want to accomplish key milestones. Here are some sample dates:
• Sept 30, 2009: Business Impact Analysis completed.
• March 30, 2010: Recovery procedures completed.
• June 30, 2010: Recovery procedures tested.
Key responsibilities: Key individuals who have specific responsibilities through the entire project.
Sources of funding: There’s no such thing as a free lunch. Where do you plan to get the money for this DR project?
Signatures: The signatures of the executive sponsors, as well as other key individuals named in the charter. Include the department heads from all departments whose resources you need for the project.
Make the charter a public document within the organization. Announce the DR project and make the charter available for reference.
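If you keep the charter in a tracking tool as well as on paper, its parts map naturally onto a small record. The following is only a rough sketch in Python: the class name, field names, signer titles, and scope entries are all hypothetical, and only the milestone dates come from the sample list above. It shows two checks a project manager might want: which required signatures are still missing, and which milestone comes next.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Charter:
    """Hypothetical record of a DR project charter (field names are assumptions)."""
    definition: str
    executive_sponsors: list[str]
    in_scope: list[str]
    out_of_scope: list[str]
    milestones: dict[str, date]       # milestone name -> target date
    required_signers: set[str]        # sponsors plus affected department heads
    signatures: set[str] = field(default_factory=set)

    def missing_signatures(self) -> list[str]:
        """Required signers who have not signed yet, in alphabetical order."""
        return sorted(self.required_signers - self.signatures)

    def next_milestone(self, today: date):
        """Name of the earliest milestone on or after today, or None."""
        upcoming = [(d, name) for name, d in self.milestones.items() if d >= today]
        return min(upcoming)[1] if upcoming else None


charter = Charter(
    definition="Prioritize key business processes; develop and test recovery procedures.",
    executive_sponsors=["CIO"],
    in_scope=["Order processing"],            # illustrative only
    out_of_scope=["Internal training portal"],  # illustrative only
    milestones={
        "Business Impact Analysis completed": date(2009, 9, 30),
        "Recovery procedures completed": date(2010, 3, 30),
        "Recovery procedures tested": date(2010, 6, 30),
    },
    required_signers={"CIO", "Head of Finance", "Head of Facilities"},
    signatures={"CIO"},
)

print(charter.missing_signatures())
print(charter.next_milestone(date(2010, 1, 15)))
```

A record like this doesn't replace the signed document. It just makes gaps, such as an unsigned department head, easy to spot.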
A DR project without people to work towards its objectives isn’t a DR project at all, but only an idea. Ideas — even good ones — can’t save the business if disaster strikes, and the organization can’t benefit if the DR project doesn’t get any resources.
Management needs to make specific commitments of specific resources and adjust certain named individuals’ time and priorities. For instance, management may need to define a certain number of hours per week, or a percentage of hours worked, that DR team members devote to the project so it makes the desired amount of forward progress. You could define the priorities for a system engineer’s time like this:
First priority: Critical system outages
Second priority: Top-priority service requests
Third priority: Disaster recovery procedure development
Fourth priority: Medium-priority service requests
The preceding list may be a little simplistic, but these kinds of priorities — in writing — help staff members make task decisions without constant management guidance.
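Written priorities like these are simple enough to express as a lookup table. Here is a minimal sketch in Python, assuming each work item carries one of four category labels. The labels, the `Task` class, and the sample tasks are hypothetical; the ranking itself comes from the list above.

```python
from dataclasses import dataclass

# Rank for each category of work, taken from the sample list above.
# Lower numbers are handled first. The category labels are assumptions.
PRIORITY = {
    "critical_outage": 1,
    "top_priority_request": 2,
    "dr_procedure_development": 3,
    "medium_priority_request": 4,
}


@dataclass
class Task:
    title: str
    kind: str  # must be one of the keys in PRIORITY


def order_tasks(tasks):
    """Sort tasks by written priority; tasks of equal rank keep their order."""
    return sorted(tasks, key=lambda t: PRIORITY[t.kind])


queue = [
    Task("Reset test-lab password", "medium_priority_request"),
    Task("Draft database recovery procedure", "dr_procedure_development"),
    Task("Mail server down", "critical_outage"),
]

for task in order_tasks(queue):
    print(task.title)
```

The point isn't the code. When DR work has an explicit rank, it no longer loses out by default to every routine request.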
If staff members account for their time in weekly status reports, they should include time spent on the DR project so you can track the actual effort required to sustain the project.
The ultimate success of your DR project rests in the accuracy and completeness of all the disaster recovery procedures. Those staff members and managers who are most familiar with critical business processes and the IT systems that support them need to develop and test the DR procedures.
If you place recovery procedure development in less capable hands, you end up with lower-quality recovery procedures that take more time to review and improve. The project then takes longer to reach the point at which you have adequate recovery plans.
Management needs to determine when to bring in consultants, primarily for their expertise, but also for their ability to augment the DR plan development effort.
You need to plan any sizeable project in detail. By any measure, a DR project is a sizeable project: It requires participation from many people in different departments, and it needs to be successful.
You must identify every task required to complete the project, and identify and quantify all necessary resources — who, how much time, and how much funding you need, as well as dependencies between tasks. Only after you manage these details can you track the DR project week by week and manage it as a real project.
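To make task dependencies concrete, here is a small sketch that uses Python's standard `graphlib` module to put tasks in an order that respects their prerequisites and to total the estimated effort. The task names, owners, and hour estimates are invented for illustration. A real plan has far more tasks and usually lives in a project management tool.

```python
from graphlib import TopologicalSorter

# Hypothetical plan: task -> (owner, estimated hours, prerequisite tasks)
tasks = {
    "Business Impact Analysis": ("business analysts", 120, []),
    "Risk analysis": ("security team", 80, []),
    "Set RTO/RPO targets": ("DR lead", 40,
                            ["Business Impact Analysis", "Risk analysis"]),
    "Write recovery procedures": ("system engineers", 200,
                                  ["Set RTO/RPO targets"]),
    "Test recovery procedures": ("recovery team", 100,
                                 ["Write recovery procedures"]),
}


def schedule(plan):
    """Return task names in an order where prerequisites always come first.

    Raises graphlib.CycleError if the tasks depend on each other in a loop.
    """
    graph = {name: set(deps) for name, (_, _, deps) in plan.items()}
    return list(TopologicalSorter(graph).static_order())


def total_hours(plan):
    """Sum the estimated hours across all tasks."""
    return sum(hours for _, hours, _ in plan.values())


for name in schedule(tasks):
    owner, hours, _ = tasks[name]
    print(f"{name}: {owner}, {hours} h")
print("Total estimated effort:", total_hours(tasks), "hours")
```

Tasks with no dependency between them, such as the two analyses here, may come out in either order. That also tells you they can run in parallel.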
If you launch a DR project before you develop a plan, executive management can’t put much faith in the stated milestones. Don’t put a stake in the ground and declare that you’ll complete the project by a specific date before you know whether that date is realistic. If you do, you risk compromising the project’s outcome: The result is either of poor quality or late.
Remember this saying: Good, fast, or cheap — pick any two.
A DR project requires support not only from IT, but also from line managers, middle managers, and executive managers in the departments that operate critical processes. Other departments that may play a part in DR planning include
Project Management Office (PMO), if such a department exists and manages enterprise-wide projects
Human Resources (HR)
Facilities
Finance or Accounting
Legal
Security
External Affairs, or whoever’s responsible for communications to customers or shareholders
Define support from every key department in writing and include those definitions in the DR project charter described in the section “Well-Defined Scope,” earlier in this chapter, and in Chapter 1. Also require the head of every department whose resources the DR project needs to sign the charter.
A disaster recovery plan that you haven’t tested is worth only the paper that you write it on (or the hard drive that you store it on). Until you thoroughly test all the recovery procedures, the organization shouldn’t expect those procedures to save it from ruin if a disaster strikes.
You need to perform five types of testing on all disaster recovery procedures:
Paper: An individual reads through a recovery procedure and makes any annotations or suggested corrections.
Walkthrough: A recovery team reviews the recovery procedure, step by step. Issues and discussions fill the day.
Simulation: A recovery team walks through a scripted simulation, discussing assessment and recovery procedures so they can determine whether a disaster recovery plan is reasonable.
Parallel: A recovery team tests recovery procedures by actually building or setting up recovery systems. The team also performs test transactions on the systems to see how well the procedures work and whether team members can actually build and operate the recovery systems.
Cutover: A recovery team performs a full cutover, in which recovery systems that the team builds or prepares on short notice support live business processes. This is the ultimate test of a DR plan.
I cover testing fully in Chapter 10.
You aren’t finished with the DR project when you successfully complete the last test. That’s only the end of the first trip around the lifecycle. You must perpetually commit to the following DR plan activities:
Periodic testing: Test all DR procedures regularly, according to a schedule that fits the risks associated with the individual business processes being supported. For example, life-support processes probably deserve weekly or monthly cutover testing, but you can test less critical processes less often. This testing process includes not only repeated walkthroughs, but also scheduled simulations, parallel tests, and cutover tests. You should perform a parallel or cutover test at least once per year.
Periodic review: Have subject matter experts review disaster recovery procedures two to four times each year to ensure that those procedures are still relevant and accurate. Review emergency contact lists monthly.
Periodic revisions: Periodic testing and review indicate when you need to update recovery plans and emergency contact lists.
Business Impact Analysis and risk analysis review: Review the BIA and risk analysis documents at least once per year to ensure that key objectives, such as the Recovery Time Objective (RTO) and Recovery Point Objective (RPO), are still adequate.
Integration into business processes: Business activities such as system upgrades, mergers and acquisitions, and new product or service launches should include routine reviews of BIA, risk analysis, and other DR documents to ensure that they remain current and relevant.
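A recurring schedule like this is easy to lose track of, so some teams keep a simple tracker. The sketch below is one possible approach in Python. The activity names and the exact day counts are my own assumptions, chosen to approximate the frequencies described above (yearly parallel or cutover tests, procedure reviews at least twice a year, monthly contact list reviews, yearly BIA and risk analysis reviews).

```python
from datetime import date

# Longest allowed gap, in days, between repetitions of each activity.
# The day counts are assumptions that approximate the frequencies above.
MAX_INTERVAL_DAYS = {
    "parallel_or_cutover_test": 365,
    "procedure_review": 182,       # roughly twice per year
    "contact_list_review": 31,     # roughly monthly
    "bia_and_risk_review": 365,
}


def overdue(last_done, today):
    """Return, alphabetically, activities never done or done too long ago."""
    return sorted(
        name
        for name, limit in MAX_INTERVAL_DAYS.items()
        if name not in last_done or (today - last_done[name]).days > limit
    )


# Hypothetical completion dates; the BIA review has never been done.
last_done = {
    "parallel_or_cutover_test": date(2010, 6, 30),
    "procedure_review": date(2010, 3, 30),
    "contact_list_review": date(2010, 9, 1),
}

print(overdue(last_done, date(2010, 10, 15)))
```

Processes with higher risk would get shorter intervals than the ones shown here. The table is where you'd record that decision.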
Don’t make DR planning an island that you visit only now and again. If your organization considers disaster recovery planning a one-off or overlay, it’s short-changing itself and missing opportunities to improve its DR plans. DR must become a way of life for many in the organization.
Several business processes should automatically include review of and possible revisions to the BIA (Business Impact Analysis), risk analysis, and disaster recovery procedures:
Major application upgrade or migration: Any time you upgrade an IT application that supports a critical business process or migrate that application to a new platform, you should, at the very least, review and update disaster recovery procedures, as needed. During a migration to a new IT application, you may also need to revise the BIA and risk analysis. A major upgrade or migration project should include review and revisions to DR documents; otherwise, the organization may wait far too long to update these documents. If a disaster occurs before you update your DR documents, you drastically reduce your ability to recover those systems.
Business relocation: If the business changes locations or adds more office space in the same or a different city, you need to revisit recovery plans to make sure they’re still valid.
Merger or acquisition: If another organization acquires your organization, or your organization acquires or merges with another organization, the fundamental mission and financial profile of the organization change considerably. You need to conduct a top-down revision to the entire DR plan, starting with the BIA and risk analysis, and continuing with revisions to all recovery procedures.
Changing market conditions: Fundamental changes in the market, such as the entry or exit of a major competitor or a change in the structure of the market, call for a reassessment of the Business Impact Analysis and risk assessment. You may also need to make revisions to recovery procedures.
New service or product launch: In addition to requiring disaster recovery capabilities, a new service or product launch may reposition other business processes, making some more important and others less important. You may have to deal with fundamental changes in your organization’s investment in recovery capabilities.
Change in senior or executive management: Changes in upper management sometimes lead to strategic changes in direction. Document any such changes and analyze their impact on the existing BIA and risk assessments.
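One way to make these reviews automatic is to attach a checklist of DR documents to each kind of business event. The following Python sketch is a simplification I've made of the list above, not a complete rule set. The event labels are hypothetical, and the table records only the documents the text names outright. Anything described as "may also need" stays a judgment call for the DR team.

```python
# Documents each business event should prompt a review of.
# Event labels are assumptions; the mapping simplifies the list above.
REVIEW_TRIGGERS = {
    "application_upgrade_or_migration": ["recovery procedures"],
    "business_relocation": ["recovery procedures"],
    "merger_or_acquisition": ["BIA", "risk analysis", "recovery procedures"],
    "changing_market_conditions": ["BIA", "risk analysis"],
    "new_service_or_product_launch": ["BIA", "risk analysis"],
    "management_change": ["BIA", "risk analysis"],
}


def documents_to_review(events):
    """Combine the checklists for several events, without duplicates."""
    combined = []
    for event in events:
        for doc in REVIEW_TRIGGERS[event]:
            if doc not in combined:
                combined.append(doc)
    return combined


print(documents_to_review(["business_relocation", "management_change"]))
```

If a project template or change-management form asks which of these events applies, the resulting checklist can travel with the project instead of depending on someone's memory.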
Wiktionary defines luck as, “Something that happens to someone by chance, a chance occurrence.”
In disaster recovery planning, luck comes down to whether your organization experiences a disaster that threatens its survival. Your organization is pretty lucky if no disaster occurs, at least until you complete your DR plan.
Seneca, the Roman dramatist, supposedly said, “Luck is what happens when preparation meets opportunity.” The preparation involves your DR planning, which gets you ready to face the opportunity of a disaster. And yes, I do mean opportunity. Disasters and other difficulties give you opportunities for greatness.
This quote from Edna Mode in the Disney/Pixar film The Incredibles says it all: “Luck favors the prepared, darling.”