9

Planning and Reporting

Both were perplexed about what they had done and what they were to do. “Am I taken prisoner, or have I taken him prisoner?” each of them thought.

Leo Tolstoy, War and Peace

Once the decision has been made, close your ear even to the best counterargument: sign of a strong character. Thus an occasional will to stupidity.

Friedrich Nietzsche, Epigrams and Interludes

In Chapter 4 I suggested that IT spending cannot be considered in isolation; it affects other budget categories and depends on an organization’s strategic intentions. A marginal dollar spent on IT might result in more than a marginal dollar of revenue in one business unit or cost reduction in another. Dollars spent in IT might also have consequences for future revenues and costs. A marginal dollar to reduce technical debt or build some other kind of agility might create an option that has much more than a dollar of impact later on.

So how can an enterprise know how much to spend? How can it reduce costs, or make sure that every dollar it spends earns as much value as possible? And how can it measure and report on the results of its IT investments?

Reducing IT Costs

The main driver of IT spending should always be the company’s operational and strategic needs. This point is easily neglected when we think of IT as separate from the business, a painful and unavoidable cost. In that case it makes sense to consider reducing its total budget to drive efficiencies within IT. But as we pull IT deeper into the heart of the enterprise, our focus needs to shift. Instead of just cutting total costs, our goal is to eliminate waste so that every dollar spent on IT is as effective as possible, and then to adjust spending based on the organization’s business goals. The IT financial strategy for the digital age is to focus on leanness.

A good place to look for waste is in administrative overhead. With large initiatives that don’t produce product until late in the effort—the traditional approach—there is a lot of risk to manage, resulting in a lot of administrative overhead. Documents are produced, discussions conducted, statuses reported, decisions signed off on, steering groups convened, and fingernails bitten. This risk-mitigating administrative activity is expensive. And when budgets are tightened, this activity isn’t the part that is cut—after all, with a tight budget, we have even more need to make sure that every dollar is well spent.

But in an environment where leanness is prized, such activities have a painful impact on lead times and cost. I’m not suggesting that we stop overseeing our investments responsibly. What I am suggesting is that we make the oversight lean, that we remove any ineffective activities from it, and that we make sure the cost of oversight is appropriate given the amount of risk. And then we find ways to reduce risk, so we can reduce administrative overhead.

In Chapter 8: Bureaucracy and Culture, I showed how the DHS Analysis of Alternatives document was wasteful. It was only one of about one hundred documents that had to be prepared for each project. The thirteen required gate reviews included one to make sure there was an actual need for the project, another to verify that the plan was in place, one to make certain the requirements had been locked down, yet another to make sure the system design had been completed . . . you get the picture. Each document required signatures from a number of stakeholders, and every gate review had dozens of attendees.

Since much of our work was done by contractors, we also spent time negotiating contract terms to protect us against performance risk. Overly cautious stakeholders wanted applications to be tested in ways that were barely effective in finding defects. Program reviews were conducted by the GAO—and by the Inspector General, the Office of Management and Budget, the DHS CIO’s office, assorted consultants, and of course, the financial auditors.

These are expensive mechanisms, especially in terms of lead times. They’re the obvious costs of risk management—for that is their intention, right?—but there are more subtle costs as well. These include the frequent meetings to discuss the status of the project, its requirements, the project plan, when exactly to deploy the system, whether the stakeholders are ready, and whether the system has been tested thoroughly. When projects are going poorly, the length and frequency of these meetings increases, thereby worsening the problem.

Altogether, I’m fairly certain that in government IT we were spending about ten dollars to mitigate the risk of every one dollar of actual engineering spend.* That’s the government. But how much is your enterprise spending? What is the right proportion to spend on risk management, and how much effect should we let it have on lead times? And, more importantly, how much of it goes to mitigate the risk that IT won’t behave, as the contractor-control model fears?

There is a vicious circle in heavy-handed oversight. Large projects incur heavy oversight because they’re risky. The oversight then becomes so much of a burden that everyone tries to avoid it. More and more is added to each project so that employees won’t have to go through the oversight process more than once. The result is that each project becomes larger and riskier.

The best way to reduce the cost of oversight is to break this vicious circle—to conduct only small initiatives that return value quickly, thereby reducing risk and requiring less oversight. The risk-reducing practices I’ve discussed throughout this book—staging investments, using fast feedback cycles to test ideas, deploying automated controls—all are cost-effective ways to accomplish the same goals.

Because these oversight costs aren’t confined to the IT budget but spread across the entire enterprise, it can be difficult to identify the potential savings. It’s only by looking at IT delivery holistically, rather than looking at organizational budget categories, that we can spot the potential for eliminating waste.

One of the metrics I think is most important for gauging the efficiency of IT processes is the ratio of administrative overhead to actual creation work. Product creation and operation—engineering work—is what actually adds value for the enterprise. All other effort is just there to support it and can be thought of as overhead, whether it’s performed by the engineers or others. Not all overhead can be eliminated, but it’s the part of the IT budget that should be purposefully minimized.

Remember the sources of waste that come from managing IT as if it were an independent contractor. There is the overhead of negotiating schedules with IT; documenting requirements in a bulletproof, no-scope-creep-allowed way; managing change requests when requirements are discovered to be unsatisfactory; not to mention the effort IT spends justifying its value and administering chargeback models. Each is a cost that increases that important ratio of administrative overhead to value-creating engineering work. All are costs we’ve assumed we have to bear because of an old mental model of how enterprise IT works.

The largest area for savings is in removing feature bloat; that is, in following the Agile principle of “maximizing the amount of work not done.” In the model where IT is handed a set of requirements and told to deliver on it, IT cannot take part in reducing this cost. Remember that the amount of waste in unneeded features and features that don’t actually accomplish their goals can be tremendous—as much as two-thirds of the spending on IT delivery.†

The best way to eliminate feature bloat is with a Lean Startup approach:

  1. First build a bare-bones minimum viable product.
  2. Add to it incrementally, prioritizing the features that will contribute most to accomplishing the goal.
  3. Continue with step 2 until diminishing returns suggest that you stop.
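To make the loop concrete, here is a minimal sketch in Python. It is not anyone’s official algorithm; the feature names, value and cost estimates, and cutoff are all invented for illustration:

```python
from dataclasses import dataclass

@dataclass
class Feature:
    name: str
    est_value: float  # estimated contribution to the goal (invented units)
    est_cost: float   # estimated cost to deliver

def plan_increments(backlog: list[Feature], cutoff: float) -> list[Feature]:
    """Greedy sketch of steps 1-3: deliver the highest value-per-dollar
    features first; stop when marginal returns fall below the cutoff."""
    ranked = sorted(backlog, key=lambda f: f.est_value / f.est_cost, reverse=True)
    plan = []
    for feature in ranked:
        if feature.est_value / feature.est_cost < cutoff:
            break  # diminishing returns: stop here (step 3)
        plan.append(feature)
    return plan

backlog = [
    Feature("bare-bones MVP", est_value=100, est_cost=20),
    Feature("saved searches", est_value=30, est_cost=10),
    Feature("animated mascot", est_value=2, est_cost=15),
]
print([f.name for f in plan_increments(backlog, cutoff=1.0)])
# ['bare-bones MVP', 'saved searches']; the mascot never gets built
```

In practice, of course, the value estimates are re-tested against real user feedback after each increment; that feedback loop, not the arithmetic, is what makes the approach work.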

Jim Highsmith’s advice is:

Do less: cut out or cut down projects, cut out overhead that doesn’t deliver customer value, cut out or cut down features during release planning, cut out or cut down stories [requirements] during iteration planning, cut down work-in-process to improve throughput. At the same time, focus on delighting the customer by frequent delivery of value.1

A popular IT metric is the percentage of IT spending used to create or acquire new capabilities (“innovation”) versus the spending necessary for “keeping the lights on” (KTLO).‡ CIOs and IT leaders often cite this metric to explain the constraints they face: “Well, we couldn’t accomplish that much this year because we needed 70% of our budget just for keeping the lights on.”

KTLO is assumed to be something akin to waste; it is the remainder of the budget that is not thought to be adding business value. I have trouble accepting this. KTLO generally includes items such as maintenance of existing software systems, licensing or maintenance fees for using off-the-shelf software and hardware, network and telecom costs, and cloud computing charges. These are the costs of actually doing the company’s work—a good thing! Existing systems account for the company’s current revenue and operations—they run the company day-to-day. Paying for them is a joyous thing; we know they work (the company is running today) and we can use them as a springboard for new capabilities.

It’s true that we would like to manage many of the items in this bucket downward for a given level of operating capability. But in my experience, “maintaining” software often turns out to mean changing its capabilities—enhancements, improvements, or adjustments to keep up with changes in the business. These changes are really innovation work, and in fact some of the most cost-effective innovation available, since the enterprise builds incrementally on an existing system rather than creating a new one from scratch.

In truth, there is no such thing as “maintaining” a piece of software. After you buy a new car, you must continue to spend money so it continues to function as it did when you bought it. But software continues to function as purchased even without maintenance spending. The problem is we don’t want the software to continue to function as it did when purchased; we want it to change as the enterprise changes. The costs for those changes are often in the KTLO bucket, although they deserve to be in the innovation bucket.

The belief that we should maximize the non-KTLO percentage comes from thinking that KTLO spending is non-discretionary, something the company is stuck with. But that is a sunk cost fallacy. That we have bought a piece of software doesn’t mean we have to keep paying licensing or maintenance or support fees for it. We can stop using it. The decision to continue spending on it is a positive affirmation of its value to the company; where that value is absent, the spending should be discontinued, as in many cases it should be.§

Not only do we often build unnecessary features, but we also over-constrain solutions. In our requirements documents we used to say things like, “The system must be available 99.3% of the time,” or “Response time must be less than 2.4 seconds.” Or we established service level agreements (SLAs) with the IT department.¶ Yes, if we’re contracting work out, we need to have criteria for non-performance. But when we’re working with an internal IT organization, the calculus is different.

In theory, the 99.3% availability requirement is calculated in a business case showing how much it’ll cost the company if the system is down more than 0.7% of the time—how much business will be lost, how far operations will be set back. Even if we could have confidence in this calculation, it’s missing an important component: the marginal cost. The relevant calculation is the marginal cost of achieving 99.3% availability over whatever level we already have, versus the marginal benefit. In other words, if we get to 99.2% and discover that it’ll cost an additional $10 million to get to 99.3%, is it worth it? And what if it only costs an additional penny to get to 99.99%? Shouldn’t we “require” that?

When we’re planning an initiative it’s virtually impossible to calculate marginal costs in advance. Does anyone really know exactly what will be needed to achieve 99.3% availability rather than 99.2% or 99.4%? It’s only through continuous feedback and adaptation that we can make a good decision on marginal costs and benefits.
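A back-of-the-envelope sketch makes the marginal framing concrete. The per-hour value of uptime and the step costs below are invented, echoing the $10 million and one-penny examples above:

```python
HOURS_PER_YEAR = 24 * 365

def downtime_hours(availability: float) -> float:
    """Hours per year the system is down at a given availability level."""
    return (1 - availability) * HOURS_PER_YEAR

value_per_uptime_hour = 5_000.0  # invented business value of an hour of uptime
# (target availability, invented marginal cost of reaching it)
steps = [(0.993, 10_000_000.0), (0.9999, 0.01)]

current = 0.992
for target, marginal_cost in steps:
    hours_recovered = downtime_hours(current) - downtime_hours(target)
    marginal_benefit = hours_recovered * value_per_uptime_hour
    print(f"{current} -> {target}: benefit ${marginal_benefit:,.0f} "
          f"vs. cost ${marginal_cost:,.2f}")
    current = target
# 0.992 -> 0.993: benefit $43,800 vs. cost $10,000,000.00  (not worth it)
# 0.993 -> 0.9999: benefit $302,220 vs. cost $0.01         (obviously worth it)
```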

Cloud infrastructure is an interesting case for managing costs. If you’re not working in the cloud, you have to buy a fixed amount of infrastructure based on your projections of how much usage your system will get. But your usage will vary over time—all the more so if we’re talking about an internet-facing site. Tax software gets used a lot in tax season, but rarely otherwise.3 Fitness apps are used a lot in January, after New Year’s resolutions.** To avoid a fiasco like Healthcare.gov,†† you have to buy the amount of infrastructure needed to handle your peak usage, plus some. Most of that infrastructure will be underused most of the time, and there will still be a risk of larger peaks than you’ve planned for.

In the cloud, on the other hand, you pay for exactly the amount of infrastructure you need at any given moment. If your employees go home at night, you can turn off some of the servers that were running during the day and stop paying for them. If your software is only used during tax season, then you can turn off servers during the rest of the year.
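As one concrete illustration, AWS’s EC2 Auto Scaling API supports scheduled actions for exactly this pattern. This is only a sketch; the group name, sizes, and schedule are hypothetical:

```python
import boto3

autoscaling = boto3.client("autoscaling")

# Shrink the (hypothetical) internal-app server group to one instance
# at 8 p.m. UTC on weekdays, when employees have gone home...
autoscaling.put_scheduled_update_group_action(
    AutoScalingGroupName="internal-app-asg",
    ScheduledActionName="scale-down-at-night",
    Recurrence="0 20 * * 1-5",  # cron syntax, UTC
    MinSize=1, MaxSize=1, DesiredCapacity=1,
)

# ...and grow it back before the workday starts.
autoscaling.put_scheduled_update_group_action(
    AutoScalingGroupName="internal-app-asg",
    ScheduledActionName="scale-up-in-morning",
    Recurrence="0 6 * * 1-5",
    MinSize=2, MaxSize=10, DesiredCapacity=10,
)
```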

This is optimal from the cost standpoint, but seemingly less predictable from the budget standpoint. It isn’t really, because you can set spending limits in the cloud. But what if you’ve reached your budget and suddenly get an unexpected surge in usage? What if your digital service is doing better in the marketplace than you had budgeted for? Do you want to stick to your budget and have the system crash, or exceed your budget and serve the customers? Probably the latter.

The unpredictability doesn’t really come from using the cloud, but from the unpredictability of the market itself. At least you have the choice to respond to that unpredictability, whereas if you had bought fixed infrastructure, there would be no practical way to quickly expand it. Spending more on cloud infrastructure often just means that you’re successful with your product. Although there has been a lot of talk about how the cloud turns capital expenditures (CAPEX) into operational expenditures (OPEX), the more interesting point may be that it turns a fixed cost into a variable cost. When paying by the drink in the cloud, you don’t know how many drinks you’ll be taking. That is a good thing, as you want to be free to decide—responsibly, of course.
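A toy cost model shows the fixed-versus-variable point; every number in it is invented:

```python
# Monthly demand in server-months, with a tax-season-style spring peak.
monthly_demand = [3, 4, 10, 40, 12, 4, 3, 3, 3, 4, 5, 6]

OWNED_COST = 100    # invented amortized cost per owned server-month
CLOUD_COST = 150    # invented on-demand cost (note the per-unit premium)
HEADROOM = 1.25     # owned capacity must exceed the expected peak

# Owned: pay for peak capacity (plus headroom) all twelve months.
owned_total = max(monthly_demand) * HEADROOM * OWNED_COST * 12

# Cloud: pay only for what each month actually uses.
cloud_total = sum(monthly_demand) * CLOUD_COST

print(f"owned: ${owned_total:,.0f}  cloud: ${cloud_total:,.0f}")
# owned: $60,000  cloud: $14,550, despite the higher per-unit price
```

The variable-cost approach wins here not because cloud units are cheaper (in this sketch they are more expensive) but because you stop paying for idle peak capacity.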

The theme here is making incremental or marginal spending decisions. With an Agile project, you can make a decision at any time to continue past the originally planned cost if there are still value-adding features to be built. Or you can decide not to finish a planned scope of work if the marginal value of remaining work is too low. With staged investments, you can determine whether a marginal commitment is worthwhile. With cloud infrastructure, you can decide to increase or decrease infrastructure from where it is at any given moment, depending on which direction has marginal value. This is business agility: most of the interesting decisions are made at the margins.

Budgeting and Planning

Considering the IT budget independently from the rest of the business can lead to inefficiency. If IT’s budget is constrained so it can’t make a one hundred dollar investment that will reduce marketing’s costs by twice that amount, or if IT is constrained so that the internals of a system aren’t as flexible as they should be, then business value is destroyed. Once we agree that IT is no longer just a simple cost of doing business, its interrelationships with other parts of the enterprise become important in choosing spending levels.

How should you budget for IT? One way is to base your budget on empirical data, where historical spending levels represent the company’s actual lived experience. Let’s say that the company funded twenty cross-functional delivery teams last year, and they were allocated among its work streams. Based on last year’s experience, you know approximately what that amount of capacity can accomplish. Given your best guesses and the company’s objectives this year, do you need more capacity? You can make your best estimate and finance to that level, then IT can allocate its teams to work streams.
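The arithmetic of that empirical approach is simple. A sketch, with invented figures:

```python
# Last year's lived experience (all figures invented).
teams_last_year = 20
cost_per_team = 1_500_000           # fully loaded annual cost of one team
delivered_last_year = 240           # features, story points, or epics

throughput_per_team = delivered_last_year / teams_last_year   # 12 per team

# This year's objectives imply a rough demand estimate.
needed_this_year = 300
teams_needed = needed_this_year / throughput_per_team         # 25 teams
budget = teams_needed * cost_per_team

print(f"fund about {teams_needed:.0f} teams: ${budget:,.0f}")
# fund about 25 teams: $37,500,000
```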

Sometimes it may be possible to “do more with less,” but an empirical, data-based approach would suggest that you can do “the same amount with the same amount.” That said, IT and the rest of the business should constantly be looking for ways to make processes leaner and negotiate better prices with suppliers.

I’m a little uncomfortable with the idea of setting capacity, then managing demand for it, because I have trouble with IT constantly going to the rest of the business and saying, “Sorry, we can’t do everything you need. You’ll have to prioritize.” I hear this often and it doesn’t feel right to me. The organization should want what it has the capacity to do, and IT must generally be able to meet all of the company’s needs. I suspect this really is an artifact of a model where IT provides customer service to competing parts of the enterprise.

In Chapter 4 I proposed that IT’s efforts should be cascaded from high-level, enterprise-wide objectives. Demand for IT services should, in that case, be more or less what the enterprise decided at the outset it was going to try to accomplish and what it budgeted for. Excess demand would imply that managers are asking for things that aren’t necessary for meeting the objectives. If the enterprise doesn’t budget enough for IT to support its objectives, that’s a failure of leadership and the budgeting process.

I’ve also pointed out that when it comes to financing IT projects, traditional project management makes it unlikely that an initiative will stay within its planned costs. That’s because projects start with an immutable set of requirements and continue spending until the requirements are complete. If we really care about budget, we must hold the budget fixed and vary the requirements. The case for doing so is especially strong because we know that some of the requirements are less valuable than others, and some have no value at all. After all, that’s the way we treat budgets in most corners of the organization—if we don’t have enough money to do all the things we want to do, we just do less of them. But IT projects have been treated differently, because we have assumed that requirements are . . . required.

A vicious circle can occur in the annual planning process, as Boston Consulting Group (BCG) has pointed out.6 The uncertainty in the environment sometimes leads enterprises to plan more carefully, especially if plans have gone awry in the past. As a result, they start the planning process earlier. But forecasting earlier just extends the period they must forecast for, which increases the amount of uncertainty, which makes the plan less accurate, which causes them to spend even more time on planning the following year. Once the volatility of the digital world sets this dynamic in motion, the costs of planning increase and the plans become less useful.

Instead, BCG says “the overall focus for planning must move away from precise forecasting and toward more strategic, top-down ambition-setting that is validated with bottom-up business insight.”7 They suggest that companies plan in less detail so they can retain more flexibility, set targets top-down, and remain nimble and adaptable.

An annual budgeting process necessarily limits agility during the year. In Implementing Beyond Budgeting, Bjarte Bogsnes explains the alternative planning practices he implemented at companies such as Equinor, Scandinavia’s largest company. The problem with budgets, he says, is that they’re entirely focused on what you put in, not what you get back.8 In other words, the budgeting process puts a cap on what a division can spend, without reference to what return it would get from any marginal spend—which could be substantial. “What we want is not necessarily the lowest possible cost level,” he says. “What we want is the optimal cost level, the one that maximizes value creation.”9

The problem isn’t just that budgets prevent managers from spending when they should, but that they also allow managers to keep spending until their budget is exhausted, whether the spending is worthwhile or not. The budget number is viewed as an entitlement, and it’s rare for a manager to “return” unused money, especially since the unspent money will probably be taken out of the following year’s budget.

The Beyond Budgeting movement, whose ideas are practiced worldwide by companies large and small across a range of industries, recommends a number of ways to make the budgeting process more agile, including planning on shorter horizons, doing rolling planning through the course of the year, and fostering extreme transparency into budgets and spending. The idea is to give the company agility to deal with uncertainty and change, while at the same time setting motivating targets, providing good forecasts, and effectively allocating scarce resources.10

I’ve pointed out the persistent confusion in IT between estimates and commitments, or between forecasts and execution plans. Fundamentally, a budget is a way of turning forecasts into targets. In an environment of uncertainty, we have to ask, “How reliable are our forecasts that reach eighteen months into the future?”

Eric Ries offers an amusing account of forecasting at a startup:

I found out that some investors actually believed the forecast. They would even try to use it as a tool of accountability—just like Alfred Sloan. If a startup failed to match the numbers in the original business plan, the investors would take this as a sign of poor execution. As an entrepreneur, I found this reaction baffling. Didn’t they know that those numbers were entirely made up?11

It might be an exaggeration to say that the numbers are made up, but his point is that what good investors look for in a startup is not meeting forecasted targets, which are created under conditions of great uncertainty, but adapting well to market conditions. A startup is well managed if it conducts good tests and adapts to what it learns; it’s a good investment if it proves that there is customer value and a ready market.

We try to use budgets to control spending based on our forecasts. But spending in the digital world isn’t always amenable to that sort of control; it should instead be determined by actual, evolving circumstances. Control can still be established through continuous transparency and adaptation. You can’t have it both ways—encouraging employees to innovate and adapt while also asking them to make a plan and stick to it. Clayton Christensen, in The Innovator’s Dilemma, explains that this is why enterprises miss out on innovation opportunities:

Companies whose investment processes demand quantification of market sizes and financial returns before they can enter a market get paralyzed or make serious mistakes when faced with disruptive technologies. They demand market data when none exists and make judgments based upon financial projections when neither revenues nor costs can, in fact, be known. Using planning and marketing techniques that were developed to manage sustaining technologies in the very different context of disruptive ones is an exercise in flapping wings.12

CAPEX and OPEX

In that old world, we thought of an IT system as a product—something you either built or bought, and rolled out for use as a whole. It fit neatly into our mental model of what a capital asset looked like. But today we build systems incrementally and put each piece into use as it’s completed, possibly on the order of a hundred times a day. Determining when a system is “finished” has also become difficult; IT systems continue to evolve throughout their lifetimes, as we maintain a continuous backlog of further work and continue drawing from it. In theory, an IT system can survive forever—if the company keeps making changes to it, replacing it piece by piece, changing its functionality as the business changes, and changing its internals as necessary to fight entropy.

It is becoming less and less clear what that asset is that we are capitalizing. Its boundaries are now difficult to identify, since IT systems are increasingly made by combining small components—called microservices—each of which is reusable and forms part of other systems as well. Its infrastructure can be difficult to identify, since it may be ephemeral, existing only in the cloud, where it is obtained on demand from a cloud provider and can be changed, supplemented, reduced, or disposed of as the company’s needs change. The system might, moreover, consist of services provided by the cloud on the same on-demand basis, such as artificial intelligence, analytics, or even call center capabilities.

All of these changes have occurred because they are improvements. That systems are amorphous and constantly changing helps ensure they support the business as it evolves. The ability to consume infrastructure and services on demand reduces costs and risks. Breaking down systems into microservices saves money, increases reliability, and speeds time to market. And the fact that systems are delivered incrementally and continuously reduces the risk inherent in large projects and also lets the company harvest the value of IT work more quickly.

But it does complicate the picture from an accounting point of view, doesn’t it? Our old model of capitalizing systems for internal use or external sale fits awkwardly with these new developments. We have always expensed our costs for establishing feasibility of a new system, capitalized the costs of building or buying it, then depreciated that asset and expensed the maintenance costs associated with it. How does that fit with today’s IT practices?

I’m not an accountant and cannot offer any accounting guidance. I can only wonder how these developments will affect the accounting treatment of IT costs. Some questions suggest themselves: What exactly is the asset when a system is assembled from shared microservices and ephemeral, rented cloud services? When is a continuously delivered system “placed in service”? And at what point does a steady stream of small changes stop being maintenance to be expensed and become a new asset to be capitalized?

Measuring Success

If we’re no longer assessing our IT department by its cost and schedule adherence, cost reduction, IT budget as a percentage of revenue, customer satisfaction, percentage of KTLO spending, or any of the other metrics I’ve rejected throughout this book, then how do we assess it?

Primarily, IT’s success is measured by that of the company. Since IT is meant to support business initiatives and operations, it’s successful to the extent that it supports them well. If business objectives are cascaded down to multi-functional teams that include both business and IT people—as I’ve suggested—then it’s the success of these teams that really matters. That’s hard to break down into its IT component. And since IT is responsible for maintaining the agility of its systems and processes so the company can be agile in the future, it’s the long-term success of the company that reflects IT’s performance.

When we look at improving IT processes, the important metric to focus on is lead time from concept to delivery. Notice that much of that is external to IT or at the boundary where IT meets the rest of the business. Then again, why should that matter, since it’s internal to the business as a whole and IT is an integral part of the business?

Lead time is a very important metric: reducing it forces waste out of the delivery process, shrinks batch sizes and therefore risk, and speeds up the feedback that lets the company adapt.

Lead time tells us how efficient a company is at processing each requirement once it’s identified. It’s a replacement for, and an improvement over, on-time delivery as a metric. It applies not just to software, but to IT capability delivery in general—for example, provisioning a laptop for a new employee, resetting a forgotten password, or coming to the rescue when audiovisual equipment isn’t working in a conference room.
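Measuring lead time requires only two timestamps per work item: when the need was identified and when it was delivered. A minimal sketch with made-up dates:

```python
from datetime import datetime
from statistics import median

# (need identified, delivered to users): hypothetical work items
items = [
    (datetime(2024, 1, 2), datetime(2024, 1, 9)),
    (datetime(2024, 1, 5), datetime(2024, 1, 8)),
    (datetime(2024, 1, 10), datetime(2024, 2, 14)),
]

lead_times = [(done - start).days for start, done in items]
print(f"median lead time: {median(lead_times)} days; worst: {max(lead_times)} days")
# median lead time: 7 days; worst: 35 days
```

The outliers are often the most informative: a thirty-five-day item often spent much of that time waiting in queues rather than being worked on.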

Lead time is speed, and speed is essential in an environment of uncertainty.

The IT performance construct created by DevOps Research and Assessment (DORA) is an excellent way to assess IT performance, especially since DORA showed it to predict business success. Their construct includes metrics that are easily measurable, very much actionable, and relevant to success in the digital environment. DORA’s software delivery and operation (SDO) construct includes four metrics: deployment frequency, lead time for changes, time to restore service, and change failure rate.

Availability—the percentage of time IT systems are operational and available to be used—is another important metric studied by DORA in its latest report. Availability standards are now very high, as customers and employees have become used to the “always on” availability of services from such providers as Amazon and Google.
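Two of these metrics can be derived from a simple deployment log; the log format and figures here are made up:

```python
# (date, did_this_deployment_cause_an_incident): hypothetical log
deployments = [
    ("2024-03-01", False), ("2024-03-01", False), ("2024-03-02", True),
    ("2024-03-03", False), ("2024-03-04", False), ("2024-03-05", False),
]

days_observed = 5
deploy_frequency = len(deployments) / days_observed
change_failure_rate = sum(failed for _, failed in deployments) / len(deployments)

print(f"{deploy_frequency:.1f} deploys/day, "
      f"{change_failure_rate:.0%} change failure rate")
# 1.2 deploys/day, 17% change failure rate
```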

Other metrics could be useful but may be more difficult and more expensive to measure. Escaped defects are bugs that make their way into production and affect users. Since code is vetted by automated tests throughout its lifespan, it is the defects those tests fail to catch that pose a problem; they are also the ones that result in costly rework.

Process agility is a more abstract performance metric. If the company decided to stop producing bobbleheads and start producing toenail clippers tomorrow, how quickly could IT adjust? If IT is working on project A when project B suddenly becomes a higher priority, how quickly could it change focus and how much waste would there be?

Then there is the amount of technical debt—how much gunk there is in the IT asset that will slow down future work.

These metrics serve as a basis for continuous improvement. They’re not intended for assessing any one person’s performance or that of a particular group. For one thing, there is no correct value for any of the metrics. For another, they can be gamed if used for performance measurement. For example, the number of deployments per day can be increased by deploying increasingly smaller changes (which, fortunately, is a desirable behavior anyway!). There are simply too many interacting factors for these metrics to be useful in allocating blame or awarding praise.

How then can you measure your IT organization’s productivity, or that of, say, a given software developer within that organization? In the old waterfall contractor-control model, on-time performance was considered a productivity measure. I hope I’ve convinced you that it is not—if anything, it’s a measure of how stubborn the technologists were in sticking to the original plan and how much padding existed in the original estimates. But Agile and DevOps approaches seem to remove even that as a way to gauge productivity.

Jim Highsmith is on target when he says, “Productivity measures in general make little sense in knowledge work.”13 Designing good software and infrastructure, solving business problems, and creating IT strategies are knowledge work.

That doesn’t mean they can’t be done poorly or too slowly. Highsmith’s comment aside, how do you measure the performance of other knowledge workers in the enterprise? In fact, how do you measure the performance of most employees? Generally, it’s not by measuring delivery against a pre-planned schedule. Managers evaluate the performance of their employees by staying close enough to their activities to observe their productivity, quality of work, quality of communications, and so on. The same holds in IT—managers manage their employees.

Ultimately, the performance of the IT organization is reflected in a company’s business results. This should be what motivates and incentivizes the employees of IT. If it isn’t, then any improvement in lead times, escaped defects, availability, and other measures I listed are beside the point; even if these show a good trend, the organization won’t become more successful. Notice—it’s business results that should motivate IT, not any internal IT metrics. You could almost take that to be the definition of a digital enterprise.

* I made up that number, but it is consistent with my intuitions and seems fair to others in the government I have mentioned it to.

† Based not just on the Standish study, which I have my doubts about, but on the Microsoft study that I referenced in Chapter 1 that showed that one third of ideas don’t affect the metric they were intended to improve, one third make it worse, and only one third succeed.

‡ According to Peter Weill and Jeanne W. Ross, the average KTLO spend in 2007 was 71% of the IT budget.2

§ For the philosophy geeks out there, you might say that the enterprise continuously chooses what it wants to be, IT-wise. See Sartre, Being and Nothingness, and take the sunk cost fallacy as analogous to Sartre’s idea of bad faith.

¶ I’m not saying don’t do it, because many people I respect are practitioners of the ITIL framework, which considers service level agreements (SLAs) between the business and IT to be important. They are wrong, but I still respect them.

** Yes, really. Ask UnderArmour about the use of their Fitness Connect mobile app, for example.4

†† The Obama Administration’s signature health insurance initiative was set back after its launch in October 2013 when usage was five times the expected volume.5