Figure 6.1 visualizes the production chain of an application service delivered to end users: software is hosted on physical infrastructure that is powered by electricity; the application service is monitored by the application service provider's OAMP systems and staff; and the resulting service is delivered to end users across the Internet.
Figure 6.2 highlights how application service providers are challenged to keep sufficient inventory of service capacity online to meet instantaneous customer demand without wasting money on carrying excess inventory of compute, memory, storage, and infrastructure hosting unneeded online service capacity. Note the similarity of the application capacity management problem to the newsvendor problem: instead of deciding how many of each day's newspapers to purchase, the application service provider must decide how much application capacity to hold online for the next few minutes.
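The newsvendor parallel can be made concrete with the classic critical-fractile calculation, which balances the cost of carrying idle capacity against the cost of turning demand away. The sketch below is illustrative only; the cost figures and the normal demand distribution are assumptions, not numbers from this chapter.

```python
from statistics import NormalDist

# Newsvendor model: choose capacity Q to balance the cost of carrying
# unused capacity (overage) against the cost of turning demand away
# (underage).  All numbers below are illustrative assumptions.
cost_overage = 1.0    # cost of one unit of idle online capacity
cost_underage = 4.0   # lost revenue/goodwill per unit of unserved demand

# The optimal service level is the critical fractile.
critical_fractile = cost_underage / (cost_underage + cost_overage)  # 0.8

# Assume demand over the planning horizon is approximately normal.
demand = NormalDist(mu=1000.0, sigma=100.0)  # units of service capacity

optimal_capacity = demand.inv_cdf(critical_fractile)
print(f"target service level: {critical_fractile:.0%}")
print(f"capacity to hold online: {optimal_capacity:.0f} units")
```

Because the underage cost here is four times the overage cost, the model holds capacity above mean demand; if idle capacity were the costlier error, the critical fractile would fall below 50% and the model would deliberately under-provision.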
This chapter analyzes application capacity management as an inventory management problem to gain deeper understanding of the fundamental business, operational, and technical considerations in three parts:
Figure 6.1 visualizes the application service production chain. Walking “up” the service production chain from end user consumption shows the following links:
End users generally prefer applications that are instantly available on demand (e.g., electric light available at the flip of a switch) rather than, say, pizza that they expect to wait 15 to 30 minutes for. After all, how many seconds will you wait for a webpage to load before simply surfing to some competitor's site? Characteristics of traditional application deployment models led application service providers to adopt a supply-driven model in which they would initially deploy a large supply (think inventory) of application capacity and wait for demand to materialize. This supply-driven (a.k.a., “push”) deployment model was often sensible because:
Thus, application service providers would often engineer to peak forecast demand plus a safety margin, and then focus on minimizing the cost per unit of service capacity across that peak-plus-margin deployment. After bringing all of that application capacity into service, the application service provider would often struggle, unsuccessfully, to generate enough customer demand to utilize all of the deployed service capacity. Traditional application deployment on dedicated hardware often made it commercially impractical to salvage excess capacity and repurpose it for some other application, so once installed, excess application capacity was often stranded as a sunk cost to the application service provider. If, however, the application service provider underestimated application demand and thus deployed too little traditional capacity, then the lead time of weeks or months to grow capacity meant that unforecast demand beyond deployed capacity had to be made to wait or be turned away outright, thereby creating an opportunity for competitors to seize market share.
Thus, application service providers would routinely make a high-stakes business gamble about the level of demand for an application several months or quarters in advance and then invest to deploy that much capacity (often plus a safety margin). If they guessed too low, they risked turning away business and market share; if they guessed too high, they carried capacity that would never generate revenue. Application service providers thus had the traditional supply-driven inventory problem: they piled capacity high so they could sell it cheap, but if customer demand did not materialize then they were left with a potentially huge pile of unsold inventory to essentially scrap. As a result, application service providers were reluctant to deploy new and unproven services and applications because the upfront investments in capacity might be lost if the service was not successful.
The essential characteristics of cloud computing largely nullify the factors from the previous section that traditionally drove application service providers to take a supply-driven approach to their inventory of service capacity. Consider those factors one at a time:
After discarding these historic assumptions, application service providers can abandon the capital intensive and commercially risky supply-driven capacity/inventory deployment model in favor of a more agile demand-driven capacity/inventory model. Rather than engineering for a peak long-term forecast demand plus a safety margin, a demand-driven model engineers capacity for near-term forecast of cyclical demand, plus a safety margin for random variations, forecasting errors, capacity fulfillment issues, and other contingencies. As cyclical (e.g., daily, weekly, monthly) application demand grows (and eventually shrinks), the application's online capacity grows (and eventually shrinks) with it. Instead of focusing on the lowest cost per unit of application capacity in inventory (which may or may not ever sell), demand-driven capacity deployment focuses on the lowest cost per user actually served.
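The demand-driven approach described above can be sketched numerically: engineer online capacity to the near-term forecast plus a safety margin sized for forecast error. The forecast, error, and service-level figures below are illustrative assumptions.

```python
from statistics import NormalDist

# Demand-driven capacity target: near-term forecast plus a safety
# margin sized for recent forecasting error.  Illustrative numbers only.
forecast = 500.0        # forecast demand for the next planning window
sigma_error = 40.0      # std. dev. of recent forecasting errors
service_level = 0.99    # tolerate a 1% chance of running short

z = NormalDist().inv_cdf(service_level)   # ~2.33 for a 99% target
safety_margin = z * sigma_error
online_capacity_target = forecast + safety_margin
print(f"safety margin: {safety_margin:.0f} units")
print(f"online capacity target: {online_capacity_target:.0f} units")
```

As the cyclical forecast rises and falls through the day, recomputing this target moves online capacity with demand rather than holding it at the long-term peak.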
An application service provider's revenue is generally tied to the user service demand that is actually fulfilled and charged for. Shifting the application service provider's costs of production from a largely fixed, capacity-driven approach to a usage-based, demand-driven model de-risks the application service provider's business case by more closely aligning their costs with their revenues. As cloud infrastructure providers enable demand-driven capacity management of the compute, memory, storage, and networking infrastructure – as well as the physical data centers that host that equipment – that supports application services, one naturally considers driving other costs of service production to also track with service demand:
Transforming an application service business from a traditional capacity-driven model to an agile demand-driven model requires a significant shift in the application service provider's policies, practices, and business models. The general nature of this transformation can be understood by considering the parallels between running a retail store and a demand-driven application service business. Instead of delivering application service from cloud data centers over the Internet to end users, imagine that the application service provider is offering some products to customers out of retail stores.
A retailer places stores in optimal locations to serve their target market and then must assure that sufficient inventory is stocked in each store to serve all customer demand (with service level probability, see Section 6.4.3: Service Level). Effectively, the application capacity management problem can be viewed as an inventory management problem, albeit inventory management of a regenerative asset like a hotel room rather than a consumable inventory item like a sweater on a store shelf. To illustrate this point, consider how the following inventory management statements apply equally well to both management of trendy retail stores and application capacity management. The square brackets “[]” give formal definitions of inventory management terminology from Wiley.
Figure 6.3 illustrates the parallel of inventory management by replacing “capacity decision and planning” processes from the canonical capacity management diagram of Figure 2.4 with “inventory management process” and “configuration change process” with suppliers and distributors who fulfill inventory orders. The inputs that the inventory management process uses are slightly different from capacity management decision and planning processes. Current, historic, and forecast demand, as well as policies and pricing, remain relevant inputs; but instead of resource usage and alarms, the inventory management processes consider their current inventory position.
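Inventory position has a standard definition (on-hand, plus on-order, minus backorders) that maps naturally onto application capacity. The mapping and numbers below are an illustrative assumption, not terminology from the figure.

```python
# Inventory position as an elasticity decision process might track it.
# Standard definition: on-hand + on-order - backorders, mapped here to
# application capacity under illustrative assumptions.
online_capacity = 120      # "on-hand": capacity currently serving users
growth_in_flight = 30      # "on-order": capacity requested, not yet online
pending_demand = 10        # "backorders": demand queued awaiting capacity

inventory_position = online_capacity + growth_in_flight - pending_demand
print(f"inventory position: {inventory_position} units")  # 140
```

Tracking position rather than raw online capacity prevents the classic over-ordering failure: capacity growth already in flight counts toward the position, so the decision process does not order it twice.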
This analogy is explored in the following sections:
The notion of locational marginal value was introduced in Section 5.8: Location of Production Considerations. Different businesses have different locational marginal value sensitivity. For example, coffee shops and convenience stores have high locational sensitivity as few customers will travel very far for those offerings; amusement parks have lower sensitivity to locational marginal value as many parents and children will willingly (if reluctantly) endure a long ride to a big amusement park. Likewise, highly interactive applications with strict real-time responsiveness expectations place a high value on being located physically close to end users, while batch-oriented and offline applications are far less sensitive to the geographic separation between end users and the data center that hosts the application instance that serves them.
Inventory or stock is often considered either:
Demand is defined by Wiley as “the amount of materials wanted by customers”; in the context of application elasticity, demand is often called offered workload. Figure 6.4 visualizes cycle stock, safety stock, capacity, and demand for a regenerative asset, like online application capacity.
Service level is defined by Wiley as “a measure of the proportion of customer demand met from stock (or some equivalent measure).” Traditionally, service level is one minus the probability that a user's demand will not be promptly served due to stock out.
In the context of application capacity, service level is related to service accessibility (see Section 1.3.1: Application Service Quality) which is the probability that an attempt to access a service will be successful because sufficient application capacity is available online, as opposed to the application instance being in overload or otherwise unavailable for service. Note that the reliability, latency, and overall quality of the service delivered by the application to the end user is a performance management concern rather than a capacity management concern; after all, if the application and allocated resources are delivering poor service quality, then that is a performance (or service assurance) problem. The root cause of a performance problem could be faulty capacity management, but the service quality problem would be diagnosed via performance or fault management mechanisms rather than via capacity decision, planning, or fulfillment mechanisms.
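The “one minus stock-out probability” definition above can be computed directly once a demand distribution is assumed. The normal demand model and the capacity figure below are illustrative assumptions for the sketch.

```python
from statistics import NormalDist

# Service level as one minus the stock-out probability: the chance that
# instantaneous offered workload exceeds online capacity.  Assumes,
# purely for illustration, that demand is approximately normal.
demand = NormalDist(mu=800.0, sigma=60.0)  # offered workload
online_capacity = 900.0                    # capacity currently online

p_stockout = 1.0 - demand.cdf(online_capacity)
service_level = 1.0 - p_stockout
print(f"stock-out probability: {p_stockout:.3%}")
print(f"service level: {service_level:.3%}")
```

Note that this quantifies accessibility only (whether a service attempt finds capacity online); the quality of service delivered by that capacity remains a performance management concern, as discussed above.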
Cost models for physical infrastructure are mature and well understood to include some or all of the following costs:
These prices charged by infrastructure service providers might be discounted based on resource reservations/commitments, purchase volume, time-of-day, day-of-week, or other factors. Note that the cloud business is still immature and actual infrastructure pricing models (e.g., what transactions and resources are separately billed and what are included, and thus “free” to the application service provider) are likely to evolve over time. While public cloud service providers necessarily offer transparent market-based pricing, the cost and pricing models for private cloud infrastructure are not well understood or mature yet.
In addition to direct costs or chargebacks to infrastructure service supplier(s), application service providers may also be subject to indirect costs:
The application service provider's overall objective is similar to what the electric power industry calls economic dispatch, which essentially maps to: operating application instances to deliver acceptable quality service to end users at the lowest cost, while recognizing any operational limits of software, infrastructure, or networking facilities.
Inventory management is a critical factor for a retailer, just as capacity management is a critical factor for an application service provider. Critical order fulfillment characteristics that impact inventory decision and planning for both physical inventory and application capacity are:
There are a number of inventory management models to decide timing and size of orders ranging from ad hoc or rule-of-thumb approaches to more sophisticated safety stock and scientific methods.
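One of the simplest scientific methods mentioned above is the reorder-point model: order more stock when the inventory position falls to the demand expected during the replenishment lead time plus safety stock. The sketch below maps that model onto capacity growth; all numbers are illustrative assumptions.

```python
from statistics import NormalDist

# Reorder-point model mapped to capacity growth: trigger a capacity
# change order when spare capacity falls to the demand expected during
# the growth lead time plus safety stock.  Illustrative numbers only.
demand_rate = 20.0       # expected demand growth per minute
lead_time = 5.0          # minutes to bring new capacity online
sigma_lt = 15.0          # std. dev. of demand over the lead time
service_level = 0.95     # tolerated chance of running short

safety_stock = NormalDist().inv_cdf(service_level) * sigma_lt
reorder_point = demand_rate * lead_time + safety_stock
print(f"safety stock: {safety_stock:.1f} units")
print(f"reorder point: {reorder_point:.1f} units of spare capacity")
```

The same structure underlies more sophisticated methods: shorter lead times or smaller demand variability shrink both terms, which is why the rapid elasticity of cloud infrastructure reduces how much spare capacity must be held.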
Lead time is defined by Wiley as “the total time between ordering materials and having them delivered and available for use.” Lead time for elastic application capacity growth has four primary components:
The nature of the service provided by the elastically grown component will determine how quickly new capacity can be engaged with user traffic; while components offering stateless, connectionless service can be engaged nearly instantly, components offering stateful and/or connection-oriented service often ramp up with new sessions and service requests rather than engaging pre-existing user sessions/service requests.
The lead time to start up a new application instance is likely to be materially longer than the lead time to add capacity to a preexisting application instance, because more components must be started, more thorough testing is generally prudent, and more elaborate synchronization is required to assure that the application instance is fully operational before user traffic can be safely applied. Lead time to instantiate release updates, upgrades, and retrofits can be materially longer than for ordinary application instance start-up because application configuration and user data might require conversion or migration processing.
Both shrinking application capacity and gracefully shutting down an online application instance also take time to complete, but the time is not in the critical path of user service delivery.
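Because growth lead time is the sum of its stages, a capacity decision process must look at least that far ahead. The stage names and durations below are hypothetical placeholders for illustration, not the chapter's enumeration of lead time components.

```python
# Growth lead time as the sum of its stages.  Stage names and durations
# are hypothetical placeholders; the point is that elasticity decisions
# must anticipate demand at least this far into the future.
lead_time_stages_sec = {
    "decide_to_grow": 5,       # detect the need and issue the change order
    "allocate_resources": 60,  # infrastructure fulfills the order
    "start_component": 30,     # boot, configure, and self-test
    "engage_traffic": 20,      # ramp new sessions onto the new capacity
}

total_lead_time_sec = sum(lead_time_stages_sec.values())
print(f"total growth lead time: {total_lead_time_sec} seconds")  # 115
```

As noted above, the engagement stage varies most by service type: stateless, connectionless components engage almost instantly, while stateful components ramp up only as new sessions arrive.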
The importance of order completeness is easily understood in the context of traditional manufacturing: an automobile needs both a driver's side rearview mirror and a passenger's side rearview mirror, and one cannot finish building a car without both mirrors. A supplier delivering a partial order with only driver's side mirrors today probably does not meet the automobile manufacturer's business needs because the manufacturer needs a complete order with both passenger's and driver's side mirrors to build cars.
Order completeness has a direct analogy to application capacity management in that complex applications may require that several components be grown simultaneously to increase service capacity, such as growing both processing capacity to directly serve a user and storage capacity to host the user's volatile and persistent work products; providing processing capacity without the necessary storage, or storage capacity without the necessary processing capacity, does not permit acceptable service to be delivered to users, so it does not meet the application service provider's business needs.
Infrastructure capacity change orders will occasionally fail outright such as due to resources delivered by the infrastructure service provider being dead on arrival or otherwise inoperable. Thus, application service provider elasticity decision and planning processes must be prepared to mitigate occasional failures of requested capacity change orders. As the industry matures the reliability of capacity fulfillment actions is likely to improve significantly.
Store owners often strive for an agile portfolio of products where new items (think stock keeping units or SKUs) are trialed with limited inventory. If customer demand materializes, then demand-driven inventory management will increase the stock to serve that demand. If customer demand fails to materialize, then the residual inventory will be disposed of and another new product offering will replace it on the shelf.
Agility for a product retailer is inherently simpler than agility for a service provider because while a retailer must merely source and stock a new product from a manufacturer or distributor, a service provider often needs to develop, integrate, and test a new service before offering it to end users. Sophisticated application service providers are likely to strive for similar service agility in trialing new service variants and options. Agile development and delivery processes are key to enabling service agility, and elastic scalability of resource capacity is another key enabler. Capacity for popular service variants and options will quickly be ramped up to serve increasing demand; capacity for unpopular offerings will be retired.
Some innovative retailers have succeeded by changing the buying patterns of their customers, such as how “big box” retailers enticed customers to purchase previously unheard of quantities of products to gain greater discounts (e.g., 24 rolls of toilet paper or 12 rolls of paper towels in a single retail package). Rapidly elastic capacity undoubtedly creates new opportunities to change – and hopefully expand – users' patterns of service demand, but those innovations are beyond the scope of this book.