Demand management enables peaks in demand to be smoothed by shifting the pattern of consumption, such as moving airline passengers from an overbooked flight to a later flight with available seats. Demand management techniques enable resource utilization of cloud infrastructure equipment to be increased significantly beyond what is practical with server virtualization alone. For example, a vertically integrated service provider that owns and operates both many applications and the underlying physical compute, memory, storage, and network infrastructure can minimize aggregate capital costs and operational expenses by strategically smoothing aggregate demand, thereby enabling higher utilization of a smaller pool of resources than would otherwise be possible.
As explained in Section 1.5: Demand Variability, application workloads often have both cyclical patterns of demand and random variations of demand. Applications that directly serve human users often have cyclical demand patterns tied to human cycles of sleep, work, travel, and leisure, while batch-oriented and machine-to-machine applications are often not necessarily tied to human cycles of demand. Different types of applications have different tolerances to random infrastructure service variations as well. For example, real-time interactive applications like conversational voice/video have very strict resource scheduling requirements to assure bearer traffic is delivered isochronously; in contrast, software backup applications are far more tolerant of occasional resource stalls and curtailment. By intelligently shaping demand of many individual applications, aggregate infrastructure demand can be smoothed to achieve significantly higher physical resource utilization than what is possible with virtualization technologies alone.
A key business challenge is to appropriately balance the infrastructure service provider's benefit from demand management (i.e., higher resource utilization of less capital equipment) against the inconvenience and trouble imposed on virtual resource consumers (i.e., application service providers) who have their patterns of service use altered. Thus, the lean goal of sustainably achieving the shortest lead time, best quality and value, and highest customer delight at the lowest cost is applicable to demand management as it is to capacity management. Demand management is considered in several sections:
As shown in Figure 7.1, infrastructure service providers have a range of techniques to smooth demand variations across timescales from microseconds to seconds to minutes to hours to days to months. Many of these techniques must be very carefully considered before being applied because:
Infrastructure service providers have a range of techniques to smooth demand variations from microseconds to seconds to minutes to hours to days to months:
Note that the response timeframe of demand management techniques dictates whether they are largely automatic or largely human-driven. Techniques that operate over seconds or less (e.g., resource scheduling and curtailment) must operate automatically based on preconfigured policies. Demand management techniques that operate across hours or days (e.g., resource pricing, maintenance scheduling, voluntary demand shaping) often rely on human decisions, and thus have a slower response time.
Time shared operating systems have for decades multiplexed access to finite physical hardware across multiple applications via time slicing and context switching. Each application gets slices of time, and provided that scheduling is prompt, end users seldom notice that their application is actually sharing finite hardware resources with several other independent application instances. Queuing and buffering infrastructure requests, such as network packets waiting to be transmitted to another entity or processed as input, is a mechanism for enabling more efficient resource scheduling; both time shared operating systems and virtualization technologies rely on this technique.
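The time-slice multiplexing described above can be sketched as a simple round-robin scheduler. This is a minimal illustration, not a real operating system scheduler; the application names, work amounts, and quantum below are hypothetical:

```python
from collections import deque

def round_robin(apps, quantum, total_time):
    """Simulate time-slice multiplexing: each application gets up to
    `quantum` units of CPU per turn until its remaining work is done."""
    ready = deque(apps.items())          # (name, remaining_work) pairs
    timeline = []                        # which app ran in each slice
    clock = 0
    while ready and clock < total_time:
        name, remaining = ready.popleft()
        ran = min(quantum, remaining, total_time - clock)
        timeline.append((name, ran))
        clock += ran
        if remaining - ran > 0:
            ready.append((name, remaining - ran))   # not finished; requeue
    return timeline

# Three hypothetical applications share one CPU; none monopolizes it.
print(round_robin({"web": 5, "batch": 3, "voip": 2}, quantum=2, total_time=100))
# each application progresses in interleaved slices
```

Because each turn is bounded by the quantum, no single application can starve the others, which is why end users rarely perceive the sharing.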
When demand outstrips supply, service providers often curtail service delivery until supply catches up to demand or demand declines. Some services curtail delivery based on technical rather than policy factors, like activating rate limiting mechanisms that slow service delivery during periods of congestion or when a customer's service usage exceeds some threshold. Other services curtail resources based on policy decisions, like a supplier allocating a greater share of limited inventory to their best customers.
Managed resource curtailment policies often take the form of different grades of service (GoS) in which one class of consumers (e.g., a supplier's “best customers”) might be treated differently from others. Different applications have different sensitivities to resource curtailment, and thus different application service providers accrue different costs for resource curtailment. For example, interactive real-time communications service quality is far more sensitive to bursts of packet loss than offline or batch applications like backup or distribution of software updates, which can tolerate the additional latency required to time out and retransmit lost packets. Thus, an application service provider offering interactive real-time communications is likely to value minimal packet loss (i.e., dropping packets is a type of resource curtailment) more than a provider of an offline or batch application that is less sensitive to resource curtailment like packet loss. The ICT industry often uses the notion of GoS to differentiate the relative sensitivities of application demands to resource curtailment; “high” GoS classes are assured minimal service curtailment at the expense of lower GoS classes, which endure greater curtailment. By charging higher prices for higher GoS, infrastructure service providers can differentiate applications that can technically and commercially accept resource curtailment when necessary from applications that cannot tolerate resource curtailment. When appropriate prices are set for both non-curtailable and curtailable GoS, and cloud infrastructure reliably assures full resource delivery to non-curtailable resources and bounded resource curtailment of other GoS, then all parties can efficiently maximize their business value.
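GoS-based curtailment can be sketched as a priority allocation in which higher grades are served in full before lower grades absorb any shortfall. The application names, grade numbers, and demand figures below are hypothetical:

```python
def allocate(capacity, demands):
    """Allocate limited capacity by grade of service: higher GoS
    (lower grade number) is served in full before lower grades;
    the lowest grades absorb any curtailment."""
    allocation = {}
    for name, gos, want in sorted(demands, key=lambda d: d[1]):
        grant = min(want, capacity)      # serve as much as remains
        allocation[name] = grant
        capacity -= grant
    return allocation

# 100 units of capacity against 120 units of aggregate demand.
demands = [("backup", 3, 40), ("video", 1, 50), ("batch", 2, 30)]
print(allocate(100, demands))
# video (GoS 1) gets 50, batch (GoS 2) gets 30, backup (GoS 3) is curtailed to 20
```

The curtailment-tolerant backup workload absorbs the entire shortfall, consistent with the pricing logic described above: it pays less precisely because it accepts this risk.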
We shall call demand management actions that are taken unilaterally by the infrastructure service provider, without explicit prior consent of the impacted application service provider, mandatory. While mandatory demand shaping mechanisms are inherently faster and more predictable for the infrastructure service provider than voluntary demand shaping actions, they can negatively impact application service providers' satisfaction because mandatory mechanisms coerce their applications' compliance and may not provide sufficient lead time to gracefully reduce their demand. The fundamental challenge with mandatory demand shaping mechanisms is assuring that application user service is not materially impacted when demand shaping is activated. Applications that cannot support impromptu (from the application's perspective) activation of the infrastructure service provider's demand management mechanism should not be coerced with mandatory demand management under normal circumstances.
Beyond resource curtailment, mandatory demand shaping actions that an infrastructure service provider can take fall into several broad categories:
Service providers sometimes request customers to voluntarily reduce service demand, such as:
There is inherently a lag time with voluntary demand shaping mechanisms because after the service provider decides to request voluntary demand shaping action, the following actions must occur:
Thus, actual timing, magnitude, and shape of voluntary demand shaping actions are inherently unpredictable.
Infrastructure service providers can schedule planned maintenance events during off-peak periods to reduce demand during busy periods or execute repairs or capacity growth actions on an emergency basis to maximize operational capacity.
Resource pricing can influence patterns of demand over the longer term. Stronger pricing signals – such as deeper discounts for resources in off-peak periods – will often shape demand more dramatically than weaker price signals.
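A time-of-day pricing signal of this kind might be sketched as follows. The off-peak window, base rate, and discount are illustrative assumptions, not actual tariffs:

```python
def unit_price(hour, base=0.10, offpeak_discount=0.5):
    """Time-of-day pricing sketch: resources consumed during a
    hypothetical off-peak window (22:00-06:00) earn a discount,
    nudging flexible workloads away from the daily peak."""
    off_peak = hour >= 22 or hour < 6
    return base * (1 - offpeak_discount) if off_peak else base

print(unit_price(14))   # peak hour: full base rate
print(unit_price(2))    # off-peak hour: discounted rate
```

A deeper `offpeak_discount` is the "stronger pricing signal" of the text: the larger the spread between peak and off-peak prices, the greater the incentive for flexible workloads to shift.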
As shown in Figure 7.2, application demand management actions operate over several time horizons from microseconds to seconds, minutes to hours, days to months:
Applications use queues and buffers to smooth out random application workload variations at the finest time scales.
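This smoothing effect can be sketched as a steady drain on a buffer: bursts accumulate in the queue while downstream processing proceeds at a fixed rate, so capacity only needs to cover the average rather than the burst. The arrival pattern and drain rate below are hypothetical:

```python
def smooth(arrivals, drain_rate):
    """Buffer bursty arrivals and drain them at a fixed rate per
    interval; the queue absorbs peaks so downstream capacity only
    needs to cover the average demand, not the instantaneous burst."""
    queue = 0
    served = []
    for burst in arrivals:
        queue += burst                   # new work joins the buffer
        done = min(queue, drain_rate)    # drain at the fixed service rate
        served.append(done)
        queue -= done
    return served, queue                 # per-interval output and backlog

served, backlog = smooth([9, 0, 1, 0, 0], drain_rate=2)
print(served, backlog)   # the burst of 9 is drained steadily, 2 per interval
```

The cost of this smoothing is queuing delay, which is why the technique suits elastic workloads better than isochronous ones like conversational voice.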
Load balancer components can implement a broad range of policies ranging from simple round-robin workload distribution to dynamic and adaptive workload distribution based on policies, measurements, and other factors. Load balancers can intelligently shift workloads away from components with lower performance or higher latency (e.g., because infrastructure resources have been explicitly or implicitly curtailed to those components) to optimally balance the workload to naturally mitigate some level of throughput or performance variation across fungible instances in a load balanced pool of components.
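Latency-aware workload distribution of this kind can be sketched with weighted random selection, where backends reporting higher latency receive proportionally less traffic. The backend names and latency figures below are hypothetical:

```python
import random

def pick_backend(latencies_ms, rng=random):
    """Weighted selection: backends with lower observed latency
    receive proportionally more traffic, shifting workload away from
    instances whose resources have been curtailed or degraded."""
    backends = list(latencies_ms)
    weights = [1.0 / latencies_ms[b] for b in backends]
    return rng.choices(backends, weights=weights)[0]

latencies = {"a": 10.0, "b": 10.0, "c": 100.0}   # backend "c" is degraded
rng = random.Random(0)                           # seeded for repeatability
counts = {b: 0 for b in latencies}
for _ in range(1000):
    counts[pick_backend(latencies, rng)] += 1
print(counts)   # "c" receives far less traffic than "a" or "b"
```

In practice a production load balancer would refresh the latency measurements continuously, so that traffic shifts back automatically as a curtailed component recovers.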
Inevitably, user demand will occasionally exceed an application's online capacity, meaning that the full demand cannot be served with acceptable quality; this condition is called overload. Well-engineered applications will automatically detect overload conditions and engage congestion control mechanisms to:
Congestion control mechanisms may:
Application service providers can execute explicit demand management action such as:
In extreme circumstances application service providers can impose service quotas or other restrictions on service usage and not accept new customers.
Application service providers often have some flexibility in scheduling software release management actions (e.g., patching, upgrade, update) and trials of prototype and new application services and releases. As each release instance carries resource overhead that runs alongside the production instance, these release management actions create additional infrastructure demand that can often be rescheduled for the convenience of the infrastructure service provider. Some scheduled maintenance actions like software release management or preventive maintenance of infrastructure servers will remove target infrastructure capacity from service for at least a portion of the scheduled maintenance period. Thus, non-emergency infrastructure maintenance activities are routinely scheduled for off-peak periods.
If application service providers offer discounts to end users that are aligned with the infrastructure service provider's pricing discounts, then some end users will voluntarily alter their demand patterns.
Smoothing aggregate infrastructure demand requires the infrastructure service provider to balance the costs of deploying and activating demand management mechanisms, including the potential loss of customer goodwill if quality of experience is materially impacted, against the larger organization's savings from deploying and operating less infrastructure equipment. Ideally, lean cloud capacity management is a win/win/win in that:
More efficient operation by both the infrastructure service provider and application service providers lowers their costs, and some of those savings can be shared with stakeholders via lower prices.
Figure 7.3 offers a basic methodology to achieve win/win/win with lean demand management:
Demand planning factors to consider for each application workload in the organization's portfolio include:
Different applications will have different sensitivities or tolerances to different infrastructure demand management strategies. Before one can determine the optimal demand management strategy, one needs to understand how tolerant each application is to different infrastructure demand management strategies. This should enable the infrastructure service provider to identify the cheapest and best demand management actions. Just as cyclical and random patterns of demand vary over a broad range of time scales, applications have different tolerances to demand management techniques that work across different time frames:
Robust applications are designed to automatically detect and mitigate failure scenarios, including overload and component failures. These robustness mechanisms are likely to offer some resilience when confronted with aggressive infrastructure demand management actions.
A perfect infrastructure pricing model simultaneously offers attractive prices to application service provider organizations and gives the infrastructure service provider sufficient demand management flexibility to smooth aggregate resource demand, delivering acceptable service quality to customers while minimizing overall costs for all organizations. Thus, infrastructure pricing should be set so that application demand that is more flexible and manageable earns greater discounts, while application components with the strictest real-time resource service needs pay full price during peak usage periods. Ideally, the infrastructure pricing model motivates both application service providers and infrastructure service providers to squeeze non-value-added and wasteful activities out of the end-to-end process, with both parties sharing the savings.
Operationally, the infrastructure service provider may have a small number of GoS for resources (e.g., virtual machine or Linux container instances) such as:
In private cloud or other situations where fine grained pricing models may not be practical, infrastructure service provider diktat can certainly coerce aggressive demand management practices. However, less coercive arrangements in which application service providers benefit somehow from proactively smoothing their cyclical pattern of demand and offering the infrastructure service provider some degree of direct (e.g., mandatory) or indirect (e.g., voluntary) on-the-fly demand management are more appropriate in the win/win partnership across the service delivery chain that lean strives for. For example, quality of service settings for virtualized compute, networking, and storage could be tied to support of demand management techniques: application service provider organizations that reject even voluntary demand management mechanisms might be assigned a lower grade of service.
Infrastructure service providers can deploy sufficient physical infrastructure to serve peak cyclical demand in the infrastructure capacity lead time interval, plus a margin of safety capacity. The infrastructure service provider's unit commitment process assures that sufficient infrastructure capacity is online to serve real-time demand (see Chapter 9: Lean Infrastructure Commitment). If resource demand approaches or exceeds online or physical capacity, then appropriate resource scheduling, curtailment, mandatory and/or voluntary demand management actions are activated. In the case of voluntary demand management actions, appropriate status information is pushed to application service providers' management systems and staff to alter workload placement policies or take other actions. If voluntary demand management actions are insufficient, then mandatory demand management mechanisms can be activated.
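The escalation logic described above might be sketched as follows; the utilization thresholds and action labels are illustrative assumptions, not a definitive policy:

```python
def demand_actions(demand, online_capacity, safety_margin=0.1):
    """Escalation sketch: as demand approaches online capacity, trigger
    progressively stronger demand management actions. Thresholds and
    action names are illustrative, not from any real system."""
    utilization = demand / online_capacity
    actions = []
    if utilization > 1 - safety_margin:
        # Demand has entered the safety margin: ask for voluntary reduction.
        actions.append("notify ASPs: request voluntary demand reduction")
    if utilization > 1.0:
        # Demand exceeds online capacity: mandatory mechanisms engage.
        actions.append("activate mandatory curtailment of lower GoS")
    return actions

print(demand_actions(95, 100))    # voluntary request only
print(demand_actions(105, 100))   # voluntary plus mandatory curtailment
```

The ordering mirrors the text: voluntary mechanisms, with their inherent lag and unpredictability, are tried while the safety margin still holds, and mandatory curtailment is reserved for actual shortfall.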